Google May Use Publisher Content for Search AI Regardless of Opt-Outs

Google’s search products may use content from publishers even if those publishers have opted out of artificial intelligence (AI) training, according to testimony in an ongoing antitrust case against the tech giant. A Google DeepMind executive testified that the AI models developed by DeepMind do not incorporate content from publishers who have opted out, but that different rules apply to Google’s search products. The disclosure raises questions about how much control publishers actually have over their content.

Different Rules for AI Models and Search Products

Eli Collins, Vice President of Product at Google DeepMind, confirmed that the guidelines governing the use of publisher content differ between AI models and Google’s Search products. During the antitrust proceedings, attorney Diana Aguilar presented evidence indicating that a substantial portion of the data used to train Google’s AI models (80 billion out of 160 billion tokens) originated from content belonging to publishers who had opted out of AI training. Collins said that once a publisher opts out, its content is not used in DeepMind’s AI models.

The picture is less clear when the Gemini AI model is used within Search. Collins acknowledged that content could still be used by the Search product, provided the use serves Search functionality. This includes features like Google’s AI Overviews and the newly launched AI Mode. Consequently, the existing AI training opt-out may not be enough to prevent Google from using publisher content, raising concerns about how much control publishers have over their own material.

Implications of Google’s Privacy Policy Update

In June 2023, Google updated its privacy policy to state that it would use all publicly available internet data to train its language models. Publicly available data here means any website that lacks a paywall or mandatory sign-up requirement, which effectively broadens the scope of data Google can access. A spokesperson for Google explained that publishers can only prevent their data from being used in Search AI by opting out of being indexed for search. This is done through the robots.txt web standard, which tells crawler bots such as Google’s which content they may crawl for search results.

However, this action has significant repercussions for publishers. By opting out of indexing, their web pages would no longer appear in Google search results, which leaves them with little practical choice but to allow Google to use their data in its Search AI features. This highlights the difficulty publishers face in protecting their content in an increasingly AI-driven landscape.
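
The trade-off described above can be illustrated with a small robots.txt check. This is only a sketch under stated assumptions, not a description of how Google evaluates these files: it assumes the AI training opt-out is expressed through the Google-Extended robots.txt token that Google documents for this purpose, and that the search opt-out means disallowing Googlebot. The helper name `allowed`, the two sample robots.txt files, and the example.com URL are hypothetical and exist only for illustration. The parser is Python's standard-library `urllib.robotparser`, which may not match Google's own matching rules in every detail.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical publisher file: opts out of AI training only
# (assumes the Google-Extended token is the opt-out in question).
AI_TRAINING_OPT_OUT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical publisher file: opts out of search crawling entirely.
SEARCH_OPT_OUT = """\
User-agent: Googlebot
Disallow: /
"""

URL = "https://example.com/article"  # placeholder URL

if __name__ == "__main__":
    for label, text in [("AI training opt-out", AI_TRAINING_OPT_OUT),
                        ("Search opt-out", SEARCH_OPT_OUT)]:
        for agent in ("Google-Extended", "Googlebot"):
            print(f"{label}: {agent} allowed = {allowed(text, agent, URL)}")
```

Under the first file the search crawler is still allowed, so the page stays in Search, where, according to the testimony, it could still be used by Search AI features. Under the second file the search crawler is blocked, which is the only route the spokesperson described for keeping content out of Search AI, at the cost of disappearing from search results.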

Ongoing Antitrust Case Against Google

The antitrust case against Google aims to establish whether the company holds a monopoly in the search and AI sectors. The Department of Justice is urging US District Judge Amit Mehta to compel Google to divest its Chrome browser and to disclose the data it uses to generate search results. However, no similar measures have been proposed concerning the company’s AI products. The outcome of this case could have significant implications for the future of digital content and the balance of power between tech giants and content creators. As the legal proceedings continue, the focus remains on how Google’s practices will evolve in response to regulatory scrutiny and the concerns of publishers.


Observer Voice is a one-stop site for national and international news, sports, Editor’s Choice, art and culture, quotes, and more. It also covers history, including world history, Indian history, and what happened on this day, as well as entertainment across India and the world.
