Although Google wants all online content available for AI training, the New York Times clearly wants to opt out.
The Times has made numerous changes to its terms of service – all aimed at preventing AI companies from using the media organization’s content to train their systems.
Why we care. Many large language models are trained on website content (see: Search the 15.7 million websites in Google’s C4 dataset). While Google is exploring alternative or supplemental ways of controlling crawling and indexing beyond robots.txt, many brands (e.g., Reddit) are making it clear right now that they don’t want their content used to improve the products and increase the profits of Google, Microsoft and OpenAI – at least not without compensation. You may want to consider adding similar AI-related language to your website’s terms page.
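For context on the robots.txt mechanism mentioned above, here is a minimal sketch of how a site's robots.txt rules can be checked for a given crawler using Python's standard-library parser. The crawler names ("ExampleAIBot", "SomeSearchBot"), the example.com URL and the rules themselves are hypothetical and do not come from the Times or any other publisher. Keep in mind that robots.txt is advisory: it only works if a crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is a made-up crawler name
# used only for illustration; it is blocked everywhere, and all other
# crawlers are allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # placeholder URL
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", url))
    print("SomeSearchBot allowed:", is_allowed("SomeSearchBot", url))
```

A check like this only confirms what the file says. It does not replace the kind of terms-of-service language discussed in this article.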
What has changed. The New York Times updated its terms of service page Aug. 3. It includes AI-specific additions that apply to its content (which it defines as “including, but not limited to text, photographs, images, illustrations, designs, audio clips, video clips, ‘look and feel,’ metadata, data, or compilations”).
In the “Prohibited use of the services” section:
Will AI companies compensate publishers? OpenAI and the Associated Press signed a deal last month. OpenAI licensed the AP’s news article archive dating back to 1985 for training.
Google and the New York Times Co. already have a lucrative “commercial agreement” in place, but that deal is about working together on “tools for content distribution and subscriptions.”
Microsoft is also promising publishers some sort of revenue sharing. However, most of the benefits will apparently go to members of its Start program.
The post New York Times: Don’t use our content to train AI systems appeared first on Search Engine Land.
from Search Engine Land https://searchengineland.com/new-york-times-content-train-ai-systems-430556