Several websites, including Amazon, The New York Times (NYT) and Shutterstock, have blocked OpenAI’s Web crawler from obtaining content that may enhance its artificial intelligence (AI) models.

OpenAI, the company behind ChatGPT, launched GPTBot earlier in August.

The three websites are among the six biggest websites that blocked GPTBot within the first two weeks following its launch, according to recent research from Originality.AI, a company that checks for the presence of AI content.

The other websites include Quora, CNN and wikiHow, the research revealed.

NYT’s terms of service were recently updated to make the prohibition against “the scraping of our content for AI training and development… even more clear”, according to a spokesman quoted in a report by The Guardian newspaper on Friday.

The terms of service webpage, last updated on Aug 3, indicated that NYT’s content cannot be used for “the development of any software program, including, but not limited to, training a machine learning or AI system” without its consent.

GPTBot will comb through the Internet and “may potentially be used” to improve future AI models, among other things, said OpenAI on its website.

“Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” it added.

The company also said that websites can choose to restrict GPTBot from accessing them by opting out either partially or entirely.
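
The article does not spell out the opt-out mechanism, but OpenAI's documentation describes it as a robots.txt rule addressed to the `GPTBot` user agent. The following is a minimal sketch, not any site's actual configuration, of how a full and a partial opt-out might look and how they can be checked with Python's standard-library robots.txt parser. The `/public/` and `/private/` paths, the `example.com` URLs and the `gptbot_allowed` helper are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Full opt-out: GPTBot is barred from the entire site.
FULL_OPT_OUT = """\
User-agent: GPTBot
Disallow: /
"""

# Partial opt-out: GPTBot may read one directory but not another.
# Both directory names are hypothetical.
PARTIAL_OPT_OUT = """\
User-agent: GPTBot
Allow: /public/
Disallow: /private/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text permits GPTBot to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(FULL_OPT_OUT, "https://example.com/article"))
    print(gptbot_allowed(PARTIAL_OPT_OUT, "https://example.com/public/page"))
    print(gptbot_allowed(PARTIAL_OPT_OUT, "https://example.com/private/page"))
```

Rules of this kind are a request rather than a technical barrier: they only work if the crawler chooses to honour them.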

AI language models such as ChatGPT develop knowledge from vast amounts of information gleaned from the Internet. The models then learn how to give correct outputs.

There have been concerns about how content gathered by Web crawlers is used to train AI models. For instance, pirated works by authors such as Stephen King have been used to train AI tools, according to The Atlantic.


