On August 7, OpenAI, the creator of ChatGPT, introduced a tool called GPTBot. GPTBot is a web crawler designed to collect data from the internet to improve the accuracy and capabilities of AI models. It is a sophisticated piece of software that can navigate the web and extract information from a variety of sources, including text, images, and code. “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” said OpenAI. Now, a report suggests that 15% of the top 100 websites in the world have blocked GPTBot.
According to Originality.AI, in the first 14 days after OpenAI published the GPTBot documentation, almost 10% of the top 1,000 websites in the world chose to block GPTBot. The sites that have blocked it include Amazon, Quora, wikiHow, and several international news publications.
The report says that OpenAI launched GPTBot because the company is facing a growing number of lawsuits, some of them over the use of content without proper permission.
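Sites usually block GPTBot in their robots.txt file, which is the opt-out mechanism OpenAI's documentation describes for the crawler. The sketch below is a minimal illustration, not OpenAI's code. It uses Python's standard-library `urllib.robotparser` to show how a rule naming GPTBot shuts that crawler out while leaving other crawlers alone. The robots.txt contents, the example.com URL and the `is_allowed` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that blocks GPTBot but allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse local text instead of fetching over HTTP
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"
    for agent in ("GPTBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Running it prints `GPTBot False` followed by `Googlebot True`. The `User-agent: *` group applies to any crawler that has no group of its own.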

How does GPTBot work?

GPTBot works by first identifying potential sources of data, which it does by crawling the web and looking for websites that contain relevant information. Once it has identified a potential source, GPTBot extracts the information from the website. This information is then stored in a database, where it can be used to train AI models.
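The crawl, extract and store loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OpenAI's implementation. The `FAKE_WEB` dictionary stands in for real HTTP requests. `TextAndLinks` and `crawl` are hypothetical names. An in-memory SQLite table plays the role of the database, and the sketch performs no relevance filtering.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch these over HTTP.
FAKE_WEB = {
    "https://example.com/": '<p>Home page</p><a href="/a">A</a><a href="/b">B</a>',
    "https://example.com/a": '<p>Article A text</p><a href="/">Home</a>',
    "https://example.com/b": "<p>Article B text</p>",
}


class TextAndLinks(HTMLParser):
    """Collects visible text fragments and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(seed, web, max_pages=10):
    """Breadth-first crawl from seed, storing each page's text in an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, text TEXT)")
    frontier = deque([seed])   # URLs discovered but not yet visited
    seen = {seed}              # URLs already queued, so none is visited twice
    stored = 0
    while frontier and stored < max_pages:
        url = frontier.popleft()
        html = web.get(url)    # stand-in for an HTTP GET
        if html is None:       # unreachable page: skip it
            continue
        parser = TextAndLinks()
        parser.feed(html)
        parser.close()         # flush any buffered text
        db.execute("INSERT INTO pages VALUES (?, ?)", (url, " ".join(parser.text)))
        stored += 1
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the current page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    db.commit()
    return db
```

Called as `crawl("https://example.com/", FAKE_WEB)`, it visits the three fake pages once each. A production crawler would also need politeness delays, robots.txt checks and content deduplication.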
The tool can extract information from a variety of sources, including text, images, and even code. GPTBot can extract text from websites, articles, books, and other documents. From images, it can extract information such as the objects they depict and the text associated with them. GPTBot can also extract code from websites, GitHub repositories, and other sources.
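The following minimal Python sketch makes the three content types concrete by sorting one HTML page into prose, image information and code. It is an illustration under assumptions, not how GPTBot actually works. `ContentExtractor` and `extract` are hypothetical names, and "code" is taken to mean whatever sits inside `pre` or `code` tags. For images, the sketch only records the `src` and `alt` attributes. Recognizing the objects an image depicts would require a vision model, which is outside this sketch.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Sorts one HTML page's content into prose text, image attributes and code snippets."""

    def __init__(self):
        super().__init__()
        self.text, self.images, self.code = [], [], []
        self._code_depth = 0   # > 0 while inside <pre> or <code>
        self._buffer = []      # pieces of the code block currently being collected

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # Only the src/alt attributes are captured; no object recognition is done.
            self.images.append({"src": attrs.get("src"), "alt": attrs.get("alt", "")})
        elif tag in ("pre", "code"):
            self._code_depth += 1

    def handle_endtag(self, tag):
        if tag in ("pre", "code") and self._code_depth:
            self._code_depth -= 1
            if self._code_depth == 0:  # left the outermost code container
                self.code.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        if self._code_depth:
            self._buffer.append(data)  # keep whitespace: it matters in code
        elif data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return the prose, image attributes and code found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"text": parser.text, "images": parser.images, "code": parser.code}
```

For example, `extract('<p>Some prose.</p><img src="cat.png" alt="A cat">')` puts `"Some prose."` under `text` and the image's `src` and `alt` under `images`.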
OpenAI’s ChatGPT and other generative AI tools rely on data from websites to train and improve their models. A few months back, when the platform now known as X was still called Twitter, Elon Musk blocked OpenAI from scraping its data.




