Public Lives

Will ChatGPT ever stop learning?

By: Randy David - @inquirerdotnet

Philippine Daily Inquirer / 05:13 AM May 05, 2024

Anyone who has ever turned to ChatGPT for answers may have wondered at some point where this popular artificial intelligence (AI) tool gets all the information it provides. The simple answer is that ChatGPT voraciously scrapes all the English-language data that is to be found on the open web and synthesizes it for you. Everything that is posted on the internet, from news articles to digital books, blogs, podcasts, documentaries, computer programs, and so on, is raw material for this AI system.

The unauthorized use of such content has triggered concerns of intellectual property rights violations. Indeed, multiple lawsuits have been filed against OpenAI (the developer of ChatGPT) by computer programmers, book authors, publishing companies, news organizations, and various other content producers who assert their legal rights over copyrighted works on which these powerful AI systems are trained.

But there is an equally important issue that is troubling many observers of the AI world. Given the ever-growing scale of computing power that is now available to AI systems, it is just a matter of time before all the human-generated digital material that can possibly be mined on the internet runs out. When that point is reached, what will the likes of ChatGPT do? What happens when they have exhausted all the texts on which they could continue to train?

Cade Metz, a New York Times technology reporter and author of the book “Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World,” says that, in fact, this is already happening. To make up for this growing scarcity in raw material supply, OpenAI has begun transcribing audio and video files from various platforms using a speech-recognition tool it has developed. The digital transcripts are then fed into its machine-learning systems to build newer and more powerful versions of ChatGPT.

But even this may only be a stopgap measure. In time, these AI tools may have to turn to non-English digital material on the internet and translate it into readable English text. Right now, according to Matteo Wong in an article in The Atlantic (April 12, 2024), “Nearly 90 percent of websites are written in just 10 languages (English, Russian, Spanish, German, French, Japanese, Turkish, Portuguese, Italian, and Persian).” It is these, and the worldviews they represent, that will dominate the world of the chatbot. Wong believes that the rest of the world’s 7,000 languages, particularly those that are not served by Google Translate, will be further pushed to the periphery.

But Metz thinks that the total amount of digital content that human beings can post on the web will never be enough to match the virtually inexhaustible computing capability and memory now at the disposal of AI systems. AI systems will then have no choice but to train themselves on machine-generated content or “synthetic data.” Metz believes this may be a dangerous turn for AI chatbots as it may only amplify the errors that result from their tendency to “hallucinate” or fabricate false information when they try to replicate patterns of human thinking.

All this may sound surreal, but perhaps we are getting a glimpse here of a fully digitized future with all its obsession with speed. Are we humans just being alarmist? So, I asked ChatGPT itself for its own thoughts on these issues. ChatGPT opens with a denial: “It’s unlikely that large language models will run out of human-made digital data to learn from anytime soon. However, if that were to happen, the models would likely plateau in terms of their ability to learn new information. Researchers would then need to find alternative sources or develop new techniques to keep improving models.”

Me: “Is so-called synthetic data one of such alternatives?”

ChatGPT: “Synthetic data is artificially generated data that mimics real data but is created by algorithms. It can be used to augment existing datasets or to create entirely new data sets for training machine learning models.”

Me: “Machines learning from machine-generated data? Doesn’t that only amplify errors that may be committed in the process?”

ChatGPT: “It’s crucial for researchers to carefully design and validate the algorithms generating synthetic data to minimize these risks. Additionally, combining synthetic data with real data and employing rigorous validation techniques can help mitigate these issues. Validation techniques for synthetic data can include Cross-validation, Statistical analysis, Domain expert review, Adversarial testing, and Fine-tuning.”

The chatbot offered further elaboration of these validation measures, most of which take the form of mathematical operations performed by machines. I came away feeling that what AI systems are offering is not truth as we understand it, but a model of real-world phenomena.

ChatGPT is no doubt extremely capable of responding to almost any request for information, solution, or advice. One can easily get addicted to it, mistaking it for a real person who is always there. But beware. As Simone de Beauvoir warned through one of her characters in the novel “Les belles images,” “Soon technology will seem to us like nature itself, and we will live in a completely inhuman world.”
