Wednesday, December 3, 2025
Dr. Brown: Want to know how ChatGPT works? Create a simpler model
BabyGPT had no idea what letters, never mind words or grammar, were. Rather, it started from scratch with only algorithms designed to look for patterns in whatever characters were presented. (UNSPLASH)

Last fall was a busy season for AI at the Niagara-on-the-Lake Public Library, beginning with a several-week course on artificial intelligence, including hands-on exercises with ChatGPT in September, followed in October by reviews of the 2024 Nobel Prizes in physics and chemistry, both of which were closely tied to AI.

In the case of the physics prize, the two laureates played key roles in the early development of the artificial neural networks that underlie today's large language programs. For the chemistry prize, the three laureates were recognized for software designed to decipher the 3D structure of proteins from their amino acid sequences, and for designing entirely novel proteins.

But the Nobel committee’s press releases, short summaries and extended scientific papers weren’t much help when it came to describing what AI is, or explaining in layman’s terms what jargon such as “neural networks” and “large language models” actually means, leaving me to wonder whether the writers were as perplexed as the rest of us.

Then, I discovered a gem of an article written by Aatish Bhatia, which appeared in the New York Times on April 27, 2023, with the intriguing title: “Watch an AI Learn to Write by Reading Nothing but Jane Austen.”

To keep matters as simple as possible, Bhatia used a laptop computer loaded with a stripped-down form of ChatGPT, called BabyGPT, which was deliberately not fed the usual reams of data but left blank except for the core large-language-model algorithms that learn to mimic written language.

Sounds simple, but as Bhatia stated, “The inner workings of these algorithms are notoriously opaque,” even if “the basic ideas behind them are surprisingly simple.”

That’s what I’ve always suspected, but it was comforting to realize I’m not the only dummy who struggles to understand what those algorithms are doing.

ChatGPT is usually trained by trawling through mountains of internet text and other sources, repeatedly guessing the next few letters and grading itself against the real thing.

For the purposes of this illustrative study, the computer was loaded with core large language algorithms but no reams of data to train on.

Hence, starting out, BabyGPT had no idea what letters, never mind words or grammar, were. Rather, BabyGPT started from scratch with only algorithms designed to look for patterns in whatever characters were presented.
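Bhatia's article doesn't reproduce BabyGPT's code, but the core idea — noticing which character tends to follow which, purely from the text it is shown — can be sketched in a few lines of Python. This is a toy character-pair (bigram) counter, far cruder than BabyGPT's actual neural network, and the sample text is just a stand-in for a real database:

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Count, for each character, how often each other character follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Grow a string one character at a time, sampling from the learned counts."""
    rng = random.Random(seed)
    out = start
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # last character never seen mid-text: stop
            break
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

# A tiny stand-in "database" -- the column's Jane Austen prompt, lowercased.
sample = "you must decide for yourself said elizabeth"
model = train_bigram(sample)
print(generate(model, "y", 20))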

What was presented to BabyGPT for this illustrative experiment?

There were six options from which the operator could choose, all small, such as the complete works of Jane Austen or William Shakespeare, or other similar-sized databases — each with no more than a few megabytes, compared to the usual many terabytes of internet text.

For Jane Austen, the exercise began with a standard prompt: “‘You must decide for yourself,’ said Elizabeth.”

Before training on the Jane Austen database, BabyGPT produced sheer gibberish: random letters, punctuation and capitalization, but no meaningful text.

After 250 rounds — about 30 seconds of processing time on a laptop — a few simple words emerged, such as “the,” “us,” “all,” “he” and “be,” but the rest was still gibberish, with random punctuation and capitalization.

After 500 rounds — about a minute of laptop processing — more complex words appeared, such as “she,” “refer,” “was,” “prove” and “what,” and some punctuation, such as periods, sometimes landed in the right place.

After 5,000 rounds — about ten minutes of processing time — the program produced longer words, made fewer spelling mistakes and its grammar improved, but sentences still made no sense.

Here, Bhatia pauses in his article to explain what BabyGPT is actually doing.

To quote: “BabyGPT is an extremely complicated mathematical function involving millions of numbers that converts a sequence of letters into an output — a prediction of the next letter. And with every round of training, the algorithm adjusts the numbers to improve its guesses — thus learning. What the algorithm generates is not letters but probabilities, which is why we get a different answer each time a response is generated.”
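That “probabilities, not letters” point can be illustrated with a toy example. The probability numbers below are invented for illustration, not taken from BabyGPT; the point is only that sampling from the same probabilities can yield different letters on different runs:

```python
import random

# A made-up probability distribution over the next character after "th".
next_char_probs = {"e": 0.55, "a": 0.20, "i": 0.15, "o": 0.10}

def sample_next(probs, rng):
    """Pick one character at random, weighted by its probability."""
    chars, weights = zip(*probs.items())
    return rng.choices(chars, weights=weights)[0]

# Two differently seeded runs draw from the same probabilities but can
# pick different letters -- why responses vary between generations.
rng1, rng2 = random.Random(1), random.Random(42)
print(sample_next(next_char_probs, rng1), sample_next(next_char_probs, rng2))
```

Over many draws, the most probable letter (“e” here) appears most often, which is how a probabilistic model still produces mostly sensible text.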

After 30,000 rounds and one hour of training, full sentences finally emerged, though they still made no sense. At this point, BabyGPT had reached its limit.

Further improvements would require expanding the database on which the computer algorithm trains.

That’s why this simple model progressed so far and no further; it needed a far larger database from which to learn and improve.

Even so, as the author pointed out, “in just an hour of training on a laptop, a language model evolved from generating random characters to a crude approximation of language.”

For comparison, babies take many months, even a few years, to learn a language.

Depending on the version, ChatGPT was trained on millions to trillions of times more data than was used in this simple model. That’s where ChatGPT gets its information and power.

ChatGPT and similar systems also need very complex mathematical algorithms to analyze and make sense of the data. Sometimes these programs can find, and even develop, novel algorithms on their own, improving their processing without human input.

That’s amazing, and it is sometimes reflected in the emergence of new abilities unanticipated by their human creators, surely a sign of intelligence.

In the two years since Bhatia’s article was published in the New York Times, the talents of AI have grown exponentially, even on traditional silicon-chip computers.

Imagine the giant steps forward when quantum computing matures; in theory, it can handle far more data, and far more complex data, than silicon chips, which are close to reaching their limits.

Will AI exceed human intelligence? For many tasks, it already has, and it may soon exceed human intelligence in all spheres, because AI continues to improve at breakneck speed while human intelligence is stuck with a brain that is not noticeably improving.

That’s the point Steven Weinberg, a Nobel-winning particle physicist, made a few years ago: humans may have reached the limit of individual intelligence, beyond which evolving computers will be necessary, as they already are in analyzing the complex astronomical data gathered by the latest generation of telescopes.

Dr. William Brown is a professor of neurology at McMaster University and co-founder of the InfoHealth series at the Niagara-on-the-Lake Public Library.
