Huxley
Author: Huxley
© Huxley — an almanac about philosophy, art and science

AI REVOLUTION: What the Newspapers Didn’t Cover

Artistic illustration of artificial intelligence (AI). This image depicts how multimodal models interpret user input and generate output. It was created by Bakken & Baeck as part of the Visualising AI project, launched by Google DeepMind. Photo by Google DeepMind on Unsplash

 

This year promises not only numerous achievements but also genuine battles in the field of artificial intelligence development. Amid the media hype, it is easy to overlook the subtle yet profound changes shaping our world.

Unfortunately, not all AI research and innovations immediately reach the public eye. For instance, the creation of a foundation model for tabular data that does not require vast amounts of real-world data for training has largely gone unnoticed by the media.

 

BILLIONS FOR STARGATE

 

The year 2025 has barely begun, yet many experts are already calling it a defining moment for the development of artificial intelligence technologies.

Just a day after taking office, U.S. President Donald Trump announced the launch of Stargate — a large-scale international project involving leading technology and financial companies from the United States, Japan, and the United Arab Emirates.

The participating companies plan to invest an astonishing $500 billion over four years, with the funds directed toward building AI infrastructure in the United States.

 

DEEPSEEK: «ADVANCED AND AFFORDABLE»

 

Was it merely a coincidence? In the same week as the Stargate announcement, DeepSeek, a Hangzhou-based Chinese company specializing in artificial intelligence research, unveiled a new large language model (LLM), DeepSeek-R1.

The company demonstrated that a breakthrough in this field might not require investment on the scale of Stargate. Early tests showed that DeepSeek-R1's performance on chemistry and mathematics tasks matched that of o1, an LLM from the American company OpenAI. Like o1, DeepSeek-R1 is a reasoning model: it works through problems step by step, in a way that resembles human thinking.

Remarkably, the Chinese developers achieved this not only at a fraction of the cost but also with relatively modest computing power compared with existing LLMs. The news of an «advanced yet affordable» AI sent the stock prices of several tech companies into a sharp decline.

 

THE REVOLUTION THAT WENT UNNOTICED

 

Different visions of AI will likely shape its future development. Significant research, new data, and ambitious plans continue to emerge, yet not all of them make headlines — even when they genuinely deserve widespread public discussion.

One such groundbreaking study was published earlier this year in the international scientific journal Nature. However, overshadowed by politically driven news coverage, it went unnoticed by major outlets.

The study in question is titled «Accurate predictions on small data with a tabular foundation model». According to one of its reviewers, Duncan McElfresh, a data engineer at Stanford Health Care, this new technology could be revolutionary for the field of data science.

 


 

TRAINING ON «SYNTHETIC DATA»

 

The most well-known LLMs are pre-trained on hundreds of billions of real-world data examples, such as text and images. This allows them to respond to user queries with a certain degree of reliability. But what if you don’t have enough accurate data? Can AI still be trained to provide correct answers with a smaller dataset?

This is a key challenge for researchers using AI for predictions based on tabular datasets — which are nowhere near available in the quantities required for training models. However, scientists have found a way to achieve reliable results by training AI models not on actual data but on randomly generated «synthetic data» that mimic the statistical properties of real datasets.
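To make the idea of a synthetic training corpus concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the generator used in the study. The hidden rule (a random linear score with noise), the table sizes, and the function names `make_synthetic_dataset` and `make_training_corpus` are all hypothetical. The point is only that every table is drawn from its own randomly chosen rule, so a model trained on many such tables never sees a real record.

```python
import random


def make_synthetic_dataset(n_rows, n_features, rng):
    """Return (rows, labels) drawn from one randomly chosen hidden rule."""
    # Assumption for illustration: the hidden rule is a random linear score
    # plus a little noise, turned into a 0/1 label by thresholding at zero.
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    rows, labels = [], []
    for _ in range(n_rows):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        score = sum(w * x for w, x in zip(weights, row)) + rng.gauss(0.0, 0.1)
        rows.append(row)
        labels.append(1 if score > 0 else 0)
    return rows, labels


def make_training_corpus(n_datasets, seed=0):
    """Build many small tables, each with its own size and hidden rule."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n_datasets):
        n_rows = rng.randint(20, 50)      # hypothetical size range
        n_features = rng.randint(2, 6)    # hypothetical column range
        corpus.append(make_synthetic_dataset(n_rows, n_features, rng))
    return corpus


corpus = make_training_corpus(n_datasets=5)
```

A real generator would need far richer rules than a single linear score. The structure is the same, though: sample a rule, sample a table from it, and repeat millions of times.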

 

IS THE REAL WORLD NO LONGER NEEDED?

 

The creators of this breakthrough are computer scientists Noah Hollmann, Samuel Müller, and Frank Hutter from the University of Freiburg in Germany. Their model, called TabPFN, is designed to analyze tabular data, such as those found in spreadsheets.

Typically, users fill rows and columns with data by hand and then apply mathematical models to draw conclusions or make projections. TabPFN, however, can generate predictions from small datasets in fields ranging from accounting and finance to highly specialized areas such as genomics and neuroscience.

The most astonishing aspect is that its predictions remain highly accurate even though the model was trained without real-world data. Instead, the model learns from 100 million randomly generated datasets. This means that TabPFN can reconstruct a relatively complete understanding of reality from just a «fragment».
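From the user's side, working from a «fragment» means handing the model a few labelled rows together with the rows to be predicted, with no separate training step on the user's own table. The Python sketch below shows only that calling pattern. The function `predict_in_context`, the tiny table, and above all the nearest-neighbour rule are assumptions made for illustration. In TabPFN, a pretrained neural network does this job, and nothing here reproduces it.

```python
import math


def predict_in_context(context_rows, context_labels, query_rows):
    """Label each query row with the label of its closest context row.

    Stand-in only: a 1-nearest-neighbour rule replaces the pretrained model.
    """
    predictions = []
    for query in query_rows:
        distances = [math.dist(query, row) for row in context_rows]
        nearest = distances.index(min(distances))
        predictions.append(context_labels[nearest])
    return predictions


# A tiny hypothetical table: two numeric columns and a yes/no outcome.
context_rows = [[1.0, 1.2], [0.9, 1.0], [3.1, 2.9], [3.0, 3.2]]
context_labels = ["no", "no", "yes", "yes"]

# Rows whose outcome is unknown.
query_rows = [[1.1, 0.9], [2.8, 3.0]]

predictions = predict_in_context(context_rows, context_labels, query_rows)
print(predictions)  # ['no', 'yes']
```

The design choice worth noticing is the signature: the labelled rows and the unlabelled rows go in together, and the predictions come out in a single call.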

 

HOW TO CRACK THE «BLACK BOX»?

 

Of course, like all other AI models, this one is not immune to inaccurate results or hallucinations. Synthetic data come with risks, making it crucial for research in this field to be reproducible so users can trust the outcomes.
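One practical ingredient of reproducibility is that randomly generated data can be regenerated exactly. The Python sketch below illustrates this under simple assumptions: a fixed seed drives the generator, and a hash of the resulting table serves as a fingerprint that others can compare. The functions `generate_table` and `fingerprint` and the seed values are hypothetical and are not taken from the study.

```python
import hashlib
import json
import random


def generate_table(seed, n_rows=100, n_cols=4):
    """Generate a random numeric table that depends only on the seed."""
    rng = random.Random(seed)
    return [
        [round(rng.gauss(0.0, 1.0), 6) for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def fingerprint(table):
    """Return a SHA-256 hex digest of the table's JSON representation."""
    payload = json.dumps(table).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


run_a = fingerprint(generate_table(seed=2025))
run_b = fingerprint(generate_table(seed=2025))  # same seed, same table
run_c = fingerprint(generate_table(seed=2026))  # different seed

print(run_a == run_b)  # True
print(run_a == run_c)  # False
```

Publishing the seed and the fingerprint alongside the results lets an independent group confirm that it is testing the model on the same synthetic data.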

The work of Hollmann and his colleagues is a prime example of how necessity drives innovation. Faced with an insufficient amount of real-world data for training their model, the researchers found an alternative approach.

However, one fact remains: all AI models, whether trained on synthetic or real data, still function as «black boxes». Users and regulatory bodies have little to no insight into how these models arrive at their results.

With 2025 set to bring even more exciting advancements, let’s not overlook research aimed at understanding how AI works — as well as methodological studies. These are just as important as breakthrough announcements.

 

Original research:

 


When copying materials, please place an active link to www.huxley.media