Author: Huxleў
© Huxleў — almanac about philosophy, business, art and science.
A 4-minute read

Artificial intelligence has been taught to look at the world through children’s eyes



New York University conducted an unusual experiment. Using a lightweight camera attached to a toddler's head, researchers recorded what the child saw and heard as it learned. They then used this visual and auditory input to train an artificial intelligence model «from scratch». What did the parallel language learning of the real infant and the neural network reveal?




Artificial intelligence systems such as GPT-4 know how to «learn». But they do so using astronomical amounts of language input, far beyond anything a human is exposed to. These data sets bear no resemblance to what a small child receives in real life while learning to understand and speak a language.

AI learns from text containing trillions of words, while a child hears only a few million words a year. Because of this gap, most scientists believed that, for all its achievements, AI could not serve as a realistic model of how human learning and development actually occur.

Researchers at New York University decided to address this injustice and level the playing field between the baby and the AI: if a toddler learning a language cannot draw on massive data from the Internet, then the AI's training data can be limited to the input the child receives from the outside world.

But can the AI model learn words and concepts that are present in a child’s everyday life? This is exactly what the scientists tried to find out during the experiment.




The learning process of one baby was recorded weekly, from six months to 25 months of age. The result was more than 60 hours of footage containing approximately 250,000 words, many of them repeated.

Naturally, all of the words spoken during eating, reading, playing, and so on were linked to the visual scenes the child saw at the time. Based on these recordings, the researchers then trained an artificial intelligence system with two distinct modules.

The first was a visual encoder that took in individual video frames. The second was a language encoder that processed the transcribed speech addressed to the child. The two modules were trained jointly so that their outputs formed a shared representation capturing the cross-modal associations in the input.
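A minimal sketch of what such a dual-encoder setup might look like. This is not the authors' code; the dimensions, weight matrices, and function names here are made up for illustration. Each encoder projects its modality into a shared embedding space where a word and a frame can be compared directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two CVCL modules: each projects its
# modality into a shared 64-dimensional embedding space.
W_vision = rng.normal(size=(512, 64))  # maps a 512-d video-frame feature
W_text = rng.normal(size=(300, 64))    # maps a 300-d word vector

def encode_frame(frame_features):
    """Visual encoder: project a video-frame feature vector."""
    z = frame_features @ W_vision
    return z / np.linalg.norm(z)       # unit-normalize for cosine similarity

def encode_word(word_vector):
    """Language encoder: project a transcribed-word vector."""
    z = word_vector @ W_text
    return z / np.linalg.norm(z)

frame = rng.normal(size=512)
word = rng.normal(size=300)
similarity = encode_frame(frame) @ encode_word(word)  # cosine, in [-1, 1]
```

Because both outputs live in the same space, «how well does this word match this scene» reduces to a single dot product.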






The point is that parent-child communication involves what is called «contrastive learning»: understanding is achieved by linking visual and linguistic cues that the baby tries to relate to each other. AI training was built on the same principle, giving the model a sense of which words should be associated with which objects.
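The contrastive idea can be sketched as a standard InfoNCE-style objective (a common formulation; the paper's exact loss may differ). Matched word-frame pairs sit on the diagonal of a similarity matrix and are pulled together, while mismatched pairs are pushed apart:

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of unit-normalized
    (frame, word) embedding pairs: matched pairs lie on the diagonal
    of the similarity matrix and are pulled together; mismatched
    pairs are pushed apart."""
    logits = image_emb @ text_emb.T / temperature  # (N, N) similarities
    log_sm_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    # Average the image->word and word->image cross-entropies on the diagonal
    return -(np.diag(log_sm_rows).mean() + np.diag(log_sm_cols).mean()) / 2

rng = np.random.default_rng(1)
emb = rng.normal(size=(4, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Correctly paired embeddings score a lower loss than shuffled pairings
aligned = contrastive_loss(emb, emb)
shuffled = contrastive_loss(emb, emb[::-1])
```

Minimizing this loss is what forces the two encoders to agree on which word goes with which scene.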

In particular, the model was asked to match a word to one of four candidate images. It turned out that the model, called Child's View for Contrastive Learning (CVCL), was quite capable of doing this.
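This four-alternative test reduces to a simple nearest-neighbor decision in the shared embedding space. A toy illustration with synthetic unit vectors (the function name and data here are invented for the sketch):

```python
import numpy as np

def four_afc_choice(word_emb, image_embs):
    """Four-alternative forced choice: return the index of the image
    whose embedding is most similar to the word's embedding."""
    scores = image_embs @ word_emb     # one cosine score per candidate
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
images = rng.normal(size=(4, 32))
images /= np.linalg.norm(images, axis=1, keepdims=True)

# A word embedding that lies close to candidate image 2
word = images[2] + 0.1 * rng.normal(size=32)
word /= np.linalg.norm(word)

chosen = four_afc_choice(word, images)
```

Accuracy on such trials is then just the fraction of words for which the highest-scoring image is the correct one.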

CVCL was able to learn a significant number of the words and concepts present in the child's daily life. Moreover, the model generalized some of the words it learned to visual examples that never appeared in its training data. The same effect is observed in children when they are tested in the lab.




The findings, published in the journal Science, suggest that the brain and neural networks are similar in many ways. Artificial intelligence can learn language at a child's level using only the data that one baby saw and heard over roughly 1.5 years of life.

At the same time, we should bear in mind that the neural network received only a fragmentary record of the child's experience of the world: the video captured about 1% of the child's waking hours. Yet even this was enough for the AI to learn language successfully. Scientists believe that studies like this could change our understanding of how children first acquire words and concepts.

Around 6 to 9 months of age, children begin to build their vocabulary by linking spoken words to their visual counterparts. What roles do innate abilities, associative learning, and language-specific inductive biases play in this process?

According to one of the paper’s authors, associate professor Brenden Lake, the neural network experiment showed that by «just learning», we gain more than is commonly thought.


Original research:

  • AI Learns Through the Eyes and Ears of a Child
  • Grounded language acquisition through the eyes and ears of a single child


