Huxley
Author: Huxley
© Huxley — an almanac about philosophy, art and science

ZIPF’S LAW: How the «Mathematics of Language» Sets Us Apart from Animals

The connection between mathematics and physical reality is one of the most intriguing problems the philosophy of science seeks to unravel. Yet mathematics doesn’t only operate in the external world; it also shapes our speech. Most of the world’s languages follow an equation known as Zipf’s Law. And scientists still cannot agree on why.

 

THE MATHEMATICAL UNIVERSE

 

According to the Mathematical Universe Hypothesis proposed by astrophysicist and Massachusetts Institute of Technology professor Max Tegmark, our external physical reality is a mathematical structure. Of course, this is just a hypothesis, and many scientists disagree with it.

Nevertheless, even Tegmark’s opponents lack a convincing answer to why mathematics so effectively describes phenomena in the universe. We can only acknowledge that mathematics has become the foundation for describing many physical laws, even though it developed independently of physics.

Moreover, some phenomena were predicted mathematically before being observed in reality. For example, Urbain Le Verrier «calculated» the position of Neptune in 1846, shortly before astronomers confirmed it through a telescope.

Dirac mathematically predicted the existence of positrons. Maxwell’s equations predicted electromagnetic waves, which are oscillations of electric and magnetic fields, before such waves were detected experimentally. Einstein’s general theory of relativity was preceded by non-Euclidean geometry, while Kepler’s descriptions of planetary orbits were anticipated by ancient Greek studies on conic sections.

In his 1960 essay «The Unreasonable Effectiveness of Mathematics in the Natural Sciences», physicist Eugene Wigner, who later won the Nobel Prize, wrote that the enormous usefulness of mathematics in the natural sciences is «something bordering on the mysterious» and that «there is no rational explanation for it».

 

ON THE EDGE OF MYSTICISM

 

Zipf’s Law also borders on mysticism, as no generally accepted scientific explanation for it exists to this day. It is named after George Kingsley Zipf, an American linguist who worked at Harvard University and specialized in the psychobiology of language and statistical methods.

Through his research, he noticed that certain words are used much more frequently than others. Moreover, the frequencies follow a regular pattern: the most common word is used roughly twice as often as the second most common word.

In English, for example, the most frequently used word is «the». It appears about twice as often as the second most frequent word, three times as often as the third, four times as often as the fourth, and so on. What’s even more astonishing is that the same pattern has been observed in other fields.

For instance, in the distribution of incomes and city sizes, the person with the highest income tends to earn roughly twice as much as the second-richest individual. Similarly, the largest city in a country often has about twice the population of the second-largest. These are statistical tendencies rather than exact rules.
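The pattern described above is usually written as a power law. As a sketch, using symbols that do not appear in the original text: let f(r) be the frequency of the item with rank r, C a constant that depends on the dataset, and s an exponent that is close to 1 for many natural-language texts.

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
\frac{f(1)}{f(2)} \approx 2, \qquad \frac{f(1)}{f(3)} \approx 3
```

With s equal to exactly 1, the top-ranked item would be exactly twice as frequent as the second and three times as frequent as the third. Real data only approximates this, which is why the ratios are best read as «roughly» rather than «always».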

 

WHAT EXACTLY DID ZIPF DISCOVER?

 

Returning to language, Zipf uncovered another peculiarity. He began by assigning ranks to words: the most frequently used word was ranked 1, the next most frequent word was 2, the third 3, and so on. He then calculated the probability of encountering a word X in a text by dividing the number of occurrences of X by the total number of words in the text.

He then found that multiplying the probability of a word by its rank yields roughly the same value for every word. For English, this constant is approximately 0.1, and for Russian, it’s around 0.06. How can such a discovery not provoke an existential crisis?
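The procedure described above (rank the words, compute each word’s probability, multiply by its rank) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_times_probability`, the crude ASCII tokenizer, and the made-up toy corpus are not from Zipf’s work or from any study mentioned here.

```python
import re
from collections import Counter


def rank_times_probability(text, top=5):
    """Return (rank, word, probability, rank * probability) rows
    for the `top` most frequent words in `text`."""
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    rows = []
    for rank, (word, count) in enumerate(Counter(words).most_common(top), start=1):
        probability = count / total  # occurrences of the word / total words
        rows.append((rank, word, probability, rank * probability))
    return rows


# Synthetic corpus (made up for illustration) that follows the pattern exactly:
# the word of rank r occurs 60 / r times.
toy_counts = {"alpha": 60, "beta": 30, "gamma": 20, "delta": 15, "epsilon": 12}
toy_text = " ".join(word for word, n in toy_counts.items() for _ in range(n))

for rank, word, probability, product in rank_times_probability(toy_text):
    print(f"{rank:>2}  {word:<8} p={probability:.4f}  rank*p={product:.4f}")
```

Because the toy corpus was constructed to obey the law, every product equals 60/137, about 0.438. On a real text the products would only be roughly constant, and the tokenizer would need to be adapted for languages that do not use the Latin alphabet.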

After all, we like to think of humans as unpredictable beings, governed by their free will, which somehow emerges from physical processes. Yet linguistic studies challenge this perception of human uniqueness.

 


THE ZIPF-ENCODED GUTENBERG

 

It seems that Zipf’s Law applies to most of the world’s languages. It doesn’t matter whether you speak English, Hindi, French, Mandarin, or Spanish: Zipf’s Law has been found to hold for the first 10 million words in 30 different languages. Moreover, it even appears to hold for texts that have yet to be deciphered.

For example, this law applies to the enigmatic Voynich Manuscript, written in the 15th century in an unknown script and an unidentified language. Nor is Zipf’s Law limited to everyday speech: it also applies to scientific and literary texts, whether it’s On the Origin of Species by Charles Darwin or Hamlet by William Shakespeare. However, when it comes to large collections of books, things get a bit more complicated.

Mathematicians once tested Zipf’s Law on the extensive dataset of Project Gutenberg — an online universal library comprising 31,075 books in English. They found that while Zipf’s Law doesn’t work perfectly across large datasets, it still holds true in 55% of cases.

It appears that there may be a certain class of texts that doesn’t conform to this mathematical pattern. But who’s to say that this class isn’t governed by another, yet-to-be-discovered law?

 

MORE QUESTIONS THAN ANSWERS

 

In any case, Zipf’s Law is undoubtedly a non-trivial property of human language. Before its discovery, it might have seemed logical — though incorrect — to assume that all words in linguistic practice are used with roughly the same frequency. Yet, even after the law’s discovery, scientists are no closer to understanding why it exists. Instead, they now face the daunting question: why do words adhere to such a precise mathematical rule?

There are many potential explanations, ranging from statistical artifacts to constraints imposed by human memory and vocabulary. George Zipf himself suggested that the law arises from a trade-off between the speaker’s wish to minimize effort and the listener’s wish to do the same.

People strive to convey meaning as efficiently as possible, favoring words that maximize information delivery. Zipf also noticed another crucial detail: the more frequently a word is used, the shorter it tends to be. Other theories exist as well, but none has been universally accepted as satisfactory.

 

IS IT ALL ABOUT EVOLUTION?

 

The foundations of language likely have deep, not fully understood evolutionary roots. For instance, by studying the behavior of macaques, chimpanzees, and dolphins, linguists have discovered that their calls and communication patterns partially adhere to principles characteristic of natural human languages.

One such principle is Menzerath’s Law, which states that longer linguistic units consist of shorter components: syllables in relatively long words tend to be shorter than those in brief words. However, things aren’t as straightforward when it comes to Zipf’s Law.

After extensive analysis, scientists had to conclude that, overall, Zipf’s Law is a uniquely human trait that doesn’t apply to animals. In chimpanzees, researchers observed only the inverse relationship Zipf had noted between the length of a gesture and its frequency, and even then only within the group of the shortest gestures.

 
