Did you know that Zipf’s law applies to every writer’s word count? Zipf also applies to the word count of all the words you have ever spoken. If you have never encountered Zipf before, it basically means that you use a few words a lot and a lot of words only occasionally.
If you take some text you have written and drop it into our Words Counted Tool, you may notice something strange – there are far more uncommon words than common ones, and your average word length is probably around four or five letters.
The chances are any article on this site or any other would have a chart that looks roughly like this one.
This distribution of words is not random, but neither is it unique to our – or any other – language. Were we to rank words by how often they occur, each word would appear at a rate roughly proportional to one divided by its rank. So the second most common word would occur about half as often as the most common (usually “the”), the third about a third as often, and so on.
For example, Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
Zipf’s law, Wikipedia
This distribution is known as Zipf’s law. It is closely related to other power-law patterns such as the Pareto principle (or the 80/20 rule).
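If you would rather check this on your own writing than take our word for it, here is a minimal Python sketch. It counts the words in a piece of text, ranks them, and puts each real count next to what a strict one-over-rank rule would predict from the top word. The function name rank_frequency and the very simple tokenizer (lowercase letters and straight apostrophes only) are assumptions made for illustration, not how our Words Counted Tool works.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count, zipf_estimate) rows, most frequent word first.

    zipf_estimate is the count an idealized Zipf curve would predict:
    the top word's count divided by the word's rank.
    """
    # Assumption: a word is a run of lowercase letters or straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (rank, word, count, top_count / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    sample = "the the the the the the of of of and and"
    for rank, word, count, estimate in rank_frequency(sample):
        print(rank, word, count, round(estimate, 1))
```

The sample sentence is rigged to fit perfectly. Real text will only follow the estimate loosely, and the fit tends to improve as the text gets longer.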
The likes of Google use Zipf to identify keywords – when a word occurs in a body of text more frequently than it does across writing as a whole, the chances are that the text is strongly related to that word.
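To make that idea concrete, here is a rough Python sketch that scores each word in a document by how much more often it appears there than in a background sample of general writing. This is not Google's actual method, which we do not know. The function name keyword_scores, the add-one smoothing, and the tiny background text are all assumptions chosen to keep the example short.

```python
import re
from collections import Counter


def keyword_scores(document, background, top_n=5):
    """Rank words by (rate in document) / (rate in background text)."""
    # Assumption: a word is a run of lowercase letters or straight apostrophes.
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    doc_counts = Counter(tokenize(document))
    bg_counts = Counter(tokenize(background))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    if doc_total == 0:
        return []

    vocab_size = len(set(doc_counts) | set(bg_counts))
    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        # Add-one smoothing: a word missing from the background still gets
        # a small non-zero rate, so we never divide by zero.
        bg_rate = (bg_counts[word] + 1) / (bg_total + vocab_size)
        scores[word] = doc_rate / bg_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    doc = "zipf zipf zipf the law"
    general = "the the the the law of the land"
    for word, score in keyword_scores(doc, general):
        print(word, round(score, 2))
```

A word like “the” scores low because it is common everywhere, while a word that is rare in general writing but frequent in the document floats to the top. In practice the background sample would need to be far larger than a single sentence for the scores to mean much.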
Other than being useful for search engines, why does Zipf seem to have such a command over our words and what – if anything – does it mean?
Why does Zipf’s law describe word count?
I’m not the least bit qualified to answer that question. Fortunately, we live in the digital age, when the answers to many questions can be found in video form. The mystery of Zipf’s law and our word use is here for your viewing pleasure.
If you want to see Zipf in action, check out our Words Counted Tool.