Readability Formula: Learning Languages with LIX

I decided to learn Russian four years ago when, heavily pregnant, I had nothing better to do. I bought a book called Colloquial Russian off the internet and I read it. Then I turned on my computer and discovered that there were web pages in Russian. Free reading practice, there for the taking! I couldn’t understand them, though. Apart from the pictures, I had no idea whether the content would be worth the time it would take me to translate it. I needed material I was reasonably familiar with to start with.

I went to an academic bookshop in search of their Russian section. The choice was limited, but I found a copy of “The Hobbit”. I couldn’t read it yet; I could barely read the title. Nevertheless, I was encouraged by the thought that, when I got good enough, there was a whole book of material I would want to read.

But when would I be good enough? The Hobbit, as I recall, was written for children. Does that mean that it is an easy read? How many months, how many years, would it take me before I could pick it up and read it comfortably?

When I discovered LingQ I was delighted. There was a vast library of authentic material, with audio and transcript, graded by native speakers for ease of reading. Furthermore, I could import my own reading material, and it would keep track of the words I knew and the words I was meeting for the first time. I’d never heard of anything so sensible.

It didn’t help me with reading my Russian copy of “The Hobbit” though. If only there was some kind of formula, some calculation you could do, using a pencil and maybe a calculator, to tell you whether the book in your hand was written in easy, moderate or difficult language.

I remember when I trained to write technical documents. We were briefly shown something called the Gunning Fog Index, which is a simple little formula to calculate how easy a document is to read. You have to count up, over a sample piece of a few sentences, the average sentence length, and the average number of long words (three or more syllables). Do the sums and you end up with a number. 15 means easy to read, 25 means moderately hard to read (your reader needs to have a good school education), 30 means you had better rewrite it if you want anyone else to understand it.

Maybe this would work for Russian? Russian did seem to have more long words and shorter sentences than English, though. Also (this was frustrating), although document readability statistics were built into the version of Word I was running, the program refused to recognise Russian text as a language and kept returning answers of zero.

Someone I raised this with (possibly on the LingQ forum) pointed out that there were several well-known readability formulas, and that they were all designed to work on English-language documents only. So what do the Russians use to determine how easy their documents are to read? Even Russians didn’t know.

Googling in Russian didn’t produce any results. Maybe I wasn’t using the right keywords. Maybe I was spelling them wrong.

I did eventually find, to my surprise, a result in Swedish. It turns out that, back in the fifties and sixties, a Swede called Björnsson did exactly this. He produced a readability formula called LIX, and tested it for eleven different Western European languages. He found that, although the norms are slightly different across languages, you can use the same formula to decide whether a piece of French is easy or difficult to read, as you can for a piece of Greek. You do NOT need to be able to read French or Greek to be able to use it.

Why didn’t I know this before? Because, it seems, no one was very interested. Back in the sixties there wasn’t the computer power to automate the calculations, and besides, for everyone in the English-speaking world, there was already a whole handful of formulas to choose from.

Anderson, however, was interested. Despite the name, he was not another Swede but an Australian academic working in educational research. He studied the LIX and published his findings in the 1980s. In brief, he found the LIX to work for English, German, French and Greek, and also proposed an alternative index: the RIX. Given that the RIX is simpler to calculate, I’m guessing he didn’t have computer power back then either.

It looks like the LIX formula is exactly what I have been looking for. I take a sample paragraph from my Russian copy of “The Hobbit”, count the sentences, count the words, count the long words (seven letters or more, so I don’t even have to work out the number of syllables in each word), and plug them into this formula:

LIX = (number of words) / (number of sentences) + (number of long words) * 100% / (number of words)

Based on this text:

“Жил-был в норе под землей хоббит. Не в какой-то там мерзкой грязной сырой норе, где со всех сторон торчат хвосты червей и противно пахнет плесенью, но и не в сухой песчаной голой норе, где не на что сесть и нечего съесть. Нет, нора была хоббичья, а значит –

Words: 49 Long Words: 11

Sentences: 3 Chars: 223”

LIX = 49/3 + (11 × 100)/49 ≈ 16.33 + 22.45 ≈ 38.8
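For anyone who would rather not do the sums by hand, here is a minimal Python sketch of the same calculation. The function names `lix_from_counts` and `count_text` are made up for illustration, and the counting rules in `count_text` (hyphenated compounds count as one word, hyphens are not counted as letters, a sentence ends at a full stop, exclamation mark or question mark) are my assumptions, not something specified in the post or by Björnsson.

```python
import re

LONG_WORD_MIN_LETTERS = 7  # "seven letters or more", as described above


def lix_from_counts(words, sentences, long_words):
    """Return (average sentence length, percentage of long words, LIX)."""
    avg_sentence_length = words / sentences
    long_word_percent = long_words * 100 / words
    return avg_sentence_length, long_word_percent, avg_sentence_length + long_word_percent


def count_text(text):
    """Count (words, sentences, long words) in raw text.

    Assumed rules: a word is a run of letters or digits, optionally joined
    by hyphens; hyphens do not count as letters; sentences end at . ! or ?
    """
    words = re.findall(r"\w+(?:-\w+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    long_words = [w for w in words
                  if len(w.replace("-", "")) >= LONG_WORD_MIN_LETTERS]
    return len(words), len(sentences), len(long_words)


# The counts quoted for the sample from "The Hobbit"
asl, pct, score = lix_from_counts(words=49, sentences=3, long_words=11)
print(f"LIX = {asl:.2f} + {pct:.2f} = {score:.2f}")
```

With the counts quoted above, this prints a total of 38.78, which rounds to the 38.8 given in the post. A different set of counting rules (for example, treating each half of a hyphenated word separately) may give slightly different counts for the same passage, so results from `count_text` are only comparable with each other.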

Is that high for Russian? Well, that’s a good question. To understand these results we need to calibrate them. Ideally we would run everything in the LingQ Russian library through this formula, and come out with a table, for each level from Beginner 1 to Advanced 1, of the representative LIX ranges. Then I could say, with an air of authority:

“This Russian translation of “The Hobbit” is written in low Intermediate 2 level language. A good Intermediate 2 student should be able to read it comfortably.”

I don’t think I can go that far, because it would require more time and a better grasp of statistics than I have at my disposal. Nevertheless, Ilya has very kindly written me a program which should calculate the LIX for any input language. I’m currently testing it with the first chapter of each of the 7 books in J. K. Rowling’s Harry Potter series, in each LingQ language. I shall report back on the results.

6 Replies to “Readability Formula: Learning Languages with LIX”

  1. Reminds me of a "language learning" anecdote from my family. My father was an underage sailor in the US Navy during World War II and became conversationally fluent in Russian serving alongside Russian ships in the North Sea. At the end of the war, he took advantage of the GI Bill to go to college, where he faced a language requirement for his degree, but lacked any genuine interest in learning a language. So, being practical in nature, he concluded that since he already could speak and understand Russian, he would make that his language. Well, it didn’t take long to discover that the Russian a sailor learns speaking with other sailors does NOT equate to the language you get tested on in an American liberal arts university. The college assigned him to an intensive, remedial Russian language course the summer after his first year in college so that he could catch up with the rest of his class. The first day of that course he was assigned a copy of War and Peace in the original Russian. They started on page one and proceeded, throughout the entire summer, to work their way through the book. By the end of the summer he was proficient enough to join the regular language courses again. But he will tell you that, to this day, he has absolutely NO idea what happens in the first 500 pages of War and Peace.

  2. Thank you for your interesting post! Might I suggest though, that the formula be presented differently? To me the following looks clearer: LIX = (number of words/number of sentences) + (number of long words X 100/number of words). In any case the percentage symbol confuses the issue. But thanks again for showing us this useful tool!

  3. I forgot to add: Björnsson’s researches on newspapers "yielded the following figures for "normal" newspapers: Swedish 17 + 30 = 47, Norwegian 20 + 28 = 48, Danish 22 + 29 = 51, English 25 + 27 = 52, French 23 + 32 = 55, German 22 + 37 = 59, Italian 30 + 35 = 65, Spanish 35 + 32 = 67, Portuguese 36 + 34 = 70, Finnish 14 + 58 = 72, and Russian 18 + 47 = 65." He neglected to analyse the works of J. K. Rowling, however.

  4. Researchers into readability caution against those formulas, because they can be very misleading when working with small samples. Measures of sentence and word length provide rough estimates of readability, mostly because long words tend to be words which are less frequently used and are therefore less known. Word frequency is the most common determiner of readability. When we have ‘over-learned’ words they require very little processing. Anderson’s RIX variant of LIX is a useful tool to provide a rough guide, but as I said, it should only be used with large amounts of text and the results should be treated with caution. Basically, to get RIX, simply divide the number of words of 7 or more letters by the number of sentences. The problem with RIX can be illustrated with this example: "Almost everyone in England likes sunshine, chocolate, football, sausages, drinking, clubbing, dancing, laughing and sight-seeing." The above statement would have an extremely high RIX score of 11, which would put it at College (University) level. And yet, those words would appear familiar to many beginners. Sources: Alderson, J. (2000). Assessing reading. Cambridge: Cambridge University Press. Anderson, J. (1983). Lix and Rix: Variations on a little known reading index. Journal of Reading, Vol. 26, No. 6, pp. 490-496.

  5. Readability formulas? In the end I favour two factors. 1) My interest in and familiarity with the subject and 2) The number of unknown words. That is basically what we try to use at LingQ. I am not sure we can get much more accurate than that.
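As a rough illustration of the RIX calculation described in comment 4, here is a minimal Python sketch applied to that comment's example sentence. The function name `rix` is hypothetical, and the counting rules (hyphenated compounds count as one word, hyphens are not counted as letters, sentences end at a full stop, exclamation mark or question mark) are assumptions of mine rather than part of Anderson's definition.

```python
import re


def rix(text, long_word_min_letters=7):
    """Long words per sentence, as described in comment 4.

    Assumed rules: a word is a run of letters or digits, optionally joined
    by hyphens; hyphens do not count as letters; sentences end at . ! or ?
    """
    words = re.findall(r"\w+(?:-\w+)*", text)
    long_words = [w for w in words
                  if len(w.replace("-", "")) >= long_word_min_letters]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    return len(long_words) / len(sentences)


example = ("Almost everyone in England likes sunshine, chocolate, football, "
           "sausages, drinking, clubbing, dancing, laughing and sight-seeing.")
print(rix(example))  # 11.0
```

The sentence contains eleven words of seven or more letters and only one sentence, which is how it reaches the score of 11 mentioned in the comment even though none of the words is difficult.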
