Google's Use of Readability, Reading Level & Vocabulary Metrics in Search Algorithms

We know that Google can distinguish high-quality content from low-quality content, and can tell when content is nonsensical, as is the case with spam. But how does Google determine it? Does it evaluate things such as the level of vocabulary, the tone of the content, or readability? The question came up at the last Google Webmaster Office Hours with John Mueller.

What makes his answer particularly interesting is that he specifies that Google does not have anything public that they use to determine this, rather than simply saying they don’t use this as a ranking factor in their search algo for evaluating content.

First, the question:

I’m about to ask you something about related how Google calculates the quality of a content, piece of content. So what is the importance of the some metrics like fresh reading, like the length of paragraphs, the paragraphs after hearings, and the basic voice tone or for example how difficult the text is written and something like this, in this direction.

John Mueller’s response:

So from from an SEO point of view it’s probably not something that you need to focus on, in the sense that as far as I know we don’t have kind of these basic algorithms that just count words and try to figure out what the reading level is based on these existing algorithms.
But it is something that you should figure out for your audience. So that’s something where I see a lot of issues come up in that a website will be kind of talking past their audience. So maybe you’re making like – a common example is a medical site, you want to provide some medical information for the general public because you know they’re worried about this and all of your article is used like these medical words or twenty characters long. Then technically it’s all tracked and you could calculate like the reading level score of that content you come up with a number.
But it’s not a matter of Google kind of using that reading level score and saying this is good or bad but rather does it matter what the people are searching for and if nobody’s searching for those long words, then nobody’s going to find your content. Or if they do find your content they’re gonna be like, I don’t know what this means, like does anyone have an English translation for this this long word that I don’t understand and they go somewhere else to either convert or to read more, or to find more information.

Word count has long been understood the same way: there is no “right” word count. Content only needs to be as long as it takes to answer the question it addresses. There is no algo or signal that says content needs to be over X words, and there are plenty of examples of pages that rank highly, even earning a featured snippet, with as few as 50 words.

He also uses the word “basic”, which could mean a more advanced algorithm is being used, although it is likely just a word choice.

But the person asks for more clarification about specific algorithms that gauge this, which prompts Mueller’s response about Google not having anything public for this.

So you don’t have any specific algorithms which calculates these metrics so something like that?

And Mueller’s response:

At least we don’t have anything public that we say this is what we do and this is what happens there.
It’s something that I know the team is still working on this so it’s not like a one-time algorithm thing and we figured it out and now it’s working forever. I know that people here in Zurich that are still working a lot trying to understand the quality of pages better and to figure out where where pages are good and what pages are bad and when to show them, where they’re relevant.

So it does confirm that Google is using something algorithmically to determine the quality of content, taking these factors into account, something that has been clear from the way the Google Panda algo works. Google is fairly good at determining when content is good and when it is spun or nonsensical spam.

There are some tools that site owners can use to try to measure these factors, although there is no way to confirm that their results match, or even come close to, what Google is doing in its algo. Readability score tools estimate how easy a piece of content is to read, and grade level scores estimate the reading level of content.

But there are obviously some caveats. John Mueller uses the example of medical sites targeting non-medical people who need the content to be at a lower reading level. But a medical site targeting those in the medical profession could risk dumbing down the content too much, with real-life consequences, if it followed some of these reading level guidelines. So ensure you are using the right readability or grade/reading level tools for your intended audience.
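
As a rough illustration of what readability and grade level tools compute, here is a minimal Python sketch of the two published Flesch formulas (Flesch Reading Ease and Flesch-Kincaid Grade Level). Nothing here reflects what Google does. The function names `count_syllables` and `readability` are made up for this example, and the syllable counter is a crude vowel-group heuristic, so scores will differ somewhat from those of commercial tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not dictionary-accurate)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assumption: a trailing "e" is silent (as in "make") unless the word ends in "le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to follow the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    plain = "The cat sat on the mat. The dog ran to the park."
    medical = (
        "Idiopathic thrombocytopenic purpura necessitates "
        "immunosuppressive pharmacological intervention."
    )
    for label, sample in (("plain", plain), ("medical", medical)):
        print(label, readability(sample))
```

Running it on a plain sentence and a jargon-heavy medical sentence shows the gap Mueller describes: the medical text scores far lower on reading ease and far higher on grade level. Whether that is a problem depends entirely on who the page is written for.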

Jennifer Slegg

Founder & Editor at The SEM Post

Jennifer Slegg is a longtime speaker and expert in search engine marketing, working in the industry for almost 20 years. When she isn't sitting at her desk writing and working, she can be found grabbing a latte at her local Starbucks or planning her next trip to Disneyland. She regularly speaks at Pubcon, SMX, State of Search, Brighton SEO and more, and has been presenting at conferences for over a decade.

Comments

C Jones says
February 1, 2018 at 6:26 am
Really interesting. John chooses his words carefully so it’s tough to decipher this without making a bunch of assumptions. We know there’s a Flesch-Kincaid Readability calculation. I would think this is partially used to calculate the content quality algo. Google could easily combine this with their inbound link data to determine who the audience is for specific content and then use this to come up with a relevancy calculation. What I don’t understand is how they come up with an accurate score if the page is designed so that the content is all on one page… maybe I’ve missed the news in this area.
