Getting Your Head Around Google’s RankBrain

March 28, 2016 at 7:16 am PST By David Harry

Recently, Google’s RankBrain system has started to garner quite a buzz within the SEO community, but to a large degree it is not well understood. Some link-baity post titles haven’t helped matters, and of course Google didn’t do itself any favors by including the word “rank” in the name.

So, let’s start with a statement by Google’s Gary Illyes:

“Lemme try one last time: Rankbrain lets us understand queries better. No affect on crawling nor indexing or replace anything in ranking” – via Twitter


At its core, from what we understand, RankBrain is about better assessment of queries and the classifications therein. Add to this the potential use of similar concepts to better understand the words, phrases and concepts on a given page, and the outcome should be better results. But for now, let’s take a step back and look at another Google project: Word2Vec.

Google’s Word2Vec Project

Essentially, without getting too deep into it, the Word2Vec project is not a single algorithm; it is based on two models, skip-gram and CBOW (continuous bag of words), which are considered “shallow” neural network models. Skip-gram predicts the neighbouring words of a given word, while CBOW predicts the current word from its neighbouring words. The two are roughly mirror images of one another, as the sketch below illustrates.
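
To make the distinction concrete, here is a minimal sketch using the open-source gensim library (my illustration; Google has never said what tooling RankBrain uses internally). The corpus is a toy stand-in, and gensim 4.x parameter names are assumed:

    from gensim.models import Word2Vec

    # Toy corpus: each "sentence" is a list of tokens.
    sentences = [
        ["search", "engines", "rank", "web", "pages"],
        ["google", "ranks", "pages", "for", "queries"],
        ["queries", "are", "matched", "against", "pages"],
    ]

    # CBOW (sg=0): predict the current word from its context window.
    cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

    # Skip-gram (sg=1): predict the context window from the current word.
    skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

    # Either way, every word in the vocabulary becomes a dense vector.
    print(cbow.wv["pages"].shape)                     # (50,)
    print(skipgram.wv.most_similar("pages", topn=3))  # nearest words by cosine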

In fact, much of the conceptual approach was to trade model complexity for computational efficiency. But don’t let that simplistic explanation fool you; there’s nothing simple in the various neural network research I’ve read on this. I just wanted to get the basics out of the way before we move on to the meat of the matter.

Another related concept I came across a lot was distributional semantics: the idea that words which regularly co-occur tend to share aspects of semantic meaning (a tiny illustration follows). There’s also a patent, awarded last year, that seemingly covers W2V, since the authors are mostly the same people who worked on the original project.
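
Here is a tiny, hand-rolled illustration of that idea (again my own toy, unrelated to anything Google has published): count which words appear near which, and words used in similar contexts end up with similar count vectors.

    import numpy as np

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "the cat chased the dog",
    ]
    vocab = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Co-occurrence counts within a +/- 2 word window.
    counts = np.zeros((len(vocab), len(vocab)))
    for line in corpus:
        words = line.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - 2), min(len(words), i + 3)):
                if i != j:
                    counts[index[w], index[words[j]]] += 1

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # "cat" and "dog" occur in similar contexts, so their vectors are similar.
    print(cosine(counts[index["cat"]], counts[index["dog"]]))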

Hopefully that helps you better understand where we’re headed with this.

Probable patent: Computing numeric representations of words in a high-dimensional space

Is Word2Vec really RankBrain?

As with all things Google, it’s hard to say. But again, many of the folks working on RankBrain also worked on the Word2Vec project, and even some of the descriptions of RankBrain are nearly identical to those for the W2V project, such as:

The word2vec tool takes a text corpus as input and produces the word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representation of words. The resulting word vector file can be used as features in many natural language processing and machine learning applications.

We also know that RankBrain is “one of the ‘hundreds’ of signals that go into an algorithm that determines what results appear on a Google search page, and where they are ranked,” as stated by Google senior research scientist Greg Corrado in a statement to Bloomberg. The term “signal” is telling to me: it implies it is entirely possible that RankBrain does no direct scoring whatsoever.

Furthermore, most of this machine learning / artificial intelligence (AI) work seems to be geared towards understanding queries, not really web pages per se. As mentioned in a Daily Mail article:

For example, a user searching for ‘What is low in the army’ is a hard query for the AI to resolve, but RankBrain would interpret it to mean ‘What is low rank in the army’.

Another example shows the search engine may have previously struggled with ‘Why are PDFs so weak?’, taking the search literally and returning results of PDFs containing the word ‘weak’.

Whereas now, RankBrain might interpret it as a question about the security of PDF files, and so returns a better first result.
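
Purely as a toy illustration of the kind of matching involved (emphatically not Google’s actual mechanism, which has never been published), here is one way averaged word vectors can pull an oddly worded query toward a better-understood rewrite. Here wv is assumed to be any trained gensim KeyedVectors object, such as one from the sketch earlier:

    import numpy as np

    def embed(text, wv):
        # Average the vectors of the in-vocabulary words in the text.
        vecs = [wv[w] for w in text.lower().split() if w in wv]
        return np.mean(vecs, axis=0) if vecs else None

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = "what is low in the army"
    candidates = [
        "what is low rank in the army",
        "why is army morale low",
    ]

    q = embed(query, wv)
    best = max(candidates, key=lambda c: cosine(q, embed(c, wv)))
    print(best)  # ideally the 'low rank' rewrite wins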

And so I do have to believe that RankBrain and Word2Vec are related to some degree. For those of us who have studied patents, papers and research projects from Google over the years, both RankBrain and Word2Vec deal with the never-ending struggle of semantic analysis; both seek a better understanding of words, phrases and how they relate to each other.


Word Vector Concepts

In the past, semantic analysis approaches would use a simpler word relationship model based on single relations. The belief with the vector approach is that when trained on larger datasets with higher dimensionality, it should be more effective. By using, say, ten example pairs to form a relationship vector instead of one, semantic accuracy increases.

Again, in the past, semantic analysis was somewhat more simplistic. The belief now is that numerical representations of words make it possible to better predict the relationships between words.

In another example, consider that the relationship “Company executive” isn’t necessarily intuitive, but the word vector model would find the closest vector relations to be pairs such as (a sketch reproducing this follows the list):

  • Steve Ballmer – Microsoft
  • Samuel J. Palmisano – IBM
  • Larry Page – Google
  • Werner Vogels – Amazon
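
This is the famous vector-offset trick from the Mikolov papers. Here is a sketch of how it looks with gensim and the pretrained vectors released alongside the original word2vec project (the file name and phrase tokens like “Steve_Ballmer” are assumptions based on that public release):

    from gensim.models import KeyedVectors

    # Pretrained 300-dimensional vectors from the word2vec project page.
    wv = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # "Steve_Ballmer is to Microsoft as ??? is to Google"
    # i.e. vec(Steve_Ballmer) - vec(Microsoft) + vec(Google)
    print(wv.most_similar(
        positive=["Steve_Ballmer", "Google"],
        negative=["Microsoft"],
        topn=3,
    ))  # Larry_Page should appear near the top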

The main thing to consider is that RankBrain probably isn’t directly ranking anything. It’s trying to better understand concepts related to the words in a query (and presumably on pages it might return). It is also about improving the efficiency of the algorithms therein.

For more, please see my coverage of the Word2vec patent here.

The Artificial Intelligence Angle

Another piece to this puzzle that I came across was the 2011 Google Brain project, which also had some common players in the form of Jeff Dean, Geoffrey Hinton and Greg Corrado.

Google Brain was a deep learning / artificial intelligence research project that may indeed also be part of the evolution of what we know as RankBrain today. As always, it’s hard to say.

Google then acquired DeepMind, an artificial intelligence company, in 2014. This was considered a talent acquisition at the time, though in retrospect it might be much more of an AI tech and research play. It may also have played a role in how things have evolved.

The Hummingbird connection

I also think that it bears noting that this move is also somewhat in line with Google’s 2013 initiative dubbed “Hummingbird.” At the time, we knew that this new search platform was, among other things, more focused on dealing with queries and natural language processing. This is seemingly being addressed and expanded upon with RankBrain.

As my good friend Bill Slawski penned at the time:

[Hummingbird is] being presented as a query expansion or broadening approach which can better understand longer natural language queries, like the ones that people might speak instead of shorter keyword matching queries which someone might type into a search box.

And as Danny Sullivan reported last fall, regarding where RankBrain fits in with Hummingbird:

Hummingbird is the overall search algorithm, just like a car has an overall engine in it. The engine itself may be made up of various parts, such as an oil filter, a fuel pump, a radiator and so on. In the same way, Hummingbird encompasses various parts, with RankBrain being one of the newest.

They might also be using it to further improve the Knowledge Graph, as noted in this paper (PDF):

Our ongoing work shows that the word vectors can be successfully applied to automatic extension of facts in Knowledge Bases, and also for verification of correctness of existing facts.
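
One plausible reading of that (my speculation; the paper’s own models are more involved) is that a relation shows up as a consistent vector offset, which can then be checked against a candidate fact. A minimal sketch, with wv again any trained word-vector set:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # If "capital of" is a real relation, its vector offset should be consistent.
    offset_known = wv["Paris"] - wv["France"]        # known fact
    offset_candidate = wv["Berlin"] - wv["Germany"]  # fact to verify

    # A high cosine similarity between the offsets supports the candidate fact.
    print(cosine(offset_known, offset_candidate))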

From what we understand, RankBrain is actually part of the overall Hummingbird algorithm.

I would also venture a guess that both of these are in play with an eye toward the future, namely “conversational search” on mobile. Surely Google wants to be the one that gets that right and leads the way forward.


Does it rank?

This aspect of RankBrain has been more than a bit confusing as well. Obviously, with the word “rank” in the name, it certainly implies as much. Also, in talking with Danny Sullivan while crafting this piece, he assured me that “Google has specifically said it is also a ranking factor in addition to query analysis.”

But some statements from Google have seemingly contradicted this claim, including the one from Gary Illyes at the top of this post. So, let’s dig in…

Greg Corrado said in the original Bloomberg article that “RankBrain is one of the ‘hundreds’ of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked,” which of course further implies that it’s used in rankings (and/or re-ranking).

Certainly, if it’s not a scoring mechanism directly, the signals that it feeds could be used in re-ranking of results (boosting and dampening).
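
To be concrete about what “boosting and dampening” could mean mechanically, here is a deliberately naive sketch (pure speculation about the shape of such a step, not Google’s code): a base score gets nudged up or down by a query-to-document similarity signal.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def rerank(results, query_vec, doc_vecs, weight=0.2):
        # results: list of (doc_id, base_score); doc_vecs: doc_id -> vector.
        rescored = []
        for doc_id, base_score in results:
            sim = cosine(query_vec, doc_vecs[doc_id])  # semantic match signal
            # Boost well-matched documents, dampen poorly matched ones.
            rescored.append((doc_id, base_score * (1.0 + weight * sim)))
        return sorted(rescored, key=lambda r: r[1], reverse=True)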

Corrado also mentioned that “RankBrain has become the third-most important signal contributing to the result of a search query.” I am not entirely sure whether that refers to an actual scoring mechanism, or whether its importance lies in the value of better understanding a query and the pages returned in the results. Hard to say.

In short, yes. It would seem, directly or indirectly, that RankBrain is affecting rankings.

Let’s get this clear

So hopefully by now you’re starting to get the idea that RankBrain:

  • Is for dealing with unknown or unclear queries
  • Is for understanding and predicting related concepts of words
  • Is an important signal, but in the sense of serving better results for a query
  • MIGHT be helping to better classify pages
  • Is re-ranking based on new classifications
  • May NOT be a direct ranking factor
  • Does NOT affect crawling or indexing
  • Is NOT affecting other traditional ranking/scoring processes
  • Does NOT use machine learning on the fly (it’s a push/retraining process)

I have done some exhaustive research into this, and there are still a TON of varied opinions floating around the SEO space, many of them well off base. When Google said it is the “third most important signal,” they seem to mean that it’s a massive part of improving query classification/refinement, which in turn vastly improves search quality. Not all “signals” are scoring mechanisms (ranking factors, for you SEO types). How it actually affects rankings is still unknown; different Googlers have said different things about that.

For the record, content and links are the other two.

What should I do to leverage RankBrain?

That’s the $64,000 question now, isn’t it? Oddly enough, I have been telling people “if it ain’t broke, don’t fix it”: if you’re still growing traffic month-over-month with your SEO and marketing efforts, then I wouldn’t get worked up. Apparently Amit Singhal gave the green light for RankBrain to be rolled out back in early 2015, and subsequent statements in the fall of 2015 said it had been active in the wild for a few months.

In short… this isn’t actually something ‘new’ that we’re dealing with.

And all that being said, this is still more about better classification and handling of queries on Google’s end. If anything, it can potentially lessen the need to be overly concerned about what terms you’ve seeded on a page: Google can, in theory, better understand the concepts on that page, regardless of whether that exact word or term appears in the query the user puts into Google.

I’ve long encouraged SEOs and content creators to focus more on solidifying concepts and entities than on exact-match targeting approaches. So my advice really won’t be changing in light of RankBrain.

More RankBrain Reading

  • RankBrain: Everything We Know About Google’s AI Algorithm
  • Investigating Google RankBrain and Query Term Substitutions
  • Google’s Gary Illyes On RankBrain Replacing Links & Other Ranking Factors
  • RankBrain Does Not Impact Site’s Crawl Budget
  • RankBrain is For Understanding Queries, Not for Crawling or Replacing Ranking
  • How RankBrain Changes Entity Search

More reading on Word2Vec

  • Wiki listing
  • Insights on Word2Vec
  • word2vec Parameter Learning Explained (PDF)

Other Reading

  • “Efficient Estimation of Word Representations in Vector Space” (PDF) – Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean
  • “Distributed Representations of Words and Phrases and their Compositionality” (PDF) – Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean

Notable people

  • Jeffrey Dean
  • Tomas Mikolov (more here)
  • Kai Chen
  • Gregory Corrado
  • Paul Haahr
  • Thomas Strohmann

David Harry

SEO Consultant at Verve Developments
David is an SEO consultant with Verve Developments with more than 18 years of experience. He is also one of the founders of the SEO Training Dojo, a community for professionals.

Filed Under: Google, Google RankBrain, SEO

Comments

  1. Tony Dimmock says

    March 29, 2016 at 2:50 am

    Superb study, commentary and resources David. Thanks for setting the record straight – let’s hope there’s an end to click-bait RB posts now.

I’ll be referencing this post with fellow SEOs and clients in the future.

    Another research paper here: https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf “Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion” includes Thomas Strohmann and mentions the following:

at point 4.2 (Neural Network Model (MLP)):

“To illustrate that the neural network model learns a meaningful “semantic” representation of the entities and predicates, we can compute the nearest neighbors of various items in the K-dimensional space. It is known from previous work (e.g., [27]) that related entities cluster together in the space, so here we focus on predicates. The results are shown in Table 4. We see that the model learns to put semantically related (but not necessarily similar) predicates near each other. For example, we see that the closest predicates (in the ~w embedding space) to the ’children’ predicate are ’parents’, ’spouse’ and ’birth-place’.”

    and point 4.3 (Fusing the Priors):

    “We can combine the different priors together using the fusion method described in Section 3.2. The only difference is the features that we use, since we no longer have any extractions. Instead, the feature vector contains the vector of confidence values from each prior system, plus indicator values specifying if the prior was able to predict or not. (This lets us distinguish a missing prediction from a prediction score of 0.0.) We train a boosted classifier using these signals, and calibrate it with Platt Scaling, as before.”

    Both comments are possibly key considerations (or a front-runner) to the development of RankBrain (note this paper was published in 2014).

    Again, thanks for being as detailed as needed 🙂

  2. Paulo says

    March 29, 2016 at 3:57 am

Congratulations! Great post and very complete argumentation.

How do you think RankBrain information is served right now? Humanly, with some team inserting info into a database, or algorithmically, with AI being tested?

  3. Nikolay Stoyanov says

    April 21, 2016 at 5:02 am

Awesome article, David! I read it like three times already. However, as it is with RankBrain, there are still a lot more questions to be asked. What are your thoughts on its impact on on-page SEO? I mean, they told us that it is one of the most important ranking signals. But will it change the way articles are written?

