Google's Hummingbird PageRank Algorithm
Google ‘Hummingbird’ is the most recent update to the Google search engine, released on the occasion of Google’s 15th anniversary in September 2013. It is an update to the existing PageRank algorithm, which indexes webpages and uses semantic keyword search in order to provide comprehensive search results for users.
‘Hummingbird’ is the first major update since the release of the update entitled ‘Caffeine’ in 2010. ‘Hummingbird’ was introduced gradually for about a month prior to the September announcement. It incorporates new features such as a more intelligent ranking of webpages, an ability to search by asking Google a question (taking into account words such as ‘how,’ ‘why,’ ‘where,’ and ‘when’ in addition to the standard keyword search), and a better file system to act as a base.
Google celebrated its 15th birthday on September 27th, 2013, and for the occasion, the company introduced a new PageRank algorithm called ‘Hummingbird.’ Since the introduction of Google Search in 1998, technology has vastly improved, and there have been a variety of improvements, including the 2001 “Did you mean” feature, 2005 autocomplete feature, 2008 mobile application, 2009 voice search, 2010 Instant Search, 2010 ‘Caffeine’ update, 2011 image search, and 2012 Knowledge Graph. Each update was designed to increase the ease, efficiency, and/or effectiveness of a user’s search.
Prior to ‘Hummingbird,’ the existing search engine algorithm (introduced in 2010) was named ‘Caffeine,’ a metaphor for a faster and more accurate search. One of the main properties of ‘Caffeine’ was that it indexed webpages more quickly and more often than before, which allowed search results to be more accurate in real time. The indexing storage system was updated, and the Google File System (GFS) was redone and renamed GFS2.
The motivation for the ‘Hummingbird’ update, however, was the advent of voice-recognition searching. When users perform a web search using voice recognition software, they are more likely to speak in natural phrases such as questions or sentences rather than just using keywords, as they would when typing a search into a computer. This is known as conversational search. Google thus wanted its search engine to become more friendly to such queries. When a user enters this type of full question as a query, instead of focusing on the main keywords (as a basic search engine would), ‘Hummingbird’ takes into account the meaning of the search, based on all of the query words that the user inputs.
Theory and an Interpretation of the Google PageRank Algorithm
The core query analysis algorithm ‘Hummingbird’ relies on both Google’s Knowledge Graph and GFS2. The Knowledge Graph allows a user to search for something within the search engine’s knowledge base, and the database will return the information that it finds, ranked in order of relevance.
A simplified interpreted version of the Google ‘Hummingbird’ PageRank Algorithm is shown below. Essentially, a webpage’s ranking is determined by analyzing the ranking of all the other webpages that link to the webpage in question.
PR(A) = (1 - d) + d × (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
In the simplified equation, ‘n’ represents the number of pages in the web. (Note: As of 2013, there are more than 60 billion pages on the web. However, not all of these are given a separate indexing by Google.)
To determine the ranking of webpage A, we take into account the rankings of every other page on the web that links to it. The system is streamlined by dividing the PageRank of each such page by the number of outgoing links on that page, written C(Tn). Thus, the importance of a page is split evenly among the links on that page. A damping factor ‘d’ is also incorporated in order to proportionally decrease the influence of all of these pages on the page in question. Finally, the expression (1 - d) takes care of corner cases, so that a page with no links pointing to it will still get a ranking. This expression normalizes the sum of the other PageRanks. However, it is possible that Google is considering changing its PageRank algorithm in the future by adding a “post-spidering” action. In this phase, pages that have no links pointing to them would be completely deleted from the index. Thus, these pages would not undergo the proportional normalizing with the (1 - d) expression.
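The ranking rule described above can be sketched in a few lines of Python. This is only an illustrative sketch of the simplified formula PR(A) = (1 - d) + d × (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), not Google's actual implementation. The function name pagerank, the four-page example web, the damping value d = 0.85, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iteratively apply the simplified PageRank formula.

    links maps each page to the list of pages it links to
    (a hypothetical toy web, not real data).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # C(T): number of outgoing links on page T.
    out_count = {p: len(links.get(p, [])) for p in pages}

    # inbound[p]: the pages T that link to p.
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    # Start every page at rank 1 and repeatedly re-apply the formula.
    ranks = {p: 1.0 for p in pages}
    for _ in range(iterations):
        ranks = {
            p: (1 - d) + d * sum(ranks[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
    return ranks


# Hypothetical four-page web: D links out but nothing links to D.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
example_ranks = pagerank(example_links)
for page in sorted(example_ranks):
    print(page, round(example_ranks[page], 3))
```

In this toy web, page D has no links pointing to it, so the sum in the formula is empty and its rank comes entirely from the (1 - d) term. Page C, which is linked to by three other pages, ends up with the highest rank.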