An Overview of Google's Real Algorithm

Google, the most visited website in the world, depends on the quality of its search engine, and most of the attention goes to how pages are ranked in the results of a query. Many webmasters still believe that the algorithm in use is PageRank, but PR is only one page-scoring criterion among hundreds of others. The real algorithm is never finished: teams at Google continuously analyze the results in order to correct it and to control the ranking of Web pages.
A journalist, Saul Hansell, had the opportunity to spend a day with the Google engineers directly involved in developing the algorithm, and to sit in on their meeting.
His account explains why a website sometimes disappears from the list of results for reasons unrelated to the penalty known as the sandbox.

Why is the algorithm modified?

The team's work is driven by complaints from companies whose sites are ranked poorly for no apparent reason, and by the team's own analysis of the results. Each of the 10,000 employees at Google has a "buganizer", a tool for reporting problems encountered with a query, and all of these reports are forwarded to the team working on the algorithm.
It was noticed, for example, that queries about the "French revolution" returned articles about an election campaign because the candidates were talking about "revolution"! In this case the correction simply consisted of giving more weight to the words "French revolution" when the two terms appear together.
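The correction described above, giving more weight to a page when the query terms appear together, can be illustrated with a toy scorer. This is only a sketch: the function names, the `PHRASE_BOOST` weight and the two sample texts are assumptions made for illustration, and nothing here is Google's actual formula.

```python
import re

PHRASE_BOOST = 3.0  # assumed weight, chosen only for illustration


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(query, page_text):
    """Count matches of single query terms, plus a bonus for each
    place where the full query appears as consecutive words."""
    q = tokenize(query)
    words = tokenize(page_text)
    if not q:
        return 0.0
    # Base score: occurrences of the individual query terms.
    term_score = float(sum(1 for w in words if w in q))
    # Bonus: occurrences of the terms coupled, in the same order.
    n = len(q)
    phrase_hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == q
    )
    return term_score + PHRASE_BOOST * phrase_hits


# Hypothetical sample texts.
campaign = "The candidate promised a revolution, a real revolution, a revolution in politics."
history = "The French Revolution began in 1789."
```

With the query "French revolution", the campaign text scores 3.0 on repeated single-term matches alone, while the history text scores 5.0 thanks to the phrase bonus. Without the bonus, the campaign text would rank first.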

Which tools?

The team has a special tool named "Debug", which shows how the computers evaluate each query and each Web page. With it, one can see how much importance the algorithm gives to the links pointing to a page, and correct this if needed.
Once the problem has been identified, a new mathematical formula is developed to handle the case, and it is incorporated into the algorithm.

The dilemma of freshness

A crucial problem for the development team is freshness. Should new pages be favored, since they are more likely to reflect current events, or should older pages be preferred, since they have already proved their quality through the number of their backlinks?
Google has always favored the latter, but it recently realized that this is not always the right choice. It therefore had to develop a new algorithm that determines when the user needs fresh information and when, on the contrary, the results should remain stable. This is called the QDF formula, for Query Deserves Freshness.
A subject can be considered hot when many blogs suddenly start writing about it, or when there is a sudden surge of queries on the subject.
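The idea of detecting a hot subject from a sudden rise in queries or blog posts can be sketched as a simple spike test against a baseline. The threshold `SPIKE_RATIO`, the function names and the daily-count input format are all assumptions for illustration. The real QDF formula is not public.

```python
from statistics import mean

SPIKE_RATIO = 3.0  # assumed threshold, for illustration only


def is_hot(daily_counts, recent_days=1, ratio=SPIKE_RATIO):
    """Return True when recent activity greatly exceeds the earlier baseline.

    daily_counts: chronological list of daily counts (queries or blog posts).
    """
    if len(daily_counts) <= recent_days:
        return False  # not enough history to form a baseline
    baseline = mean(daily_counts[:-recent_days])
    recent = mean(daily_counts[-recent_days:])
    if baseline == 0:
        return recent > 0
    return recent / baseline >= ratio


def deserves_freshness(query_counts, blog_counts):
    """Treat a topic as hot if either signal shows a spike."""
    return is_hot(query_counts) or is_hot(blog_counts)
```

For example, query counts of `[100, 110, 95, 105, 400]` would flag the topic as hot under these assumptions, whereas `[100, 110, 95, 105, 120]` would not.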

Snippets

One group works on snippets. These improve the presentation of results by extracting information about a site and displaying it, so that users learn about the site before they click on the link.

A gigantic index

Google has hundreds of thousands of computers to index the billions of pages from all the Web sites in the world. The goal, apart from the continuous addition of new pages, is to be able to update the entire index within a few days!
It is important to know that the datacenters store a copy of every page of the Web so that pages can be accessed quickly.

PageRank: signal and classifier

PageRank, used in the early days of the company by Larry Page and Sergey Brin, is a score based on the number of links pointing to a page, which is taken as an indication of its quality. But it is no longer the sole criterion. Google now uses 200 criteria, which it calls "signals". These depend on the content of the page, on its history, on the queries and on the behavior of users; all of this is described in detail in the PageRank and Sandbox patent.
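To make the link-based score concrete, here is a minimal sketch of the classic PageRank iteration, in its commonly used normalized form with a damping factor of 0.85. In that formulation a link counts for more when the linking page itself has a high rank. The four-page graph `web` is hypothetical, and this illustrates the published idea only, not Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this graph, page `c` gets the highest score because three pages link to it. Page `a` has only one inbound link but still outranks `b` and `d`, because that link comes from the highly ranked `c`.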
Besides the signals concerning pages and their history, Google uses classifiers on queries, whose goal is to identify the context of a search and the intent of the user who made it. For example, does the user want to find a product in order to buy it, or to get information about it?
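A query classifier of the kind described above can be pictured, in a deliberately simplified form, as a function that maps a query to an intent label. The keyword lists and the labels below are assumptions made for illustration. Real classifiers are certainly far more elaborate.

```python
# Assumed keyword lists, for illustration only.
BUY_WORDS = {"buy", "price", "cheap", "order", "discount", "shop"}
INFO_WORDS = {"what", "how", "why", "history", "definition", "who"}


def classify_query(query):
    """Guess whether the user wants to buy something or to learn something."""
    words = set(query.lower().split())
    buy = len(words & BUY_WORDS)
    info = len(words & INFO_WORDS)
    if buy > info:
        return "transactional"
    if info > buy:
        return "informational"
    return "unknown"
```

With these lists, "buy cheap digital camera" is labeled transactional, "how does a digital camera work" is labeled informational, and "digital camera" alone remains unknown.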

The need for diversity

Once the pages have been selected and ranked, the best ones should occupy the first ten positions, but the process does not end there. Google wants to add a diversity of points of view, for example blogs as well as commercial sites. A page with a lower score can therefore be moved toward the top of the results, the first page of each category being promoted in this way.
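The promotion of the first page of each category can be sketched as a re-ranking pass over results that are already scored. The tuple layout, the category labels and the example URLs are hypothetical. This shows only the general idea, under the assumption that each result already carries a category.

```python
def diversify(results, top_n=10):
    """Re-rank results so that each category is represented near the top.

    results: list of (url, category, score) tuples.
    Returns the top_n results after promoting the best-scoring result
    of each category ahead of all the others.
    """
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    promoted, rest, seen = [], [], set()
    for item in ranked:
        category = item[1]
        if category not in seen:
            seen.add(category)
            promoted.append(item)  # first (best) result of its category
        else:
            rest.append(item)
    return (promoted + rest)[:top_n]


# Hypothetical scored results.
results = [
    ("shop1.example", "commercial", 0.95),
    ("shop2.example", "commercial", 0.90),
    ("blog1.example", "blog", 0.60),
    ("shop3.example", "commercial", 0.85),
]
top = diversify(results, top_n=3)
```

Here the blog, despite its lower score, moves to second place ahead of two commercial sites, because it is the best result in its category.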

Not everything has been said

Google's techniques, with their signals and classifiers, seem rather academic when compared with those of competitors such as Microsoft, which uses neural networks. But not everything is known: Google still keeps many secrets, since it does not want to reveal all of its techniques to competitors.