PageRank and sandbox

This document is based on a Google patent dated April 2007 and summarizes it. It explains in detail how each page of a website is assigned the score that determines its position in the search engine's results. All the criteria that determine the rank of a page are analyzed, and the reasons behind the sandbox effect are revealed as a consequence.

The date of the document

The date matters when assigning a PageRank. Several methods can be used to determine the date of a document: it can be the date of indexing, or the date at which a backlink to the page is placed.
If the number of links to a page grows faster than for an older page, the page gets a better PageRank, but this growth can also signal spamming.
If a document is more recent than the average of the pages in a result set, it can be assigned a better PageRank, which improves its position to take account of its novelty.
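The novelty rule above can be sketched as follows. This is only an illustration: the function name `freshness_boost`, the linear formula and the values `max_boost` and 365 days are assumptions, not figures taken from the patent.

```python
from datetime import date

def freshness_boost(doc_date, result_dates, max_boost=0.2):
    """Hypothetical bonus for a document newer than the average of a result set."""
    # Average date of the result set, expressed as an ordinal day number.
    avg_ordinal = sum(d.toordinal() for d in result_dates) / len(result_dates)
    days_newer = doc_date.toordinal() - avg_ordinal
    if days_newer <= 0:
        return 0.0  # not newer than the average: no bonus
    # Assumption: the bonus grows linearly and saturates after one year.
    return max_boost * min(days_newer / 365.0, 1.0)

dates = [date(2005, 1, 1), date(2006, 1, 1), date(2007, 1, 1)]
print(freshness_boost(date(2007, 1, 1), dates))  # newest page of the set
print(freshness_boost(date(2005, 1, 1), dates))  # oldest page of the set
```

Only pages newer than the average receive a bonus; older pages are left unchanged rather than penalized, which is one possible reading of the rule.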

Evolution of the contents of the page

The score differs depending on whether the contents of the document change often or not.
To detect changes, the engine can store the whole document, a signature that represents it in compact form, or a part considered essential to the document.
The effect on the score can be positive or negative depending on these changes.
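As an illustration of the signature approach, here is a minimal sketch that assumes the signature is a SHA-256 digest of the normalized text. The patent summary does not say how the signature is computed, so the hash choice and the helper names `signature` and `has_changed` are hypothetical.

```python
import hashlib

def signature(text):
    """Compact signature of a document (assumption: SHA-256 of the normalized text)."""
    # Collapse whitespace and ignore case so cosmetic edits do not count as changes.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(previous_signature, current_text):
    """Compare a stored signature with the signature of the current text."""
    return signature(current_text) != previous_signature

stored = signature("PageRank and   sandbox")
print(has_changed(stored, "pagerank and sandbox"))    # False: only spacing and case differ
print(has_changed(stored, "PageRank and freshness"))  # True: the wording changed
```

A digest only says whether the text changed, not by how much. Storing the whole document or its essential part, the two other options mentioned above, would be needed to measure the extent of a change.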

Analysis of queries and clicks on results

The way a document is selected among the results of a query can be taken into account.
When certain terms appear more frequently in users' queries, a document associated with these terms (containing them, or having backlinks that contain them) gets a better score.
If a document often answers similar queries, it obtains a better score.
The engine takes into account that some queries persist over time while the pages that answer them change (sports results, for example). The score decreases if the document no longer answers the query.
In certain domains, such as a FAQ, the novelty of a document is important and improves the score.
However, if users click on the link to an older document and ignore more recent ones, the older document gets a better score.
A document that appears often for queries on a broad topic, but less often once the topic is narrowed, gets a lower score (for example, the topic is a sport and it is narrowed to a specific sports club).
If a document appears for queries that are unrelated to one another, that signals spam and the score is reduced.

Links to the page

The appearance and disappearance of backlinks are taken into account for the PageRank.
If new backlinks appear less and less often over time, the document is becoming stale and its score is reduced.
Conversely, if the number of new backlinks tends to grow, it gets a better score.
If the contents of a document are modified but a link it holds to another page is maintained, that adds value to the link and thus increases the score of the linked page.
The value of links increases if they are trusted, as is the case for governmental sites, for example.
The speed at which backlinks appear can signal spam. Pages of a given type are assumed to attract links at a given rate. So when too many backlinks appear, that implies an exchange or purchase of links, or links from free-registration pages (such as directories), and that is treated as spam.
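The backlink rules of this section can be pictured with a small sketch. Everything in it is an assumption made for illustration: the function name `classify_backlink_growth`, the comparison between the older and the recent half of the history, and the `spam_factor` threshold are not taken from the patent.

```python
def classify_backlink_growth(new_links_per_period, expected_rate, spam_factor=5.0):
    """Classify the backlink history of a page.

    new_links_per_period: non-empty list of new-backlink counts, oldest period first.
    expected_rate: assumed typical number of new links per period for this type of page.
    """
    latest = new_links_per_period[-1]
    # Far more links than pages of this type usually attract: possible link scheme.
    if latest > spam_factor * expected_rate:
        return "suspect"
    # Compare the average of the older half with the average of the recent half.
    half = len(new_links_per_period) // 2
    older = sum(new_links_per_period[:half]) / max(half, 1)
    recent = sum(new_links_per_period[half:]) / (len(new_links_per_period) - half)
    if recent < older:
        return "stale"    # new links are drying up
    if recent > older:
        return "growing"  # new links keep coming faster
    return "steady"

print(classify_backlink_growth([10, 8, 4, 2], expected_rate=10))    # stale
print(classify_backlink_growth([2, 4, 8, 10], expected_rate=10))    # growing
print(classify_backlink_growth([3, 4, 5, 200], expected_rate=10))   # suspect
```

The spam check runs first so that a sudden burst is never mistaken for healthy growth.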

Text of anchors

A modification of the anchor text means that the document was updated.
If the text of the document changes and no longer matches the wording of the anchors, that means the document was rebuilt and is no longer relevant to its anchors, which is not desirable.
From this, the engine can determine the date when a domain changed topic, and links older than that date are ignored.
If the document undergoes only minor changes, it is better to preserve the wording of the anchors: their age counts toward relevance.

Traffic on the page

If traffic, in other words the number of times a page is read, decreases significantly, that means the document is stale. Traffic is compared across time periods to estimate the decrease.
The traffic coming from advertisements is taken into account. If the advertisements concern sites with strong traffic, the page gets a better PageRank than with advertisements for minor sites.
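The comparison of traffic across periods can be illustrated as follows. This is a sketch under stated assumptions: the names `traffic_decline` and `is_stale`, the window of three periods and the 50% threshold are hypothetical and serve only to show the principle.

```python
def traffic_decline(visits, window=3):
    """Relative drop between the average of the first and of the last `window` periods."""
    if len(visits) < 2 * window:
        raise ValueError("need at least two full windows of data")
    earlier = sum(visits[:window]) / window
    latest = sum(visits[-window:]) / window
    if earlier == 0:
        return 0.0  # no earlier traffic to compare against
    return (earlier - latest) / earlier

def is_stale(visits, threshold=0.5, window=3):
    """Assumption: a page is stale when its traffic dropped by `threshold` or more."""
    return traffic_decline(visits, window) >= threshold

print(is_stale([100, 100, 100, 40, 50, 30]))    # True: traffic fell by 60%
print(is_stale([100, 100, 100, 90, 100, 110]))  # False: traffic is stable
```

Averaging over a window rather than comparing single periods keeps a one-off dip from marking a page as stale.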

Behavior of visitors

The number of times a page is selected in query results, as well as the time spent to reach the page, are taken into account.
Depending on whether visitors spend more or less time on a page, it is regarded as relevant or stale. If visitors spend less and less time on a page as time goes by, it is regarded as stale.

Information on the domain name

Hosting is taken into account: Intranet, Internet or network of document databases.
Recent domains can be used by spammers and are therefore regarded as less legitimate.
DNS data, the owner of the domain, contacts and DNS addresses are taken into account. Frequent changes are signs of spam. IP addresses and other data used by these ephemeral sites are recorded in a database, along with the associated documents.
A DNS is better regarded if it refers to various domains and different registrars. It is poorly regarded if it hosts porn sites, spam sites, or domains containing commercial words.
The PageRank of a page depends on the domain and its hosting.

Previous ranks

Previous ranks are taken into account. The number of positions a document gains in a given time modifies its score. However, if a rank remains high while positions on the subject tend to change over time, that indicates a commercial topic and a higher probability of spam.
If the number of selections for a page tends to increase, or if the selections become more frequent, the page gets a better score.
The engine takes into account spikes in the rank of documents, which typically indicate spam. To tell the difference, various factors are considered. A document mentioned in the news, for example, is not spam.
Conversely, a sudden fall in the rank of a document indicates that it is stale.
In conclusion, the evolution of a document's rank influences its score and its future rank.
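The treatment of spikes and falls can be sketched as below. The function name `rank_signal`, the `spike` threshold of 20 positions and the `in_news` flag are assumptions made for this example. The text only says that various factors, such as a mention in the news, separate a legitimate spike from spam.

```python
def rank_signal(rank_history, spike=20, in_news=False):
    """Interpret the latest move in a list of rank positions (1 is best), oldest first."""
    if len(rank_history) < 2:
        return "unknown"
    gained = rank_history[-2] - rank_history[-1]  # positive when the page moves up
    if gained >= spike:
        # A sudden jump is suspicious unless another factor explains it.
        return "legitimate" if in_news else "possible spam"
    if gained <= -spike:
        return "stale"  # sudden fall in rank
    return "normal"

print(rank_signal([80, 75, 10]))                # possible spam
print(rank_signal([80, 75, 10], in_news=True))  # legitimate
print(rank_signal([5, 6, 60]))                  # stale
```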

Bookmarks

Bookmarks and other data of this type influence the PageRank of a document. Being added to or removed from this kind of list is taken into account. How often the document is selected from the list also has an influence.
Cache memory and temporary directories are taken into account, as well as cookies. All of this indicates whether a document is consulted or whether users ignore it.

Unique words and anchors

The frequency of a single word or a phrase in anchors is taken into account in relation to the links to the linked page.
If anchors are suspect, in particular because there are many occurrences of unique words in different documents, that has an impact on the score of these documents and of those that link to them.
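One way to picture the frequency of an anchor wording relative to the links is the share of backlinks that use the most common anchor. This is a hypothetical measure chosen for illustration, the patent summary does not define one, and the name `dominant_anchor_share` is an assumption.

```python
from collections import Counter

def dominant_anchor_share(anchors):
    """Return the most frequent anchor text and the share of backlinks that use it."""
    # Normalize case and surrounding spaces so identical wordings are counted together.
    counts = Counter(a.strip().lower() for a in anchors)
    text, count = counts.most_common(1)[0]
    return text, count / len(anchors)

anchors = ["cheap watches", "Cheap Watches", "cheap watches", "my friend's shop"]
text, share = dominant_anchor_share(anchors)
print(text, share)  # cheap watches 0.75
```

A very high share for a single wording across many linking documents would be one possible sign of suspect anchors.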

Unrelated links

Unrelated backlinks and unrelated outgoing links are an indicator of spam and cause a drop in the PageRank of the page.

Topic of the page

The topic of a page is used to determine its PageRank.
It is determined from rare words, the URL, the synopsis, the contents, etc.
If the topic of a set of documents changes, that indicates a new owner or a different topic for the site, and all information on the page becomes out of date. Alternatively, it means that the page is used for spam.