BrowseRank algorithm from Microsoft

The BrowseRank is an alternative to PageRank from Google, which evaluates the popularity of a page according to the number of links to this page. The BrowseRank assesses the importance of a page according to the navigation of users and their behaviour.

The authors are Chinese and most have a teaching or research position at universities in Beijing: Liu Yuting, Bin Gao, Tie-Liu Yan, Zhang Ying, Ma Zhiming, Shuyuan He, Hang Li.

Calculating the BrowseRank

The graph of navigation: dots represent the pages, and ordered relations are transitions of users between the pages. In addition the time spent by users on pages is also taken into account.
It is more effective than the graph of links on the pages, to determine their importance.

The information on visitor behavior are obtained from the browser. They are:

These data are used to construct the graph. It represents the process of random browsing by Internet users. It is assumed that when a visitor goes on a page and remains there, he implicitly vote for this page.

The algorithm then is based on a process of continuous-time Markov which applies to taking the graph as a model for determining a stationary probability distribution process that corresponds to the importance of pages.
To obtain the time spent, the difference is calculated between the time where the page is loaded and the one where the user loads another page.  For the last page of the session, the average time observed to the page is applied.

The algorithm simplified

Input: Data on the behaviour of the user.
Output: Score of the importance of the page.

  1. Construction of the browsing graph.
  2. Estimateqii for all pages.
  3. Estimate the probability matrix of transitions of the EMC (embedded Markov chian) to get from it the stationary  probability distribution by the method of powers.
  4. Calculating the stationary probability distribution.

The details of the algorithm are given by the document in reference.

Comparing with PageRank

The PageRank is based on the graph of links between pages and it assumes that the more links there are links to a page and the more important it is. It uses a  discrete time Markov process on the links to evaluate their importance.
It has these disadvantages:

Google does not use only the PageRank algorithm to determine the position in the results but an algorithm of score of pages whose the patent was filed in 2007. The PageRank is one criterion among others to determine the score of a page.

Drawbacks of the BrowseRank:

The criterion of time spent

It is really impossible to know if the user is reading the page, or if he has opened the browser and left to do other things.
A page short, simple, clear will be read quickly without being less important than another page longer, confuse or difficult to read. For example, the page that provides the address of a shop is reads in a few seconds and it is nevertheless very crucial!

PageRank can evaluate link quality

The score of pages  evaluated by Google takes into account a number of criteria more important than the BrowseRank, as can be seen in the summary of the patent.
The spam by reciprocal links or farm link is now better mastered what also makes this argument somewhat unfounded.

The notion of trust

The PageRank depends not only on the number of links, but also of the relevance and the weight of links. The links from a trusted site are more important than from a small and new site.
These criteria are absent in BrowseRank for which in principle all clicks are equal.

Spam uncontrollable

As webmasters are trying to use the PageRank in their favor by creating links, they will try to divert the BrowseRank by running robots (scripts), which simulate human behaviour and that will stay a long time on their pages. Such scripts can run in million of copies.