BrowseRank Algorithm from Microsoft
The BrowseRank is an alternative to PageRank from Google, which evaluates the popularity of a page according to the number of links to this page. The BrowseRank assesses the importance of a page according to the navigation of users and their behaviour.
The authors are Chinese and most have a teaching or research position at universities in Beijing: Liu Yuting, Bin Gao, Tie-Liu Yan, Zhang Ying, Ma Zhiming, Shuyuan He, Hang Li.
The BrowseRank graph
The dots represent the pages, and ordered relations are transitions of users between the pages. In addition the time spent by users on pages is also taken into account.
The graph of navigation is more effective than the graph of links on the pages, to determine their importance.
Calculating the BrowseRank
The information on visitor behavior are obtained from the browser. They are:
- The number of visits.
- The average time spent on the page.
These data are used to construct the graph. It represents the process of random browsing by Internet users. It is assumed that when a visitor goes on a page and remains there, he implicitly vote for this page.
The algorithm then is based on a process of continuous-time Markov which applies to taking the graph as a model for determining a stationary probability distribution process that corresponds to the importance of pages.
To obtain the time spent, the difference is calculated between the time where the page is loaded and the one where the user loads another page. For the last page of the session, the average time observed to the page is applied.
The algorithm, simplified:
Input: Data on the behaviour of the user.
Output: Score of the importance of the page.
- Construction of the browsing graph.
- Estimateqii for all pages.
- Estimate the probability matrix of transitions of the EMC (embedded Markov chian) to get from it the stationary probability distribution by the method of powers.
- Calculating the stationary probability distribution.
The details of the algorithm are given by the document in reference.
Comparing with PageRank
The PageRank is based on the graph of links between pages and it assumes that the more links there are links to a page and the more important it is. It uses a discrete time Markov process on the links to evaluate their importance.
It has these disadvantages:
- Webmasters can create links farms to give importance to a page or engage in exchange of links.
- It does not take into account the time spent on pages that would be an indicator of their quality.
Criticism of BrowseRank
The criterion of time spent
It is really impossible to know if the user is reading the page, or if he has opened the browser and left to do other things.
A page short, simple, clear will be read quickly without being less important than another page longer, confuse or difficult to read. For example, the page that provides the address of a shop is reads in a few seconds and it is nevertheless very crucial!
Comparison with the PageRank
Google does not use the PageRank algorithm to determine the position in the results but an algorithm of score of pages whose the patent was filed in 2007. The PageRank is one criterion among others to determine the score of a page.
The score of pages evaluated by Google takes into account a number of criteria more important than the BrowseRank, as can be seen in the summary of the patent.
The spam by reciprocal links or farm link is now better mastered what also makes this argument somewhat unfounded.
The notion of trust
The PageRank depends not only on the number of links, but also of the relevance and the weight of links. The links from a trusted site are more important than from a small and new site.
These criteria are absent in BrowseRank.
As webmasters are trying to use the PageRank in their favor by creating links, they will try to divert the BrowseRank by running robots (scripts), which simulate human behaviour and that will stay a long time on their pages. Such scripts can run a million copies.