Foundations of search engine ranking algorithms
How do search engines, including Google, Yahoo! and Bing, determine their results?
Indeed, SERPs (Search Engine Result Pages) are often misunderstood, and they give rise to misinterpretations and hasty conclusions drawn from partial evidence.
This study was based on documents provided by Google and analyzed by experts.
- There is a hierarchy of criteria
- Criteria for relevance
- Keywords in content
- Domain, directories and file names
- External links
- Panda and internal links
- Criteria for ranking
- Clicks on results
- Bounce factor
- Time spent on documents
- Penguin and inbound links
There is a hierarchy of criteria
The results are built in two steps:
- The pages are first selected according to their relevance to queries.
- Then the selected results are ordered according to the score of the page.
This score includes the PageRank but also other criteria.
This hierarchy of criteria explains why a page with a high PageRank can be placed after pages with a lower PR in the SERP.
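The two steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Page` class, the `build_serp` function, the sample URLs and the single `score` value standing in for PageRank and the other criteria. Real engines use far richer relevance tests than an exact keyword match.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    score: float  # hypothetical overall score (PageRank plus other criteria)


def build_serp(pages, query):
    """Select pages by relevance first, then order the selection by score."""
    keywords = query.lower().split()
    # Step 1: relevance. Keep only pages containing every query keyword.
    selected = [
        p for p in pages
        if all(k in p.text.lower().split() for k in keywords)
    ]
    # Step 2: ranking. Order the selected pages by descending score.
    return sorted(selected, key=lambda p: p.score, reverse=True)


pages = [
    Page("a.example", "pasta recipes for beginners", 9.0),
    Page("b.example", "python tutorial for beginners", 3.0),
    Page("c.example", "advanced python tutorial", 5.0),
]
serp_urls = [p.url for p in build_serp(pages, "python tutorial")]
print(serp_urls)
```

In this sketch the page with the highest score never reaches the ranking step, because it fails the relevance step: a high score cannot compensate for a lack of relevance.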
Many webmasters concentrate all their efforts on positioning and forget the main part, relevance.
Criteria for relevance
As a first step, engines select the pages that match the user's search, that is, pages containing the keywords the user has entered.
Keywords in content
The keywords in the page are used to determine its relevance to a user's query. Some words carry more weight than others: those placed in heading tags (<h1>, <h2>, etc.), those highlighted by the <strong> tag, and those near the top of the page.
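As a rough illustration of weighting by placement, the following Python sketch counts occurrences of a keyword and multiplies each one by a weight that depends on the enclosing tag. The weights in `TAG_WEIGHTS` are invented for the example (the values used by any engine are not public), `KeywordScorer` and `keyword_score` are hypothetical names, and the bonus for words near the top of the page is left out.

```python
from html.parser import HTMLParser

# Hypothetical weights; the values used by real engines are not public.
TAG_WEIGHTS = {"h1": 5.0, "h2": 3.0, "strong": 2.0}


class KeywordScorer(HTMLParser):
    """Adds up keyword occurrences, weighted by the tags that enclose them."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags = []
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        # Exact word match only; punctuation attached to a word prevents a hit.
        hits = data.lower().split().count(self.keyword)
        weight = max(
            (TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0
        )
        self.score += hits * weight


def keyword_score(html, keyword):
    scorer = KeywordScorer(keyword)
    scorer.feed(html)
    scorer.close()
    return scorer.score


html = "<h1>Python guide</h1><p>Learn <strong>python</strong> with python examples.</p>"
score = keyword_score(html, "python")
print(score)
```

With these assumed weights, the occurrence in the heading counts for 5, the one in <strong> for 2, and the one in plain text for 1.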
The anchor is the text of a link to another page. Keywords in the anchor are used to determine the relevance of the linked page, just as the keywords in the page itself are.
"Anchors often provide more accurate descriptions of web pages than the pages themselves." Reference.
The language of the page is an essential criterion. For example, Google.fr selects French pages first, but pages in English may also appear when there are too few results.
The ccTLD (the country-code domain name extension) favors a site in searches made by Internet users in that country. This is clear from Google's answers to webmasters.
Having a domain name associated with an IP address in a given country improves the position for reasons of proximity, which matters mainly for the sale of goods. Some hosts offer geolocation of the domain name for businesses.
Domain, directories and file names
Keywords in the domain name and in the file path are used to determine the relevance of a page, just as its content and anchors are. The topic of the site is presumed from its domain name.
Google perhaps gives too much importance to the domain name. If you type, for example, BlogSmith in the search bar, the first result is a blank page on the site blogsmith.com. You must go to the second results page to find information about what this site is.
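To make the idea concrete, here is a small Python sketch that splits a URL into word tokens and checks which query words appear among them. The helper names `url_keywords` and `url_matches` are made up for this example, and a plain token match is only an assumption about how such a signal might be computed.

```python
import re
from urllib.parse import urlparse


def url_keywords(url):
    """Split the domain name and file path of a URL into lowercase tokens."""
    parts = urlparse(url.lower())
    tokens = re.split(r"[^a-z0-9]+", parts.netloc + parts.path)
    return [t for t in tokens if t]


def url_matches(url, query):
    """Return the query words that also appear in the URL tokens."""
    tokens = set(url_keywords(url))
    return [word for word in query.lower().split() if word in tokens]


domain_hit = url_matches("http://blogsmith.com/", "BlogSmith")
path_hits = url_matches(
    "https://www.example.com/seo/ranking-criteria.html", "ranking criteria google"
)
print(domain_hit, path_hits)
```

The first call mirrors the BlogSmith example: the query matches the domain name alone, whatever the page contains.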
External links
Should we reduce the number of external links in order to pass the PR on to internal pages?
This is not a good idea, primarily because external links are part of the content, and therefore of the relevance of the page, and relevance comes before the PR, which is a criterion of position. It is not difficult to find pages with many external links at the top of search results.
So do not hesitate to add many external links, but avoid any form of link exchange, which undermines confidence in your site. Links to bad sites will also devalue the page, for engines as for users.
External links should certainly be avoided on the home page, in order to keep visitors who are discovering the site.
Panda and internal links
The "quality" algorithm known as Panda devalues a site when some of its pages are deemed unoriginal or lacking meaningful content. Internal links reinforce this negative score.
Criteria for ranking
After choosing a certain number of pages that can answer a user's query, engines try to order the results. This ranking is obviously essential: most Internet users look only at the first results, and often only at the first page of results.
For this positioning phase, search engines use criteria of popularity and trust.
Some sites are trusted and strongly favored by search engines, especially by minor engines, which have less content in their databases than Google. You often see these sites appearing at the top of results, sometimes even with empty pages!
The PageRank of a page is assigned according to the weight of the links pointing to that page. It is both a score of popularity and of trust. A popular page passes its score to the pages it links to. The PR is the second criterion of position, after the TrustRank.
"PageRank: Bringing Order to the Web." Reference.
It is often said that PageRank no longer matters much, but Google strongly discourages webmasters from selling links intended to increase it, and those who do risk being penalized, so we must conclude that this criterion is still essential for the engine.
Inbound links (backlinks) are a criterion for ranking, whereas external links are evidence of relevance, and relevance still takes precedence over positioning. On the one hand external links enrich the content; on the other hand they are used to evaluate the value of the page. External links to relevant articles with good content improve the quality of the page.
PageRank is the most controversial topic. It is also the one about which people are most often wrong. The main source of error is failing to realize that SERPs are built in two ordered steps.
We rely on the article The Anatomy of a Large-Scale Hypertextual Web Search Engine written by the founders of Google and inventors of the PageRank algorithm.
The number of external links divides the transmitted PageRank.
"For instance, if a page with a starting PageRank of 4 has two outgoing links on it, we know that the amount of PageRank it passes on is divided equally between all of its outgoing links. In this case 4 / 2 = 2 units of PageRank is passed on to each of 2 separate pages." Reference.
A page can get a high PageRank with few backlinks.
"A page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank". Reference.
The quality of links from a page is more important than their number.
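These points can be checked on a toy graph. The Python sketch below uses the simplified formula given in The Anatomy of a Large-Scale Hypertextual Web Search Engine, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), with the damping factor d set to 0.85. The graph, the page names and the function names `share_per_link` and `pagerank` are invented for the example; it is an illustration of the formula, not a description of what Google runs today.

```python
def share_per_link(rank, outgoing_links):
    """PageRank passed on through each outgoing link (equal division)."""
    return rank / outgoing_links


def pagerank(links, damping=0.85, iterations=20):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over a small graph.

    `links` maps every page to the list of pages it links to.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            incoming = sum(
                share_per_link(ranks[src], len(targets))
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks


# Hypothetical graph: "target" has one backlink, from a well-linked hub;
# "other" has two backlinks, from pages that nobody links to.
links = {
    "s1": ["hub"], "s2": ["hub"], "s3": ["hub"], "s4": ["hub"], "s5": ["hub"],
    "hub": ["target"],
    "w1": ["other"], "w2": ["other"],
    "target": [], "other": [],
}
ranks = pagerank(links)
print(share_per_link(4, 2))  # the quoted example: 4 / 2 = 2 units per link
print(ranks["target"] > ranks["other"])
```

In this graph the page with a single backlink ends up with a higher PageRank than the page with two, because its one link comes from a page that is itself well linked.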
Clicks on results
As stated in Google's PageRank patent, clicks on the results pages are taken into account.
"The number of times a page is selected in results of requests, as well as time spent to reach the page are taken into account."
Given that the click on a result is based on the title and the snippet (description), it goes without saying that they must be relevant and attractive.
Bounce factor
Bounce rate is the percentage of visits in which the visitor does not view a second page on the site.
It is unclear whether a high bounce rate is a negative or a positive signal: a visitor who finds exactly what he wants has no need to look at other pages. The webmaster nevertheless has an interest in reducing it, in order to get more page views.
Google's spokespeople cite the bounce rate as a criterion to be monitored.
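The definition translates directly into a calculation. In this Python sketch, `bounce_rate` is a hypothetical helper and the list of page views per visit is made-up sample data, not taken from any analytics tool.

```python
def bounce_rate(pages_per_visit):
    """Percentage of visits in which only one page was viewed."""
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for count in pages_per_visit if count == 1)
    return 100.0 * bounces / len(pages_per_visit)


# Made-up sample: number of pages viewed during each of eight visits.
visits = [1, 3, 1, 2, 5, 1, 1, 4]
rate = bounce_rate(visits)
print(rate)
```

Four of the eight sample visits stop after one page, so the bounce rate is 50%.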
Time spent on documents
If users choose a page but return immediately to the results list, the engines know it. The page will then be considered irrelevant.
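How an engine measures this is not documented in the sources used here. As a purely illustrative Python sketch, one could compute the share of clicks followed by a quick return to the results list; the 5-second threshold, the name `quick_return_share` and the sample durations are all assumptions.

```python
QUICK_RETURN_SECONDS = 5.0  # hypothetical threshold for an "immediate" return


def quick_return_share(dwell_seconds, threshold=QUICK_RETURN_SECONDS):
    """Fraction of clicks after which the user came back before `threshold`."""
    if not dwell_seconds:
        return 0.0
    quick = sum(1 for seconds in dwell_seconds if seconds < threshold)
    return quick / len(dwell_seconds)


# Made-up sample: seconds spent on the page after each click on the result.
dwell = [2.0, 90.0, 1.5, 3.0, 240.0]
share = quick_return_share(dwell)
print(share)
```

A high share would suggest, under these assumptions, that the page does not answer the query it was shown for.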
Penguin and inbound links
Since the Penguin algorithm, a large number of inbound links deemed artificial can penalize a website. These are mostly links placed in directories, in comments, or on any page without useful content.