Frequently asked questions about search engine algorithms

While the SEO glossary provides the essential terms for an overview of SEO, this page answers frequently asked questions about optimizing a site for search engines. SEO goes beyond being indexed in directories: attention to many details can prevent losses of ranking, which are unfair when the site's content is of good quality.

Questions about Google's PageRank and ranking in general, and about how to gain some points by natural means, without resorting to bad practices such as cloaking, spamming and other forbidden artifices that may lead you to the blacklist...


General questions

Anatomy of Google

Panda

Penguin

SEO tools

Improving ranking

Links and backlinks

Questions about the PageRank

Answers

How do I know if my pages are indexed by Google?
If your site is called "www.scriptol.com", for example (impossible, that one is already taken), type this in the search window:
site:www.scriptol.com
Google will display your indexed pages, which allows you to check the title and description of each page. You can also run an exact-match search on any engine: take a sentence from a page and put it between quotation marks in the search bar.
How to exclude a page from the index?
Insert this meta tag between <head> and </head> in the HTML page:
<meta name="robots" content="noindex" />
A robots.txt file at the root of the site can also contain rules telling search engines to exclude files or directories.
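For example, a minimal robots.txt that excludes a directory and a single page for all crawlers (the paths are of course placeholders):
User-agent: *
Disallow: /private/
Disallow: /draft-page.html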
Is duplicate content penalized?
Duplicate content is the presence of the same content on pages of the same site or of different sites, or content indexed twice. This can happen with different URLs pointing to the same page, or with copies of pages. It would otherwise be an easy way for a site to monopolize the top of result pages, but this never happens in the real world, so we can conclude that engines do effectively penalize duplicate content.
In a post on its blog, Google has clarified the rules about duplicate content.
Duplicate content can also be the incorporation into your site of a portion of an article from another site. It is a guaranteed penalty factor, unless it is a quotation placed in a <blockquote> tag. Quotations must be accompanied by a personal text.
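For example, a quotation marked up so that engines can tell it apart from your own text (the URL and wording are invented):
<blockquote cite="http://www.example.com/article">The quoted passage goes here.</blockquote>
<p>Your own commentary on the quotation.</p>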
Are RSS feeds useful for SEO?
It is a way to get visitors and quantities of backlinks. The RSS file contains a list of links to your articles, and it can be replicated on other sites as well as in directories. To find out how to easily create an RSS file, and how to use it, consult the RSS section on this site.
The backlinks provided by an RSS feed echoed by many sites are temporary: they disappear with the renewal of the content of the feed, so RSS is best suited for blogs.
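As a sketch, here is the skeleton of an RSS 2.0 file; each <item> holds one of the links that other sites may replicate (the names and URLs are placeholders):
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Site name</title>
    <link>http://www.example.com/</link>
    <description>What the site is about</description>
    <item>
      <title>Article title</title>
      <link>http://www.example.com/article.html</link>
      <description>Summary of the article.</description>
    </item>
  </channel>
</rss>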
Is the meta description used by Google?
The answer is given by Google on its blog for webmasters, in the article entitled "Improve snippets with a meta description makeover".
Snippets are the descriptions in search results under the titles.
The meta description must be unique and must give details about the page. It should contain keywords related to its content.
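For example (the text of the content attribute is of course a placeholder):
<meta name="description" content="A unique summary of this page, including its main keywords." />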
Should I fill in the meta keywords?
The keywords meta tag is not used by Google. It may be used by other search engines. Some webmasters have run successful experiments with the keywords meta tag and Yahoo.
If you need additional traffic from Yahoo, fill in the keywords meta tag.
What does the link: operator display?
The link operator in the search bar (link: site-name) is a command to display the links pointing to a site. In fact this command returns only a fraction of the backlinks, in order to save server bandwidth.
The choice of results is totally random, as confirmed by Matt Cutts in a video on YouTube. They have nothing to do with PR or with the quality of the pages; they are taken at random.
Do Twitter and Facebook have an effect on ranking?
The answer is officially yes. This was confirmed by Google and Bing in an interview with a journalist. The number of times a page is retweeted, or linked on Facebook, affects its position in the results, even if the links themselves are nofollow.
We must therefore add another criterion to the many signals of Google: social authority.
How to get a foreign version of the search engine?
To avoid being automatically redirected to the local version of the search engine, a language parameter must be added. For example:
  http://www.google.com/?hl=fr  
for the French version of Google.
Do you know some forums for webmasters?
Here is a list, with the Alexa rank in parentheses (it may be dated):
How to optimize the title of a page for a better ranking?
An optimal title must tempt users to click on the link to read the page, contain the essential keywords corresponding to the content, and be about 60 characters long.
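For example, a possible title under 60 characters, with the main keywords at the front (the subject is invented):
<title>Pruning Apple Trees: Tools and Techniques</title>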
More details: Creating a good web page title.
Are internal links helpful?
Internal links, mainly on the home page, facilitate the indexing of the pages, and also tend to spread PageRank from one page to another. Put as many internal links as possible in the content of the pages, when a term refers to the content of another page of course.
The anchor of a link must be descriptive: it helps search engines define the content of the target page and therefore favors its ranking.
Several links to the same page may even be added, as explained further on.
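For example, a descriptive anchor (the path is hypothetical):
<a href="/seo/pagerank.html">how PageRank is calculated</a>
rather than a vague one such as "click here".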
Do social bookmark links carry less weight than other backlinks?
For Matt Cutts (see the interview in the references at the bottom), a link is a link, so links gained from social bookmark sites have the same weight as links in regular web pages.
But the weight of a link depends upon the PageRank of the page where it is added.
Is the domain extension important for PageRank?
No, the extension may be .com, .edu or .org, it makes no difference; only the PageRank of the page matters for backlinks. Links from .edu or .org sites are not more trusted and do not pass more PageRank.
Reference in interview.
Are nofollow links followed by crawlers?
It is sometimes claimed that even if nofollowed links do not pass PageRank, they are used for the discovery of new pages. This is denied by Google.
- Nofollow links do not pass PageRank.
- They are not used to discover new pages.
- The anchor is not used to define the content of the linked page.
They are totally ignored.
Reference in interview.
Are multiple links to the same page taken into account?
When multiple links point to the same page, only the first is taken into account by Google. But this is not the case if the links point to different sections of the page, identified by a fragment in the #xxxxxx format.
In this case, the anchor of each link is considered to index the target page, whether it links to another site or to the same site.
It even appears that the first link, the one pointing to the page and not to a section, is then ignored.
Tests have been made by SEOmoz to verify this. But there were changes in Google's algorithm in April 2012 concerning link anchors, and this could have changed.
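For example, two links whose anchors would both be taken into account, because each points to a different section (the file and fragment names are invented):
<a href="article.html#history">history of the project</a>
<a href="article.html#usage">how to use the tool</a>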
Can internal links cause a penalty?
Especially if they all have the same anchor...
No, internal links are not taken into account by the algorithms likely to penalize a site, because they are a way to navigate the site and are therefore considered necessary.
Except in the very special case of a page containing quantities of identical links, the algos see internal links as a navigation tool and not as a means to transmit PageRank. You do not need to worry about your menus, sidebars, internal references or their anchors (dixit Mister G.).
How many keywords can I put into a URL?
In the directory plus the file name, you can put up to 5 keywords with no problem. Beyond that, your URL looks like spam and the algorithm weights these words less. You can even get a spam report with lots of keywords in URLs (Matt Cutts, in references).
How many links can I put on a page?
The guidelines recommend putting fewer than 100 links on a page. You can exceed this number; technically there is no problem, since Google can parse a page up to 500 KB, but it is bad practice and it is better to split the page into smaller ones.
Can I force a Web page to be indexed?
If robots do not come frequently enough to your site (the date of their last visit is indicated on the home page of the webmaster tools), you can still force the indexing by getting a link to the page on another site that is frequently crawled.
See the article How to obtain backlinks and similar articles on this site for details.
How to improve the SEO of my site?
Several pages here are dedicated to SEO; see the SEO summary.
This page is dedicated to optimization for search engines.
Where can I get more information about Googlebot?
Googlebot is Google's crawler. It may parse some pages on your site every day. This Googlebot FAQ gives details of how it works.
How to avoid cloaking?
Cloaking is presenting to search engines text that is not visible to visitors. It may not be intentional: it happens when you add text, unnecessary to visitors, to index pages made of Flash, images or dynamic text that robots cannot scan. But this is not allowed.
You should use the alt attribute dedicated to images instead. And text displayed by JavaScript and not seen by robots can be placed in a noscript tag, which is permitted.
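For example, an image described for robots through its alt attribute, and a script whose text is repeated in a noscript tag (the file name and wording are invented):
<img src="sales-chart.jpg" alt="Sales chart for 2011" />
<script> /* code that displays some text dynamically */ </script>
<noscript>The same text, visible to robots and to users without JavaScript.</noscript>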
How to type google.com without being redirected to my country version?
When you want to access the search engine, it automatically redirects you to the regional version of the engine. This is suitable for most users, but not for the webmaster or the user who wants to do a search on google.com.
To reach google.com, type in the URL bar:
www.google.com/ncr
This URL can be placed in a bookmark. "ncr" stands for "no country redirect".
How can I leave the sandbox?

Leaving the sandbox, or the Alexa mystery?
The first thing to do to get out of a sandbox due to a penalty is to delete from the content all possible causes of penalties, such as hidden text, partial content duplicated over all pages, multiple redirects...
A site may be penalized because of a sudden increase of links from a same site or from lots of directories, or for reciprocal links with other sites.
Whether the site has moved to the sandbox because it has been restructured or because it has been penalized, the means to retrieve its ranking are the same: regaining the trust of search engines. Do not use any of the techniques prohibited by Google's webmaster guidelines.
All the tips to improve PageRank are also relevant for leaving the sandbox, once the causes of penalties are removed.
But your site may also be penalized by Panda for a lack of genuine backlinks. In that case, see the advice given in the Panda Update study.
How can we outrank Wikipedia?
Wikipedia, the big wiki, a sort of online encyclopedia, tends to arrive at the top in Google, even ahead of websites with more comprehensive articles and more backlinks!
One of the reasons is that this site is favored; another is the impressive number of links between its articles and sub-domains.
But there is room to move ahead of it and achieve top results in search engines. The weakness of the wiki is that each article has a single word for a name, and thus the anchors are also a single keyword.
The solution is to base articles on two keywords, for example grape + health, or health + diet. The title of the article includes the two keywords, as well as the file name and the anchors of internal links...
Searches made on two keywords should then return your page rather than the one-keyword page of the wiki.
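As a sketch (the subject is invented), an article targeting the pair grapes + health could be named grapes-health.html and carry a title such as:
<title>Grapes and Health: Benefits of the Fruit</title>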
Can changing the design affect the ranking?
It should not. However, some webmasters have experienced a loss of position after changing the design of a site without changing the content, immediately after the passage of Googlebot.
This experience has been shared on WebmasterWorld. The ranking returns to its previous state after a variable delay. It is probable that a massive change raises some signal in the engine.
We therefore recommend changing the design little by little and not globally. If something leads to a penalty, it will be easier to see what it is.
Another piece of advice from Google is not to change the design at the same time as you change the domain and redirect the pages.
Is being temporarily unavailable harmful for a website?
This may be the case if the situation is not properly managed, which supposes that we know in advance that the site will be unavailable.
If it is not managed, webmasters may think that the site, if not very important, is closed, and remove their backlinks. Similarly, the robots of search engines may record a negative signal.
If the outage is expected, the ideal is to return HTTP code 503, which is defined for this situation. In PHP, the code of the home page, or of all pages in the case of a CMS, can be like this:
<?php
header('HTTP/1.1 503 Service Temporarily Unavailable'); // temporary outage
header('Retry-After: Mon, 25 Jan 2011 12:00:00 GMT');   // an HTTP date, or a number of seconds
?>
This code is supplied by Google.
What is minus thirty?
Many webmasters believe they have suffered a penalty called minus 30, or -30: their site is bumped from #1 to #31 in Google's results, and it is very clear with the URL of the site. In general, a site ranks first on its own name with the extension; penalized sites are instead found in 31st position.
Should we add content frequently?
Can continuously adding new pages be harmful, since it increases the number of links on the homepage?
Adding content is good, but you must follow some rules of organization. The homepage does not link to all articles but only to a few. Each page must have a link to the home page and links to related articles: links should always be relevant.
That said, Google promotes fresh content, so if your new articles relate to current events, or your changes to previous articles update them, it is good for SEO.
Changes unrelated to current events have little interest; they serve mostly AdSense, which preferentially targets pages that evolve.
What is the difference between "white hat" and "black hat"?
These are two forms of optimization, one regular and one prohibited by the rules of search engines.
The first consists of making the content of a site more accessible to search engines: internal links, choice of keywords, etc.
The second consists of manipulating them to gain better positions in the results with less useful or even inappropriate content, for example by creating link farms.
Google has said many times, and especially in this video, that it does not consider SEO as spam.
Provided it is designed to make access to the site easier for search engines:
- Make all pages accessible through links.
- Put keywords corresponding to the content on the page (without artificially multiplying them; one occurrence is enough).
- Make loading faster.
Unlike black hat techniques:
- Bad: Presenting different content to search engines and users.
- Bad: Overloading the page with keywords useless to the user.
Are spelling and grammar taken into account by search engines?
They are not part of the 200 signals that determine the ranking, but Matt Cutts said that there is a correlation between authority sites, which are well positioned, and good spelling and grammar.
In addition there is a correlation with PageRank, which is normal: it is easier to link to or bookmark a well written page, which gives the impression of having been carefully crafted.

Panda

How does Google determine the originality of content?
Google has perfected over time an instrument to detect duplicate content, and uses a derivative of its online Translate tool. This allows it to recognize the same idea in different formulations.
The page size is not taken into account; what matters is the answer to a query, and each page can respond to multiple queries, especially if it contains subtitles that correspond to questions.
This is why the best optimized sites may have suffered most from Panda: they respond to more queries.
Why are big sites favored?
Panda does not actually evaluate sites based on their quality, as marketed, but rather on the ratio of the number of backlinks and other references to the number of inclusions in results pages.
It goes without saying that large sites with more visitors also get many more backlinks than small sites, even when they take their content from small specialized sites.
Robin Hood robbed the rich to redistribute to the poor; Panda Hood tends to do the opposite.
Why do forums have preferential treatment?
Panda is based primarily on the number of backlinks. Not everyone adds links: most visitors are not webmasters or bloggers and therefore do not create links. But visitors on forums often provide links in their comments, and responses in forums are often linked to.
Why is a page penalized?
Why is a page not indexed, or why does it appear in the results far behind sites with less rich content?
A page can trigger a negative signal without the webmaster even suspecting it. Some examples...
  • Too much blank space.
    This is a sometimes false signal indicating that the page is almost empty or generated automatically.
  • Hidden text.
    You wanted to create dynamic content, and a portion of the text in the page appears only after a user action. The crawler sees this as text intended only for search engines, in other words cloaking.
  • Duplicate content.
    The same text repeated on all pages triggers a duplicate content signal.
  • Accidental redirection.
    The page contains a link to another page that was subsequently redirected to yet another, perhaps back to the same page.
  • A filtered keyword.
    The site essex.edu was for a time filtered out of Google searches and restricted to mature audiences because the domain contains the word "sex".
    Words are not always split like this; it seems limited to domain names, but verify nevertheless that no word on the page triggers filtering.

Penguin

Does negative SEO exist?
Negative SEO can be defined as being penalized because a competitor, by creating many artificial links to your site, causes the Penguin algorithm to penalize it. We know that this algorithm targets websites with lots of fictitious backlinks, but it makes no difference whether these links are created by the webmaster or by a competitor.
According to a study by webmarketingschool, negative SEO is demonstrated by the fact that two sites that underwent such an attack by the same perpetrator were both demoted in search results.
It is possible to fight against these attacks by using Google Webmaster Tools to make these backlinks irrelevant: the sites that host them may be added to a list of links to be ignored.
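This list corresponds to Google's disavow file, a plain text file uploaded through Webmaster Tools; a minimal sketch (the domains are invented):
# links created by the attacker
domain:spam-directory.example.com
http://www.bad-site.example.com/page-with-the-link.html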

Links

Are JavaScript links taken into account by Google?
If they are easy to interpret - because Google can execute JavaScript code - they are taken into account like regular <a> tags and may even pass PageRank to the linked page. This has been confirmed by Matt Cutts, the Google spokesman on ranking issues, especially the second point:
  1. JavaScript links are used by search engines to discover and index pages.
  2. And so they are treated as HTML links and can pass PageRank.
A JavaScript link is typically a link like this:
<span onclick="location.replace('http://www.scriptol.com/seo/')">Anchor</span>
or with an image:
<img src="image.jpg" onclick="location.replace('http://www.scriptol.com/seo/')" />
Such links are then easy for search engines to interpret. But it is possible to write JavaScript code that makes a URL invisible to search engines.
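For example, a URL assembled at run time is much harder for a crawler to recognize (a sketch, using the same URL as above):
<span onclick="location.replace('http://www.' + 'scriptol' + '.com/seo/')">Anchor</span>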

How to improve PageRank naturally

How to know the PR of a page?
The green PageRank bar is available as a Chrome extension. URL:
PageRank bar.
What is PageRank?
PageRank, or website ranking, is a score from 0 to 10 given by Google to each page of a website.
The higher this value, the better the position of the page in search results, among the other pages that match the request.
A PageRank of 5 is good. 7 may be reached with valuable backlinks. The number of websites with a PageRank of 10 is very small!
The word PageRank comes both from "page ranking" and from "Page", the name of one of the two authors of the algorithm (Sergey Brin and Larry Page).
More in Google's PageRank.
How is the PageRank calculated?
PageRank does not depend on the content of a page but on the links pointing to it (and, to a lesser degree, on its outbound links).
The authors of the algorithm, from the links between all the web pages, establish a Markov chain which gives the probability of reaching a given page in the shortest possible time.
To put it simply, to get a higher PR you must have many links pointing to your site, and mainly quality links from sites that themselves have many quality links pointing to them.
The "vote" of a page is transitive: if A links to B, it enhances the importance of B, and therefore, if B links to other pages, B also enhances their importance.
Moreover, if a page contains several links, the weight it passes is divided by the number of links (including nofollow links, which are counted here).
Links between pages within a site are taken into account, and they tend to transfer PageRank from one page to another.
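For reference, the classic formula from the original paper, where d is a damping factor (0.85 in the paper), T1...Tn are the pages linking to page A, and C(T) is the number of outbound links of page T:
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )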
Reference: "Deeper inside PageRank" by A.N. Langville et C.D. Meyer.
Is PageRank important?
According to Google, PageRank is the most important of the 200 criteria used to order pages in search results.
Thus, it is not the only one. But among websites that match the same group of keywords, it is very important.
Is PageRank used against duplicate content?
When two pages are identical, and if the date of indexing is not sufficient to know which is the original and which is the copy, Google considers the page with the higher PageRank to be the original. This was clearly stated in an interview of Matt Cutts by Stephan Spencer, and confirmed by a post on Google's blog about duplicate content.
Is PageRank transmitted through a link on an image?
The official answer from Google is YES. With no more details.
How to improve my PageRank?
The score of confidence and popularity of a page, more or less represented by the green bar in Google's toolbar, depends on the number and quality of backlinks to this page, but how they are taken into account depends on many criteria. To best meet these criteria, here is a list of things to do:
  1. Put internal links between pages of your website, except to the pages that are unlikely to receive links from other sites.
  2. Avoid the "nofollow" attribute, for example: <a href="http://www.sitexxx.com" rel="nofollow">The site</a>, or lose 2 points of PageRank.
    Use the nofollow attribute only for links to pages that should not appear in Google's index, and not for other reasons.
  3. Don't put links to pages with low quality.
  4. Get quality backlinks. Directories are not interesting. Link bait is the best natural way to get quality backlinks.
  5. Create interesting and unique content. Let it be known by other websites to gain backlinks from them.
  6. A page must have a unique, well-defined topic. The PR of a page actually depends upon a group of keywords, or several groups of keywords, in the page.
  7. Add more pages to your website (but with different contents).
  8. Be patient. You have to wait for months before you can view the effects of your work.
Why is a site above another that has a higher PR?
Why is my site better positioned than another in search results while the latter has a greater PageRank, or vice versa?
Because the results page is related to a group of keywords, while PageRank reflects the number and quality of backlinks to the page regardless of the query. It is possible that the other page is better positioned on a different group of keywords.
What is cloaking?
This consists of creating alternate pages that are read by crawlers (the robots of search engines) but not by human readers. These hidden pages are stuffed with keywords to improve search results.
When cloaking is detected, the website goes to the blacklist and its pages are no longer indexed. See the "bmw.de" and "ricoh.de" affairs (same webmaster?).
What is spamming?
This consists of putting lots of hidden links into a web page (inside noscript tags, for example) to create more links to a friend's website and improve its ranking. Once the spamming is recognized, the two websites go to the blacklist.
What is spoofing?
This consists of redirecting a page to a page of another website with a high PageRank, which results in the source page getting the PR of the destination. The redirection is achieved with the "refresh" meta tag. Visitors see the current page, but search engines see only the target page with the high PR.
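The refresh meta tag in question looks like this (the URL is a placeholder):
<meta http-equiv="refresh" content="0; url=http://www.example.com/high-pr-page.html" />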
This exploits what is known as a bug in the calculation of the PR, and it is probably fixed now.
How to know my PageRank?
Just install Google's toolbar in your browser. The PageRank of each page is displayed when you visit your website.
But this is a kind of average, as the real ranking depends upon a group of keywords. To know the real ranking, perform searches with various keywords. The position of your page (when several pages match the request) gives the ranking: the top of the list corresponds to a ranking of 10, and the first page of search results corresponds to a PR of 6-9 when lots of matches exist.
A company guarantees me a PR of 10.
I have been contacted by a company that guarantees me a PR of 10, and I want to improve my ranking. Should I accept?
According to Google, nobody can guarantee a PageRank, or any position. (And I know of only a dozen big websites with a PR of 10.)
Is the PageRank the first factor for the position?
Matt Cutts is the member of Google's SEO staff who communicates most often in the media about the algorithm. He said in an interview published on the Stonetemple site on October 8, 2007:
"I would certainly say that the links are the primary way that we look at things now in terms of reputation."
Links are the source of PageRank, according to their weight and their number, and they are the first factor for the reputation of the document, which in turn is certainly the first factor for the position in results (but one among 200 signals).
What does a graybar PR mean? Is it a penalty?
It is not necessarily a penalty, and it is not a problem with the toolbar, as some think. It is not equivalent to a PR of 0.
The graybar is a signal that something in the page departs from the rules that Google wants webmasters to apply. Most often it is a lack of content, or an excessive number of internal or external links compared to the content.
In practice, it prevents the spread of PR. A page should not be grayed if it has quality backlinks; otherwise you should study it, as it may contain anomalies.
What are the other factors for the position in results?
PageRank, which is based upon backlinks, is only one factor among several used to calculate the position of a link to your website in the results of search engines.
These factors are also considered:
- The location of the host and the language of the request.
- Clicks on the link to your website rather than on other links in the results. Your page must be chosen: devise a good title and description, clear and attractive.
- The number of relevant (distinct) keywords. This is used first to select a page, and then to calculate its position in the list.
A more complete list is given in the Google patent.
Does a 301 redirect mean a loss of PageRank?
When a page is redirected with the HTTP code 301, the PageRank is transmitted with a discount. This has been confirmed by Matt Cutts. The rate of this reduction is the same as for a link, about 15%. We can say from experience that it is enough to lose one or more positions in results.
It is better to avoid changing the domain of a site if it is not absolutely necessary.
Ref WebmasterWorld.
Does a change in content make it lose PageRank?
If a page has many backlinks, does changing the content kill the value of these links (which would make sense, since they pointed to the original content)?
We know that when a domain expires and is bought by another webmaster, the PR is reset. This was communicated by Google.
In the case of a change of content, this is a personal impression, but things are perhaps worse: the change can be seen as a spam tactic, and the page penalized, if it does not get new backlinks.
When is the PageRank updated?
The actual PageRank depends on the evolution of backlinks among other factors and is constantly modified.
But the public PR, as displayed by the green bar of the toolbar, is updated at fixed dates, every three months, at the beginning of January, April, July and October.
References