SEO Frequently Asked Questions
While the SEO glossary provides the essential terms for an overview of SEO, this page answers frequently asked questions and common problems about optimizing a site for search engines. SEO goes beyond submitting a site to directories. Paying attention to many details can prevent ranking losses that would be unfair for a site with quality content.
Is it really useful to provide a sitemap to Google?
The sitemap is a file in XML format that lists the pages of a site. Some facts about sitemaps:
- The main purpose of the sitemap is to make the work of Google's crawler easier. But there is another...
- Dynamic links are ignored by search engine robots. An XML or HTML sitemap gives them a static link.
- A sitemap has no effect on Google's PageRank. It is used for indexing only.
- If all the items in the sitemap have identical properties (priority, change frequency, last modification date), an XML sitemap is of little interest. In addition, Google says it does not necessarily take the values you give into account.
- The XML sitemap format is now supported by all the major search engines.
- The sitemap must be rebuilt whenever the site's content changes, but it needs to be registered only once. See the FAQ at Sitemaps.org.
- Google says it does not always take the sitemap into account when indexing your pages.
- Once the sitemap is registered, Google provides statistics and an analysis of the site, including the errors it encountered.
- In conclusion, build an XML sitemap if your site is poorly indexed, or if you want to have statistical information.
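As an illustration of the format, here is a minimal sketch that writes a sitemap with Python's standard xml.etree.ElementTree module, following the sitemaps.org 0.9 protocol. The page list and the www.example.com URLs are hypothetical; the optional lastmod, changefreq and priority tags are written only for pages that define them, since identical values on every page add little.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build an XML sitemap string from a list of dicts.

    Each dict must have a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # ElementTree escapes & and <
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages of a hypothetical site.
pages = [
    {"loc": "https://www.example.com/", "changefreq": "daily", "priority": "1.0"},
    {"loc": "https://www.example.com/seo-faq.html", "lastmod": "2007-06-01"},
]
xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + build_sitemap(pages)
print(xml)
```

The result would then be saved as sitemap.xml at the root of the site and rebuilt whenever pages are added or changed.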
How do I know if my pages are indexed by Google?
If your site is named "www.scriptol.com", for example, type this in the search box:
site:www.scriptol.com
Google displays your indexed pages, which also lets you check the title and description of each one.
How do I exclude a page from the index?
Insert this meta tag inside the <head> </head> section of the HTML page:
<meta name="robots" content="noindex" />
A robots.txt file at the root of the site can also contain rules telling search engines which files or directories to exclude.
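To verify that a page really carries the exclusion tag, a minimal sketch using Python's standard html.parser can collect the directives of every robots meta tag. The function name is_noindex is hypothetical, and the check only looks at the HTML itself, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are also routed here.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # content may hold several comma-separated directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def is_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(is_noindex(page))
```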
Is duplicate content penalized?
Duplicate content is the presence of the same content on several pages of the same site or of different sites, or content indexed twice. This can happen when different URLs point to the same page, or when pages are copied. If it were not penalized, duplicating pages would be a way for a site to monopolize the top of the results pages. Since this never happens in the real world, we can conclude that search engines do effectively penalize duplicate content.
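Search engines do not publish how they detect duplicates. Purely as a rough illustration, exact copies within your own site (for instance the same page reachable under two URLs) can be spotted by hashing the normalized text of each page. The sketch below assumes you already have the text of each page keyed by URL; it only finds exact copies, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text):
    """SHA-256 of the page text with case and whitespace normalized."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages):
    """Return groups of URLs whose text is identical after normalization.

    pages: dict mapping URL -> extracted page text (assumed input).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical site where one article is reachable under two URLs.
pages = {
    "http://example.com/a.html": "Hello  world",
    "http://example.com/index.php?id=a": "hello world\n",
    "http://example.com/b.html": "Other content",
}
print(find_duplicates(pages))
```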
Is robots.txt helpful?
This plain-text file, in the format of the server's operating system (Unix format if the server runs Unix), must be stored at the root of the website. It tells search engines which pages may be indexed and which should be excluded.
The typical default content of robots.txt is:
User-agent: *
Disallow:
User-agent is the name of a search engine crawler (* means all crawlers), and Disallow specifies the full path (starting with /) of a page or directory that you want to exclude from the index. An empty Disallow, as above, excludes nothing. Note that
Disallow: /
excludes the whole site from indexing!
To exclude the cgi-bin directory, the format is:
User-agent: *
Disallow: /cgi-bin/
To exclude a file:
User-agent: *
Disallow: /rep/filename.html
Names are case-sensitive. Do not put several filenames or crawlers on the same line; use several User-agent + Disallow groups, or several Disallow lines under the same User-agent.
Note: do not include blank lines; start such a line with the # comment character instead. Consequently, do not store an empty file named robots.txt.
You can check the validity of a robots.txt file with Google's webmaster tools.
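Rules like the ones in this section can also be tested locally with Python's standard urllib.robotparser module. This is a sketch only: Python's parser follows the classic prefix-matching rules and may not match every detail of Google's interpretation, and www.example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, path, agent="Googlebot"):
    """Return True if the robots_txt rules let `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Placeholder host: only the path is matched against the rules.
    return parser.can_fetch(agent, "https://www.example.com" + path)

DEFAULT = "User-agent: *\nDisallow:\n"        # empty Disallow: nothing excluded
WHOLE_SITE = "User-agent: *\nDisallow: /\n"   # the whole site is excluded
RULES = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /rep/filename.html\n"

for path in ["/index.html", "/cgi-bin/search.cgi", "/rep/filename.html", "/CGI-BIN/search.cgi"]:
    print(path, is_allowed(RULES, path))
```

The last path shows the case sensitivity mentioned above: /CGI-BIN/ does not match the /cgi-bin/ rule, so it stays crawlable.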
Are RSS feeds useful for SEO?
They are a way to get visitors and a number of backlinks. The RSS file contains a list of links to your articles, and it can be republished on other sites as well as in directories. To learn how to easily create an RSS file and how to use it, consult the RSS tutorial or the RSS section on this site.
The backlinks provided by RSS feeds that are republished by many sites are temporary: they disappear as the content of the feed is renewed. Therefore RSS is best suited for blogs.
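A minimal RSS 2.0 feed needs only a channel with a title, link and description, plus one item per article. The sketch below builds one with the standard xml.etree.ElementTree module; the max_items limit (a hypothetical default of 10) shows why the backlinks are temporary: older articles simply drop out of the feed.

```python
import xml.etree.ElementTree as ET

def build_rss(title, link, description, articles, max_items=10):
    """Build a minimal RSS 2.0 feed.

    articles: list of (title, url, summary) tuples, newest first.
    Only the newest max_items articles are kept in the feed.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for art_title, art_url, summary in articles[:max_items]:
        item = ET.SubElement(channel, "item")
        for tag, text in (("title", art_title), ("link", art_url), ("description", summary)):
            ET.SubElement(item, tag).text = text
    return ET.tostring(rss, encoding="unicode")

# Hypothetical articles of a hypothetical site, newest first.
articles = [(f"Article {i}", f"https://www.example.com/a{i}.html", f"Summary {i}") for i in range(12)]
feed = build_rss("Example site", "https://www.example.com/", "Latest articles", articles)
print(feed)
```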
Is the meta description used by Google?
Google answers this on its blog for webmasters, in the article entitled "Improve snippets with a meta description makeover".
Snippets are the descriptions in search results under the titles.
The meta description must be unique to each page and must describe its content in detail. It should contain keywords related to that content.
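Because each description must be unique, a quick check over the whole site helps. This sketch assumes you have already collected the meta description of every page into a dict keyed by URL (the function name and sample data are hypothetical); it compares descriptions case-insensitively and reports those shared by several pages.

```python
from collections import defaultdict

def duplicate_descriptions(descriptions):
    """Find meta descriptions shared by more than one page.

    descriptions: dict mapping page URL -> meta description text.
    Returns a dict mapping each repeated (normalized) description
    to the sorted list of pages that use it.
    """
    pages_by_desc = defaultdict(list)
    for url, desc in descriptions.items():
        pages_by_desc[desc.strip().lower()].append(url)
    return {d: sorted(urls) for d, urls in pages_by_desc.items() if len(urls) > 1}

descriptions = {
    "/a.html": "Grape and health",
    "/b.html": "grape and health ",
    "/c.html": "Diet tips for the summer",
}
print(duplicate_descriptions(descriptions))
```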
Why is there a second, indented link for the same site on the results page?
For some queries, the results show a link to a site followed by a second, indented link. This means that two pages of the same site appear on the same results page; in that case the two title and description pairs are grouped together, regardless of the score of the second one.
Are internal links useful?
Internal links, especially on the home page, make it easier to index the pages, and also tend to spread PageRank from one page to another. Put as many internal links as possible in the content of the pages, wherever a term refers to the content of another page, of course.
The anchor text of a link must be descriptive: it helps search engines determine the content of the target page and therefore improves its rank.
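To review anchors across a page, the sketch below collects every link with its anchor text using Python's standard html.parser and reports links whose anchor says nothing about the target. The GENERIC_ANCHORS list is a hypothetical example, not a list used by any search engine; links whose only content is an image are ignored here, since their anchor is the image's alt text.

```python
from html.parser import HTMLParser

# Hypothetical list of anchors that describe nothing about the target page.
GENERIC_ANCHORS = {"click here", "here", "read more", "more", "link"}

class AnchorParser(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None

def weak_anchors(html):
    """Return the hrefs of links whose anchor text is generic."""
    parser = AnchorParser()
    parser.feed(html)
    return [href for href, text in parser.links if text.lower() in GENERIC_ANCHORS]

html = '<p>See <a href="/grape-health.html">grape and health</a> or <a href="/diet.html">click here</a>.</p>'
print(weak_anchors(html))
```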
Are social bookmark links given less weight than other back links?
According to Matt Cutts (see the interview in the references at the bottom), a link is a link. So links gained from social bookmarking sites carry the same weight as other links in regular web pages.
The weight of a link depends on the PageRank of the page where it is placed.
How many keywords can I put into a URL?
In the directory path plus filename, you can put up to 5 keywords without any problem. Beyond that, your URL looks like spam and the algorithm gives these words less weight. Too many keywords in URLs can even get your site reported as spam (Matt Cutts, in the references).
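A simple way to apply this rule of thumb is to count the words in the directory path and filename. The sketch below treats every word separated by /, -, _ or . as a potential keyword and drops the file extension; MAX_KEYWORDS = 5 is the figure given in the text above, and the URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

MAX_KEYWORDS = 5  # limit suggested in the text above

def url_keywords(url):
    """Split the directory + filename part of a URL into words."""
    path = urlparse(url).path
    path = re.sub(r"\.[a-z0-9]+$", "", path, flags=re.IGNORECASE)  # drop extension
    return [w for w in re.split(r"[/\-_.]+", path) if w]

def too_many_keywords(url):
    """True if the URL path holds more words than MAX_KEYWORDS."""
    return len(url_keywords(url)) > MAX_KEYWORDS

print(url_keywords("https://www.example.com/seo/sitemap-faq.html"))
print(too_many_keywords("https://www.example.com/cheap/seo/tips/best-seo-tricks-free.html"))
```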
How many links can I put into a page?
Google's guidelines recommend fewer than 100 links per page. You can exceed this number, and technically there is no problem since Google can parse a page of up to 500 KB, but it is bad practice; it is better to split the page into smaller ones.
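The sketch below counts the links of a page with Python's standard html.parser and flags pages at or above 100 links or larger than 500 KB. Both thresholds are the figures quoted in the paragraph above, passed as parameters so they can be adjusted; check_page is a hypothetical helper.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> elements that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def check_page(html, limit=100, max_bytes=500 * 1024):
    """Report the link count and whether the page exceeds the thresholds."""
    counter = LinkCounter()
    counter.feed(html)
    size = len(html.encode("utf-8"))
    return {
        "links": counter.count,
        "too_many_links": counter.count >= limit,  # guideline: fewer than 100
        "too_large": size > max_bytes,
    }

# Hypothetical page with 120 links.
big_page = "".join(f'<a href="/p{i}.html">page {i}</a>' for i in range(120))
print(check_page(big_page))
```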
My page is not indexed by search engines
Perhaps the HTML is malformed and therefore not understood by crawlers...
Check your syntax with the validator of the W3 Consortium.
If the page is new, it takes several days or weeks for it to be taken into account. See also the paragraph on sitemaps.
It is also possible that Google or another search engine decides not to index your site because robots.txt is empty or malformed.
See the paragraph on robots.txt.
Can I force a Web page to be indexed?
If robots do not visit your site often enough (the date of the last visit is shown on the home page of the webmaster tools), you can still force indexing by getting a link to the page from another site that is crawled frequently.
See the article How to obtain backlinks and similar articles on this site for details.
How can I improve the SEO of my site?
Several pages here are dedicated to SEO; see the SEO summary.
This page is dedicated to search engine optimization.
Where can I get more information about Googlebot?
Googlebot is Google's crawler. It may crawl some pages of your site every day. The Googlebot FAQ gives details on how it works.
What is lemmatisation?
An expected advance for search engines: identifying the root of words and retrieving pages that share the same word roots. As of 2007, it did not really seem to be implemented yet.
What is Hilltop?
A theoretical extension of PageRank that could prevent manipulation, through an algorithm that ranks a page solely on the basis of links from authoritative sites. According to Google's patent, it is partially used by search engines.
What is SERP?
Search Engine Result Pages, i.e. the pages of results returned by search engines in response to a query.
How do I avoid cloaking?
Cloaking is presenting to search engines text that is not visible to visitors. It may be unintentional: for example, when you add text that is useless to visitors so that pages made of Flash, images, or dynamic text that robots do not scan can be indexed. But this is not allowed.
Use alt attributes on images instead. And text displayed by JavaScript, which robots do not see, can be placed inside a noscript tag; this is permitted.
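To find the images that still need an alt attribute, a minimal sketch with Python's standard html.parser can list every img whose alt text is missing or empty. Flagging empty alt="" is a choice made here for review purposes: an empty alt is legitimate for purely decorative images, so the result is a list to check, not a list of errors.

```python
from html.parser import HTMLParser

class MissingAltParser(HTMLParser):
    """Records the src of every <img> whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not (attrs.get("alt") or "").strip():
                self.missing.append(attrs.get("src"))

def images_without_alt(html):
    parser = MissingAltParser()
    parser.feed(html)
    return parser.missing

html = '<img src="logo.png" alt="Scriptol logo"><img src="chart.png"><img src="spacer.gif" alt="">'
print(images_without_alt(html))
```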
What is the bounce rate?
Definition from Google: "Specifies in what percentage visitors left the site without viewing any other pages." A bounce occurs when a visitor leaves the site right after reading the page on which they arrived. So if three out of four visitors read a single page and leave the site without reading any other, the bounce rate is 75%.
A low bounce rate is generally preferable: it means that the content of the site is interesting and that visitors read several pages. On the other hand, when a visitor searches for something very specific, they will leave the site after finding it, and in this case the bounce is a positive sign!
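The arithmetic of the example above can be written as a small function. The input format, a list giving the number of pages viewed in each visit, is an assumption for illustration; analytics tools compute this figure from their own session data.

```python
def bounce_rate(pages_per_visit):
    """Percentage of visits that viewed exactly one page.

    pages_per_visit: list with the number of pages viewed in each visit
    (hypothetical input format).
    """
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for pages in pages_per_visit if pages == 1)
    return 100.0 * bounces / len(pages_per_visit)

# Three of four visitors read a single page; the fourth reads four pages.
print(bounce_rate([1, 1, 1, 4]))  # 75.0
```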
How can we get ahead of Wikipedia?
Wikipedia, the big wiki that serves as an online encyclopedia, tends to appear at the top of Google's results, even ahead of websites with more comprehensive articles and more backlinks!
One reason is that this site is favored; another is the impressive number of links between its articles and subdomains.
But there is still room to move ahead and reach the top results in search engines.
The weakness of the wiki is that each article is named after a single word, so its anchors are also a single keyword.
The solution is to write articles based on two keywords, for example grape + health, or health + diet. The title of the article includes both keywords, as do the filename and the anchors of internal links...
Searches made on two keywords should then return your page rather than the single-keyword page of the wiki.
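The naming scheme described above, with both keywords in the title, the filename and the internal-link anchor, can be sketched as a small helper. The function names and the "kw1 and kw2" title pattern are hypothetical choices for illustration.

```python
import re

def slug(text):
    """Lowercase words joined by hyphens, suitable for a filename."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

def two_keyword_page(kw1, kw2):
    """Title, filename and internal-link anchor built from two keywords."""
    title = f"{kw1.capitalize()} and {kw2}"
    filename = f"{slug(kw1)}-{slug(kw2)}.html"
    anchor = f'<a href="{filename}">{kw1} and {kw2}</a>'
    return title, filename, anchor

print(two_keyword_page("grape", "health"))
```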
How can I leave the sandbox?
Whether the site is stuck in the sandbox because it is new, because it has been restructured, or because it has been penalized, the ways out are the same: earn Google's trust through a minimum of popularity (and remove the causes of possible penalties).
- Encourage links with useful and original content and a friendly look.
- Use a spell checker. I use the one in Star Office, which can be downloaded for free with the Google Pack.
- Publicize the site in a few well-chosen directories. There is no point in submitting the site to lots of directories whose internal pages have a low PageRank; that adds nothing.
- Make scoops. Publish your news on Digg-like sites or press-release sites.
- Do not place reciprocal links to other sites, and do not use any other technique prohibited by Google's webmaster guidelines.
You will gradually leave the sandbox. Links will gradually appear with the link: command, and the crawler will visit more and more often.
References
- SEO manual. A step-by-step manual on how to succeed in SEO and increase the number of visitors.
- Answers from Google to webmasters. Many questions, all answered by the team at Google Webmaster Central.
- Interview of Matt Cutts, head of Google’s webspam team.
- Articles on robots.txt.