Sitemap and Sitemap Generator

An XML sitemap helps search engine crawlers, and the HTML version helps users find their way around the site.
Sitemaps can now be extended with image and video tags, and even with a set of tags that turns them into an RSS feed.

You can generate a sitemap with a single command, edit the generated document in the built-in viewer (or any text or XML editor), and then upload it to the root of your site.
Finally, you must register the map if it is in XML or text format. The XML format is a standard from Google, adopted by Yahoo and Live Search (Microsoft).



Sitemap concepts

How do I create a map of a website?

With the graphical interface, you just type the name of the home page and click the "Generate" button.

Why make a sitemap?

A sitemap, either in XML or HTML format, helps search engines index all the pages of a website.
Additionally, when an XML sitemap is registered, Google produces an analysis of the problems encountered, an error report, and statistics.
Google also reports which search queries lead to your pages, and which pages have not been indexed.


Simple Map, the screen

XML, text or HTML, which format to choose?

The XML format is now recognized by the main search engines. It is intended to give various information to Googlebot and other crawlers. Simple Map generates this XML document according to the standard specification.
- The priority tag indicates which pages are the most important ones, with a value from 0.0 to 1.0 (0.5 by default).
- The lastmod tag gives the date of the last modification. It is used along with the changefreq tag.
- The changefreq tag indicates how often the page is likely to change, and so how often the robot should scan it: from always, for pages that change continuously on a very large website, to yearly, or never for static pages such as W3C specifications of formats with a version number.

The text format gives only the list of URLs of the pages to be indexed, one per line. It is accepted by Google.

The HTML format is for the visitors of your website. It may display links, titles, descriptions, or other information. It is also scanned by search engines, and so gives them the URLs of pages that would otherwise not be indexed, especially in the case of multi-level subdirectories, since deeper levels are not always scanned.

The plain text and HTML files are simple lists of URLs, whereas the XML format is made of tags defined by a standard.
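As a minimal sketch of the two simple formats, the Python code below writes a text sitemap (one URL per line) and an HTML sitemap (a list of links) from the same list of pages. The (URL, title) input format and the function names are hypothetical, chosen only for illustration; they are not taken from Simple Map.

```python
from html import escape

def text_sitemap(urls):
    """Text sitemap: one absolute URL per line."""
    return "\n".join(urls) + "\n"

def html_sitemap(pages):
    """HTML sitemap for visitors: a list of links showing each page title.

    pages: list of (url, title) pairs (hypothetical input format)."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'  # escape & < > and quotes
        for url, title in pages
    )
    return f"<ul>\n{items}\n</ul>\n"

pages = [("http://www.example.com/", "Home"),
         ("http://www.example.com/about.html", "About us")]
print(text_sitemap(url for url, _ in pages))
print(html_sitemap(pages))
```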

Sitemap formats

XML format

The container is urlset and it holds a set of url tags matching the pages of the site.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
     </url>  
</urlset>
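The same document can be produced with Python's standard xml.etree.ElementTree module. This sketch assumes the page list is already known (a hypothetical list of dicts) and checks two constraints of the sitemaps.org protocol: changefreq must be one of always, hourly, daily, weekly, monthly, yearly or never, and priority must lie between 0.0 and 1.0.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values allowed for changefreq by the sitemaps.org protocol.
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def build_urlset(entries):
    """Build an XML sitemap from a list of dicts.

    Each dict needs a 'loc' key; 'lastmod', 'changefreq' and 'priority'
    are optional, as in the protocol."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # ElementTree escapes & < >
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]  # W3C date, e.g. 2005-01-01
        if "changefreq" in entry:
            if entry["changefreq"] not in CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {entry['changefreq']!r}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority must be between 0.0 and 1.0: {priority}")
            ET.SubElement(url, "priority").text = str(priority)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")

print(build_urlset([{"loc": "http://www.example.com/", "lastmod": "2005-01-01",
                     "changefreq": "monthly", "priority": 0.8}]))
```

ElementTree escapes characters such as & in URLs (as &amp;), which the protocol requires, so query strings can be passed as they are.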

Images in sitemap

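To have an image indexed by search engines, add an image:image tag inside the url tag, as follows. The image prefix must be declared on the urlset container, with xmlns:image="http://www.google.com/schemas/sitemap-image/1.1".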

<url>
   <loc>http://example.com/sample.html</loc>
   <image:image>
       <image:loc>http://example.com/image.jpg</image:loc>
   </image:image>
</url>

More information is available in Google's Webmaster Central.
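When the image tags are written by a program, the image prefix has to be bound to its namespace on the urlset element. A minimal Python sketch with ElementTree, assuming a hypothetical dict that maps each page URL to the images it shows:

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)       # sitemap namespace as the default one
ET.register_namespace("image", IMG) # serialize image tags as image:...

def image_urlset(pages):
    """pages: dict mapping a page URL to the list of image URLs it shows."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for page, images in pages.items():
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = page
        for img in images:
            image = ET.SubElement(url, f"{{{IMG}}}image")
            ET.SubElement(image, f"{{{IMG}}}loc").text = img
    return ET.tostring(urlset, encoding="unicode")

print(image_urlset({"http://example.com/sample.html": ["http://example.com/image.jpg"]}))
```

Note that register_namespace changes a global table of the ElementTree module, which is acceptable in a small generator script.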

Videos in sitemap

See Google's documentation of the extended format, and its video sitemap FAQ.

News sitemap

For your articles to be published on Google News, you need, in addition to URLs containing a unique ID, a specific sitemap: the standard XML sitemap with additional tags.

In fact, these tags turn the sitemap into something close to an RSS file.

A News Sitemap should only contain articles published in the last two days.

View the News Sitemap format.
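Since a News sitemap should only list articles from the last two days, the generator has to filter by publication date. The Python sketch below does that and writes the news: tags. The tag names follow Google's News sitemap format as documented at the time of writing and should be checked against that page; the publication name "Example News", the language code and the (URL, title, date) input tuples are placeholders.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape

NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

def news_entries(articles, now=None, max_age=timedelta(days=2)):
    """Render one url entry per article published within max_age of now.

    articles: list of (url, title, published) tuples, published being a
    timezone-aware datetime (hypothetical input format)."""
    if now is None:
        now = datetime.now(timezone.utc)
    entries = []
    for url, title, published in articles:
        if now - published > max_age:
            continue  # older than two days: left out of the News sitemap
        entries.append(
            "  <url>\n"
            f"    <loc>{escape(url)}</loc>\n"
            "    <news:news>\n"
            "      <news:publication>\n"
            "        <news:name>Example News</news:name>\n"  # placeholder name
            "        <news:language>en</news:language>\n"
            "      </news:publication>\n"
            f"      <news:publication_date>{published.isoformat()}</news:publication_date>\n"
            f"      <news:title>{escape(title)}</news:title>\n"
            "    </news:news>\n"
            "  </url>\n"
        )
    return entries

def news_sitemap(articles, now=None):
    """Wrap the entries in a urlset that declares the news namespace."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
        f'        xmlns:news="{NEWS_NS}">\n'
        + "".join(news_entries(articles, now))
        + "</urlset>\n"
    )
```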

Sitemap index

A sitemap index is a file holding a list of sitemaps. If you have several maps, or if your map is split into several files, the index gives the URL of each one.
You do not need to create an index if you have only one sitemap file. Even if you have multiple sitemaps for different kinds of content, they may be merged into a single file, as explained below.

The index file has a standard XML format too.
The container is sitemapindex and it holds a set of inner sitemap tags.

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <sitemap>
        <loc>http://www.example.com/sitemap1.xml</loc>
        <lastmod>2004-10-01T18:23:17+00:00</lastmod>
     </sitemap>
</sitemapindex>
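A map is usually split because the sitemaps.org protocol limits each sitemap file to 50,000 URLs (and 50 MB uncompressed). A minimal Python sketch that splits a URL list into chunks and writes the index; it only checks the URL count, not the file size, and the function names are hypothetical:

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file URL limit of the sitemaps.org protocol

def split_urls(urls, size=MAX_URLS):
    """Split a URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def sitemap_index(sitemap_urls, lastmod):
    """Build a sitemapindex listing each sitemap file with the same lastmod."""
    items = "".join(
        f"  <sitemap>\n    <loc>{escape(u)}</loc>\n    <lastmod>{lastmod}</lastmod>\n  </sitemap>\n"
        for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{items}</sitemapindex>\n"
    )
```

With a hypothetical naming scheme, each chunk i would be written to sitemap{i}.xml, and the index built from the list of those file URLs.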

Multiple contents in one sitemap

To cope with the proliferation of sitemap file types, Google decided to allow all types of content in a single file.
A sitemap with several types of content looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/mapage.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/mavideo.flv</video:content_loc>
      <video:title>Watch the youngest grow up.</video:title>
    </video:video>
  </url>
</urlset>

The url tag thus holds three types of tags: loc for the page, image:image with image:loc for an image file, and video:video with video:content_loc for a video file.
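Reading such a file back shows why the prefixes matter: each one must be bound to its namespace URI, otherwise the XML parser rejects the document with an "unbound prefix" error. This Python sketch, using only ElementTree, lists the page, image and video URLs of each url entry; the short keys of the NS dict are arbitrary local names, only the URIs must match.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}

def read_combined(xml_text):
    """Return (page, images, videos) for each url entry of a combined sitemap."""
    root = ET.fromstring(xml_text)
    result = []
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", namespaces=NS)
        images = [e.text for e in url.findall("image:image/image:loc", NS)]
        videos = [e.text for e in url.findall("video:video/video:content_loc", NS)]
        result.append((page, images, videos))
    return result
```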

Tips and important advice about XML, text, or HTML sitemaps

XML sitemap

HTML sitemap

RSS sitemap

Sitemap index

Validating the sitemap.xml file

To validate your XML sitemap, you need a validator and two files:
- sitemap.xsd, the schema of the format, included in the archive,
- sitemap.xml, the list of web pages, on your website or locally on your computer.
See the resources below for a validator.
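Python's standard library cannot validate against an XSD schema, so the sketch below is not a substitute for a validator used with sitemap.xsd. It only checks that the file is well-formed and catches a few common mistakes: a missing loc, an unknown changefreq value, and a priority outside 0.0 to 1.0. The function name check_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means none was detected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag != SM + "urlset":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    for i, url in enumerate(root.findall(SM + "url"), start=1):
        if not url.findtext(SM + "loc"):  # absent or empty loc
            problems.append(f"url #{i}: missing loc")
        freq = url.findtext(SM + "changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQ:
            problems.append(f"url #{i}: invalid changefreq {freq!r}")
        prio = url.findtext(SM + "priority")
        if prio is not None:
            try:
                ok = 0.0 <= float(prio) <= 1.0
            except ValueError:  # not a number at all
                ok = False
            if not ok:
                problems.append(f"url #{i}: invalid priority {prio!r}")
    return problems
```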

Submitting the sitemap (sitemap.xml)

The XML map file must be stored at the root of your site, alongside the home page (index.html, index.php, or another name).

You can submit it in three ways:

  1. Register it on the search engine's site.
  2. Send a ping request.
  3. Declare it in the robots.txt file.

1) Register through the interface

Register at:

Create an account if you don't have one.
Google will give you a test file to upload to your website. Once this is done, return and click the verify button... then forget about it for a day before coming back for the results.

2) Ping request

You can also submit the sitemap with a ping; see "What do I do after I create my Sitemap?" in the FAQ listed in the resources below.
Once the sitemap is registered, you don't have to register it again after each update; you can notify the search engine that the file has changed with a ping:

http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml

Replace example.com with the domain of your website, and google.com with the ping address provided by the other search engines: Yahoo, Ask, etc.
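If the sitemap URL itself contains characters such as & or ?, it is safer to percent-encode it as the value of the sitemap parameter. A small Python helper, hypothetical and not part of Simple Map. Note that Google has since deprecated its ping endpoint (announced in 2023), so check each search engine's current documentation before relying on pings.

```python
from urllib.parse import quote

def ping_url(endpoint, sitemap_url):
    """Return the ping URL telling a search engine that sitemap_url has changed."""
    # safe="" also encodes ":" and "/", so the whole sitemap URL is one query value
    return f"{endpoint}?sitemap={quote(sitemap_url, safe='')}"

# Sending the request (not done here) is a single GET, for example:
# import urllib.request
# urllib.request.urlopen(ping_url("http://www.google.com/ping", "http://www.example.com/sitemap.xml"))
```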

3) Use the robots.txt file

According to Google's blog, you can now add a sitemap entry to the robots.txt file; the sitemap is then read when the robots of Google and other search engines fetch this file. The syntax is:

User-Agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml

The robots.txt file is stored at the root of the website, along with the sitemap file and the home page (index.html or another name).

The owner of several websites can list, in the robots.txt file of one site, the sitemap URLs of each website, one per line. Reference.

User-Agent:*
Disallow:
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.fr/sitemap.xml
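To check that the Sitemap lines of a robots.txt file are read as expected, Python's standard urllib.robotparser module (version 3.8 or later) exposes them through site_maps(). A minimal sketch on the example above; the helper name is hypothetical:

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line
    return parser.site_maps() or []

robots_txt = """User-Agent:*
Disallow:
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.fr/sitemap.xml
"""
print(sitemaps_from_robots(robots_txt))
# prints ['http://www.example.com/sitemap.xml', 'http://www.example.fr/sitemap.xml']
```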

The tool

How it works

The program recursively parses the content of a website, from the main page to each linked page, and builds a list of all the pages to be indexed by search engines. A list of extensions in the source code lets you select which types of files to index.
For now, the program works offline, on a local copy of the site. Many websites also offer to build the sitemap of your site directly online.
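The principle can be sketched in Python. The actual program is written in Scriptol, so this is only an illustration of the same idea, not its code. It reads a local copy of the site, follows the links of a tags from the home page, keeps internal links whose extension is in a list, and uses an explicit stack instead of function recursion. The extension list, the home page name and the base URL are hypothetical parameters.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urldefrag

EXTENSIONS = {".html", ".htm", ".php"}  # hypothetical list of file types to index

class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(root, home="index.html", base="http://www.example.com/"):
    """Follow links from the home page in a local copy of the site.

    Returns the sorted list of public URLs of the pages reached."""
    root = Path(root)
    seen, todo = set(), [home]
    while todo:
        rel = todo.pop()
        if rel in seen or Path(rel).suffix.lower() not in EXTENSIONS:
            continue
        path = root / rel
        if not path.is_file():
            continue  # broken link, or file missing from the local copy
        seen.add(rel)
        parser = LinkParser()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        for href in parser.links:
            url = urldefrag(urljoin(base + rel, href))[0]  # resolve ../ and drop #anchor
            if url.startswith(base):  # keep internal links only
                todo.append(url[len(base):].split("?", 1)[0])  # drop any query string
    return sorted(base + rel for rel in seen)
```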

Getting the program

Getting the source code

The source code is included in the archive. It is a Scriptol program, short and very clear thanks to the text processing functions of the Scriptol programming language.
You can compile the source to PHP, to C++, or to a binary executable.
The source of the graphical user interface is also provided free of charge.

Changes

Resources