Simple Map - Site Map Generator
Simple Map is a free, open-source site map generator and checker...
You can generate a site map with a single command, edit the generated document
in the built-in viewer (or any text or XML editor), and then upload
it directly to the root of your site.
Finally, you must register the map if its format is XML or text. The XML format
is the standard from Google, also adopted by Yahoo and Live Search (Microsoft).
- Sitemap concepts
- How to create the map of a website?
- Why to make a sitemap?
- XML, text or HTML, which format to choose?
- Tips, important advices about XML, text or HTML site maps
- Validating the sitemap.xml file
- Submitting the sitemap
- The tool
- Resources
Sitemap concepts
How to create the map of a website?
With the graphical interface, you just have to type the name of the home page and click on the "Generate" button!
Why to make a sitemap?
A sitemap, in either the XML or the HTML format, helps search engines index
all pages of a website.
Additionally, when an XML sitemap is registered, Google produces an analysis
of the problems encountered, a report of errors, and statistics.
Google also reports which search queries lead to your pages and
which pages have not been indexed.
Figure: Simple Map, the screen
XML, text or HTML, which format to choose?
The XML format is now recognized by the main search engines. It is intended to
give various information to Googlebot and other crawlers. This XML document
is generated by Simple Map according to the standard specification.
- The priority tag indicates which pages are the most important ones.
- The lastmod tag gives the date of the last modification. It is used along
with the changefreq tag.
- The changefreq tag indicates how frequently the page is likely to change, as a
hint for how often the robot should scan it: from
always, for a very big website whose pages change continuously, to
yearly or never, for static pages such as W3C specifications
of formats with a version number.
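The three tags above can be seen in a small generated document. This is a minimal Python sketch, not the code of Simple Map itself (which is written in Scriptol); the function name `build_sitemap` and the example.com URLs are assumptions made for illustration. Only `loc` is mandatory, so the optional tags are written only when supplied.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML text for a list of page dicts.

    Each dict needs "loc"; "lastmod", "changefreq" and "priority"
    are optional and are skipped when absent.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages: the home page carries all tags, the second only its URL.
example = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2007-06-01",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/about.html"},
])
print(example)
```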
The text format gives only the list of URLs of the pages to be indexed. It is accepted by Google.
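As a rough illustration of how little the text format contains, the sketch below writes one URL per line and nothing else. The function name `build_text_sitemap` and the example.com URLs are hypothetical.

```python
def build_text_sitemap(urls):
    """Return a text sitemap: one absolute URL per line, no other markup."""
    cleaned = [u.strip() for u in urls if u.strip()]  # drop blank entries
    return "\n".join(cleaned) + "\n"

text_map = build_text_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
print(text_map, end="")
```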
The HTML format is for visitors of your website. It may display links, titles, descriptions or other information. It is also scanned by search engines, and lets you expose the URLs of pages that are not indexed, especially in multi-level sub-directories, since deeper levels are not always scanned.
Tips, important advices about XML, text or HTML sitemaps
XML sitemap
- The XML format is now supported at least by Google, Yahoo, Live Search, Ask...
- XML sitemaps are required if you use dynamic links for your articles (JavaScript links).
- Don't use an XML sitemap if all pages of your site are already indexed by Google. (You can verify this by typing site:www.sitexyz.com in the search box.)
- If some pages in your website are not indexed yet, give these pages a higher priority in the XML sitemap (the "priority" tag).
- To have a page ignored by the crawlers of search engines, use a robots.txt file or the "ROBOTS" meta tag with the value noindex.
- The sitemap is for the whole site. Don't create a map with only links to files that are not indexed by Google.
- You can omit the optional tags (priority, lastmod, changefreq) if you are not sure you need them.
- The time option (in lastmod tag) is for very big websites! Only date is useful in most cases.
- A sitemap that gives all pages the same high priority and the fastest frequency is of no interest to Google. Give pages a lower priority and a slower frequency if they are already indexed and not frequently changed.
- For videos, a tag has been added to the sitemap protocol. See the tutorial for videos in sitemap from Google.
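The tip about excluding pages with a robots.txt file can be checked before uploading. The sketch below uses Python's standard `urllib.robotparser` module to test sample rules against two URLs; the `/private/` path and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() tells whether a crawler with this user agent may read the URL.
blocked = parser.can_fetch("*", "http://www.example.com/private/draft.html")
allowed = parser.can_fetch("*", "http://www.example.com/index.html")
print(blocked, allowed)
```

A page excluded this way should also be left out of the sitemap, so that the two files do not contradict each other.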
HTML sitemap
- You may create an HTML map for visitors and an XML map for the robots of search engines.
- Put the link to the HTML sitemap on the home page.
- When a page is added to your site, it is not indexed immediately. You have to wait for days or weeks before it becomes visible. Although the robots of search engines scan your site daily, the database is updated only from time to time (weeks or months).
RSS sitemap
- An RSS file is a valid sitemap for Google and other search engines, but
for recent pages only.
Validating the sitemap.xml file
To validate your XML site map, you need a validator and two files:
- sitemap.xsd, the schema of the format, included in the archive
- sitemap.xml, the list of web pages, on your website or locally on your computer.
See resources for a validator.
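Before running a real validator, a quick structural check can catch the most common mistakes. The sketch below is not a substitute for validation against sitemap.xsd: it only verifies well-formedness, the root element, the presence of `loc`, and the value ranges of `changefreq` and `priority`. The function name `check_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def check_sitemap(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]
    problems = []
    if root.tag != NS + "urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    for i, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.findtext(NS + "loc")
        if not loc or not loc.strip():
            problems.append("url #%d has no <loc>" % i)
        freq = url.findtext(NS + "changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQ:
            problems.append("url #%d has an unknown changefreq" % i)
        prio = url.findtext(NS + "priority")
        if prio is not None:
            try:
                in_range = 0.0 <= float(prio) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append("url #%d has a priority outside 0.0-1.0" % i)
    return problems
```

Calling `check_sitemap(open("sitemap.xml", encoding="utf-8").read())` on a local copy would list any problems found, one string per problem.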
Submitting the sitemap (sitemap.xml)
The XML map file must stay at the root of your site, alongside the home page (index.html, index.php or another name).
You can submit it in three ways:
- By registering at the site of the search engine.
- By a ping request.
- With the robots.txt file.
1) Register through the interface
Register at:
Create an account if you don't have one.
Google will provide a test file to upload to your website. Once
this is done, return and click on the verify button, then wait
about a day before returning again for the results.
2) Ping request
You can also submit the sitemap with a ping; see "What do I do after
I create my Sitemap?" in the FAQ in the resources below.
Once the sitemap is registered, you don't have to register it again
when it is updated: you can inform the search engine that the file has been modified with a
ping:
http://www.google.com/ping?sitemap=http://www.scriptol.com/sitemap.xml
Replace scriptol.com with the URL of your website, and google.com with the URL provided by the search engine (Yahoo, Ask, etc.).
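The ping address can be assembled programmatically, which avoids mistakes when the sitemap URL contains special characters. The sketch below only builds the string and does not send any request; the function name `ping_url` is hypothetical, and the endpoint is the one quoted above. Search engines may have retired their ping endpoints since this page was written, so check their current documentation before relying on one.

```python
from urllib.parse import urlencode

def ping_url(engine_endpoint, sitemap_url):
    """Build a ping URL; the sitemap address is percent-encoded as a query value."""
    return engine_endpoint + "?" + urlencode({"sitemap": sitemap_url})

url = ping_url("http://www.google.com/ping",
               "http://www.scriptol.com/sitemap.xml")
print(url)
```

Unlike the hand-written address above, the sitemap URL comes out percent-encoded, which is the safer form for a query parameter.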
3) Use the robots.txt file
According to the Google blog, you can now add a sitemap entry to the robots.txt file; the sitemap is parsed when the robot of Google or of another search engine encounters this file. The syntax is:
User-Agent: *
Disallow:
Sitemap: http://www.scriptol.com/sitemap.xml
The robots.txt file is stored at the root of the website, along with the sitemap file and the home page (index.html or other).
The owner of several websites can list, in the robots.txt file of one site, the URLs of the sitemaps of each website, one per line. Reference.
User-Agent: *
Disallow:
Sitemap: http://www.scriptol.com/sitemap.xml
Sitemap: http://www.scriptol.net/sitemap.xml
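A file of this shape can be produced from a list of sitemap URLs. The following is a minimal sketch, with a hypothetical function name `robots_txt`; it writes one directive per line, an empty Disallow when nothing is excluded, and one Sitemap line per URL.

```python
def robots_txt(sitemap_urls, disallow=()):
    """Return robots.txt text with one Sitemap line per sitemap URL."""
    lines = ["User-agent: *"]
    if disallow:
        lines += ["Disallow: " + path for path in disallow]
    else:
        lines.append("Disallow:")  # empty value: nothing is excluded
    lines += ["Sitemap: " + url for url in sitemap_urls]
    return "\n".join(lines) + "\n"

content = robots_txt([
    "http://www.scriptol.com/sitemap.xml",
    "http://www.scriptol.net/sitemap.xml",
])
print(content, end="")
```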
The tool
How it works
The program recursively parses the content of a website, from the main page
to each page that is linked, and builds a list of all pages to be indexed
by search engines. A list of extensions in the source lets you select the
types of files to index.
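The traversal described above can be sketched in a few lines of Python. This is an illustration of the idea on a local copy of a site, not a port of the Scriptol source: the names `EXTENSIONS`, `LinkCollector` and `crawl` are hypothetical, the extension list is an assumption, and an explicit stack replaces recursion so that long chains of links cannot exhaust the call stack.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlparse

EXTENSIONS = {".html", ".htm", ".php"}  # assumed list of page types to index

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(root_dir, home="index.html"):
    """Return the sorted relative paths of pages reachable from the home page."""
    root = Path(root_dir).resolve()
    found = set()
    stack = [root / home]
    while stack:
        path = stack.pop()
        if path in found or not path.is_file():
            continue
        if path.suffix.lower() not in EXTENSIONS:
            continue  # not a page type we index
        found.add(path)
        collector = LinkCollector()
        collector.feed(path.read_text(encoding="utf-8", errors="replace"))
        for href in collector.links:
            parsed = urlparse(urldefrag(href).url)  # drop any "#fragment"
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # skip external links and pure anchors
            if parsed.path.startswith("/"):
                target = root / parsed.path.lstrip("/")  # site-absolute link
            else:
                target = path.parent / parsed.path       # relative link
            target = target.resolve()
            if root in target.parents:  # stay inside the site image
                stack.append(target)
    return sorted(p.relative_to(root).as_posix() for p in found)
```

Pages that no other page links to are never reached, which matches the description: the list is built by following links from the main page.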
For now, the program works locally, offline, on the image of the site. Many
websites offer to build the sitemap of your Internet
site directly online.
- View the manual.
A PDF version for a printed document could be made with Star Office.
Getting the program
Getting the source code
The source code is included in the archive. This is a Scriptol
program, and it is very clear and short, thanks to the text processing functions
of the Scriptol programming language.
You can compile the source to PHP, to C++, or to a binary.
The source of the graphical user interface is also provided free of charge.
Changes
Planned
- Displaying titles rather than filenames on the HTML map.
- Displaying more information on the HTML map (description, keywords, etc.), at the user's choice.
- Using the map to check broken links.
- Building the map directly from the remote website.
- An "Update" command to add links to previously created sitemaps.
Resources
- Specification of the XML standard.
- Validator. Check your sitemap for validity.
- Sitemaps.org - Official website of the standard, common to Google, Yahoo, Live Search.
- Robotstxt.org. More information about robots.txt and the indexing of your website.
- Video Sitemap. Videos can be indexed too.
(c) 2007 Denis Sureau. Scriptol.com
Licence: Mozilla 1.1.
