What are Sitemaps?
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
Sitemaps XML format
This document briefly describes the XML schema for the Sitemap protocol.
The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.
The Sitemap must:
- Begin with an opening
<urlset>tag and end with a closing
- Specify the namespace (protocol standard) within the
- Include a
<url>entry for each URL, as a parent XML tag.
- Include a
<loc>child entry for each
All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.
Also, all URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com.
Sample XML Sitemap
The following example shows a Sitemap that contains just one URL and uses all optional tags. The optional tags are in italics.
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://www.example.com/</loc> <lastmod>2005-01-01</lastmod> <changefreq>monthly</changefreq> <priority>0.8</priority> </url> </urlset>