Sitemaps for Drupal

Sitemaps are XML files that describe the contents of your website, including the URL of every public page, when that page was last modified, and how often it is likely to change. Search engines use this information to index your website more intelligently and to make sure that no pages are missed. For new sites (like mine), registering your sitemap gets your site noticed by the search engines, and subsequently indexed. As a bonus, submitting a sitemap to Google opens up their webmaster tools, which provide a number of useful reports. For more information on sitemaps, see the Sitemap Wikipedia entry.
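To make the format concrete, here is a minimal Python sketch that builds a tiny sitemap by hand. This is not how the Drupal module does it (the module generates the file for you); it only illustrates the three fields described above. The page URLs, dates, and the build_sitemap helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                # full URL of the page
        ET.SubElement(url, "lastmod").text = lastmod        # date of last change
        ET.SubElement(url, "changefreq").text = changefreq  # hint, e.g. "daily"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example = build_sitemap([
    ("http://www.mysite.com/", "2007-09-16", "daily"),
    ("http://www.mysite.com/about", "2007-08-01", "monthly"),
])
print(example)
```

Each url element carries one page; the search engine treats the change frequency as a hint rather than a command.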

Before You Begin

You must have Clean URLs enabled for your Drupal site (an important step in making your site search engine friendly, even if you don't use sitemaps).

Important: The Single Sign-on module is incompatible with sitemaps. In fact, it is incompatible with Google, RSS, and most major search engines. If you are using it, or plan to use it, please read this article first!

Setting It Up

Begin by downloading and installing Drupal's XML Sitemap module. You don't need to configure it yet; just install and activate it.

With the module activated, you can start the process of registering your site with Google. Sign in to the Webmaster Tools section with your Google account (you do have a Google account, don't you?). The Webmaster Dashboard will be displayed; from here you can register websites by entering the URL.

Google Site Submission Form
Put your site URL in the box and click "Add Site" to register with Google.

You will be asked to verify your ownership of the website. This isn't strictly necessary -- you can still submit a sitemap -- but verification will give you access to useful statistics and crawl diagnostics. And it is easy: click "Verify your site", then choose "Upload an HTML file" as the verification method.

Google Site Verification
Verifying your site gives you access to statistics and crawl diagnostics.

You will be given a random file name. When you verify, Google will look for this file on your website. Since you (presumably) are the only one who knows this secret file, its presence on your site proves your ownership. The XML Sitemap module makes this step easy: just copy the filename, then visit the Sitemap module administration page and paste it into the Verification Link field, which is located under the Google section, and save the form.

XML Sitemap Verification Field
Paste the magic filename into the Verification link and save.

Now you are ready to finish the process; click the Verify button on the Google Sitemaps screen.

Google will try to access the secret file, and if all goes well your site will now be verified. Congratulations, you've successfully registered your sitemap with Google!

It may take a couple of days before Google has a chance to process the sitemap and index your site. You can check the indexing status by clicking your site URL on the Webmaster Dashboard, which displays an overview of the latest crawl results.

Google Index Status
Google likes me!

The results page will list any problems that were encountered, which is invaluable for monitoring the health of your website. Notice the "URLs restricted by robots.txt" entry? That is caused by a conflict between the sitemap module, which tells Google that it should index user comments, and Drupal's robots.txt file, which tells Google it shouldn't. I'm still sorting out the best way to handle that particular issue; I'll post a solution here when I have it.

Google Crawl Report
Mind those robots!
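If you want to see which sitemap URLs collide with robots.txt before Google tells you, a rough check like the following may help. It is only a sketch using Python's standard urllib.robotparser; the robots.txt excerpt and the URLs are hypothetical stand-ins, so substitute your own file and your own sitemap entries.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt in the style of a Drupal robots.txt; check your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comment/reply/
Disallow: /admin/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt forbids user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical URLs that a sitemap might list.
sitemap_urls = [
    "http://www.mysite.com/node/1",
    "http://www.mysite.com/comment/reply/1",
]
print(blocked_urls(ROBOTS_TXT, sitemap_urls))
```

Anything this prints is a page the sitemap advertises but robots.txt hides, which is the same kind of conflict the crawl report flags.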

There is a lot of information available, including the top search keywords, your search ranking for those keywords, sites that link to yours, and more. Explore!

Submit To More Sites

Now that the sitemap is up and running, get the most out of it by sending it to more search engines. As of this writing (2007-09-16) the sitemap module can automatically submit the sitemap to Yahoo and Ask.com; just set the checkboxes on the settings screen. Or, if you prefer, you can manually submit your sitemap through Yahoo! Site Explorer.

Yahoo! sitemap settings
The sitemap module can send your sitemap to Yahoo! and Ask.com.

There doesn't seem to be an official sitemap submission process for MSN (except for robots.txt, see below). There is a ping address, but it doesn't seem to recognize the sitemap format yet. The address, if you want to try it, is:

http://search.live.com/ping?sitemap=http://www.mysite.com/sitemap.xml 

However, some people have reported that submitting through the "moreover" service does appear to work. Just enter this address in your web browser, and your site will show in the MSN index a couple of weeks later.

http://api.moreover.com/ping?u=http://www.mysite.com/sitemap.xml 
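If you would rather script these pings than type them, the addresses can be assembled like this. The sketch only builds the URLs and does not send anything. Percent-encoding the sitemap address is my own assumption about what is safest; the addresses above are shown unencoded, and I can't promise how either service treats the encoded form. The ping_url helper is a made-up name.

```python
from urllib.parse import urlencode


def ping_url(endpoint, param, sitemap_url):
    """Build a ping address, percent-encoding the sitemap URL (an assumption)."""
    return endpoint + "?" + urlencode({param: sitemap_url})


SITEMAP = "http://www.mysite.com/sitemap.xml"

# Endpoints and parameter names are the ones quoted in this article.
print(ping_url("http://search.live.com/ping", "sitemap", SITEMAP))
print(ping_url("http://api.moreover.com/ping", "u", SITEMAP))
```

Paste the printed address into your browser, or fetch it with whatever tool you like.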

Add Your Sitemap to Robots.txt

You can also notify the world of your sitemap by adding it to your robots.txt file. This file is located in the root directory of your Drupal installation, and it may be modified with any text editor. Just add this line to the end of it:

Sitemap: http://www.mysite.com/sitemap.xml

Note that you must specify the full URL of your sitemap. And keep in mind that robots.txt is part of the Drupal distribution. If you update Drupal this file will be overwritten, so you'll have to come back and make the change again.
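Since the line has to be re-added after every Drupal update, it may be worth scripting. Here is a small sketch, assuming a plain-text robots.txt in your Drupal root; the function names are my own, and the script is safe to run more than once because it skips the change if the line is already there.

```python
from pathlib import Path


def add_sitemap_line(robots_text, sitemap_url):
    """Return robots_text with a Sitemap line appended, unless already present."""
    line = "Sitemap: " + sitemap_url
    if line in (existing.strip() for existing in robots_text.splitlines()):
        return robots_text  # nothing to do
    if robots_text and not robots_text.endswith("\n"):
        robots_text += "\n"
    return robots_text + line + "\n"


def update_robots_file(path, sitemap_url):
    """Apply add_sitemap_line to the file at path (hypothetical helper)."""
    robots = Path(path)
    robots.write_text(add_sitemap_line(robots.read_text(), sitemap_url))


# Demonstration on an in-memory string rather than a real file.
original = "User-agent: *\nDisallow: /admin/\n"
updated = add_sitemap_line(original, "http://www.mysite.com/sitemap.xml")
print(updated)
```

To change the real file, call update_robots_file with the path to robots.txt in your Drupal root and the full URL of your sitemap.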

Also note that if, like me, you run multiple sites off a single Drupal install, you can't use this trick. There is only one robots.txt file, and it is shared by all the sites. Since the sitemap specification requires the full URL in robots.txt, you can't use a generic relative path; you must specify the sitemap for one and only one site. It appears that Drupal 6.x might support a dynamic robots.txt; that should resolve these problems.

That's It!

You now have a sitemap for your website, and will soon be included in the search results for Google, Yahoo!, and Ask.com.

I will update this article from time to time as I discover new information or better solutions; check for updates on my blog.

Filed under: Drupal, google

I am Jason Perkins (starkos), the founder of Industrious One. I'm yammering on about life as an indie, getting things done, Saabs, roadtrips, finding inspiration, and creating the big audacious stuff.
