Remember your robots.txt to ensure your development/staging site remains invisible
As part of our website creation process, we utilize one or more servers for development. This keeps the live site live while we do our work but gives us a dev website on the Internet for testing purposes.
Because the development site is also on the web, it needs its own web address (e.g. dev.yourcompany.com). Many adjustments are needed in the configuration file to accommodate the development URL, especially when testing third-party connections. For example, ecommerce gateways don't like it when credit card information comes from a URL other than the one they expect.
Often, the robots.txt file gets lost in the shuffle. This file tells search engine crawlers which pages of a website they may crawl, which in turn determines what is eligible to be indexed and displayed in search results. It is an important element of a live website and, it turns out, an important element of your development site.
Naysayers will say that this is not very important, because the search engine needs to find the site before it can index it, and that means there must be an incoming link. There should be no incoming links to the dev site.
Mostly true; however, consider this scenario. A website development project can often take months to complete. A marketing executive is giving a presentation and wants to showcase his new (under development) website, so he takes a screenshot and adds a link to his PowerPoint presentation. Of course, this presentation is provided to the conference organizers, who put it online. And now we're off to the races.
A search engine crawls the conference site and follows the link to the dev site. It looks for robots.txt and follows its instructions; if robots.txt is missing, it treats the whole site as open for indexing. Before long, the entire new site is indexed and being returned in searches.
Because the robots.txt file needs to be present on dev and different from its counterpart on live, we create a special development version outside of document root and add a conditional pointer to it in the Apache Web config files.
First, the simple job of creating the "index nothing" robots.txt file:
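In standard robots exclusion syntax, a file that asks every crawler to stay out of every path is only two lines long. This is presumably the form meant here:

```text
User-agent: *
Disallow: /
```

The `*` matches all crawlers, and `Disallow: /` covers every URL on the host. Well-behaved crawlers honor this; it is a request, not an access control.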
We save that in a centralized place and call it devel-robots.txt. Then we add the following to our base custom-rewrites.conf file:
Alias /robots.txt /centralized-place/devel-robots.txt
Wrapping the alias in a conditional ensures that it is applied only in the development environment.
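One possible form of that conditional is sketched below using Apache's `<IfDefine>` block. The parameter name `DEVELOPMENT` is a hypothetical choice for illustration; it would need to be defined only on the development server, for example by starting Apache with `-D DEVELOPMENT`.

```apache
# Sketch only: DEVELOPMENT is an assumed parameter name, set on the
# dev server alone (e.g. httpd -D DEVELOPMENT). On the live server the
# parameter is undefined, so the block is skipped and the site's own
# robots.txt is served from document root as usual.
<IfDefine DEVELOPMENT>
    Alias /robots.txt /centralized-place/devel-robots.txt
</IfDefine>
```

Because the aliased file lives outside document root, Apache may also need a `<Directory>` section granting access to that location, depending on how the server's default permissions are set up.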
Updated 07/02/12 @ 01:45PM CDT by lbk