AWS: Modifying Robots.txt within a Bitnami EC2 instance

Bitnami offers WordPress for AWS Cloud, which is great for developers who are not interested in DevOps chores such as installing PHP, Apache and WordPress.

The Bitnami package does it all, but there are a few things you'll still need to modify, including the robots.txt file.

In this blog entry, I will show you how to modify your robots.txt file for a WordPress staging server. The goal is to create a file that keeps search engines from showing your staging server in their search results.


Finding robots.txt

When you create a Bitnami WordPress site, it comes with a /robots.txt file by default, but the file is not easy to find. If you want to know where it lives on disk, run a find command piped through grep.

sudo find / | grep robots.txt
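
Searching from / works, but it walks the whole disk and prints a lot of permission noise. Here is a narrower sketch. The function name find_robots is my own, and the default root of /opt/bitnami is an assumption based on the paths used later in this post.

```shell
# Hypothetical helper: list every robots.txt under a given root.
# The default root (/opt/bitnami) is an assumption; pass another directory to override it.
find_robots() {
  root="${1:-/opt/bitnami}"
  # -name matches the exact file name, so robots.txt.bak and similar are skipped;
  # 2>/dev/null hides "Permission denied" messages
  find "$root" -type f -name 'robots.txt' 2>/dev/null
}
```

Call it as find_robots to search the Bitnami tree, or as find_robots / to search everything (prefix the find with sudo inside the function if you need to read protected directories).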

Modifying robots.txt

Although we were able to find a working copy of robots.txt above, that is not where we should modify the file. Instead, we should create a new robots.txt the "Bitnami Way".

Depending on how you intend to install WordPress, there are two possible places to add a robots.txt.

Option 1 (recommended)

If you access your WordPress website as www.yourdomain.com/robots.txt then place the file here:

/opt/bitnami/apps/wordpress/htdocs

Option 2

If you access your WordPress website as www.yourdomain.com/wordpress/robots.txt then place the file here:

/opt/bitnami/apache2/htdocs

Optimal Solution

I absolutely love the solution presented by Henning Koch. It's smart, and it removes the risk of forgetting to delete the file once we go live.

Step 1 - Create robots.exclude.txt

Create a new file titled robots.exclude.txt.

nano /opt/bitnami/apps/wordpress/htdocs/robots.exclude.txt

Within this new file, disallow everything.

# This file is returned for /robots.txt on staging servers
User-agent: *
Disallow: /

This allows us to create a file specifically for a staging server without touching the live version of robots.txt.
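
If you would rather script this step than type it into nano, the same file can be written with a heredoc. This is only a sketch: the function name write_staging_robots is hypothetical, and the default directory assumes the Option 1 layout described above.

```shell
# Hypothetical helper: write the staging rules into a target directory.
# The default directory assumes the Option 1 layout; pass another path to override it.
write_staging_robots() {
  dir="${1:-/opt/bitnami/apps/wordpress/htdocs}"
  # The quoted 'EOF' keeps the shell from expanding anything inside the heredoc
  cat > "$dir/robots.exclude.txt" <<'EOF'
# This file is returned for /robots.txt on staging servers
User-agent: *
Disallow: /
EOF
}
```

On the instance you will probably need write permission on the htdocs directory, so run it from a shell that has it.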

Step 2 - Modify Apache VHosts

The next step is to write a configuration that says, "If this website is staging or development, serve robots.exclude.txt instead of robots.txt."

This special rule is best written in the Apache config, specifically in the virtual hosts (VHosts) file.

Open the Apache VHosts config within the Bitnami package. To keep things simple, I will use the nano editor.

nano /opt/bitnami/apache2/conf/bitnami/bitnami-apps-prefix.conf 

I'm choosing to paste this rule within bitnami-apps-prefix.conf.

# Bitnami applications installed in a prefix URL

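# Custom robots.txt rule: on the staging host, serve robots.exclude.txt instead
# (replace subdomain.domain.com with your staging host name; dots are escaped so they match literally)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$
RewriteRule ^/robots\.txt$ /robots.exclude.txt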

...

Step 3 - Restart Apache

sudo /opt/bitnami/ctlscript.sh restart apache
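
After the restart, it's worth confirming that the staging host really returns the blocking rules. Below is a minimal sketch. The function name blocks_everything is my own, and the curl call in the comment assumes Apache answers on http://localhost and that subdomain.domain.com is the placeholder host from the rewrite rule.

```shell
# Hypothetical helper: succeed only if stdin contains a blanket "Disallow: /" rule.
blocks_everything() {
  # tr strips carriage returns in case the file has Windows line endings;
  # grep -x matches the whole line, so "Disallow: /wp-admin/" does not count
  tr -d '\r' | grep -qx 'Disallow: /'
}

# Usage on the instance (host name and URL are assumptions, adjust to your setup):
# curl -s -H 'Host: subdomain.domain.com' http://localhost/robots.txt | blocks_everything && echo "staging is blocked"
```

Sending the Host header by hand lets you test the rewrite rule from the instance itself, without waiting on DNS.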

Troubleshooting

Double checking your VHosts

If you want to double check the paths and better understand how things are connected, open bitnami.conf.

nano /opt/bitnami/apache2/conf/bitnami/bitnami.conf 

Resources