AWS: Modifying Robots.txt within a Bitnami EC2 instance
Bitnami offers WordPress for AWS Cloud, which is great for developers who are not interested in focusing on DevOps topics such as installing PHP, Apache, and WordPress.
The Bitnami package does it all, but there are a few things you'll need to modify, including the robots.txt file.
In this blog entry, I will show you how to modify your robots.txt file for a WordPress staging server. The goal is to create a file that keeps search engines from showing your staging server in their search results.
Finding robots.txt
When you create a Bitnami WordPress site, it comes with a /robots.txt file by default, but it's not easy to find. If you're interested in understanding its path, run a find command with grep.
sudo find / | grep robots.txt
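Piping every path on the disk through grep works, but it is slow and also matches names such as robots.txt.bak. As a rough alternative, find can filter by exact file name on its own. The function name find_robots below is my own invention for illustration, not a Bitnami command.

```shell
# find_robots is a hypothetical helper name, not part of Bitnami.
# It lists entries named exactly robots.txt under the given directory
# (default: /) and hides "Permission denied" noise.
find_robots() {
    find "${1:-/}" -name 'robots.txt' 2>/dev/null
}

# Example (searching all of / usually needs sudo to read every directory):
#   sudo find / -name 'robots.txt' 2>/dev/null
```

Calling find_robots /opt/bitnami limits the search to the Bitnami tree, which is usually much faster than scanning the whole disk.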
Modifying robots.txt
Although we were able to find a working copy of robots.txt above, this is not where we should modify the file. Instead, we should create a new robots.txt the "Bitnami Way".
Depending on how you intend to install WordPress, there are two possible places to add a robots.txt.
Option 1 (recommended)
If you access your WordPress website as www.yourdomain.com/robots.txt then place the file here:
/opt/bitnami/apps/wordpress/htdocs
Option 2
If you access your WordPress website as www.yourdomain.com/wordpress/robots.txt then place the file here:
/opt/bitnami/apache2/htdocs
Optimal Solution
I absolutely love the solution presented by Henning Koch. It's smart, and it avoids the risk of forgetting to delete the file once we go live.
Step 1 - Modify Robots.txt file
Create a new file titled robots.exclude.txt.
nano /opt/bitnami/apps/wordpress/htdocs/robots.exclude.txt
Within this new file, disallow everything.
# This file is returned for /robots.txt on staging servers
User-agent: *
Disallow: /
This allows us to create a file specifically for a staging server without touching the live version of robots.txt.
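If you would rather not open an editor, the same file can be written from the shell with a here-document. This is only a sketch: write_staging_robots is a hypothetical helper name, and the default directory assumes the Option 1 layout described earlier.

```shell
# write_staging_robots is a hypothetical helper, not part of Bitnami.
# Usage: write_staging_robots [docroot]
# Writes robots.exclude.txt into the given document root. The default
# assumes the Option 1 layout (/opt/bitnami/apps/wordpress/htdocs).
write_staging_robots() {
    docroot="${1:-/opt/bitnami/apps/wordpress/htdocs}"
    cat > "$docroot/robots.exclude.txt" <<'EOF'
# This file is returned for /robots.txt on staging servers
User-agent: *
Disallow: /
EOF
}
```

Passing a different directory, for example write_staging_robots /opt/bitnami/apache2/htdocs, would cover the Option 2 layout instead.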
Step 2 - Modify Apache VHosts
The next step is to write a configuration that says, "If this website is staging or development, serve robots.exclude.txt instead of robots.txt."
This special rule is best written within the Apache config, specifically the virtual hosts (VHosts) file.
Open the Apache VHosts config within the Bitnami package. To keep things simple, I will use the nano editor.
nano /opt/bitnami/apache2/conf/bitnami/bitnami-apps-prefix.conf
I'm choosing to paste this rule within bitnami-apps-prefix.conf.
# Bitnami applications installed in a prefix URL
# Custom Robots.txt data here
RewriteEngine On
# On the staging host only, serve robots.exclude.txt in place of robots.txt
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com$
RewriteRule ^/robots\.txt$ /robots.exclude.txt
...
Step 3 - Restart Apache
sudo /opt/bitnami/ctlscript.sh restart apache
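After the restart, it is worth confirming that the staging host really serves the disallow-all file. The sketch below is one way to check. blocks_all_crawlers is a made-up helper name, and subdomain.domain.com is the placeholder host from the rewrite rule above.

```shell
# blocks_all_crawlers is a hypothetical helper.
# Reads robots.txt content on stdin and succeeds (exit 0) only if it
# contains a "Disallow: /" line, i.e. a rule that covers the whole site.
blocks_all_crawlers() {
    grep -Eq '^Disallow:[[:space:]]*/[[:space:]]*$'
}

# Example, run from any machine that can reach the staging host:
#   curl -s http://subdomain.domain.com/robots.txt | blocks_all_crawlers \
#       && echo "staging is hidden from crawlers"
```

The check is deliberately loose: it only looks for the Disallow line and does not verify which User-agent section it belongs to.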
Troubleshooting
Double checking your VHosts
If you want to double check the paths and better understand how things are connected, open bitnami.conf.
nano /opt/bitnami/apache2/conf/bitnami/bitnami.conf
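Rather than reading the whole file in an editor, you can print only the lines that tie things together. A minimal sketch, assuming the paths of interest appear on DocumentRoot and Include directives. show_vhost_paths is a hypothetical helper name.

```shell
# show_vhost_paths is a hypothetical helper.
# Prints, with line numbers, the DocumentRoot and Include/IncludeOptional
# directives from an Apache config file (default: Bitnami's bitnami.conf).
show_vhost_paths() {
    conf="${1:-/opt/bitnami/apache2/conf/bitnami/bitnami.conf}"
    grep -nE '^[[:space:]]*(DocumentRoot|Include(Optional)?)[[:space:]]' "$conf"
}
```

Running show_vhost_paths with no argument reads bitnami.conf. Passing another config path lets you follow the Include chain one file at a time.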