Generating routing rules for an Amazon S3 bucket

Today I moved my personal site from free PHP hosting to Amazon S3 in static website hosting mode. The reason was that it didn't feel right to keep static content in a MySQL database ツ. As a result, all my URLs changed. Search engines indexed the old URLs long ago, and to preserve their rankings I had to set up redirects from the old URLs to the new ones. Yes, in a way this is an SEO-related article ツ

There were not too many pages, but enough to consider at least semi-automating the process. First I created a list of the old URLs. For this I ran wget on Linux against the old version of the website:

wget -r --spider --no-verbose

and cropped the hostnames from its output using Geany (vertical text selection rules!), so in less than a minute I had a list.
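The hostname cropping can also be scripted instead of being done by hand in an editor. Below is a minimal sketch, assuming each line of the saved wget output contains at most one http(s) URL. The function name `extract_paths` and the sample log lines are hypothetical, and the real wget log format may differ.

```python
import re
from urllib.parse import urlsplit

# Matches an http(s) URL up to the next whitespace character.
URL_RE = re.compile(r'https?://\S+')

def extract_paths(log_text):
    """Return site-relative paths (with query strings) found in log_text.

    Assumes at most one URL per line; duplicates are dropped and the
    original order is preserved.
    """
    paths = []
    for line in log_text.splitlines():
        match = URL_RE.search(line)
        if not match:
            continue
        parts = urlsplit(match.group(0))
        path = parts.path.lstrip('/')
        if parts.query:
            path += '?' + parts.query
        # Skip the bare site root and anything already collected.
        if path and path not in paths:
            paths.append(path)
    return paths

# Hypothetical log lines, for illustration only.
sample = (
    "URL: http://example.com/index.php 200 OK\n"
    "URL: http://example.com/blog.php?id=21 200 OK\n"
)
print(extract_paths(sample))
```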

Then I had to match each element in the list with its new URL. Unfortunately, I had no choice but to do it manually. So I ended up with a file with two columns:

index.php                        index.html
blog.php?id=21                   blog/mysql-utf-8-database-creation-snippet.html
blog.php?id=20                   blog/jboss-as-7-configuration.html
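For illustration, a file in this shape can be read into pairs with a few lines of Python. This is only a sketch: `parse_mapping` is a hypothetical helper name, and it assumes that neither column contains whitespace.

```python
def parse_mapping(text):
    """Parse two-column text into a list of (old_url, new_url) tuples.

    Columns are separated by any run of whitespace. Blank or
    single-column lines are skipped.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        # First field is the old URL, last field is the new one.
        pairs.append((fields[0], fields[-1]))
    return pairs

mapping = parse_mapping(
    "index.php                        index.html\n"
    "\n"
    "blog.php?id=21                   blog/mysql-utf-8-database-creation-snippet.html\n"
)
print(mapping)
```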

Amazon S3's routing rules are actually an XML file. To convert my text file to XML I decided to write a simple Python script. Luckily so, because I didn't get the URLs right on the first attempt ツ

UPD: It's OK if you don't know how to run Python scripts. I wrote a simple GWT application to simplify the process; it replaces the Python script completely. Just follow the link and copy-paste the redirection mapping you've just made (yes, those two columns).
For the curious, the source code of this GWT application is in this GitHub repository.

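import sys, re

def txt_to_xml(filename):
    # Convert a two-column mapping file into an S3 routing rules XML file.
    with open(filename) as fr, open(filename + '.xml', 'w') as fw:
        fw.write('<?xml version="1.0"?>\n')
        fw.write('<RoutingRules>\n')
        for line in fr:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # First column is the old URL, last column is the new one.
            listFromLine = re.split(r'\s+', line)
            # Rule template reconstructed from S3's RoutingRules format (an assumption).
            fw.write('<RoutingRule><Condition><KeyPrefixEquals>%s</KeyPrefixEquals></Condition>'
                     '<Redirect><ReplaceKeyWith>%s</ReplaceKeyWith></Redirect></RoutingRule>\n'
                     % (listFromLine[0], listFromLine[-1]))
        fw.write('</RoutingRules>\n')

# sys.argv[0] is the script itself, so start from the first real argument.
for filename in sys.argv[1:]:
    txt_to_xml(filename)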
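The same conversion could also be done with the standard library's `xml.etree.ElementTree`, which escapes characters such as `&` in URLs automatically. This is a sketch rather than the script I actually used: `build_routing_rules` is a hypothetical name, the sample pair is made up, and the element names follow S3's website routing rule format.

```python
import xml.etree.ElementTree as ET

def build_routing_rules(pairs):
    """Build a RoutingRules XML string from (old_key_prefix, new_key) pairs."""
    root = ET.Element('RoutingRules')
    for old, new in pairs:
        rule = ET.SubElement(root, 'RoutingRule')
        condition = ET.SubElement(rule, 'Condition')
        ET.SubElement(condition, 'KeyPrefixEquals').text = old
        redirect = ET.SubElement(rule, 'Redirect')
        ET.SubElement(redirect, 'ReplaceKeyWith').text = new
    # encoding='unicode' returns a str instead of bytes.
    return ET.tostring(root, encoding='unicode')

# Hypothetical pair containing a character that must be escaped in XML.
xml_text = build_routing_rules([('old.php?a=1&b=2', 'new/page.html')])
print(xml_text)
```

Building the tree first and serializing it at the end also guarantees that every opened element is closed, which string formatting does not.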

After executing the script

$ ./ redirects.txt

I had an XML file ready to paste into the S3 bucket's settings via the AWS Console:

S3 bucket's settings

and after applying the changes, requests to the old URLs are redirected:

Redirection in browser

Well, now I'm acquainted with SEO. I hope I'll have more time to post here in the future and to create interesting and searchable content.
