In this post I will go through the steps to easily set up an S3 cross-region migration, or to start replication if you want to keep the buckets synced.
What will we use today?
- AWS Cross Region Replication
- Python + Boto
- A simple script
Amazon Web Services launched a wonderful service called CRR (Cross-Region Replication) around 4 years ago. Cross-region replication (CRR) enables automatic, asynchronous copying of objects across buckets in different AWS Regions.
Lovely, no? We just go into our console, follow the simple steps to turn on CRR, and we are done. Buuuuuuuuuuuuuut there is a little problem in our plans, my dear reader. If you dig a little deeper you will see this dreadful line in the documentation:
Objects created after you add a replication configuration, with exceptions described in the next section.
Which means that none of your existing data will be replicated! This doesn't matter if you are starting from zero, but if you already have a lot of data it is far from ideal.
So the rest of this post is about how to make CRR think that the old files are actually new files, so that they get replicated by CRR.
So before you move on, please enable CRR and make sure you create a lifecycle rule to delete previous versions of your S3 objects, or your storage will grow quite fast.
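You can set that lifecycle rule from the console, but it can also be done from code. Here is a minimal sketch using boto3's `put_bucket_lifecycle_configuration`; the rule ID and the 7-day window are my choices, not from the original post:

```python
def expire_old_versions(s3, bucket, days=7):
    """Add a lifecycle rule that deletes noncurrent (previous) object versions."""
    config = {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {},  # empty filter: apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
    return config


if __name__ == "__main__":
    import boto3  # AWS SDK for Python

    expire_old_versions(boto3.client("s3"), "Your_Bucket")
```

Tune `days` to how long you want to keep the pre-touch versions around before S3 cleans them up.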
You could run the script on either your laptop or an external server (e.g. EC2). That depends on you and the volume of data.
With boto3 we will be able to call AWS directly from Python. The first step is to give the script the necessary permissions. For this you need a pair of access keys; please follow the AWS tutorials to get a nice pair of keys.
So the trick to fool CRR is quite simple. The script adds a piece of metadata called CRR to each object in the bucket, stamping the day and time the script runs.
By changing the metadata, CRR thinks each object is a new file and replicates it! So you only need to run this script once per bucket, and voilà, CRR will do the rest.
So there are 3 places where you will likely need to add your own information in the script.
1. The secret keys mentioned above

```python
# boto3 is the library to use AWS in Python
import boto3

# Start a session with your keys
session = boto3.Session(
    aws_access_key_id='AWS_SERVER_PUBLIC_KEY',
    aws_secret_access_key='AWS_SERVER_SECRET_KEY',
)
```
2. The bucket name
```python
# The bucket that you want to copy
bucket = "Your_Bucket"
```
3. The metadata you wish to add
And that is all you need! So, at last, here is the script:
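Since the embedded script itself doesn't appear in the text, here is a sketch of what it looks like based on the description above. It walks every object with a paginator (plain `list_objects_v2` returns at most 1,000 keys per call), touches each one, and prints the metrics mentioned below; the function names are mine:

```python
import datetime
import time


def touch_bucket(s3, bucket):
    """Stamp every object in `bucket` with new metadata so CRR replicates it.

    Returns (objects_modified, total_bytes).
    """
    count = 0
    total_bytes = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
            # Copy the object onto itself with new metadata; CRR sees a new object
            s3.copy_object(
                Bucket=bucket,
                Key=obj["Key"],
                CopySource={"Bucket": bucket, "Key": obj["Key"]},
                Metadata={"crr": stamp},
                MetadataDirective="REPLACE",
            )
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes


if __name__ == "__main__":
    import boto3

    # Start a session with your keys
    session = boto3.Session(
        aws_access_key_id="AWS_SERVER_PUBLIC_KEY",
        aws_secret_access_key="AWS_SERVER_SECRET_KEY",
    )
    s3 = session.client("s3")

    bucket = "Your_Bucket"  # the bucket that you want to copy
    start = time.time()
    count, total_bytes = touch_bucket(s3, bucket)
    print(f"Modified {count} objects ({total_bytes / 1024 ** 3:.2f} GB) "
          f"in {time.time() - start:.0f} s")
```

One caveat worth knowing: `copy_object` is limited to objects up to 5 GB; for larger objects you would need a multipart copy (boto3's managed `copy` method handles that).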
To run it, just save it on the computer you wish to use and launch it from a terminal.
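The post doesn't show the exact command, so here is a minimal sketch; the file name `crr_touch.py` is my choice, not from the original:

```shell
# Install the only dependency, then run the script (file name is hypothetical)
python3 -m pip install boto3
python3 crr_touch.py
```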
At the end, if you left my metrics in, you should be greeted with how many objects were modified, how many GBs, and the execution time. For really big buckets I do recommend removing these parts of the code.
Just so you have an idea: in my actual use of this script it took around 42 seconds per GB on average.
And zero data loss, haha, just if you were wondering.