Cloud Computing Lab: Implementing a Backup Utility

The objective of this lab is to implement a simple backup utility using cloud computing services. You may work in pairs or individually.

Background

We will be using the cloud computing services provided by Amazon, known as Amazon Web Services (AWS). The specific AWS service used in this lab is the Amazon Simple Storage Service (S3). S3 allows users to store data items, known as objects, in locations known as buckets. Technically, an object is an arbitrary array of bytes, but it's best to think of an object as being the same thing as a file stored on a computer's hard drive. Each object has a key, which is a string that can be used to retrieve the object, and is analogous to the full name of a file, including its parent folders. The data in an object is analogous to the contents of a file. In S3, a bucket is an abstract concept that can be thought of as a location where objects are stored. Very roughly, a bucket is analogous to a disk drive attached to a computer: just as you can store files on the disk drives attached to your computer, you can store S3 objects in the buckets in an S3 account. One important difference between S3 buckets and the file systems we are used to is that buckets have no notion of hierarchy: a bucket has no folders or directories, so its namespace is flat and every object has a unique key in the bucket's namespace.
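Because a bucket's namespace is flat, a backup utility has to decide for itself how a file's location on disk becomes a key, and how a key becomes a location again at restore time. The following is a minimal sketch of one possible mapping, not a required design: it uses the file's path relative to the backup root, with "/" between components, as the key. The class and method names (KeyMapper, toKey, toPath) are hypothetical, and the code assumes Java 7 or later for java.nio.file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Maps local file paths to flat, S3-style keys and back (illustrative names only). */
final class KeyMapper {
    private KeyMapper() {}

    /** Builds a key from the file's path relative to the backup root, joined with '/'. */
    static String toKey(Path root, Path file) {
        Path relative = root.toAbsolutePath().normalize()
                .relativize(file.toAbsolutePath().normalize());
        StringBuilder key = new StringBuilder();
        for (Path part : relative) {
            if (key.length() > 0) {
                key.append('/');
            }
            key.append(part.toString());
        }
        return key.toString();
    }

    /** Reverses toKey: resolves each '/'-separated component under the restore root. */
    static Path toPath(Path restoreRoot, String key) {
        Path result = restoreRoot;
        for (String part : key.split("/")) {
            result = result.resolve(part);
        }
        return result;
    }

    public static void main(String[] args) {
        Path root = Paths.get("photos");
        Path file = Paths.get("photos", "2010", "trip", "beach.jpg");
        String key = toKey(root, file);
        System.out.println(key);
        System.out.println(toPath(Paths.get("restored"), key));
    }
}
```

Joining the components with "/" explicitly, rather than relying on the platform's separator, keeps the keys identical whether the backup is taken on Windows or on a Unix-like system. The slash has no special meaning inside the bucket; it is simply part of the key string.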

Certain credentials are required to interact with S3. Specifically, you need an access key and a secret key. The access key is similar to a login name; it allows S3 to identify you. The secret key is similar to a password. For this lab, you have been provided with an access key, a secret key, and the name of a bucket. You have full privileges to read, write, and list objects within this bucket.
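Since the secret key plays the role of a password, one reasonable option is to keep the three values out of the source code and read them from a small properties file at startup. The sketch below is only a suggestion under that assumption: the class name LabCredentials and the property names accessKey, secretKey, and bucket are hypothetical, and nothing in the lab requires this layout.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Holds the three values handed out for the lab (property names are assumptions). */
final class LabCredentials {
    final String accessKey;
    final String secretKey;
    final String bucket;

    private LabCredentials(String accessKey, String secretKey, String bucket) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.bucket = bucket;
    }

    /** Reads key=value lines from the given source and checks that all three values are present. */
    static LabCredentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new IllegalStateException("could not read credentials", e);
        }
        return new LabCredentials(
                require(props, "accessKey"),
                require(props, "secretKey"),
                require(props, "bucket"));
    }

    private static String require(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing property: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Placeholder values; in practice the Reader would wrap a file kept outside the source tree.
        String text = "accessKey=EXAMPLEACCESSKEY\nsecretKey=examplesecret\nbucket=example-bucket\n";
        LabCredentials creds = load(new StringReader(text));
        System.out.println("bucket: " + creds.bucket);
    }
}
```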

Please note that your bucket resides within the instructor's personal AWS account. Amazon charges the instructor real money for use of this account. For any reasonable usage, the charges are minuscule: about 10 cents per gigabyte per month for storage, and one cent for every thousand objects transferred into or out of S3. Thus, please limit your backup experiments to at most a few thousand objects whose total size is less than a gigabyte. As with any other unacceptable behavior at the college, any abuse of your S3 access will be dealt with through the college disciplinary system. After the lab has been graded, your S3 credentials will be revoked and all objects in your bucket will be deleted.
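To get a feel for what an experiment costs, the two rates quoted above can be turned into a small calculation. This is a rough sketch using only those two figures (10 cents per gigabyte per month, one cent per thousand objects transferred); the class and method names are hypothetical, and Amazon's actual billing may include details not covered here.

```java
/** Rough cost estimate based only on the two rates quoted in this handout. */
final class S3CostEstimate {
    static final double STORAGE_CENTS_PER_GB_MONTH = 10.0;
    static final double TRANSFER_CENTS_PER_1000_OBJECTS = 1.0;

    private S3CostEstimate() {}

    /** Cost in cents of storing the given number of gigabytes for one month. */
    static double monthlyStorageCents(double gigabytes) {
        return gigabytes * STORAGE_CENTS_PER_GB_MONTH;
    }

    /** Cost in cents of transferring the given number of objects into or out of S3. */
    static double transferCents(long objects) {
        return objects / 1000.0 * TRANSFER_CENTS_PER_1000_OBJECTS;
    }

    /** Storage for the given number of months plus one transfer of every object. */
    static double totalCents(double gigabytes, long objects, int months) {
        return months * monthlyStorageCents(gigabytes) + transferCents(objects);
    }

    public static void main(String[] args) {
        // Upper end of the suggested limits: 3000 objects totalling 1 GB, kept for one month.
        System.out.println("estimated cents: " + totalCents(1.0, 3000, 1));
    }
}
```

At the upper end of the suggested limits, 3000 objects totalling one gigabyte and kept for a month, the estimate is 10 cents for storage plus 3 cents for the upload, or 13 cents; restoring all of the objects once would add another 3 cents.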

Basic Requirements

The final product of your lab should be an executable program that can be run from the command line, in two different modes: backup mode and restore mode. The first commandline argument should specify the mode, and the remaining arguments and program behavior will depend on the mode, as follows:

Apart from these high-level requirements, the behavior of the program is left unspecified. You are free to use creativity in designing a useful, robust, and user-friendly backup utility. In particular, your program may use additional commandline arguments.
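As a starting point, the dispatch on the first commandline argument might look something like the sketch below. The mode strings "backup" and "restore", the class name BackupCli, and the usage message are assumptions made for illustration; the handling of the remaining arguments is deliberately left as a placeholder.

```java
import java.util.Arrays;
import java.util.Locale;

/** Skeleton for choosing between backup and restore mode (names and strings are illustrative). */
final class BackupCli {
    enum Mode { BACKUP, RESTORE }

    private BackupCli() {}

    /** Interprets the first argument as the mode; rejects anything else. */
    static Mode parseMode(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("missing mode");
        }
        String mode = args[0].toLowerCase(Locale.ROOT);
        if (mode.equals("backup")) {
            return Mode.BACKUP;
        }
        if (mode.equals("restore")) {
            return Mode.RESTORE;
        }
        throw new IllegalArgumentException("unknown mode: " + args[0]);
    }

    /** Returns every argument after the mode, to be interpreted by the chosen mode. */
    static String[] remainingArgs(String[] args) {
        return Arrays.copyOfRange(args, Math.min(1, args.length), args.length);
    }

    public static void main(String[] args) {
        try {
            Mode mode = parseMode(args);
            String[] rest = remainingArgs(args);
            switch (mode) {
                case BACKUP:
                    System.out.println("backup requested with " + rest.length + " further argument(s)");
                    break;
                case RESTORE:
                    System.out.println("restore requested with " + rest.length + " further argument(s)");
                    break;
            }
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.err.println("usage: <backup|restore> [mode-specific arguments]");
            System.exit(2);
        }
    }
}
```

Separating the parsing from the work done in each mode makes it easier to add further commandline arguments later without disturbing the dispatch logic.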

Possible extensions

The requirements given above describe a bare-bones backup utility. Optional extensions that would make this utility more useful include the following:

Hints for using S3 with Java

Amazon provides many ways of interacting with S3, including Java, Perl, PHP, C#, Python, and Ruby. You are welcome to use any programming language and programming tools for this lab, but these notes will provide tips only for Java, and will assume you're developing within the Eclipse environment.

Assuming you are using Java, you'll need the following zip file of code libraries and documentation: CloudLab.zip. Unzip the file, create a new Eclipse project, and add the four .jar files to its build path ("add External JARs" in Java Build Path properties). Add the AWS JavaDoc to the project by associating it with the file aws-java-sdk-1.1.1.jar (right-click on this file in Package Explorer, choose Properties, choose JavaDoc location, and navigate to the directory aws-javadoc in your unzipped version of CloudLab.zip).

Following are some code snippets that provide simple examples of some of the basic functionality you will need from S3.

For further details, consult the online JavaDoc for AWS, and other documents as needed from Amazon's online S3 documentation.

Submission

Submit a single zip file of all your source files to Moodle.

Grading

The code will be graded on the following criteria: correctness, clarity, elegance, and efficiency. Documentation is particularly important in this assignment. Make sure to include comments explaining how to run your utility, including the precise meaning of all commandline arguments.

It will be possible to achieve an excellent grade on the assignment by implementing only the bare-bones utility described above in the "Basic Requirements" section. A reasonable attempt at any of the extensions suggested in the "Possible extensions" section above will receive up to 10% extra credit. If you attempt any of the extensions, clearly indicate this in the comments at the top of your main source file.