Friday, July 16, 2010

OVERVIEW OF AWS

Amazon Web Services (AWS) provides companies with an infrastructure web services platform in the cloud. AWS allows you to requisition compute power, storage and distributed computing services on demand. The biggest advantage is the flexibility to choose the development platform or language that makes the most sense for your application. You pay only for what you use, with no up-front expenses or long-term commitments. AWS guarantees users server scalability to whatever number of servers is required, no matter how fast the application grows or how big it gets. This property allows it to absorb the peaks and valleys of your application’s workload. If one node fails, another can be requisitioned within a matter of minutes, instead of the hours that would be required in a typical data center. This enables an always-on and self-healing infrastructure, without ever having to configure or replace hardware.

5.1. AMAZON EC2
The Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud. It allows leasing small, large or extra-large instances of virtual servers with one, two and four virtual cores respectively, running either at normal or at high CPU capacity. Such virtual servers are measured in EC2 Compute Units, where one EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Two types of servers are available: those that are launched and terminated on demand, and those that are reserved. Whilst on-demand and reserved instances are equivalent in terms of functionality, their respective cost models are quite different [5].

Each server is available as an Amazon Machine Image (AMI). An AMI is an encrypted file stored in Amazon's persistent storage service, called Simple Storage Service or S3 [6]. It is uniquely identified by an eight-digit identifier of the form ami-xxxxxxxx [7, 8]. An AMI contains all the information necessary to boot an instance of the virtual machine with a guest OS and relevant applications. Amazon makes a number of public AMIs available in different flavors such as Linux, Windows and OpenSolaris. A number of customized AMIs are also publicly available, such as the customized Ubuntu AMI by Alestic [9]. Developers are free to use the images already made available by Amazon, or to create one specific to their application. These AMIs can be used as they are, or customized and repackaged. A running instance of an AMI is called an EC2 instance [5]. Repackaging is done through a process called bundling, whereby the new AMI is uploaded to S3 and registered with Amazon [7, 8].

Amazon's EC2 infrastructure is built using a large number of machines based on x86 hardware running Xen [10]. Rather than controlling the instances (which are Xen DomU guests in Xen terminology) from the host machine using the xm command, as in a traditional Xen setup, one controls them through an XML web services API. Amazon also provides a set of Java command-line tools called ec2-tools, which implement the XML web services API so that instances can be controlled from the command line if preferred [11, 12]. To start an instance, the API instructs EC2 to download a series of encrypted and compressed 10 MB chunks of data for the particular image from Amazon's S3 service; EC2 then decrypts and decompresses these chunks to reassemble the image, and subsequently boots the operating system. The kernel on the AMI is replaced with a Xen-enabled 2.6.16 kernel compiled with GCC 4.0, because Amazon does not allow custom kernels.
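As a back-of-the-envelope illustration of how EC2 Compute Units describe capacity, the sketch below totals the ECU rating of a mixed fleet. The per-type ECU figures are assumptions based on Amazon's published ratings at the time of writing, not values the text above states explicitly.

```python
# Illustrative only: assumed 2010-era ECU ratings per instance type.
ECU_PER_INSTANCE = {
    "m1.small": 1,    # 1 virtual core x 1 ECU
    "m1.large": 4,    # 2 virtual cores x 2 ECU
    "m1.xlarge": 8,   # 4 virtual cores x 2 ECU
}

def fleet_capacity(fleet):
    """Total ECU capacity of a fleet given as {instance_type: count}."""
    return sum(ECU_PER_INSTANCE[t] * n for t, n in fleet.items())

print(fleet_capacity({"m1.small": 4, "m1.large": 2}))  # 4*1 + 2*4 = 12
```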

5.1.1 FEATURES
The total boot time of an Amazon EC2 instance is related to the size of the AMI; Amazon EC2 (Linux) allows the user to boot a server instance within a few minutes [5]. This guarantees near-instant availability of additional servers in cases where an application needs to scale up dynamically, typically when slashdotting occurs [13]. Once these servers are up and running, no CPU latency is observed, since EC2 is built on Xen virtualization, which allocates each virtual machine CPU time of its own, with no sharing.

The Elastic Compute Cloud employs a pay-as-you-go model with no minimum usage fee. The services are available in two regions, the US and Europe, with four distinct availability zones in the US and two in Europe. This gives a developer the freedom to choose a region based on observed network latencies and geographical preferences. Each developer is given full control of his/her instance through root access to the machine. This enables a user to deploy custom software on each instance with administrative rights, and to configure application-level security settings as well. To run the same configuration on multiple servers, the same image is booted n times across n Amazon servers, hence minimizing system administration effort. Startup companies such as RightScale offer suites of system administration tools for managing Amazon infrastructure [14].
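The pay-as-you-go model reduces to a simple product of hours used and an hourly rate, with no up-front component. A minimal sketch follows; the hourly rates are assumed time-of-writing figures for US Linux instances, not values quoted in this text.

```python
# Assumed 2010-era on-demand hourly rates (USD) for US Linux instances.
HOURLY_RATE_USD = {"m1.small": 0.085, "m1.large": 0.34}

def on_demand_cost(instance_type, hours):
    """Pay-as-you-go: cost is purely hours x rate, no up-front fee."""
    return round(HOURLY_RATE_USD[instance_type] * hours, 2)

# Ten small instances absorbing a 24-hour load spike:
print(on_demand_cost("m1.small", 10 * 24))  # 240 h x $0.085 = $20.40
```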

EC2 is particularly cost-effective, given its near-instant launching of server instances and pay-per-use cost model, for applications that experience unpredictable workload patterns. This type of demand pattern necessitates dynamic scaling of servers in order to satisfy QoS guarantees [15]. Amazon EC2 provides a wide range of machine types, application runtimes and development environments, allowing a developer to choose the required platform; REST and SOAP APIs in several languages (Java, PHP, etc.) are provided in order to integrate this functionality within code as well [16].
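Dynamic scaling to meet QoS guarantees is, at its simplest, a threshold policy over a load metric. The toy decision function below is a sketch of that idea only; the thresholds and the single-metric design are illustrative assumptions, not part of EC2 itself.

```python
def desired_servers(current, avg_cpu_pct, scale_up_at=70, scale_down_at=30):
    """Toy threshold policy: add a server when average CPU is high,
    release one when it is low, never dropping below one server."""
    if avg_cpu_pct > scale_up_at:
        return current + 1
    if avg_cpu_pct < scale_down_at and current > 1:
        return current - 1
    return current

print(desired_servers(3, 85))  # 4 (scale up under load)
print(desired_servers(3, 20))  # 2 (release an idle server)
```

In practice the "add a server" step maps to booting another copy of the same AMI, which is what makes the near-instant launch time significant.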

As an added security feature, Amazon allows users to only present web servers to the Internet. This is done by not having a public IP address associated with every EC2 instance running. For a multi-tier application, you can choose to have a public IP address on your web server and have just an internal IP address on your database server. To access the database server, you would have to log into the web server and then ssh, telnet or RDP (Remote Desktop Protocol) from there to the database server.

5.1.2 LIMITATIONS
Once an EC2 instance is shut down, all data on that instance's hard disk is lost; hence, there is no provision for persistent storage [17]. For user data, this means having a predefined backup policy for copying data between S3 and EC2. For system software, backup is more difficult, since changes may mean re-bundling and registering a new AMI, which takes between 4 and 15 minutes depending on the size of the image and the power of the EC2 instance. This can be alleviated by taking a snapshot of the EC2 instance every hour, since the cost of data transfer between EC2 and S3 is very low (i.e. 0.01 cents per GB).
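To see why hourly snapshots are affordable, it helps to put numbers on the transfer cost. The sketch below uses the per-GB figure quoted above, taken at face value as an assumption.

```python
# Per-GB EC2<->S3 transfer cost as quoted in the text: 0.01 cents = $0.0001.
TRANSFER_COST_PER_GB_USD = 0.0001

def monthly_backup_transfer_cost(gb_per_snapshot, snapshots_per_day=24, days=30):
    """Transfer cost of an hourly snapshot policy over one month."""
    total_gb = gb_per_snapshot * snapshots_per_day * days
    return total_gb * TRANSFER_COST_PER_GB_USD

# Snapshotting 5 GB every hour: 720 snapshots, 3600 GB moved.
print(monthly_backup_transfer_cost(5))
```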

Amazon instances do not have static IP addresses. This means that every time an instance is shut down and started again, it acquires a different IP address. This is a major drawback for applications that rely on mail servers, since a mail server effectively requires a static IP address. A reverse proxy approach does not work in this case, because reverse lookups are used as a measure to combat spam. One needs a different static IP address for each server to be run, preferably in different class C networks, because search engines distrust dynamic IP addresses. Hence, the requirement for most developers today is a static IP per running instance, so that outgoing e-mail can pass spam filters. Another issue with IP addresses is that DNS resolutions are not cached for an extended period of time, and Amazon cannot guarantee the validity of an IP address at any given time. To overcome the static IP address problem, Amazon provides the Elastic IP address service, through which a static public IP can be assigned to an EC2 instance.

EC2 provides support only at the instance level; there is no support for dynamic scalability at the application level. This would be a useful feature for most developers, but it is missing from EC2 today.

On the operating system front, reserved instances available from EC2 currently support only variations of Linux, and no public Macintosh AMIs are available. While custom images can be created from scratch for Linux, this customization is restricted to systems using the 2.6 kernel; older versions of the Linux kernel are not supported. This would mean refactoring the many legacy applications still running on 2.4.x kernels.

Considering the pricing model, reserved instance charges apply only when instances are run in the same availability zone. EC2 also places a maximum threshold of 20 instances per account that can run concurrently at any given time; to run more than 20, an additional request form must be filled out. Replication to a local server incurs additional network cost. Amazon EC2 support, which is at the instance level only, is billed as an extra charge.

EC2 defines an Annual Uptime Percentage of at least 99.95% in its SLA [18]. If the Annual Uptime Percentage for a customer drops below 99.95%, that customer is eligible to receive a service credit equal to 10% of their bill for the eligible credit period. At present, Amazon does not offer an SLA on the I/O performance of EC2 instances. Furthermore, Amazon EC2 is based on x86 hardware, so applications built for other architectures (e.g. UltraSPARC) cannot be deployed on it. Finally, an application that requires an 8-core CPU cannot be migrated to or deployed on the Amazon EC2 platform, since no instance type offers that many cores.

5.2 AMAZON S3
Amazon Simple Storage Service (S3) provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites [19].

5.2.1 THE S3 STORAGE MODEL
S3 stores data as named “objects” that are grouped in named “buckets” [19]. Buckets must be explicitly created before they can be used, and each user may create up to 100 buckets. Bucket names are globally unique; an S3 user attempting to create a bucket named “foo” will be told that the name is already in use. S3 is designed primarily to store large objects, and an object may contain any byte string between 1 byte and 5 GB. For each object, S3 maintains a name, modification time, an access control list, and up to 4 KB of user-defined metadata. Each bucket also has an access control list, allowing read or read/write permission to be given to specific other AWS users or to the world.

5.2.2 FEATURES
S3 provides persistent storage for EC2 instances, the one thing that EC2 instances lack. Moreover, S3 is fast and scalable: it allows an unlimited number of objects to be stored in buckets. Additionally, in order to increase the availability of data, Amazon allows an S3 bucket to be shared between multiple EC2 instances. Snapshots of user or application data can be stored on S3 and accessed from any availability zone.

In terms of throughput, S3 has a maximum throughput of approximately 20 MB/s (single-threaded) or 25 MB/s (multi-threaded) for a small instance, rising to 50 MB/s on the large and extra-large instances. S3 is slow at file listing, and search is by key prefix only; S3 performance can therefore be improved by spreading keys across multiple buckets. Write performance is optimized by writing keys in sorted order.
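Prefix-only search is worth making concrete: over a sorted key index, a prefix query is a binary search followed by a contiguous scan, which is why sorted keys help. The snippet below sketches that lookup locally; it is an illustration of the access pattern, not S3's implementation.

```python
import bisect

def list_by_prefix(sorted_keys, prefix):
    """Return all keys starting with `prefix`, given a sorted key list.
    Binary search finds the start; matches are then contiguous."""
    start = bisect.bisect_left(sorted_keys, prefix)
    out = []
    for k in sorted_keys[start:]:
        if not k.startswith(prefix):
            break
        out.append(k)
    return out

keys = sorted(["logs/2010/07/15", "logs/2010/07/16", "img/a.png"])
print(list_by_prefix(keys, "logs/2010/07/"))
```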

Amazon provides public APIs for S3 in Java, PHP and other languages, to facilitate storing and retrieving data from S3 from within an application [20]. S3 follows a similar payment model to that of EC2, with no up-front minimum fees payable to Amazon for usage of S3.

5.2.3 LIMITATIONS
S3 is subject to “eventual consistency”, which means that there may be a delay before writes become visible in the system. In terms of performance, S3 suffers from higher latency as well as higher variability in latency, and S3 write latency can be higher than read latency. S3 delivers dramatically faster throughput on large objects than on small ones, due to a high per-transaction overhead. It is designed to fail requests quickly when they encounter problems; it is the client’s responsibility to retry failed requests until they succeed. This differs from traditional web servers, which implement a “best effort” policy for satisfying web requests.

S3 supports PUT, GET and DELETE primitives, but there is no way to copy or rename an object, move an object to a different bucket, or change an object’s ownership. Although these operations can be emulated by combining PUT, GET and DELETE, it can take days to move a few terabytes of data from one bucket to another using successive PUTs and GETs. Furthermore, moving data this way can incur significant data transfer charges unless the GETs and PUTs are done from EC2.
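The emulation the text describes is simply GET, PUT, DELETE in sequence. Sketched over a toy dict-backed store below; the point is that each "rename" is a full download and re-upload of the object's data, which is why moving terabytes this way is slow and costly.

```python
# Toy dict-backed store standing in for S3; keys are (bucket, key).
store = {("bucket-a", "old-key"): b"payload"}

def get(bucket, key):    return store[(bucket, key)]
def put(bucket, key, d): store[(bucket, key)] = d
def delete(bucket, key): del store[(bucket, key)]

def rename(src_bucket, src_key, dst_bucket, dst_key):
    """'Rename' composed from the three primitives S3 does offer."""
    data = get(src_bucket, src_key)   # full download of the object
    put(dst_bucket, dst_key, data)    # full re-upload under the new name
    delete(src_bucket, src_key)       # remove the original

rename("bucket-a", "old-key", "bucket-b", "new-key")
print(sorted(store))  # [('bucket-b', 'new-key')]
```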

S3 allows only 100 buckets per user account; if additional buckets are needed, an application form must be filled out and approved by Amazon. The maximum size of an object in a bucket is 5 GB. The S3 SLA specifies a monthly uptime percentage of at least 99.9% during any monthly billing cycle [15]. For application-critical data this may prove insufficient.

Fundamentally, S3 is not ideal for database-style querying, nor for content distribution in the manner of a filesystem. When storing object data, developers can associate metadata with each object. Metadata entries are key-value associations stored with the object, and developers may create whatever entries are necessary to support the application; however, Amazon does not publish a maximum number of metadata entries that may be associated with an object [21].

Not all businesses or services may be comfortable storing their data in the ‘cloud’, especially those with extremely sensitive and confidential data, e.g. financial data [21]. Although S3 promises 99.99% uptime, there were two major outages, in February and July of 2008, which caused major disruption to services such as Twitter, and back in 2007 S3 suffered read/write speed issues [21]. Finally, Amazon does not provide de-duplication (version differencing) at the S3 level.

5.3 AMAZON EBS
Amazon Elastic Block Store (EBS) offers persistent storage for Amazon EC2 instances [22]. Amazon EBS defines an EBS volume that provides off-instance storage that persists independently from the life of an instance. Amazon EBS volumes are highly available and reliable which can be attached to a running Amazon EC2 instance and are standard block devices. EBS volumes offer greatly improved durability over local Amazon EC2 instance stores, as they are automatically replicated on the backend (in a single Availability Zone). For those wanting even more durability, EBS provides the ability to create point-in-time consistent snapshots of your volumes that are then stored in Amazon S3, and automatically replicated across multiple Availability Zones. These snapshots can be used as the starting point for new Amazon EBS volumes, and can protect data for long term durability. EBS is ideal for databases and file systems.

5.3.1 FEATURES
EBS provides block storage volumes that can be formatted with a file system of your choice (e.g. ext3 for Linux or NTFS for Windows). EBS is not subject to eventual consistency, and it exhibits lower latency with less variation than S3. It also employs a write-back caching policy for very low write latency. Unlike S3, it offers fast directory listing and searching.

EBS follows the same pay-as-you-go model, with no minimum fees, as the rest of the web services offered by Amazon [22]. Snapshots of public datasets related to demographics, biology and chemistry are available, and new volumes can be pre-loaded with these datasets [23]. Datasets currently available on AWS include Human Genome Data from ENSEMBL, the PubChem Library from Indiana University, and various census databases from the US Census Bureau.

5.3.2 LIMITATIONS
EBS volumes can only be attached to instances running in the same availability zone. Hence, to make data stored on a volume, say vol1, in zone us-east-1c available to an instance running in us-east-1b, we would need to create a snapshot of vol1 and load that snapshot into a newly created volume, say vol2, in zone us-east-1b. This incurs additional data transfer cost.
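The cross-zone workaround just described can be modelled with two plain functions: snapshot the volume (snapshots live in S3, so they are zone-independent), then materialize a new volume from the snapshot in the target zone. This is a sketch of the workflow's logic over in-memory dicts, not an AWS client; the volume and snapshot names are illustrative.

```python
# Toy model: snapshots are zone-independent (stored in S3); volumes
# are bound to a single Availability Zone.
snapshots = {}
volumes = {"vol1": {"zone": "us-east-1c", "data": b"db files"}}

def create_snapshot(vol_id, snap_id):
    """Snapshot a volume's contents to zone-independent storage."""
    snapshots[snap_id] = volumes[vol_id]["data"]

def create_volume_from_snapshot(snap_id, vol_id, zone):
    """Materialize a new volume from a snapshot in the target zone."""
    volumes[vol_id] = {"zone": zone, "data": snapshots[snap_id]}

create_snapshot("vol1", "snap1")
create_volume_from_snapshot("snap1", "vol2", "us-east-1b")
print(volumes["vol2"]["zone"])  # us-east-1b
```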

For a 20 GB volume, Amazon estimates the annual failure rate of EBS volumes at between 1-in-1000 and 1-in-200, and the failure rate increases as the size of the volume increases. Therefore you need either to keep an up-to-date snapshot on S3, or to have a backup of the contents elsewhere, so that service can be restored quickly in the event of a failure. EBS has a maximum throughput defined by the network: approximately 25 MB/s on a small instance, 50 MB/s on large instances, and 100 MB/s on extra-large instances.
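To see why the quoted failure rates justify a snapshot policy, a quick expected-value calculation helps. The arithmetic below simply multiplies the fleet size by the quoted annual failure rates; the 500-volume fleet is a hypothetical example.

```python
def expected_annual_failures(n_volumes, afr):
    """Expected volume failures per year at a given annual failure rate."""
    return n_volumes * afr

# The quoted range: 1-in-1000 to 1-in-200 per 20 GB volume per year.
low, high = 1 / 1000, 1 / 200
print(expected_annual_failures(500, low), expected_annual_failures(500, high))
# Even a modest 500-volume fleet expects roughly 0.5 to 2.5 failures a year,
# hence the advice to keep S3 snapshots current.
```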
The maximum number of volumes that can be used concurrently by an account is 20 [22]; use of additional volumes requires an additional request form to be filled out. Amazon does not provide de-duplication at the EBS level, and does not offer an SLA on EBS. Finally, only one EC2 instance can be attached to an EBS volume at a time, i.e. sharing a single EBS volume amongst two or more EC2 instances is not feasible.
6. A
