Monday, July 19, 2010

Case Study: Multi-tier Web Application - Disaster recovery in cloud computing


To understand the cost of providing DR in the cloud, we
first consider a common multi-tier web application architecture
composed of several web front ends connected to
a database server containing the persistent state for the
application. This scenario illustrates how the components
of an application can have different DR requirements.
The web servers in this example contain only
transient state (e.g., session cookies that can be lost without
significantly disrupting the application) and only require
a weak backup policy; we assume that all the front
ends can be recreated from a template image stored in the
backup site and do not require any other form of synchronization.
The database node, however, requires stronger
consistency and uses a disk-based replication scheme to
send all writes to a VM in the backup site. Applications
such as this are a natural fit for a cloud-based DR service
because fewer resources are required to replicate the
important state than to run the full application.

To analyze the cost of providing DR for such an application,
we calculate the Replication Mode and Failover
Mode costs of running DR for the RUBiS web benchmark.
RUBiS is an e-commerce web application that
can be run using multiple Tomcat servers and a MySQL
database [3]. Figure 1 shows RUBiS’s structure and how
it replicates state to the cloud. We calculate costs based
on resource usage traces recorded from running RUBiS
with 300 clients, and prices gathered from Amazon’s
Cost Comparison Calculator [1]; we have validated that
the colocation pricing information is competitive with offerings
from other providers.

Cost Breakdown: Figure 2(a) shows the yearly cost
of running the DR service in a public cloud or a private
colocation facility. In Replication Mode, the cloud approach
requires only one “small” VM to run the DR server,
whereas the colocation DR approach must always
be provisioned with the four “large” servers needed
to run the application during failover. Figure 2(b) shows
the resource requirements for both modes. The network
and IO consumption in Failover Mode includes the
web traffic between the live application and its clients, whereas
Replication Mode only includes the replicated state persisted
to the database. The storage cost for EC2 is based
on EBS volumes (Amazon’s persistent storage product)
and IO costs, whereas the colocation center storage cost
is included as part of the server hardware costs.

99% Uptime Cost: Since disasters are rare, most of
the time only the Replication Mode cost must be paid.
The best way to compare total costs is thus to calculate
the yearly cost of each approach for a given
level of downtime caused by disasters. Assuming a
99% uptime model in which a total of about 3.6 days of downtime per year
is handled by transitioning from Replication to Failover
Mode, the yearly cost of the cloud DR service comes to
only $1,562, compared to $10,373 with the colocation
provider, an 85% reduction (Figure 2(a)). This illustrates
the benefit of the cloud’s pay-as-you-go pricing model:
substantial savings can be achieved if the cost to synchronize
state to a backup site is lower than the cost of running
the full application.
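The comparison above can be reproduced with a few lines of arithmetic. The sketch below assumes that the yearly cost is a time-weighted blend of the Replication Mode and Failover Mode costs, which is one plausible reading of the 99% uptime model and not a formula stated in the text. The function names and the per-mode prices in the last line are hypothetical; only the $1,562 and $10,373 totals come from the text.

```python
def blended_yearly_cost(replication_cost, failover_cost, uptime=0.99):
    """Time-weighted yearly DR cost (assumed model, not taken from the source).

    replication_cost and failover_cost are the costs of running a full
    year in each mode; uptime is the fraction of the year spent in
    Replication Mode.
    """
    if not 0.0 <= uptime <= 1.0:
        raise ValueError("uptime must be a fraction between 0 and 1")
    return uptime * replication_cost + (1.0 - uptime) * failover_cost


def percent_reduction(baseline, alternative):
    """Savings of `alternative` relative to `baseline`, in percent."""
    return 100.0 * (baseline - alternative) / baseline


# Totals quoted in the text for the 99% uptime model (US dollars per year).
CLOUD_DR_YEARLY = 1562
COLO_DR_YEARLY = 10373

downtime_days = (1.0 - 0.99) * 365  # about 3.65 days per year
savings = percent_reduction(COLO_DR_YEARLY, CLOUD_DR_YEARLY)
print(f"downtime: {downtime_days:.2f} days, savings: {savings:.1f}%")

# Hypothetical per-mode prices, for illustration only.
print(blended_yearly_cost(replication_cost=1000, failover_cost=20000))
```

Under this model the colocation option gains little from the blend, since its four servers are provisioned in both modes, while the cloud option pays the higher Failover Mode price for only about 1% of the year.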

Cost of Adding DR: Our analysis so far has assumed
that the primary site runs on the user’s own private resources,
but it could also run in the cloud. However,
simply using cloud resources does not eliminate the
need for DR: it is still critical to run a DR service to
ensure continued operation if the primary cloud provider
is disrupted. Running the whole application in the cloud
costs $18,992 per year and using cloud DR in addition
only adds 8%. Running the application in a colo center
costs more in the first place ($24,095 per year) but adding
DR in a second colo facility increases the total cost by almost
42%. Finally, if a colocation center is used for the
primary site but a cloud is used for DR, then the incremental
cost of having DR is only 6.5%.
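
The incremental percentages above can be checked with a short calculation. This sketch assumes that the DR cost in each scenario is the 99% uptime yearly total quoted earlier ($1,562 for cloud DR, $10,373 for colocation DR); the text does not say which DR figure was used, and the function name is hypothetical.

```python
def incremental_percent(primary_yearly, dr_yearly):
    """Extra yearly cost of DR as a percentage of the primary site's cost."""
    return 100.0 * dr_yearly / primary_yearly


# Yearly figures quoted in the text (US dollars).
CLOUD_PRIMARY = 18992
COLO_PRIMARY = 24095
# Assumption: the DR cost is the 99% uptime total for each approach.
CLOUD_DR = 1562
COLO_DR = 10373

scenarios = {
    "cloud primary + cloud DR": (CLOUD_PRIMARY, CLOUD_DR),
    "colo primary + colo DR": (COLO_PRIMARY, COLO_DR),
    "colo primary + cloud DR": (COLO_PRIMARY, CLOUD_DR),
}

for name, (primary, dr) in scenarios.items():
    print(f"{name}: +{incremental_percent(primary, dr):.1f}%")
```

Under this assumption the two cloud DR scenarios come out near 8.2% and 6.5%, in line with the figures in the text. The colocation-to-colocation case comes out near 43%, slightly above the quoted "almost 42%", which suggests the source used a slightly different DR cost for that scenario.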
