Planning Availability in the Cloud: The Laws of Physics Still Apply!

Just like on-premises applications, cloud applications need carefully planned availability. In this blog post I’ll discuss different levels of availability for applications hosted on Windows Azure.

Ultimately, the level of availability you choose should be a business decision that balances cost against your tolerance for downtime, that is, how many “nines” of availability you need.
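To make the “nines” concrete, here is a small Python sketch that converts an availability percentage into the downtime it allows per month. It assumes a 30-day month, and the function name is illustrative only, not part of any Azure tooling.

```python
# Convert an availability target ("number of nines") into the maximum
# downtime it permits over a period. A 30-day month is assumed.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the downtime (in minutes) permitted by an availability percentage."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

At 99.95%, for example, the monthly budget works out to about 21.6 minutes, so a single outage of 20 minutes or more would use up nearly all of it.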

To guide us through this discussion, I’ll use a typical Windows Azure application as an example. Our sample application has three primary components: Windows Azure compute instances, Blob Storage data, and SQL Azure data.

Compute: The compute component consists of a single web role that hosts two to three extra-large instances. Extra-large instances are used for their high I/O capacity and large number of CPU cores, which the application needs for dynamic image scaling. The web role hosts an ASP.NET MVC application that serves both an HTML5 and a Silverlight version of the application. The application also uses a large portion of the instances’ local file storage to cache resized images.
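The local image cache follows a get-or-compute pattern: serve a resized image from local disk if it is already there, otherwise fetch the original, resize it, and store the result. The Python sketch below illustrates that pattern; `fetch_original` and `resize` are hypothetical placeholders for the blob download and the actual image-scaling code, and the hashed cache key is an assumption rather than the application’s real scheme.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_resize(image_name: str, width: int, height: int, cache_dir: Path,
                  fetch_original: Callable[[str], bytes],
                  resize: Callable[[bytes, int, int], bytes]) -> bytes:
    """Return a resized image, using a local-disk cache keyed by name and size."""
    # Hypothetical cache key: a hash of the image name plus the target dimensions.
    key = hashlib.sha256(f"{image_name}:{width}x{height}".encode("utf-8")).hexdigest()
    cached = cache_dir / key
    if cached.exists():
        return cached.read_bytes()  # cache hit: no blob download, no resize
    original = fetch_original(image_name)  # e.g. download from blob storage
    resized = resize(original, width, height)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(resized)  # store for later requests on this instance
    return resized
```

Because the cache lives on an instance’s local disk, a newly deployed instance (for example, one brought up in another data center) starts with an empty cache and must rebuild it from blob storage.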

Blob Storage: The Blob (Binary Large Object) Storage component is being used as the source image repository. It contains the original images and thumbnails for all content. The approximate size of the blob container is 1 TB.

SQL Azure: The SQL Azure database stores relational content for the application. The database is approximately 2 GB in size.

Now, Let’s Talk Availability Options.

The availability options range from no potential downtime to more than 20 minutes of downtime. As expected, the cost of each option is inversely correlated with the length of potential downtime: the shorter the downtime, the higher the cost.

Minimal to No Potential Downtime: Hot Swappable Environments

This scenario provides minimal to no downtime by relying on backup versions of the application hosted in one or more additional data centers. In this implementation, fully functional duplicates of the entire application stack, including the hosted service, Blob Storage, and SQL Azure components, run in each data center. Windows Azure Traffic Manager routes application requests to the various data centers. A policy can be defined to keep all data centers in active rotation or simply to provide failover. The only data synchronization required is for SQL Azure, which can be handled with SQL Azure Data Sync. Each data center’s blob storage can be maintained as an independent copy because it is primarily used for read-only access to content.
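The difference between keeping all data centers in active rotation and using them only for failover can be illustrated with a toy router. This Python sketch is a hypothetical model, not the Traffic Manager API; it loosely mirrors a failover policy (first healthy endpoint in priority order) and a round-robin policy (rotate across all healthy endpoints). The endpoint names are placeholders.

```python
from itertools import count
from typing import Dict, List, Optional


class Router:
    """Toy model of DNS-level routing across data centers (not the Traffic Manager API)."""

    def __init__(self, endpoints: List[str], policy: str):
        if policy not in ("failover", "round_robin"):
            raise ValueError("policy must be 'failover' or 'round_robin'")
        self.endpoints = endpoints  # listed in priority order
        self.policy = policy
        self._counter = count()

    def route(self, healthy: Dict[str, bool]) -> Optional[str]:
        """Pick an endpoint for the next request, given current health results."""
        live = [e for e in self.endpoints if healthy.get(e, False)]
        if not live:
            return None  # every data center is down
        if self.policy == "failover":
            return live[0]  # highest-priority healthy data center
        return live[next(self._counter) % len(live)]  # spread load across all healthy ones
```

With a failover policy the backup data center receives traffic only when the primary is unhealthy; with round robin both stay in active rotation, which is why SQL Azure Data Sync has to keep the databases in step.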

Note: If you prefer not to use Windows Azure Traffic Manager, this scenario can also be supported with a CNAME record and an external management service.
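An external management service in this setup would periodically probe each deployment and decide which hostname the CNAME record should point at. The Python sketch below shows one monitoring cycle under stated assumptions: `probe` is a hypothetical health-check callable, the hostnames are placeholders, and the three-failure threshold is an arbitrary choice to avoid flapping. Actually updating the DNS record depends on your DNS provider and is not shown.

```python
from typing import Callable, Dict, List


def choose_cname_target(current: str, candidates: List[str],
                        failures: Dict[str, int], probe: Callable[[str], bool],
                        threshold: int = 3) -> str:
    """Run one probe cycle and return the hostname the CNAME should point at."""
    for host in candidates:
        if probe(host):
            failures[host] = 0  # healthy: reset its consecutive-failure count
        else:
            failures[host] = failures.get(host, 0) + 1
    if failures.get(current, 0) < threshold:
        return current  # keep the current target until it fails `threshold` probes in a row
    for host in candidates:
        if host != current and failures.get(host, 0) == 0:
            return host  # repoint to a deployment that passed its latest probe
    return current  # nothing healthier is available
```

Requiring several consecutive failures before switching trades a few extra minutes of detection time for protection against repointing DNS on a single transient error.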

Expected Costs: This scenario requires Windows Azure Traffic Manager and a fully functional backup deployment, which adds compute, blob storage, SQL Azure, SQL Data Sync, and associated bandwidth costs. The estimated cost for this scenario is $3153.45 per month (as outlined in the Cost Breakdown graphic below).

Less Than 20-Minute Potential Downtime: Backup Storage

This scenario protects against extended blob storage and SQL Azure outages, but does not address the downtime needed to spin up additional compute resources. Backup copies of the data repositories are kept in another data center. When an outage is detected, an administrator must manually deploy a new instance of the hosted application that points to the backup storage. This scenario also requires a CNAME record to provide consistent access to the application. Its primary benefits are that no downtime is needed to bring the data content online and that there is no cost for a set of backup compute instances running continuously. There is still a period of downtime, however, while the new hosted service is brought online.
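The manual failover comes down to two things: deploying the service with its storage settings pointed at the backup data center, and knowing roughly how long the runbook takes end to end. The Python sketch below models both. The setting names and hostnames are hypothetical, and the step durations are values you would supply from your own rehearsals, not published Azure figures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunbookStep:
    description: str
    est_minutes: float  # your own estimate for this step, not a published Azure figure


def backup_settings(settings: Dict[str, str], backup: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the service settings with storage entries replaced by the backup ones."""
    updated = dict(settings)  # leave the primary configuration untouched
    updated.update(backup)
    return updated


def estimated_downtime(steps: List[RunbookStep]) -> float:
    """Downtime runs from outage detection until the CNAME points at the new deployment."""
    return sum(step.est_minutes for step in steps)


if __name__ == "__main__":
    primary = {"BlobEndpoint": "https://myappprimary.blob.core.windows.net",
               "SqlServer": "primarysrv.database.windows.net"}
    print(backup_settings(primary, {"BlobEndpoint": "https://myappbackup.blob.core.windows.net",
                                    "SqlServer": "backupsrv.database.windows.net"}))
```

Rehearsing the runbook and recording real step times is what turns the “less than 20 minutes” target into something you can verify.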

Note: Windows Azure Geo-Replication is not accessible to customers; it is activated only in the event of a critical data failure and at Microsoft’s discretion. This scenario addresses the case where a data center failure occurs but Microsoft chooses not to fail over to the alternate data center.

Expected Costs: This scenario requires backup blob storage and SQL Azure components. The additional costs are for blob storage, SQL Azure storage, SQL Data Sync, and associated bandwidth. The estimated cost for this scenario is $1713.45 per month.
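Since the choice is a business decision, it helps to express the gap between the two estimates as the price of the extra availability. This Python snippet uses only the two monthly figures given in this post; the single-data-center option has no cost figure here, so it is left out. Current Azure pricing will likely differ from these estimates.

```python
HOT_SWAPPABLE_MONTHLY_USD = 3153.45   # hot swappable environments estimate
BACKUP_STORAGE_MONTHLY_USD = 1713.45  # backup storage estimate


def availability_premium(hot: float, backup: float) -> tuple:
    """Return (extra per month, extra per year, percent more) for the hot-swappable option."""
    monthly = round(hot - backup, 2)
    annual = round(monthly * 12, 2)
    pct_more = round((hot - backup) / backup * 100, 1)
    return monthly, annual, pct_more


if __name__ == "__main__":
    monthly, annual, pct = availability_premium(HOT_SWAPPABLE_MONTHLY_USD, BACKUP_STORAGE_MONTHLY_USD)
    print(f"Hot swappable costs ${monthly:,.2f}/month (${annual:,.2f}/year) more, about {pct}% more")
```

In other words, moving from a sub-20-minute window to near-zero downtime costs about $1,440 more per month, or roughly 84% more than the backup storage option.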

Greater Than 20-Minute Potential Downtime

This scenario relies primarily on the existing architecture and configuration. All application components are deployed within a single data center and rely on the compute and storage resiliency within that data center. For the compute components, the application benefits from two fault domains and multiple upgrade domains. For the blob storage and SQL Azure components, it benefits from the replicas built into each service. This scenario does not protect against a data center outage; in that case, the blob storage data must be redeployed to another data center, the SQL Azure database restored from backup, and the hosted service redeployed to the alternate location.
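A rough calculation shows why recovery in this scenario takes well over 20 minutes. The Python sketch below estimates copy time for the sample application’s data, using the sizes stated earlier (about 1 TB of blobs and 2 GB of SQL data, taken as decimal units). The 100 MB/s sustained throughput is purely an assumption for illustration; real transfer rates between data centers will vary.

```python
def transfer_hours(size_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Hours needed to copy size_bytes at a constant sustained throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_sec / 3600


BLOB_BYTES = 1e12           # about 1 TB of images (decimal terabyte), per the sample app
SQL_BYTES = 2e9             # about 2 GB SQL Azure database, per the sample app
ASSUMED_THROUGHPUT = 100e6  # 100 MB/s sustained: an assumption for illustration only

if __name__ == "__main__":
    print(f"Blob data: {transfer_hours(BLOB_BYTES, ASSUMED_THROUGHPUT):.2f} h")
    print(f"SQL data:  {transfer_hours(SQL_BYTES, ASSUMED_THROUGHPUT) * 60:.1f} min")
```

At the assumed rate, copying the blob data alone takes close to three hours while the database copy takes well under a minute, which suggests the blob repository, not the database, is the main contributor to data recovery time here.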

Cost Estimate Breakdowns*

*Cost estimates are based on the Windows Azure Pricing Calculator using a pay-as-you-go model.

About Harin Sandhoo

Harin Sandhoo is a solutions architect with over 10 years of experience working with federal and commercial clients. He has extensive expertise in a wide range of technologies and development methodologies. He has most recently been working with SharePoint and cloud platforms such as Windows Azure. Harin also participates in Microsoft’s VTSP Program. Prior to joining AIS, Harin worked for L-3 Titan and ITT Industries. Harin received both his Bachelor’s degree in Computer Science and MBA from the University of Maryland, College Park.