Identifying a Rogue Azure Web Role Instance

In a web “farm” situation, there are various reasons a developer needs to identify which machine responded to a request, especially when chasing particularly nefarious bugs.

Recently, an Azure production application I was monitoring was experiencing a problem with stale/old data showing up intermittently as users navigated the website. I began to suspect a single instance of the application did not have a properly updated cache on it. I needed to know which Web Role Instance was causing the problem so I could restart it. Obviously, there was a bug that would need to be tracked down in the future, but the immediate need was to stop the problem.

For lack of other information, I had to restart each instance of the web role individually, waiting for it to come back up before moving on to the next. I couldn’t trust the situation until every single instance was restarted.

I eventually found that bug and fixed it, but I wanted to mitigate this type of situation in the future. At first, I thought about adding a “standard” field to our JSON structures that showed which role instance handled the request, but I realized that wouldn’t help if a regular web call was made or a WebAPI call failed. To cover every kind of HTTP request, I chose to add an HTTP header called “Azure-WebRole-Instance-ID” to every web response. This way, we’re covered in every scenario, since all the calls are HTTP calls.

I wrote a simple HttpModule to add this header.  The code, in its entirety, follows:

using System.Web;
using Microsoft.WindowsAzure.ServiceRuntime;

namespace AppliedIS.Web.Modules
{
    /// <summary>
    /// Append "Azure-WebRole-Instance-ID" HTTP Header to all responses.
    /// </summary>
    public class WebRoleInfoModule : IHttpModule
    {
        // Resolved once in the static constructor and never reassigned.
        static readonly bool _isAzure;
        static readonly string _instanceID;

        static WebRoleInfoModule()
        {
            _isAzure = RoleEnvironment.IsAvailable;
            if (_isAzure)
            {
                _instanceID = RoleEnvironment.CurrentRoleInstance.Id;
            }
        }

        public void Init(HttpApplication context)
        {
            // Outside Azure there is no instance ID, so no handler is attached.
            if (_isAzure)
            {
                context.PostRequestHandlerExecute += (sender, e) =>
                {
                    HttpContext httpContext =
                        ((HttpApplication)sender).Context;
                    HttpResponse response = httpContext.Response;
                    response.Headers.Add("Azure-WebRole-Instance-ID",
                        _instanceID);
                };
            }
        }

        public void Dispose() { /* Not needed */ }
    }
}

Adding this module into a project is simple. Just add this to your web.config:

<system.webServer>
  ...
  <modules>
    <add name="WebRoleInfo"
         type="AppliedIS.Web.Modules.WebRoleInfoModule, AppliedIS.Web"/>
  </modules>
  ...
</system.webServer>

Since the code also checks to make sure that we’re running in Azure, this won’t adversely affect the application when it’s running in a non-Azure environment.

Here is the result as seen in the Chrome developer tools:

[Screenshot: response headers, including Azure-WebRole-Instance-ID]

Now you can track down that rogue instance and keep your production sites running properly while you hunt for the real problem in your code with proper instrumentation. But that’s a whole separate blog post!
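
If you would rather not click through the browser tools, one option is a small client that requests a page several times and counts the responses per header value. What follows is only a minimal sketch, not part of the module above: the names InstanceTally, ReadInstanceId, Count and PollAsync are hypothetical, and it assumes the header name used by the module. It also asks for the connection to be closed after each request, on the assumption that the load balancer chooses an instance per connection rather than per request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical client-side helper; not part of the original module.
public static class InstanceTally
{
    // Header name assumed to match the one the module emits.
    public const string HeaderName = "Azure-WebRole-Instance-ID";

    // Returns the instance ID from a response, or "(none)" if the header is absent.
    public static string ReadInstanceId(HttpResponseMessage response)
    {
        IEnumerable<string> values;
        return response.Headers.TryGetValues(HeaderName, out values)
            ? values.First()
            : "(none)";
    }

    // Counts how many responses each instance ID accounts for.
    public static Dictionary<string, int> Count(IEnumerable<string> instanceIds)
    {
        return instanceIds
            .GroupBy(id => id)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Sends `requests` GET requests to `url` and tallies the header values.
    public static async Task<Dictionary<string, int>> PollAsync(string url, int requests)
    {
        using (var client = new HttpClient())
        {
            // Assumption: a fresh connection per request gives the load
            // balancer a chance to pick a different instance each time.
            client.DefaultRequestHeaders.ConnectionClose = true;

            var ids = new List<string>();
            for (int i = 0; i < requests; i++)
            {
                using (var response = await client.GetAsync(url))
                {
                    ids.Add(ReadInstanceId(response));
                }
            }
            return Count(ids);
        }
    }
}
```

Calling `await InstanceTally.PollAsync(url, 20)` against your own site's URL returns a dictionary keyed by instance ID. A "(none)" entry means a response came back without the header, for example from a non-Azure environment or from a request the module never handled.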

 

About Tom McKearney

Thomas has been with AIS for almost 12 years. His background is varied and spans nearly all mission-critical development areas, including Automated Weather Observation, Phone Switch Management, Financial Software, Facial Recognition, Fingerprints, Document Management Systems and Military Battle Simulation Software.
He has developed at all levels, from on-chip Smart Card development and embedded systems up to large-scale distributed systems deployed to Azure. For the last 12 years he has been almost exclusively doing .NET development in C#. He recently led a multi-year Azure-based project for a large commercial jewelry store chain, followed by an effort on a Windows 8.1 tablet-based application in WinJS. Some of his specialties are: Software Architecture (Analysis, Design and Implementation), general problem solving, and various .NET-related toolsets like Azure, WPF, Silverlight, JavaScript, HTML and related web technologies.

He is also a technical reviewer on various books, including Windows Store App Development (C# and XAML), Windows 8 Phone in Action, Silverlight 5 in Action, Silverlight 4 in Action and Scratch 1.4 Beginner’s Guide.

  • Thiago Silva

    I am having an issue where a 2-instance web role deployment is only responding to web requests from a single instance. The 2nd instance doesn’t seem to want to respond. I manually added the headers to IIS via an RDP session just to confirm that instance wasn’t responding. Do you know what would make a web role instance not respond to requests anymore? Or could this be a load-balancing issue on Azure? If the latter, how do you address that?

    • Maxim

      Hi Thiago,
      I also found that the load balancer does not send every second request to a different instance. In my case, the only way I could hit the second machine was to put more requests on the first one. While the load is not high enough, it does not trigger balancing.

      • Thiago Silva

        that makes sense. thanks, I’ll keep that in mind.