Azure AppGateways and Sitecore’s Use of X-Forwarded-For

I’m writing this up so I have a convenient reference for future projects — it looks like there’s a bug with Sitecore’s Analytics library and how it handles IP addresses through an Azure Application Gateway.

Sitecore relies on the X-Forwarded-For HTTP header when a load balancer sits between the Sitecore IIS server and the client browser. I rarely encounter Sitecore implementations without load balancers; they’re critical for performance, security, resiliency during upgrades, etc. During some Sitecore testing behind the App Gateway, we observed the following message in the Sitecore logs:

Cannot parse a valid IP address from X-Forwarded-For header

Around this time, a friend of mine (and another smart Sitecore guy), Bryan Furlong, mentioned that his current project had run into port numbers in the IP addresses recorded for xDB . . . so we committed to investigating.

Using Reflector, I confirmed this specific “Cannot parse a valid IP address from” message is logged as a warning by the Process method in the Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor class, when IPAddress.Parse throws a FormatException:

try
{
    address = IPAddress.Parse(ipFromHeader);
}
catch (FormatException)
{
    Log.Warn($"Cannot parse a valid IP address from {forwardedRequestHttpHeader} header.", this);
    return;
}

It looked like the Azure App Gateway, a specific variety of load balancer for Azure implementations, includes port numbers with the IP address when relaying traffic.  This port number is not handled well by the Sitecore.Analytics processing code, and — in this particular case — led to the failure of GeoIP resolution for an Azure Sitecore implementation.

To verify what was going on, I added the X-Forwarded-For field as a custom field to the IIS Logs and compared the contents.
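
For reference, here is a rough sketch of how such a custom log field can be declared on IIS 8.5 or later, which supports custom fields for request headers. The site name and id below are placeholders I made up, not values from this project, and the same thing can be set up through the Logging feature in IIS Manager.

```xml
<!-- applicationHost.config fragment (IIS 8.5 or later); site name and id are placeholders -->
<system.applicationHost>
  <sites>
    <site name="MySitecoreSite" id="1">
      <logFile>
        <customFields>
          <!-- Write the incoming X-Forwarded-For request header to each log entry -->
          <add logFieldName="X-Forwarded-For"
               sourceName="X-Forwarded-For"
               sourceType="RequestHeader" />
        </customFields>
      </logFile>
    </site>
  </sites>
</system.applicationHost>
```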

Behind the Azure App Gateway, “X-Forwarded-For” fields in the IIS Logs show data such as:

  • 50.14.232.1:45712
  • 74.14.167.1:28336

By comparison, behind the other types of load balancers I looked at, the IIS Logs show data such as:

  • 46.246.335.99
  • 92.77.214.84

Looks like confirmation of the issue!

One cool aspect of working at Rackspace is access to lots of smart people across the industry, and we verified with the App Gateway team at Microsoft that X-Forwarded-For is a comma-separated list of <IP:Port> entries and that the presence of the port number is NOT currently configurable. We would need our Sitecore implementation to strip off the port portion.
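
To make "strip off the port portion" concrete, here is a minimal standalone sketch of the idea. The ForwardedForParser class and StripPort method are hypothetical names of my own, not part of Sitecore, and the IPv6 handling is an assumption about formats a proxy might send rather than something we observed from the App Gateway.

```csharp
using System;

public static class ForwardedForParser
{
    // Returns the address portion of a single X-Forwarded-For entry, dropping any port.
    // Intended inputs: "1.2.3.4", "1.2.3.4:5678", "2001:db8::1", "[2001:db8::1]:5678".
    public static string StripPort(string entry)
    {
        if (string.IsNullOrWhiteSpace(entry))
        {
            return null;
        }

        string value = entry.Trim();

        // Bracketed IPv6, optionally followed by ":port": keep what is inside the brackets.
        if (value.StartsWith("[", StringComparison.Ordinal))
        {
            int close = value.IndexOf(']');
            return close > 1 ? value.Substring(1, close - 1) : null;
        }

        // Exactly one colon is treated as IPv4:port; several colons as a bare IPv6 address.
        int first = value.IndexOf(':');
        if (first >= 0 && first == value.LastIndexOf(':'))
        {
            return first > 0 ? value.Substring(0, first) : null;
        }

        return value;
    }
}
```

Calling ForwardedForParser.StripPort("50.14.232.1:45712") yields "50.14.232.1", which is a string IPAddress.Parse can work with.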

The Sitecore customization to address this is fairly straightforward. Instead of the default createVisit pipeline processor defined as follows in Sitecore.Analytics.Tracking.config . . .

      <createVisit>
        ...
        <processor type="Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor, Sitecore.Analytics">
          <HeaderIpIndex>0</HeaderIpIndex>
        </processor>
        ...
      </createVisit>

. . . you introduce your own library and override the GetIpFromHeader method to account for a port number:

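using System;
using System.Linq;

public class XForwardedFor : Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor
{
    protected override string GetIpFromHeader(string theHeader)
    {
        // X-Forwarded-For is a comma-separated list; take the configured entry,
        // falling back to the last one when HeaderIpIndex is out of range.
        string[] source = theHeader.Split(',');
        int headerIpIndex = base.HeaderIpIndex;
        string str = (headerIpIndex < source.Length) ? source[headerIpIndex] : source.LastOrDefault();
        if (string.IsNullOrEmpty(str))
        {
            return null;
        }

        // The App Gateway sends entries as IP:Port; keep only the address.
        // Strip only when there is exactly one colon, so a bare IPv6 address
        // (which contains several colons) is not truncated.
        int colon = str.IndexOf(':');
        if (colon > 0 && colon == str.LastIndexOf(':'))
        {
            str = str.Substring(0, colon);
        }
        return str.Trim();
    }
}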

In talking this through with Sitecore support, they confirmed it’s a product bug, tracked as issue 132442.

To ensure our custom code replaces the default Sitecore pipeline code, the following patch include file is important:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <sitecore>
    <pipelines>
      <createVisit>
        <processor type="Our.Custom.Namespace.And.Class.XForwardedFor, Our.Custom.Assembly"
              patch:instead="*[@type='Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor, Sitecore.Analytics']" />
      </createVisit>
    </pipelines>
  </sitecore>
</configuration>

The Deal With Reverse Proxies and Sitecore

Once in a while the topic of a reverse proxy with Sitecore comes up in client conversations . . . I think it’s not out of the question with Sitecore, but it can certainly complicate an implementation. Let me share a few of the complications I’m aware of, and some remedies.

Reverse Proxy for Media Requests

I know anecdotally of customers using a reverse proxy just for media items, where they customized the publishing pipeline to clear the reverse proxy cache for the specific item. This blog post explains it further: http://sitecoreblog.patelyogesh.in/2014/05/improve-sitecore-media-performance.html. There are problems with this approach, however, if the request contains a language in the file path (en-us/item/…) and in some other cases. I would consider this experimental as I haven’t personally seen it working in a real implementation. There are a few blogs out there, however, that claim it’s 100% reliable for them.

I know one Sitecore implementation pursuing an approach somewhat like this, where they farm out media requests to dedicated Sitecore servers to only handle media.  This is one way they’re working to pull the overhead of returning media off of the main Sitecore IIS servers and onto another set of servers tuned for media requests.  A key difference here is that instead of this being a true reverse proxy, they’re using load balancer rules to steer media requests to specific Sitecore servers to handle the request.  From 1 million miles up, one could consider this a reverse proxy – but on closer inspection, it’s really a load balancer connecting to dedicated Sitecore media servers.

Reverse Proxy for Everything

I also know of customers using a reverse proxy more generally for content, so they must use the setting ‘Analytics.ForwardedRequestHttpHeader’ and set it to ‘X-Forwarded-For’ or ‘X-Real-IP’, depending on their proxy settings (it varies based on how the proxy represents the original IP address).  Through this setting, Analytics can make use of the client IP and provide content as one would expect from Sitecore.  This is the standard response to “how does one configure Sitecore to work with a reverse proxy.”
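
As a sketch of what that looks like in practice, the setting can be changed with a Sitecore patch include file along these lines. The header value shown is only an example; use whichever header your proxy actually populates.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Tell Analytics which request header carries the original client IP -->
      <setting name="Analytics.ForwardedRequestHttpHeader">
        <patch:attribute name="value">X-Forwarded-For</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```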

One Known Sitecore Gotcha & Resolution

Amazon CloudFront does what many reverse proxies do and pushes a comma-separated list of IPs into their HTTP Header (see the Client IP Address section in the Amazon docs) so it comes through in the header as X-Forwarded-For: client-IP-address, proxy-IP-address, another-proxy-IP-address.

By default, when using the ForwardedRequestHttpHeader setting, Sitecore pulls the last address from the comma-delimited string in the forwarded request header. The logic lives in the Process method of the Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor processor; the line in question is the one that calls Split(...).Last():

        public override void Process(CreateVisitArgs args)
        {
            string forwardedRequestHttpHeader = AnalyticsSettings.ForwardedRequestHttpHeader;
            if (!string.IsNullOrEmpty(forwardedRequestHttpHeader))
            {
                string str2 = args.Request.Headers[forwardedRequestHttpHeader];
                if (!string.IsNullOrEmpty(str2))
                {
                    string str3 = str2.Split(new char[] { ',' }).Last().Trim();
                    if (string.IsNullOrEmpty(str3))
                    {
                        this.LogWrongIp(forwardedRequestHttpHeader, str2);
                    }
                    else
                    {
                        IPAddress address;
                        try
                        {
                            address = IPAddress.Parse(str3);
                        }
                        catch (FormatException)
                        {
                            this.LogWrongIp(forwardedRequestHttpHeader, str2);
                            return;
                        }
                        args.Visit.Ip = address.GetAddressBytes();
                    }
                }
            }
        }

The resolution for this is to swap in a custom assembly and config file (Sitecore support can hook you up with that if you need it; reference issue #421555) that changes this use-the-last-IP logic via a new Analytics.ForwardedRequestHttpHeaderGetFirstIP setting. This may be folded into the main Sitecore product at some point, since it’s recognized as a common challenge for some reverse proxies.
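
To illustrate the difference between the two behaviors, here is a small standalone sketch of first-versus-last selection. This is not the code from Sitecore support; ForwardedHeaderSelector and Pick are hypothetical names, and unlike the decompiled Process method above, this sketch skips empty entries instead of logging them.

```csharp
using System;
using System.Linq;

public static class ForwardedHeaderSelector
{
    // Hypothetical helper: returns the first or last non-empty entry
    // of a comma-separated forwarded header value, or null if there is none.
    public static string Pick(string headerValue, bool useFirst)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
        {
            return null;
        }

        string[] entries = headerValue
            .Split(',')
            .Select(e => e.Trim())
            .Where(e => e.Length > 0)
            .ToArray();

        if (entries.Length == 0)
        {
            return null;
        }

        // With "client, proxy1, proxy2" the client is first and the nearest proxy is last.
        return useFirst ? entries[0] : entries[entries.Length - 1];
    }
}
```

For a header such as "203.0.113.7, 198.51.100.2, 192.0.2.9" (documentation-range addresses, not real traffic), Pick(header, true) returns the client entry "203.0.113.7", while Pick(header, false) returns "192.0.2.9", mirroring the default take-the-last behavior.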

One Known Sitecore Gotcha Without Resolution

If you’re running with SSL, and a request for https://theSite.com/Sitecore reaches your reverse proxy and is forwarded as a non-SSL request (http://theSite.com/Sitecore), then once it reaches Sitecore, some client dialogs that use IFrames won’t work because their src value is http rather than https. I think the main point here is to be careful around SSL and a reverse proxy with your site, especially the Sitecore Authoring environment. I’m no reverse proxy expert, so I don’t know how widespread a challenge this is, but I would thoroughly test any Sitecore configuration that uses a reverse proxy with SSL certificates in place.

Conclusion

With these complications in mind, I would suggest avoiding a reverse proxy for Sitecore requests as a general rule, but if security or other external pressures drive the decision, it’s not out of the question. Just test it thoroughly and consider engaging Sitecore services or support if you run into obstacles. The answer from Sitecore could be that what you’re doing is not officially supported, and that’s my main reservation: the reverse proxy is not a fully supported technology for Sitecore. Sitecore QA doesn’t thoroughly test reverse proxy configurations (from what I’ve seen), and there are open questions around how sticky sessions and other pieces of the HTTP request will behave when a reverse proxy is introduced.