Azure AppGateways and Sitecore’s Use of X-Forwarded-For

I’m writing this up so I have a convenient reference for future projects — it looks like there’s a bug with Sitecore’s Analytics library and how it handles IP addresses through an Azure Application Gateway.

Sitecore relies on the X-Forwarded-For HTTP header when a load balancer sits between the Sitecore IIS server and the client browser.  I rarely encounter Sitecore implementations without load balancers, they’re critical for performance, security, resiliency during upgrades, etc.  During some Sitecore testing behind the App Gateway, we observed the following message in the Sitecore logs:

Cannot parse a valid IP address from X-Forwarded-For header

About this time, a friend of mine — and another smart Sitecore guy “Bryan Furlong” — commented to me how his current project ran into port numbers in their IP addresses for xDB purposes . . . so we committed to investigating.

Using Reflector, I confirmed this specific “Cannot parse a valid IP address from” exception message appears in the Process method in the Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor class:

try
{
    address = IPAddress.Parse(ipFromHeader);
}
catch (FormatException)
{
    Log.Warn($"Cannot parse a valid IP address from {forwardedRequestHttpHeader} header.", this);
    return;
}

It looked like the Azure App Gateway, a specific variety of load balancer for Azure implementations, includes port numbers with the IP address when relaying traffic.  This port number is not handled well by the Sitecore.Analytics processing code, and — in this particular case — led to the failure of GeoIP resolution for an Azure Sitecore implementation.

To verify what was going on, I added the X-Forwarded-For field as a custom field to the IIS Logs and compared the contents.

xfor

Behind the Azure App Gateway, “X-Forwarded-For” fields in the IIS Logs show data such as:

  • 50.14.232.1:45712
  • 74.14.167.1:28336

By comparison, behind the other types of load balancers I looked at, the IIS Logs show data such as:

  • 46.246.335.99
  • 92.77.214.84

Looks like confirmation of the issue!

One cool aspect of working at Rackspace is access to lots of smart people across the industry, and we verified with the App Gateway team at Microsoft that X-Forwarded-For is a comma separated list of <I{:Port> and changing the presence of the port number is NOT currently configurable.  We would need our Sitecore implementation to strip off the port portion.

The Sitecore customization to address this is fairly straight-forward.  Instead of the default CreateVisit pipeline defined as follows in Sitecore.Analytics.Tracking.config . . .

      <createVisit>
        ...
        <processor type="Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor, Sitecore.Analytics">
          <HeaderIpIndex>0</HeaderIpIndex>
        </processor>
        ...
      </createVisit>

. . . one must introduce their own library and override the GetIpFromHeader method to account for a port number:

public class XForwardedFor : Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor
    {
        protected override string GetIpFromHeader(string theHeader)
        {
            string[] source = theHeader.Split(new char[] { ',' });
            int headerIpIndex = base.HeaderIpIndex;
            string str = (headerIpIndex < source.Length) ? source[headerIpIndex] : source.LastOrDefault<string>();
            if (string.IsNullOrEmpty(str))
            {
                return null;
            }
            string[] strArray2 = str.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
            if (strArray2.Length > 1)
            {
                str = strArray2[0];
            }
            return str.Trim();
        }
    }

In talking through this all with Sitecore support, they confirmed it’s a product bug and tracked as 132442 issue.

To ensure our custom code replaces the default Sitecore pipeline code, the following patch include file is important:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <sitecore>
    <pipelines>
      <createVisit>
        <processor type="Our.Custom.Namespace.And.Class.XForwardedFor,Our.Custom.Assembly.dll"
              patch:instead="*[@type='Sitecore.Analytics.Pipelines.CreateVisits.XForwardedFor, Sitecore.Analytics']" />
      </createVisit>
    </pipelines>
  </sitecore>
</configuration>

Update December 10, 2018 – I learned that Sitecore shares an official patch for this issue at https://sitecore.app.box.com/s/g9g8mu7orb3qkbc7dct2i351qaarhwes, so if you want to review their Sitecore.Support.Analytics.Pipelines.CreateVisits.XForwardedFor to address this issue, check it out.

The Sitecore Pie: strategic slicing for better implementations

The Sitecore Pie

pie
At some point I want to catalog the full set of features a Sitecore Content Delivery and Content Management server can manage in a project.  My goal would be to identify all the elements that can be split apart into independent services.  This post is not a comprehensive list of those features, but serves to introduce the concept.

Think of Sitecore as a big blueberry pie that can be sliced into constituent parts.  Some Sitecore sites can really benefit from slicing the pie into small pieces and letting dedicated servers or services manage that specific aspect of the pie.  Too often, companies don’t strategize around how much different work their Sitecore solution is doing.

An example will help communicate my point: consider IIS and how it serves as the execution context for Sitecore.   Many implementations will run logic for the following through the same IIS server that is handling the Sitecore request for rendering a web page.  These are all slices of the Sitecore pie for a Sitecore Content Delivery server:

  1. URL redirection through Sitecore
  2. Securing HTTP traffic with SSL
  3. Image resizing for requests using low-bandwidth or alternative devices
  4. Serving static assets like CSS, JS, graphics, etc
  5. Search indexing and query processing (if one is using Lucene)

If you wanted to cast a broader net, you could include HTTP Session state for when InProc mode is chosen, Geo-IP look-ups for certain CD servers, and others to this list of pie slices.  Remember, I didn’t claim this was an exhaustive list.  The point is: IIS is enlisted in all this other work besides processing content into HTML output for Sitecore website visitors.

Given our specific pie slices above, one could employ the following alternatives to relieve IIS of the processing:

  1. URL Redirection at the load balancer level can be more performant than having Sitecore process redirects
  2. Apply SSL between the load balancer and the public internet, but not between the IIS nodes behind your load balancer — caled “SSL Offloading” or “SSL Termination”
  3. There are services like Akamai that fold in dynamic image processing as part of their suite of products
  4. Serving static assets from a CDN is common practice for Sitecore
  5. Coveo for Sitecore is an alternative search provider that can take a lot of customer-facing search aspects and shift it to dedicated search servers or even Coveo’s Cloud.  One can go even further with Solr for Sitecore or still other search tiers if you’re really adventurous

My point is, just like how we hear a lot this election season about “let Candidate X be Candidate X” — we can let Sitecore be Sitecore and allow it to focus on rendering content created and edited by content authors and presenting it as HTML responses.  That’s what Sitecore is extremely valuable for.

Enter the Cache

I’ve engaged with a lot of Sitecore implementations who were examining their Sitecore pie and determining what slices belong where . . . and frequently we’d make the observation that the caching layer of Sitecore was tightly coupled with the rest of the Sitecore system and caching wasn’t a good candidate for slicing off.  There wasn’t a slick provider model for Sitecore caches, and while certain caches could be partially moved to other servers, it wasn’t clean, complete, or convenient.

That all changed officially with the initial release of Sitecore 8.2 last month.  Now there is a Sitecore.Caching.DefaultCacheManager class, a Sitecore.Caching.ICache interface, and other key extension points as part of the standard Sitecore API.  One can genuinely add the Sitecore cache to the list of pie slices one can consider for off-loading.

In my next post, I will explore using this new API to use Redis as the cache provider for Sitecore instead of the standard in-memory IIS cache.

GeoIP Resolution For Sitecore Explained

A Sitecore implementation can use MaxMind lookups for GeoIP resolution; this is valuable information for marketers as it provides for geography-based segmentation in reports and decision making in the Sitecore Analytics package.  The Analytics.PerformLookup setting in the /App_Config/Include/Sitecore.Analytics.config file governs the lookup activity, and is true with an OOTB installation config file (so be careful!).

It’s recommended to only set this setting to true for one server in an implementation; set this setting to false for the other servers.  For scaled (multi-server) Sitecore installations, this is a common issue since the default config file enables Analytics.PerformLookup.

You can enable lookups on either a CM or a CD instance.  The Sitecore instance should be able to access the MaxMind web service (via the public internet) in order to complete the work.  Having more than one server configured as true for this setting can cause SQL deadlocks and other problems, as well as duplicating your traffic to MaxMind (and using any lookups you’re paying for).

Under the covers, this is what this PerformLookup setting is doing:

  • Sitecore periodically starts a task on the server to perform GeoIP lookups if the setting Analytics.PerformLookup is true in Sitecore.Analytics.config
    1. This task is considered the “Lookup Worker” task
    2. The frequency of the Lookup Worker running is governed by the Analytics.PerformLookup.Interval setting in Sitecore.Analytics.config
    3. The Lookup Worker takes records in the Sitecore “Visit” table and processes any with an EmptyLocationID value in the LocationID column
      • the EmptyLocationId is a special GUID that is generated on each environment, check the first record in the Locations table with a blank business name, country, etc if you want to see what GUID your system is using
    4. The goal of the Lookup Worker is to process Visit records with an EmptyLocationID and populate them with data, so they are no longer empty.  It does this by communicating with MaxMind.

This is only part of the picture, though, as it explains how the background agent determines GeoIP  information for Visits.

What about when a page loads that makes decisions based on the visitor’s GeoIP information; for example, news sites might target different Sitecore content to European visitors compared with Asian visitors.

The process of resolving a visitor’s GeoIP information is as follows:

  1. Attempt to retrieve the GeoIP information from the memory cache
  2. If there’s no data from step 1, retrieve GeoIP information from the Analytics database
  3. If there’s no data from step 2, request GeoIP information from the MaxMind geolocation service

Each of the above steps gets progressively slower; reading from memory >  reading from the database > reading from a remote web service.

When it comes to step 3, if a request is made to MaxMind during a visitor’s HTTP request, Sitecore doesn’t make that a synchronous call and wait for a response.  I’ve seen many websites perform slowly due to blocking web service calls, so this is a smart design decision by Sitecore.  If, however, you’re in a situation where you want to wait for the GeoIP data, refer to this KB article from Sitecore for two methods of addressing it: https://kb.sitecore.net/articles/320734.

If you don’t want to pursue either work-around in the Sitecore KB article, but are using geolocation targeting in your implementation, you should plan for a graceful fallback.  Provide for a generic set of data for non-geolocation targeted visitors.

Additional references on GeoIP resolution with Sitecore include: