Multi-region Sitecore Publishing

Sitecore’s Publishing Service that runs on .NET Core is a great addition to the Sitecore ecosystem. It allows us to solve some interesting customer scaling challenges by using this micro-services approach to Publishing content. I’m going to write-up a pattern we’re using these days that updates our approach from a few years ago.

See an example of the older pattern in this piece I wrote for the Rackspace site at https://developer.rackspace.com/blog/Sitecore-Enterprise-Architecture-For-Global-Publishing/.

Now in May 2019, we’re shifting away from the SQL replication game and using Sitecore’s new Publishing Service to connect Sitecore across multiple regions. Refer to this general diagram below to see how we’re approaching it:

2RegionPublishingService

Sitecore’s Publishing Service is the key element between the two regions and the blue arrows show the flow of publishing activities coordinated through the one “Sitecore Publishing Service” host in Region 1.

A few caveats on the picture above:

  1. It’s Sitecore 8.2, so MongoDB is present but not shown on the diagram for simplicity (we use ObjectRocket’s hosted MongoDB service for the majority of these types of customers — but I don’t want to get into that here); Redis and other elements are also not included in the diagram
  2. This applies for any multi-region setup with Sitecore. . . it could be East US and West US, for example, but we used Europe and Asia in the diagram. This approach is most useful where network latency between the regions is enough to make synchronous database connectivity unacceptably slow. This model can apply to more than 2 regions, too, as the pattern can be repeated to support as many regions as you require.

There are just a few crucial configuration steps to make this happen, but it’s built on a lot of lessons learned along the way. Let me catalog the key elements:

  1. The Publishing Service runs in Region 1, but requires a Sitecore Publishing Target to the Region 2 database. The documentation on setting up this type of Publishing Target is vague, so I summarized this process at https://grantkillian.wordpress.com/2018/12/17/how-i-add-custom-sitecore-publishing-service-targets/.
  2. Each region has an isolated Solr cluster (because Solr CDCR or file synchronization for Solr were not suitable in this use-case). This means one of the Region 2 Sitecore CD servers needs to employ the onPublishEndAsync strategy to update the Solr Cloud collections relevant to the implementation. This is standard ContentSearch configuration material, but if you use the manual strategy here with the CDs (which is the general best practice for Sitecore CD servers connected to a Solr cluster with a CM that drives search indexing), the Solr data will never get updated in the other region:
    • <strategies hint="list:AddStrategy">
        <strategy 
        ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
      </strategies>
  3. If you are using Sitecore ContentTesting with this approach (<setting name=”ContentTesting.AutomaticContentTesting.Enabled” value=”true” />), you should be aware that Sitecore CM performance can occasionally stall for several minutes (we’ve seen it last up to 20 minutes!) due to an aspect of the ContentTesting logic that checks every content database for eligible published items to factor into the content testing system. Part of setting up the Region 2 Publishing Target involves adding a ConnectionStrings.config entry to the Region 2 “web” database on the Region 1 Sitecore CM server. This adds the Region 2 “web” database into this ContentTesting routine, and the network latency between Region 1 and Region 2 makes this ContentTesting behaviour slow the CM to a crawl every so often.  If you don’t want to disable Sitecore ContentTesting, you can address this by customizing the Sitecore.ContentTesting.Helpers.VersionHelper.GetLatestPublishedVersion method to employ logic to exclude the Region 2 “web” database. Once you dig deep into this topic, you’ll see the Sitecore.ContentTesting.Helpers.VersionHelper class contains this logic and it’s used in 3 places (according to the decompilation of the .dll):

dude

To adjust ContentTesting to ignore our Region 2 “web” database, we can alter the foreach loop above with something like this that uses a custom “ContentTesting.IgnoredDatabases” setting:

foreach (Database db in Factory.GetDatabases())
{
  string[] excludeList = 
    Sitecore.Configuration.Settings.GetSetting(
    "ContentTesting.IgnoredDatabases")
    .ToLowerInvariant().Split(
        new char[1]
       {
        '|'
       }, 
   StringSplitOptions.RemoveEmptyEntries);
  if (database != null && 
    db.Name != database.Name && 
    !excludeList.Contains(db.Name))
  {
    Item item2 = db.GetItem(item.ID, item.Language);
    if (item2 != null && item2.Version.Number > num)
    {
      num = item2.Version.Number;
    }
  }
}

We can define our custom setting like the following, if we assume region2web is the “web” database ConnectionString name for the Region 2 publishing target on the Sitecore CM:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentTesting.IgnoredDatabases">
        <patch:attribute name="value">core|region2web</patch:attribute>
      </setting> 
    </settings>
  </sitecore>
</configuration>

This work to override the default configuration from . . .

<getVersionedTestCandidates>
  <processor 
    type="Sitecore.ContentTesting.Pipelines.GetTestCandidates.GetPageVersionTestCandidates, Sitecore.ContentTesting">

. . . can dramatically improve the Sitecore CM performance when using this formula for multi-region Sitecore with the new Publishing Service.

Hopefully these notes help other efforts on their Sitecore journey!

Advertisements

Sitecore 9 CD Servers May Assume “Master” EventQueue by default

Here’s a quick one, and I wish it was a clever April Fools’ joke but it isn’t.

Sitecore support recently confirmed a bug for me in Sitecore 9.0 update-2 (may be present for other versions in the Sitecore 9 space — I’m unsure). A Sitecore CD environment might report exceptions assuming a “master” database endpoint like this:

Unknown connection string. Name: 'master'

For the stacktraces I’ve seen with this issue, it’s something like the following:

ERROR One or more exceptions occurred while processing the 
subscribers to the 'publish:end:remote' event.

In the days of Sitecore 8 (I guess those are the olden times now?), we’d adjust our SwitchMasterToWeb.config to address the EventQueue configuration that assumes the presence of a “master” database. For what it’s worth, I always thought Kam Figy’s was the most thorough at https://gist.github.com/kamsar/8096336f141c0e5e97b3.

In the case of this Sitecore 9 issue, we could brew up our own SwitchMasterToWeb.config patch file or work around the issue using role:require logic on the <eventQueue> node in Sitecore.config file. I thought the Sitecore 9 role:define features were designed to make the SwitchMasterToWeb.config obsolete, but if we don’t want to alter Sitecore’s default sitecore.config file, we may need a SwitchMasterToWebForSitecore9.config. History is cyclical!

Here’s the fragment of sitecore.config I’m referring to:

<eventQueueProvider defaultEventQueue=”core”>

<eventQueue role:require=”ContentManagement or Standalone” name=”master” type=”Sitecore.Data.Eventing.$(database)EventQueue, Sitecore.Kernel”>
<param ref=”dataApis/dataApi[@name=’$(database)’]” param1=”$(name)” />
<param hint=”” ref=”PropertyStoreProvider/store[@name=’$(name)’]” />
</eventQueue>

How I Add Custom Sitecore Publishing Service Targets

At this point, I think I’ve installed, configured, or customized the new Sitecore Publishing Service at least a dozen times for various projects. Sometimes it’s on PaaS, sometimes on IaaS . . . I’ve used a variety of different versions depending on the compatibility matrix (see below as of Dec 16, 2018):

PubSvcVisual

I’m going to skip all the preamble about how the new Sitecore Publishing Service works, about .Net core being the new hotness, why this component can be a great addition to many distributed Sitecore implementations, etc — smart people have written a lot about this already. For example, check out Stephen Pope’s no-holds-barred look at the Publishing Service at http://www.stephenpope.co.uk/publishing or Jonathan Robbins has a nice overview piece at https://jonathanrobbins.co.uk/2016/09/02/setting-up-sitecore-publishing-service/.

I’ve learned a good bit from all the iterations of working with the component and I think consistently the most error-prone part of the setup is aligning any additional custom Sitecore publishing targets one is using in an implementation. This write-up from Geykel Moreno at AlphaSolutions has all the good information, but it’s not as easy to follow because it doesn’t post a comprehensive sc.publishing.xml file — it took a bit of trial and error for me, so to simplify for posterity I’m going to share a reference sample Gist at https://gist.github.com/grant-killian/d2fe8d3e89c5d7b15f47464dd1809d62 that includes 2 additional custom publishing targets. I’ve inserted XML comments for the 3 locations one must update in the config\sitecore\publishing\sc.publishing.xml file:

  1. You need to add your ConnectionString entry for each database to the Publishing/ConnectionStrings XML
  2. You need to add your Services/DefaultConnectionFactory/Options/Connections XML definition for each custom target
  3. You need to add entries for each target to the StoreFactory/Options/Stores/Targets XML that will include the GUID of the Sitecore item that defines each publishing target, along with the Name of the item and additional details

Here’s the gist with the full XML for reference:

A few Solr thoughts

Solr has never been more pervasive through the Sitecore projects I’m seeing these days.  Deciding which version of Solr for a greenfield Sitecore project, however, is not clear-cut.

Easy answer: use Solr 5.1

Sitecore’s KB article on compatibility with Solr serves as our official reference when it comes to selecting a Solr version to standardize on.  At face-value, if you’re using Sitecore version 8.2, you’re steered to Solr version 5.1:

SolrCompat

The diagram has a note [3], however, that is worth noting:

WARN  Unable to connect to Solr: [http://{hostname}:{port}/solr], the [SolrNet.Exceptions.SolrConnectionException] was caught.
Exception: SolrNet.Exceptions.SolrConnectionException
Message: Error handling 'status' action
org.apache.solr.common.SolrException: Error handling 'status' action
  • “To resolve issue, upgrade Solr to 5.5.1 or later version.”

Easy answer: use Solr 5.5.1

I asked Sitecore support about this, and in fact the guidance I received from Sitecore Support was to build on Solr version 5.5.1 instead of what the KB article states.  There are no plans to alter the guidance in that KB article, however, since Sitecore 8.2 as a whole platform was thoroughly tested with Solr 5.1.  Apparently, Solr 5.5.1 was not available at the time of that testing.

Anecdotally, Sitecore has found fewer errors when using Solr 5.5.1 instead of Solr 5.1 — when pressed for specifics, it was shared that these two Solr issues have caused problems for other Sitecore implementations:

  1. https://issues.apache.org/jira/browse/SOLR-8793
    • FileNotFoundException or NoSuchFileException with Solr — see comment from Sitecore KB article that it can cause “Unable to connect to Solr” exceptions in some cases
  2. https://issues.apache.org/jira/browse/LUCENE-7188
    • NRTCachingDirectory error where an IllegalStateException exception is thrown

Easy answer: there are no easy answers

I’ve worked with a number of Solr 5.1 projects with Sitecore, and some using other Solr versions prior to Solr 5.5.1, but haven’t encountered the above errors as major impediments.

It’s tempting to use Solr 5.5.1, but if a project is using EXM or WFFM or Sitecore Commerce or some other combination of technology edge case, it’s at least theoretically possible that Sitecore support could fall back on the officially published “Solr 5.1 ✓ ‘officially tested, recommended'” guidance from their KB article.  That’s enough for us to approach new Sitecore projects depending on Solr to go with Solr version 5.1 and keep an eye out for those particular gotchas that may cause us to upgrade to Solr 5.5.1.

The catch is, if you’re upgrading Solr and stopping at Solr 5.5.1 — is there a strong rationale not to upgrade beyond  5.5.1?  At this point, http://archive.apache.org/dist/lucene/solr/ has a wealth of newer Solr versions that are bound to have more patches and fixes that 5.5.1.  This is what you call a slippery slope:

solrslippery.JPG

I have to be careful here as I walk the line of a non-discolosure agreement, but there are still more variables to consider: in the near future, a Sitecore release is likely to involve thorough Solr support for a very recent version of Solr.  Expect a Solr version newer than 5.5.1 (which was released May of 2016 ☺).

So…

I believe I’ve sold myself on the wisdom of Solr 5.1 for now — so long as the sacred Sitecore Support ✓ is present on the official compatibility table.  It’s key to continue learning with Solr, though, and in the months to come we may be talking about SolrCloud and managed Solr schemas . . . cool new aspects to improve Sitecore implementations.

Sitecore Session Persistence Notes

I’ve neglected this blog of late, being focused on a number of “not easily blogged about” scenarios across several Sitecore projects.  It’s too bad, because the work is very interesting, but it doesn’t lend itself to a page or two write-up with a digestible take-away for the general Sitecore community out there.

I do want to keep in the habit of blogging, though, so I’m going to mention this ongoing discussion I’ve been a part of about session management with regards to Sitecore.  There are a few options for managing HTTP session state with Sitecore covered in https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/xdb/session_state/session_state: SQL Server, MongoDB, and Redis.  Those three technologies are really just the tip of the mountain, as implementation details for each can get quite detailed.  For the discerning Sitecore implementation, it can be useful to understand the nuances of each session state provider.  While not an exhaustive look at any one of these solutions, I wanted to post some notes on each one given the current state of Sitecore architecture (June 2017):

SQL Server

This is often the default session provider we gravitate to.  The SQL Server “Boost” script from Sitecore is something we’ve used on implementations (see “Optimize SQL Server performance” on that link), but it is not without it’s rough edges (see our Rackspace write-up on how to alter permissions so TempDB is reliably available across service restarts).

You’ll notice the approach for improving SQL Server performance with session state is all about getting session state “in-memory” to the furthest extent possible.  Remember this when we examine the other two providers below . . .

I will say that, generally speaking, SQL Server is easy to administer as it’s a well-known technology and updating it, scaling it, managing fail-overs, etc is simple compared with the alternatives.  SQL Server has been part of the Windows dev stack for ages, now, so it’s often the default session provider one gravitates to.

MongoDB

With MongoDB serving as the persistence layer for Sitecore’s xDB, it became a fully supported and viable option for HTTP session state with Sitecore at the same time.  The comparative performance between MongoDB and SQL Server is up for debate (Redis too, for that matter!), and it usually comes down to testing based on how the specific implementation is using session with Sitecore etc; I’m not going to hazard any generalizations on relative perf, as that’s not really the point of this post.

Instead, I’d like to point out how MongoDB does not come in just a single flavor.  The two most common flavors, or “storage engines,” are MMAP and WiredTiger, but there are still others designed to serve specific use cases.  Take, for example, the Percona Server for MongoDB hosted by ObjectRocket that has a posted option for the RocksDB storage engine.  RocksDB with MongoDB may not be a great fit for Sitecore session state (RocksDB is tuned for write-heavy work loads — and, in some cases, if you’re making extensive use of TTL indexes for Sitecore then RocksDB fits those scenarios in certain appealing ways), but it does open the door to MongoDB being more than just a one-size-fits-all data repository (read more about RocksDB and it’s Facebook pedigree here).  One MongoDB storage engine option that is easily overlooked is for WiredTiger “in-memory” that will force data to be stored in RAM . . . and this is perfect for HTTP Session State for most Sitecore builds.

In fact, if you consider the SQL Server “boost” approach that uses TempDB to store session state for Sitecore . . . WiredTiger “in-memory” is attacking the problem from the same direction.  Store everything in RAM!  This is why one must be cautious with general comparisons between SQL Server and MongoDB, the devil is always in the details: a far better comparison would be “boosted” SQL Server for Sitecore using TempDB vs MongoDB WiredTiger “in-memory” storage engine.  And note the network latency . . . and the size of the session objects . . . and you’re getting the point, I trust.  To really answer the SQL Server vs MongoDB question for Sitecore sessions, one has to develop a matrix of performance evaluations and level assumptions across the board.  “It depends” is the only honest answer that doesn’t come with a list of caveats.

If you’re curious on this MongoDB topic for your project, go to http://objectrocket.com/docs/mongodb_plans.html and spin up a WT 3.2 storage engine plan for 5 GB of storage (this allows 1.5 GB for RAM).  1.5 GB for RAM is going to be overkill for most small/medium Sitecore implementations — but again, you’ll want to test with your specific session data set to see!  Furthermore, network latency of 10 ms or less is going to help make the most of an ObjectRocket hosted MongoDB service like this — otherwise, the network latency may not make it worth the money.  Let me know if you pursue this with ObjectRocket, as there are some benchmarking measures we want to do but we haven’t had a real implementation to try it out on.  So if you feel like being a guinea pig, please let me know at grant.killian [at] rackspace.com.  It would be great to have real world metrics to prove this all out.

Redis

If the way to get the best session management perf out of SQL Server and MongoDB is to find in-memory solutions, Redis looks like the slam dunk since it’s just an in-memory storage solution.  We find most clients aren’t interested in managing Redis infrastructure, so again a hosted option such as ObjectRocket has appeal.

Sitecore relies on the StackExchange.Redis assembly, which doesn’t support Redis Sentinel — it’s a bit of a saga at https://github.com/StackExchange/StackExchange.Redis/pull/406;  therefore there’s not a great high availability story with the self-hosted Redis and Sitecore right now.  How concerned one should be with HA of fairly transient HTTP Session State for Sitecore, however, is an open question.  I usually wouldn’t worry about it too much.  Honestly, Redis is a technology that we’re just now starting to get really serious about at Rackspace so our sophistication in this space will improve dramatically in the months to come.  Between Azure Redis and all the Sitecore PaaS movement we’re seeing, it’s become a key player in a lot of Sitecore architectures.

Azure Search compared to Solr for Sitecore PaaS (Chapter 2: Querying)

I carried forward my Azure PaaS benchmarking work from earlier this month (see this post on the indexing side of the equation for the start of the story).

For a quick refresher, I’ve used an ARM template based deployment of Sitecore to get a system resembling the following:

ARM Templates Arch

The element I’m exercising in the benchmarks is how Sitecore’s web servers work with the “Search” icon in the diagram above.  I tackled the document ingestion side (how data gets into the search indexes) in my earlier post.  This post addresses the querying side of things (how data gets out of the search indexes).

By default, Azure PaaS search with Sitecore is configured to use Azure Search.  Solr is another viable option.

Here’s where I’ll interject that Coveo also has an excellent search technology for Sitecore.  There are specific use-cases where Coveo is a strong fit, however, and in my indexing the sitecore_core_index evaluations in the earlier post Coveo would not be considered a good fit.  This changes, however, for the set of benchmarks I’ve run in this post.  I am in the process of testing the Coveo approach in Azure PaaS for Sitecore . . . it’s hot off the presses, so there are still rough edges to work around . . . but Coveo is not part of this write-up for the time being.  I will post an update here once I’ve completed the analysis involving Coveo.

In considering Azure Search vs Solr, I used a methodology with JMeter laid out in a great KB article from Sitecore at https://kb.sitecore.net/articles/398589.  I have a LaunchSitecore site running and I use JMeter to automate visits to the site, simulating simple user behaviour.  I don’t go too crazy with this, because I’m more interested in exercising a basic Sitecore work load than doing a deep-dive in xDB traffic simulation.

My first post showed a clear advantage to Solr for the indexing side of search, but for the querying side I can say there is very little variance between Azure Search and Solr.  Sitecore does a good job of protecting data repositories with layers of data and html caches, but even with those those features disabled (we’re talking cacheHtml=”false”on the site definition, <cacheSizes> configuration all set to a heretical zero (“0”), etc) there isn’t a significant difference between the two technologies.

I’m not going to put up a graph of it, because the throughput as measured by JMeter for tests of 20, 50, 100, 200, or  more visitors performed almost the same.

I could develop a more search heavy set of benchmarks, performing a random dictionary of searches against a large custom index that Sitecore responds to but must bypass all caches etc, but that feels like overkill for what I’m looking to achieve.  Maybe that’s appropriate once I bring Coveo into the benchmarking fun.

For this, I wanted to get a sense for the relative performance between Azure Search and Solr as it relates to Sitecore PaaS and I think I’ve done that.  Succinctly:

  1. Solr is considerably faster at search indexing (courtesy of the search provider implementation in Sitecore)
  2. both Azure Search and Solr perform about the same when it comes to querying a basic Sitecore site like LaunchSitecore (again, courtesy of the search provider implementation in Sitecore)

This isn’t the definitive take on the topic.  It’s more like the beginning.  Azure Search is native to Azure, so there are significant advantages there.  There is a lot of momentum around Azure and Sitecore in general, so that story will continue to evolve.

There are Solr as a service options out there that make Solr for Sitecore much easier (such as www.measuredsearch.com which I’ll blog about in the next few days), but Solr can be a lot for corporate IT departments to take on, so it isn’t a simple choice for everyone.

 

 

High Availability of Azure Search with Sitecore

I’ve been investigating Azure Search with Sitecore’s new Azure App Service offering.  I’ve got a giant Excel file of benchmarks and charts based on several permutations and configurations, and several other interesting tidbits that I need to organize into posts to this blog . . . so look for much more about this general topic in the future.

For now, I thought I’d share a point I’ve confirmed with Sitecore support regarding a limitation of Azure Search with Sitecore’s CloudSearchProviderIndex.  The CloudSearchProviderIndex is what the standard Platform-As-A-Service product from Sitecore will use in place of Lucene or Solr or Coveo to power content search for Sitecore.  This is the key building block for working with Azure Search through Sitecore.  While I was performing performance benchmarks for search re-indexing with Sitecore, I noticed the Azure Search document count would drop to 0 and I’d see odd results from Sitecore requests that depended on the search index.  This was classic “search index is being worked on, don’t rely on querying it until the work is done” behaviour.  This was corrected several years ago through Sitecore’s addition of a SwitchOnRebuildLuceneIndex and equivalent for Solr . . . but there is no such equivalent for the CloudSearchProviderIndex used by Azure PaaS solutions.  Essentially: Sitecore is using a single copy of search indexes for query and re-indexing operations, limiting the availability of search during maintenance work.

One could argue this may not be such a big deal because one may not rebuild Azure Search indexes with any frequency.  I’m not sold on this argument, however, since the Sitecore projects I know will frequently perform re-indexing due to development changes to the schema, content synchronization demands, or just routine deployment standard practices.

Further complicating this issue is that my benchmarking for Azure Search re-indexing through Sitecore leaves a lot to be desired.  It can be slow.  This could make for an extended period of search index unavailability due to the CloudSearchProviderIndex‘s limitations.  I’ll share the full battery of testing I’ve done in a future post, but for now let me share the timings I’m observing regardless of the number of Azure Search partitions or replicas I’m working through (partitions should generally improve indexing performance; replicas should generally improve querying performance):

App Service Configuration Time for 20,000 Sitecore Items to Re-Index with Azure Search
Azure PaaS Standard (S1) CM IIS (OOTB from the Marketplace) 66 minutes
Azure S2 CM IIS 35 minutes
Azure S3 CM IIS 25 minutes
Azure P2 CM IIS 35 minutes
Azure P3 CM IIS 24 minutes

For reference, with Lucene indexes this operation would take 5 minutes or less.  The scaling options for Azure Search, Partition count and Replica count, have a minimal impact to the re-indexing operation.

I’ll go into details of this later, but it could be that . . .

  • 20,000 Sitecore items is too small a figure to benefit from scaling with Azure Search?  Many customers have 100,000 or more items, so perhaps I should evaluate a larger data set.
  • there are bottlenecks at the SQL tier?  App Insights here I come…
  • the fact Sitecore isn’t using Azure Search Indexers to ingest data and relies on the Sitecore crawling logic to handle data indexing is artificially slowing this process down

For the time being, Sitecore has responded that improving the availability of Azure Search indexes during rebuilds is an official “feature request” and assigned reference number 146822 

In the meantime, if a project needs high availability for Azure Search indexes one may need to roll up their sleeves and craft their own SwitchOnCloudSearchProviderIndex.  It appears fairly straight-forward based on reviewing how this is solved for Solr, just as one example.  A key caveat is in the Azure Search capacity planning documentation:

High availability for Azure Search pertains to queries and index updates that don’t involve rebuilding an index. If you add or delete a field, change a data type, or rename a field, you will need to rebuild the index. To rebuild the index, you must delete the index, re-create the index, and reload the data.

To maintain index availability during a rebuild, you must have a copy of the index with a different name on the same service, or a copy of the index with the same name on a different service, and then provide redirection or failover logic in your code.

It looks like providing for high availability would double the price of Azure Search indexes, so there are a cascade of complications related to this.

My investigations into Sitecore and Azure Search yielded this complication — it’s not insurmountable, and I actually find it fascinating how an on-premises product (classic Sitecore) will evolve into a cloud-first product.  This is just one piece of the evolutionary story.  I expect this will be addressed sooner rather than later in an official upgrade or patch from Sitecore, and until then it’s important to understand this nuance to the Sitecore PaaS landscape.