Multi-region Sitecore Publishing

Sitecore’s Publishing Service, which runs on .NET Core, is a great addition to the Sitecore ecosystem. Its microservices approach to publishing content lets us solve some interesting customer scaling challenges. I’m going to write up a pattern we’re using these days that updates our approach from a few years ago.

See an example of the older pattern in this piece I wrote for the Rackspace site at https://developer.rackspace.com/blog/Sitecore-Enterprise-Architecture-For-Global-Publishing/.

Now in May 2019, we’re shifting away from the SQL replication game and using Sitecore’s new Publishing Service to connect Sitecore across multiple regions. Refer to this general diagram below to see how we’re approaching it:

[Diagram: 2RegionPublishingService]

Sitecore’s Publishing Service is the key element between the two regions and the blue arrows show the flow of publishing activities coordinated through the one “Sitecore Publishing Service” host in Region 1.

A few caveats on the picture above:

  1. It’s Sitecore 8.2, so MongoDB is present but not shown on the diagram for simplicity (we use ObjectRocket’s hosted MongoDB service for the majority of these types of customers — but I don’t want to get into that here); Redis and other elements are also not included in the diagram
  2. This applies to any multi-region setup with Sitecore . . . it could be East US and West US, for example, but we used Europe and Asia in the diagram. This approach is most useful where network latency between the regions is enough to make synchronous database connectivity unacceptably slow. This model can apply to more than 2 regions, too, as the pattern can be repeated to support as many regions as you require.
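To put a rough number on why latency matters here, this is a back-of-the-envelope sketch in Python. The query count and round-trip times are hypothetical, picked only to illustrate the arithmetic; measure your own regions before drawing conclusions.

```python
def estimated_seconds(round_trips, rtt_ms):
    """Time spent purely waiting on the network for sequential round trips."""
    return round_trips * rtt_ms / 1000.0


# Hypothetical workload: 10,000 sequential database round trips.
same_region = estimated_seconds(10_000, rtt_ms=1)
cross_region = estimated_seconds(10_000, rtt_ms=150)

print(f"same region: {same_region:.0f} s")
print(f"cross region: {cross_region / 60:.0f} min")
```

With those assumed figures the same chatty workload goes from about 10 seconds to about 25 minutes, which is the kind of gap that makes a synchronous cross-region database connection impractical.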

There are just a few crucial configuration steps to make this happen, but it’s built on a lot of lessons learned along the way. Let me catalog the key elements:

  1. The Publishing Service runs in Region 1, but requires a Sitecore Publishing Target to the Region 2 database. The documentation on setting up this type of Publishing Target is vague, so I summarized this process at https://grantkillian.wordpress.com/2018/12/17/how-i-add-custom-sitecore-publishing-service-targets/.
  2. Each region has an isolated Solr cluster (because Solr CDCR or file synchronization for Solr were not suitable in this use-case). This means one of the Region 2 Sitecore CD servers needs to employ the onPublishEndAsync strategy to update the Solr Cloud collections relevant to the implementation. This is standard ContentSearch configuration material, but if you use the manual strategy here with the CDs (which is the general best practice for Sitecore CD servers connected to a Solr cluster with a CM that drives search indexing), the Solr data will never get updated in the other region:
      <strategies hint="list:AddStrategy">
        <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
      </strategies>
  3. If you are using Sitecore ContentTesting with this approach (<setting name="ContentTesting.AutomaticContentTesting.Enabled" value="true" />), be aware that Sitecore CM performance can occasionally stall for several minutes (we’ve seen it last up to 20 minutes!). The cause is ContentTesting logic that checks every content database for eligible published items to factor into the content testing system. Part of setting up the Region 2 Publishing Target involves adding a ConnectionStrings.config entry for the Region 2 “web” database on the Region 1 Sitecore CM server. That entry pulls the Region 2 “web” database into the ContentTesting routine, and the network latency between Region 1 and Region 2 makes the routine slow the CM to a crawl every so often. If you don’t want to disable Sitecore ContentTesting, you can address this by customizing the Sitecore.ContentTesting.Helpers.VersionHelper.GetLatestPublishedVersion method to exclude the Region 2 “web” database. Once you dig into this topic, you’ll see the Sitecore.ContentTesting.Helpers.VersionHelper class contains this logic and it’s used in 3 places (according to the decompilation of the .dll):

[Screenshot: decompiled Sitecore.ContentTesting.Helpers.VersionHelper code]

To adjust ContentTesting to ignore our Region 2 “web” database, we can alter the foreach loop above with something like this that uses a custom “ContentTesting.IgnoredDatabases” setting:

// Read the pipe-delimited ignore list once, before the loop.
string[] excludeList =
  Sitecore.Configuration.Settings.GetSetting(
    "ContentTesting.IgnoredDatabases")
  .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

foreach (Database db in Factory.GetDatabases())
{
  // Skip the current database and anything on the ignore list
  // (compared case-insensitively; Contains with a comparer needs System.Linq).
  if (database != null &&
    db.Name != database.Name &&
    !excludeList.Contains(db.Name, StringComparer.OrdinalIgnoreCase))
  {
    Item item2 = db.GetItem(item.ID, item.Language);
    if (item2 != null && item2.Version.Number > num)
    {
      num = item2.Version.Number;
    }
  }
}

We can define our custom setting like the following, if we assume region2web is the “web” database ConnectionString name for the Region 2 publishing target on the Sitecore CM:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentTesting.IgnoredDatabases">
        <patch:attribute name="value">core|region2web</patch:attribute>
      </setting> 
    </settings>
  </sitecore>
</configuration>

This work to override the default configuration from . . .

<getVersionedTestCandidates>
  <processor 
    type="Sitecore.ContentTesting.Pipelines.GetTestCandidates.GetPageVersionTestCandidates, Sitecore.ContentTesting">

. . . can dramatically improve the Sitecore CM performance when using this formula for multi-region Sitecore with the new Publishing Service.
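If it helps to see what the ContentTesting.IgnoredDatabases setting does to the database loop, here is a small stand-alone Python sketch of the same filtering idea. It is not Sitecore code: the function names and the list of database names are made up for illustration, and only the pipe-delimited format and the region2web name come from the setting described above.

```python
def parse_ignored(setting_value):
    """Split a pipe-delimited setting value into lower-cased database names."""
    return {name.strip().lower() for name in setting_value.split("|") if name.strip()}


def databases_to_check(all_databases, current, setting_value):
    """Return the databases the version check would still visit."""
    ignored = parse_ignored(setting_value)
    return [
        db for db in all_databases
        if db != current and db.lower() not in ignored
    ]


# Hypothetical database names on the Region 1 CM.
dbs = ["core", "master", "web", "region2web"]
print(databases_to_check(dbs, current="master", setting_value="core|region2web"))
# prints ['web']
```

The remote Region 2 database drops out of the loop entirely, so the check never has to make the slow cross-region calls.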

Hopefully these notes help other efforts on their Sitecore journey!

Sitecore 9 CD Servers May Assume “Master” EventQueue by default

Here’s a quick one, and I wish it was a clever April Fools’ joke but it isn’t.

Sitecore support recently confirmed a bug for me in Sitecore 9.0 update-2 (may be present for other versions in the Sitecore 9 space — I’m unsure). A Sitecore CD environment might report exceptions assuming a “master” database endpoint like this:

Unknown connection string. Name: 'master'

For the stacktraces I’ve seen with this issue, it’s something like the following:

ERROR One or more exceptions occurred while processing the 
subscribers to the 'publish:end:remote' event.

In the days of Sitecore 8 (I guess those are the olden times now?), we’d adjust our SwitchMasterToWeb.config to address the EventQueue configuration that assumes the presence of a “master” database. For what it’s worth, I always thought Kam Figy’s was the most thorough at https://gist.github.com/kamsar/8096336f141c0e5e97b3.

In the case of this Sitecore 9 issue, we could brew up our own SwitchMasterToWeb.config patch file or work around the issue using role:require logic on the <eventQueue> node in the Sitecore.config file. I thought the Sitecore 9 role:define features were designed to make the SwitchMasterToWeb.config obsolete, but if we don’t want to alter Sitecore’s default Sitecore.config file, we may need a SwitchMasterToWebForSitecore9.config. History is cyclical!

Here’s the fragment of sitecore.config I’m referring to:

<eventQueueProvider defaultEventQueue="core">

  <eventQueue role:require="ContentManagement or Standalone" name="master" type="Sitecore.Data.Eventing.$(database)EventQueue, Sitecore.Kernel">
    <param ref="dataApis/dataApi[@name='$(database)']" param1="$(name)" />
    <param hint="" ref="PropertyStoreProvider/store[@name='$(name)']" />
  </eventQueue>
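As a rough illustration of what a role:require rule is meant to achieve, the Python sketch below evaluates a simplified "A or B" expression against a server role and lists which eventQueue nodes would stay active. It is only a toy model, not how Sitecore’s configuration engine works internally: the namespace URI is my assumption, the "web" node is a made-up example, and the evaluator only understands "or".

```python
import xml.etree.ElementTree as ET

ROLE_NS = "http://www.sitecore.net/xmlconfig/role/"  # assumed namespace URI

# Hypothetical, heavily trimmed config fragment.
SAMPLE = f"""
<eventQueueProvider xmlns:role="{ROLE_NS}" defaultEventQueue="core">
  <eventQueue name="web" />
  <eventQueue role:require="ContentManagement or Standalone" name="master" />
</eventQueueProvider>
"""


def is_active(require_expr, role):
    """Evaluate a simple 'A or B' expression; no 'and', 'not' or brackets."""
    if require_expr is None:
        return True  # no rule means the node applies to every role
    allowed = [part.strip() for part in require_expr.lower().split(" or ")]
    return role.lower() in allowed


def active_event_queues(config_xml, role):
    """Names of eventQueue nodes that apply to the given role."""
    root = ET.fromstring(config_xml)
    return [
        node.get("name")
        for node in root.iter("eventQueue")
        if is_active(node.get(f"{{{ROLE_NS}}}require"), role)
    ]


print(active_event_queues(SAMPLE, "ContentDelivery"))    # ['web']
print(active_event_queues(SAMPLE, "ContentManagement"))  # ['web', 'master']
```

With the rule in place, a ContentDelivery role never sees the "master" queue, which is the behaviour the patch or workaround is trying to restore.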

How I Add Custom Sitecore Publishing Service Targets

At this point, I think I’ve installed, configured, or customized the new Sitecore Publishing Service at least a dozen times for various projects. Sometimes it’s on PaaS, sometimes on IaaS . . . I’ve used a variety of different versions depending on the compatibility matrix (see below as of Dec 16, 2018):

[Image: PubSvcVisual, the Publishing Service compatibility matrix as of Dec 16, 2018]

I’m going to skip all the preamble about how the new Sitecore Publishing Service works, about .NET Core being the new hotness, why this component can be a great addition to many distributed Sitecore implementations, etc. Smart people have written a lot about this already. For example, check out Stephen Pope’s no-holds-barred look at the Publishing Service at http://www.stephenpope.co.uk/publishing, or Jonathan Robbins’ nice overview piece at https://jonathanrobbins.co.uk/2016/09/02/setting-up-sitecore-publishing-service/.

I’ve learned a good bit from all the iterations of working with the component, and I think the most consistently error-prone part of the setup is aligning any additional custom Sitecore publishing targets used in an implementation. This write-up from Geykel Moreno at AlphaSolutions has all the good information, but it’s not as easy to follow because it doesn’t post a comprehensive sc.publishing.xml file. It took a bit of trial and error for me, so to simplify for posterity I’m going to share a reference sample Gist at https://gist.github.com/grant-killian/d2fe8d3e89c5d7b15f47464dd1809d62 that includes 2 additional custom publishing targets. I’ve inserted XML comments for the 3 locations one must update in the config\sitecore\publishing\sc.publishing.xml file:

  1. You need to add your ConnectionString entry for each database to the Publishing/ConnectionStrings XML
  2. You need to add your Services/DefaultConnectionFactory/Options/Connections XML definition for each custom target
  3. You need to add entries for each target to the StoreFactory/Options/Stores/Targets XML that will include the GUID of the Sitecore item that defines each publishing target, along with the Name of the item and additional details

Here’s the gist with the full XML for reference:


<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<Sitecore>
<Publishing>
<InstanceName>${SITECORE_InstanceName}</InstanceName>
<ConnectionStrings>
<Service>${Sitecore:Publishing:ConnectionStrings:Master}</Service>
<!-- Add any additional publishing targets you may use (first location for changes to this file) -->
<previewweb>Data Source=Server-012345;Initial Catalog=PrevWeb;Integrated Security=True;MultipleActiveResultSets=True;ConnectRetryCount=15;ConnectRetryInterval=1</previewweb>
<liveweb>Data Source=Server-012345;Initial Catalog=LiveWeb;Integrated Security=True;MultipleActiveResultSets=True;ConnectRetryCount=15;ConnectRetryInterval=1</liveweb>
<!-- end first location for changes -->
</ConnectionStrings>
<Services>
<DefaultConnectionFactory>
<Options>
<Connections>
<Links>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Core}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Links>
<Service>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Service}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Service>
<Master>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Master}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Master>
<Internet>
<!-- Should match the name of the publishing target configured in SC. -->
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Web}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Internet>
<!-- start custom publishing target additions (2nd location) -->
<Preview>
<!-- Should match the name of the publishing target configured in Sitecore -->
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:previewweb}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Preview>
<Live>
<!-- Should match the name of the publishing target configured in Sitecore -->
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:liveweb}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Live>
<!-- end custom publishing target additions (2nd location) -->
</Connections>
</Options>
</DefaultConnectionFactory>
<DbConnectionBehaviours>
<Options>
<Entries>
<sql-backend-default>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.NoRetryConnectionBehaviour, Sitecore.Framework.Publishing.Data</Type>
<Options>
<Name>Default Backend No Retry behaviour</Name>
<CommandTimeout>120</CommandTimeout>
</Options>
</sql-backend-default>
<sql-api-default>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.NoRetryConnectionBehaviour, Sitecore.Framework.Publishing.Data</Type>
<Options>
<Name>Default Api No Retry behaviour</Name>
<CommandTimeout>10</CommandTimeout>
</Options>
</sql-api-default>
</Entries>
</Options>
</DbConnectionBehaviours>
<StoreFactory>
<Options>
<Stores>
<Service>
<Type>Sitecore.Framework.Publishing.Data.ServiceStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Service</ConnectionName>
<FeaturesListName>ServiceStoreFeatures</FeaturesListName>
</Service>
<Sources>
<Master>
<Type>Sitecore.Framework.Publishing.Data.SourceStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionNames>
<master>Master</master>
</ConnectionNames>
<FeaturesListName>SourceStoreFeatures</FeaturesListName>
<!-- The name of the Database entity in Sitecore. -->
<ScDatabase>master</ScDatabase>
</Master>
</Sources>
<Targets>
<!--Additional targets can be configured here-->
<Internet>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Internet</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!-- The id of the target item definition in Sitecore. -->
<Id>8E080626-DDC3-4EF4-A1D1-F0BE4A200254</Id>
<!-- The name of the Database entity in Sitecore. -->
<ScDatabase>web</ScDatabase>
</Internet>
<!-- start custom publishing target additions (third location) -->
<!-- this XML node should be named the same as the item in Sitecore (not the "Display Name", but the Item name) -->
<Preview>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Preview</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!-- make sure the GUID below matches the GUID stored in Sitecore for the Publishing Target -->
<Id>8D1249E6-9413-4C2D-8C72-06561CE1D026</Id>
<ScDatabase>previewweb</ScDatabase>
</Preview>
<!-- this XML node should be named the same as the item in Sitecore (not the "Display Name", but the Item name) -->
<Live>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Live</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!-- make sure the GUID below matches the GUID stored in Sitecore for the Publishing Target -->
<Id>0EA57D57-7837-4B51-A72C-E8B3F1322C07</Id>
<ScDatabase>liveweb</ScDatabase>
</Live>
<!-- end custom publishing target additions (third location) -->
</Targets>
<ItemsRelationship>
<Type>Sitecore.Framework.Publishing.Data.ItemsRelationshipStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Links</ConnectionName>
<FeaturesListName>ItemsRelationshipStoreFeatures</FeaturesListName>
</ItemsRelationship>
</Stores>
</Options>
</StoreFactory>
<StoreFeaturesLists>
<Options>
<FeatureLists>
<!--Source Store Features-->
<SourceStoreFeatures>
<ItemReadRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeItemReadRepository, Sitecore.Framework.Publishing.Data</Type>
</ItemReadRepositoryFeature>
<TestableContentRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeTestableContentRepository, Sitecore.Framework.Publishing.Data</Type>
</TestableContentRepositoryFeature>
<WorkflowStateRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeWorkflowStateRepository, Sitecore.Framework.Publishing.Data</Type>
</WorkflowStateRepositoryFeature>
<EventQueueRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeEventQueueRepository, Sitecore.Framework.Publishing.Data</Type>
<options>
<ConnectionName>master</ConnectionName>
</options>
</EventQueueRepositoryFeature>
<SourceIndexFeature>
<Type>Sitecore.Framework.Publishing.ItemIndex.SourceIndexWrapper, Sitecore.Framework.Publishing</Type>
</SourceIndexFeature>
</SourceStoreFeatures>
<!--Service Store Features-->
<ServiceStoreFeatures>
<ManifestRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Manifest.ManifestRepository, Sitecore.Framework.Publishing</Type>
</ManifestRepositoryFeature>
<PublisherOperationRepositoryFeature>
<Type>Sitecore.Framework.Publishing.PublisherOperations.PublisherOperationRepository, Sitecore.Framework.Publishing</Type>
</PublisherOperationRepositoryFeature>
<PublishJobQueueRepositoryFeature>
<Type>Sitecore.Framework.Publishing.PublishJobQueue.PublishJobQueueRepository, Sitecore.Framework.Publishing</Type>
</PublishJobQueueRepositoryFeature>
<TargetSyncStateRepositoryFeature>
<Type>Sitecore.Framework.Publishing.TargetSyncState.TargetSyncStateRepository, Sitecore.Framework.Publishing</Type>
</TargetSyncStateRepositoryFeature>
<ActivationLockRepositoryFeature>
<Type>Sitecore.Framework.Publishing.InstanceActivation.ActivationLockRepository, Sitecore.Framework.Publishing</Type>
</ActivationLockRepositoryFeature>
</ServiceStoreFeatures>
<!--Target Store Features-->
<TargetStoreFeatures>
<IndexableItemRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicIndexableItemRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</IndexableItemRepositoryFeature>
<ItemWriteRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicItemRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</ItemWriteRepositoryFeature>
<MediaRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.Repositories.ClassicMediaRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</MediaRepositoryFeature>
<TargetIndexFeature>
<Type>Sitecore.Framework.Publishing.ItemIndex.TargetIndexWrapper, Sitecore.Framework.Publishing</Type>
</TargetIndexFeature>
</TargetStoreFeatures>
<!--ItemsRelationship Store Features-->
<ItemsRelationshipStoreFeatures>
<DatabaseItemRelationshipRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicItemRelationshipRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</DatabaseItemRelationshipRepositoryFeature>
</ItemsRelationshipStoreFeatures>
</FeatureLists>
</Options>
</StoreFeaturesLists>
</Services>
</Publishing>
</Sitecore>
</Settings>
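Since lining up these three locations is the error-prone part, a quick sanity check before restarting the service can save some trial and error. The Python sketch below is one way to do that, under my own assumptions: it only checks that each target’s ConnectionName has a matching entry under Connections and that each Id parses as a GUID. The trimmed SAMPLE document, including the deliberately broken Live target, is hypothetical and is not the full file above.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical, trimmed document: "Live" is deliberately miswired.
SAMPLE = """
<Settings><Sitecore><Publishing>
  <Services>
    <DefaultConnectionFactory><Options><Connections>
      <Preview><Options>
        <ConnectionString>${Sitecore:Publishing:ConnectionStrings:previewweb}</ConnectionString>
      </Options></Preview>
    </Connections></Options></DefaultConnectionFactory>
    <StoreFactory><Options><Stores><Targets>
      <Preview>
        <ConnectionName>Preview</ConnectionName>
        <Id>8D1249E6-9413-4C2D-8C72-06561CE1D026</Id>
        <ScDatabase>previewweb</ScDatabase>
      </Preview>
      <Live>
        <ConnectionName>Live</ConnectionName>
        <Id>not-a-guid</Id>
        <ScDatabase>liveweb</ScDatabase>
      </Live>
    </Targets></Stores></Options></StoreFactory>
  </Services>
</Publishing></Sitecore></Settings>
"""


def check_targets(xml_text):
    """Return a list of human-readable problems found in the target wiring."""
    pub = ET.fromstring(xml_text).find("Sitecore/Publishing")
    connections = pub.find("Services/DefaultConnectionFactory/Options/Connections")
    targets = pub.find("Services/StoreFactory/Options/Stores/Targets")
    known = {conn.tag for conn in connections}
    problems = []
    for target in targets:
        name = target.findtext("ConnectionName", "").strip()
        if name not in known:
            problems.append(f"{target.tag}: no <Connections> entry named '{name}'")
        try:
            uuid.UUID(target.findtext("Id", "").strip())
        except ValueError:
            problems.append(f"{target.tag}: <Id> is not a valid GUID")
    return problems


for problem in check_targets(SAMPLE):
    print(problem)
```

This does not prove the GUID matches the publishing target item in Sitecore (only the CM can tell you that), but it catches the wiring mistakes that are easy to make when copying blocks around.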


A few Solr thoughts

Solr has never been more pervasive through the Sitecore projects I’m seeing these days.  Deciding which version of Solr to use for a greenfield Sitecore project, however, is not clear-cut.

Easy answer: use Solr 5.1

Sitecore’s KB article on compatibility with Solr serves as our official reference when it comes to selecting a Solr version to standardize on.  At face-value, if you’re using Sitecore version 8.2, you’re steered to Solr version 5.1:

[Image: SolrCompat, the Solr compatibility table from Sitecore’s KB article]

The table has a note [3], however, that deserves attention:

WARN  Unable to connect to Solr: [http://{hostname}:{port}/solr], the [SolrNet.Exceptions.SolrConnectionException] was caught.
Exception: SolrNet.Exceptions.SolrConnectionException
Message: Error handling 'status' action
org.apache.solr.common.SolrException: Error handling 'status' action
  • “To resolve issue, upgrade Solr to 5.5.1 or later version.”

Easy answer: use Solr 5.5.1

I asked Sitecore support about this, and the guidance I received was to build on Solr version 5.5.1 instead of what the KB article states.  There are no plans to alter the guidance in that KB article, however, since Sitecore 8.2 as a whole platform was thoroughly tested with Solr 5.1.  Apparently, Solr 5.5.1 was not available at the time of that testing.

Anecdotally, Sitecore has found fewer errors when using Solr 5.5.1 instead of Solr 5.1 — when pressed for specifics, it was shared that these two Solr issues have caused problems for other Sitecore implementations:

  1. https://issues.apache.org/jira/browse/SOLR-8793
    • FileNotFoundException or NoSuchFileException with Solr — see comment from Sitecore KB article that it can cause “Unable to connect to Solr” exceptions in some cases
  2. https://issues.apache.org/jira/browse/LUCENE-7188
    • NRTCachingDirectory error where an IllegalStateException exception is thrown
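If you want to check which side of the 5.5.1 line an existing Solr server sits on, something like the following Python sketch could work. It assumes the server exposes Solr’s system info handler at /admin/info/system and that the response carries the version under lucene / solr-spec-version; treat both as assumptions to verify against your own Solr version. The example at the bottom runs offline against hand-written payloads.

```python
import json
from urllib.request import urlopen


def parse_version(text):
    """Turn '5.5.1' into (5, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:3])


def solr_version(system_info):
    """Extract the Solr version from a system-info style response."""
    return parse_version(system_info["lucene"]["solr-spec-version"])


def meets_minimum(system_info, minimum="5.5.1"):
    """True when the reported Solr version is at least `minimum`."""
    return solr_version(system_info) >= parse_version(minimum)


def fetch_system_info(base_url):
    """Assumes the system info handler is reachable under base_url."""
    with urlopen(f"{base_url}/admin/info/system?wt=json", timeout=10) as resp:
        return json.load(resp)


# Offline example with trimmed, hand-written payloads.
print(meets_minimum({"lucene": {"solr-spec-version": "5.1.0"}}))  # False
print(meets_minimum({"lucene": {"solr-spec-version": "5.5.1"}}))  # True
```

Comparing tuples of integers avoids the classic string-comparison trap where "5.10" sorts before "5.5".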

Easy answer: there are no easy answers

I’ve worked with a number of Solr 5.1 projects with Sitecore, and some using other Solr versions prior to Solr 5.5.1, but haven’t encountered the above errors as major impediments.

It’s tempting to use Solr 5.5.1, but if a project is using EXM or WFFM or Sitecore Commerce or some other edge-case combination of technologies, it’s at least theoretically possible that Sitecore support could fall back on the officially published “Solr 5.1 ✓ ‘officially tested, recommended’” guidance from their KB article.  That’s enough for us to start new Sitecore projects that depend on Solr on version 5.1, and keep an eye out for those particular gotchas that may cause us to upgrade to Solr 5.5.1.

The catch is, if you’re upgrading Solr and stopping at Solr 5.5.1, is there a strong rationale not to upgrade beyond 5.5.1?  At this point, http://archive.apache.org/dist/lucene/solr/ has a wealth of newer Solr versions that are bound to have more patches and fixes than 5.5.1.  This is what you call a slippery slope:

[Image: solrslippery.JPG]

I have to be careful here as I walk the line of a non-disclosure agreement, but there are still more variables to consider: in the near future, a Sitecore release is likely to involve thorough Solr support for a very recent version of Solr.  Expect a Solr version newer than 5.5.1 (which was released in May of 2016 ☺).

So…

I believe I’ve sold myself on the wisdom of Solr 5.1 for now — so long as the sacred Sitecore Support ✓ is present on the official compatibility table.  It’s key to continue learning with Solr, though, and in the months to come we may be talking about SolrCloud and managed Solr schemas . . . cool new aspects to improve Sitecore implementations.

Sitecore Session Persistence Notes

I’ve neglected this blog of late, being focused on a number of “not easily blogged about” scenarios across several Sitecore projects.  It’s too bad, because the work is very interesting, but it doesn’t lend itself to a page or two write-up with a digestible take-away for the general Sitecore community out there.

I do want to keep in the habit of blogging, though, so I’m going to mention this ongoing discussion I’ve been a part of about session management with regard to Sitecore.  There are a few options for managing HTTP session state with Sitecore covered in https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/xdb/session_state/session_state: SQL Server, MongoDB, and Redis.  Those three technologies are really just the tip of the iceberg, as the implementation of each can get quite detailed.  For the discerning Sitecore implementation, it can be useful to understand the nuances of each session state provider.  While not an exhaustive look at any one of these solutions, I wanted to post some notes on each one given the current state of Sitecore architecture (June 2017):

SQL Server

This is often the default session provider we gravitate to.  The SQL Server “Boost” script from Sitecore is something we’ve used on implementations (see “Optimize SQL Server performance” on that link), but it is not without its rough edges (see our Rackspace write-up on how to alter permissions so TempDB is reliably available across service restarts).

You’ll notice the approach for improving SQL Server performance with session state is all about getting session state “in-memory” to the furthest extent possible.  Remember this when we examine the other two providers below . . .

I will say that, generally speaking, SQL Server is easy to administer: it’s a well-known technology, and updating it, scaling it, managing fail-overs, etc. is simple compared with the alternatives.  SQL Server has been part of the Windows dev stack for ages now, which is why it’s so often the first choice.

MongoDB

With MongoDB serving as the persistence layer for Sitecore’s xDB, it became a fully supported and viable option for HTTP session state with Sitecore at the same time.  The comparative performance between MongoDB and SQL Server is up for debate (Redis too, for that matter!), and it usually comes down to testing based on how the specific implementation is using session with Sitecore etc; I’m not going to hazard any generalizations on relative perf, as that’s not really the point of this post.

Instead, I’d like to point out how MongoDB does not come in just a single flavor.  The two most common flavors, or “storage engines,” are MMAPv1 and WiredTiger, but there are still others designed to serve specific use cases.  Take, for example, the Percona Server for MongoDB hosted by ObjectRocket that has a posted option for the RocksDB storage engine.  RocksDB with MongoDB may not be a great fit for Sitecore session state: it is tuned for write-heavy workloads, although if you’re making extensive use of TTL indexes for Sitecore it can fit those scenarios in certain appealing ways.  Either way, it does open the door to MongoDB being more than just a one-size-fits-all data repository (read more about RocksDB and its Facebook pedigree here).  One MongoDB storage engine option that is easily overlooked is WiredTiger “in-memory,” which forces data to be stored in RAM . . . and this is perfect for HTTP Session State for most Sitecore builds.

In fact, if you consider the SQL Server “boost” approach that uses TempDB to store session state for Sitecore . . . WiredTiger “in-memory” is attacking the problem from the same direction.  Store everything in RAM!  This is why one must be cautious with general comparisons between SQL Server and MongoDB.  The devil is always in the details: a far better comparison would be “boosted” SQL Server for Sitecore using TempDB vs the MongoDB WiredTiger “in-memory” storage engine.  And note the network latency . . . and the size of the session objects . . . and you’re getting the point, I trust.  To really answer the SQL Server vs MongoDB question for Sitecore sessions, one has to develop a matrix of performance evaluations and level the assumptions across the board.  “It depends” is the only honest answer that doesn’t come with a list of caveats.
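To make the “matrix of performance evaluations” idea a little more concrete, here is a minimal Python sketch of a harness that times one write plus one read per session payload, across stores and payload sizes. The InMemoryStore class is a hypothetical stand-in; in a real comparison you would swap in thin wrappers around each session provider and run the test from the web server’s network location so latency is included.

```python
import time
from statistics import mean


class InMemoryStore:
    """Stand-in for a real session provider client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def time_round_trips(store, payload_bytes, iterations=200):
    """Average seconds for one write plus one read of a payload."""
    payload = b"x" * payload_bytes
    samples = []
    for i in range(iterations):
        start = time.perf_counter()
        store.set(f"session-{i}", payload)
        store.get(f"session-{i}")
        samples.append(time.perf_counter() - start)
    return mean(samples)


def build_matrix(stores, payload_sizes):
    """Return {(store name, payload size): average seconds}."""
    return {
        (name, size): time_round_trips(store, size)
        for name, store in stores.items()
        for size in payload_sizes
    }


matrix = build_matrix({"in-memory": InMemoryStore()}, [1_024, 64 * 1_024])
for (name, size), seconds in matrix.items():
    print(f"{name:10} {size:>7} bytes  {seconds * 1e6:8.1f} us")
```

The point of the matrix shape is that every store gets exactly the same payload sizes and iteration counts, which is the “level assumptions” part of the comparison.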

If you’re curious about this MongoDB topic for your project, go to http://objectrocket.com/docs/mongodb_plans.html and spin up a WT 3.2 storage engine plan for 5 GB of storage (this allows 1.5 GB for RAM).  1.5 GB for RAM is going to be overkill for most small/medium Sitecore implementations, but again, you’ll want to test with your specific session data set to see!  Furthermore, network latency of 10 ms or less is going to help make the most of an ObjectRocket hosted MongoDB service like this; otherwise, the network latency may not make it worth the money.  Let me know if you pursue this with ObjectRocket, as there are some benchmarking measures we want to do but we haven’t had a real implementation to try it out on.  So if you feel like being a guinea pig, please let me know at grant.killian [at] rackspace.com.  It would be great to have real world metrics to prove this all out.
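For a feel of how far 1.5 GB of RAM goes, here is a rough sizing calculation in Python. The 50 KB average session size and the 50% headroom are hypothetical numbers of mine, not measurements; plug in figures from your own session data.

```python
def sessions_that_fit(ram_gb, avg_session_kb, headroom=0.5):
    """Sessions that fit in RAM, keeping a fraction free as headroom."""
    usable_kb = ram_gb * 1024 * 1024 * (1 - headroom)
    return int(usable_kb // avg_session_kb)


# Hypothetical: 1.5 GB of RAM, 50 KB per session, half the RAM kept free.
print(sessions_that_fit(1.5, 50))  # 15728
```

Roughly fifteen thousand concurrent sessions under those assumptions, which is why that plan size is likely more than a small or medium site needs.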

Redis

If the way to get the best session management perf out of SQL Server and MongoDB is to find in-memory solutions, Redis looks like the slam dunk since it’s just an in-memory storage solution.  We find most clients aren’t interested in managing Redis infrastructure, so again a hosted option such as ObjectRocket has appeal.

Sitecore relies on the StackExchange.Redis assembly, which doesn’t support Redis Sentinel (it’s a bit of a saga at https://github.com/StackExchange/StackExchange.Redis/pull/406), so there’s not a great high availability story with self-hosted Redis and Sitecore right now.  How concerned one should be with HA of fairly transient HTTP Session State for Sitecore, however, is an open question.  I usually wouldn’t worry about it too much.  Honestly, Redis is a technology that we’re just now starting to get really serious about at Rackspace so our sophistication in this space will improve dramatically in the months to come.  Between Azure Redis and all the Sitecore PaaS movement we’re seeing, it’s become a key player in a lot of Sitecore architectures.

Azure Search compared to Solr for Sitecore PaaS (Chapter 2: Querying)

I carried forward my Azure PaaS benchmarking work from earlier this month (see this post on the indexing side of the equation for the start of the story).

For a quick refresher, I’ve used an ARM template based deployment of Sitecore to get a system resembling the following:

[Diagram: ARM Templates Arch]

The element I’m exercising in the benchmarks is how Sitecore’s web servers work with the “Search” icon in the diagram above.  I tackled the document ingestion side (how data gets into the search indexes) in my earlier post.  This post addresses the querying side of things (how data gets out of the search indexes).

By default, Azure PaaS search with Sitecore is configured to use Azure Search.  Solr is another viable option.

Here’s where I’ll interject that Coveo also has an excellent search technology for Sitecore.  Coveo is a strong fit only for specific use-cases, though, and the sitecore_core_index indexing evaluations in my earlier post were not one of them.  This changes, however, for the set of benchmarks I’ve run in this post.  I am in the process of testing the Coveo approach in Azure PaaS for Sitecore . . . it’s hot off the presses, so there are still rough edges to work around . . . but Coveo is not part of this write-up for the time being.  I will post an update here once I’ve completed the analysis involving Coveo.

In considering Azure Search vs Solr, I used a methodology with JMeter laid out in a great KB article from Sitecore at https://kb.sitecore.net/articles/398589.  I have a LaunchSitecore site running and I use JMeter to automate visits to the site, simulating simple user behaviour.  I don’t go too crazy with this, because I’m more interested in exercising a basic Sitecore work load than doing a deep-dive in xDB traffic simulation.

My first post showed a clear advantage to Solr for the indexing side of search, but for the querying side I can say there is very little variance between Azure Search and Solr.  Sitecore does a good job of protecting data repositories with layers of data and HTML caches, but even with those features disabled (we’re talking cacheHtml="false" on the site definition, <cacheSizes> configuration all set to a heretical zero (“0”), etc.) there isn’t a significant difference between the two technologies.

I’m not going to put up a graph of it, because the throughput as measured by JMeter was almost the same for both technologies in tests of 20, 50, 100, 200, or more visitors.

I could develop a more search-heavy set of benchmarks, running a random dictionary of searches against a large custom index in a way that forces Sitecore to bypass all caches, but that feels like overkill for what I’m looking to achieve.  Maybe that’s appropriate once I bring Coveo into the benchmarking fun.

For this, I wanted to get a sense for the relative performance between Azure Search and Solr as it relates to Sitecore PaaS and I think I’ve done that.  Succinctly:

  1. Solr is considerably faster at search indexing (courtesy of the search provider implementation in Sitecore)
  2. both Azure Search and Solr perform about the same when it comes to querying a basic Sitecore site like LaunchSitecore (again, courtesy of the search provider implementation in Sitecore)

This isn’t the definitive take on the topic.  It’s more like the beginning.  Azure Search is native to Azure, so there are significant advantages there.  There is a lot of momentum around Azure and Sitecore in general, so that story will continue to evolve.

There are Solr as a service options out there that make Solr for Sitecore much easier (such as www.measuredsearch.com which I’ll blog about in the next few days), but Solr can be a lot for corporate IT departments to take on, so it isn’t a simple choice for everyone.

 

 

High Availability of Azure Search with Sitecore

I’ve been investigating Azure Search with Sitecore’s new Azure App Service offering.  I’ve got a giant Excel file of benchmarks and charts based on several permutations and configurations, and several other interesting tidbits that I need to organize into posts to this blog . . . so look for much more about this general topic in the future.

For now, I thought I’d share a point I’ve confirmed with Sitecore support regarding a limitation of Azure Search with Sitecore’s CloudSearchProviderIndex.  The CloudSearchProviderIndex is what the standard Platform-As-A-Service product from Sitecore will use in place of Lucene or Solr or Coveo to power content search for Sitecore.  This is the key building block for working with Azure Search through Sitecore.

While I was running performance benchmarks for search re-indexing with Sitecore, I noticed the Azure Search document count would drop to 0 and I’d see odd results from Sitecore requests that depended on the search index.  This was classic “search index is being worked on, don’t rely on querying it until the work is done” behaviour.  Sitecore corrected this several years ago for Lucene by adding a SwitchOnRebuildLuceneIndex, along with an equivalent for Solr . . . but there is no such equivalent for the CloudSearchProviderIndex used by Azure PaaS solutions.  Essentially: Sitecore is using a single copy of each search index for both query and re-indexing operations, limiting the availability of search during maintenance work.

One could argue this may not be such a big deal because one may not rebuild Azure Search indexes with any frequency.  I’m not sold on this argument, however, since the Sitecore projects I know will frequently perform re-indexing due to development changes to the schema, content synchronization demands, or just routine deployment standard practices.

Further complicating this issue is that the Azure Search re-indexing performance I’m seeing through Sitecore in my benchmarking leaves a lot to be desired.  It can be slow.  This could make for an extended period of search index unavailability due to the CloudSearchProviderIndex’s limitations.  I’ll share the full battery of testing I’ve done in a future post, but for now let me share the timings I’m observing regardless of the number of Azure Search partitions or replicas I’m working through (partitions should generally improve indexing performance; replicas should generally improve querying performance):

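App Service Configuration                                    | Time for 20,000 Sitecore Items to Re-Index with Azure Search
-------------------------------------------------------------+--------------------------------------------------------------
Azure PaaS Standard (S1) CM IIS (OOTB from the Marketplace)  | 66 minutes
Azure S2 CM IIS                                              | 35 minutes
Azure S3 CM IIS                                              | 25 minutes
Azure P2 CM IIS                                              | 35 minutes
Azure P3 CM IIS                                              | 24 minutes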

For reference, with Lucene indexes this operation would take 5 minutes or less.  The scaling options for Azure Search, Partition count and Replica count, have a minimal impact on the re-indexing operation.

I’ll go into details of this later, but it could be that . . .

  • 20,000 Sitecore items is too small a figure to benefit from scaling with Azure Search?  Many customers have 100,000 or more items, so perhaps I should evaluate a larger data set.
  • there are bottlenecks at the SQL tier?  App Insights here I come…
  • the fact Sitecore isn’t using Azure Search Indexers to ingest data and relies on the Sitecore crawling logic to handle data indexing is artificially slowing this process down

For the time being, Sitecore has responded that improving the availability of Azure Search indexes during rebuilds is an official “feature request” and assigned it reference number 146822.

In the meantime, if a project needs high availability for Azure Search indexes one may need to roll up their sleeves and craft their own SwitchOnCloudSearchProviderIndex.  It appears fairly straight-forward based on reviewing how this is solved for Solr, just as one example.  A key caveat is in the Azure Search capacity planning documentation:

High availability for Azure Search pertains to queries and index updates that don’t involve rebuilding an index. If you add or delete a field, change a data type, or rename a field, you will need to rebuild the index. To rebuild the index, you must delete the index, re-create the index, and reload the data.

To maintain index availability during a rebuild, you must have a copy of the index with a different name on the same service, or a copy of the index with the same name on a different service, and then provide redirection or failover logic in your code.

It looks like providing for high availability would double the price of Azure Search indexes, so there is a cascade of complications related to this.
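
For comparison, the Solr side of this is configured roughly as in the fragment below.  This is a sketch based on the Solr provider’s switch-on-rebuild index type; the rebuild core name is an assumption for illustration, and the exact parameters should be checked against the example config files that ship with your Sitecore version:

```xml
<!-- Fragment of a contentSearch index definition, not a complete patch file -->
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <!-- core that serves queries -->
  <param desc="core">sitecore_web_index</param>
  <!-- second core that receives the rebuild; this name is an assumption -->
  <param desc="rebuildcore">sitecore_web_index_rebuild</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <!-- remaining child elements (configuration, strategies, locations) unchanged -->
</index>
```

The rebuild is written to the secondary core and the two cores are swapped when it completes, so queries keep hitting a complete index.  A custom Azure Search equivalent would need the same two-index arrangement, which is where the doubled cost comes from.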

My investigations into Sitecore and Azure Search yielded this complication — it’s not insurmountable, and I actually find it fascinating how an on-premises product (classic Sitecore) will evolve into a cloud-first product.  This is just one piece of the evolutionary story.  I expect this will be addressed sooner rather than later in an official upgrade or patch from Sitecore, and until then it’s important to understand this nuance to the Sitecore PaaS landscape.

Strategies for Sitecore Index Organization into Solr Cores

A few days ago, I shared a graphic I put together to illustrate how Solr can be used to organize Sitecore “indexes” into Solr “cores” — this post has the complete graphic.  I want to elaborate on how one sets Sitecore up to use these two approaches, and dig further into the details.

1:1 Sitecore Index to Solr Core Strategy

To start, here’s a visual showing the typical way Sitecore “indexes” are structured in Solr using a one-to-one (1:1) mapping:

solrseparate

This shows each of the default search indexes defined by Sitecore organized into their own cores defined in Solr.  It’s a 1:1 mapping.  This 1:1 strategy means each index has its own configuration (“conf”) directory in Solr, so separate stopwords.txt, solrconfig.xml, schema.xml, and so on; it also means each index has its own (“data”) directory in Solr, so separate tlog folders, separate Segment files, etc.

This is the setup one achieves by following the community documentation on setting up Sitecore with Solr; specifically, this quote from that write-up is where you’re doing a lot of the grunt work around setting up distinct Solr cores for each Sitecore index:

“Use the process detailed in Steps 4-7 to create new cores for all the remaining indexes you would like to move to SOLR.”

Since this is the common strategy, I’m not going to go into more details as it’s straight-forward to Sitecore teams.

Kitchen Sink (∞:1 Sitecore Index to Solr Core) Strategy

Here is the comparable graphic showing the ∞:1 strategy of structuring Sitecore indexes in Solr; I like to think of this as the Kitchen Sink container for all Sitecore indexes, since everything goes into that single core just like the kitchen sink:

solrsame

With this approach, a single data and configuration definition is shared by all the Sitecore indexes that reside in Solr.  The advantage is reduced management (setting up the Solr replicationHandler, for example, requires updating 15 solrconfig.xml files in the 1:1 approach, but the Kitchen Sink would require only one solrconfig.xml file to update).  There are significant drawbacks to consider with the Kitchen Sink, however, as you’re sacrificing scaling options specific to each Sitecore index and enforcing a common schema.xml for every index stored in this single core.  There are plenty of reasons not to do this for a production installation of Sitecore, but for a crowded Sitecore environment used for acceptance testing or other use-cases where bullet-proof stability and lots of flexibility when it comes to performance tuning, sharding, etc are not necessary, you could make a good case for the Kitchen Sink strategy.
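
To make that management overhead concrete, here is a minimal sketch of the kind of replicationHandler block in question, using Solr’s legacy master/slave replication.  The replicateAfter triggers and the confFiles list are illustrative assumptions, not settings taken from a specific Sitecore environment:

```xml
<!-- solrconfig.xml on the master; under the 1:1 strategy this is repeated in every core's conf directory -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- when to publish a new index version to slaves (assumed choices) -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <!-- config files to ship along with the index (assumed list) -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

With the Kitchen Sink, a block like this lives in one solrconfig.xml; with 1:1 cores, every change to it has to be repeated per core.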

The only change necessary to a standard Sitecore configuration to support this Kitchen Sink approach is to patch the contentSearch definitions for the Sitecore indexes where the name of the Solr “core” is specified (stored by default in config files like Sitecore.ContentSearch.Solr.Index.Master.config, Sitecore.ContentSearch.Solr.Index.Web.config, etc).  This tells Sitecore which Solr core contains the index, but the actual name of the core doesn’t factor into the ContentSearch API code one uses with Sitecore.  A patch such as the following would organize both the sitecore_master_index and the sitecore_web_index into a Solr Core named “kitchen_sink”:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
          <index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

If you peek into the Solr Admin for the kitchen_sink core that I’m using, specifically the Schema Browser in the Solr Admin UI, it becomes clear how Sitecore uses a field named “_indexname” to represent the Sitecore index value.  For the screenshot below, I’ve set the kitchen_sink core to contain two Sitecore indexes: sitecore_master_index and sitecore_web_index:

solrterms

This shows us the two terms stored in that _indexname field, and that there are 18,774 documents for sitecore_master_index and 5,851 for sitecore_web_index.  Even though the indexes are contained in the same Solr Core, Sitecore ContentSearch API code like this . . .

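// indexName is the Sitecore index name, e.g. "sitecore_web_index", not the Solr core name
Sitecore.ContentSearch.ISearchIndex index =
  Sitecore.ContentSearch.ContentSearchManager.GetIndex(indexName);
using (Sitecore.ContentSearch.IProviderSearchContext ctx =
  index.CreateSearchContext())
{
  // query through ctx as usual
}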

. . . doesn’t care whether all the Sitecore indexes reside in a single Solr “Core” or if they’re in their own cores following a 1:1 mapping strategy.

Caveats and Going In A Different Direction

There was a bug or two in earlier versions of Sitecore related to this, so be careful with early Sitecore 7.2 or Sitecore 8 implementations (and if you’re using Sitecore 7.5, you’ve got plenty of other things to worry about so don’t sweat a Solr Core organization strategy!).

I should also note that while this post is looking at combining Sitecore indexes into a single Solr Core for convenience and to reduce the management headaches of having 15 sets of Solr Cores to update etc, there are some implementations that go in the opposite direction.  Consider a strategy like the following:

solrmindblown

 

There may be circumstances where keeping Sitecore indexes in their own Solr Core, and even isolating them further into their own Solr implementation, could be in order.  Solr runs in a JVM, so every core on the same Solr instance shares one heap and one garbage collector, and there are other shared run-time resources that Solr sets aside for the whole Solr application.

I’m not familiar enough with these sorts of implementations that I want to comment further or recommend any course of action related to this right now, but it’s good to think about and consider with Solr tuning scenarios.  I just wanted to share it, as it’s a logical dimension to consider given the two previous strategies in this post.

 

Solr Configuration for Integration with Sitecore

I’ve got a few good Solr and Sitecore blogs around 75% finished, but I’ve been too busy lately to focus on finishing them.  In the meantime, I figure a picture can be worth 1,000 words sometimes so let me post this visual representation of Solr strategies for Sitecore integrations.  One Solr core per index is certainly the best practice for production Sitecore implementations, but now that Solr support has significantly matured at Sitecore, a single Solr core for all the Sitecore indexes is a viable, if limited, option:

draft

There used to be a bug (or two?) that made this single Solr core for every Sitecore index unstable, but that’s been corrected for some time now.

More to follow!

Sitecore Gets Serious About Search, by Leaving the Game

Sitecore is steadily reducing the use-cases for using Lucene as the search provider for the Sitecore product.  Sitecore published this document on “Solr or Lucene” a few days ago, going so far as to state this one criterion where you must use Solr:

  • You have two or more content delivery servers

Read that bullet point again.  If you’ve worked with Sitecore for a while, this should trigger an alarm.

Sound the alarm: most implementations have 2 or more CD servers!  I’d say 80% or more of Sitecore implementations are using more than a single CD server, in fact.  Carrying this logic forward, 80% of Sitecore implementations should not be running on Lucene!  This is a big departure for Sitecore as a company, which would historically dodge conversations about what is the right technology for a given situation.  I think the Sitecore philosophy was that advocating for one technical approach over another is risky, because a Sitecore endorsement could make them accountable if a technology approach fails for one reason or another.  For as long as I’ve known Sitecore, it’s been a software company intent on selling licenses, not dictating how to use the product.  Risk aversion has kept them from getting really down in the weeds with customers.  This tide has been turning, however, with things like Helix and now this more aggressive messaging about the limitations of Lucene.  I think it’s great for Sitecore to be more vocal about best practices; it’s just taken years for them to come around to the idea.

As a bit of a search geek, I want to state for the record that this new Solr over Lucene guidance from Sitecore is not really an indictment of Lucene.  The Apache Lucene project, and its cousin the .NET port Lucene.NET that Sitecore makes use of out-of-the-box, was groundbreaking in many ways.  As a technology, Lucene can handle enormous volumes of documents and performs really well.  Solr is making use of Lucene underneath it all, anyway!  This recent announcement from Sitecore is more an acknowledgement that Sitecore’s event plumbing is no substitute for Solr’s CAP-theorem straddling acrobatics.  Sitecore is done trying to roll their own distributed search system.  I think this is Sitecore announcing that they’re tired of patching the EventQueue, troubleshooting search index update strategies for enterprise installations, and giving up on ensuring clients don’t hijack the Sitecore heartbeat thread and block search indexing with some custom boondoggle in the publishing pipeline.  They’re saying: we give up, just use Solr.

Amen.  I’m fine with that.

To be honest, I think this change of heart can also be explained by the predominant role Azure Search plays in the newest Sitecore PaaS offering.  Having an IP address for all the “search stuff” is awful nice whether you’re running in Azure or anywhere else.  It’s clear Sitecore decided they weren’t keen to recreate the search wheel a few years ago, and are steadily converging around these technologies with either huge corporate backing (Azure Search) or a vibrant open source community (Solr).

I should also note, here, that Coveo for Sitecore probably welcomes this opportunity for Sitecore teams to break away from the shackles of the local Lucene index.  I’m not convinced that long-term Coveo can out-run the likes of Azure Search or Solr, but I know today if your focus is quickly ramping up a search-heavy site on Sitecore you should certainly weigh the pros/cons of Coveo.

With all this said, I want to take my next few posts and dig really deep into Solr for Sitecore and talk about performance monitoring, tuning, and lots of areas that get overlooked when it comes to search with Sitecore and Solr.   So I’ll have more to say on this topic in the next few days!