Multi-region Sitecore Publishing

Sitecore’s Publishing Service that runs on .NET Core is a great addition to the Sitecore ecosystem. It allows us to solve some interesting customer scaling challenges by using this micro-services approach to Publishing content. I’m going to write-up a pattern we’re using these days that updates our approach from a few years ago.

See an example of the older pattern in this piece I wrote for the Rackspace site at https://developer.rackspace.com/blog/Sitecore-Enterprise-Architecture-For-Global-Publishing/.

Now in May 2019, we’re shifting away from the SQL replication game and using Sitecore’s new Publishing Service to connect Sitecore across multiple regions. Refer to this general diagram below to see how we’re approaching it:

2RegionPublishingService

Sitecore’s Publishing Service is the key element between the two regions and the blue arrows show the flow of publishing activities coordinated through the one “Sitecore Publishing Service” host in Region 1.

A few caveats on the picture above:

  1. It’s Sitecore 8.2, so MongoDB is present but not shown on the diagram for simplicity (we use ObjectRocket’s hosted MongoDB service for the majority of these types of customers — but I don’t want to get into that here); Redis and other elements are also not included in the diagram
  2. This applies for any multi-region setup with Sitecore. . . it could be East US and West US, for example, but we used Europe and Asia in the diagram. This approach is most useful where network latency between the regions is enough to make synchronous database connectivity unacceptably slow. This model can apply to more than 2 regions, too, as the pattern can be repeated to support as many regions as you require.

There are just a few crucial configuration steps to make this happen, but it’s built on a lot of lessons learned along the way. Let me catalog the key elements:

  1. The Publishing Service runs in Region 1, but requires a Sitecore Publishing Target to the Region 2 database. The documentation on setting up this type of Publishing Target is vague, so I summarized this process at https://grantkillian.wordpress.com/2018/12/17/how-i-add-custom-sitecore-publishing-service-targets/.
  2. Each region has an isolated Solr cluster (because Solr CDCR or file synchronization for Solr were not suitable in this use-case). This means one of the Region 2 Sitecore CD servers needs to employ the onPublishEndAsync strategy to update the Solr Cloud collections relevant to the implementation. This is standard ContentSearch configuration material, but if you use the manual strategy here with the CDs (which is the general best practice for Sitecore CD servers connected to a Solr cluster with a CM that drives search indexing), the Solr data will never get updated in the other region:
    • <strategies hint="list:AddStrategy">
        <strategy 
        ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
      </strategies>
  3. If you are using Sitecore ContentTesting with this approach (<setting name=”ContentTesting.AutomaticContentTesting.Enabled” value=”true” />), you should be aware that Sitecore CM performance can occasionally stall for several minutes (we’ve seen it last up to 20 minutes!) due to an aspect of the ContentTesting logic that checks every content database for eligible published items to factor into the content testing system. Part of setting up the Region 2 Publishing Target involves adding a ConnectionStrings.config entry to the Region 2 “web” database on the Region 1 Sitecore CM server. This adds the Region 2 “web” database into this ContentTesting routine, and the network latency between Region 1 and Region 2 makes this ContentTesting behaviour slow the CM to a crawl every so often.  If you don’t want to disable Sitecore ContentTesting, you can address this by customizing the Sitecore.ContentTesting.Helpers.VersionHelper.GetLatestPublishedVersion method to employ logic to exclude the Region 2 “web” database. Once you dig deep into this topic, you’ll see the Sitecore.ContentTesting.Helpers.VersionHelper class contains this logic and it’s used in 3 places (according to the decompilation of the .dll):

dude

To adjust ContentTesting to ignore our Region 2 “web” database, we can alter the foreach loop above with something like this that uses a custom “ContentTesting.IgnoredDatabases” setting:

foreach (Database db in Factory.GetDatabases())
{
  string[] excludeList = 
    Sitecore.Configuration.Settings.GetSetting(
    "ContentTesting.IgnoredDatabases")
    .ToLowerInvariant().Split(
        new char[1]
       {
        '|'
       }, 
   StringSplitOptions.RemoveEmptyEntries);
  if (database != null && 
    db.Name != database.Name && 
    !excludeList.Contains(db.Name))
  {
    Item item2 = db.GetItem(item.ID, item.Language);
    if (item2 != null && item2.Version.Number > num)
    {
      num = item2.Version.Number;
    }
  }
}

We can define our custom setting like the following, if we assume region2web is the “web” database ConnectionString name for the Region 2 publishing target on the Sitecore CM:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentTesting.IgnoredDatabases">
        <patch:attribute name="value">core|region2web</patch:attribute>
      </setting> 
    </settings>
  </sitecore>
</configuration>

This work to override the default configuration from . . .

<getVersionedTestCandidates>
  <processor 
    type="Sitecore.ContentTesting.Pipelines.GetTestCandidates.GetPageVersionTestCandidates, Sitecore.ContentTesting">

. . . can dramatically improve the Sitecore CM performance when using this formula for multi-region Sitecore with the new Publishing Service.

Hopefully these notes help other efforts on their Sitecore journey!

How I Add Custom Sitecore Publishing Service Targets

At this point, I think I’ve installed, configured, or customized the new Sitecore Publishing Service at least a dozen times for various projects. Sometimes it’s on PaaS, sometimes on IaaS . . . I’ve used a variety of different versions depending on the compatibility matrix (see below as of Dec 16, 2018):

PubSvcVisual

I’m going to skip all the preamble about how the new Sitecore Publishing Service works, about .Net core being the new hotness, why this component can be a great addition to many distributed Sitecore implementations, etc — smart people have written a lot about this already. For example, check out Stephen Pope’s no-holds-barred look at the Publishing Service at http://www.stephenpope.co.uk/publishing or Jonathan Robbins has a nice overview piece at https://jonathanrobbins.co.uk/2016/09/02/setting-up-sitecore-publishing-service/.

I’ve learned a good bit from all the iterations of working with the component and I think consistently the most error-prone part of the setup is aligning any additional custom Sitecore publishing targets one is using in an implementation. This write-up from Geykel Moreno at AlphaSolutions has all the good information, but it’s not as easy to follow because it doesn’t post a comprehensive sc.publishing.xml file — it took a bit of trial and error for me, so to simplify for posterity I’m going to share a reference sample Gist at https://gist.github.com/grant-killian/d2fe8d3e89c5d7b15f47464dd1809d62 that includes 2 additional custom publishing targets. I’ve inserted XML comments for the 3 locations one must update in the config\sitecore\publishing\sc.publishing.xml file:

  1. You need to add your ConnectionString entry for each database to the Publishing/ConnectionStrings XML
  2. You need to add your Services/DefaultConnectionFactory/Options/Connections XML definition for each custom target
  3. You need to add entries for each target to the StoreFactory/Options/Stores/Targets XML that will include the GUID of the Sitecore item that defines each publishing target, along with the Name of the item and additional details

Here’s the gist with the full XML for reference:


<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<Sitecore>
<Publishing>
<InstanceName>${SITECORE_InstanceName}</InstanceName>
<ConnectionStrings>
<Service>${Sitecore:Publishing:ConnectionStrings:Master}</Service>
<!– Add any additional publishing targets you may use (first location for changes to this file) –>
<previewweb>Data Source=Server-012345;Initial Catalog=PrevWeb;Integrated Security=True;MultipleActiveResultSets=True;ConnectRetryCount=15;ConnectRetryInterval=1</previewweb>
<liveweb>Data Source=Server-012345;Initial Catalog=LiveWeb;Integrated Security=True;MultipleActiveResultSets=True;ConnectRetryCount=15;ConnectRetryInterval=1</liveweb>
<!– end first location for changes –>
</ConnectionStrings>
<Services>
<DefaultConnectionFactory>
<Options>
<Connections>
<Links>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Core}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Links>
<Service>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Service}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Service>
<Master>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Master}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Master>
<Internet>
<!– Should match the name of the publishing target configured in SC. –>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:Web}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Internet>
<!– start custom publishing target additions (2nd location) –>
<Preview>
<!– Should match the name of the publishing target configured in Sitecore –>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:previewweb}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Preview>
<Live>
<!– Should match the name of the publishing target configured in Sitecore –>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.SqlDatabaseConnection, Sitecore.Framework.Publishing.Data</Type>
<LifeTime>Transient</LifeTime>
<Options>
<ConnectionString>${Sitecore:Publishing:ConnectionStrings:liveweb}</ConnectionString>
<DefaultCommandTimeout>120</DefaultCommandTimeout>
<Behaviours>
<backend>sql-backend-default</backend>
<api>sql-api-default</api>
</Behaviours>
</Options>
</Live>
<!– end custom publishing target additions (2nd location) –>
</Connections>
</Options>
</DefaultConnectionFactory>
<DbConnectionBehaviours>
<Options>
<Entries>
<sql-backend-default>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.NoRetryConnectionBehaviour, Sitecore.Framework.Publishing.Data</Type>
<Options>
<Name>Default Backend No Retry behaviour</Name>
<CommandTimeout>120</CommandTimeout>
</Options>
</sql-backend-default>
<sql-api-default>
<Type>Sitecore.Framework.Publishing.Data.AdoNet.NoRetryConnectionBehaviour, Sitecore.Framework.Publishing.Data</Type>
<Options>
<Name>Default Api No Retry behaviour</Name>
<CommandTimeout>10</CommandTimeout>
</Options>
</sql-api-default>
</Entries>
</Options>
</DbConnectionBehaviours>
<StoreFactory>
<Options>
<Stores>
<Service>
<Type>Sitecore.Framework.Publishing.Data.ServiceStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Service</ConnectionName>
<FeaturesListName>ServiceStoreFeatures</FeaturesListName>
</Service>
<Sources>
<Master>
<Type>Sitecore.Framework.Publishing.Data.SourceStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionNames>
<master>Master</master>
</ConnectionNames>
<FeaturesListName>SourceStoreFeatures</FeaturesListName>
<!– The name of the Database entity in Sitecore. –>
<ScDatabase>master</ScDatabase>
</Master>
</Sources>
<Targets>
<!–Additional targets can be configured here–>
<Internet>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Internet</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!– The id of the target item definition in Sitecore. –>
<Id>8E080626-DDC3-4EF4-A1D1-F0BE4A200254</Id>
<!– The name of the Database entity in Sitecore. –>
<ScDatabase>web</ScDatabase>
</Internet>
<!– start custom publishing target additions (third location) –>
<!– this XML node should be named the same as the item in Sitecore (not the "Display Name", but the Item name) –>
<Preview>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Preview</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!– make sure the GUID below matches the GUID stored in Sitecore for the Publishing Target –>
<Id>8D1249E6-9413-4C2D-8C72-06561CE1D026</Id>
<ScDatabase>preveiwweb</ScDatabase>
</Preview>
<!– this XML node should be named the same as the item in Sitecore (not the "Display Name", but the Item name) –>
<Live>
<Type>Sitecore.Framework.Publishing.Data.TargetStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Live</ConnectionName>
<FeaturesListName>TargetStoreFeatures</FeaturesListName>
<!– make sure the GUID below matches the GUID stored in Sitecore for the Publishing Target –>
<Id>0EA57D57-7837-4B51-A72C-E8B3F1322C07</Id>
<ScDatabase>liveweb</ScDatabase>
</Live>
<!– end custom publishing target additions (third location) –>
</Targets>
<ItemsRelationship>
<Type>Sitecore.Framework.Publishing.Data.ItemsRelationshipStore, Sitecore.Framework.Publishing.Data</Type>
<ConnectionName>Links</ConnectionName>
<FeaturesListName>ItemsRelationshipStoreFeatures</FeaturesListName>
</ItemsRelationship>
</Stores>
</Options>
</StoreFactory>
<StoreFeaturesLists>
<Options>
<FeatureLists>
<!–Source Store Features–>
<SourceStoreFeatures>
<ItemReadRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeItemReadRepository, Sitecore.Framework.Publishing.Data</Type>
</ItemReadRepositoryFeature>
<TestableContentRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeTestableContentRepository, Sitecore.Framework.Publishing.Data</Type>
</TestableContentRepositoryFeature>
<WorkflowStateRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeWorkflowStateRepository, Sitecore.Framework.Publishing.Data</Type>
</WorkflowStateRepositoryFeature>
<EventQueueRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.CompositeEventQueueRepository, Sitecore.Framework.Publishing.Data</Type>
<options>
<ConnectionName>master</ConnectionName>
</options>
</EventQueueRepositoryFeature>
<SourceIndexFeature>
<Type>Sitecore.Framework.Publishing.ItemIndex.SourceIndexWrapper, Sitecore.Framework.Publishing</Type>
</SourceIndexFeature>
</SourceStoreFeatures>
<!–Service Store Features–>
<ServiceStoreFeatures>
<ManifestRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Manifest.ManifestRepository, Sitecore.Framework.Publishing</Type>
</ManifestRepositoryFeature>
<PublisherOperationRepositoryFeature>
<Type>Sitecore.Framework.Publishing.PublisherOperations.PublisherOperationRepository, Sitecore.Framework.Publishing</Type>
</PublisherOperationRepositoryFeature>
<PublishJobQueueRepositoryFeature>
<Type>Sitecore.Framework.Publishing.PublishJobQueue.PublishJobQueueRepository, Sitecore.Framework.Publishing</Type>
</PublishJobQueueRepositoryFeature>
<TargetSyncStateRepositoryFeature>
<Type>Sitecore.Framework.Publishing.TargetSyncState.TargetSyncStateRepository, Sitecore.Framework.Publishing</Type>
</TargetSyncStateRepositoryFeature>
<ActivationLockRepositoryFeature>
<Type>Sitecore.Framework.Publishing.InstanceActivation.ActivationLockRepository, Sitecore.Framework.Publishing</Type>
</ActivationLockRepositoryFeature>
</ServiceStoreFeatures>
<!–Target Store Features–>
<TargetStoreFeatures>
<IndexableItemRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicIndexableItemRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</IndexableItemRepositoryFeature>
<ItemWriteRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicItemRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</ItemWriteRepositoryFeature>
<MediaRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.Repositories.ClassicMediaRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</MediaRepositoryFeature>
<TargetIndexFeature>
<Type>Sitecore.Framework.Publishing.ItemIndex.TargetIndexWrapper, Sitecore.Framework.Publishing</Type>
</TargetIndexFeature>
</TargetStoreFeatures>
<!–ItemsRelationship Store Features–>
<ItemsRelationshipStoreFeatures>
<DatabaseItemRelationshipRepositoryFeature>
<Type>Sitecore.Framework.Publishing.Data.Classic.ClassicItemRelationshipRepository, Sitecore.Framework.Publishing.Data.Classic</Type>
</DatabaseItemRelationshipRepositoryFeature>
</ItemsRelationshipStoreFeatures>
</FeatureLists>
</Options>
</StoreFeaturesLists>
</Services>
</Publishing>
</Sitecore>
</Settings>

https://gist.github.com/grant-killian/d2fe8d3e89c5d7b15f47464dd1809d62.js

Sitecore artifact table patch config

I’ve patched EventQueues several times through the years, so I saved this Gist to make it easier for future opportunities.  There’s not much new to share in terms of introducing why one does this, refer to this blog about Sitecore artifact tables (or the old reliable Sitecore CMS Tuning Guide). You also have to take care around the order of configuration file processing, so zzzzzArtifactTableRetention.config this sucker if you really must 🙂

Here’s the Gist – https://gist.github.com/grant-killian/ffa1e84770b10a90e2454e241986b911  and here’s the expanded XML:


<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/&quot; xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform"&gt;
<sitecore>
<scheduling>
<agent type="Sitecore.Tasks.CleanupEventQueue, Sitecore.Kernel">
<patch:delete />
</agent>
<agent type="Sitecore.Tasks.CleanupEventQueue, Sitecore.Kernel" method="Run" interval="01:00:00">
<IntervalToKeep>06:00:00</IntervalToKeep>
</agent>
<agent type="Sitecore.Tasks.CleanupPublishQueue, Sitecore.Kernel">
<patch:delete />
</agent>
<agent type="Sitecore.Tasks.CleanupPublishQueue, Sitecore.Kernel" method="Run" interval="04:00:00">
<DaysToKeep>7</DaysToKeep>
</agent>
</scheduling>
<databases>
<database id="master">
<Engines.HistoryEngine.Storage>
<patch:delete />
</Engines.HistoryEngine.Storage>
<Engines.HistoryEngine.Storage>
<obj type="Sitecore.Data.SqlServer.SqlServerHistoryStorage, Sitecore.Kernel">
<param connectionStringName="$(id)" />
<EntryLifeTime>7.00:00:00</EntryLifeTime>
</obj>
</Engines.HistoryEngine.Storage>
</database>
<database id="web">
<Engines.HistoryEngine.Storage>
<patch:delete />
</Engines.HistoryEngine.Storage>
<Engines.HistoryEngine.Storage>
<obj type="Sitecore.Data.SqlServer.SqlServerHistoryStorage, Sitecore.Kernel">
<param connectionStringName="$(id)" />
<EntryLifeTime>7.00:00:00</EntryLifeTime>
</obj>
</Engines.HistoryEngine.Storage>
</database>
</databases>
</sitecore>
</configuration>

Walkthrough of Solr Query Analysis for Sitecore

Anton Tishchenko wrote a good quick piece on “stemming” for Solr, and I wanted to build on what he shared.

Solr is generally way underutilized by the Sitecore implementations I’ve seen. It makes for plenty of territory for blogging, though, so maybe bit-by-bit the word will get out about what a powerful distributed system Solr really is.

Let me setup a specific example using Sitecore 9 update-2 with Commerce. I’ll use the CatalogItemsScope Solr core, but you could review most any Solr core with the standard Sitecore schema.

Let me pause here to explain where the default Sitecore Commerce Solr Configuration defines this search index, because it took some digging. In the  . . . SIF\Configuration\Commerce\Solr\sitecore-commerce-solr.json file you’ll find:

// The names of the cores to create
“Customers.Name”: “[concat(parameter(‘CorePrefix’), ‘CustomersScope’)]”,
“Orders.Name”: “[concat(parameter(‘CorePrefix’), ‘OrdersScope’)]”,
“CatalogItems.Name”: “[concat(parameter(‘CorePrefix’), ‘CatalogItemsScope’)]”,

Using a fresh IaaS install of Sitecore 9 with Commerce, I’ll go to the Solr admin interface. Select the CatalogItemsScope Solr core from the dropdown list on the left side navigation. Choose the “Analysis” option to access this powerful way of evaluating two key operations one does with Solr: indexing and querying.  Solr defines different sets of analyzers for these operations, sort of like how Sitecore exposes the httpRequestBegin pipeline or other extensibility points that projects are always customizing. For this exercise, I’ll focus on the Query operation. The documentation on this Analysis screen has a lot more information on this, if you’re interested.

Enter the search phrase “an AWESOME television!” in the textbox for the Field Value(Query), and then specify the text_general as the Fieldname to Analyse: step1

Solr is going to run “an AWESOME television!” through the analysis process and show the results as they progress through each step of that analysis. For this write-up, I’ll uncheck the “Verbose Output” checkbox — but definitely play around with that feature as it shows the offsets and ordinal positions as Solr works it’s magic through each transformation.

After clicking the blue “Analyse Values” button you’ll see output like the following:

step2

Each row of output starts with an abbreviation (ST, SF, etc). That’s the key to which process of the “analyzer” has run; the pipe-separated list to the right of the abbreviation shows the search phrase as it’s progressing through Solr’s transformation logic.

  1. ST is the StandardTokenizerFactory. I removes the “!” punctuation mark, but otherwise doesn’t make any changes to the search query.
  2. SF is the StopFilterFactory. This would remove words listed in the CatalogItemsScope\conf\stopwords.txt file. In this case, with an out-of-the-box Sitecore 9 Commerce installation, there are no stopwords specified. This doesn’t change our search query in any way.
  3. SGF is the SynonymGraphFilterFactory. This component reads a list of synonyms from CatalogItemsScope\conf\synonyms.txt . . . and “television” is one of the samples they provide in that file, so now our query includes those synonyms
    • synonyms.txt includes this entry for example purposes:
      • Television, Televisions, TV, TVs
  4. The final step, LCF, is to run through LowerCaseFilterFactory which is takes “AWESOME” to “awesome” to ensure case isn’t evaluated in the query.

To drive home the point of this example, change the Fieldname to “text_en” from “text_general” and click the blue “Analyse Values” button. The results are different because the text_general and text_en Solr fields have a different set of components defined for the Query operation. Specifically, text_en adds:

  • EPF (EnglishPossessiveFilterFactory)
  • SKF (KeywordMarkerFilterFactory)
  • PSF (PorterStemFilterFactory)

Here’s what the Solr Analysis screen looks like:

step3

There is a lot that could be said about all this, and I’ll probably build on this in a future blog post . . . but I want to return to Anton’s original point about the Porter stemmer and highlight how powerful the Porter stemmer can be. Solr’s text_general is significantly different than text_en, and hopefully I’ve shed light on precisely how they differ on the query side. The docs at http://snowball.tartarus.org/algorithms/porter/stemmer.html review this in some detail. I’ve also used this online Porter stemmer to quickly see how words are decomposed into their key fragments for search relevancy. You don’t really need that online stemming tool, however, since you now see how to use the Solr administrative UI with the standard “text_en” field to have your own Porter stemming sandbox.

Quick note on onPublishEndAsyncSingleInstance vs onPublishEndAsync

This is more a note for my benefit — for search index update strategies, the onPublishEndAsyncSingleInstance’ makes the ‘onPublishEndAsync’ a deprecated option.
The legacy onPublishEndAsync remains to ensure backwards compatibility, but from Sitecore 9.0 onward it’s the default index update strategy used by Sitecore.
With that said, it appears in Sitecore 9.0 update-2 there’s a major defect with OnPublishEndAsynchronousSingleInstanceStrategy. The ContentSearch.ParallelIndexing.MaxThreadLimit setting is ignored by the onPublishEndAsyncSingleInstance strategy — so incorrect thread limits can be used (slow perf!). Sitecore’s patch reference # 285903 can be requested through Sitecore Support to address this.
I suppose it’s a consequence of the new  onPublishEndAsyncSingleInstance not having a mature and well-tested codebase surrounding it (onPublishEndAsync has been around for ages!).

 

The Game Is Afoot . . . Solr Shenanigans for Sitecore (Part 2)

Following on from Part 1 where I introduced what I’m up to here, let me jump right in to the other 5 Shenanigans for Solr + Sitecore:

Case 4 – The case of the default query crippler

  • In this scenario, a customer’s Solr was straining to the breaking point and we tracked it down to a set of circumstances where Sitecore was using a default value for ContentSearch.SearchMaxResults (defaults to “” which is int.MaxValue which is 2,147,483,647) and flooding Solr with essentially unbounded queries. That default is downright dangerous. The query logs showed the queries using a rows value of int.MaxValue in rapid succession:
  • INFO Solr Query – ?q=associated_docs:(“\*A6C71A21-47B5-156E-FBD1-B0E5EFED4D33\*”)&rows=2147483647&fq=_indexname:(domain_index_web)
    INFO Solr Query – ?q=((_fullpath:(\/sitecore/content/Branches/XYZ/*) AND _templates:(d0321826b8cd4f57ac05816471ba3ebc)))&rows=2147483647&fq=_indexname:(domain_index_web)
  • Solr will set aside some memory for the 2,147,483,647 results even if the dataset isn’t that large. I discussed a scenario like this in detail in this earlier post from 2018
  • This write-up on Solr and “Large Number of Rows” speaks exactly to this scenario: https://risdenk.github.io/2018/10/21/apache-solr-out-of-memory-symptoms-and-solutions.html

Case 5 – The case of the bandwidth blowout

  • Network bandwidth usage was off the charts for the customer we considered in this scenario. It took some digging, but we discovered it was due to a 23 GB Solr core being replicated across data centers. If one viewed the replication panel in the Solr UI, one could see the slow creep of the replication progress bar and it would never reach 100% complete before starting over.

shen5

    • There was additional supporting material such as Solr WARN messages etc:

shen6

    • The network latency was too much for Solr master/slave replication to complete it’s work, but Solr kept on trying to move that 23 GB Solr core across the planet . . . and since this was the sitecore_analytics_index it was kept very busy by Sitecore. It all made for a feedback loop of frequent updates to the analytics index that couldn’t properly synchronize between data centers.
    • For this particular scenario, we determined that there wasn’t a need to replicate the sitecore_analytics_index (it was consumed only by the CM environment which didn’t require the geographical scaling through Solr replication). We disabled the master/slave replication for that specific Solr core and the tidal wave of network traffic stopped. Case closed!

Case 6 – The case of the misguided, well-intentioned, administrator

  • This scenario introduces a server administrator into the equation, and they actually cause more harm than good. The “optimize now” button in the Solr UI lured this administrator into clicking it without understanding the consequences:

OptimizeNow.JPG

  • I posted about this in detail last year, so I won’t dig too far into it here, but the gist of this scenario illustrates how Solr internally organizes files and that there are questionable UI choices for that Optimize Now button. It makes it look like an easy way for one to improve Solr performance when — in reality — clicking that Optimize Now can be pretty expensive in terms of perf, especially for a volatile Solr core.

Case 7 – The case of the AppPool recycle-fest

  • This scenario is one from a couple years ago, but it’s still relevant as a cautionary tale. For a long time, if Sitecore lost an active connection to Solr, the only option was to recycle the Sitecore AppPool. For a Solr server restart, or service restart, or even a transient network failure . . . Sitecore would need to run through the application initialization logic to reacquire Solr connectivity. In this specific case, there was a recurring network issue that interfered with Sitecore’s connectivity to Solr, so the customer scheduled IIS AppPool recycles every 15 minutes to ensure a fresh connection to Solr was available. This AppPool recycle-fest has terrible consequences for website performance as the site is constantly spending time on recycles and the related pipeline of events.
  • This case highlights why there are now more elegant ways of handling this; I recently blogged about the IsSolrAliveAgent designed to solve this exact problem. There’s periodic logic to reconnect Sitecore with Solr now, and it’s important to appreciate why it’s there and — probably — why you may want to tune the default setting of every 10 minutes for your production environment.

That’s the 7 Shenanigans related to Sitecore and Solr from my talk earlier in October. It’s a fun paradigm for learning more about the overlaps of Sitecore and Solr and I hope it helps others to get more from their Sitecore + Solr technology stack!

The Game Is Afoot . . . Solr Shenanigans for Sitecore (Part 1)

I took the challenge of presenting at the Manchester, New Hampshire Sitecore User Group a few days ahead of the 2018 Sitecore Symposium. I say challenge because

  1. Delivering content that isn’t superseded by the Sitecore Symposium agenda can be difficult (all the good Commerce or Azure or DevOps material would be saved for Symposium week)
  2. I would be following Michael West and his showcase of the new Sitecore PowerShell Extensions version 5.0 module (curious that https://doc.sitecorepowershell.com/releases doesn’t list the 5.0 yet, but I’m sure it’s coming there too).

As I finalized my topic, I surveyed the work I’ve been up to recently and figured I could take my talk in one of two directions that would be generally absent from the Sitecore Symposium agenda: Sitecore and Solr, or The Heresy of Sitecore on AWS. While we are doing some interesting things around AWS with RDS, ElastiCache, that’s officially unsupported territory with Sitecore and not fully baked enough for me to present it as anything approaching a best practice — but check back with me in 6 months. So, I elected to give a talk on Solr and explore some of the lessons learned from years in the trenches making Sitecore successful with Solr; the topic was finalized as Solr, Sitecore 9, 7 Shenanigans:

shenanigans
The title slide from the talk

I covered some of the history and underpinnings of Solr with regards to Sitecore, the dependence on Solr.Net (which is <important>NOT</important> a port of Solr to .Net the way Lucene.Net IS a port of Java’s Lucene — and why we should care), and common architecture patterns for Sitecore integrations based on Solr master/slave and Solr Cloud. I guess I’ve blogged a lot about Solr over the years; for instance, here and here are a couple sample areas I delved into.

I think the most fun part of the talk was the Shenanigans, however, as I went with a Sherlock Holmes theme to frame the conversation. I reviewed 7 cases and we had fun digging into some of the diagnostic bits.

shenanigans2

Here’s a quick run down of the first 3 Shenanigans:

  1. The case of the disappearing Java
    • Where we started with Solr that wouldn’t start for a set of Production servers  . . .

shen1.JPG

    • . . . and eventually solved the case by determining the development team installed a Java SDK with auto-update enabled and the system had removed the Java identified in the ClassPath. This is a brutal one for a Production implementation!
  1. The case of the underachieving mega-server
    • These OutOfMemoryErrors are not fun and can be due to a variety of issues:
      • shen2.JPG
    • In this case, we determined this 32 GB server was running Solr with the default 512 MB of memory set aside for Solr . . . a pretty fundamental issue:3.512 MB - default of 32 GB
    • We tuned the Solr start settings to use more of the server capacity for Java and Solr, in this case I think we used 10 GB as a starting point, and solved this specific case. This isn’t the last we’ll hear about OutOfMemory errors in our Shenanigans, however, as there can be many causes (see this great summary published just yesterday, for example).
  2. The case of the disappearing content
    • This scenario had a public-facing website’s content disappear periodically after content publishes . . . sound familiar to anyone? It was due to the threshold for full rebuilding of the search indexes after a Sitecore publish set low enough to trigger regularly
      • For the record, the setting is ContentSearch.FullRebuildItemCountThreshold and the default is 100,000 (0x186a0 – 100,000):Shen3

    • This customer didn’t have SwitchOnRebuild implemented for these key public-facing search indexes, so the first step of the index rebuild logic was to remove all documents from the Solr collection, then add the documents back in as the indexing process ran it’s course.  To the site visitor it created missing or inconsistent search results while the rebuilding took place, and for a large set of items it can take 60 minutes or longer for rebuilding.
    • The solution is to use the SwitchOnRebuild implementation for their Sitecore search indexes – https://doc.sitecore.net/sitecore_experience_platform/setting_up_and_maintaining/search_and_indexing/indexing/switch_solr_indexes and related documentation from Sitecore covers this process.

I will cover the remaining four Solr + Sitecore Shenanigans in my next post; here’s a teaser for the topics:

  • Case #4 – The case of the default query crippler
  • Case #5 – The case of the bandwidth blowout
  • Case #6 – The case of the misguided, well-intentioned, administrator
  • Case #7 – The case of the AppPool recycle-fest

As Sherlock Holmes would say: “The game is afoot!”

The tale of the IsSolrAliveAgent for Sitecore

I had the pleasure of assessing a Sitecore 7 implementation the other day and it made me a bit nostalgic for the good old days — OMS had grown into DMS, all nicely contained in a tidy SQL Server database; barely a whisper of Solr; ItemBuckets were the new kid on the block.

It got me thinking about how some features can evolve over time until the obscure becomes mainstream. One example of this feature evolution, that for some reason has relevance to a variety of customers we’re working with right now, is in how Sitecore maintains connections with Solr (or should I say how Sitecore doesn’t maintain connections?).

Many of us have learned hard lessons that if you have a Solr server and it reboots or experiences an interruption in service, it can mean downtime for your Sitecore site unless you take special measures to guard against it. Sitecore’s initialization process connects to Solr and then holds that connection for the lifetime of the IIS AppPool. If that connection fails for any reason, there wasn’t logic in the Sitecore Solr Provider for a graceful reconnect. At least, that was the case for several iterations of Sitecore’s standard Solr search integration.

A few years ago, a basic approach evolved through the Sitecore developer community to correct the above limitation with Sitecore; it may have come from Sitecore support, but it wasn’t publicized in any way. One could explicitly repeat Sitecore’s initialization logic for Solr in a custom agent, scheduled to run periodically. This was a custom solution and I saw a few iterations of it — mostly a brute force approach. But it worked.

In Sitecore 8.2 update-1,  buried in the Release Notes is this fragment relevant to the story:

Solr

The Sitecore.ContentSearch.SolrProvider, from that point onward, contained a new agent defined in Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config named IsSolrAliveAgent that would serve as a retryer for Solr if the connectivity was lost.  It was configured to run every 10 minutes for a default implementation.

By the way, 10 minutes is probably too long an interval in my experience — even 1 minute can be too long for a production environment to wait before trying to reconnect a key component such as Solr. Also, if you set the agent to 1 minute, but have a /sitecore/scheduling/frequency value defined as something like 10 minutes, you need to change the frequency value to ensure the IsSolrAliveAgent is executed on the schedule you expect.

What had been a little-known approach to keeping Sitecore connected to Solr across interruptions had made the big time: it was now part of the official Sitecore search provider code!

This first implementation included in the Sitecore code base wasn’t perfect, though, and it was improved upon in subsequent releases of Sitecore. Improvements include better logging . . . more efficient iteration through the search indexes . . . but the general approach remains the same.

At this point, there’s an official patch for the IsSolrAliveAgent that Sitecore makes available at https://github.com/SitecoreSupport/Sitecore.Support.163850.171950/releases/tag/8.2.6.0 and please note the compatibility caveats on that repo. We do have a Sitecore 8.2 update-5 customer making use of the patch without issue, even though it’s not officially listed as compatible for that version — but that could be exceptional in our case, so always perform your own evaluations, tests, etc. This patch addresses a known issue where, if Solr is unavailable during Sitecore initialization, the SwitchOnRebuildSolrCloudSearchIndex indexes are not properly initialized.

It’s interesting — at least to a Sitecore/Solr nerd like me — to decompile and compare all the changes over time to this component; there’s logging and tests and generally better code present in the newest iteration of this IsSolrAliveAgent vs the earlier implementations. Some of the changes are subtle, but the evolution of this IsSolrAliveAgent from the days of “hey, we’ve got this homegrown ten lines of code that reconnects Sitecore to Solr if Solr goes down” is remarkable.

In some ways, it parallels the progress of Sitecore as an entire platform. The CMS became a “personalization platform” which now is adding a Commerce ecosystem. OMS became DMS which became xDB. On we go.

At this point, adding a patch .config file such as the following to your Sitecore project is the state of the art in Sitecore and Solr connectivity . . . but it surely will be improved upon over time:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <scheduling>
      <agent type="Sitecore.ContentSearch.SolrProvider.Agents.IsSolrAliveAgent, Sitecore.ContentSearch.SolrProvider">
        <patch:attribute name="type">Sitecore.Support.ContentSearch.SolrProvider.Agents.IsSolrAliveAgent, Sitecore.Support.163850.171950</patch:attribute>
        <patch:attribute name="interval">00:01:00</patch:attribute>
      </agent>
    </scheduling>
  </sitecore>
</configuration>

 

Curious case of the LocationsDictionaryCache

We have some cache monitoring in place for some enterprise Sitecore customers and that system has found one particular cache being evicted roughly 40 times per hour for one particular customer. It’s curious, as this cache isn’t covered in the standard documentation for Sitecore’s caches. Sitecore support had to point me in the right direction on this . . . and it’s an interesting case.

The LocationsDictionaryCache cache is defined in Sitecore.Analytics.Data.Dictionaries.LocationsDictionary and there’s a hard coded 600 second expiration in that object definition:

Expires

There’s also a maxCacheSize set of 0xf4240 hex (1,000,000 or 976 KB). You can’t alter these settings through configuration, you’d have to compile a new DLL to alter these values.

It’s not clear to me that the quick eviction/turn-over of this cache is a perf issue to worry about . . . I think, at this point, it’s working as expected and indicative of a busy site with lots of Sitecore Analytics and Tracking behaviour. Reviewing the decompiled Sitecore code that uses this class (like in Sitecore.Analytics.Tracking.CurrentVisitContext or Sitecore.Analytics.Pipelines.CommitSession.ProcessSubscriptions), it appears that this cache serves as a short-term lookup along the same lines as devices, user-agents, etc. Why this particular cache is active in ways UserAgentDictionaryCache and others is not, besides the obvious 600 second life span, is something we need to dig further into — but I don’t know that it’s a perf bottleneck for our given scenario.

 

Sitecore and SearchMaxResults for Solr

I’ve consulted with a number of Sitecore implementations in the last month or two that had a variety of challenges with Sitecore integration with Solr. Solr is a powerful distributed system with a variety of angles of interest to Sitecore. I wanted to take this chance to comment on a Sitecore setting that can have a significant impact on how Sitecore search functions, but is easily overlooked. The setting is defined in Sitecore.ContentSearch.config and it’s called ContentSearch.SearchMaxResults. The XML comment for this setting is straight-forward, here’s how it’s presented in that file:

snip

There’s a lot to digest in that xml comment above. One could read it and assume “this can be set but it is best kept as the default” means this shouldn’t be altered, but in my experience that can be problematic.

The .Net int.MaxValue constant is 2,147,483,647. If you leave this setting at the default (so “”), one is telling Solr to return up to 2,147,483,647 results in a single response to a query, which we’ve observed in some cases to cause significant performance problems (Solr will fetch the large volume of records from disk and write them to the Solr response causing IO pressure etc.) It’s not always the case since it really comes down to the number of documents one is querying from Solr, but this sets up the potential for a virtually unbounded Solr query.

It’s interesting to trace this setting through Sitecore and into Solr, and it sheds light on how these two complex systems work together. Fun, right!? I cooked up the diagram below that shows an overview of how Sitecore and Solr work together in completing search queries:

snipp

Each application has it’s own logging which will help trace activity between the systems.

The Sitecore ContentSearch Provider for Solr relies on Solr.Net for connectivity to Solr. It’s common for .Net libraries to copy their open source equivalents from the Java world (like Log4J has a .Net port for logging named Log4net, Lucene has a .Net port for search called Lucene.Net, etc). Solr.Net, however, is not a port of the Solr Java application to .Net. Instead, Solr.Net is a wrapper for the main Solr API elements that can be easily called by .Net applications. When it comes to Sitecore’s ContentSearch Provider for Solr, Solr.Net is Sitecore’s bridge for getting data to and from the Solr application.

Just an aside: some projects do creative things with Solr search and Sitecore, and for certain scenarios it’s necessary to bypass Solr.Net and use the REST API directly from Sitecore. This write-up focuses on a conventional Sitecore -> Solr.Net -> Solr pattern, but I wanted to acknowledge that it’s not the only pattern.

Tracking ContentSearch.SearchMaxResults in Sitecore

On the Sitecore side, one can see the ContentSearch.SearchMaxResults setting in the Sitecore logs when you turn up the diagnostics to get more granular data; this isn’t a configuration that’s recommended for using beyond a discrete troubleshooting session as the amount of data it can generate can be significant . . . but here’s how you dial up the diagnostic data Sitecore reports about queries:

snip3

If you run a few queries that exercise your Sitecore implementation code that queries Solr, you can review the contents of the Search log in the Sitecore /data directory and find entries such as:

INFO Solr Query – ?q=associated_docs:(“\*A5C71A21-47B5-156E-FBD1-B0E5EFED4D33\*”)&rows=2147483647&fq=_indexname:(domain_index_web)

or

INFO  Solr Query – ?q=((_fullpath:(\/sitecore/content/Branches/ABC/*) AND _templates:(d0351826b1cd4f57ac05816471ba3ebc)))&rows=2147483647&fq=_indexname:(domain_index_web)

The .Net int.MaxValue 2147483647 is what Sitecore, through Solr.Net, is specifying as the number of rows to return from this query. For Solr cores with only a few hundred results matching this query, it’s not that big a deal because the query has a fairly small universe to process and retrieve. If you have 100,000 documents, however, that’s a very heavy query for Solr to respond to and it will probably impact the performance of your Sitecore implementation.

Tracking ContentSearch.SearchMaxResults in Solr

Solr has it’s own logging systems and this 2147483647 value can be seen in these logs once Solr has completed the API call. In a default Solr installation, the logs will be located at server/logs (check your log4j.properties file if you don’t know for sure where your logs are being stored) and you can a open up the log and scan for our ContentSearch.SearchMaxResults setting value. You’ll see entries such as:

INFO  – 2018-03-26 21:20:19.624; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=(_template:(a6f3979d03df4441904309e4d281c11b)+AND+_path:(1f6ce22fa51943f5b6c20be96502e6a7))&fl=*,score&fq=_indexname:(domain_index_web)&rows=2147483647&version=2.2} hits=2681 status=0 QTime=88

  • The above Solr query returned 2,681 results (hits) and the QTime (time elapsed between the arrival of the query request to Solr and the completion of the request handler) was 88 milliseconds. This is probably no big deal as it relates to the ContentSearch.SearchMaxResults, but you don’t know if this data will increase over time…

INFO  – 2018-03-26 21:20:19.703; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=((((include_nav_xml_b:(True)+AND+_path:(00ada316e3e4498797916f411bc283cf)))+AND+url_s:[*+TO+*])+AND+(_language:(no-NO)+OR+_language:(en)))&fl=*,score&fq=_indexname:( domain_index_web)&rows=2147483647&version=2.2} hits=9 status=0 QTime=16

  • The above Solr query returned 9 results (hits) and the QTime was 16 milliseconds. This is unlikely a problem when it comes to ContentSearch.SearchMaxResults.

 INFO  – 2018-03-26 21:20:19.812; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=(_template:(8ed95d51A5f64ae1927622f3209a661f)+AND+regionids_sm:(33ada316e3e4498799916f411bc283cf))&fl=*,score&fq=_indexname:(domain_index_web)&rows=2147483647&version=2.2} hits=89372 status=0 QTime=1600

  • The above Solr query returned 89,372 results (hits) and the QTime was 1600 milliseconds. Look out. This is the type of query that could easily cause problems based on the Sitecore ContentSearch.SearchMaxResults setting as the volume of data Solr is working with is getting large. That query time is already climbing high and that’s a lot of results for Sitecore to require in a single request.

The impact of retrieving so many documents from Solr can cause a cascade of difficulties besides just the handling of the query. Solr caches the query results in memory and if you request 1,000,000 documents you could also be caching 1,000,000 million documents. Too much of this activity and it can stress Solr to the breaking point.

Best Practices

There is no magic value to set for ContentSearch.SearchMaxResults other than not “”. General best practice when retrieving lots of data from most any system is to use paging. It’s recommended to do that for Sitecore ContentSearch queries, too. A general recommendation would be to set a specific value for the ContentSearch.SearchMaxResults setting, such as “500” or “1000”. This should be thoroughly tested, however, as limiting the search results for an implementation that isn’t properly using search result paging can lead to inconsistent behavior across the site. Areas such as site maps, general site search, and other areas with implementation logic that could assume all the search results are available in a single request deserve special attention when tuning this setting.

What About Noisy Solr Neighbors?

I’ve worked on some implementations where Solr was a resource shared between a variety of Sitecore implementations. One project, in this example, might set ContentSearch.SearchMaxResults to “2000” for their Sitecore sites while another project sets a value of “500” – but what if there’s a third organization making use of the same Solr resources and that project doesn’t explicitly set a value for ContentSearch.SearchMaxResults? That one project leaves the setting at the “” default, so it uses the .Net int.MaxValue. This is a classic noisy neighbor problem where shared services become a point of vulnerability to all the consuming applications. The one project with “” for ContentSearch.SearchMaxResults could be responsible for dragging Solr performance down across all the sites.

Solr is an extensible platform much like Sitecore, and in some ways even more so. In Sitecore one extends pipelines or overrides events to achieve the customizations you desire; the same general idea can be applied to Solr – you just use Java with Solr instead of C#.

In this case, our concern being unbounded Solr queries, we can use an extension to a search component (org.apache.solr.handler.component.SearchComponent) to introduce our custom logic into the Solr query processing. In our case, we want to enforce limits to the number of rows a query can request. This would be a safety net in case an un-tuned Sitecore implementation left a ContentSearch.SearchMaxResults setting at the default.

Some care must be taken in how this is introduced into the Solr execution context and where exactly the custom logic is handled. This topic is all covered very well, along with sample code etc, at https://jorgelbg.wordpress.com/2015/03/12/adding-some-safeguard-measures-to-solr-with-a-searchcomponent/. For an enterprise Solr implementation with a variety of Sitecore consumers, a safety measure such as this could be vital to the general stability and perf of the platform – especially if one doesn’t have control over the specific Sitecore projects and their use (or abuse!) of the ContentSearch.SearchMaxResults setting. Maybe file this under best practices for governing Sitecore implementations with shared Solr infrastructure.