Multi-region Sitecore Publishing

Sitecore’s Publishing Service, which runs on .NET Core, is a great addition to the Sitecore ecosystem. It allows us to solve some interesting customer scaling challenges with a micro-services approach to publishing content. I’m going to write up a pattern we’re using these days that updates our approach from a few years ago.

See an example of the older pattern in this piece I wrote for the Rackspace site at https://developer.rackspace.com/blog/Sitecore-Enterprise-Architecture-For-Global-Publishing/.

Now in May 2019, we’re shifting away from the SQL replication game and using Sitecore’s new Publishing Service to connect Sitecore across multiple regions. Refer to this general diagram below to see how we’re approaching it:

[Diagram: two-region Sitecore architecture with publishing coordinated by the Publishing Service in Region 1]

Sitecore’s Publishing Service is the key element between the two regions and the blue arrows show the flow of publishing activities coordinated through the one “Sitecore Publishing Service” host in Region 1.

A few caveats on the picture above:

  1. It’s Sitecore 8.2, so MongoDB is present but not shown on the diagram for simplicity (we use ObjectRocket’s hosted MongoDB service for the majority of these types of customers — but I don’t want to get into that here); Redis and other elements are also not included in the diagram
  2. This applies to any multi-region setup with Sitecore . . . it could be East US and West US, for example, but we used Europe and Asia in the diagram. This approach is most useful where network latency between the regions is enough to make synchronous database connectivity unacceptably slow. This model can apply to more than 2 regions, too, as the pattern can be repeated to support as many regions as you require.

There are just a few crucial configuration steps to make this happen, but it’s built on a lot of lessons learned along the way. Let me catalog the key elements:

  1. The Publishing Service runs in Region 1, but requires a Sitecore Publishing Target to the Region 2 database. The documentation on setting up this type of Publishing Target is vague, so I summarized this process at https://grantkillian.wordpress.com/2018/12/17/how-i-add-custom-sitecore-publishing-service-targets/.
  2. Each region has an isolated Solr cluster (because Solr CDCR or file synchronization for Solr were not suitable in this use-case). This means one of the Region 2 Sitecore CD servers needs to employ the onPublishEndAsync strategy to update the Solr Cloud collections relevant to the implementation. This is standard ContentSearch configuration material, but if you use the manual strategy here with the CDs (the general best practice for Sitecore CD servers connected to a Solr cluster where a CM drives search indexing), the Solr data in the other region will never get updated:
    <strategies hint="list:AddStrategy">
      <strategy
        ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
    </strategies>
  3. If you are using Sitecore ContentTesting with this approach (<setting name="ContentTesting.AutomaticContentTesting.Enabled" value="true" />), be aware that Sitecore CM performance can occasionally stall for several minutes (we’ve seen it last up to 20 minutes!) due to an aspect of the ContentTesting logic that checks every content database for eligible published items to factor into the content testing system. Part of setting up the Region 2 Publishing Target involves adding a ConnectionStrings.config entry for the Region 2 “web” database on the Region 1 Sitecore CM server. That entry pulls the Region 2 “web” database into this ContentTesting routine, and the network latency between Region 1 and Region 2 makes this ContentTesting behaviour slow the CM to a crawl every so often. If you don’t want to disable Sitecore ContentTesting, you can address this by customizing the Sitecore.ContentTesting.Helpers.VersionHelper.GetLatestPublishedVersion method to exclude the Region 2 “web” database. Once you dig deep into this topic, you’ll see the Sitecore.ContentTesting.Helpers.VersionHelper class contains this logic and that it’s used in 3 places (according to the decompilation of the .dll):

[Screenshot: decompiled Sitecore.ContentTesting.Helpers.VersionHelper logic iterating every content database]

To adjust ContentTesting to ignore our Region 2 “web” database, we can alter the foreach loop above with something like this that uses a custom “ContentTesting.IgnoredDatabases” setting:

// Parse the pipe-delimited setting once, before the loop.
// (database, item, and num come from the surrounding
// GetLatestPublishedVersion method; Contains on the array needs System.Linq.)
string[] excludeList =
  Sitecore.Configuration.Settings.GetSetting(
    "ContentTesting.IgnoredDatabases")
  .ToLowerInvariant()
  .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

foreach (Database db in Factory.GetDatabases())
{
  // Skip the current database and anything named in our custom setting,
  // comparing case-insensitively against the lowercased exclude list
  if (database != null &&
    db.Name != database.Name &&
    !excludeList.Contains(db.Name.ToLowerInvariant()))
  {
    Item item2 = db.GetItem(item.ID, item.Language);
    if (item2 != null && item2.Version.Number > num)
    {
      num = item2.Version.Number;
    }
  }
}

We can define our custom setting like the following, if we assume region2web is the “web” database ConnectionString name for the Region 2 publishing target on the Sitecore CM:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentTesting.IgnoredDatabases">
        <patch:attribute name="value">core|region2web</patch:attribute>
      </setting> 
    </settings>
  </sitecore>
</configuration>

This work to override the default configuration from . . .

<getVersionedTestCandidates>
  <processor 
    type="Sitecore.ContentTesting.Pipelines.GetTestCandidates.GetPageVersionTestCandidates, Sitecore.ContentTesting">

. . . can dramatically improve the Sitecore CM performance when using this formula for multi-region Sitecore with the new Publishing Service.

Hopefully these notes help other efforts on their Sitecore journey!

Quick note on onPublishEndAsyncSingleInstance vs onPublishEndAsync

This is more a note for my benefit: for search index update strategies, onPublishEndAsyncSingleInstance makes onPublishEndAsync a deprecated option.
The legacy onPublishEndAsync remains for backwards compatibility, but from Sitecore 9.0 onward onPublishEndAsyncSingleInstance is the default index update strategy used by Sitecore.
With that said, it appears Sitecore 9.0 update-2 has a major defect with OnPublishEndAsynchronousSingleInstanceStrategy: the ContentSearch.ParallelIndexing.MaxThreadLimit setting is ignored by the onPublishEndAsyncSingleInstance strategy, so incorrect thread limits can be used (slow perf!). Sitecore’s patch reference #285903 can be requested through Sitecore Support to address this.
I suppose it’s a consequence of the new onPublishEndAsyncSingleInstance not having a mature and well-tested codebase surrounding it (onPublishEndAsync has been around for ages!).
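
If you want to adopt the newer strategy explicitly on an index definition, a config patch along these lines is one way to swap strategies. This is a sketch: the index id, strategy path, and patch:instead selector assume standard Sitecore 9.x configuration, so verify them against your version:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_web_index">
            <strategies hint="list:AddStrategy">
              <!-- swap the legacy strategy for the single-instance variant -->
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance"
                        patch:instead="strategy[@ref='contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync']" />
            </strategies>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>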


The Game Is Afoot . . . Solr Shenanigans for Sitecore (Part 2)

Following on from Part 1 where I introduced what I’m up to here, let me jump right in to the remaining Shenanigans for Solr + Sitecore:

Case 4 – The case of the default query crippler

  • In this scenario, a customer’s Solr was straining to the breaking point and we tracked it down to a set of circumstances where Sitecore was using the default value for ContentSearch.SearchMaxResults (it defaults to “”, which is interpreted as int.MaxValue: 2,147,483,647) and flooding Solr with essentially unbounded queries. That default is downright dangerous. The query logs showed the queries using a rows value of int.MaxValue in rapid succession:
  • INFO Solr Query – ?q=associated_docs:(“\*A6C71A21-47B5-156E-FBD1-B0E5EFED4D33\*”)&rows=2147483647&fq=_indexname:(domain_index_web)
    INFO Solr Query – ?q=((_fullpath:(\/sitecore/content/Branches/XYZ/*) AND _templates:(d0321826b8cd4f57ac05816471ba3ebc)))&rows=2147483647&fq=_indexname:(domain_index_web)
  • Solr will set aside some memory for the 2,147,483,647 results even if the dataset isn’t that large. I discussed a scenario like this in detail in this earlier post from 2018.
  • This write-up on Solr and “Large Number of Rows” speaks exactly to this scenario: https://risdenk.github.io/2018/10/21/apache-solr-out-of-memory-symptoms-and-solutions.html

Case 5 – The case of the bandwidth blowout

  • Network bandwidth usage was off the charts for the customer we considered in this scenario. It took some digging, but we discovered it was due to a 23 GB Solr core being replicated across data centers. If one viewed the replication panel in the Solr UI, one could see the slow creep of the replication progress bar and it would never reach 100% complete before starting over.

[Screenshot: the Solr UI replication panel with the progress bar creeping along and restarting before 100%]

    • There was additional supporting material such as Solr WARN messages etc:

[Screenshot: related Solr WARN messages about replication]

    • The network latency was too much for Solr master/slave replication to complete its work, but Solr kept on trying to move that 23 GB Solr core across the planet . . . and since this was the sitecore_analytics_index, it was kept very busy by Sitecore. It all made for a feedback loop of frequent updates to the analytics index that couldn’t properly synchronize between data centers.
    • For this particular scenario, we determined that there wasn’t a need to replicate the sitecore_analytics_index (it was consumed only by the CM environment which didn’t require the geographical scaling through Solr replication). We disabled the master/slave replication for that specific Solr core and the tidal wave of network traffic stopped. Case closed!
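
For reference, the on/off switch for this lives in each core’s solrconfig.xml replicationHandler. Here’s a sketch using Solr’s documented enable properties (the masterUrl, hostnames, and intervals are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- false (or controlled via core.properties) stops serving this core to slaves -->
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <!-- false on the remote data center stops it polling for this core -->
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://region1-solr:8983/solr/sitecore_analytics_index</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>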

Case 6 – The case of the misguided, well-intentioned administrator

  • This scenario introduces a server administrator into the equation, one who actually caused more harm than good. The “optimize now” button in the Solr UI lured this administrator into clicking it without understanding the consequences:

[Screenshot: the Optimize Now button in the Solr admin UI]

  • I posted about this in detail last year, so I won’t dig too far into it here, but the gist of this scenario illustrates how Solr internally organizes files and that there are questionable UI choices for that Optimize Now button. It makes it look like an easy way for one to improve Solr performance when — in reality — clicking that Optimize Now can be pretty expensive in terms of perf, especially for a volatile Solr core.

Case 7 – The case of the AppPool recycle-fest

  • This scenario is one from a couple years ago, but it’s still relevant as a cautionary tale. For a long time, if Sitecore lost an active connection to Solr, the only option was to recycle the Sitecore AppPool. For a Solr server restart, or service restart, or even a transient network failure . . . Sitecore would need to run through the application initialization logic to reacquire Solr connectivity. In this specific case, there was a recurring network issue that interfered with Sitecore’s connectivity to Solr, so the customer scheduled IIS AppPool recycles every 15 minutes to ensure a fresh connection to Solr was available. This AppPool recycle-fest has terrible consequences for website performance as the site is constantly spending time on recycles and the related pipeline of events.
  • This case highlights why there are now more elegant ways of handling this; I recently blogged about the IsSolrAliveAgent designed to solve this exact problem. There’s periodic logic to reconnect Sitecore with Solr now, and it’s important to appreciate why it’s there and — probably — why you may want to tune the default setting of every 10 minutes for your production environment.
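
Tuning that default amounts to a small scheduling patch. Here’s a sketch, assuming the agent type name as shipped in Sitecore 9.x; verify the type and its default interval against your version’s Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <scheduling>
      <!-- assumption: agent type/assembly as shipped in Sitecore 9.x -->
      <agent type="Sitecore.ContentSearch.SolrProvider.Agents.IsSolrAliveAgent, Sitecore.ContentSearch.SolrProvider">
        <!-- poll for Solr connectivity every 2 minutes instead of the default 10 -->
        <patch:attribute name="interval">00:02:00</patch:attribute>
      </agent>
    </scheduling>
  </sitecore>
</configuration>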

That’s the 7 Shenanigans related to Sitecore and Solr from my talk earlier in October. It’s a fun paradigm for learning more about the overlaps of Sitecore and Solr and I hope it helps others to get more from their Sitecore + Solr technology stack!

Which Solr Node Is Responding to My Sitecore Query?

In working with Sitecore + Solr, eventually you may need to determine which Solr server responded to a specific query in order to validate Solr replication or to compare Solr responses across machines. If you can’t imagine a reason why you’d want to do this, you’re probably fortunate and should maybe go buy a lottery ticket instead of reading this blog post 🙂

Brute Force Method with Solr Master/Slave

If you’re working with Solr master/slave behind a load-balancer, or with multiple slaves behind a load-balancer, I haven’t found a reliable way of determining which Solr server responded to a particular query besides the brute force method of comparing Sitecore and Solr logs. Specifically, from the Sitecore logs in Data\logs\Search.Logs.[Timestamp].txt you should see something like the following for each query:

Sitecore’s Query Log

15:11:48 INFO Serialized Query – ?q=(_template:(f613d8a8d9324b5f84516424f49c9102) AND (-_name:(“__Standard Values”) AND _language:(en-US)))&start=0&rows=1&fl=*,score&fq=_indexname:(sitecore_web_index)&sort=searchdate_tdt desc

Solr’s Query Log

You can cross-reference this Sitecore log data with Solr logs (like S:\solr-4.10.4\sitecore\logs\solr.log):

INFO – 2018-08-01 15:11:48.514; org.apache.solr.core.SolrCore; [sitecore_web_index] webapp=/solr path=/select params={q=(_template:(f613d8a8d9324b5f84516424f49c9102)+AND+(-_name:(“__Standard+Values”)+AND+_language:(en-US)))&fl=*,score&start=0&fq=_indexname:(sitecore_web_index)&sort=searchdate_tdt+desc&rows=1&version=2.2} hits=2958 status=0 QTime=16

It’s tedious to match up the exact query and time in these logs, but the Solr node with the matching record will reveal which one serviced the request.

Now, one can craft some PowerShell to scrounge the Sitecore and Solr logs and determine where we have matches — crazy as it may sound — but I’m not interested in sharing all that here. It’s academic to read the two logs and look for matches, anyway, so I’ll move on to the Solr Cloud scenario that is more interesting and forward-looking since Sitecore is steadily progressing towards full Solr Cloud support across the board.

Solr Cloud Debug Query Command

Solr Cloud supports a debug command where you append debug=true to the URL and Solr includes diagnostic output in the results. For example, a RESTful query to Solr like http://10.215.118.28:8983/solr/sitecore_web_index_shard1_replica2/select?q=_name%3ANEWS&wt=xml&indent=true&debug=true. Using the XML formatting, debug=true adds something like this to the response from Solr:

[Screenshot: the debug sections added to the Solr XML response]

There can be interesting tidbits in each of those debug sections, but I’m going to focus on the track node that shares information about the different phases of the distributed request Solr is making. Under the “EXECUTE_QUERY” item is a “name” attribute that will specify which Solr nodes, shards, and replicas were involved in responding to the query, for example:

<lst name="http://10.215.140.12:8983/solr/sitecore_web_index_shard2_replica1/|http://10.215.140.13:8983/solr/sitecore_web_index_shard2_replica2/">

I’ve also found the “shard.url” value of the Response (nested under the EXECUTE_QUERY data) to share the same information. It’s possible that’s more reliable across Solr versions etc, but it’s something to keep an eye on. Here’s a fragment of the XML response for the debug information:

[Screenshot: fragment of the XML response showing the EXECUTE_QUERY debug data]

A careful reader might point out that the “rid” value includes the IP address of the server responding to the request, but this is designed to be the “request ID” that traces the query through Solr’s various moving parts — I wouldn’t rely on the “rid” to tell you the source of the response, though, as it could change across versions.

Here’s a quick run through of the other diagnostic data in that EXECUTE_QUERY data that I know about:

  1. QTime: the number of milliseconds it took Solr to execute a search, with no regard for time spent sending a response across the network etc
  2. ElapsedTime: the number of milliseconds from when the query was sent to Solr until the time the client gets a response. This includes QTime, assembling the response, transmission time, etc.
  3. NumFound: the count of results

There is a ton to all this and we’re only scratching the surface, but as Sitecore gets more serious about scalable search with Solr, we’re all going to be learning a lot more about this in the months and years to come!

Sitecore and SearchMaxResults for Solr

I’ve consulted with a number of Sitecore implementations in the last month or two that had a variety of challenges integrating Sitecore with Solr. Solr is a powerful distributed system with a variety of angles of interest to Sitecore. I wanted to take this chance to comment on a Sitecore setting that can have a significant impact on how Sitecore search functions, but is easily overlooked. The setting is defined in Sitecore.ContentSearch.config and it’s called ContentSearch.SearchMaxResults. The XML comment for this setting is straight-forward; here’s how it’s presented in that file:

[Screenshot: the XML comment for the ContentSearch.SearchMaxResults setting in Sitecore.ContentSearch.config]

There’s a lot to digest in that xml comment above. One could read it and assume “this can be set but it is best kept as the default” means this shouldn’t be altered, but in my experience that can be problematic.

The .Net int.MaxValue constant is 2,147,483,647. If you leave this setting at the default (so “”), you are telling Solr to return up to 2,147,483,647 results in a single response to a query, which we’ve observed in some cases to cause significant performance problems (Solr will fetch the large volume of records from disk and write them to the Solr response, causing IO pressure etc.). It’s not always a problem, since it really comes down to the number of documents being queried from Solr, but this sets up the potential for a virtually unbounded Solr query.

It’s interesting to trace this setting through Sitecore and into Solr, and it sheds light on how these two complex systems work together. Fun, right!? I cooked up the diagram below that shows an overview of how Sitecore and Solr work together in completing search queries:

[Diagram: overview of how Sitecore and Solr work together in completing search queries]

Each application has its own logging, which will help trace activity between the systems.

The Sitecore ContentSearch Provider for Solr relies on Solr.Net for connectivity to Solr. It’s common for .Net libraries to copy their open source equivalents from the Java world (like Log4J has a .Net port for logging named Log4net, Lucene has a .Net port for search called Lucene.Net, etc). Solr.Net, however, is not a port of the Solr Java application to .Net. Instead, Solr.Net is a wrapper for the main Solr API elements that can be easily called by .Net applications. When it comes to Sitecore’s ContentSearch Provider for Solr, Solr.Net is Sitecore’s bridge for getting data to and from the Solr application.
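
To make that concrete, here’s a minimal sketch of the developer-facing side of that bridge: a ContentSearch LINQ query that Sitecore hands to Solr.Net, which in turn issues the HTTP query you’ll see in the Solr logs (the index name and template predicate are illustrative):

using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

// Resolve the index and open a search context against it
ISearchIndex index = ContentSearchManager.GetIndex("sitecore_web_index");
using (IProviderSearchContext ctx = index.CreateSearchContext())
{
    // The Solr provider translates this LINQ expression into the query
    // string you'll find in the logs (q=..., rows=..., fq=_indexname:(...))
    var results = ctx.GetQueryable<SearchResultItem>()
        .Where(result => result.TemplateName == "Sample Item")
        .Take(20) // an explicit bound, rather than leaning on SearchMaxResults
        .GetResults();
}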

Just an aside: some projects do creative things with Solr search and Sitecore, and for certain scenarios it’s necessary to bypass Solr.Net and use the REST API directly from Sitecore. This write-up focuses on a conventional Sitecore -> Solr.Net -> Solr pattern, but I wanted to acknowledge that it’s not the only pattern.

Tracking ContentSearch.SearchMaxResults in Sitecore

On the Sitecore side, one can see the ContentSearch.SearchMaxResults setting in the Sitecore logs when you turn up the diagnostics to get more granular data; this isn’t a configuration that’s recommended beyond a discrete troubleshooting session, as the amount of data it can generate can be significant . . . but here’s how you dial up the diagnostic data Sitecore reports about queries:

[Screenshot: configuration for verbose search query logging in Sitecore]
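
If it helps, that screenshot boils down to flipping the verbose search logging switch. A sketch of the patch follows, with the caveat that I’m recalling the ContentSearch.VerboseLogging setting name from memory, so confirm it against Sitecore.ContentSearch.config for your version (and revert it when the troubleshooting session ends):

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- assumption: this is the verbose query logging switch; confirm in Sitecore.ContentSearch.config -->
      <setting name="ContentSearch.VerboseLogging">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>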

If you run a few queries that exercise your Sitecore implementation code that queries Solr, you can review the contents of the Search log in the Sitecore /data directory and find entries such as:

INFO Solr Query – ?q=associated_docs:(“\*A5C71A21-47B5-156E-FBD1-B0E5EFED4D33\*”)&rows=2147483647&fq=_indexname:(domain_index_web)

or

INFO  Solr Query – ?q=((_fullpath:(\/sitecore/content/Branches/ABC/*) AND _templates:(d0351826b1cd4f57ac05816471ba3ebc)))&rows=2147483647&fq=_indexname:(domain_index_web)

The .Net int.MaxValue 2147483647 is what Sitecore, through Solr.Net, is specifying as the number of rows to return from this query. For Solr cores with only a few hundred results matching this query, it’s not that big a deal because the query has a fairly small universe to process and retrieve. If you have 100,000 documents, however, that’s a very heavy query for Solr to respond to and it will probably impact the performance of your Sitecore implementation.

Tracking ContentSearch.SearchMaxResults in Solr

Solr has its own logging systems and this 2147483647 value can be seen in these logs once Solr has completed the API call. In a default Solr installation, the logs will be located at server/logs (check your log4j.properties file if you don’t know for sure where your logs are being stored) and you can open up the log and scan for our ContentSearch.SearchMaxResults setting value. You’ll see entries such as:

INFO  – 2018-03-26 21:20:19.624; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=(_template:(a6f3979d03df4441904309e4d281c11b)+AND+_path:(1f6ce22fa51943f5b6c20be96502e6a7))&fl=*,score&fq=_indexname:(domain_index_web)&rows=2147483647&version=2.2} hits=2681 status=0 QTime=88

  • The above Solr query returned 2,681 results (hits) and the QTime (time elapsed between the arrival of the query request to Solr and the completion of the request handler) was 88 milliseconds. This is probably no big deal as it relates to the ContentSearch.SearchMaxResults, but you don’t know if this data will increase over time…

INFO  – 2018-03-26 21:20:19.703; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=((((include_nav_xml_b:(True)+AND+_path:(00ada316e3e4498797916f411bc283cf)))+AND+url_s:[*+TO+*])+AND+(_language:(no-NO)+OR+_language:(en)))&fl=*,score&fq=_indexname:( domain_index_web)&rows=2147483647&version=2.2} hits=9 status=0 QTime=16

  • The above Solr query returned 9 results (hits) and the QTime was 16 milliseconds. This is unlikely a problem when it comes to ContentSearch.SearchMaxResults.

 INFO  – 2018-03-26 21:20:19.812; org.apache.solr.core.SolrCore; [domain_index_web] webapp=/solr path=/select params={q=(_template:(8ed95d51A5f64ae1927622f3209a661f)+AND+regionids_sm:(33ada316e3e4498799916f411bc283cf))&fl=*,score&fq=_indexname:(domain_index_web)&rows=2147483647&version=2.2} hits=89372 status=0 QTime=1600

  • The above Solr query returned 89,372 results (hits) and the QTime was 1600 milliseconds. Look out. This is the type of query that could easily cause problems based on the Sitecore ContentSearch.SearchMaxResults setting as the volume of data Solr is working with is getting large. That query time is already climbing high and that’s a lot of results for Sitecore to require in a single request.

The impact of retrieving so many documents from Solr can cause a cascade of difficulties besides just the handling of the query. Solr caches the query results in memory, and if you request 1,000,000 documents you could also be caching 1,000,000 documents. Too much of this activity can stress Solr to the breaking point.

Best Practices

There is no magic value to set for ContentSearch.SearchMaxResults, other than something besides the “” default. General best practice when retrieving lots of data from almost any system is to use paging, and it’s recommended for Sitecore ContentSearch queries, too. A general recommendation would be to set a specific value for the ContentSearch.SearchMaxResults setting, such as “500” or “1000”. This should be thoroughly tested, however, as limiting the search results for an implementation that isn’t properly using search result paging can lead to inconsistent behavior across the site. Areas such as site maps, general site search, and other areas with implementation logic that could assume all the search results are available in a single request deserve special attention when tuning this setting.
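
As a sketch of that paging pattern with the ContentSearch API (the index name, page size, and predicate are illustrative; Page() and GetResults() come from Sitecore.ContentSearch.Linq):

using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;

const int pageSize = 100; // each Solr request is bounded to this many rows
ISearchIndex index = ContentSearchManager.GetIndex("sitecore_web_index");
using (IProviderSearchContext ctx = index.CreateSearchContext())
{
    int pageIndex = 0;
    SearchResults<SearchResultItem> page;
    do
    {
        // Page() translates to start/rows on the Solr query, so Solr only
        // ever materializes pageSize documents per request
        page = ctx.GetQueryable<SearchResultItem>()
            .Where(result => result.TemplateName == "Sample Item")
            .Page(pageIndex, pageSize)
            .GetResults();

        foreach (var hit in page.Hits)
        {
            // process hit.Document here
        }

        pageIndex++;
    } while (page.Hits.Any());
}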

What About Noisy Solr Neighbors?

I’ve worked on some implementations where Solr was a resource shared between a variety of Sitecore implementations. One project, in this example, might set ContentSearch.SearchMaxResults to “2000” for their Sitecore sites while another project sets a value of “500” – but what if there’s a third organization making use of the same Solr resources and that project doesn’t explicitly set a value for ContentSearch.SearchMaxResults? That one project leaves the setting at the “” default, so it uses the .Net int.MaxValue. This is a classic noisy neighbor problem where shared services become a point of vulnerability to all the consuming applications. The one project with “” for ContentSearch.SearchMaxResults could be responsible for dragging Solr performance down across all the sites.

Solr is an extensible platform much like Sitecore, and in some ways even more so. In Sitecore one extends pipelines or overrides events to achieve the customizations you desire; the same general idea can be applied to Solr – you just use Java with Solr instead of C#.

In this case, our concern being unbounded Solr queries, we can use an extension to a search component (org.apache.solr.handler.component.SearchComponent) to introduce our custom logic into the Solr query processing. In our case, we want to enforce limits to the number of rows a query can request. This would be a safety net in case an un-tuned Sitecore implementation left a ContentSearch.SearchMaxResults setting at the default.

Some care must be taken in how this is introduced into the Solr execution context and where exactly the custom logic is handled. This topic is all covered very well, along with sample code etc, at https://jorgelbg.wordpress.com/2015/03/12/adding-some-safeguard-measures-to-solr-with-a-searchcomponent/. For an enterprise Solr implementation with a variety of Sitecore consumers, a safety measure such as this could be vital to the general stability and perf of the platform – especially if one doesn’t have control over the specific Sitecore projects and their use (or abuse!) of the ContentSearch.SearchMaxResults setting. Maybe file this under best practices for governing Sitecore implementations with shared Solr infrastructure.

Sitecore Commerce 8.2.1 and ListManager with EXM

I’ve been engaged on a few more Sitecore Commerce builds (Commerce 8.2.1 still, as these have carried over from 2017) and found an interesting wrinkle the other day. At first, it looked like a MongoDB issue as contacts weren’t being properly added to Sitecore ListManager “Lists” for use in EXM, but after scratching beneath the surface it was a lot more interesting. I used a utility sent my way by Sitecore support — it’s a .zip that has a Sitecore 8.2 update-5 specific tool for seeing Sitecore Lists and their status in terms of what’s in MongoDB and what’s in the content search indexes (Solr in my case).

The tool made it pretty clear that the data was being stored properly in MongoDB but NOT in the search index (the screenshot below shows “Contacts in index: 3” which is after we corrected the problem — initially the Contacts in index would only ever show 0 and that’s what helped to isolate the problem to Sitecore Content Search):

[Screenshot: the list diagnostics tool showing “Contacts in index: 3”]

Another piece of evidence, in the Sitecore UI when we’d try to add a new contact to ListManager we’d see this message:

Please note that contacts in the list are currently being indexed, so not all contacts are available to view at this time. 0 out of 3 contacts are currently indexed.

Once we enabled verbose logging for search and examined the Search.log output, we saw messages like this in the logs:

INFO  Solr Query - ?q=(type_t:(contact) AND contact.tags_sm:(ContactLists\:\{B76B0E74-E94D-4EBB-F219-6A347C75520D\}))&start=0&rows=20&fl=contact.contactid_s,contact.identifier_t,contact.firstname_t,contact.surname_t,contact.preferredemail_t,contactscount_tl,_uniqueid,_datasource&fq=_indexname:(sitecore_analytics_index)

Note the contact.tags_sm criteria in the query; that turned out to be key. This is the query that Sitecore issues to Solr when trying to obtain contacts for ListManager.

Through considerable trial and error, Solr schema inspection, and sheer determination (I think Dana [https://twitter.com/thesoftwarejedi] was the one who finally yelled “bingo” and discovered this), we found that when we ran this query directly against Solr, we would find our missing ListManager contacts:

http://solr-server:8983/solr/sitecore_analytics_index/select?q=(type_t:(contact) AND contact.tags_tm:(ContactLists\:\{B76B0E74-E94D-4EBB-F219-6A347C75520D\}))&start=0&rows=20&fl=contact.contactid_s,contact.identifier_t,contact.firstname_t,contact.surname_t,contact.preferredemail_t,contactscount_tl,_uniqueid,_datasource&fq=_indexname:(sitecore_analytics_index)

The difference is contact.tags_tm, and that was the crux of our challenge.

Sitecore was indexing contacts using tags_tm while Sitecore queries were looking for tags_sm.

In the Sitecore config file CommerceServer\CommerceServer.ContentSearch.Solr.DefaultIndexConfiguration.config there is a fragment of XML like the following:

<typeMatches hint="raw:AddTypeMatch">
 <typeMatch typeName="idCollection" type="System.Collections.Generic.List`1[[Sitecore.Data.ID, Sitecore.Kernel]]" fieldNameFormat="{0}_sm" multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
 <typeMatch typeName="textCollection" type="System.Collections.Generic.List`1[System.String]" fieldNameFormat="{0}_tm" multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />

The typeMatch for typeName="textCollection" was the issue, along with how it duplicates the mapping for the System.Collections.Generic.List`1[System.String] type — and the many places using the textCollection returnType depended on this typeMatch. I removed the typeMatch from the config file, updated any dependency on textCollection to use stringCollection instead, and . . . magic . . . the contacts properly indexed into Solr and the contact.tags_sm criteria would match the new data.

According to Sitecore Support, this is a defect in the way Commerce search indexing is setup and it’s overlap with Sitecore ListManager (EXM in our case). Commerce should probably use a custom configuration section instead of modifying the default index configuration, but we’ll have to wait and see how this is implemented in a future patch or release.

For the time being, I’ve created the following Sitecore patch configuration file to remove the textCollection elements. This is preferable to editing the standard Sitecore configuration files that come with the product and will make for easier Sitecore upgrades or adjustments when (or if?) a true correction for this defect is released by Sitecore:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <fieldMap type="Sitecore.ContentSearch.SolrProvider.SolrFieldMap, Sitecore.ContentSearch.SolrProvider">
            <typeMatches hint="raw:AddTypeMatch">
              <typeMatch typeName="textCollection">
                <patch:delete />
              </typeMatch>
            </typeMatches>
            <fieldNames>
              <field fieldName="instocklocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="outofstocklocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="orderablelocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="commerceancestornames">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
            </fieldNames>
            <fieldTypes hint="raw:AddFieldByFieldTypeName">
              <fieldType fieldTypeName="catalog selection control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
              <fieldType fieldTypeName="child categories list control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
              <fieldType fieldTypeName="child products list control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
              <fieldType fieldTypeName="parent categories list control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
              <fieldType fieldTypeName="relationship list control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
              <fieldType fieldTypeName="variant list control">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </fieldType>
            </fieldTypes>
          </fieldMap>
          <documentOptions>
            <fields hint="raw:AddComputedIndexField">
              <field fieldName="instocklocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="outofstocklocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="orderablelocations">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
              <field fieldName="commerceancestornames">
                <patch:attribute name="returnType">stringCollection</patch:attribute>
              </field>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Azure Search compared to Solr for Sitecore PaaS (Chapter 2: Querying)

I carried forward my Azure PaaS benchmarking work from earlier this month (see this post on the indexing side of the equation for the start of the story).

For a quick refresher, I’ve used an ARM template based deployment of Sitecore to get a system resembling the following:

[Diagram: Sitecore topology deployed from ARM templates, including the Search tier]

The element I’m exercising in the benchmarks is how Sitecore’s web servers work with the “Search” icon in the diagram above.  I tackled the document ingestion side (how data gets into the search indexes) in my earlier post.  This post addresses the querying side of things (how data gets out of the search indexes).

By default, Azure PaaS search with Sitecore is configured to use Azure Search.  Solr is another viable option.

Here’s where I’ll interject that Coveo also has an excellent search technology for Sitecore.  There are specific use-cases where Coveo is a strong fit, but for the sitecore_core_index indexing evaluations in the earlier post, Coveo would not be considered a good fit.  This changes, however, for the set of benchmarks I’ve run in this post.  I am in the process of testing the Coveo approach in Azure PaaS for Sitecore . . . it’s hot off the presses, so there are still rough edges to work around . . . but Coveo is not part of this write-up for the time being.  I will post an update here once I’ve completed the analysis involving Coveo.

In considering Azure Search vs Solr, I used a methodology with JMeter laid out in a great KB article from Sitecore at https://kb.sitecore.net/articles/398589.  I have a LaunchSitecore site running and I use JMeter to automate visits to the site, simulating simple user behaviour.  I don’t go too crazy with this, because I’m more interested in exercising a basic Sitecore work load than doing a deep-dive in xDB traffic simulation.

My first post showed a clear advantage to Solr for the indexing side of search, but for the querying side I can say there is very little variance between Azure Search and Solr.  Sitecore does a good job of protecting data repositories with layers of data and html caches, but even with those features disabled (we’re talking cacheHtml="false" on the site definition, <cacheSizes> configuration all set to a heretical zero (“0”), etc.) there isn’t a significant difference between the two technologies.
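
For reference, the html cache piece of that benchmarking configuration is just a site attribute. A sketch of the patch, with the site name assumed (and, obviously, only for benchmarking, never production):

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <sites>
      <site name="website">
        <!-- benchmarking only: bypass the html cache entirely -->
        <patch:attribute name="cacheHtml">false</patch:attribute>
      </site>
    </sites>
  </sitecore>
</configuration>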

I’m not going to put up a graph of it, because the throughput as measured by JMeter for tests of 20, 50, 100, 200, or more visitors was almost the same.

I could develop a more search-heavy set of benchmarks, performing a random dictionary of searches against a large custom index so that Sitecore must bypass all caches to respond, but that feels like overkill for what I’m looking to achieve.  Maybe that’s appropriate once I bring Coveo into the benchmarking fun.

For this, I wanted to get a sense for the relative performance between Azure Search and Solr as it relates to Sitecore PaaS and I think I’ve done that.  Succinctly:

  1. Solr is considerably faster at search indexing (courtesy of the search provider implementation in Sitecore)
  2. both Azure Search and Solr perform about the same when it comes to querying a basic Sitecore site like LaunchSitecore (again, courtesy of the search provider implementation in Sitecore)

This isn’t the definitive take on the topic.  It’s more like the beginning.  Azure Search is native to Azure, so there are significant advantages there.  There is a lot of momentum around Azure and Sitecore in general, so that story will continue to evolve.

There are Solr as a service options out there that make Solr for Sitecore much easier (such as www.measuredsearch.com which I’ll blog about in the next few days), but Solr can be a lot for corporate IT departments to take on, so it isn’t a simple choice for everyone.


Auto-suggest with Solr Facets in Sitecore

Sitecore’s auto-suggest feature for search in the Content Authoring environment is pretty slick, but there is some confusing documentation from Sitecore about how to set it up properly with Solr.  As of today, Sitecore’s documentation on integrating with Solr indicates…

“When you implement Solr with Sitecore you need to enable term support in the Solr search handler.  The term functionality is built into Solr but is disabled by default. To power the dropdowns in the UI you must enable the terms component.”

The documentation above will be updated at some point by Sitecore, since it’s no longer the case for the latest version of Sitecore — 8.2 rev. 161221 (Update-2).

In earlier versions of Sitecore, search in the Sitecore Content Editor could make use of the Solr “terms” component to populate suggestions.  This is why this guidance has previously been part of the Solr integration documentation from Sitecore.  Read more about Solr’s use of this auto-suggest through terms at https://cwiki.apache.org/confluence/display/solr/The+Terms+Component.

Sitecore’s strategy of making use of the “terms” component has changed recently, however.  Sitecore now uses faceting with Solr instead of terms.

To prove this out, I’m going to turn to the Solr logs after I try some queries for content in the Sitecore client.  Refer to this documentation from Sitecore if you’re looking for more context on how to use the search facility — there are a lot of features that are very under-utilized, in my experience.  I’ll specify a clause by typing Updatedby: and then “siteco” to engage the auto-suggest feature:

[Screenshot: the auto-suggest dropdown in the Content Editor search tab]

Very nice, right?

Under the covers, the Solr logs will reveal something like this . . .

2017-02-17 19:33:07.546 INFO  (qtp33171127-11) [   x:trial_core] o.a.s.c.S.Request [trial_core]  webapp=/solr path=/select params={q=*:*&facet.field=parsedupdatedby_s&facet.prefix=siteco&rows=0&facet=true&version=2.2&facet.sort=true} hits=24626 status=0 QTime=2

. . . and that can be further debugged by turning it into the URL request powering that auto-suggest response . . .

http://server:port/solr/sitecore_master_index/select?q=*:*&facet.field=parsedupdatedby_s&facet.prefix=siteco&rows=0&facet=true&version=2.2&facet.sort=true

. . . and that would return results like the following:

[Screenshot: the Solr facet response powering the auto-suggest list]

If instead we tried an author: search in Sitecore, for example, the facet.field would be parsedcreatedby_s instead of parsedupdatedby_s.

I don’t want to go too far down this rabbit hole.  I really just wanted to share that despite what the documentation shows, it’s not necessary to enable the Solr term component on the /select requestHandler in Solr if you’re using the most recent version of Sitecore.  I’ve confirmed with official Sitecore support that this change was tagged as change #444661 and that it was incorporated into the product as of Sitecore 8.1 update-1 (rev. 151207); the release notes for 8.1 update-1 are vague, but here it is:

Autocomplete for known fields such as language did not work in the Content Editor Search tab using the SOLR provider. The problem was related to the SOLR server configuration. This has been fixed so that Sitecore no longer depends on this configuration. (444661)

Happy faceting to all!


High Availability of Azure Search with Sitecore

I’ve been investigating Azure Search with Sitecore’s new Azure App Service offering.  I’ve got a giant Excel file of benchmarks and charts based on several permutations and configurations, and several other interesting tidbits that I need to organize into posts to this blog . . . so look for much more about this general topic in the future.

For now, I thought I’d share a point I’ve confirmed with Sitecore support regarding a limitation of Azure Search with Sitecore’s CloudSearchProviderIndex.  The CloudSearchProviderIndex is what the standard Platform-As-A-Service product from Sitecore will use in place of Lucene or Solr or Coveo to power content search for Sitecore.  This is the key building block for working with Azure Search through Sitecore.  While I was performing performance benchmarks for search re-indexing with Sitecore, I noticed the Azure Search document count would drop to 0 and I’d see odd results from Sitecore requests that depended on the search index.  This was classic “search index is being worked on, don’t rely on querying it until the work is done” behaviour.  This was corrected several years ago through Sitecore’s addition of a SwitchOnRebuildLuceneIndex and equivalent for Solr . . . but there is no such equivalent for the CloudSearchProviderIndex used by Azure PaaS solutions.  Essentially: Sitecore is using a single copy of search indexes for query and re-indexing operations, limiting the availability of search during maintenance work.

One could argue this may not be such a big deal because one may not rebuild Azure Search indexes with any frequency.  I’m not sold on this argument, however, since the Sitecore projects I know will frequently perform re-indexing due to development changes to the schema, content synchronization demands, or just routine deployment standard practices.

Further complicating this issue: the Azure Search re-indexing performance I benchmarked through Sitecore leaves a lot to be desired.  It can be slow.  This could make for an extended period of search index unavailability due to the CloudSearchProviderIndex’s limitations.  I’ll share the full battery of testing I’ve done in a future post, but for now let me share the timings I’m observing regardless of the number of Azure Search partitions or replicas I’m working through (partitions should generally improve indexing performance; replicas should generally improve querying performance):

App Service Configuration | Time for 20,000 Sitecore Items to Re-Index with Azure Search
Azure PaaS Standard (S1) CM IIS (OOTB from the Marketplace) | 66 minutes
Azure S2 CM IIS | 35 minutes
Azure S3 CM IIS | 25 minutes
Azure P2 CM IIS | 35 minutes
Azure P3 CM IIS | 24 minutes

For reference, with Lucene indexes this operation would take 5 minutes or less.  The scaling options for Azure Search, Partition count and Replica count, have a minimal impact to the re-indexing operation.

I’ll go into details of this later, but it could be that . . .

  • 20,000 Sitecore items is too small a figure to benefit from scaling with Azure Search?  Many customers have 100,000 or more items, so perhaps I should evaluate a larger data set.
  • there are bottlenecks at the SQL tier?  App Insights here I come…
  • the fact Sitecore isn’t using Azure Search Indexers to ingest data and relies on the Sitecore crawling logic to handle data indexing is artificially slowing this process down

For the time being, Sitecore has responded that improving the availability of Azure Search indexes during rebuilds is an official “feature request” and has assigned it reference number 146822.

In the meantime, if a project needs high availability for Azure Search indexes one may need to roll up their sleeves and craft their own SwitchOnCloudSearchProviderIndex.  It appears fairly straight-forward based on reviewing how this is solved for Solr, just as one example.  A key caveat is in the Azure Search capacity planning documentation:

High availability for Azure Search pertains to queries and index updates that don’t involve rebuilding an index. If you add or delete a field, change a data type, or rename a field, you will need to rebuild the index. To rebuild the index, you must delete the index, re-create the index, and reload the data.

To maintain index availability during a rebuild, you must have a copy of the index with a different name on the same service, or a copy of the index with the same name on a different service, and then provide redirection or failover logic in your code.

It looks like providing for high availability would double the price of Azure Search indexes, so there are a cascade of complications related to this.

My investigations into Sitecore and Azure Search yielded this complication — it’s not insurmountable, and I actually find it fascinating how an on-premises product (classic Sitecore) will evolve into a cloud-first product.  This is just one piece of the evolutionary story.  I expect this will be addressed sooner rather than later in an official upgrade or patch from Sitecore, and until then it’s important to understand this nuance to the Sitecore PaaS landscape.

Strategies for Sitecore Index Organization into Solr Cores

A few days ago, I shared a graphic I put together to illustrate how Solr can be used to organize Sitecore “indexes” into Solr “cores” — this post has the complete graphic.  I want to elaborate on how one sets Sitecore up to use these two approaches, and dig further into the details.

1:1 Sitecore Index to Solr Core Strategy

To start, here’s a visual showing the typical way Sitecore “indexes” are structured in Solr using a one-to-one (1:1) mapping:

[Diagram: each Sitecore index mapped 1:1 to its own Solr core]

This shows each of the default search indexes defined by Sitecore organized into their own cores defined in Solr.  It’s a 1:1 mapping.  This 1:1 strategy means each index has its own configuration (“conf”) directory in Solr, so separate stopwords.txt, solrconfig.xml, schema.xml, and so on; it also means each index has its own (“data”) directory in Solr, so separate tlog folders, separate Segment files, etc.

This is the setup one achieves by following the community documentation on setting up Sitecore with Solr; specifically, this quote from that write-up is where you’re doing a lot of the grunt work around setting up distinct Solr cores for each Sitecore index:

“Use the process detailed in Steps 4-7 to create new cores for all the remaining indexes you would like to move to SOLR.”

Since this is the common strategy, I’m not going to go into more details as it’s straight-forward to Sitecore teams.

Kitchen Sink (∞:1 Sitecore Index to Solr Core) Strategy

Here is the comparable graphic showing the ∞:1 strategy of structuring Sitecore indexes in Solr; I like to think of this as the Kitchen Sink container for all Sitecore indexes, since everything goes into that single core just like the kitchen sink:

[Diagram: all Sitecore indexes sharing a single “kitchen sink” Solr core]

With this approach, a single data and configuration definition is shared by all the Sitecore indexes that reside in Solr.  The advantage is reduced management (setting up the Solr replicationHandler, for example, requires updating 15 solrconfig.xml files in the 1:1 approach, but the Kitchen Sink requires only one solrconfig.xml file to update).  There are significant drawbacks to consider with the Kitchen Sink, however, as you’re sacrificing scaling options specific to each Sitecore index and enforcing a common schema.xml for every index stored in this single core.  There are plenty of reasons not to do this for a production installation of Sitecore, but for a crowded Sitecore environment used for acceptance testing, or other use-cases where bullet-proof stability and lots of flexibility for performance tuning, sharding, etc. are not necessary, you could make a good case for the Kitchen Sink strategy.

The only change necessary to a standard Sitecore configuration to support this Kitchen Sink approach is to patch the contentSearch definitions for the Sitecore indexes where the name of the Solr “core” is specified (stored by default in config files like Sitecore.ContentSearch.Solr.Index.Master.config, Sitecore.ContentSearch.Solr.Index.Web.config, etc).  This tells Sitecore which Solr core contains the index, but the actual name of the core doesn’t factor into the ContentSearch API code one uses with Sitecore.  A patch such as the following would handle both the sitecore_master_index and the sitecore_web_index, organizing them into a Solr core named “kitchen_sink”:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
          <index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

If you peek into the Solr Admin UI for the kitchen_sink core that I’m using, specifically the Schema Browser, it becomes clear how Sitecore uses a field named “_indexname” to represent the Sitecore index value.  For the screenshot below, I’ve set the kitchen_sink core to contain two Sitecore indexes: sitecore_master_index and sitecore_web_index:

[Screenshot: Solr Schema Browser for the kitchen_sink core showing the terms in the _indexname field]

This shows us the two terms stored in that _indexname field: there are 18,774 documents for sitecore_master_index and 5,851 for sitecore_web_index.  Even though the indexes are contained in the same Solr core, Sitecore ContentSearch API code like this . . .

Sitecore.ContentSearch.ISearchIndex index =
  ContentSearchManager.GetIndex(indexName);
using (Sitecore.ContentSearch.IProviderSearchContext ctx =
  index.CreateSearchContext())

. . . doesn’t care whether all the Sitecore indexes reside in a single Solr core or whether each is in its own core following a 1:1 mapping strategy.
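
You can also see those per-index counts without the Solr Admin UI by faceting on that field. A query along these lines, with host and core name assumed, returns the document count per _indexname term:

http://localhost:8983/solr/kitchen_sink/select?q=*:*&rows=0&facet=true&facet.field=_indexname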

Caveats and Going In A Different Direction

There was a bug or two in earlier versions of Sitecore related to this, so be careful with early Sitecore 7.2 or Sitecore 8 implementations (and if you’re using Sitecore 7.5, you’ve got plenty of other things to worry about so don’t sweat a Solr Core organization strategy!).

I should also note that while this post is looking at combining Sitecore indexes into a single Solr Core for convenience and to reduce the management headaches of having 15 sets of Solr Cores to update etc, there are some implementations that go in the opposite direction.  Consider a strategy like the following:

[Diagram: each Sitecore index isolated into its own dedicated Solr implementation]


There may be circumstances where keeping Sitecore indexes in their own Solr Core — and even isolating them further into their own Solr implementation — could be in order.  Solr runs in a JVM and this could certainly factor in, but there are other shared run-time resources that Solr sets aside for the whole Solr application.

I’m not familiar enough with these sorts of implementations that I want to comment further or recommend any course of action related to this right now, but it’s good to think about and consider with Solr tuning scenarios.  I just wanted to share it, as it’s a logical dimension to consider given the two previous strategies in this post.