Azure Search compared to Solr for Sitecore PaaS (Chapter 2: Querying)

I carried forward my Azure PaaS benchmarking work from earlier this month (see this post on the indexing side of the equation for the start of the story).

For a quick refresher, I’ve used an ARM template based deployment of Sitecore to get a system resembling the following:

ARM Templates Arch

The element I’m exercising in the benchmarks is how Sitecore’s web servers work with the “Search” icon in the diagram above.  I tackled the document ingestion side (how data gets into the search indexes) in my earlier post.  This post addresses the querying side of things (how data gets out of the search indexes).

By default, Azure PaaS search with Sitecore is configured to use Azure Search.  Solr is another viable option.

Here’s where I’ll interject that Coveo also has an excellent search technology for Sitecore.  There are specific use-cases where Coveo is a strong fit, however, and in my indexing the sitecore_core_index evaluations in the earlier post Coveo would not be considered a good fit.  This changes, however, for the set of benchmarks I’ve run in this post.  I am in the process of testing the Coveo approach in Azure PaaS for Sitecore . . . it’s hot off the presses, so there are still rough edges to work around . . . but Coveo is not part of this write-up for the time being.  I will post an update here once I’ve completed the analysis involving Coveo.

In considering Azure Search vs Solr, I used a methodology with JMeter laid out in a great KB article from Sitecore at https://kb.sitecore.net/articles/398589.  I have a LaunchSitecore site running and I use JMeter to automate visits to the site, simulating simple user behaviour.  I don’t go too crazy with this, because I’m more interested in exercising a basic Sitecore work load than doing a deep-dive in xDB traffic simulation.

My first post showed a clear advantage to Solr for the indexing side of search, but for the querying side I can say there is very little variance between Azure Search and Solr.  Sitecore does a good job of protecting data repositories with layers of data and html caches, but even with those those features disabled (we’re talking cacheHtml=”false”on the site definition, <cacheSizes> configuration all set to a heretical zero (“0”), etc) there isn’t a significant difference between the two technologies.

I’m not going to put up a graph of it, because the throughput as measured by JMeter for tests of 20, 50, 100, 200, or  more visitors performed almost the same.

I could develop a more search heavy set of benchmarks, performing a random dictionary of searches against a large custom index that Sitecore responds to but must bypass all caches etc, but that feels like overkill for what I’m looking to achieve.  Maybe that’s appropriate once I bring Coveo into the benchmarking fun.

For this, I wanted to get a sense for the relative performance between Azure Search and Solr as it relates to Sitecore PaaS and I think I’ve done that.  Succinctly:

  1. Solr is considerably faster at search indexing (courtesy of the search provider implementation in Sitecore)
  2. both Azure Search and Solr perform about the same when it comes to querying a basic Sitecore site like LaunchSitecore (again, courtesy of the search provider implementation in Sitecore)

This isn’t the definitive take on the topic.  It’s more like the beginning.  Azure Search is native to Azure, so there are significant advantages there.  There is a lot of momentum around Azure and Sitecore in general, so that story will continue to evolve.

There are Solr as a service options out there that make Solr for Sitecore much easier (such as www.measuredsearch.com which I’ll blog about in the next few days), but Solr can be a lot for corporate IT departments to take on, so it isn’t a simple choice for everyone.

 

 

Auto-suggest with Solr Facets in Sitecore

Sitecore’s auto-suggest feature for search in the Content Authoring environment is pretty slick, but there is some confusing documentation from Sitecore about how to set it up properly with Solr.  As of today, Sitecore’s documentation on integrating with Solr indicates…

“When you implement Solr with Sitecore you need to enable term support in the Solr search handler.  The term functionality is built into Solr but is disabled by default. To power the dropdowns in the UI you must enable the terms component.

That above documentation will be updated at some point by Sitecore, since it’s no longer the case for the latest version of Sitecore — 8.2 rev. 161221 (Update-2).

In earlier versions of Sitecore, search in the Sitecore Content Editor could make use of the Solr “terms” component to populate suggestions.  This is why this guidance has previously been part of the Solr integration documentation from Sitecore.  Read more about Solr’s use of this auto-suggest through terms at https://cwiki.apache.org/confluence/display/solr/The+Terms+Component.

Sitecore’s strategy of making use of the “terms” component has changed with recently, however.  Sitecore now uses faceting with Solr instead of terms.

To prove this out, I’m going to turn to the Solr logs after I try some queries for content in the Sitecore client.  Refer to this documentation from Sitecore if you’re looking for more context on how to use the search facility — there are a lot of features that are very under-utilized, in my experience.  I’ll specify a clause by typing Updatedby: and then “siteco” to engage the auto-suggest feature:searbhby

Very nice, right?

Under the covers, the Solr logs will reveal something like this . . .

2017-02-17 19:33:07.546 INFO  (qtp33171127-11) [   x:trial_core] o.a.s.c.S.Request [trial_core]  webapp=/solr path=/select params={q=*:*&facet.field=parsedupdatedby_s&facet.prefix=siteco&rows=0&facet=true&version=2.2&facet.sort=true} hits=24626 status=0 QTime=2

. .  . and that can be further debugged by turning it into the URL request powering that auto-suggest response . . .

http://server:port/solr/sitecore_master_index/select?q=*:*&facet.field=parsedupdatedby_s&facet.prefix=siteco&rows=0&facet=true&version=2.2&facet.sort=true

. . . and that would return results like the following:

solrresponse

If instead we tried an author: search in Sitecore, for example, the facet.field would be parsedcreatedby_s instead of parsedupdatedby_s.

I don’t want to go too far down this rabbit hole.  I really just wanted to share that despite what the documentation shows, it’s not necessary to enable the Solr term component on the /select requestHandler in Solr if you’re using the most recent version of Sitecore.  I’ve confirmed with official Sitecore support that this change was tagged as change #444661 and that’s it was incorporated into the product since Sitecore 8.1 update-1 (rev. 151207); the release notes for 8.1 update-1 are vague, but here it is:

Autocomplete for known fields such as language did not work in the Content Editor Search tab using the SOLR provider. The problem was related to the SOLR server configuration. This has been fixed so that Sitecore no longer depends on this configuration. (444661)

Happy faceting to all!

 

High Availability of Azure Search with Sitecore

I’ve been investigating Azure Search with Sitecore’s new Azure App Service offering.  I’ve got a giant Excel file of benchmarks and charts based on several permutations and configurations, and several other interesting tidbits that I need to organize into posts to this blog . . . so look for much more about this general topic in the future.

For now, I thought I’d share a point I’ve confirmed with Sitecore support regarding a limitation of Azure Search with Sitecore’s CloudSearchProviderIndex.  The CloudSearchProviderIndex is what the standard Platform-As-A-Service product from Sitecore will use in place of Lucene or Solr or Coveo to power content search for Sitecore.  This is the key building block for working with Azure Search through Sitecore.  While I was performing performance benchmarks for search re-indexing with Sitecore, I noticed the Azure Search document count would drop to 0 and I’d see odd results from Sitecore requests that depended on the search index.  This was classic “search index is being worked on, don’t rely on querying it until the work is done” behaviour.  This was corrected several years ago through Sitecore’s addition of a SwitchOnRebuildLuceneIndex and equivalent for Solr . . . but there is no such equivalent for the CloudSearchProviderIndex used by Azure PaaS solutions.  Essentially: Sitecore is using a single copy of search indexes for query and re-indexing operations, limiting the availability of search during maintenance work.

One could argue this may not be such a big deal because one may not rebuild Azure Search indexes with any frequency.  I’m not sold on this argument, however, since the Sitecore projects I know will frequently perform re-indexing due to development changes to the schema, content synchronization demands, or just routine deployment standard practices.

Further complicating this issue is that my benchmarking for Azure Search re-indexing through Sitecore leaves a lot to be desired.  It can be slow.  This could make for an extended period of search index unavailability due to the CloudSearchProviderIndex‘s limitations.  I’ll share the full battery of testing I’ve done in a future post, but for now let me share the timings I’m observing regardless of the number of Azure Search partitions or replicas I’m working through (partitions should generally improve indexing performance; replicas should generally improve querying performance):

App Service Configuration Time for 20,000 Sitecore Items to Re-Index with Azure Search
Azure PaaS Standard (S1) CM IIS (OOTB from the Marketplace) 66 minutes
Azure S2 CM IIS 35 minutes
Azure S3 CM IIS 25 minutes
Azure P2 CM IIS 35 minutes
Azure P3 CM IIS 24 minutes

For reference, with Lucene indexes this operation would take 5 minutes or less.  The scaling options for Azure Search, Partition count and Replica count, have a minimal impact to the re-indexing operation.

I’ll go into details of this later, but it could be that . . .

  • 20,000 Sitecore items is too small a figure to benefit from scaling with Azure Search?  Many customers have 100,000 or more items, so perhaps I should evaluate a larger data set.
  • there are bottlenecks at the SQL tier?  App Insights here I come…
  • the fact Sitecore isn’t using Azure Search Indexers to ingest data and relies on the Sitecore crawling logic to handle data indexing is artificially slowing this process down

For the time being, Sitecore has responded that improving the availability of Azure Search indexes during rebuilds is an official “feature request” and assigned reference number 146822 

In the meantime, if a project needs high availability for Azure Search indexes one may need to roll up their sleeves and craft their own SwitchOnCloudSearchProviderIndex.  It appears fairly straight-forward based on reviewing how this is solved for Solr, just as one example.  A key caveat is in the Azure Search capacity planning documentation:

High availability for Azure Search pertains to queries and index updates that don’t involve rebuilding an index. If you add or delete a field, change a data type, or rename a field, you will need to rebuild the index. To rebuild the index, you must delete the index, re-create the index, and reload the data.

To maintain index availability during a rebuild, you must have a copy of the index with a different name on the same service, or a copy of the index with the same name on a different service, and then provide redirection or failover logic in your code.

It looks like providing for high availability would double the price of Azure Search indexes, so there are a cascade of complications related to this.

My investigations into Sitecore and Azure Search yielded this complication — it’s not insurmountable, and I actually find it fascinating how an on-premises product (classic Sitecore) will evolve into a cloud-first product.  This is just one piece of the evolutionary story.  I expect this will be addressed sooner rather than later in an official upgrade or patch from Sitecore, and until then it’s important to understand this nuance to the Sitecore PaaS landscape.

Digesting Sitecore Commerce 8.2.1

A whole new take on Sitecore Commerce is hot off the presses and I had an opportunity to dig into it briefly this week.  Taking from the release notes and the documentation, which is actually fairly extensive:

This is Sitecore’s new re-envisioned Commerce product.

“Release number 8.2.1 has been assigned to reflect the compatibility with release 8.2 of the Sitecore Experience Platform (Sitecore XP). However, Sitecore Commerce 8.2.1 is not an update to previous Commerce 8.2 releases, but is an entirely new Commerce product and release.”

I worked on a few Sitecore Commerce implementations a while back, but it had been over a year since I ran a proof-of-concept or even completed the installation.  My background with the permutations of “Commerce” on Windows goes back over 15 years, starting with the Microsoft Site Server product and the initial craze around XSLT rendering HTML output from content engines . . . I remember a horrendous e-commerce project designed with a Commerce Server beta and the “elegance” of XML was a complete productivity killer.  It’s a poor worker who blames their tools, right? 🙂   I digress…

Anyway, the last real work I did with Sitecore Commerce was in 2015 and I recall the installation/configuration process being arduous, with both Web and Desktop elements, lots of security hoops to jump through, COM everywhere, and even registry edits for good measure.

This new 8.2.1 Release installation process is certainly an improvement over what is now considered “Legacy” Sitecore Commerce . . . but standing up a baseline installation to kick the tires will still likely occupy a solid day of your time.  The documentation is good, but not 100% bulletproof because there are so many moving parts.  I know I ended up needing to install some new .Net elements for ASP.NET Core . . . and I needed to install an old .Net framework SDK to get another piece of the puzzle to run on the IIS server. I took notes on what extra steps I needed to perform, but I was using a fairly old Rackspace server image so not particularly applicable to everyone.  A few examples from my notes, however:

  1. Re-install the Default Web Site to IIS (our scripted Sitecore installation cleans out the Default Web Site in IIS, so I needed to add it back in to satisfy an assumption one of the various installers made)
  2. Configure IIS 6 Metabase Compatibility to satisfy a requirement for the Commerce Server installation

It’s these sorts of nuances that I recall from previous run-ins with the Commerce platform Sitecore inherited and now fully owns.  In some respects, not all that much has changed.

On the bright side, however, there are clean new SPEAK applications for working with Commerce data:

threecommerce

To get this far, however, you really have to earn it.  There are eight Sitecore “packages” that must be installed, for example, once you get the base Commerce Server + Sitecore + Commerce Core running . . . oh, and they need to be installed in a specific order that is NOT alphabetical, either:

packages

On the bright side, there is a lot more documentation than I’ve seen before on this set of products.  I worked with Sitecore Commerce at a time when there was essentially no real current information about the product, so maybe I’m satisfied too easily with what is now available . . . but I really found this an area Sitecore has improved upon.

Based on this documentation, I was able to pull out some of Sitecore’s diagrams of the product and compile this single visual of the Sitecore Commerce platform as I understand it for version 8.2.1:

8-2-1-annotation

The above is just consolidated from a variety of pictures and notes contained throughout the official documentation from Sitecore on the subject, but one of the ways I digest a system is by diagramming and scribbling notes as I go through a project.  Maybe others will find it useful, too.

Strategies for Sitecore Index Organization into Solr Cores

A few days ago, I shared a graphic I put together to illustrate how Solr can be used to organize Sitecore “indexes” into Solr “cores” — this post has the complete graphic.  I want to elaborate on how one sets Sitecore up to use these two approaches, and dig further into the details.

1:1 Sitecore Index to Solr Core Strategy

To start, here’s a visual showing the typical way Sitecore “indexes” are structured in Solr using a one-to-one (1:1) mapping:

solrseparate

This shows each of the default search indexes defined by Sitecore organized into their own cores defined in Solr.  It’s a 1:1 mapping.  This 1:1 strategy means each index has their own configuration (“conf”) directory in Solr, so seperate stopwords.txt, solrconfig.xml, schema.xml, and so on; it also means each index has their own (“data”) directory in Solr, so separate tlog folders, separate Segment files, etc.

This is the setup one achieves by following the community documentation on setting up Sitecore with Solr; specifically, this quote from that write-up is where you’re doing a lot of the grunt work around setting up distinct Solr cores for each Sitecore index:

“Use the process detailed in Steps 4-7 to create new cores for all the remaining indexes you would like to move to SOLR.”

Since this is the common strategy, I’m not going to go into more details as it’s straight-forward to Sitecore teams.

Kitchen Sink (∞:1 Sitecore Index to Solr Core) Strategy

Here is the comparable graphic showing the ∞:1 strategy of structuring Sitecore indexes in Solr; I like to think of this as the Kitchen Sink container for all Sitecore indexes, since everything goes into that single core just like the kitchen sink:

solrsame

With this approach, a single data and configuration definition is shared by all the Sitecore indexes that reside in Solr.  The advantages are reduced management (setting up the Solr replicationHandler, for example, requires updating 15 solrconfig.xml files in the 1:1 approach, but the Kitchen Sink would require only one solrconfig.xml file to update).  There are significant drawbacks to consider with the Kitchen Sink, however, as you’re sacrificing scaling options specific to each Sitecore index and enforcing a common schema.xml for every index stored in this single core.  There are plenty of reasons not to do this for a production installation of Sitecore, but for a crowded Sitecore environment used for acceptance testing or other use-cases where bullet-proof stability and lots of flexibility when it comes to performance tuning, sharding, etc is not necessary, you could make a good case for the Kitchen Sink strategy.

The only change necessary to a standard Sitecore configuration to support this Kitchen Sink approach is to patch the contentSearch definitions for the Sitecore indexes where the name of the Solr “core” is specified (stored by default in config files like Sitecore.ContentSearch.Solr.Index.Master.config,  Sitecore.ContentSearch.Solr.Index.Web.config, etc).   This is telling Sitecore which Solr core contains the index, but the actual name of the core doesn’t factor into the ContentSearch API code one uses with Sitecore.   A patch such as the following would handle both the sitecore_master_index and the sitecore_web_index to organize into a Solr Core named “kitchen_sink:”

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
          <index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <param desc="core">kitchen_sink</param>
          </index>
        </indexes>
        </configuration>
    </contentSearch>
  </sitecore>
</configuration>

If you peek into the Solr Admin for the kitchen_sink core that I’m using, specifically the Schema Browser in the Solr Admin UI, it becomes clear how Sitecore uses a field named “_indexname” to represent the Sitecore index value.  For this screenshot below, I’ve set the kitchen_sink core to contain two Sitecore indexes: sitecore_master_index and sitecore_web index:

solrterms

This shows us the two terms stored in that _indexname field, and that there are 18,774 for sitecore_master_index and 5,851 for sitecore_web_index.  Even though the indexes are contained in the same Solr Core, Sitecore ContentSearch API code like this . . .

Sitecore.ContentSearch.ISearchIndex index = 
  ContentSearchManager.GetIndex(indexName);
    using (Sitecore.ContentSearch.IProviderSearchContext ctx = 
      index.CreateSearchContext())

. . . doesn’t care whether all the Sitecore indexes reside in a single Solr “Core” or if they’re in their own following a 1:1 mapping strategy.

Caveats and Going In A Different Direction

There was a bug or two in earlier versions of Sitecore related to this, so be careful with early Sitecore 7.2 or Sitecore 8 implementations (and if you’re using Sitecore 7.5, you’ve got plenty of other things to worry about so don’t sweat a Solr Core organization strategy!).

I should also note that while this post is looking at combining Sitecore indexes into a single Solr Core for convenience and to reduce the management headaches of having 15 sets of Solr Cores to update etc, there are some implementations that go in the opposite direction.  Consider a strategy like the following:

solrmindblown

 

There may be circumstances where keeping Sitecore indexes in their own Solr Core — and even isolating them further into their own Solr implementation — could be in order.  Solr runs in a JVM and this could certainly factor in, but there are other shared run-time resources that Solr sets aside for the whole Solr application.

I’m not familiar enough with these sorts of implementations that I want to comment further or recommend any course of action related to this right now, but it’s good to think about and consider with Solr tuning scenarios.  I just wanted to share it, as it’s a logical dimension to consider given the two previous strategies in this post.

 

Solr Configuration for Integration with Sitecore

I’ve got a few good Solr and Sitecore blogs around 75% finished, but I’ve been too busy lately to focus on finishing them.  In the meantime, I figure a picture can be worth 1,000 words sometimes so let me post this visual representation of Solr strategies for Sitecore integrations.  One Solr core per index is certainly the best practice for production Sitecore implementations, but now that Solr support has significantly matured at Sitecore a one Solr core for all the Sitecore indexes is a viable, if limited, option:

draft

There used to be a bug (or two?) that made this single Solr core for every Sitecore index unstable, but that’s been corrected for some time now.

More to follow!

Sitecore WFFM IsRemoteActions and Scaling Considerations

One of our Sitecore implementations was experiencing WFFM exceptions when trying to send emails from CD servers.  This post about WFFM configuration over on the Community site for Sitecore ends with a vague answer to fix a scenario such as this via the following:

Please remove the below mentioned setting from “Sitecore.Forms.Config” from all CD instance.

<setting name=”WFM.IsRemoteActions” value=”true” />

we have also added this setting as per sitecore documents but after adding this line on CD instance, mail won’t trigger. You have to remove this setting from “Sitecore.Forms.Config”. Making value as “false” won’t fix the issue.

Unfortunately, this is a fairly common scenario on that Community site . . . there are certainly answers tucked away in there if you’re diligent in your research, but sometimes they can leave you with more questions.  I think the new StackExchange site for Sitecore is a breath of fresh air on this front, but for this specific question about WFFM I was left to puzzle over this WFM.IsRemoteActions setting.  Here’s what I uncovered.

The official Sitecore documentation for setting up WFFM explicitly states (on page 6 of the WFFM 8.1 Update-3 installation guide in our case, but it’s mostly the same for other versions):

In the \Website\App_Config\Include\Sitecore.Forms.Config file,
  • Add the following node to the section:
  • <settings> section: <setting name=”WFM.IsRemoteActions” value=”true” />
The above contradicts the guidance on the Sitecore Community forum, and in our case we found that removing this setting altogether DID correct our problem of WFFM emails not being sent.
I used Reflector (like always!) so get some insight into what was going on, and in the Sitecore.Forms.Core.Handlers.FormDataHandler type is an ExecuteSaveActions method that is the crux of the logic for processing forms that are submitted through WFFM.  Right near the top of this method is a big conditional test based around the WFM.IsRemoteActions setting:
wffm
When one removes this setting from a CD server, this logic short-circuits this ExecuteSaveActions method and prevents the CD node from following through on the WFFM activities . . . and instead the Sitecore EventQueue of the “Core” database is used to store a record of this WFFM action.
At a later point, a server that shares that Sitecore “Core” database AND is configured with the proper WFFM hooks and events definition (from Sitecore’s WFFM documentation on Multiserver configuration) as follows . . .
The  section: 
<!--HOOKS-->
<hooks>
<!--remote events hook-->
<hook type="Sitecore.Form.Core.WffmActionHook, Sitecore.Forms.Core"/>
</hooks>

The  section: 
The <events> section:
<!--Remote events handler-->
<event name="wffm:action:remote">
<handler type="Sitecore.Form.Core.WffmActionHandler, Sitecore.Forms.Core"
method="OnWffmActionEventFired" />
</event>  
  . . . will pick up the EventQueue record and run the WFFM event.  Usually this is the CM server.  This is how it all worked in our case.

By removing (or setting to false) this WFM.IsRemoteActions setting we’re asking less of our Sitecore CD servers and centralizing the WFFM save behaviour to the CM server.  Most Sitecore implementations are eager to find ways to reduce the demands on the CD nodes in an implementation, so this measure could be considered a best-practice if you’re using WFFM and looking to save Sitecore CD cycles (CPU!) for responding to HTTP requests for site visitors instead of performing WFFM post save actions.  To me, it looks like you could take this further and delegate this responsibility to a “processing” server if you’ve got one in your scaled Sitecore architecture . . . or any node that suits you.