Walkthrough of Solr Query Analysis for Sitecore

Anton Tishchenko wrote a good quick piece on “stemming” for Solr, and I wanted to build on what he shared.

Solr is generally way underutilized by the Sitecore implementations I’ve seen. It makes for plenty of territory for blogging, though, so maybe bit-by-bit the word will get out about what a powerful distributed system Solr really is.

Let me setup a specific example using Sitecore 9 update-2 with Commerce. I’ll use the CatalogItemsScope Solr core, but you could review most any Solr core with the standard Sitecore schema.

Let me pause here to explain where the default Sitecore Commerce Solr Configuration defines this search index, because it took some digging. In the  . . . SIF\Configuration\Commerce\Solr\sitecore-commerce-solr.json file you’ll find:

// The names of the cores to create
“Customers.Name”: “[concat(parameter(‘CorePrefix’), ‘CustomersScope’)]”,
“Orders.Name”: “[concat(parameter(‘CorePrefix’), ‘OrdersScope’)]”,
“CatalogItems.Name”: “[concat(parameter(‘CorePrefix’), ‘CatalogItemsScope’)]”,

Using a fresh IaaS install of Sitecore 9 with Commerce, I’ll go to the Solr admin interface. Select the CatalogItemsScope Solr core from the dropdown list on the left side navigation. Choose the “Analysis” option to access this powerful way of evaluating two key operations one does with Solr: indexing and querying.  Solr defines different sets of analyzers for these operations, sort of like how Sitecore exposes the httpRequestBegin pipeline or other extensibility points that projects are always customizing. For this exercise, I’ll focus on the Query operation. The documentation on this Analysis screen has a lot more information on this, if you’re interested.

Enter the search phrase “an AWESOME television!” in the textbox for the Field Value(Query), and then specify the text_general as the Fieldname to Analyse: step1

Solr is going to run “an AWESOME television!” through the analysis process and show the results as they progress through each step of that analysis. For this write-up, I’ll uncheck the “Verbose Output” checkbox — but definitely play around with that feature as it shows the offsets and ordinal positions as Solr works it’s magic through each transformation.

After clicking the blue “Analyse Values” button you’ll see output like the following:

step2

Each row of output starts with an abbreviation (ST, SF, etc). That’s the key to which process of the “analyzer” has run; the pipe-separated list to the right of the abbreviation shows the search phrase as it’s progressing through Solr’s transformation logic.

  1. ST is the StandardTokenizerFactory. I removes the “!” punctuation mark, but otherwise doesn’t make any changes to the search query.
  2. SF is the StopFilterFactory. This would remove words listed in the CatalogItemsScope\conf\stopwords.txt file. In this case, with an out-of-the-box Sitecore 9 Commerce installation, there are no stopwords specified. This doesn’t change our search query in any way.
  3. SGF is the SynonymGraphFilterFactory. This component reads a list of synonyms from CatalogItemsScope\conf\synonyms.txt . . . and “television” is one of the samples they provide in that file, so now our query includes those synonyms
    • synonyms.txt includes this entry for example purposes:
      • Television, Televisions, TV, TVs
  4. The final step, LCF, is to run through LowerCaseFilterFactory which is takes “AWESOME” to “awesome” to ensure case isn’t evaluated in the query.

To drive home the point of this example, change the Fieldname to “text_en” from “text_general” and click the blue “Analyse Values” button. The results are different because the text_general and text_en Solr fields have a different set of components defined for the Query operation. Specifically, text_en adds:

  • EPF (EnglishPossessiveFilterFactory)
  • SKF (KeywordMarkerFilterFactory)
  • PSF (PorterStemFilterFactory)

Here’s what the Solr Analysis screen looks like:

step3

There is a lot that could be said about all this, and I’ll probably build on this in a future blog post . . . but I want to return to Anton’s original point about the Porter stemmer and highlight how powerful the Porter stemmer can be. Solr’s text_general is significantly different than text_en, and hopefully I’ve shed light on precisely how they differ on the query side. The docs at http://snowball.tartarus.org/algorithms/porter/stemmer.html review this in some detail. I’ve also used this online Porter stemmer to quickly see how words are decomposed into their key fragments for search relevancy. You don’t really need that online stemming tool, however, since you now see how to use the Solr administrative UI with the standard “text_en” field to have your own Porter stemming sandbox.

Advertisements

Powershell installation of Sitecore Performance Counters

Today was the day I bit the bullet and automated this bit of administrative work that we do for all our Sitecore projects. I think it was the fresh crop of 9 new Sitecore IIS nodes that needed this attention that was the final impetus.

Sitecore Performance Counters

This blog reviews all the perf counters Sitecore makes available.  The documentation is a bit thin from Sitecore, since these counters were added to the product several years ago and they apparently don’t get much attention.  They do require special effort to setup as part of your IIS environment, too, so it’s easy to overlook them.  They’re still very handy, though!  A future blog post could discuss the ones most worthy of attention (in my humble opininion).

You can download the .zip file with an installer for the Sitecore specific performance counters at http://sdn.sitecore.net/upload/sdn5/faq/administration/sitecorecounters.zip.  These files are what you’ll end up with:

counters.JPG

The .EXE runs through all the .XML files in the directory that enumerate perf counters relevant to Sitecore, and adds the counter to the computer.  It’s easy.

Scripted Installation

One would typically click the SitecoreCounters.EXE, press the {enter} key, and then the performance counters would quickly install to the server.  Doing this for many servers, or as part of an automated provisioning effort, could be tedious . . . we actually invoke this via remote magic with Ansible (but that’s not the focus of this blog post).  I do want to go over how we automated this, however, since it wasn’t the smooth sailing I expected it to be.

I’ve posted a Powershell baseline for installing Sitecore perf counters to this github gist.  It was straight-forward Powershell until I found the .EXE was running properly, but no performance counters were actually being added.  I had to look at the decompiled .EXE source to figure out why.  Here’s the Main method of the .EXE; I’ve bolded the instruction that was giving me problems:

        private static void Main(string[] args)
        {
            Console.WriteLine("Press enter to begin creating counters");
            Console.ReadLine();
            Console.WriteLine("Creating Sitecore counters");
            string[] files = args;
            if (args.Length == 0)
            {
                files = Directory.GetFiles(Environment.CurrentDirectory, "*counter*.xml");
            }
            foreach (string str in files)
            {
                new Counters(str).Execute();
            }
            Console.WriteLine("Done");
            Console.ReadLine();
        }

The “Environment.CurrentDirectory” in Powershell is by default something like C:\Users\UserName, but my Powershell was using a staging location to download and unzip the perf counter executable and related xml:

$zipFileURI = "https://your.cdn.with.the.Sitecore.zip.resources/sitecorecounters%207.5.zip"
$stageFolder = "C:\staging" 
if( !(test-path $stageFolder) )
{
    mkdir $stageFolder
}
$downLoadZipPath = $stageFolder + "/SitecoreCounters.zip"  
Invoke-WebRequest -Uri $zipFileURI -OutFile $downLoadZipPath
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory($downLoadZipPath, $stageFolder)

Based on what I saw in Reflector, I needed to change my staging directory or to assign the Environment.CurrentDirectory to be $stageFolder in order for the perf counters to properly be installed to the new environment.  I really wanted to use a configurable location instead of some default Powershell directory, so I sifted through Powershell documentation and figured out the magic API call was [System.IO.Directory]SetCurrentDirectory; this example makes it apparent:

Write-Output "Default directory:"
 [System.IO.Directory]::GetCurrentDirectory()
 [System.IO.Directory]::SetCurrentDirectory($stageFolder + "\sitecorecounters 7.5")
 Write-Output "Directory changed to:"
 [System.IO.Directory]::GetCurrentDirectory()

With that in place, the context of the Sitecore perf counter .EXE was set properly and the program could locate all the counters defined in the XML files.

The end result is our new Sitecore specific perf counters are ready:

counters2.JPG

You will need to recycle the Sitecore App Pool in IIS for this to take effect, something like this should suffice:

$theSite = "Cool Sitecore Site"
$thePool = (Get-Item "IIS:\Sites\$theSite"| Select-Object applicationPool).applicationPool
Restart-WebAppPool $thePool