Search User Group Talk

I did a talk for the New England Search Technology (NEST) user group yesterday.  Even though the meetings in Boston are a good 90+ minutes away for me, I try to make the trip there a couple times a year since the topics are usually very relevant to what I’m up to with Sitecore.   I offered to do a talk bridging the Sitecore and Search domains, and they took me up on it.  The audience is typically serious Solr and ElasticSearch technologists, some regular committers to those projects, so it was fun to combine those domains with Sitecore’s relative immaturity when it comes to the platform of search.

I don’t want to just post the powerpoint presentations and say “have at it” since the presentations require context to make sense (and it is 2 different powerpoints to sift through).  I’m a non-conventional guy, and I try to avoid powerpoint with bullet points galore and hyper-structured material.  The talk was more a conversation about search and the challenges unique to search with Sitecore (and other CMS platforms that build on Lucene).

My premise relied heavily on Plato’s Allegory of the Cave where I was a “prisoner” experiencing the world of search through the “cave” of Sitecore.  In reality, search is an enormous space with lots of complexity and innovation across the technology . . . but in terms of Sitecore, we experience a filtered reality (the shadows on the cave wall).  This graphic represents a traditional Allegory of the Cave concept:

cave

I’m not going to summarize the entire talk, but I want to state for the record this isn’t a particular criticism of Sitecore — it’s just the nature of working with a product that is built on top of other products.  In Sitecore’s case with Lucene, for example, there’s a .Net port of Lucene (Lucene.Net) and a custom .Net event pipeline in Sitecore used to orchestrate Lucene.Net activities via an execution context of IIS, and so on.

Understanding Search technology with Sitecore is always going to be filtered through the lens of Sitecore.  My talk was addressing that fact, and how Search (with a capital “S”) is a far broader technology than what is put to use in Sitecore.  Understanding the world of Search beyond the confines of Sitecore can be very revealing, and opens up a lot of opportunity for improving a Sitecore implementation.

I should also note that my talk benefited from insightful input from Sitecore-Coveo Jeff, Tim “Sitecore MVP” Braga, and Al Cole from Lucidworks and others.  The host, Monster.com, shared a great meeting space for us and opened the evening with a quick survey of their specific experience with search using both Solr, Elasticsearch, and their own proprietary technology.

While not specific to Sitecore, one link I wanted to share in particular was the talk about Bloomberg’s 3-year journey with Solr/Lucene; it’s a talk from the Berlin Buzzwords conference a couple weeks ago and thoroughly worth watching.  Getting search right can take persistence and smart analysis, regardless of the platform.  With Sitecore, too often implementations assume Search will just work out of the box and not appreciate that it’s a critical set of features worthy of careful consideration.

I’ll have a few follow-up posts covering more of the points I made in my talk; some lend themselves to distinct blog posts instead of turning this into a sprawling re-hash of the entire evening.

Sitecore Live Session Agents

It can be fun when an hour-long meeting is cancelled at the last minute; you suddenly have a chance to tackle your personal “back log” right?

This note on xDB processing is part of my “back log” so without further ado . . . let me begin by noting how Sitecore is getting better at designing modular elements of functionality. They’ve been at it for years.

The Menu of Sitecore Server Roles

The notion of a “Sitecore server” being a single defined resource is a thing of the past.   Things used to be either Content Management (CM) or Content Delivery (CD); now, CM or CD servers can be supplemented with dedicated publishing servers, the xDB 3 amigos (processing, aggregation, and reporting servers), search work-horses (via Solr, Coveo, or ?).  One can combine elements together so you could have processing + aggregation, or your dedicated publishing server could run Solr for search.  I’m not going to discuss the databases that support all of this, MongoDB or SQL Server, but the count of databases and collections is also ever-increasing.

Then, there are the less well-publicized “mini server roles” like the one that does GeoIP resolution work in the background.  This post is mostly about another specific “mini” Sitecore server known as the Live Session agent.

There is a clear pattern from Sitecore that “servers” are combinations of several slices of different functionality.  I think this shows a maturation in the product, and realization that different implementations will be making use of Sitecore components in different ways.  It makes for flexibility and lots of tuning opportunities.  It keeps a Sitecore Architect employed, I suppose!  I do worry about the stability of all these permutations, however, and along with this pattern is the bloat of configuration files and the potential for misconfiguration.  There are some noble efforts at improving this, though, and I think the complexity of configuration will improve in future Sitecore releases.

The Live Session “Role”

While not truly a “role” in the standard Sitecore sense, the Live Session Agent extends a Content Delivery (CD) node to also process xDB automation states.  This is to assist in monitoring and processing the queue of contacts with automation states with timeouts. One could run a dedicated server just for this Live Session Agent activity, I suppose, or employ a set of CD servers to share the load.  From the documentation:

You can enable the background agent only on a subset of instances, such as on a dedicated instance or a set of instances that only process the timeout and not HTTP requests.

In reviewing the granular tuning settings exposed by Sitecore, Live Session Agent optimization could be a bit of a dark art.  Should be some fun ahead, though, and a chance to enlist additional server resources to alleviate a potential bottleneck around engagement automation state processing.

The Live Session Agent depends on two config files (Sitecore.EngagementAutomation.LiveSessionAgent.config and Sitecore.EngagementAutomation.LiveSessionAgent.Processing.config) and a single .dll (Sitecore.EngagementAutomation.LiveSessionAgent.dll) that can be downloaded under the Resources section here.  The documentation is sufficient to get you started and, while I find it light on practical guidance, there is an obvious effort to expose all the knobs and switches one might require for this sort of background process management (threads to allocate etc).

I suspect this Live Session Agent will become a mainstay in the performance and scaling efforts for more complicated Sitecore xDB implementations — but most any xDB project could benefit from additional resources devoted to engagement automation state processing.  I think the trick is not overwhelming the CD server that you ask to pitch-in and help.

Now, with the other 30 minutes I found by that meeting being cancelled, I’m going to review what all is going on with Sitecore.EngagementAutomation.LiveSessionAgent.dll with Reflector . . .