If you’ve worked with Sitecore and Solr, you’re no stranger to the Solr Admin UI. There are great aspects to that UI, and some exciting extension points with Sitecore implications too, but I know one element of that Solr UI that causes some head-scratching . . . the “optimize now” feature:
The inclusion of the red icon causes people to think “something is wrong . . . red bad . . . must click now!”
What is this Optimize?
I’m writing this for the benefit of Sitecore developers who may not be coming at this from a deep search background: do not worry if your Solr cores show this icon encouraging you to optimize now. Fight that instinct. For standard Sitecore Solr cores that are frequently updating, such as the sitecore_core_index, sitecore_master_index, sitecore_analytics_index, and — depending on your publishing strategy — sitecore_web_index, one may notice these cores almost always appear with this “optimize now” button in the Solr Admin UI. Other indexes, too, may be heavily in use depending on how your Sitecore implementation is structured. If you choose the optimize now option and then reload the screen, you’ll see the friendly green check mark next to Optimized and you should notice the Segment Count drops to a value of 1:
Segments are Lucene’s (and therefore Solr’s) file system unit. On disk, “segments” are where data is durably stored and organized for Lucene. In Solr version 5 and newer, one can visualize Segment details for each Solr Core via the Solr Admin UI Segments Info screen. This shows 2 Segments:
If your Segment count is greater than 1, the Solr Admin UI will report that your Solr Core is in need of Optimization (with that somewhat alarming icon). The Optimize operation re-organizes all the Segments in a Core down to just one single Segment . . . and for busy Sitecore indexes this is not something to do very often (or at all!).
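If you would rather check the Segment count from a script than from the Admin UI, something like the following could work. It is a minimal sketch that assumes the core exposes Solr's Luke request handler at `/admin/luke` and that the JSON response carries `segmentCount`, `numDocs`, and `deletedDocs` under `index`; the function names and the `single_segment` flag are my own, purely for illustration.

```python
import json
import urllib.request


def summarize_index(luke_response: dict) -> dict:
    """Pick the segment-related fields out of a parsed Luke response."""
    index = luke_response.get("index", {})
    segment_count = index.get("segmentCount")
    return {
        "segment_count": segment_count,
        "num_docs": index.get("numDocs"),
        "deleted_docs": index.get("deletedDocs"),
        # As described above, more than one segment is what prompts the
        # "optimize now" hint; this flag is informational only.
        "single_segment": segment_count == 1,
    }


def fetch_index_summary(solr_base_url: str, core: str) -> dict:
    """Query the Luke handler for one core (assumed endpoint and fields)."""
    url = f"{solr_base_url.rstrip('/')}/{core}/admin/luke?numTerms=0&wt=json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return summarize_index(json.load(response))
```

Calling `fetch_index_summary("http://solr-server:8983/solr", "sitecore_master_index")` against a busy core will usually report a Segment count well above 1, which is normal and not a reason to optimize.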
To track an optimize operation at the file system level, consider this snapshot of the /data/index directory for a sitecore_master_index before performing optimization; note the quantity of files:
After the optimization, consider the same file system:
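To make the same before-and-after comparison without screenshots, a short script can count the files in the index directory and total their size. This is only a sketch; the function name and the path in the usage note are hypothetical and depend on where your Solr core's data directory lives.

```python
from pathlib import Path


def index_footprint(index_dir: str) -> dict:
    """Count the files in a Lucene index directory and sum their sizes."""
    files = [p for p in Path(index_dir).iterdir() if p.is_file()]
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }
```

Running `index_footprint("/path/to/sitecore_master_index/data/index")` before and after an optimize should show the file count dropping as the Segments are consolidated.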
When in doubt, don’t optimize
Solr’s optimize now command is like cleaning up a house after a party. It reduces the clutter and consolidates the representation of the Solr Core on disk to a minimal footprint. The problem, however, is that optimizing takes longer the larger the index is, so the act of optimizing may itself produce very non-optimal performance while the work is underway. Solr has to read a copy of the entire index and restructure the copy into a single Segment. This can be slow. Caches must be re-populated after an optimization, too, compounding the performance impact. To continue the analogy, imagine cleaning up during a party; maybe you pause the music and ask everyone to leave the house for 20 minutes while you straighten everything up. Then everyone returns and the partying resumes, with the cleaning being a mostly useless delay.
To draw from the official Solr documentation at https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations:
“Optimizing is very expensive, and if the index is constantly changing, the slight performance boost will not last long. The trade-off is not often worth it for a non static index.”
For those Sitecore indexes in Solr that are decidedly non-static, then, ignore that “optimize now” feature of the Solr Admin UI. It’s better to pay attention to Solr “Merge Policies” for a rules-based approach to maintaining Segments; this is a huge topic, one left for another time.
When to consider optimizing
Knowing more about the optimization process in Solr, then, we can think about when it may be appropriate to apply the optimize command. For external data one is pulling into Solr, for example, a routine optimization could make sense. If you have a weekly product data load, for instance, where 10,000 items are regularly loaded into a Solr Core and then remain unchanged, optimization after the load completes makes a lot of sense. That data in the Core is not dynamic. When the data load completes, you could include an API call to Solr that triggers the optimize.
An API call to trigger an optimize in Solr is available through an update handler call: http://solr-server:8983/solr/sitecore_product_catalog/update?stream.body=<optimize/>
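As a rough illustration of wiring that call into the end of a data load, the sketch below builds and sends the request from Python. Note that it uses the update handler's `optimize=true` request parameter rather than a `stream.body` message, since that avoids depending on remote streaming being enabled. The host and core name come from the example above; the function names are hypothetical, and `maxSegments` and `waitSearcher` are optional parameters you may not need.

```python
import urllib.parse
import urllib.request


def build_optimize_url(solr_base_url, core, max_segments=None, wait_searcher=True):
    """Build an update-handler URL that asks Solr to optimize one core."""
    params = {
        "optimize": "true",
        "waitSearcher": "true" if wait_searcher else "false",
    }
    if max_segments is not None:
        # Optional: merge down to at most this many segments.
        params["maxSegments"] = str(max_segments)
    query = urllib.parse.urlencode(params)
    return f"{solr_base_url.rstrip('/')}/{urllib.parse.quote(core)}/update?{query}"


def optimize_core(solr_base_url, core, timeout=600):
    """Send the optimize request and return the HTTP status code.

    The generous timeout is an assumption: optimizing a large index can
    take a long time, as discussed earlier.
    """
    url = build_optimize_url(solr_base_url, core)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

A weekly load job could call `optimize_core("http://solr-server:8983/solr", "sitecore_product_catalog")` as its final step, once all documents have been committed.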
Sitecore search has a very checkered past with the Lucene Optimize operation. I’ve worked on big projects that were crippled by too-frequent optimizing, like that discussed in Uli Weltersbach’s post. We ended up customizing the Optimize methods to be no-ops, or some variation on that. For additional validation, check out the Lucene docs on the optimize method:
“This method has been deprecated, as it is horribly inefficient and very rarely justified.”
Since Solr sits on top of Lucene, the heritage of Lucene’s optimize is still relevant and — in the Solr Admin UI — we see a potential performance bottleneck button ripe for clicking . . . fight that instinct!