Sitecore Publishing Data Through the EventQueue

Our Challenge

We work with a variety of Sitecore implementations at Rackspace, and sometimes we need to do a surgical task without requiring customers to install anything in particular, trigger app pool recycles, or other changes to an environment.  One such example came up today.

We needed to provide insight into a recent Sitecore publishing operation, but the customer didn’t have logging or other instrumentation showing what was published by whom, published where, and when.  While there may be Sitecore Marketplace modules and other solutions to this sort of challenge, they require customizations or at least package installations by the customer — and none of them can go back in time to answer the question about “who published XYZ in the last few hours?”

Preferably, by using the dedicated Publish log set to INFO in Sitecore, one can get at a ton of handy publishing information . . . and this is our general recommendation for implementations (provided logs aren’t retained for so long that they cause a space concern on disk).  In this case, however, the publish log wasn’t an option and so we had to get creative.

Our Solution

For this scenario, knowing the customer is using a global publishing pattern for Sitecore that we like to employ at Rackspace, we turned to the Sitecore EventQueue since we couldn’t rely on the Publish log.  Even though the EventQueue is mainly about propagating events to other Sitecore instances, we can take advantage of the fact that publishing events are some of those operations that run through the EventQueue.  We can run a query like the following to get a rough handle on what has been recently published:

SELECT Created as [Publish Timestamp]
        --, Username as [Initiator] -- not for distribution!
        , CAST(SUBSTRING(HASHBYTES('SHA1', UserName),1,3) as bigint)  as [Hashed Initiator]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget1"', InstanceData)>0,'1','0') AS [To 1]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget2"', InstanceData)>0,'1','0') AS [To 2]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget3"', InstanceData)>0,'1','0') AS [To 3]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget4"', InstanceData)>0,'1','0') AS [To 4]
       , (LEN(InstanceData) - LEN(REPLACE(InstanceData, '"LanguageName"', ''))) / LEN('"LanguageName"') as [# of Langs] --this is a proxy for how heavy a publish is
        , InstanceData AS [Raw Data]
FROM EventQueue
WHERE EventType LIKE 'Sitecore.Publishing.StartPublishingRemoteEvent%'
ORDER BY Created DESC

Here’s a glance of what the data might look like . . .

pubDump

Explanations & Caveats

Regarding the query, we use an SHA hash of the Sitecore login instead of showing the login (Username) in plain text.  A plain text username could be a security concern, so we don’t want to email that or casually distribute it.  Instead of generic “pubTarget1” etc, one should name specific publishing targets defined in the implementation.  This tells us if a publish went out to all the targets or just selectively.  Our use of “# of Langs” is a way of seeing how much data went out with the publish . . . it’s not perfect, but in most cases we’ve found counting the number of “LanguageName” elements in the JSON to be a reasonable barometer.  When in doubt, the Raw Data can be used to get at lots of other details.  I’d use a JSON viewer to format the JSON; it will look something like:

{
  "ClientLanguage": "en",
  "EventName": "publish:startPublishing",
  "Options": [
    {
      "CompareRevisions": true,
      "Deep": false,
      "FromDate": "\/Date(1464134576584)\/",
      "LanguageName": "de-DE",
      "Mode": 3,
      "PublishDate": "\/Date(1464183788221)\/",
      "PublishRelatedItems": false,
      "PublishingTargets": [
        "{8E080626-DDC3-4EF4-A1A1-F0BE4A200254}"
      ],
      "RecoveryId": "cd00ba58-61cb-4446-82ae-356eaae71957",
      "RepublishAll": false,
      "RootItemId": "afde64e9-7176-43ad-a1f2-89162d8ba4cb",
      "SourceDatabaseName": "master",
      "TargetDatabaseName": "web"
    }
  ],
  "PublishingServer": "SCMASTER-1-CLONE",
  "StatusHandle": {
    "instanceName": "SCMASTER-1-CLONE",
    "m_handle": "e3b5831f-2698-41b5-9bf9-3d88a5238e5a"
  },
  "UserName": "sitecore\\notalegituser"
}

One key caveat to this SQL approach is that data in the Sitecore EventQueue doesn’t remain for long.  Depending on how one tunes their Sitecore.Tasks.CleanupEventQueue agent, EventQueue records might be available for only a few hours.  It’s certainly not a good source for long term publishing history!  This is another reason why using the Sitecore Publishing log is really the way to go — but again, that text log isn’t available to us in this scenario.

Basic Azure Benchmarking for Sitecore

We used this approach to run through some database performance benchmarking for Azure for a real production Sitecore implementation (Sitecore version 8.0).  The idea is to use real resource utilization statistics to estimate DTU requirements and Azure SQL database tiers for if/when this workload shifts to Azure.  This was for a real site, not LaunchSitecore or JetStream or Habitat 🙂

We ran the tests for an hour, then six hours, then at different times of the day . . . yet the results were consistent each time.  The Standard S2 Tier was the recommendation based on our observed workload for every database (Core, Master, and Web databases).  This implementation used MongoDB for session state, however, so this doesn’t include SQL Server session state into the calculation.

There were usually around 9% of the time when the DTUs would spike and exceed the Standard S2 capacity, so I’m a bit concerned and curious about what that would translate into if we were running 100% in Azure (probably slow perf for those times — which roughly appeared to align with Sitecore publishing — which means the Standard S2 Tier might not really cut it).  We could see Content Authors complaining of slow perf, stalled content updates, etc . . . so this is just a starting point.

Sitecore Config for Publishing to Remote Data Centers

A couple months back, I shared our Rackspace publishing strategy for Sitecore in multiple data centers.  It’s fast, reliable, and we’ve had good experiences taking that approach.

This diagram shows what I’m talking about:2016-02-23-Sitecore-Enterprise-Architecture-For-Global-Publishing

That post at developer.Rackspace.com is only part of the picture, however, and I wanted to return to this topic since I recently had a chance to revisit our Sitecore configuration for a customer running Sitecore 7.2.  Referring to the diagram above, we needed to patch the client Sitecore configuration to accommodate the addition of the proxyWebForHongKong and proxyWebForDallas databases; we were finishing up the provisioning of a clean environment, and had replication running, but we needed to connect the two remote data centers so they fully participated with publishing.

Wading through the Sitecore documentation to find these key architectural pieces is tough, and you would need to synthesize at least the Search Scaling Guide, the Scaling Guide, and the Search and Indexing Guide to get to all this material.  For my own benefit, probably more than for any potential reader, I’m consolidating this here.

It’s important when one does a publish in Sitecore that it triggers index updates on the remote servers, as well as move the content to the appropriate Web database.  This is the crux of what I’m up to here.

At a high level, we had to complete these steps:

  1. Add the database connection strings for the two proxy databases serving our two remote data centers
  2. Add the two proxy database definitions under the <sitecore> node so they can be referenced as Sitecore Publishing Targets etc
  3. Add an index update strategy specific for each proxy database
    • This only needs to be on the environments where the index is used; for example, sitecore_web_index configurations would only be relevant on the CD servers
  4. Connect these new indexing strategies to indexes that need to be kept consistent between data centers
    • This customer is making some use of Lucene, so there is a sitecore_web_index that’s relevant here
    • This customer is using an alternative search provider, too, but I’m focusing on the out-of-the-box configuration as it relates to Lucene here

To implement all of the above, we would typically steer customers to a Sitecore patch configuration strategy such as this one by Bryan “Flash” Furlong.  For simplicity, however, we at Rackspace created a subdirectory App_Config/Include/Rackspace and placed a single config patch file in it.  We don’t own the solution structure, and work to isolate our changes as much as possible from a customer implementation.  So having a single file we can reference is handy for our use case — but not what I’d consider an absolute best practice.

Here’s a nice summary on using subdirectories for Sitecore configurations, in case that’s new to you; here’s Sitecore’s official configuration patch documentation if you’d like to review that: https://sdn.sitecore.net/upload/sitecore6/60/include_file_patching_facilities_sc6orlater-usletter.pdf.

Unfortunately, the <connectionStrings> definitions are not under the Sitecore node and so we can’t patch those the same way, but everything else is in the Sitecore configuration patch below.  I’m not going to detail the following XML, but summarize by saying steps 2 through 4 above are handled here.  Other relevant settings, such as EnableEventQueues etc, are set elsewhere as the Sitecore scaling guide has been broadly applied to this setup already . . . the following is just the changes we needed to make in this particular case, but I’m hoping the example is helpful:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <!-- 
  We cannot patch in a connection string, so be sure to explicitly add connectionstrings
  pointing to the appropriate proxy databases
  <add name="ProxyWebForHongKong" connectionString="Trusted_Connection=True;Data Source=SQLCluster;Database=ProxyWebForHongKong" />
  <add name="ProxyWebForDallas" connectionString="Trusted_Connection=True;Data Source=SQLCluster;Database=ProxyWebForDallas" />

-->
  <sitecore>

    <!-- Add the database configurations that match the connectionstring entries
    (The id of the <database> node must be equal to the name of your connection string node)-->
    <databases>
      <database id="ProxyWebForHongKong" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <icon>Network/16x16/earth.png</icon>
        <securityEnabled>true</securityEnabled>
        <dataProviders hint="list:AddDataProvider">
          <dataProvider ref="dataProviders/main" param1="$(id)">
            <disableGroup>publishing</disableGroup>
            <prefetch hint="raw:AddPrefetch">
              <sc.include file="/App_Config/Prefetch/Common.config" />
              <sc.include file="/App_Config/Prefetch/Webdb.config" />
            </prefetch>
          </dataProvider>
        </dataProviders>
        <proxiesEnabled>false</proxiesEnabled>
        <proxyDataProvider ref="proxyDataProviders/main" param1="$(id)" />
        <archives hint="raw:AddArchive">
          <archive name="archive" />
          <archive name="recyclebin" />
        </archives>
        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.SqlServer.SqlServerHistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
        <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
        <cacheSizes hint="setting">
          <data>20MB</data>
          <items>10MB</items>
          <paths>500KB</paths>
          <itempaths>10MB</itempaths>
          <standardValues>500KB</standardValues>
        </cacheSizes>
      </database>
      <database id="ProxyWebForDallas" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <icon>Network/16x16/earth.png</icon>
        <securityEnabled>true</securityEnabled>
        <dataProviders hint="list:AddDataProvider">
          <dataProvider ref="dataProviders/main" param1="$(id)">
            <disableGroup>publishing</disableGroup>
            <prefetch hint="raw:AddPrefetch">
              <sc.include file="/App_Config/Prefetch/Common.config" />
              <sc.include file="/App_Config/Prefetch/Webdb.config" />
            </prefetch>
          </dataProvider>
        </dataProviders>
        <proxiesEnabled>false</proxiesEnabled>
        <proxyDataProvider ref="proxyDataProviders/main" param1="$(id)" />
        <archives hint="raw:AddArchive">
          <archive name="archive" />
          <archive name="recyclebin" />
        </archives>
        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.SqlServer.SqlServerHistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
        <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
        <cacheSizes hint="setting">
          <data>20MB</data>
          <items>10MB</items>
          <paths>500KB</paths>
          <itempaths>10MB</itempaths>
          <standardValues>500KB</standardValues>
        </cacheSizes>
      </database>
    </databases>


    <!-- Add the specific update strategies for each data center-->
    <contentSearch>
      <indexUpdateStrategies>
        <onPublishEndAsyncProxyHongKong type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
          <param desc="database">ProxyWebForHongKong</param>
          <CheckForThreshold>true</CheckForThreshold>
        </onPublishEndAsyncProxyHongKong>
        <onPublishEndAsyncProxyDallas type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
          <param desc="database">ProxyWebForDallas</param>
          <CheckForThreshold>true</CheckForThreshold>
        </onPublishEndAsyncProxyDallas>
      </indexUpdateStrategies>




      <!-- Add the additional indexing strategies to the sitecore_web_index, so our additional regions get the messages to update Lucene-->
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsyncProxyHongKong" />
              <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsyncProxyDallas" />
            </strategies>
          </index>
        </indexes>
      </configuration>
      
      
    </contentSearch>
  </sitecore>
</configuration>