Sitecore Publishing Data Through the EventQueue

Our Challenge

We work with a variety of Sitecore implementations at Rackspace, and sometimes we need to do a surgical task without requiring customers to install anything in particular, trigger app pool recycles, or other changes to an environment.  One such example came up today.

We needed to provide insight into a recent Sitecore publishing operation, but the customer didn’t have logging or other instrumentation showing what was published by whom, published where, and when.  While there may be Sitecore Marketplace modules and other solutions to this sort of challenge, they require customizations or at least package installations by the customer — and none of them can go back in time to answer the question about “who published XYZ in the last few hours?”

Preferably, by using the dedicated Publish log set to INFO in Sitecore, one can get at a ton of handy publishing information . . . and this is our general recommendation for implementations (provided logs aren’t retained for so long that they cause a space concern on disk).  In this case, however, the publish log wasn’t an option and so we had to get creative.

Our Solution

For this scenario, knowing the customer is using a global publishing pattern for Sitecore that we like to employ at Rackspace, we turned to the Sitecore EventQueue since we couldn’t rely on the Publish log.  Even though the EventQueue is mainly about propagating events to other Sitecore instances, we can take advantage of the fact that publishing events are some of those operations that run through the EventQueue.  We can run a query like the following to get a rough handle on what has been recently published:

SELECT Created as [Publish Timestamp]
        --, Username as [Initiator] -- not for distribution!
        , CAST(SUBSTRING(HASHBYTES('SHA1', UserName),1,3) as bigint)  as [Hashed Initiator]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget1"', InstanceData)>0,'1','0') AS [To 1]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget2"', InstanceData)>0,'1','0') AS [To 2]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget3"', InstanceData)>0,'1','0') AS [To 3]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget4"', InstanceData)>0,'1','0') AS [To 4]
       , (LEN(InstanceData) - LEN(REPLACE(InstanceData, '"LanguageName"', ''))) / LEN('"LanguageName"') as [# of Langs] --this is a proxy for how heavy a publish is
        , InstanceData AS [Raw Data]
FROM EventQueue
WHERE EventType LIKE 'Sitecore.Publishing.StartPublishingRemoteEvent%'
ORDER BY Created DESC

Here’s a glance of what the data might look like . . .

pubDump

Explanations & Caveats

Regarding the query, we use an SHA hash of the Sitecore login instead of showing the login (Username) in plain text.  A plain text username could be a security concern, so we don’t want to email that or casually distribute it.  Instead of generic “pubTarget1” etc, one should name specific publishing targets defined in the implementation.  This tells us if a publish went out to all the targets or just selectively.  Our use of “# of Langs” is a way of seeing how much data went out with the publish . . . it’s not perfect, but in most cases we’ve found counting the number of “LanguageName” elements in the JSON to be a reasonable barometer.  When in doubt, the Raw Data can be used to get at lots of other details.  I’d use a JSON viewer to format the JSON; it will look something like:

{
  "ClientLanguage": "en",
  "EventName": "publish:startPublishing",
  "Options": [
    {
      "CompareRevisions": true,
      "Deep": false,
      "FromDate": "\/Date(1464134576584)\/",
      "LanguageName": "de-DE",
      "Mode": 3,
      "PublishDate": "\/Date(1464183788221)\/",
      "PublishRelatedItems": false,
      "PublishingTargets": [
        "{8E080626-DDC3-4EF4-A1A1-F0BE4A200254}"
      ],
      "RecoveryId": "cd00ba58-61cb-4446-82ae-356eaae71957",
      "RepublishAll": false,
      "RootItemId": "afde64e9-7176-43ad-a1f2-89162d8ba4cb",
      "SourceDatabaseName": "master",
      "TargetDatabaseName": "web"
    }
  ],
  "PublishingServer": "SCMASTER-1-CLONE",
  "StatusHandle": {
    "instanceName": "SCMASTER-1-CLONE",
    "m_handle": "e3b5831f-2698-41b5-9bf9-3d88a5238e5a"
  },
  "UserName": "sitecore\\notalegituser"
}

One key caveat to this SQL approach is that data in the Sitecore EventQueue doesn’t remain for long.  Depending on how one tunes their Sitecore.Tasks.CleanupEventQueue agent, EventQueue records might be available for only a few hours.  It’s certainly not a good source for long term publishing history!  This is another reason why using the Sitecore Publishing log is really the way to go — but again, that text log isn’t available to us in this scenario.

Faster Sitecore Publishing for Global Implementations

My post over at the main Rackspace developer blog about Sitecore publishing strategies was something I’d like to link to from here.

The crux of it is this diagram illustrating how we can make use of SQL Replication to move Sitecore content between data centers, but still take advantage of Sitecore publishing to promote content to the live site.  The blog at Rackspace discusses it in some detail, so I’ll just point out that the idea is to let SQL Server Replication do the hard (and potentially slow) work of synchronizing content around the planet, while Sitecore publishing can be the plain old HTTP publish operation we know and love from Sitecore:

2016-02-23-Sitecore-Enterprise-Architecture-For-Global-Publishing