The Odyssey of Sitecore Commerce Staging’s Reliance on SQL CE 3.5

The Problem

I recently completed a fun journey triaging a set of self-inflicted wounds around Sitecore Commerce Staging.  The rest of Sitecore Commerce 8.2.1 ran as expected except for when I setup the Sitecore Commerce Staging between a CM and CD tier . . . I would get an odd “The system cannot find the file specified” dialog box immediately after trying to start a basic Staging replication project:

cannot find the File

And when I say that pop-up appears immediately after clicking the “Start Replication” button, I mean it — we’re talking almost instantaneously.

Background

I’ll pause here just to lay out why one cares about Commerce Server Staging with Sitecore.  The Commerce Staging documentation is on MSDN in a variety of places, and Sitecore is using this sub-system to manage Commerce specific content promotion (such as product catalog changes, promotion codes, etc).  As explained in the summary on CommerceSDN.Sitecore.net :

The role of Commerce Server Staging is to move Commerce Server data between environments or sites.

If the master and web environments are pointing to the same Commerce Server site instance, as soon as data is changed on the master environment, it will be published to web. To ensure that Commerce Server data is not published unexpectedly, it is strongly recommended that you have one Commerce Server site for your Content Management (CM) environment, a separate Commerce Server site for your Content Delivery (CD) environment, and then use Commerce Server [Staging] to move the data.

I added the [Staging] to the final sentence above, as I think it’s pretty key to understanding the topic.

Commerce Server Staging is the vehicle one uses to manage Commerce data the same way we manage marketing data in the rest of Sitecore.  We create and update Sitecore CMS content in a “master” SQL Server database and then promote it to “web” with a Sitecore Publish operation . . . then we create and update Sitecore commerce content via the Sitecore CM and promote changes to the live website via Sitecore Staging operations.  The Sitecore Publish process has been extended with Sitecore Commerce to include Staging, so you can run it as a single integrated process.  Note the “Commerce Server Staging” checkbox included in the Sitecore Publish dialog below:

StagingPublishDialog

It’s usually magic and just works — but I was not able to run any type of Commerce Staging operation, so I needed to peel back the curtain and learn a bit more about how to troubleshoot Commerce Staging.

The Solution

This write-up on Monitoring Commerce Staging laid the foundation for my eventual resolution.  Before exploring that angle, however, I reviewed this write-up with some basic pitfalls on Commerce Staging . . . of course there’s this gem about DCOM permissions (& my write-up on applying that setting when the GUID isn’t readily available) . . . Sitecore’s KB site has some suggestions and the Community.Sitecore.net has some suggestions. There are hundreds of nooks and crannies one can investigate here, so it took me some time to discover what was going on.  Hopefully, my adventure can inform others, so here goes . . .

  1. The link about monitoring Commerce Staging (https://msdn.microsoft.com/en-us/library/ms961837(v=cs.70).aspx) talks about ways to configure what is logged and how to view the events of the Staging process.  I digested this info and, while the write-up assumes some old IIS configurations, I discovered the C:\Program Files (x86)\Commerce Server 11\Staging directory has two folders of particular interest to me: Data and Events.  The Events folder has .mdb and .ldb files (yes, that’s Microsoft Access) that form a type of localized Event Log for Commerce Staging; the Data folder has a StagingLog.sdf file — .sdf is a format used by SQL Server Compact Edition
  2. I opened the Access database and found that it didn’t have any helpful information for my scenario.  My “cannot find the file specified” exception was not in evidence there . . . but there is some interesting meta-data captured in that Access MDB for Staging.  This Events folder appeared to be a dead-end.
  3. I used LinqPad to analyze the StagingLog.sdf file (turns out Commerce Staging uses SQL CE 3.5 edition), and I learned two key things:
    • This StagingLog.sdf file was essentially empty — no data was being written to it when I tried to start a Commerce Staging operation
    • I couldn’t use LINQPad on the server running Commerce Staging as I didn’t have the proper SQL CE 3.5 drivers, but when I moved it to my developer machine I could view the .sdf file without issue.  Interesting . . .
  4. My next step was to use ProcMon (available at https://docs.microsoft.com/en-us/sysinternals/downloads/procmon) and compare a process capture from my environment that raised the exception with a properly functioning Commerce Staging environment.  This took some trial and error, but filtering on the CSSsrv.exe helped me focus on the task at hand.
    • The properly functioning Staging environment had a lot of activity with the StagingLog.sdf file.  We’re talking hundreds of operations.  The broken Staging environment had none.  Once I tracked ProcMon instruction sets 1:1 between the two environments I found the final piece of evidence I needed:

SqlCE

The highlighted area above, C:\Program Files (x86)\Microsoft SQL Server Compact Edition\v3.5\sqlceer35EN.dll entries, were entirely missing from the initialization process of my broken Commerce Staging environment.

The lightbulbs finally went off for me, and I did a quick comparison of the C:\Program Files (x86) directories of the two servers.  There were key components missing from the environment where Staging wasn’t working, and it revolved around SQL Server Compact Edition.  Here’s a highlight showing what obviously wasn’t on the server in question:

dirs

Neither MS SQL Server Compact Edition nor MS Synchronization Services were installed to this server.  I did some further research, and learned that the CommerceServer.exe should fold in these elements with any Commerce install, so I needed to do a repair installation with the CommerceServer.exe and that added SQL Server CE and other dependencies necessary to solve the issue.

Our PowerShell that installs the CommerceServer.exe must not have been using a properly elevated account, or this was done with a manual execution of the installer that didn’t use the proper administrative credentials.  Everything worked fine after I completed the repair install of Commerce Server.

There are some additional items to point out here:

  • Based on my testing, the CommerceServer.exe may exclude some key Staging dependencies if it’s not run in a proper Administrator context.  I thought it was written in the docs somewhere, but I don’t see it in Sitecore’s documentation on installing Commerce Server .exe — running that EXE in the context of a local admin account is crucial to the success of that process.  I use this account and the shift + right-click -> Run as different user” option when launching the CommerceServer.exe manually, and this has been successful for me.
  • If you’re like me, and love the post-script to a good story,  here’s what LINQPad shows for that StagingLog.sdf once you’ve run a couple successful Staging operations:

Linqpad

  • I think additional take aways from this write-up are:
    • there’s an MS Access database in C:\Program Files (x86)\Commerce Server 11\Staging\Events that could be a source of interesting diagnostic info on your Sitecore Commerce Staging activities
    • there’s an SQL CE database in C:\Program Files (x86)\Commerce Server 11\Staging\Data that could be a source of interesting diagnostic info on your Sitecore Commerce Staging activities
Advertisements

Sitecore Publishing Data Through the EventQueue

Our Challenge

We work with a variety of Sitecore implementations at Rackspace, and sometimes we need to do a surgical task without requiring customers to install anything in particular, trigger app pool recycles, or other changes to an environment.  One such example came up today.

We needed to provide insight into a recent Sitecore publishing operation, but the customer didn’t have logging or other instrumentation showing what was published by whom, published where, and when.  While there may be Sitecore Marketplace modules and other solutions to this sort of challenge, they require customizations or at least package installations by the customer — and none of them can go back in time to answer the question about “who published XYZ in the last few hours?”

Preferably, by using the dedicated Publish log set to INFO in Sitecore, one can get at a ton of handy publishing information . . . and this is our general recommendation for implementations (provided logs aren’t retained for so long that they cause a space concern on disk).  In this case, however, the publish log wasn’t an option and so we had to get creative.

Our Solution

For this scenario, knowing the customer is using a global publishing pattern for Sitecore that we like to employ at Rackspace, we turned to the Sitecore EventQueue since we couldn’t rely on the Publish log.  Even though the EventQueue is mainly about propagating events to other Sitecore instances, we can take advantage of the fact that publishing events are some of those operations that run through the EventQueue.  We can run a query like the following to get a rough handle on what has been recently published:

SELECT Created as [Publish Timestamp]
        --, Username as [Initiator] -- not for distribution!
        , CAST(SUBSTRING(HASHBYTES('SHA1', UserName),1,3) as bigint)  as [Hashed Initiator]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget1"', InstanceData)>0,'1','0') AS [To 1]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget2"', InstanceData)>0,'1','0') AS [To 2]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget3"', InstanceData)>0,'1','0') AS [To 3]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget4"', InstanceData)>0,'1','0') AS [To 4]
       , (LEN(InstanceData) - LEN(REPLACE(InstanceData, '"LanguageName"', ''))) / LEN('"LanguageName"') as [# of Langs] --this is a proxy for how heavy a publish is
        , InstanceData AS [Raw Data]
FROM EventQueue
WHERE EventType LIKE 'Sitecore.Publishing.StartPublishingRemoteEvent%'
ORDER BY Created DESC

Here’s a glance of what the data might look like . . .

pubDump

Explanations & Caveats

Regarding the query, we use an SHA hash of the Sitecore login instead of showing the login (Username) in plain text.  A plain text username could be a security concern, so we don’t want to email that or casually distribute it.  Instead of generic “pubTarget1” etc, one should name specific publishing targets defined in the implementation.  This tells us if a publish went out to all the targets or just selectively.  Our use of “# of Langs” is a way of seeing how much data went out with the publish . . . it’s not perfect, but in most cases we’ve found counting the number of “LanguageName” elements in the JSON to be a reasonable barometer.  When in doubt, the Raw Data can be used to get at lots of other details.  I’d use a JSON viewer to format the JSON; it will look something like:

{
  "ClientLanguage": "en",
  "EventName": "publish:startPublishing",
  "Options": [
    {
      "CompareRevisions": true,
      "Deep": false,
      "FromDate": "\/Date(1464134576584)\/",
      "LanguageName": "de-DE",
      "Mode": 3,
      "PublishDate": "\/Date(1464183788221)\/",
      "PublishRelatedItems": false,
      "PublishingTargets": [
        "{8E080626-DDC3-4EF4-A1A1-F0BE4A200254}"
      ],
      "RecoveryId": "cd00ba58-61cb-4446-82ae-356eaae71957",
      "RepublishAll": false,
      "RootItemId": "afde64e9-7176-43ad-a1f2-89162d8ba4cb",
      "SourceDatabaseName": "master",
      "TargetDatabaseName": "web"
    }
  ],
  "PublishingServer": "SCMASTER-1-CLONE",
  "StatusHandle": {
    "instanceName": "SCMASTER-1-CLONE",
    "m_handle": "e3b5831f-2698-41b5-9bf9-3d88a5238e5a"
  },
  "UserName": "sitecore\\notalegituser"
}

One key caveat to this SQL approach is that data in the Sitecore EventQueue doesn’t remain for long.  Depending on how one tunes their Sitecore.Tasks.CleanupEventQueue agent, EventQueue records might be available for only a few hours.  It’s certainly not a good source for long term publishing history!  This is another reason why using the Sitecore Publishing log is really the way to go — but again, that text log isn’t available to us in this scenario.

Faster Sitecore Publishing for Global Implementations

My post over at the main Rackspace developer blog about Sitecore publishing strategies was something I’d like to link to from here.

The crux of it is this diagram illustrating how we can make use of SQL Replication to move Sitecore content between data centers, but still take advantage of Sitecore publishing to promote content to the live site.  The blog at Rackspace discusses it in some detail, so I’ll just point out that the idea is to let SQL Server Replication do the hard (and potentially slow) work of synchronizing content around the planet, while Sitecore publishing can be the plain old HTTP publish operation we know and love from Sitecore:

2016-02-23-Sitecore-Enterprise-Architecture-For-Global-Publishing