xDB Reporting Database Rebuild Help

I’ve created something like this every time I need to rebuild the Sitecore “reporting” database (this link covers the basic process), this time I’m posting it online so I can re-use it next time around!

This is the script for generating the T-SQL that’s required to complete step #3 in the write-up when you’re following the “Rebuild Reporting Database” instructions:

“In the Rebuild Reporting Database page, when you see Waiting to receive to data status, copy the following marketing definition tables from the primary to the secondary reporting database”

I have written the SQL several times to do this, but this time I took a run at DRY (don’t repeat yourself) to script this SQL out.  Alas, I think my T-SQL comes in at 40+ lines of code versus the raw SQL to run which is just 35 lines and much easier to read, in my opinion.

Either way, you can pick which you prefer as I’ll share them both here

First, the plain vanilla SQL commands for copying those database tables:

INSERT INTO target_Analytics.dbo.CampaignActivityDefinitions
         SELECT source_Analytics.dbo.CampaignActivityDefinitions.*
         FROM  source_Analytics.dbo.CampaignActivityDefinitions ;

INSERT INTO target_Analytics.dbo.GoalDefinitions
         SELECT source_Analytics.dbo.GoalDefinitions.*
         FROM  source_Analytics.dbo.GoalDefinitions ;

INSERT INTO target_Analytics.dbo.OutcomeDefinitions
         SELECT source_Analytics.dbo.OutcomeDefinitions.*
         FROM  source_Analytics.dbo.OutcomeDefinitions ;

INSERT INTO target_Analytics.dbo.MarketingAssetDefinitions
         SELECT source_Analytics.dbo.MarketingAssetDefinitions.*
         FROM  source_Analytics.dbo.MarketingAssetDefinitions ;

INSERT INTO target_Analytics.dbo.Taxonomy_TaxonEntity
         SELECT source_Analytics.dbo.Taxonomy_TaxonEntity.*
         FROM  source_Analytics.dbo.Taxonomy_TaxonEntity ;

INSERT INTO target_Analytics.dbo.Taxonomy_TaxonEntityFieldDefinition
         SELECT source_Analytics.dbo.Taxonomy_TaxonEntityFieldDefinition.*
         FROM  source_Analytics.dbo.Taxonomy_TaxonEntityFieldDefinition ;

INSERT INTO target_Analytics.dbo.Taxonomy_TaxonEntityFieldValue
         SELECT source_Analytics.dbo.Taxonomy_TaxonEntityFieldValue.*
         FROM  source_Analytics.dbo.Taxonomy_TaxonEntityFieldValue ;

And now, here’s the T-SQL attempt to “simplify” the process of creating a script like the above for future projects (yet I prefer it less to the brute force approach):

The advantage to the below is you set your source and target variables to the names of the SQL Server databases, and then you’re all set.

DECLARE @source VARCHAR(100)
DECLARE @target VARCHAR(100)
SET @source = 'source_Analytics'
SET @target = 'target_Analytics'

--List approach will work in SQL Server 2012 only

SELECTX NULL mykey, * INTO #mytemp FROM @ListOfTables
DECLARE @theTable varchar(100)
DECLARE @sql varchar(1000)

UPDATE #mytemp SET mykey = 1

WHILE @@rowcount > 0
    SELECT @theTable = (SELECT IDs FROM #mytemp WHERE mykey = 1)
    PRINT 'INSERT INTO ' + @target + '.dbo.' + @theTable + '
         SELECT ' + @source +  '.dbo.' + @theTable + '.*
         FROM  ' + @source + '.dbo.' + @theTable + ' ;'
     --use 'EXEC to run the dynamic SQL, instead of PRINT, 
     --if you're feeling brave

    DELETE #mytemp WHERE mykey = 1
    UPDATE #mytemp SET mykey = 1
DROP TABLE #mytemp


Sitecore Publishing Data Through the EventQueue

Our Challenge

We work with a variety of Sitecore implementations at Rackspace, and sometimes we need to do a surgical task without requiring customers to install anything in particular, trigger app pool recycles, or other changes to an environment.  One such example came up today.

We needed to provide insight into a recent Sitecore publishing operation, but the customer didn’t have logging or other instrumentation showing what was published by whom, published where, and when.  While there may be Sitecore Marketplace modules and other solutions to this sort of challenge, they require customizations or at least package installations by the customer — and none of them can go back in time to answer the question about “who published XYZ in the last few hours?”

Preferably, by using the dedicated Publish log set to INFO in Sitecore, one can get at a ton of handy publishing information . . . and this is our general recommendation for implementations (provided logs aren’t retained for so long that they cause a space concern on disk).  In this case, however, the publish log wasn’t an option and so we had to get creative.

Our Solution

For this scenario, knowing the customer is using a global publishing pattern for Sitecore that we like to employ at Rackspace, we turned to the Sitecore EventQueue since we couldn’t rely on the Publish log.  Even though the EventQueue is mainly about propagating events to other Sitecore instances, we can take advantage of the fact that publishing events are some of those operations that run through the EventQueue.  We can run a query like the following to get a rough handle on what has been recently published:

SELECT Created as [Publish Timestamp]
        --, Username as [Initiator] -- not for distribution!
        , CAST(SUBSTRING(HASHBYTES('SHA1', UserName),1,3) as bigint)  as [Hashed Initiator]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget1"', InstanceData)>0,'1','0') AS [To 1]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget2"', InstanceData)>0,'1','0') AS [To 2]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget3"', InstanceData)>0,'1','0') AS [To 3]
        , IIF(CHARINDEX('"TargetDatabaseName":"pubTarget4"', InstanceData)>0,'1','0') AS [To 4]
       , (LEN(InstanceData) - LEN(REPLACE(InstanceData, '"LanguageName"', ''))) / LEN('"LanguageName"') as [# of Langs] --this is a proxy for how heavy a publish is
        , InstanceData AS [Raw Data]
FROM EventQueue
WHERE EventType LIKE 'Sitecore.Publishing.StartPublishingRemoteEvent%'

Here’s a glance of what the data might look like . . .


Explanations & Caveats

Regarding the query, we use an SHA hash of the Sitecore login instead of showing the login (Username) in plain text.  A plain text username could be a security concern, so we don’t want to email that or casually distribute it.  Instead of generic “pubTarget1” etc, one should name specific publishing targets defined in the implementation.  This tells us if a publish went out to all the targets or just selectively.  Our use of “# of Langs” is a way of seeing how much data went out with the publish . . . it’s not perfect, but in most cases we’ve found counting the number of “LanguageName” elements in the JSON to be a reasonable barometer.  When in doubt, the Raw Data can be used to get at lots of other details.  I’d use a JSON viewer to format the JSON; it will look something like:

  "ClientLanguage": "en",
  "EventName": "publish:startPublishing",
  "Options": [
      "CompareRevisions": true,
      "Deep": false,
      "FromDate": "\/Date(1464134576584)\/",
      "LanguageName": "de-DE",
      "Mode": 3,
      "PublishDate": "\/Date(1464183788221)\/",
      "PublishRelatedItems": false,
      "PublishingTargets": [
      "RecoveryId": "cd00ba58-61cb-4446-82ae-356eaae71957",
      "RepublishAll": false,
      "RootItemId": "afde64e9-7176-43ad-a1f2-89162d8ba4cb",
      "SourceDatabaseName": "master",
      "TargetDatabaseName": "web"
  "PublishingServer": "SCMASTER-1-CLONE",
  "StatusHandle": {
    "instanceName": "SCMASTER-1-CLONE",
    "m_handle": "e3b5831f-2698-41b5-9bf9-3d88a5238e5a"
  "UserName": "sitecore\\notalegituser"

One key caveat to this SQL approach is that data in the Sitecore EventQueue doesn’t remain for long.  Depending on how one tunes their Sitecore.Tasks.CleanupEventQueue agent, EventQueue records might be available for only a few hours.  It’s certainly not a good source for long term publishing history!  This is another reason why using the Sitecore Publishing log is really the way to go — but again, that text log isn’t available to us in this scenario.

Basic Azure Benchmarking for Sitecore

We used this approach to run through some database performance benchmarking for Azure for a real production Sitecore implementation (Sitecore version 8.0).  The idea is to use real resource utilization statistics to estimate DTU requirements and Azure SQL database tiers for if/when this workload shifts to Azure.  This was for a real site, not LaunchSitecore or JetStream or Habitat 🙂

We ran the tests for an hour, then six hours, then at different times of the day . . . yet the results were consistent each time.  The Standard S2 Tier was the recommendation based on our observed workload for every database (Core, Master, and Web databases).  This implementation used MongoDB for session state, however, so this doesn’t include SQL Server session state into the calculation.

There were usually around 9% of the time when the DTUs would spike and exceed the Standard S2 capacity, so I’m a bit concerned and curious about what that would translate into if we were running 100% in Azure (probably slow perf for those times — which roughly appeared to align with Sitecore publishing — which means the Standard S2 Tier might not really cut it).  We could see Content Authors complaining of slow perf, stalled content updates, etc . . . so this is just a starting point.