Sync content between XM Cloud environments automatically

How often have you received a bug ticket and wished you could reproduce the issue on a development environment with the content from production? Or how many times would you have liked to see how a new release affects the production content before promoting it from the staging environment? If this sounds familiar, you might want to continue reading.

Goal

At a regular interval, we want to sync one or more sites, including their data sources, from the production environment back to a staging and a development environment.

Overview

The approach described here consists of three building blocks. First, we create a dedicated serialization config for Sitecore Content Serialization. Second, we use that config to serialize the content and commit it to a code repository. Finally, we use the repository to restore the content automatically on other environments.

Part 1 - The Serialization

As we are using serialization to sync content between environments, we have to define which content we want to include. You could make this part of your existing serialization configs. However, I prefer a fully isolated configuration. The benefit is that you can run your development-related serialization commands without worrying about accidentally syncing content items as well.

Configure the Serialization

To configure the serialization, we define a sitecore.json file in a separate folder. We will name the folder data.

The serialization config itself will be pretty similar to your existing configuration. The two notable differences are that we...

  • ...look for a different file pattern ../src/*/*/*.contentmodule.json
  • ...define a different default serialization path contentitems

Here is the configuration file for reference:

{
  "$schema": "./.sitecore/schemas/RootConfigurationFile.schema.json",
  "modules": [
    "../src/*/*/*.contentmodule.json"
  ],
  "plugins": [
    "Sitecore.Edge.DevEx.Sitecore.Plugin@0.5.7",
    "Sitecore.DevEx.Extensibility.Serialization@5.2.113",
    "Sitecore.DevEx.Extensibility.Publishing@5.2.113",
    "Sitecore.DevEx.Extensibility.Indexing@5.2.113",
    "Sitecore.DevEx.Extensibility.ResourcePackage@5.2.113",
    "Sitecore.DevEx.Extensibility.Database@5.2.113",
    "Sitecore.DevEx.Extensibility.XMCloud@1.1.30",
    "Sitecore.DevEx.Extensibility.Tunneling@1.0.4"
  ],
  "serialization": {
    "defaultMaxRelativeItemPathLength": 100,
    "defaultModuleRelativeSerializationPath": "contentitems",
    "removeOrphansForRoles": true,
    "removeOrphansForUsers": true,
    "continueOnItemFailure": false,
    "excludedFields": [
      {
        "fieldId": "c7c26117-dbb1-42b2-ab5e-f7223845cca3",
        "description": "__Thumbnail"
      },
      {
        "fieldId": "001dd393-96c5-490b-924a-b0f25cd9efd8",
        "description": "__Lock"
      },
      {
        "fieldId": "2b2fe9fd-78a6-40eb-b9f9-28409d8d3700",
        "description": "SitemapMediaItems"
      }
    ]
  },
  "settings": {
    "telemetryEnabled": false,
    "cacheAuthenticationToken": true,
    "versionComparisonEnabled": true,
    "apiClientTimeoutInMinutes": 5
  }
}

Define the Content to Serialize

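You can create one or more files to define the paths you want to include in the automatic content sync. To include a specific site in an XM Cloud Headless SXA scenario, your config could look similar to the one below. Note that an include without an explicit scope, such as SiteContentSharedData, uses the default scope ItemAndDescendants.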

{
  "namespace": "Development.Website.Content",
  "items": {
    "includes": [
      {
        "name": "SiteContent",
        "path": "/sitecore/content/MySiteCollection/MySite/home",
        "scope": "ItemAndDescendants",
        "allowedPushOperations": "CreateUpdateAndDelete"
      },
      {
        "name": "SiteContentMedia",
        "path": "/sitecore/media library/Project/MySiteCollection/MySite",
        "scope": "DescendantsOnly",
        "allowedPushOperations": "CreateAndUpdate",
        "rules": [
          {
            "path": "/Sitemaps",
            "scope": "ignored"
          }
        ]
      },
      {
        "name": "SiteContentSharedMedia",
        "path": "/sitecore/media library/Project/MySiteCollection/shared",
        "scope": "DescendantsOnly",
        "allowedPushOperations": "CreateAndUpdate",
        "rules": [
          {
            "path": "/placeholders",
            "scope": "ignored"
          }
        ]
      },
      {
        "name": "SiteContentSharedData",
        "path": "/sitecore/content/MySiteCollection/MySite/data",
        "allowedPushOperations": "CreateAndUpdate"
      },
      {
        "name": "SiteContentSettings",
        "path": "/sitecore/content/MySiteCollection/MySite/Settings",
        "allowedPushOperations": "CreateAndUpdate",
        "scope": "singleItem"
      }
    ]
  }
}

Part 2 - The Backup

The second part involves pulling the serialized data from the production system and committing it to a code repository. This example uses Azure DevOps pipelines and repos, plus the Sitecore CLI. However, any similar tooling would work just as well.

To start the process, the pipeline checks out both the repository with the code base (and the original serialization configs) and the data repository, to which we will commit and push the content. The pipeline definition is stored alongside the code base.

steps:
- checkout: self
  path: website

- checkout: data-repo
  path: $(website-data) # value: "website-data"
  persistCredentials: true
  clean: true

Next, we copy the serialization-related files to a target folder so that they are committed alongside the content. This stores a snapshot of the configuration together with the content, and we will use these files later to restore it.

- task: CopyFiles@2
  displayName: "Copy Serialization Configs"
  inputs:
    SourceFolder: $(Agent.BuildDirectory)/website
    Contents: |
      **/dotnet-tools.json
      **/nuget.config
      **/sitecore.json
      **/*.contentmodule.json
    TargetFolder: $(website-data-target) # value: "$(Agent.BuildDirectory)/$(website-data)"
    OverWrite: true
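
The inline comments above hint at the values behind $(website-data) and $(website-data-target). Here is a sketch of how these, together with the variables used by the later Git steps, might be declared in a shared variables template such as the /azure/azure-templates/variables.yml referenced by the pipeline. Only the first two values come from the comments; the remaining values are placeholders and pure assumptions.

```yaml
# Sketch of a variables template; adjust the placeholder values to your setup.
variables:
  # Folder name the data repository is checked out to (from the inline comment).
  - name: website-data
    value: website-data
  # Absolute path of that folder on the agent (from the inline comment).
  - name: website-data-target
    value: $(Agent.BuildDirectory)/$(website-data)
  # Assumption: the branch the content is committed to.
  - name: backup-branch
    value: main
  # Assumption: identity used for the automated commits (placeholder values).
  - name: automation-email
    value: content-sync@example.com
  - name: automation-name
    value: content-sync-bot
```

Keeping these in one template means the backup and the restore pipeline resolve the same paths without repeating them.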

If you have used the Sitecore CLI in any automation scenario before, the next part might look familiar. We restore the tooling and connect to the XM Cloud environment we want to pull content from, which is our production system.

💡 The most important aspect here is that we always pass the --config data parameter to use the dedicated config from Part 1.

- script: "dotnet tool restore"
  displayName: "Restoring Sitecore CLI"
  workingDirectory: $(website-data-target)

- script: "dotnet sitecore --config data --help"
  displayName: "Installing Sitecore CLI Plugins"
  workingDirectory: $(website-data-target)

- script: "dotnet sitecore --config data --version"
  displayName: "Show Sitecore CLI Version"
  workingDirectory: $(website-data-target)

- script: "dotnet sitecore cloud login --config data --client-credentials --client-id $(XM_CLOUD_CLIENT_ID) --client-secret $(XM_CLOUD_CLIENT_SECRET) --allow-write"
  displayName: "Authenticate CLI with XM Cloud"
  workingDirectory: $(website-data-target)

- script: "dotnet sitecore cloud environment connect --config data -id $(XM_CLOUD_ENVIRONMENT_ID) --allow-write"
  displayName: "Connect the CLI to the Environment"
  workingDirectory: $(website-data-target)

Once we are connected to Sitecore, we set up source control, pull the data and commit it to the code repository.

- script: |
    git config user.email "$(automation-email)"
    git config user.name "$(automation-name)"

    git checkout -b $(backup-branch)
    git pull origin $(backup-branch)
  displayName: "Setup Source Control Provider"
  workingDirectory: $(website-data-target)

- script: "dotnet sitecore ser pull --config data --environment-name $(XM_CLOUD_ENVIRONMENT_NAME)"
  displayName: "Pull Serialized Items"
  workingDirectory: $(website-data-target)
  condition: ne(${{ parameters.skipDataPull }}, 'true')

- script: "dotnet sitecore ser validate --config data --fix"
  displayName: "Validate Serialized Items"
  workingDirectory: $(website-data-target)
  condition: ne(${{ parameters.skipValidation }}, 'true')

- script: |
    git add .
    git commit -m "Update content module items for build: $(Build.BuildNumber)"
    git push origin $(backup-branch)
  displayName: "Commit and Push"
  workingDirectory: $(website-data-target)
  condition: ne(${{ parameters.skipCommit }}, 'true')
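
With a 15-minute schedule, many runs will find no content changes. In that case git commit exits with a non-zero status because nothing is staged. The following is a sketch of a variant that checks for staged changes first and skips the commit and push otherwise. It assumes Bash is available on the agent, which is why it uses the bash shortcut instead of script; the log message is made up for illustration.

```yaml
- bash: |
    git add .
    # `git diff --cached --quiet` exits with 0 when nothing is staged,
    # so the commit and push only run when the pull produced changes.
    if git diff --cached --quiet; then
      echo "No content changes detected, skipping commit."
    else
      git commit -m "Update content module items for build: $(Build.BuildNumber)"
      git push origin $(backup-branch)
    fi
  displayName: "Commit and Push (only when content changed)"
  workingDirectory: $(website-data-target)
  condition: ne(${{ parameters.skipCommit }}, 'true')
```

This keeps the history of the data repository limited to runs that actually changed content, which also makes it easier to read as a changelog.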

The last thing left to complete the pipeline is to define a schedule, link the data repository and include the variables.

trigger: none

schedules:
- cron: '*/15 * * * *'
  displayName: Continuous Content Backup (every quarter of an hour)
  branches:
    include:
      - develop
  always: true

resources:
  repositories:
  - repository: data-repo
    type: git
    name: website-data
    ref: main

variables:
  - template: /azure/azure-templates/variables.yml
  - group: xmcloud-global
  - group: xmcloud-production

Part 3 - Restoring the Content

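The process of restoring content to the development and staging environments shares many building blocks with the backup part. The differences are that we run it only once per night instead of every 15 minutes, and that we define one stage per environment so the restores run in parallel. Azure DevOps evaluates cron schedules in UTC, so 0 23 * * * means 11 PM UTC. The empty dependsOn on the staging stage removes the implicit dependency on the previous stage, which is what allows both stages to start at the same time.

For every environment, we connect to XM Cloud and then push the serialized content using the Sitecore CLI. After that, we trigger a publish. For reference, here is the full pipeline definition: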

parameters:
  - name: branchName
    displayName: Name of the branch to restore (main is default, ex. features/tools or refs/tags/MyTag)
    type: string
    default: main
  - name: skipDataPush
    displayName: Skip pushing data?
    type: boolean
    default: false
  - name: publishXmCloud
    displayName: Publish to Experience Edge?
    type: boolean
    default: true
  - name: fullPublish
    displayName: Run full publish?
    type: boolean
    default: false
  - name: publishPaths
    displayName: Paths to publish
    type: object
    default:
      - /sitecore/content/MySiteCollection/MySite
  - name: publishSubitems
    displayName: Include Subitems?
    type: boolean
    default: true

trigger: none

schedules:
- cron: '0 23 * * *'
  displayName: Continuous Content Restore (Nightly at 11 PM)
  branches:
    include:
      - develop
  always: true

resources:
  repositories:
  - repository: data-repo
    type: git
    name: website-data
    ref: ${{ parameters.branchName }}

variables:
  - template: /azure/azure-templates/variables.yml
  - group: xmcloud-global
  - name: environment
    value: $(XM_CLOUD_ENVIRONMENT_NAME)
  - name: environmentId
    value: $(XM_CLOUD_ENVIRONMENT_ID)

stages:
  - stage: Backup_Restore_Development
    displayName: Restore Backup on Development
    variables:
      - group: xmcloud-development
    jobs:
      - job: Backup_Restore
        displayName: Restoring Data
        steps:
          - template: /azure/azure-templates/steps-backup-restore.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              includeModules: $(include-modules)
              skipDataPush: ${{ parameters.skipDataPush }}

      - job: Backup_Publish
        displayName: Publishing Data
        dependsOn:
          - Backup_Restore
        condition: |
          and
          (
            eq(${{ parameters.publishXmCloud }}, 'true'),
            in(dependencies.Backup_Restore.result, 'Succeeded', 'SucceededWithIssues', 'Skipped')
          )
        steps:
          - template: /azure/azure-templates/steps-publish-xmcloud.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              fullPublish: ${{ parameters.fullPublish }}
              publishPaths: ${{ parameters.publishPaths }}
              publishSubitems: ${{ parameters.publishSubitems }}

  - stage: Backup_Restore_Staging
    displayName: Restore Backup on Staging
    variables:
      - group: xmcloud-staging
    dependsOn: []
    jobs:
      - job: Backup_Restore
        displayName: Restoring Data
        steps:
          - template: /azure/azure-templates/steps-backup-restore.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              includeModules: $(include-modules)
              skipDataPush: ${{ parameters.skipDataPush }}

      - job: Backup_Publish
        displayName: Publishing Data
        dependsOn:
          - Backup_Restore
        condition: |
          and
          (
            eq(${{ parameters.publishXmCloud }}, 'true'),
            in(dependencies.Backup_Restore.result, 'Succeeded', 'SucceededWithIssues', 'Skipped')
          )
        steps:
          - template: /azure/azure-templates/steps-publish-xmcloud.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              fullPublish: ${{ parameters.fullPublish }}
              publishPaths: ${{ parameters.publishPaths }}
              publishSubitems: ${{ parameters.publishSubitems }}

And here is the steps-backup-restore.yml template:

parameters:
  - name: environmentName
    type: string
  - name: environmentId
    type: string
  - name: includeModules
    type: string
  - name: skipDataPush
    type: boolean
    default: false

steps:
  - checkout: data-repo
    path: $(website-data)
    persistCredentials: true
    clean: true

  - script: "dotnet tool restore"
    displayName: "Restoring Sitecore CLI"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore --config data --help"
    displayName: "Installing Sitecore CLI Plugins"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore --config data --version"
    displayName: "Show Sitecore CLI Version"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore cloud login --config data --client-credentials --client-id $(XM_CLOUD_CLIENT_ID) --client-secret $(XM_CLOUD_CLIENT_SECRET) --allow-write"
    displayName: "Authenticate CLI with XM Cloud"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore cloud environment connect --config data -id ${{ parameters.environmentId }} --allow-write"
    displayName: "Connect the CLI to the Environment"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore ser push --config data --environment-name ${{ parameters.environmentName }} --include ${{ parameters.includeModules }}"
    displayName: "Push Serialized Items"
    workingDirectory: $(website-data-target)
    condition: ne(${{ parameters.skipDataPush }}, 'true')
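
The pipeline also references a steps-publish-xmcloud.yml template that is not shown in this post. The following is a minimal sketch of what such a template might look like, not the original. It declares the parameters the pipeline passes in, repeats the connection steps because the publish runs in its own job, and then triggers a publish to Experience Edge. The publish command and its --pt Edge option are an assumption based on the Sitecore CLI publishing plugin, and the handling of fullPublish, publishPaths and publishSubitems is deliberately left out.

```yaml
# Hypothetical steps-publish-xmcloud.yml; the original template is not shown.
parameters:
  - name: environmentName
    type: string
  - name: environmentId
    type: string
  - name: fullPublish       # declared to match the caller, unused in this sketch
    type: boolean
    default: false
  - name: publishPaths      # declared to match the caller, unused in this sketch
    type: object
    default: []
  - name: publishSubitems   # declared to match the caller, unused in this sketch
    type: boolean
    default: true

steps:
  - checkout: data-repo
    path: $(website-data)
    clean: true

  - script: "dotnet tool restore"
    displayName: "Restoring Sitecore CLI"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore --config data --help"
    displayName: "Installing Sitecore CLI Plugins"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore cloud login --config data --client-credentials --client-id $(XM_CLOUD_CLIENT_ID) --client-secret $(XM_CLOUD_CLIENT_SECRET) --allow-write"
    displayName: "Authenticate CLI with XM Cloud"
    workingDirectory: $(website-data-target)

  - script: "dotnet sitecore cloud environment connect --config data -id ${{ parameters.environmentId }} --allow-write"
    displayName: "Connect the CLI to the Environment"
    workingDirectory: $(website-data-target)

  # Assumption: publish everything to the Edge target; path filtering is omitted.
  - script: "dotnet sitecore publish --config data --pt Edge -n ${{ parameters.environmentName }}"
    displayName: "Publish to Experience Edge"
    workingDirectory: $(website-data-target)
```

Declaring the unused parameters keeps the template compatible with the calls in the pipeline, since Azure Pipelines rejects parameters a template does not declare.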

Conclusion

Sitecore Content Serialization together with the Sitecore CLI makes it pretty easy to sync content automatically between multiple environments. We frequently use the result to reproduce content and configuration issues on staging without the risk of touching production.

An additional benefit is that you get a (partial) content backup for your platform and a content changelog as a by-product.

In an upcoming blog post, I will share some insights into how we combine this with another pipeline to reset customizations on the lower environments.
