Sync content between XM Cloud environments automatically
How often have you received a bug ticket and wished you could reproduce the issue on a development environment with the content from production? Or how many times would you have liked to see how a new release affects the production content before promoting it from the staging environment? If this sounds familiar, you might want to continue reading.
Goal
At a regular interval, we want to sync one or multiple sites, including their data sources, from the production environment back to a staging and a development environment.
Overview
The approach described here consists of three building blocks. First, we create a dedicated serialization config for use with Sitecore Content Serialization. Second, we use that config to serialize the content and commit it to a code repository. Finally, we use the repo to restore the content automatically on other environments.
Part 1 - The Serialization
As we are using serialization to sync content between environments, we have to define which content we want to include. You could make this part of your existing serialization configs. However, I prefer to create a fully isolated configuration. This has the benefit that you can run your development-related serialization commands without worrying about accidentally syncing content items as well.
Configure the Serialization
To configure the serialization, we define a sitecore.json file in a separate folder. We will name the folder data.
The serialization config itself will be pretty similar to your existing configuration. Two notable differences are that we...
- ...look for a different file pattern: ../src/*/*/*.contentmodule.json
- ...define a different default serialization path: contentitems
Here is the configuration file for reference:
{
  "$schema": "./.sitecore/schemas/RootConfigurationFile.schema.json",
  "modules": [
    "../src/*/*/*.contentmodule.json"
  ],
  "plugins": [
    "Sitecore.Edge.DevEx.Sitecore.Plugin@0.5.7",
    "Sitecore.DevEx.Extensibility.Serialization@5.2.113",
    "Sitecore.DevEx.Extensibility.Publishing@5.2.113",
    "Sitecore.DevEx.Extensibility.Indexing@5.2.113",
    "Sitecore.DevEx.Extensibility.ResourcePackage@5.2.113",
    "Sitecore.DevEx.Extensibility.Database@5.2.113",
    "Sitecore.DevEx.Extensibility.XMCloud@1.1.30",
    "Sitecore.DevEx.Extensibility.Tunneling@1.0.4"
  ],
  "serialization": {
    "defaultMaxRelativeItemPathLength": 100,
    "defaultModuleRelativeSerializationPath": "contentitems",
    "removeOrphansForRoles": true,
    "removeOrphansForUsers": true,
    "continueOnItemFailure": false,
    "excludedFields": [
      {
        "fieldId": "c7c26117-dbb1-42b2-ab5e-f7223845cca3",
        "description": "__Thumbnail"
      },
      {
        "fieldId": "001dd393-96c5-490b-924a-b0f25cd9efd8",
        "description": "__Lock"
      },
      {
        "fieldId": "2b2fe9fd-78a6-40eb-b9f9-28409d8d3700",
        "description": "SitemapMediaItems"
      }
    ]
  },
  "settings": {
    "telemetryEnabled": false,
    "cacheAuthenticationToken": true,
    "versionComparisonEnabled": true,
    "apiClientTimeoutInMinutes": 5
  }
}
Define the Content to Serialize
You can create one or more *.contentmodule.json files to define the paths you want to include in the automatic content sync. To include a specific site in an XM Cloud Headless SXA scenario, your config could look similar to this:
{
  "namespace": "Development.Website.Content",
  "items": {
    "includes": [
      {
        "name": "SiteContent",
        "path": "/sitecore/content/MySiteCollection/MySite/home",
        "scope": "ItemAndDescendants",
        "allowedPushOperations": "CreateUpdateAndDelete"
      },
      {
        "name": "SiteContentMedia",
        "path": "/sitecore/media library/Project/MySiteCollection/MySite",
        "scope": "DescendantsOnly",
        "allowedPushOperations": "CreateAndUpdate",
        "rules": [
          {
            "path": "/Sitemaps",
            "scope": "ignored"
          }
        ]
      },
      {
        "name": "SiteContentSharedMedia",
        "path": "/sitecore/media library/Project/MySiteCollection/shared",
        "scope": "DescendantsOnly",
        "allowedPushOperations": "CreateAndUpdate",
        "rules": [
          {
            "path": "/placeholders",
            "scope": "ignored"
          }
        ]
      },
      {
        "name": "SiteContentSharedData",
        "path": "/sitecore/content/MySiteCollection/MySite/data",
        "allowedPushOperations": "CreateAndUpdate"
      },
      {
        "name": "SiteContentSettings",
        "path": "/sitecore/content/MySiteCollection/MySite/Settings",
        "allowedPushOperations": "CreateAndUpdate",
        "scope": "singleItem"
      }
    ]
  }
}
Part 2 - The Backup
The second part involves pulling the serialized data from the production system and committing it to a code repository. This example uses Azure DevOps pipelines and repos, plus the Sitecore CLI, but you could use any similar tooling to achieve the same result.
To start the process, the pipeline checks out both the repository with the code base (and the original serialization configs) and the data repository, to which we will commit and push the content. The pipeline definition is stored alongside the code base.
steps:
  - checkout: self
    path: website
  - checkout: data-repo
    path: $(website-data) # value: "website-data"
    persistCredentials: true
    clean: true
We now copy the serialization-related files to a target folder so that they are committed alongside the content. This saves a snapshot of the configuration together with the content, and we will use these files later to restore the content.
  - task: CopyFiles@2
    displayName: "Copy Serialization Configs"
    inputs:
      SourceFolder: $(Agent.BuildDirectory)/website
      Contents: |
        **/dotnet-tools.json
        **/nuget.config
        **/sitecore.json
        **/*.contentmodule.json
      TargetFolder: $(website-data-target) # value: "$(Agent.BuildDirectory)/$(website-data)"
      OverWrite: true
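The two variables used so far come from the variables template that the pipeline includes. A minimal sketch of how that file might define them, using only the values given in the comments above; the file layout itself is an assumption, as the original template is not shown:

```yaml
# Hypothetical excerpt of /azure/azure-templates/variables.yml.
# Only the two values below are taken from the comments in the steps above;
# everything else about this file is an assumption.
variables:
  # Folder name the data repository is checked out to
  - name: website-data
    value: website-data
  # Absolute path of that folder on the build agent
  - name: website-data-target
    value: $(Agent.BuildDirectory)/$(website-data)
```

Keeping the folder name and the full path in separate variables lets the checkout step use the relative name while the later steps use the absolute path as their working directory.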
If you have been using the Sitecore CLI in any automation scenario before, the next part might look familiar to you. We restore the tooling and connect to the XM Cloud environment from which we want to pull content. This is our production system.
  - script: "dotnet tool restore"
    displayName: "Restoring Sitecore CLI"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore --config data --help"
    displayName: "Installing Sitecore CLI Plugins"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore --config data --version"
    displayName: "Show Sitecore CLI Version"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore cloud login --config data --client-credentials --client-id $(XM_CLOUD_CLIENT_ID) --client-secret $(XM_CLOUD_CLIENT_SECRET) --allow-write"
    displayName: "Authenticate CLI with XM Cloud"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore cloud environment connect --config data -id $(XM_CLOUD_ENVIRONMENT_ID) --allow-write"
    displayName: "Connect the CLI to the Environment"
    workingDirectory: $(website-data-target)
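Before pulling anything, it can help to log which content modules the isolated config actually resolves, so that a wrong file pattern shows up early in the build log. A minimal sketch of such an optional step, assuming the ser info command is available in your version of the serialization plugin; this step is not part of the original pipeline:

```yaml
# Optional sanity check (an addition, not part of the original pipeline):
# prints the modules and include paths found via the "data" config folder.
# Assumes "ser info" is supported by the installed serialization plugin.
- script: "dotnet sitecore ser info --config data"
  displayName: "Show Resolved Content Modules"
  workingDirectory: $(website-data-target)
```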
Once we are connected to Sitecore, we set up source control, pull the data, and commit it to the code repository.
  - script: |
      git config user.email "$(automation-email)"
      git config user.name "$(automation-name)"
      git checkout -b $(backup-branch)
      git pull origin $(backup-branch)
    displayName: "Setup Source Control Provider"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore ser pull --config data --environment-name $(XM_CLOUD_ENVIRONMENT_NAME)"
    displayName: "Pull Serialized Items"
    workingDirectory: $(website-data-target)
    condition: ne(${{ parameters.skipDataPull }}, 'true')
  - script: "dotnet sitecore ser validate --config data --fix"
    displayName: "Validate Serialized Items"
    workingDirectory: $(website-data-target)
    condition: ne(${{ parameters.skipValidation }}, 'true')
  - script: |
      git add .
      git commit -m "Update content module items for build: $(Build.BuildNumber)"
      git push origin $(backup-branch)
    displayName: "Commit and Push"
    workingDirectory: $(website-data-target)
    condition: ne(${{ parameters.skipCommit }}, 'true')
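With a schedule as frequent as this one, many runs will find no content changes, and git commit exits with a non-zero code when nothing is staged. Depending on the agent's shell and settings this may or may not surface as a warning or failure, so you may want to make the no-change case explicit. A possible variant of the last step, assuming a Bash-capable agent (hence the bash shortcut instead of script); this is a sketch, not the original step:

```yaml
# Variant of "Commit and Push" that skips the commit when nothing changed.
# Assumption: the agent provides Bash.
- bash: |
    git add .
    # "git diff --cached --quiet" exits with 0 when the index has no changes
    if git diff --cached --quiet; then
      echo "No content changes detected, skipping commit."
    else
      git commit -m "Update content module items for build: $(Build.BuildNumber)"
      git push origin $(backup-branch)
    fi
  displayName: "Commit and Push (only when content changed)"
  workingDirectory: $(website-data-target)
  condition: ne(${{ parameters.skipCommit }}, 'true')
```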
The last thing left to complete the pipeline is to define a schedule, link the data repository, and include the variables.
trigger: none

schedules:
  - cron: '*/15 * * * *'
    displayName: Continuous Content Backup (every quarter of an hour)
    branches:
      include:
        - develop
    always: true

resources:
  repositories:
    - repository: data-repo
      type: git
      name: website-data
      ref: main

variables:
  - template: /azure/azure-templates/variables.yml
  - group: xmcloud-global
  - group: xmcloud-production
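The step conditions above refer to three parameters (skipDataPull, skipValidation and skipCommit) whose definitions are not shown. A sketch of what the parameters block at the top of the backup pipeline might look like, modeled on the restore pipeline's parameters; the display names and defaults are assumptions:

```yaml
# Hypothetical parameters block for the backup pipeline.
# The names come from the step conditions; display names and
# defaults are assumptions.
parameters:
  - name: skipDataPull
    displayName: Skip pulling data?
    type: boolean
    default: false
  - name: skipValidation
    displayName: Skip validating serialized items?
    type: boolean
    default: false
  - name: skipCommit
    displayName: Skip commit and push?
    type: boolean
    default: false
```

With all defaults set to false, scheduled runs execute every step, while a manual run can switch off individual steps, for example to test the pull without committing anything.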
Part 3 - Restoring the Content
The process to restore content to the development and staging environments shares many building blocks with the backup part. However, we run it only once every night instead of every 15 minutes, and we define one stage per environment so that the restores run in parallel.
For every environment we connect to XM Cloud and then push the serialized content using the Sitecore CLI. After that, we trigger a publish. For reference, here is the full pipeline definition.
parameters:
  - name: branchName
    displayName: Name of the branch to restore (main is default, ex. features/tools or refs/tags/MyTag)
    type: string
    default: main
  - name: skipDataPush
    displayName: Skip pushing data?
    type: boolean
    default: false
  - name: publishXmCloud
    displayName: Publish to Experience Edge?
    type: boolean
    default: true
  - name: fullPublish
    displayName: Run full publish?
    type: boolean
    default: false
  - name: publishPaths
    displayName: Paths to publish
    type: object
    default:
      - /sitecore/content/MySiteCollection/MySite
  - name: publishSubitems
    displayName: Include Subitems?
    type: boolean
    default: true

trigger: none

schedules:
  - cron: '0 23 * * *'
    displayName: Continuous Content Restore (Nightly at 11 PM)
    branches:
      include:
        - develop
    always: true

resources:
  repositories:
    - repository: data-repo
      type: git
      name: website-data
      ref: ${{ parameters.branchName }}

variables:
  - template: /azure/azure-templates/variables.yml
  - group: xmcloud-global
  - name: environment
    value: $(XM_CLOUD_ENVIRONMENT_NAME)
  - name: environmentId
    value: $(XM_CLOUD_ENVIRONMENT_ID)

stages:
  - stage: Backup_Restore_Development
    displayName: Restore Backup on Development
    variables:
      - group: xmcloud-development
    jobs:
      - job: Backup_Restore
        displayName: Restoring Data
        steps:
          - template: /azure/azure-templates/steps-backup-restore.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              includeModules: $(include-modules)
              skipDataPush: ${{ parameters.skipDataPush }}
      - job: Backup_Publish
        displayName: Publishing Data
        dependsOn:
          - Backup_Restore
        condition: |
          and
          (
            eq(${{ parameters.publishXmCloud }}, 'true'),
            in(dependencies.Backup_Restore.result, 'Succeeded', 'SucceededWithIssues', 'Skipped')
          )
        steps:
          - template: /azure/azure-templates/steps-publish-xmcloud.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              fullPublish: ${{ parameters.fullPublish }}
              publishPaths: ${{ parameters.publishPaths }}
              publishSubitems: ${{ parameters.publishSubitems }}
  - stage: Backup_Restore_Staging
    displayName: Restore Backup on Staging
    variables:
      - group: xmcloud-staging
    dependsOn: []
    jobs:
      - job: Backup_Restore
        displayName: Restoring Data
        steps:
          - template: /azure/azure-templates/steps-backup-restore.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              includeModules: $(include-modules)
              skipDataPush: ${{ parameters.skipDataPush }}
      - job: Backup_Publish
        displayName: Publishing Data
        dependsOn:
          - Backup_Restore
        condition: |
          and
          (
            eq(${{ parameters.publishXmCloud }}, 'true'),
            in(dependencies.Backup_Restore.result, 'Succeeded', 'SucceededWithIssues', 'Skipped')
          )
        steps:
          - template: /azure/azure-templates/steps-publish-xmcloud.yml
            parameters:
              environmentName: $(environment)
              environmentId: $(environmentId)
              fullPublish: ${{ parameters.fullPublish }}
              publishPaths: ${{ parameters.publishPaths }}
              publishSubitems: ${{ parameters.publishSubitems }}
And here is the steps-backup-restore.yml template:
parameters:
  - name: environmentName
    type: string
  - name: environmentId
    type: string
  - name: includeModules
    type: string
  - name: skipDataPush
    type: boolean
    default: false

steps:
  - checkout: data-repo
    path: $(website-data)
    persistCredentials: true
    clean: true
  - script: "dotnet tool restore"
    displayName: "Restoring Sitecore CLI"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore --config data --help"
    displayName: "Installing Sitecore CLI Plugins"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore --config data --version"
    displayName: "Show Sitecore CLI Version"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore cloud login --config data --client-credentials --client-id $(XM_CLOUD_CLIENT_ID) --client-secret $(XM_CLOUD_CLIENT_SECRET) --allow-write"
    displayName: "Authenticate CLI with XM Cloud"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore cloud environment connect --config data -id ${{ parameters.environmentId }} --allow-write"
    displayName: "Connect the CLI to the Environment"
    workingDirectory: $(website-data-target)
  - script: "dotnet sitecore ser push --config data --environment-name ${{ parameters.environmentName }} --include ${{ parameters.includeModules }}"
    displayName: "Push Serialized Items"
    workingDirectory: $(website-data-target)
    condition: ne(${{ parameters.skipDataPush }}, 'true')
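The includeModules parameter is passed to the --include option of ser push, which limits the push to the named modules. The include-modules variable that feeds it is not shown in this post. A sketch of how it might be defined, assuming it holds the namespace of the content module from Part 1; where the variable lives (a variable group or the variables template) is also an assumption:

```yaml
# Hypothetical definition of the include-modules variable.
# The value reuses the "namespace" of the content module shown in Part 1.
variables:
  - name: include-modules
    value: Development.Website.Content
```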
Conclusion
Sitecore Content Serialization together with the Sitecore CLI makes it pretty easy to automatically sync content between multiple environments. We frequently use the result to reproduce content and configuration issues on staging without the risk of having to touch production.
An additional benefit is that you get a (partial) content backup for your platform and a content changelog as a by-product.
In an upcoming blog post, I will share some insights into how we combine this with another pipeline to reset customizations on the lower environments.