Sitecore – XM Cloud – Errors During Setup and Fixes/Work Around

This is the second part of my XM Cloud series. As mentioned at the end of my previous post, here I share the issues I faced while setting up an XM Cloud instance and what I learned while fixing them, hoping it comes in handy if you run into similar issues.

Deployment failure due to license:

The first time I spun up my instance, I faced the error below. I was truly flabbergasted, as I didn't use any custom code. I used the default starter project that XM Cloud offers.

Starting deployment of environment 'DEV'
Starting deployment settings encryption
Encrypted deployment settings
Starting deployment configuration
Deployment configuration has been completed
Deployment configuration file has been created.
Winter is coming…
Environment 'sdkjfsdkjfbdskfnasdfbsdkj' starting for deployment 'sdkjfsdkjfbdskfnasdfbsdkj'.
Waking up the minions.
Deployment 'sdkjfsdkjfbdskfnasdfbsdkj' of environment 'DEV' has failed
Please consult the CM and/or the Rendering Host logs using the CLI.
If the problem persists, contact customer support, and provide them the session tracing ID of 'sjdf0ebb2-1234-8989-as3e-465c15a66455'.

When a deployment fails, the relevant logs can be found in the deployment logs. However, the deployment logs do not always give the full picture of the actual issue. In those situations, the environment logs provide far more detail.

While checking the logs, I found the stack trace below:

385148 21:27:33 ERROR Error in LicenseWatcher Exception: System.TypeInitializationException Message: The type initializer for 'Sitecore.SecurityModel.License.LicenseManager' threw an exception.
Source: Sitecore.Kernel at Sitecore.SecurityModel.License.LicenseWatcher.Created(String filePath)
Nested Exception Exception: System.Xml.XmlException
Message: Data at the root level is invalid. Line 1, position 1.
Source: System.Xml
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at Sitecore.Nexus.Licensing.NexusLicenseApi.get_Api()
at Sitecore.Nexus.Licensing.NexusLicenseApi.GetSnapShot(Guid instance)
at Sitecore.SecurityModel.License.LicenseManager.GetSnapshotData(Guid instance)
at Sitecore.SecurityModel.License.LicenseManager.UpdateSnapshot()
at Sitecore.SecurityModel.License.LicenseManager..cctor()

For this error, there is nothing you can do other than reach out to Sitecore. It is caused by an encoding issue in the license, which only Sitecore can fix from their backend.

The rest of the issues are related to Docker containers and are caused by typical corporate networking components.

  • Docker and SOLR/ZooKeeper

The XM Cloud local Docker setup uses ZooKeeper (ZK) with a single Solr instance. Sometimes the solr container becomes unhealthy because it is unable to connect to ZK.

In this case, you can configure Solr in standalone mode without ZK. Refer to the blog posts below:

1. Using Solr Standalone for your Local – Rock, Paper, Sitecore (rockpapersitecore.com)

2. Jeremy Davis – Strange Docker / Zookeeper errors (jermdavis.dev)

3. Miauw – a Sitecore blog: XMCloud SxaStarter local setup – solr-init issue (ggullentops.blogspot.com)

  • Docker unable to resolve hosts based on container name:

By default, the XM Cloud containers, whether Solr or MSSQL, are expected to be resolved by their container names. For example, in docker-compose.yml:

solr:
  isolation: ${ISOLATION}
  image: ${SITECORE_NONPRODUCTION_DOCKER_REGISTRY}nonproduction/solr:8.8.2-${EXTERNAL_IMAGE_TAG_SUFFIX}
  ports:
    - "8984:8983"
  volumes:
    - type: bind
      source: .\solr-data
      target: c:\data
  environment:
    SOLR_MODE: solrcloud
  healthcheck:
    test: ["CMD", "powershell", "-command", "try { $$statusCode = (iwr http://solr:8983/solr/admin/cores?action=STATUS -UseBasicParsing).StatusCode; if ($$statusCode -eq 200) { exit 0 } else { exit 1} } catch { exit 1 }"]
solr-init:
  isolation: ${ISOLATION}
  image: ${SITECORE_DOCKER_REGISTRY}sitecore-xmcloud-solr-init:${SITECORE_VERSION}
  environment:
    SITECORE_SOLR_CONNECTION_STRING: http://solr:8983/solr
    SOLR_CORE_PREFIX_NAME: ${SOLR_CORE_PREFIX_NAME}
  depends_on:
    solr:
      condition: service_healthy

solr is the container name. In the solr-init container, the Solr URL is referenced as http://solr:8983/solr, whereas in a non-Docker local setup it would be http://localhost:8983/solr. Referring to the apps by container name is logical, because their IP addresses are assigned dynamically each time the containers are spun up.

However, I have often seen containers fail to resolve one another by hostname.

2023-03-06 20:30:30 ManagedPoolThread #1 20:30:15 WARN  IsOnline: Test connection has failed with an exception. Type: 'SolrConnectionException', Message: 'The remote name could not be resolved: 'solr''

2023-03-05 11:12:01 Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Named Pipes Provider: Could not open a connection to SQL Server [53]. .
2023-03-05 11:12:01 Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Login timeout expired.
2023-03-05 11:12:01 Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online..

Although one error is related to Solr and the other to MSSQL, and the messages look different, the underlying cause of both is the same.

How can you confirm this?

You can get the IP address of the Solr container via docker inspect "container-name" or, if you use the VS Code Docker extension, by right-clicking the container and selecting Inspect. This gives you a JSON document of details, one of which is the IP address. Use this IP address in place of the container name and rerun up.ps1. If your issue is in fact caused by Docker being unable to resolve the host by its container name, it will now be resolved. However, this is not recommended as a permanent fix, because the IP will change the next time you run docker-compose down and docker-compose up.
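
The JSON returned by docker inspect is fairly long, so here is a small Python sketch that pulls out just the IP address per network. It assumes Python is available on the host and that the output has the usual shape (a JSON array with NetworkSettings.Networks per container). The function names and the container name in the usage comment are hypothetical, not part of the XM Cloud starter.

```python
import json
import subprocess


def container_ips(inspect_output: str) -> dict:
    """Map each container name to {network name: IP address}.

    `inspect_output` is the raw JSON text printed by `docker inspect`.
    """
    result = {}
    for container in json.loads(inspect_output):
        # Docker reports names with a leading slash, e.g. "/my-solr".
        name = container.get("Name", "").lstrip("/")
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        result[name] = {
            net_name: net.get("IPAddress", "")
            for net_name, net in networks.items()
        }
    return result


def inspect_container(container_name: str) -> dict:
    """Run `docker inspect <name>` on the host and return its IPs."""
    raw = subprocess.run(
        ["docker", "inspect", container_name],
        capture_output=True, text=True, check=True,
    ).stdout
    return container_ips(raw)


# Example (hypothetical container name):
# print(inspect_container("sxastarter-solr-1"))
```

Remember that the IP is only good for confirming the diagnosis; it will be different after the next down/up cycle.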

In the case of MSSQL, you can connect to the container's SQL Server instance from SSMS on the host using localhost, the port number, and SQL Server authentication with the sa account. The required port number and password are available in the .env file.

These issues are mainly caused by firewalls, proxies, and VPNs. There is no single fix, as the cause varies from one client network to another. You can try disconnecting from the VPN first. Beyond that, you can troubleshoot using the steps below:

  • Try removing the default switch in Docker, since the XMC Docker setup creates a separate switch for its own containers anyway.
  • See if the containers are connected to the XMC Docker network (sxastarter_default):
    1. docker network ls
    2. docker network inspect "Network_Name"
    3. Connect/Disconnect,
      1. docker network connect "Network_Name" "Container_Name"
      2. docker network disconnect "Network_Name" "Container_Name"
  • Try pinging the target container: ping "container name"
  • Check from PowerShell inside the calling container whether the target port is open on the target container. Instead of the IP used in the example below, you should use the container name.

PS C:\inetpub\wwwroot> Test-NetConnection -ComputerName 172.30.148.232 -Port 8983
ComputerName : 172.30.148.232
RemoteAddress : 172.30.148.232
RemotePort : 8983
InterfaceAlias : Ethernet
SourceAddress : 172.30.157.226
TcpTestSucceeded : True

  • Try changing the network adapter priority, as described in the Perficient blog article "Sitecore Docker Troubleshooting".
  • If none of the above works, as crazy as it may sound, try bringing down all the containers and networks using docker compose down and prune, then bring everything back up. I don't know what difference that actually makes, but it helped me overcome a couple of issues. (Yes, it is at this moment that logic has left the world.)
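
The ping and port checks above boil down to two questions: does the name resolve, and is the port reachable? Below is a minimal Python sketch of that distinction, assuming you run it somewhere Python is available (the Sitecore containers themselves may not have it, in which case Test-NetConnection remains the way to go). The function names are hypothetical.

```python
import socket


def can_resolve(host: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(host: str, port: int) -> str:
    """Tell a name-resolution failure apart from a closed port."""
    if not can_resolve(host):
        return "name not resolved"
    if not port_open(host, port):
        return "port closed or unreachable"
    return "ok"


# Example: diagnose("solr", 8983)
```

If the result is "name not resolved" for the container name but "ok" for the container's IP, you are looking at the name resolution problem described in this section rather than a Solr or SQL Server problem.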

A few good-to-knows before you start your local setup:

  • The two init containers, solr-init and mssql-init, are used to initialize your actual Solr/MSSQL servers with the Sitecore indexes and databases. For these init containers to run, the corresponding solr/mssql containers must be up and healthy; the init containers reference them via connection strings. Once the Solr/MSSQL servers are initialized, the two init containers exit.
  • Most of us will be using a C drive that is low on disk space... okay, if you are like me and had to use a low-disk-space C drive, you can map the Docker data root to the D drive via the daemon file (daemon.json).
  • Also, try including the Google DNS server 8.8.8.8. Sometimes the resources to be downloaded during docker compose up may otherwise be blocked.
  • Make sure IIS is stopped. If your local setup stretches over multiple days, chances are that IIS will start automatically after you restart your machine. Although stopping the IIS server itself is not mentioned anywhere (only the IIS apps are said to need stopping), I faced an issue that was resolved only after I stopped IIS.
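
For the data root and DNS points above, here is a rough Python sketch of the change to the daemon file. It assumes both settings go into daemon.json under the Docker keys "data-root" and "dns"; the D:\docker path and the function name are hypothetical. The function only merges the settings into existing JSON text, so you still have to save the result yourself and restart Docker.

```python
import json


def with_data_root_and_dns(daemon_json: str, data_root: str, dns_servers: list) -> str:
    """Return daemon.json text with data-root set and DNS servers merged in.

    Existing settings are kept; DNS entries are appended without duplicates.
    """
    settings = json.loads(daemon_json) if daemon_json.strip() else {}
    settings["data-root"] = data_root

    existing = settings.get("dns", [])
    merged = existing + [s for s in dns_servers if s not in existing]
    if merged:
        settings["dns"] = merged

    return json.dumps(settings, indent=2)


# Example with a hypothetical target folder on the D drive:
# print(with_data_root_and_dns('{"experimental": false}', "D:\\docker", ["8.8.8.8"]))
```

Moving the data root means Docker starts with an empty image store at the new location, so expect the images to be pulled again on the next docker compose up.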
