I ran into an issue with Sitecore’s Data Exchange Framework v1.4.1 where my pipeline batches would intermittently create thousands of duplicate Sitecore items. This caused a bit of management overhead having to clean out the duplicates. Following some investigation, I decided to add some defensive coding by introducing a Custom Resolve Sitecore Item Processor to replace Sitecore’s OOTB pipeline step and prevent duplicates from being created.
Brief Overview of my DEF implementation
Basically, I have a custom SQL provider that calls a stored procedure and returns a large number of records. The pipeline uses the Sitecore Resolve Sitecore Item processor, which tries to resolve each item so that it is either created as a new item or updated as an existing item.
The unique identifier for each item is composed of several parts, and each part carries information that helps categorize the item. This is by design: the categorization determines where in Sitecore the item will live. Custom bucket rules interrogate the item’s unique identifier and place the item in the correct bucket folder structure.
How does DEF Resolve items in Sitecore?
If you are not familiar with the OOTB Resolve Sitecore Item pipeline step it is probably helpful if I explain what is going on under the hood. The configuration contains the following fields to locate an existing item: Parent For Item & Matching Field Value Accessor. A search query is performed against the index matching on the value of the configured MatchingFieldValueAccessor to locate the item.
If you crack open the Sitecore Search log after you run a Batch Pipeline you will find the queries used to locate these items by the Matching Field Value Accessor.
The Object Resolution section of the configuration has a couple of different actions that can be triggered depending on the results of the search:
- Finish Pipeline If Object not Resolved – If the object is not resolved the pipeline will finish immediately. In other words, it stops processing the current item and moves on to the next item in the iteration.
- Do not Create If Object Not Resolved – If an object is not resolved a new object is not created.
Both of these would prevent a duplicate item being created. However, they also prevent new items from being created. This would be fine if you were only interested in updating existing items. But I wanted to create new items if they actually did not exist in Sitecore.
What was the Problem?
As you can see from the above, Sitecore uses the index to search for the item using the unique identifier you provide when configuring the Resolve Sitecore Item Processor. But what happens if there is a glitch and the index is unavailable for a short period? When the search returns nothing, the item is treated as unresolved and a new one is created. Your DEF batch pipeline could be running during that same outage. DEF batch pipelines can process a single item in a few milliseconds, so you could easily end up with a lot of duplicates very quickly, even if the index is only unavailable briefly.
Now, we never plan for the index to be unavailable, as it plays such an integral part in a Sitecore solution. When it is offline you usually have bigger issues to deal with. But we should never bank on something not occurring, and should always plan for the worst case.
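To make the failure mode concrete, here is a toy model of a sync run. It is only a sketch and does not use any DEF or Sitecore API: the `IndexOutageDemo` class, the `content` list (standing in for the content tree) and the `indexAvailable` flag are all hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOutageDemo
{
    // Hypothetical toy model, not DEF code. "content" stands in for the
    // content tree, "indexAvailable" for whether the index answers queries.
    // Returns the number of items created during the run.
    public static int RunBatch(IEnumerable<string> identifiers, List<string> content, bool indexAvailable)
    {
        int created = 0;
        foreach (string identifier in identifiers)
        {
            // With the index offline the lookup finds nothing,
            // even when the item already exists.
            bool resolved = indexAvailable && content.Contains(identifier);
            if (!resolved)
            {
                // The item is treated as new, so another copy is created.
                content.Add(identifier);
                created++;
            }
        }
        return created;
    }
}
```

Running the same two records through `RunBatch` first with `indexAvailable: true` and then with `indexAvailable: false` leaves four entries in `content`: every record from the second run is created again.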
Could this be solved by Sitecore Configuration?
Sitecore introduced a new setting in 8.1 update 2 to prevent duplicates from being created where an item with the same name already exists at the same level in the content tree in Sitecore.
<setting name="AllowDuplicateItemNamesOnSameLevel" value="true" />
If you change the default setting from true to false you will not be able to add a duplicate item name at the same level of the content tree in Content Editor. As the unique identifier for my item is also the item’s name, I had hoped that setting this to false would also prevent DEF from creating a duplicate item in the same bucket folder. Unfortunately, while this works effectively in the Content Editor, it appeared to be ignored by the DEF pipeline step.
The Solution: A Custom Resolve Sitecore Pipeline Step
Time for some custom code. I needed the resolve Sitecore pipeline step to be able to perform an additional check for the item in the Sitecore master database if the item is not found by the index search. So I created a custom Resolve Sitecore Item Processor and Converter based on the OOTB pipeline with the additional check. I had to ensure that it was as efficient as possible so my sync time was not impacted too much. What would be the most efficient way to perform this search? Well, what do I know about the item that will help me locate it in Sitecore?
- I know the parent location the item will be created in, as this is set as the root path on the Resolve Sitecore Item pipeline step, and it is a bucket.
- I know my custom bucket rule is going to place the item within a specific bucket structure based on elements of the unique identifier.
- I know the name of the item.
With this information, I can construct the full path for the item and I can run a simple query to return a single item by its path: [sitecore root path][bucket root][custom bucket folders][item name]
/sitecore/content/Products/123/ABC/123-ABC-45678-DE9
I will need to add some additional configuration to the Pipeline Step Converter:
- Check for Duplicate – a switch so I can easily enable or disable the check.
- Bucket Folder Depth – used to determine the bucket structure based on the segments in the unique identifier.
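As a rough illustration of how the bucket folder depth turns an identifier into a path, here is a minimal, self-contained sketch. `BucketPathBuilder` and its parameter names are hypothetical and not part of DEF. It assumes, as in my identifiers, that the parts are separated by hyphens and that the first parts map directly to bucket folders.

```csharp
using System;

public static class BucketPathBuilder
{
    // Hypothetical helper, not part of DEF: builds the expected item path from
    // the bucket root, the unique identifier and the configured folder depth.
    public static string Build(string bucketRootPath, string identifier, int bucketFolderDepth)
    {
        if (string.IsNullOrWhiteSpace(bucketRootPath))
            throw new ArgumentException("Bucket root path is required.", nameof(bucketRootPath));
        if (string.IsNullOrWhiteSpace(identifier))
            throw new ArgumentException("Identifier is required.", nameof(identifier));

        // Assumption: identifier parts are separated by hyphens.
        string[] segments = identifier.Split('-');
        if (bucketFolderDepth < 0 || bucketFolderDepth > segments.Length)
            throw new ArgumentOutOfRangeException(nameof(bucketFolderDepth),
                "Depth must be between 0 and the number of identifier segments.");

        // Normalise the root so it never ends with a slash.
        string root = bucketRootPath.TrimEnd('/');
        if (bucketFolderDepth == 0)
            return root + "/" + identifier;

        // The first bucketFolderDepth segments become the bucket folders.
        string folders = string.Join("/", segments, 0, bucketFolderDepth);
        return root + "/" + folders + "/" + identifier;
    }
}
```

For example, `BucketPathBuilder.Build("/sitecore/content/Products", "123-ABC-45678-DE9", 2)` returns `/sitecore/content/Products/123/ABC/123-ABC-45678-DE9`, which matches the path shown earlier.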
My pipeline process will function as follows:
- Check if the item exists in the index.
- If the item does not exist, call the new method GetBucketItem to search the Sitecore master database for the item by its custom bucket folder path.
- If the item still does not exist, return null in the context, and a new Sitecore item is created in the next pipeline step.
- If the item is found in the index or in Sitecore, return the item in the context, and the next pipeline step updates the existing item.
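The steps above can be sketched independently of DEF. This is only an outline of the control flow, not the actual processor: `ResolveFlow`, `findInIndex` and `findInDatabase` are hypothetical names, and the two lookups are passed in as delegates so the example stays self-contained.

```csharp
using System;

public static class ResolveFlow
{
    // Hypothetical outline of the flow, not the DEF processor itself.
    // TItem stands in for whatever type the lookups return.
    public static TItem Resolve<TItem>(
        string identifier,
        Func<string, TItem> findInIndex,
        Func<string, TItem> findInDatabase,
        bool checkForDuplicate) where TItem : class
    {
        // Step 1: try the index first, as the OOTB step does.
        TItem item = findInIndex(identifier);
        if (item != null)
            return item;

        // The extra check can be switched off through configuration.
        if (!checkForDuplicate)
            return null;

        // Step 2: fall back to a direct lookup by path in the database.
        // A null result here means the item really is new.
        return findInDatabase(identifier);
    }
}
```

A null result tells the next pipeline step to create the item, and any other result tells it to update the existing one.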
The new GetBucketItem method looks like this:

    protected ItemModel GetBucketItem(object identifierValue, IItemModelRepository repository,
        Plugins.ResolveSitecoreItemSettings resolveItemSettings, PipelineContext pipelineContext)
    {
        // Split the identifier into its parts, e.g. "123-ABC-45678-DE9" -> "123", "ABC", "45678", "DE9".
        string[] identifierSegments = identifierValue.ToString().Split(new[] { '-' });
        // The first BucketFolderDepth parts form the bucket folder structure, e.g. "123/ABC".
        var itemPath = string.Join("/", identifierSegments, 0, resolveItemSettings.BucketFolderDepth);
        // This concatenation assumes BucketRootPath ends with "/".
        var bucketItemPath = resolveItemSettings.BucketRootPath + itemPath + "/" + identifierValue;
        // Look the item up directly by its full path.
        ItemModel itemModel = repository.Get(bucketItemPath);
        if (itemModel != null)
        {
            // Log a warning: the index missed an item that does exist.
            Log(
                pipelineContext.Logger.Warn,
                pipelineContext,
                $"Item found in Bucket: {resolveItemSettings.BucketRootPath}",
                $"identifier: {identifierValue}",
                $"item id: {itemModel["ItemID"]}");
            return itemModel;
        }
        return null;
    }
That’s it: a simple additional check that performs extremely well, and I can guarantee no more duplicate items will be created by my batch process.