Clean Orphaned Blob Records
Originally published on michaellwest.blogspot.com
Today we explore some of the challenges with ignoring maintenance of your Sitecore installation.
Recently I found over 60% of the Sitecore master and web databases were consumed by orphaned blob records. This behavior seems to occur when you detach data from an existing media item or delete the media item entirely. The Blobs table holds onto these orphaned records until a separate cleanup process runs. There have been a few articles on this before, so I’ll keep this focused on what I discovered along the way.
The part you care about
Database details before cleanup — we ran this against Sitecore XM 10.2 and 10.4:
- The MDF file was approaching 60GB with over 90% used space
- Total blob records in the table: 127,300
- Total unused blob records: 104,466
- 59,353 distinct BlobId values
Consult with Sitecore support before running any cleanup scripts against a production database.
Database details after cleanup:
- The MDF file now shows used space at 8% (4.7GB)
- Total blob records in the table: 22,867
Total run time: 7 hours, 2 minutes, 23 seconds
What we used
Full scripts are in this gist: gist.github.com/michaellwest/9a9a7b78ac4f1a3463fe3d59ad5455f2
CountBlobsTotal.sql — faster than SELECT COUNT(1) on a large table because it reads partition statistics instead of scanning rows:
/* Count all the rows in the Blobs table.
Considerably faster than SELECT COUNT(1) FROM [Blobs] */
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'Blobs' AND (index_id < 2)
CountUnusedBlobs.sql — identifies orphaned records by cross-referencing the Blobs table against every field table that can reference a blob. Checks both braced and unbraced GUID formats:
/* Returns the count of unused blobs both in total and a unique list. */
WITH [ExistingBlobs] ([BlobId]) AS (
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [SharedFields] ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [SharedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [SharedFields] ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [SharedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [VersionedFields] ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [VersionedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [VersionedFields] ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [VersionedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [UnversionedFields] ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [UnversionedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [UnversionedFields] ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [UnversionedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [ArchivedFields] ON '{' + CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) + '}' = [ArchivedFields].[Value]
UNION
SELECT [Blobs].[BlobId] FROM [Blobs]
JOIN [ArchivedFields] ON CONVERT(NVARCHAR(MAX), [Blobs].[BlobId]) = [ArchivedFields].[Value]
)
SELECT
COUNT([Blobs].[BlobId]) AS [UnusedCount],
COUNT(DISTINCT [Blobs].[BlobId]) AS [UnusedDistinctCount]
FROM [Blobs]
LEFT JOIN [ExistingBlobs] ON [Blobs].[BlobId] = [ExistingBlobs].[BlobId]
WHERE [ExistingBlobs].[BlobId] IS NULL
CleanUnusedBlobs.sql — the actual deletion script. It uses the same CTE to identify orphans, loads them into a temp table, then deletes in batches of 1,000 with a loop cap of 10,000 iterations. Each batch is its own transaction. Full script in the gist above — review it before you run it.
Takeaway
If your Sitecore database is ballooning and you can’t explain why, check the Blobs table. The numbers might surprise you. Regular maintenance tasks like the Clean Up agent in the Control Panel exist for a reason — make sure they’re actually running on a schedule.