White Paper

DELETE DUPLICATE EMAILS IN THE EMC EMAILXTENDER ARCHIVE SYSTEM USING THE MSGIDCRACKER UTILITY

Abstract

This white paper describes the process of using the customized EmailXtender MsgIDCracker utility to delete duplicate emails from the EMC EmailXtender archiving system.

July 2011
Copyright 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All trademarks used herein are the property of their respective owners.

Part Number H8837
Table of Contents

Executive summary
Audience
Pre-requisites for using the MsgIDCracker utility
Risk Factors
Using the MsgIDCracker utility
Conclusion
References
Executive summary

The information provided in this white paper will help you use the MsgIDCracker utility to delete duplicate email messages in the EMC EmailXtender archiving system. This paper also describes the consequences of deleting duplicates in a production environment.

If you have enabled the Cached Exchange Mode feature in Microsoft Office Outlook and you archive email messages from the Sent Items folder using the EmailXtract tool, a duplicate copy of each message is archived. Use the customized MsgIDCracker utility to clean up these duplicates after performing the archive operation.

For additional information about the EmailXtender product and its utilities, see the relevant product documentation.

Audience

This white paper is intended for customers who want to delete duplicate email messages in their email archiving system. It also describes the risk factors involved in deleting duplicates in a production environment.

Pre-requisites for using the MsgIDCracker utility

Perform the following steps before you use the MsgIDCracker utility:

1. Back up the container files associated with the duplicate messages. If inconsistencies occur, you can use this backup to roll the files back to their original state.

2. Identify the date range of the duplicate messages. Use the following SQL query to retrieve all duplicate messages, that is, messages whose TrackingID occurs more than once:

   SELECT MD5HashKey, TrackingID, TimeStamp, MsgDate
   WHERE (TrackingId) IN
       (SELECT TrackingID GROUP BY TrackingID HAVING COUNT(TrackingId) > 1)
   GROUP BY MD5HashKey, TrackingID, TimeStamp, MsgDate
   ORDER BY TimeStamp

3. Save the result of the query you executed in Step 2 to a .csv file. Use Microsoft Office Excel to sort the messages by MsgDate. This gives you the beginning and end dates (the date range) of the duplicate messages.

4. Use the EmailXtract tool to restore all shortcuts associated with the duplicate messages (Journal and Sent Items) in the date range identified in Step 3.
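The duplicate test in Step 2 and the date-range lookup in Step 3 can be illustrated with a small, self-contained Python sketch. The sketch runs the same query against an in-memory SQLite database rather than the EmailXtender SQL Server database. The query in Step 2 does not name its source table, so the table name Messages and the sample rows below are hypothetical. MsgDate is stored as ISO 8601 text so that min() and max() return the earliest and latest dates, which stands in for the Excel sort in Step 3.

```python
import sqlite3

# Step 2 query; the source table name "Messages" is a hypothetical placeholder.
DUPLICATE_QUERY = """
    SELECT MD5HashKey, TrackingID, TimeStamp, MsgDate
    FROM Messages
    WHERE TrackingID IN (
        SELECT TrackingID
        FROM Messages
        GROUP BY TrackingID
        HAVING COUNT(TrackingID) > 1)
    GROUP BY MD5HashKey, TrackingID, TimeStamp, MsgDate
    ORDER BY TimeStamp
"""


def build_sample_db():
    """Create an in-memory database filled with hypothetical archive rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages ("
        "MD5HashKey INTEGER, TrackingID INTEGER, "
        "TimeStamp INTEGER, MsgDate TEXT)"
    )
    sample_rows = [
        # (MD5HashKey, TrackingID, TimeStamp, MsgDate): all values are made up.
        (1001, 1, 100, "2011-03-01 09:00:00"),  # first archived copy
        (1002, 1, 250, "2011-03-01 09:00:00"),  # second copy, same TrackingID
        (2001, 2, 120, "2011-03-05 14:30:00"),  # unique message, not returned
        (3001, 3, 130, "2011-04-10 08:15:00"),
        (3002, 3, 300, "2011-04-10 08:15:00"),
    ]
    conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?)", sample_rows)
    return conn


def find_duplicates(conn):
    """Return every row whose TrackingID occurs more than once (Step 2)."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


def duplicate_date_range(rows):
    """Return the earliest and latest MsgDate among the duplicates (Step 3).

    ISO 8601 date strings sort chronologically, so min() and max() suffice.
    """
    dates = [row[3] for row in rows]
    return min(dates), max(dates)


conn = build_sample_db()
duplicates = find_duplicates(conn)
start, end = duplicate_date_range(duplicates)
print(f"{len(duplicates)} duplicate rows, MsgDate range {start} to {end}")
```

With the sample rows, the query returns the four rows that belong to TrackingIDs 1 and 3 and skips the unique message. Against a real archive, you could read the exported .csv file with Python's csv module instead of building a table. The date-range logic stays the same.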
Risk Factors

Running the MsgIDCracker utility has the following risks associated with it:

Messages archived from the Sent Items folder have all users, including Bcc users, set as owners of the messages. If you delete duplicate messages archived from the Sent Items folder, you may lose this Bcc information. When a user who was blind carbon copied searches for their messages, the search returns no results.

If a user performs the Extract operation on the Sent Items folder after the messages from that folder have been deleted from the archive, the duplicate messages are created again.

Using the MsgIDCracker utility

Perform the following steps to use the MsgIDCracker utility:

1. Execute the following query to obtain a list of duplicate messages. The script records one copy of each duplicated message, the copy with the highest TimeStamp value, in the DuplicateRecords table:

   CREATE TABLE DuplicateRecords (TrackId bigint NOT NULL PRIMARY KEY, MD5Hash bigint, TimeStm int, MsgDt datetime);

   DECLARE mycur CURSOR GLOBAL FOR
       SELECT MD5HashKey, TrackingID, TimeStamp, MsgDate
       WHERE (TrackingId) IN
           (SELECT TrackingID GROUP BY TrackingID HAVING COUNT(TrackingId) > 1)
       GROUP BY MD5HashKey, TrackingID, TimeStamp, MsgDate
       -- Sorting by TrackingID first keeps all copies of a message adjacent, so the
       -- check below records only the copy with the highest TimeStamp and never
       -- attempts a second insert of the same TrackId (the primary key).
       ORDER BY TrackingID, TimeStamp DESC

   DECLARE @MD5Key bigint, @Trackid bigint, @Tstamp int, @PreTrackid bigint, @MsgDate datetime
   SET @PreTrackid = -1

   OPEN mycur
   FETCH NEXT FROM mycur INTO @MD5Key, @Trackid, @Tstamp, @MsgDate
   WHILE (@@FETCH_STATUS = 0)
   BEGIN
       -- Record only the first row fetched for each TrackingID.
       IF (@PreTrackid != @Trackid)
       BEGIN
           INSERT INTO DuplicateRecords (TrackId, MD5Hash, TimeStm, MsgDt)
               VALUES (@Trackid, @MD5Key, @Tstamp, @MsgDate);
           SET @PreTrackid = @Trackid
       END
       FETCH NEXT FROM mycur INTO @MD5Key, @Trackid, @Tstamp, @MsgDate
   END

   SELECT MD5Hash, Timestm FROM DuplicateRecords
   CLOSE mycur
   DEALLOCATE mycur

2. Save the result in the duplicates.txt file.

3. Create the C:\CleanUpActivity folder.

4. Run the delivered utility at the command prompt as follows:

   CmdPrompt:\> <Utility path\MsgIDCracker.exe> /c <File Path\duplicates.txt>

   This command creates 0 KB files with the .delreq extension in the C:\CleanUpActivity folder.

5. Back up this folder so that you have a record of the deleted messages.

6. Move all .delreq files from the C:\CleanUpActivity folder to the EX Installed Dir\Archive Deletion folder. All duplicate messages are then deleted.

Conclusion

This paper describes the MsgIDCracker solution, which customers can use to clean up duplicate messages in their email archiving production environment. It also explains the consequences of deleting duplicate email messages.

References

The Powerlink website (http://powerlink.emc.com) contains the downloadable packages for EmailXtender Archiving Solution product versions. It also provides the release notes and other documentation for each product version.

To locate product documentation, navigate to Support > Technical Documentation and Advisories > Software ~ D ~ Documentation, then select the product name and version number.

Note: Most of the Content Management products are listed under Software D > Documentum?, where ? = a letter, or letters, in the alphabet.

Product documentation that is available online from the application (such as online help) does not appear as a separate item. It is automatically downloaded and installed with the software.