Best Practices: ediscovery Search Improve Speed and Accuracy of Reviews & Productions with the Latest Tools February 27, 2014 Karsten Weber Principal, Lexbe LC
ediscovery Webinar Series Info & Future - Takes Place Monthly - Cover a Variety of Relevant ediscovery Topics - Next Month: Legal Timelines and Early Case Assessment - Presentations Available for Download by Registrants.
ediscovery Webinar Series Questions & Technical Issues If you have any questions or technical issues, please e-mail them to: webinars@lexbe.com Questions will be forwarded to Karsten and answered during the webinar or via e-mail if we run out of time.
ediscovery Webinar Series Karsten Weber bio Current - Principal of Lexbe LC - Principal Architect of Lexbe ediscovery Suites and Lexbe ediscovery Services Prior Experience - Consulting Expert, Lumin Expert Group - Director of Software, nline Corporation - Software Engineering Manager, KLA-Tencor Education - MBA, University of Texas - M.S. Engineering, Danish Technical University Contact Karsten Weber 512-686-3469 karsten@lexbe.com
Use of Keyword Search In Discovery Early Stage Culling - Reduce the amount of ESI to be reviewed by using keywords to cull document collections. Keyword-Based Responsive & Privilege Review - Construct search queries to return documents that are likely to be responsive, privileged, or confidential. Search by name and email of counsel, and by privilege, work product, confidential, and related keywords. ID Documents for Depo Prep - Find and assign key documents related to specific case participants to prepare for depositions. Search by email addresses used, names and nicknames used, and important issues associated with the deponent. ID of Key Docs for Trial - Find and mark key case documents. Code documents that will be needed for trial.
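As a minimal sketch of early stage culling, the Python below keeps only documents that contain at least one keyword. The document IDs, sample text, and the `cull` helper are hypothetical and only illustrate the idea; a real review platform would run this against its search index rather than raw strings.

```python
def cull(documents, keywords):
    """Return IDs of documents containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(k in text.lower() for k in lowered)
    ]


# Hypothetical three-document collection.
documents = {
    "DOC-001": "Please review the Chewco pro forma before Friday.",
    "DOC-002": "Lunch menu for the holiday party.",
    "DOC-003": "Privileged and confidential: advice of counsel attached.",
}

kept = cull(documents, ["chewco", "privileged"])
print(kept)  # ['DOC-001', 'DOC-003']
```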
Pros of Keyword Searching Fast - Keyword search is very fast compared with other document search methodologies. Inexpensive - Good results can be obtained at little cost compared with manual review or other computer assisted methodologies. Quality - Search can deliver high quality results, particularly if keyword terms are carefully developed and tested. Avoids Manual Review Errors/Inconsistencies - Search results are computer generated, and so avoid known human review errors that can result from fatigue, inadequate training, lack of focus, etc.
Cons of Keyword Searching Search Can be Over or Under-Inclusive - Search terms can bring back too many junk results or miss good results. These are known as false positives and false negatives. Difficulty of Creating Good Search Terms - Constructing good search terms takes design time, testing, iterations, and analysis. Non-Searchable Text - Search results can only be as good as the underlying searchable text. ESI collections and review tools can miss text that a human reviewer might catch for a variety of reasons. Some File Types Can't Be Indexed - There is little consistency in what files can be indexed across litigation databases.
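False positives and false negatives can be quantified with the standard precision and recall measures. The sketch below assumes a small hand-reviewed set where the truly relevant documents are known; the document IDs are hypothetical.

```python
def precision_recall(returned, relevant):
    """Compare search hits against the documents known to be relevant."""
    returned, relevant = set(returned), set(relevant)
    true_pos = returned & relevant
    false_pos = returned - relevant   # junk results the search brought back
    false_neg = relevant - returned   # good results the search missed
    precision = len(true_pos) / len(returned) if returned else 0.0
    recall = len(true_pos) / len(relevant) if relevant else 0.0
    return precision, recall, false_pos, false_neg


# Hypothetical example: the search returned A-D, but only B, C, and E matter.
precision, recall, false_pos, false_neg = precision_recall(
    returned={"A", "B", "C", "D"},
    relevant={"B", "C", "E"},
)
print(precision)          # 0.5
print(sorted(false_pos))  # ['A', 'D']
print(sorted(false_neg))  # ['E']
```

Low precision points to an over-inclusive search; low recall points to an under-inclusive one.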
Construct Quality Searches Start with Request for Production - Translate the demands of the RFP into a keyword search strategy. Interview Custodians - Ask key case participants / data custodians about their ESI. Use their insights and their terminology to find obscure key documents. Include Jargon - Seek out terms specific to the industry, the company, or a company sub-culture that you may not be familiar with. Include Misspellings - Include misspelled versions of keywords (or use fuzzy search settings or Boolean limiters) in your search string to account for emails, etc. with typos.
Use Search Expanders Search Expanders Enable Easy Expansion to Reduce False Negatives. Concept - Thesaurus lookup and synonym search. Conceptually expands the search query. Stemming - Expands the query to include derivative terms associated with the search keywords. Fuzzy - Allows the insertion, deletion, or substitution of a character in the search query to account for search errors, spelling errors within the document, and potential OCR errors. Phonetic - Returns results that sound similar to the search query.
Use Search Expanders Concept Search Example Trade = Swap = quid pro quo
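Concept expansion can be pictured as a thesaurus lookup applied to each query term. The sketch below uses a tiny hand-built thesaurus (an assumption for illustration; commercial engines ship far larger concept dictionaries) to reproduce the Trade example.

```python
# Hypothetical thesaurus; a real engine would use a much larger concept dictionary.
THESAURUS = {
    "trade": {"swap", "quid pro quo"},
}


def expand_concepts(terms, thesaurus=THESAURUS):
    """Return the original terms plus any synonyms found in the thesaurus."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= thesaurus.get(term, set())
    return expanded


print(sorted(expand_concepts(["Trade"])))  # ['quid pro quo', 'swap', 'trade']
```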
Use Search Expanders Stemming Search Example Trade = Trading = Trades
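To illustrate why Trade, Trading, and Trades match one another, the sketch below reduces each word to a shared stem by stripping common suffixes. This is a deliberately naive suffix stripper written for this example, not the algorithm any particular search engine uses; production stemmers apply many more rules.

```python
# Suffixes are checked in order, so longer endings are tried before shorter ones.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ("Trade", "Trading", "Trades"):
    print(word, "->", naive_stem(word))  # each prints the stem 'trad'
```

A stemming search then matches any document word whose stem equals the stem of the query term.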
Use Search Expanders Fuzzy Search Example - Misspelling Fastow = Fastaw = Fasto
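Fuzzy matching is commonly described in terms of edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below computes the Levenshtein distance and treats words within one edit as a match; the `max_edits` threshold is an assumed setting, not a documented default of any product.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(term, word, max_edits=1):
    """True when word is within max_edits of term, ignoring case."""
    return edit_distance(term.lower(), word.lower()) <= max_edits


print(fuzzy_match("Fastow", "Fastaw"))  # True (one substitution)
print(fuzzy_match("Fastow", "Fasto"))   # True (one deletion)
print(fuzzy_match("Fastow", "Faster"))  # False (two substitutions)
```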
Use Search Expanders Boolean Search Basic Boolean Operators: - AND: returns results including both terms - OR: returns results including at least one of a list of terms - NOT: excludes terms you don't want - ( ): can be used to separate OR statements from the rest of the Boolean string - PRE/n: the first search term precedes the second term by no more than n words - Wildcard Characters: * replaces a letter in your search term; ! allows for a stemming search within a Boolean query
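The operators above can be sketched with ordinary Python boolean logic over a tokenized document. The helper names (`tokens`, `has`, `pre_n`) and the sample sentence are hypothetical, and the exact word-counting rule for PRE/n varies between search engines; here a distance of n means the second term sits at most n word positions after the first.

```python
import re


def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has(text, term):
    """True when the term appears as a whole word in the text."""
    return term.lower() in tokens(text)


def pre_n(text, first, second, n):
    """True when `first` appears before `second`, at most n word positions apart."""
    words = tokens(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(0 < s - f <= n for f in first_pos for s in second_pos)


text = "Fastow approved the Chewco transaction without board review"

# (fastow AND chewco) NOT raptor
print(has(text, "fastow") and has(text, "chewco") and not has(text, "raptor"))  # True
# fastow PRE/3 chewco
print(pre_n(text, "fastow", "chewco", 3))  # True
# chewco PRE/3 fastow fails because the order is reversed
print(pre_n(text, "chewco", "fastow", 3))  # False
```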
Use Search Limiters Search Limiters Reduce False Positives (Noise). Filter Out Unneeded File Types - Some file types are unlikely to lead to useful information and can be excluded. Use Boolean Modifiers to Limit Overly Expansive Searches - Boolean modifiers can reduce the number of documents returned from a query while increasing the relevance of those files. Exclude certain words or combinations, and specify word order.
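Filtering out unneeded file types can be as simple as checking file extensions before a document enters the search set. The excluded extensions and file names below are assumptions chosen for illustration; which types are safe to exclude depends on the case.

```python
from pathlib import PurePath

# Hypothetical exclusion list: program and temporary files.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".tmp"}


def filter_file_types(filenames, excluded=EXCLUDED_EXTENSIONS):
    """Drop files whose extension (case-insensitive) is on the exclusion list."""
    return [
        name for name in filenames
        if PurePath(name).suffix.lower() not in excluded
    ]


files = ["memo.docx", "setup.EXE", "budget.xlsx", "cache.tmp"]
print(filter_file_types(files))  # ['memo.docx', 'budget.xlsx']
```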
Use Search Limiters Boolean Search Example Lay! w/25 Chewco
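One way to read this query is: a word beginning with "Lay" within 25 words of "Chewco", in either order. The sketch below makes two simplifying assumptions: the ! operator is approximated as a prefix match (so it would also match unrelated words such as "layer"), and distance is counted in word positions. Actual engines may define both differently.

```python
import re


def within(text, prefix, term, n):
    """True when a word starting with `prefix` is within n words of `term`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    prefix_pos = [i for i, w in enumerate(words) if w.startswith(prefix.lower())]
    term_pos = [i for i, w in enumerate(words) if w == term.lower()]
    return any(abs(p - t) <= n for p in prefix_pos for t in term_pos)


near = "Chewco was discussed at length before Lay's office signed off"
far = "Lay " + "filler " * 30 + "Chewco"  # the two terms are 31 words apart

print(within(near, "Lay", "Chewco", 25))  # True
print(within(far, "Lay", "Chewco", 25))   # False
```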
Test Keyword Searching Results Look at Results Returned - Searching without review and testing may result in low quality results. Sample & Look for Ways to Limit Search - Create new queries that reduce false positives. Look for New Keywords - Viewing search results may prompt the discovery of additional keywords that could be used to expand or reduce search queries. Fuzzy and Concept Search - New keywords can be found by searching and returning synonyms and near-identical words. Keyword searching becomes an iterative process.
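Testing a search usually means pulling a random sample of hits, having a reviewer judge each one, and estimating how many hits are actually relevant. The sketch below shows that loop in miniature; the function names, sample size, seed, and reviewer judgments are all hypothetical.

```python
import random


def sample_hits(hit_ids, sample_size, seed=0):
    """Draw a reproducible random sample of search hits for manual review."""
    rng = random.Random(seed)
    k = min(sample_size, len(hit_ids))
    return rng.sample(list(hit_ids), k)


def estimated_precision(reviewed):
    """reviewed maps document ID -> True if the reviewer judged it relevant."""
    return sum(reviewed.values()) / len(reviewed) if reviewed else 0.0


hits = [f"DOC-{i:03d}" for i in range(1, 101)]  # 100 hypothetical hits
sample = sample_hits(hits, sample_size=5)

# Hypothetical reviewer judgments for four sampled documents.
judgments = {"DOC-a": True, "DOC-b": False, "DOC-c": True, "DOC-d": True}
print(estimated_precision(judgments))  # 0.75
```

If the estimate is low, the query is tightened and the sample is drawn again, which is the iterative process described above.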
Common Indexing Methods There Are Traditionally Two Types of Search Indices: Imaged and OCRed - The search text is coming from the files after they have been converted to TIFF / PDF. Extracted Text - The search text is coming from text extracted from the original file. Both approaches have significant limitations.
Search Index Based on OCR of Imaged Files Description - Native files (email, attachments, spreadsheets, etc.) are converted to a paginated image file and then OCR is applied to make the text searchable. (ex. TIFF production with no extracted text). How? - Conversion software uses a print-driver approach to virtually image what would have been physically printed. Data Not Indexed - Headers/footers/notes, comments and revisions, highlighted text, hidden sheets or text, print selections, applied filters,
Search Index Based on OCR of Imaged Files How Doc Appears Natively: OCR Based Index Will Include: Chewco 2000 Pro Forma Sheet Body Text
Search Index Based on Native Extraction Description - Available text from Native files (email, attachments, spreadsheets, etc.) is extracted and indexed by the search engine using text parsing. (ex. pure native review) How? - Only available text is used. There is no OCR applied. Data Not Indexed - Non-text files (ex. scanned documents) and embedded text, objects, or visuals will not be indexed. Different native extraction methods can also vary in their ability to recognize certain types of text.
Search Index Based on Native Extraction How Doc Appears Natively: Native Extraction Index Will Include: Page 1/12 Chewco 2000 Pro Forma Balance Statement Sheet [S1: CRITICAL ENRON EVIDENCE] Page 1/12
Dual Index The Lexbe search engine indexes both text extracted from Native files (email, attachments, spreadsheets, etc.) and a paginated file converted from Native files into PDF or TIFF and OCRed. Most comprehensive approach minimizes potential for lost and unsearchable data.
Benefits of Dual Index Approach
Index Method | Captures Embedded Text | Captures Text Excluded From Print | Captures Hidden Text
Imaged/OCR | Yes | No | No
Native Extraction | No | Yes | Yes
Lexbe Dual Index | Yes | Yes | Yes
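The general idea of a dual index can be sketched as two inverted indexes, one built from OCR text and one from natively extracted text, with a search returning the union of both. This is an illustrative toy with hypothetical documents and function names, not a description of how the Lexbe engine is implemented.

```python
import re
from collections import defaultdict


def build_index(texts):
    """Build an inverted index mapping each word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def dual_search(term, ocr_index, native_index):
    """Return documents that match the term in either index."""
    term = term.lower()
    return ocr_index.get(term, set()) | native_index.get(term, set())


# Hypothetical texts: DOC-2 is a scan with no native text; DOC-1 has hidden text
# that only native extraction sees.
ocr_text = {"DOC-1": "Chewco 2000 Pro Forma", "DOC-2": "scanned invoice total"}
native_text = {"DOC-1": "Chewco 2000 Pro Forma hidden note critical", "DOC-2": ""}

ocr_index = build_index(ocr_text)
native_index = build_index(native_text)

print(dual_search("invoice", ocr_index, native_index))   # {'DOC-2'}
print(dual_search("critical", ocr_index, native_index))  # {'DOC-1'}
```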
Thank You for Attending About Lexbe and Contact Information Phone (Toll Free): (800) 401-7809 Webinar Questions: webinars@lexbe.com Next Month's Webinar: Legal Timelines and Early Case Assessment Lexbe is an ediscovery software and services provider based in Austin, TX.