dtsearch Regular Expressions



Similar documents
dtsearch Frequently Asked Questions Version 3

Regular Expression Searching

Regular Expressions (in Python)

Regular Expression Syntax

Lecture 18 Regular Expressions

Regular Expressions. Abstract

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Creating Basic Excel Formulas

Searching Guide Version 8.0 December 11, 2013

Eventia Log Parsing Editor 1.0 Administration Guide

Scanner. tokens scanner parser IR. source code. errors

University Convocation. IT 3203 Introduction to Web Development. Pattern Matching. Why Match Patterns? The Search Method. The Replace Method

DigitalPersona. Password Manager Pro. Version 5.0. Administrator Guide

Regular Expressions. In This Appendix

Lecture 4. Regular Expressions grep and sed intro

Kiwi Log Viewer. A Freeware Log Viewer for Windows. by SolarWinds, Inc.

Search Requests (Overview)

Time Clock Import Setup & Use

Using Regular Expressions in Oracle

MS Access: Advanced Tables and Queries. Lesson Notes Author: Pamela Schmidt

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

URL encoding uses hex code prefixed by %. Quoted Printable encoding uses hex code prefixed by =.

SYMETRIX SOLUTIONS: TECH TIP August 2015

Counting in base 10, 2 and 16

Regular Expressions. General Concepts About Regular Expressions

What's New in ADP Reporting?

Virtual Integrated Design Getting started with RS232 Hex Com Tool v6.0

NiCE Log File Management Pack. for. System Center Operations Manager Quick Start Guide

X1 Professional Client

Using SPSS, Chapter 2: Descriptive Statistics

Exercise 4 Learning Python language fundamentals

Chapter 2 Introduction to SPSS

C H A P T E R Regular Expressions regular expression

Regular Expressions in Create Lists Revised April 2015 to account for both Millennium and Sierra

The Best Kept Secrets to Using Keyword Search Technologies

Advanced Query for Query Developers

How To Code In Vba Vb.Io (For Ahem)

novdocx (en) 11 December 2007 VIDistribution Lists, Groups, and Organizational Roles

Tutorial 5 Creating Advanced Queries and Enhancing Table Design

T201 - SEARCHING FOR MONEY

BACKSCATTER PROTECTION AGENT Version 1.1 documentation

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

Verilog - Representation of Number Literals

Users Guide. V1.0 Page 1

Web Programming Step by Step

CS106A, Stanford Handout #38. Strings and Chars

FILESURF 7.5 SR3/WORKSITE INTEGRATION INSTALLATION MANUAL 1 PRELIMINARIES...3 STEP 1 - PLAN THE FIELD MAPPING...3 STEP 2 - WORKSITE CONFIGURATION...

Workflow Solutions for Very Large Workspaces

Tutorial Microsoft Office Excel 2003

Maple Introductory Programming Guide

Retrieving Data Using the SQL SELECT Statement. Copyright 2006, Oracle. All rights reserved.

Version 1.5 Satlantic Inc.

Excel: Introduction to Formulas

Unit 10: Microsoft Access Queries

Data Tool Platform SQL Development Tools

Python Loops and String Manipulation

CSE 341 Lecture 28. Regular expressions. slides created by Marty Stepp

Arctic Network SQL Server Data Analysis Using Microsoft Access

2.1 Data Collection Techniques

ASCII CONTROL COMMANDS FOR MATRIX SWITCHING SYSTEMS AN_0001

Coupling Microsoft Excel with NI Requirements Gateway

SOME EXCEL FORMULAS AND FUNCTIONS

Wire Transfer. Business Link. Creating a Wire Transfer Template. Wire Transfer Types. Wire Transfer Templates and Transactions

1.1 AppleScript Basics

Joins Joins dictate how two tables or queries relate to each other. Click on the join line with the right mouse button to access the Join Properties.

set in Options). Returns the cursor to its position prior to the Correct command.

Directions for the Well Allocation Deck Upload spreadsheet

COMSC 100 Programming Exercises, For SP15

Excel 2003 Tutorial I

Lab 9 Access PreLab Copy the prelab folder, Lab09 PreLab9_Access_intro

Kaseya 2. User Guide. Version 7.0. English

Access Queries (Office 2003)

Version August 2016

Introduction to OCR in Revu

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Includes Enhanced Search Tools! Reference Guide. Go to to discover this essential resource today

CNAM is the wireless Caller Name that displays on a caller id capable wire line device.

Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July The JSON Data Interchange Format. Reference number ECMA-123:2009

IV-1Working with Commands

Using This Reference Manual Chapter 1 to Issue ACL Commands

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

ACE STUDY GUIDE. 3. Which Imager pane shows information specific to file systems such as HFS+, NTFS, and Ext2? - Properties Pane

FrontPage 2003: Forms

ASCII data query and parser plugin module PRINTED MANUAL

Variables, Constants, and Data Types

Qlik REST Connector Installation and User Guide

Appendix K Introduction to Microsoft Visual C++ 6.0

How To Use Excel With A Calculator

Compiler Construction

Chapter 9 Creating Reports in Excel

Evaluator s Guide. PC-Duo Enterprise HelpDesk v5.0. Copyright 2006 Vector Networks Ltd and MetaQuest Software Inc. All rights reserved.

Microsoft Access 3: Understanding and Creating Queries

ESCI 386 IDL Programming for Advanced Earth Science Applications Lesson 6 Program Control

java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner

Digital Resource Management

Chapter 2: Elements of Java

Importing Data from a Dat or Text File into SPSS

Setup Manual and Programming Reference. RGA Ethernet Adapter. Stanford Research Systems. Revision 1.05 (11/2010)

USING EXCEL 2010 TO SOLVE LINEAR PROGRAMMING PROBLEMS MTH 125 Chapter 4

Adding Clients in Track

Transcription:

dtsearch Regular Expressions In the AccessData Forensic Toolkit, regular expression searching capabilities has been incorporated in the dtsearch index search tab. This functionality does not use RegEx++ that we are accustomed to in the Live Search tab. dtsearch utilizes the TR1 (Technical Report 1) regular expressions. Regular expressions in dtsearch provide a powerful syntax for searching for complicated patterns in text, such as one of several possible sequences of letters followed by a sequence of numbers. Regular expressions can also be used to express spelling variations of individual words. Regular expression patterns are arbitrary (i.e., supplied by the user dynamically) and cannot be pre-indexed. Regular expression searching in dtsearch is limited to a single whole word. A regular expression included in the dtsearch box must be quoted and must begin with ##. An example of this is: Apple and "##199[0-9]" will find Apple and 1990 through 1999 Apple and "##19[0-9]+" will find Apple and 190 through 199 However, if you wanted to look for Apple Pie, you could not use ##app.*ie since this is two words. Only letters and numbers are searchable. You cannot search for any of the non-indexed characters as defined in the Index Search Settings in the Detailed Options section of a case creation. Also, dtsearch does not store information about line breaks so any searches that are made that include the beginning of a line or the end of a line will not work. Search considerations using the wildcard character * in a regular expression does have an effect on search speed: the closer to the front of a word the expression is, the more it will slow searching. "Appl.*" will be nearly as fast as "Apple", while ".*pple" will be much slower. NOTE: Advanced searching for Social Security Numbers and Credit Card Numbers and other number patterns can be achieved; however modifications to the dtsearch engine must be made before processing the case. For more details, see Advanced Searching on page 7 of this paper. Syntricate, Inc. 1

Element Terms Characters and target sequences are referred to as an Element and can be one of the following: A literal character typed as the actual letter or number ( a or 1). A. (period) is any single character. An * (asterisk) is a wildcard character. (a) is a capture group. \d is a decimal character. For hex searches, \xhh matches a hex entry (ie \x0f). {2} is a repetition character. A, (comma) is a minimum character. (aa?) is a target sequence. An alternation character search is this that. A concatenation sequence is (a){2,3}(b){2,3}(c). A back reference is ((a+)(b+))(c+)\3. (?:subexpression) matches the sequence of characters in the target sequence that is matched by the patter between the delimiters. (?!:sub-expression) matches any sequence of characters in the target sequence that does not match the pattern listed in the sub-expression) A bracket or range expression of the form "[expr]", which matches a value or a range, similar to a set in the Live Pattern Search. "##a" matches the target sequence "a" but does not match the target sequences "b", or "c" etc. "##." matches a single character such as "a", "b", and "c" etc. "##sal*" matches the target sale and the target salt and so on. "##(a)" capture group, matches the target sequence "a" but does not match the target sequences "b", or "c" etc. ##\d\d\d\d matches the target sequence of four digits 1234. ##aa? or {0,1} matches the target sequence of aa and the target sequence of aaa. ##ab matches the target sequence ab. "##[b-z]" or range, matches the target sequences "b" and "c" but does not match the target sequences "a". ##tom jerry matches the target sequence of tom or jerry. ##\d{4} or repetition, matches the target sequence of four digits 1234. ##(?:aa) or target sequence, matches the target sequence of aa and the target sequence of aaa, and so on. Syntricate, Inc. 2

Ordinary Character By entering actual ASCII characters, the search will return that set of characters after the element(s) are entered. By entering ordinary characters, ##nick, you would find said characters. However, if you wanted to look for Nick Drehel, you could not use ##nick drehel since this is two words. Single Any Character and Wildcard The use of the any character element can be used if a letter or letters may be different, such as difference in spelling (example marijuana and marihuana ). The wildcard is used to find any combination of characters after an element is entered. "##(a*)" matches the target sequence "a", the target sequence "aa", and so on. "##a*" matches the target sequence "a", the target sequence "aa", and so on. "##(a.)" matches the target sequence "aa", the target sequence "ab", but will not find the target sequence the target sequence aaa. "##a." matches the target sequence "aa", the target sequence "ab", but will not find the target sequence the target sequence aaa. ##.*ick matches the target sequence nick, the target sequence click, and so on. ##mari.uana matches the target sequence marijuana and the target sequence marihuana. Capture Group A capture group marks its contents as a single unit in the regular expression and labels the target text that matches its contents. The label that is associated with each capture group is a number, which is determined by counting the opening parentheses that mark capture groups up to and including the opening parenthesis. Example: ##(ab)* matches the target sequence ab, the target sequence abab, and so on. ##(a+)(b+) matches the target sequence ab, the target sequence aab, the target sequence abb, and so on. ##ab+ matches the target sequence abb but does not match the target sequence abab. ##(ab)+ matches the target sequence abab but does not match the target sequence abb. ##((a+)(b+))(c+)" matches the target sequence "aabbbc" and associates capture group 1 with the subsequence "aabbb", capture group 2 with the subsequence "aa", capture group 3 with "bbb", and capture group 4 with the subsequence "c". Syntricate, Inc. 3

Repetition Any element can be followed by a repetition count. "##(a{2})" matches the target sequence "aa" but not the target sequence "a" or the target sequence "aaa". "##(a{2,})" matches the target sequence "aa", the target sequence "aaa", and so on, but does not match the target sequence "a". A repetition count can also take the following form: "?" - Equivalent to "{0,1}". "a?" matches the target sequence "" and the target sequence "a", but not the target sequence "aa". ##(aa?)(bbbb?)(c) matches the target sequence aabbbbc and the target sequence abbbc. Decimal Character You can locate any set of decimals by using the \d character element in the expression. "##\d\d\d\d" matches the target sequence "1234". ##\d[3} matches the target sequence 123. ##\d{3}\d\d\d\d matches the target sequence 1234567. Visa and ##\d{4} will match any files that contain the word visa and any four digits. Syntricate, Inc. 4

Alternation A concatenated regular expression can be followed by the character ' ' and another concatenated regular expression. Any number of concatenated regular expressions can be combined in this manner. The resulting expression matches any target sequence that matches one or more of the concatenated regular expressions. Example: "##(nick houston)" matches the target sequence "nick", or the target sequence houston. Concatenation Regular expression elements, with or without repetition counts, can be concatenated to form longer regular expressions. The resulting expression matches a target sequence that is a concatenation of the sequences that are matched by the individual elements. "##(a){2,3}(b){2,3}(c)" matches the target sequence "aabbc", the target sequence aaabbbc. ##(\d{4}){4} matches the target sequence of 1234123412341234 (16 digits no spaces). Back Reference A back reference marks its contents as a single unit in the regular expression grammar and labels the target text that matches its contents. The label that is associated with each capture group is a number, which is determined by counting the opening parentheses that mark capture groups up to and including the opening parenthesis that marks the current capture group. A back reference is a backslash that is followed by a decimal value N. It matches the contents of the Nth capture group. The value of N must not be more than the number of capture groups that precede the back reference. Example: "((a+)(b+))(c+)\3" matches the target sequence "aabbbcbbb". The back reference "\3" matches the text in the third capture group, that is, the "(b+)". It does not match the target sequence "aabbbcbb". The first capture group is ((a+)(b+)) The second capture group is (a+) The third capture group is (b+) The fourth capture group is (c+) Syntricate, Inc. 5

Bracket or Character Range A character range in a bracket expression adds all the characters in the range to the character set that is defined by the bracket expression. To create a character range, put the character '-' between the first and last characters in the range. Doing this puts into the set all characters that have a numeric value that is more than or equal to the numeric value of the first character, and less than or equal to the numeric value of the last character. "[0-7]" represents the set of characters { '0', '1', '2', '3', '4', '5', '6', '7' }. It matches the target sequences "0", "1", and so on, but not "a". "[h-k]" represents the set of characters { 'h', 'i', 'j', 'k' }. "[0-24]" represents the set of characters {'0', '1', '2', '4' }. "[0-2]" represents the set of characters { '0', '1', '2' }. An individual character in a bracket expression adds that character to the character set that is defined by the expression. If the bracket expression begins with a ^ then this defines that the expression will consider all characters except for those listed. "[abc]" matches the target sequences "a", "b", or "c", but not the sequence "d". "[^abc]" matches the target sequence "d", but not the target sequences "a", "b", or "c". "[a^bc]" matches the target sequences "a", "b", "c", or "^", but not the target sequence "d". Syntricate, Inc. 6

Advanced Searching - TR1 Regular Expressions for Number Patterns If order to achieve dtsearch capability in FTK for search strings such as Social Security Numbers, Credit Card Numbers, Employee Identification Numbers, Telephone Numbers, and etc. where a period or hyphen is present. Certain steps must be done during the pre-processing phase of the case. NOTE: Currently, FTK cannot include search patterns with spaces. Normal dtsearch strings for credit card numbers or social security numbers The normal dtsearch wildcard string can be utilized as long as the hyphen is set to be indexed as a space: Social Security Numbers - === == ==== Returns 123-45-6789 Will not return 123 45 6789 Credit Card Numbers (16 digits) - ==== ==== ==== ==== Returns 1234-1234-1234-1234 Will not return 1234 1234 1234 1234 Number Patterns dtsearch TR1 Regular Expression for find number patterns as we do in Live Searches for such things as Credit Card Numbers, Social Security Numbers, xxxxxxxx. Certain pre-processing options MUST be completed by the examiner before this function will work. Pre-Processing Options If the examiner wants to utilize the dtsearch TR1 Regular Expression functions for looking for number patterns, they must complete the following pre-processing options: 1. Start a new case 2. Select Detailed Options 3. Click Indexing Options next to dtsearch Text Index 4. On the Indexing Options dialog window For Hyphen Treatments set to Hyphen In the Spaces section remove the period In the Spaces section remove the left and right parenthesis In the Letters section click Add and in all 4 spaces, type a. period and the left and right parenthesis, then click OK 5. Process the case Syntricate, Inc. 7

Examples of TR1 Regular Expressions for Number Patterns For Credit Card Numbers - "##(\d{4}[\.\-])(\d{4}[\.\-])(\d{4}[\.\-])(\d{4})" The first three groups are composed of (\d{4}[\s\.\-]). The expression is looking for four digits followed by a space, period, or hyphen. This group is repeated three times and followed by the group looking for the ending 4 digits. We can shorten that expression by writing it "##(\d{4})[\s\.\-]){3}(\d{4})". This will find 1234-5678-1234-5678 or 1234.5678.1234.5678. For Social Security Numbers "##(\d{3}[\.\-])(\d{2}[\.\-])(\d{4})". This will find 123-45-6789 or 123.45.6789. For U.S. Telephone Numbers "##(\d[\-\.])?(\(?\d{3}[\-\.\)])?([\-\.]?\d{3}[\-\.])(\d{4})" This will find: 567-8901 234-567-8901 1-234-567-8901 (234)567-8901 (234)-567-8901 567.8901 234.567.8901 1.234.567.8901 (234)567.8901 (234).567.8901 Syntricate, Inc. 8

References Regular Expressions dtsearch Support. http://support.dtsearch.com/webhelp/dtsearch/regular_.htm MSDN: TR1 Regular Expressions. http://msdn.microsoft.com/en-us/library/bb982727.aspx Syntricate, Inc. 9