Regular Expression Syntax




EmEditor Help - How to - Search - 12/22/2014 9:55 AM

EmEditor regular expression syntax is based on Perl regular expression syntax.

Literals

All characters are literals except: ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by a "\". A literal is a character that matches itself. For example, searching for "\?" will match every "?" in the document, and searching for "Hello" will match every "Hello" in the document.

Metacharacters

The following tables contain the complete list of metacharacters (non-literals) and their behavior in the context of regular expressions.

\
    Marks the next character as a special character, a literal, or a back reference. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and "\(" matches "(".
^
    Matches the position at the beginning of the input string. For example, "^e" matches any "e" that begins a string.
$
    Matches the position at the end of the input string. For example, "e$" matches any "e" that ends a string.
*
    Matches the preceding character or sub-expression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+
    Matches the preceding character or sub-expression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?
    Matches the preceding character or sub-expression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}
    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob" but matches the two o's in "food".
{n,}
    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" and matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}
    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?
    When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.
    Matches any single character. For example, ".e" will match text where any character precedes an "e", like "he", "we", or "me". In EmEditor Professional, it matches a new line within the range specified in the Additional Lines to Search for Regular Expressions text box if the A Regular Expression "." Can Match the New Line Character check box is checked.
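The literal and quantifier rules above can be tried out in a short script. This is a minimal sketch that uses Python's `re` module as a stand-in, on the assumption that it treats these basic constructs the same way as the Perl-style syntax described here. EmEditor itself uses the Boost regex library, so results inside the editor may differ in edge cases. The sample strings are taken from the examples in the table.

```python
import re

# An escaped metacharacter is a literal: "\?" matches every "?".
literal_hits = re.findall(r"\?", "Why? How?")

# Greedy versus non-greedy matching on the string "oooo".
greedy = re.match(r"o+", "oooo").group()
lazy = re.match(r"o+?", "oooo").group()

# Bounded repetition: "o{1,3}" takes the first three o's.
bounded = re.search(r"o{1,3}", "fooooood").group()

# Optional sub-expression: "do(es)?" matches both "do" and "does".
optional = [m.group() for m in re.finditer(r"do(es)?", "do does")]

print(literal_hits, greedy, lazy, bounded, optional)
```

The non-greedy form stops after one "o" because nothing after the quantifier forces it to consume more.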

(pattern)
    Parentheses serve two purposes: to group a pattern into a sub-expression and to capture what generated the match. For example, the expression "(ab)*" would match all of the string "ababab". Each sub-expression match is captured as a back reference (see below) numbered from left to right. To match parentheses characters ( ), use '\(' or '\)'.
\1 - \9
    Indicates a back reference - a back reference is a reference to a previous sub-expression that has already been matched. The reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9"; "\1" refers to the first sub-expression, "\2" to the second, etc. For example, "(a)\1" would capture "a" as the first back reference and match any text "aa". Back references can also be used when using the Replace feature under the Search menu. Use regular expressions to locate a text pattern, and the matching text can be replaced by a specified back reference. For example, "(h)(e)" will find "he", and putting "\1" in the Replace With box will replace "he" with "h", whereas "\2\1" will replace "he" with "eh".
(?:pattern)
    A subexpression that matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for possible later use with back references. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)
    A subexpression that performs a positive lookahead search, which matches the string at any point where a string matching pattern begins. For example, "x(?=abc)" matches an "x" only if it is followed by the expression "abc". This is a non-capturing match, that is, the match is not captured for possible later use with back references. pattern cannot contain a new line.
(?!pattern)
    A subexpression that performs a negative lookahead search, which matches the search string at any point where a string not matching pattern begins. For example, "x(?!abc)" matches an "x" only if it is not followed by the expression "abc". This is a non-capturing match, that is, the match is not captured for possible later use with back references. pattern cannot contain a new line.
(?<=pattern)
    A subexpression that performs a positive lookbehind search, which matches the search string at any point where a string matching pattern ends. For example, "(?<=abc)x" matches an "x" only if it is preceded by the expression "abc". This is a non-capturing match, that is, the match is not captured for possible later use with back references. pattern cannot contain a new line. pattern must be of fixed length.
(?<!pattern)
    A subexpression that performs a negative lookbehind search, which matches the search string at any point where a string not matching pattern ends. For example, "(?<!abc)x" matches an "x" only if it is not preceded by the expression "abc". This is a non-capturing match, that is, the match is not captured for possible later use with back references. pattern cannot contain a new line. pattern must be of fixed length.
x|y
    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]
    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]
    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]
    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]
    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
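The grouping, back reference, lookaround, and character set entries above can be checked with a few one-liners. This is a sketch in Python's `re` module, which shares these constructs with Perl-style syntax. It is not EmEditor's own engine, and the input strings other than those quoted in the table are made up for illustration.

```python
import re

# Back reference inside the pattern: "(a)\1" matches the text "aa".
doubled = re.search(r"(a)\1", "baab").group()

# Back references in the replacement: "\2\1" turns "he" into "eh".
swapped = re.sub(r"(h)(e)", r"\2\1", "he")

# Non-capturing group combined with the "or" character.
industry = re.findall(r"industr(?:y|ies)", "industry and industries")

# Lookahead and lookbehind report where "x" matches without consuming "abc".
ahead = [m.start() for m in re.finditer(r"x(?=abc)", "xabc xdef")]
not_ahead = [m.start() for m in re.finditer(r"x(?!abc)", "xabc xdef")]
behind = [m.start() for m in re.finditer(r"(?<=abc)x", "abcx defx")]
not_behind = [m.start() for m in re.finditer(r"(?<!abc)x", "abcx defx")]

# Character set and negative character set on "plain".
first_in_set = re.search(r"[abc]", "plain").group()
first_not_in_set = re.search(r"[^abc]", "plain").group()

print(doubled, swapped, industry, ahead, not_ahead, behind, not_behind,
      first_in_set, first_not_in_set)
```

Because the group in 'industr(?:y|ies)' is non-capturing, `findall` returns the whole matched words rather than just the suffixes.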

Character Classes

The following character classes are used within a character set such as "[:classname:]". For instance, "[[:space:]]" is the set of all whitespace characters.

alnum     Any alphanumeric character.
alpha     Any alphabetical character a-z, A-Z, and other alphabetical characters.
blank     Any blank character, either a space or a tab.
cntrl     Any control character.
digit     Any digit 0-9.
graph     Any graphical character.
lower     Any lowercase character a-z, and other lowercase characters.
print     Any printable character.
punct     Any punctuation character.
space     Any whitespace character.
upper     Any uppercase character A-Z, and other uppercase characters.
xdigit    Any hexadecimal digit character, 0-9, a-f and A-F.
word      Any word character - all alphanumeric characters plus the underscore.
unicode   Any character whose code is greater than 255.

Single character escape sequences

The following escape sequences are aliases for single characters:

Escape     Character   Description
\a         0x07        Bell character.
\f         0x0C        Form feed.
\n         0x0A        Newline character.
\r         0x0D        Carriage return.
\t         0x09        Tab character.
\v         0x0B        Vertical tab.
\e         0x1B        ASCII Escape character.
\0dd       0dd         An octal character code, where dd is one or more octal digits.
\xXX       0xXX        A hexadecimal character code, where XX is one or more hexadecimal digits (a Unicode character).
\x{XXXX}   0xXXXX      A hexadecimal character code, where XXXX is one or more hexadecimal digits (a Unicode character).
\cZ        Z-'@'       An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
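The character codes in the escape table above can be verified mechanically. The sketch below uses Python's `re` module, which accepts only part of this syntax: it has no `[[:name:]]` bracket classes, no `\e`, no `\x{...}` and no `\cZ`, so those are left out or approximated. The `posix_approx` mapping is a hypothetical illustration of rough equivalents, not something defined by EmEditor.

```python
import re

# Escapes that Python's re also understands, paired with the table's codes.
single_char_escapes = {
    r"\a": 0x07, r"\f": 0x0C, r"\n": 0x0A,
    r"\r": 0x0D, r"\t": 0x09, r"\v": 0x0B,
}
codes_agree = all(
    re.fullmatch(pattern, chr(code)) is not None
    for pattern, code in single_char_escapes.items()
)

# Hexadecimal and octal codes: \x41 is "A"; \011 is a tab (octal 11 = 9).
hex_match = re.fullmatch(r"\x41", "A") is not None
octal_match = re.fullmatch(r"\011", "\t") is not None

# The Z-'@' rule for control characters: control-Z has code 26 (0x1A).
control_z = ord("Z") - ord("@")

# Rough stand-ins for a few bracket classes (an assumption for illustration).
posix_approx = {"digit": r"[0-9]", "xdigit": r"[0-9A-Fa-f]", "word": r"\w"}
hex_digits = re.findall(posix_approx["xdigit"], "0xFg")

print(codes_agree, hex_match, octal_match, control_z, hex_digits)
```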

Word Boundaries

The following escape sequences match the boundaries of words:

\<    Matches the start of a word.
\>    Matches the end of a word.
\b    Matches a word boundary (the start or end of a word).
\B    Matches only when not at a word boundary.

Character class escape sequences

The following escape sequences can be used to represent entire character classes:

\w    Any word character - all alphanumeric characters plus the underscore.
\W    Complement of \w - any non-word character.
\s    Any whitespace character.
\S    Complement of \s.
\d    Any digit 0-9.
\D    Complement of \d.
\l    Any lower case character a-z.
\L    Complement of \l.
\u    Any upper case character A-Z.
\U    Complement of \u.
\C    Any single character, equivalent to '.'.
\Q    The begin quote operator; everything that follows is treated as a literal character until a \E end quote operator is found.
\E    The end quote operator; terminates a sequence begun with \Q.

Replacement Expressions

The following expressions are available for the Replace With box in the Replace dialog box and in the Replace in Files dialog box.

\0
    Indicates a back reference to the entire regular expression.
\1 - \9
    Indicates a back reference - a back reference is a reference to a previous sub-expression that has already been matched. The reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9"; "\1" refers to the first sub-expression, "\2" to the second, etc.
\n
    A new line.
\r
    A carriage return in case of Replace in Files. See also To Specify New Lines.
\t
    A tab.
\L
    Forces all subsequent substituted characters to be in lowercase.
\U
    Forces all subsequent substituted characters to be in uppercase.
\H
    Forces all subsequent substituted characters to be in half-width characters.
\F
    Forces all subsequent substituted characters to be in full-width characters.
\E
    Turns off previous \L, \U, \F, or \H.
(?n:true_expression:false_expression)
    If sub-expression N was matched, then true_expression is evaluated and sent to output, otherwise false_expression is evaluated and sent to output.

Notes

In Find in Files and in Replace in Files, the carriage return (\r) and the line feed (\n) must be specified carefully. See To Specify New Lines for details.

In order for some escape sequences to work in EmEditor, like "\l", "\u" and their complements, the Match Case option has to be selected.

Copyright Notice

The regular expression routines used in EmEditor use Boost library Regex++. Copyright (c) 1998-2001 Dr John Maddock

See Also

Q. What are examples of regular expressions?
To Specify New Lines

Copyright 2003-2014 by Emurasoft, Inc.
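Several of the word boundary and replacement features above have close counterparts in other engines, which gives a way to experiment outside the editor. The sketch below uses Python's `re` module and substitutes its nearest equivalents where the syntax differs: `re.escape` stands in for `\Q...\E`, `\g<0>` for `\0`, and a replacement function for `\U` and for the conditional `(?n:true_expression:false_expression)`. These substitutions are assumptions made for illustration. They are not EmEditor syntax, and `\<`, `\>`, `\l` and `\u` have no direct Python form.

```python
import re

text = "cat concat cats"

# \b marks a word boundary, so only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)
# \B requires a non-boundary: "cat" directly preceded by a word character.
inner_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

# re.escape plays the role of \Q...\E: every character is taken literally.
quoted = re.escape("a.b*c")
literal_ok = re.fullmatch(quoted, "a.b*c") is not None
wildcard_blocked = re.fullmatch(quoted, "aXbbc") is None

# \g<0> is Python's spelling of the whole-match reference \0.
bracketed = re.sub(r"\d+", r"[\g<0>]", "room 42")

# A replacement function stands in for \U (uppercase the substituted text).
shouted = re.sub(r"(h)(e)", lambda m: m.group(0).upper(), "he said")

# A function also stands in for the conditional replacement:
# pick one output if group 1 took part in the match, another otherwise.
signed = re.sub(
    r"(-)?(\d+)",
    lambda m: ("neg " if m.group(1) else "pos ") + m.group(2),
    "-5 7",
)

print(whole_words, inner_starts, literal_ok, wildcard_blocked,
      bracketed, shouted, signed)
```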