Regular Expressions (in Python)




Python or egrep

We will use Python. In some scripting languages you can call the grep or egrep command directly:

    egrep pattern file.txt

For example,

    egrep ^A file.txt

prints every line of file.txt that starts with (^) a capital letter A.
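The egrep call above can be approximated in Python as follows. This is a minimal sketch, not a full egrep replacement: the helper name grep and the sample lines (standing in for the contents of file.txt) are assumptions made for illustration.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern, like egrep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical stand-in for the contents of file.txt.
sample_lines = ["Apple pie", "banana", "Avocado", "an Apple"]

# Keep only the lines that start with a capital A.
print(grep(r"^A", sample_lines))
```

To run it against a real file, the list could be replaced by an open file object, since iterating over a file yields its lines.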

A regular expression (abbreviated regex or regexp) is a search pattern, used mainly for pattern matching on strings, i.e. "find and replace"-like operations. Each character in a regular expression is either a metacharacter with a special meaning or a regular character with its literal meaning. The question we ask is: does a given string match a certain pattern?

List of metacharacters

1. .
2. +
3. ?
4. *
5. ^
6. $
7. [...]
8. -
9. [^...]
10. |
11. ()
12. {m,n}

. (dot)

Matches any single character. Many applications exclude newlines, and exactly which characters count as newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included. Within POSIX bracket expressions, the dot matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
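The difference between the wildcard dot and the dot inside brackets can be checked with a short sketch. The candidate strings are made up for illustration, and re.fullmatch is used so that the whole string has to match.

```python
import re

# Outside brackets the dot is a wildcard; inside brackets it is a literal dot.
wildcard = re.compile(r"a.c")
literal_set = re.compile(r"[a.c]")

wildcard_hits = [s for s in ["abc", "a.c", "ac", "a\nc"] if wildcard.fullmatch(s)]
set_hits = [s for s in ["a", ".", "c", "b"] if literal_set.fullmatch(s)]

# In Python the dot skips newlines unless the re.DOTALL flag is given.
dotall_hit = re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None

print(wildcard_hits)  # ['abc', 'a.c']
print(set_hits)       # ['a', '.', 'c']
print(dotall_hit)     # True
```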

Example .

    import re

    string1 = "Hello, world."
    # Five dots match any five consecutive characters.
    if re.search(r".....", string1):
        print(string1 + " has length >= 5")

Example [.] (literally a dot)

    import re

    string1 = "Hello, world."
    # Four arbitrary characters, then a literal dot at the end of the string.
    if re.search(r"....[.]$", string1):
        print(string1 + " has length >= 5 and ends with a '.'")

+

Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

    import re

    string1 = "Hello, world."
    if re.search(r"l+", string1):
        print('There are one or more consecutive letter "l"' + "'s in " + string1)

?

Matches the preceding pattern element zero or one times.

    import re

    string1 = "Hello, world."
    # Capital H: the match is case-sensitive by default.
    if re.search(r"H.?e", string1):
        print("There is an 'H' and an 'e' separated by 0-1 characters (Ex: He Hoe)")

*

Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.

    import re

    string1 = "Hello, world."
    if re.search(r"e(ll)*o", string1):
        print("'e' followed by zero or more 'll' followed by 'o' (eo, ello, ellllo)")
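The three quantifiers +, ? and * can be compared side by side on the ab...c family used in the examples above. This is a small sketch; the helper full_matches and the candidate list are assumptions for illustration.

```python
import re

candidates = ["ac", "abc", "abbc", "abbbc"]

def full_matches(pattern):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

plus_hits = full_matches(r"ab+c")      # one or more b
optional_hits = full_matches(r"ab?c")  # zero or one b
star_hits = full_matches(r"ab*c")      # zero or more b

print(plus_hits)      # ['abc', 'abbc', 'abbbc']
print(optional_hits)  # ['ac', 'abc']
print(star_hits)      # ['ac', 'abc', 'abbc', 'abbbc']
```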

^

Matches the beginning of a line or string.

    import re

    string1 = "Hello World"
    # Capital H: the match is case-sensitive by default.
    if re.search(r"^He", string1):
        print(string1, "starts with the characters 'He'")

$

Matches the end of a line or string.

    import re

    string1 = "Hello World"
    if re.search(r"rld$", string1):
        print(string1, "is a line or string that ends with 'rld'")
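Whether ^ and $ anchor to lines or only to the whole string depends on the tool. In Python's re module they anchor to the whole string by default and to every line only when the re.MULTILINE flag is passed. A brief sketch, with a made-up two-line string:

```python
import re

text = "Hello World\nGoodbye World"

# Default: ^ matches only at the start of the string, $ only at its end.
default_starts = re.findall(r"^[A-Za-z]+", text)
default_ends = re.findall(r"[A-Za-z]+$", text)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
multiline_starts = re.findall(r"^[A-Za-z]+", text, re.MULTILINE)
multiline_ends = re.findall(r"[A-Za-z]+$", text, re.MULTILINE)

print(default_starts)    # ['Hello']
print(multiline_starts)  # ['Hello', 'Goodbye']
print(default_ends)      # ['World']
print(multiline_ends)    # ['World', 'World']
```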

[ ]

A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed in POSIX bracket expressions (Python's re module does accept them). The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].

Example []

    import re

    # [] denotes a set of possible character matches.
    string1 = "Hello, world."
    if re.search(r"[aeiou]+", string1):
        print(string1 + " contains one or more vowels.")
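The rules for ranges, the literal hyphen and the leading ] can be verified by listing which characters each bracket expression accepts. This is a sketch only: the helper members and its small test alphabet (lowercase letters plus "-" and "]") are assumptions for illustration.

```python
import re
import string

def members(char_class):
    """Return the characters of a small test alphabet that char_class matches."""
    alphabet = string.ascii_lowercase + "-]"
    return "".join(ch for ch in alphabet if re.fullmatch(char_class, ch))

mixed = members(r"[abcx-z]")          # literals plus a range
ranges = members(r"[a-cx-z]")         # two ranges, same set
trailing_hyphen = members(r"[abc-]")  # hyphen last: literal
leading_hyphen = members(r"[-abc]")   # hyphen first: literal
leading_bracket = members(r"[]abc]")  # ] first: literal

print(mixed, ranges)                    # abcxyz abcxyz
print(trailing_hyphen, leading_hyphen)  # abc- abc-
print(leading_bracket)                  # abc]
```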

[^ ]

Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". As with ordinary bracket expressions, literal characters and ranges can be mixed.

Example [^ ]

    import re

    # [^...] matches every character except the ones inside the brackets.
    string1 = "Hello World\n"
    if re.search(r"[^abc]", string1):
        print(string1 + " contains a character other than a, b, and c")
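A negated bracket expression can also be used to collect every character outside a set. The sketch below uses re.findall on the same sample string as the earlier examples; the second string is made up to show a case with no match.

```python
import re

# Every character of the string that is not a lowercase ASCII letter.
non_lowercase = re.findall(r"[^a-z]", "Hello, world.")

# A string made only of a, b and c has nothing for [^abc] to match.
only_abc = re.search(r"[^abc]", "abcabc")

print(non_lowercase)  # ['H', ',', ' ', '.']
print(only_abc)       # None
```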

Example |

    import re

    # | separates alternate possibilities.
    string1 = "Hello, world."
    if re.search(r"(Hello|Hi|Pogo)", string1):
        print("At least one of Hello, Hi, or Pogo is contained in " + string1)

()

Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group.

Example ()

    import re

    string1 = "Hello, world."
    # Capital H: with a lowercase h the search returns None and group() fails.
    m_obj = re.search(r"(H..).(o..)(...)", string1)
    if m_obj:
        print("We matched '" + m_obj.group(1) + "' and '" + m_obj.group(2) + "' and '" + m_obj.group(3) + "'")

\n

Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups.

{m,n}

Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regular expressions. BRE mode requires \{m,n\}.
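Both constructs can be tried out in Python, where a backreference is written \1 inside a raw string. A minimal sketch; the doubled-letter pattern is an illustrative choice, not taken from the slides.

```python
import re

# \1 refers back to whatever group 1 matched: here, the same letter twice.
doubled = re.search(r"([a-z])\1", "Hello, world.")
doubled_text = doubled.group(0)

# a{3,5} accepts runs of 3 to 5 a's and nothing shorter or longer.
accepted_lengths = [n for n in range(1, 8) if re.fullmatch(r"a{3,5}", "a" * n)]

print(doubled_text)      # ll
print(accepted_lengths)  # [3, 4, 5]
```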

-v option

In Python, both re.search and re.match return a Match object, or None when there is no match. For a grep -v equivalent, you might use:

    import re
    import sys

    # Write out only the lines that do NOT match the pattern.
    for line in sys.stdin:
        if re.search(r'[az]', line) is None:
            sys.stdout.write(line)
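The same inverted filter can be written as a function over a list of lines, which makes it easy to try without piping data through standard input. This is a sketch: the name invert_grep and the sample lines are assumptions for illustration.

```python
import re

def invert_grep(pattern, lines):
    """Return the lines with no match for pattern, like grep -v."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line) is None]

# Hypothetical input lines; [az] matches a line containing an "a" or a "z".
sample = ["pizza\n", "lemon\n", "123\n", "bob\n"]
kept = invert_grep(r"[az]", sample)

print(kept)  # ['lemon\n', '123\n', 'bob\n']
```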

e.g. Username

    /^[a-z0-9_-]{3,16}$/

The whole string, from start (^) to end ($), must consist of 3 to 16 characters, each a lowercase letter (a-z), a digit (0-9), an underscore, or a hyphen. Matches e.g. my-us3r_n4m3 but not th1s1swayt00_l0ngt0beausername.

e.g. Password

    /^[a-z0-9_-]{6,18}$/

The whole string must consist of 6 to 18 lowercase letters, digits, underscores, or hyphens. Matches e.g. myp4ssw0rd but not mypa$$w0rd.

e.g. Hex Value

    /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Starts with an optional #, followed by either six or three hexadecimal digits (a-f, 0-9). Matches e.g. #a3c113 but not #4d82h4.

e.g. Email

    /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

String that matches: john@doe.com
String that doesn't match: john@doe.something (TLD is too long)
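The four patterns above can be tried in Python by dropping the surrounding / delimiters, which belong to other tools' syntax and not to the pattern itself. A minimal sketch; the PATTERNS dictionary and the helper is_valid are assumptions for illustration, and the patterns are simplified teaching examples, not complete validators.

```python
import re

# The slide patterns, written as Python raw strings without the / delimiters.
PATTERNS = {
    "username": r"^[a-z0-9_-]{3,16}$",
    "password": r"^[a-z0-9_-]{6,18}$",
    "hex": r"^#?([a-f0-9]{6}|[a-f0-9]{3})$",
    "email": r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$",
}

def is_valid(kind, text):
    """Return True if text matches the pattern registered under kind."""
    return re.search(PATTERNS[kind], text) is not None

print(is_valid("username", "my-us3r_n4m3"))     # True
print(is_valid("password", "mypa$$w0rd"))       # False
print(is_valid("hex", "#a3c113"))               # True
print(is_valid("email", "john@doe.something"))  # False
```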

Match n characters

    egrep.exe "^...$" data1.txt

matches any line with exactly 3 characters: ^ anchors the start of the line, ... matches any three characters, and $ anchors the end. Or just:

    egrep.exe "^.{3}$" data1.txt

What about egrep.exe "(..){2}" data1.txt?
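One way to explore the closing question is to run the three patterns in Python over a few made-up lines (a stand-in for data1.txt, whose contents are not shown here). Since (..){2} has no anchors, it should match any line that contains at least four characters.

```python
import re

# Hypothetical lines standing in for the contents of data1.txt.
lines = ["ab", "abc", "abcd", "abcde"]

exactly_three = [s for s in lines if re.search(r"^...$", s)]
braces_form = [s for s in lines if re.search(r"^.{3}$", s)]

# No ^ or $: two pairs of characters anywhere in the line are enough.
four_or_more = [s for s in lines if re.search(r"(..){2}", s)]

print(exactly_three)  # ['abc']
print(braces_form)    # ['abc']
print(four_or_more)   # ['abcd', 'abcde']
```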