University Convocation. IT 3203 Introduction to Web Development. Pattern Matching. Why Match Patterns? The Search Method. The Replace Method




IT 3203 Introduction to Web Development
Regular Expressions
October 12
Notice: This session is being recorded.
Copyright 2007 by Bob Brown

University Convocation
Tuesday, October 13, 11:00 AM to 12:15 PM
Student Center Theatre
Convocation Speaker: Dr. John Palfrey, speaking on "Born Digital in a Network Society"
Professor at Harvard Law School
Vice Dean for Library and Information Resources
Co-author of "Born Digital: Understanding the First Generation of Digital Natives" and also "Access Denied: The Practice and Politics of Internet Filtering"

Pattern Matching
Pattern matching in JavaScript is based on regular expressions.
Regular expressions are patterns that are compared with strings or substrings.
In reality, regular expressions are a small formal language.
Two approaches in JavaScript:
    the RegExp object
    methods of the String object

Why Match Patterns?
Most data validation that can be done on the client side consists of testing data for conformance to a pattern.
    Telephone numbers
    Email addresses
    Dates
    Money amounts
    What else?

The Search Method
search is a method of the String object.
    var my_string = "Abernathy";
    var my_pos = my_string.search(/er/);
my_pos becomes 2.
/er/ is a pattern.
The search method searches for the pattern in the string.
It returns -1 if there is no match.

The Replace Method
    var bobs = "Bob, Bobbie";
    bobs = bobs.replace(/Bob/g, "Bill");
replace returns a new string, so the result is assigned back to bobs.
The string bobs now contains "Bill, Billbie".
/Bob/ is a pattern, but "Bill" is just a string.
The g means global.
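The search and replace slides can be tried out with a short sketch like the one below. The variable names are illustrative only. Two details are worth noticing: patterns are case-sensitive, so the pattern needs a capital B to match "Bob", and replace returns a new string rather than changing the original.

```javascript
// search returns the index of the first match, or -1 when nothing matches.
const myString = "Abernathy";
const found = myString.search(/er/);
const missing = myString.search(/xyz/);

// replace returns a new string; the original string is left unchanged.
// Patterns are case-sensitive, so /Bob/ (capital B) is needed to match "Bob".
const bobs = "Bob, Bobbie";
const firstOnly = bobs.replace(/Bob/, "Bill");  // without g: first match only
const everyBob = bobs.replace(/Bob/g, "Bill");  // with g: every match

console.log(found);     // 2
console.log(missing);   // -1
console.log(firstOnly); // Bill, Bobbie
console.log(everyBob);  // Bill, Billbie
console.log(bobs);      // Bob, Bobbie (unchanged)
```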

The Match Method
match is the most general of the methods.
    var fruit = "4 apples 3 oranges";
    var my_nbrs = fruit.match(/\d/g);
my_nbrs contains ["4", "3"] (it's an array of strings).
g: all matches
no g: first match, plus parenthesized subpatterns
\d matches digits (and \D matches non-digits).

Forming Regular Expressions
/ / enclose patterns.
Normal characters match themselves (e.g. rabbit).
Metacharacters have special meanings: \ | ( ) [ ] { } ^ $ * + ? .
Metacharacters can be included in patterns by escaping with a backslash, like \$ for a real dollar sign.

Wildcard Matching
. (period) matches any character except newline.
/snow./ matches snows and snowy, and matches snowi in snowing.

Classes
[ ] (brackets) define classes.
/[abc]/ matches a or b or c.
/[a-h]/ matches lower-case a through h.
^ (circumflex) inverts a class.
/[^aeiou]/ matches any character except a, e, i, o, u.

Predefined Classes
\x: backslash and class abbreviation
See your textbook or a JavaScript reference.
\d matches a digit: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
/\d+\.\d*/ matches one or more digits, a period, and zero or more digits.

Word and Space Characters
Word characters: [a-zA-Z0-9_]  \w
Non-word characters: [^a-zA-Z0-9_]  \W
Space characters (space, tab, new line): \s
Non-space characters: \S
Capitalization reverses the sense of the predefined class names.
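As a rough illustration of match, escaping, the wildcard, and character classes, the sketch below runs each idea on a small string. The sample strings other than the fruit example are made up for this sketch.

```javascript
const fruit = "4 apples 3 oranges";

// With g, match returns every match as an array of strings.
const allDigits = fruit.match(/\d/g);

// Without g, match returns the first match followed by the parenthesized groups.
const firstPair = fruit.match(/(\d) (\w+)/);

// An escaped metacharacter matches itself: \$ is a literal dollar sign.
const dollarAt = "Cost: $5".search(/\$/);

// The period matches any single character except newline.
const wildcard = "snowing".match(/snow./)[0];

// An inverted class matches any character not listed in the brackets.
const consonants = "rabbit".match(/[^aeiou]/g).join("");

// \w+ picks out runs of word characters (letters, digits, underscore).
const words = "snake_case and kebab-case".match(/\w+/g);

console.log(allDigits);                  // ["4", "3"]
console.log(firstPair[0], firstPair[1], firstPair[2]); // 4 apples 4 apples
console.log(dollarAt);                   // 6
console.log(wildcard);                   // snowi
console.log(consonants);                 // rbbt
console.log(words);                      // ["snake_case", "and", "kebab", "case"]
```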

Boundary Matches
\b matches the boundary between a word character and a non-word character.
    Foo baz
It is a zero-length match.
This allows a whole-words-only search.
/Fred\b/ matches "Fred is" but not "Frederick is".
/Fred\B/ matches "Frederick is" but not "Fred is".
/\bis\b/ matches "is" in: This island is beautiful

Repetition
* zero or more
+ one or more
? one or none
{ } a count
(Each applies to the pattern character on its left.)
/xy{4}z/ == /xyyyyz/
/X*y+z?/

Repetition Examples
* zero or more
+ one or more
? one or none
/\d*\.\d+/
/\d*\.?\d*/

Repetition Exercise
/\d*\.\d+/
1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123

Can We Fix The Pattern?
Assume we are trying to match valid numbers in various combinations with a decimal point.
Is this any better? (Not much!)
/\d+\.?\d*/
1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123

Repetition Exercise: Case 2
/\d+\.?\d*/
2. .25
This expression does match test case 2 at position 1, the digit 2.
But the leading decimal point is skipped, and \d+ matches 25.
\.? makes (another) decimal point optional.
\d* matches nothing.
It also matches within: .25.67! Why?
What about: .25.67.89?
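One way to check the boundary and repetition slides is to run both exercise patterns over the six test cases and print what each one actually matches. The helper firstMatch is a hypothetical name introduced for this sketch; it reports the matched text and the position where the match starts.

```javascript
// Hypothetical helper: report the first match as "text@index", or "no match".
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m === null ? "no match" : m[0] + "@" + m.index;
}

const cases = ["0.0", ".25", "137", "137.", "4.5678", "xyz.123"];

// Original pattern: optional digits, a required period, one or more digits.
const original = cases.map(function (c) { return firstMatch(/\d*\.\d+/, c); });

// Revised pattern: one or more digits, an optional period, optional digits.
const revised = cases.map(function (c) { return firstMatch(/\d+\.?\d*/, c); });

console.log(original); // 0.0@0, .25@0, no match, no match, 4.5678@0, .123@3
console.log(revised);  // 0.0@0, 25@1, 137@0, 137.@0, 4.5678@0, 123@4

// Word boundaries: \b requires a boundary, \B requires that there is none.
const fredWord = /Fred\b/.test("Fred is");           // true
const fredInside = /Fred\b/.test("Frederick is");    // false
const notBoundary = /Fred\B/.test("Frederick is");   // true
const wholeWordAt = "This island is beautiful".search(/\bis\b/); // 12

// A count in braces repeats the character on its left.
const counted = /xy{4}z/.test("xyyyyz");             // true
```

Neither pattern is anchored, so each one is free to match a fragment in the middle of the string, which is why "xyz.123" still produces a match.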

Repetition Exercise: Case 6
/\d+\.?\d*/
6. xyz.123
This expression does match test case 6 at position 4, the digit 1.
But the decimal point is skipped, and \d+ matches 123.
\.? makes (another) decimal point optional.
\d* matches nothing.

Another Repetition Exercise
/X*y+z?/
1. Xyyyz
2. Xzzy
3. yyyyz
4. yyyy
5. wxyzz
6. zzzxyzz

Anchors
Specify where to start matching.
/^pearl/ Match starts at the beginning of the string: matches "pearls are..." but not "my pearls..."
Same character as class inversion, but different context, different meaning.
/gold$/ Anchors to the end of the string: matches "I like gold" but not "sunset is golden"

Grouping and Alternatives
Parentheses group items.
The pipe or vertical bar (|) matches one of two or more alternatives.
/abc(def|xyz)/ matches abcdef or abcxyz.

Now We Can Fix The Pattern
Almost! We are trying to match either a digit or a decimal point:
If a decimal point, then one or more digits.
Otherwise, an optional decimal point followed by zero or more digits.
/^\d*(|\.\d*)?$/
Problem: This matches a decimal point all by itself.
To fix, we need conditional expressions, which are beyond the scope of the course because conditionals are not supported in JavaScript.

A Closer Look
/^\d*(|\.\d*)?$/
Anchored at the beginning of the string
Zero or more digits
A group containing either nothing, or a decimal point and zero or more digits,
repeated zero or one times
Anchored at the end of the string
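The anchor, alternation, and number-pattern slides can be sketched as follows, with the alternation bar written out inside the group. The last pattern, named twoCases here, is not from the lecture: it is one possible workaround, offered as an assumption, that spells out the two cases with plain alternation instead of a conditional.

```javascript
// Anchors tie the match to the start (^) or the end ($) of the string.
const startsWithPearl = /^pearl/.test("pearls are...");  // true
const pearlLater = /^pearl/.test("my pearls...");        // false
const endsWithGold = /gold$/.test("I like gold");        // true
const goldInside = /gold$/.test("sunset is golden");     // false

// Parentheses group; the vertical bar chooses between alternatives.
const eitherTail = /abc(def|xyz)/.test("abcxyz");        // true

// The lecture's anchored number pattern.
const anchored = /^\d*(|\.\d*)?$/;
const inputs = ["0.0", ".25", "137", "137.", "4.5678", "xyz.123", "."];
const results = inputs.map(function (s) { return anchored.test(s); });
console.log(results); // true, true, true, true, true, false, true

// Hypothetical alternative (not the lecture's): either digits with an optional
// fraction, or a period that must be followed by at least one digit.
const twoCases = /^(\d+\.?\d*|\.\d+)$/;
const results2 = inputs.map(function (s) { return twoCases.test(s); });
console.log(results2); // true, true, true, true, true, false, false
```

The anchored pattern rejects "xyz.123" but still accepts a lone period, which is the problem the slide points out. The alternation version rejects it because each branch requires at least one digit.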

Did That Work?
/^\d*(|\.\d*)?$/
1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123
7. .

Modifiers
Follow the pattern:
g global
i case-insensitive
/buffalo/i matches Buffalo and buffalo.

The Split Method
Splits a string into substrings.
Returns an array of substrings.
    var my_str = "grapes:apples:oranges";
    var fruit = my_str.split(":");
fruit is ["grapes", "apples", "oranges"]
Split can take a regular expression as a delimiter.

Split with a Regular Expression
Splitting a comma-delimited string:
    var my_nbrs = "12,34,56";
    var nbr_array = my_nbrs.split(",");
What about this?
    var my_nbrs = "12, 3,4, 56";
    nbr_array = my_nbrs.split(/\s*,\s*/);

A 7-Digit Phone Number
How does this work?
    var ok = phnum.search(/\d{3}-\d{4}/);
What does the search method return for this?
    555-1212
What about this?
    444555-12123456
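A small sketch of the modifier, split, and unanchored phone-number slides follows. Variable names are illustrative. It shows why the regular-expression delimiter helps with uneven spacing, and what the unanchored search returns for the two phone-number inputs.

```javascript
// The i modifier makes the match case-insensitive.
const upper = /buffalo/i.test("Buffalo");  // true
const lower = /buffalo/i.test("buffalo");  // true

// Splitting on a plain comma keeps the stray spaces in the pieces.
const messy = "12, 3,4, 56";
const plainSplit = messy.split(",");

// Splitting on /\s*,\s*/ treats the comma plus surrounding spaces as the delimiter.
const regexSplit = messy.split(/\s*,\s*/);

console.log(plainSplit); // ["12", " 3", "4", " 56"]
console.log(regexSplit); // ["12", "3", "4", "56"]

// Unanchored search: returns the position of the first 3-digits-dash-4-digits run.
const good = "555-1212".search(/\d{3}-\d{4}/);
const bad = "444555-12123456".search(/\d{3}-\d{4}/);

console.log(good); // 0
console.log(bad);  // 3, because "555-1212" is found inside the longer string
```

Because the pattern is not anchored, the second input is accepted even though it is not a valid 7-digit number.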

A 7-Digit Phone Number
What about this?
    444555-12123456
    var ok = phnum.search(/^\d{3}-\d{4}$/);
Anchoring the beginning and end gives an expression that works: no match here!

10-Digit Phone Number
Can it be extended for Atlanta-style phone numbers?
    var ok = phnum.search(/^\d{3}-\d{3}-\d{4}$/);

10-Digit Phone Number
Can the format be made less rigid? (Yes!)
/^\(?\d{3}\D*\d{3}\D*\d{4}$/
Anchor at the beginning of the string
Optional left parenthesis
Three digits
Optional non-digits
Three digits
Optional non-digits
Four digits
Anchored at the end of the string

Accepting Free-Form Phone Numbers
Parentheses act as grouping and storage operators.
    var ok = datum.search(/^\(?\d{3}\D*\d{3}\D*\d{4}$/);
    if (ok == 0) {
        var parts = datum.match(/^\(?(\d{3})\D*(\d{3})\D*(\d{4})$/);
        output.value = '(' + parts[1] + ') ' + parts[2] + '-' + parts[3];
    }
Accepts: 404-555-1234, 4045551234, (404) 555-1234, etc.
Returns: (404) 555-1234

Regular Expressions as NFAs
NFA: Nondeterministic Finite Automaton
Nondeterministic is not the same as random.
Each part of a regular expression will match as much as it can.
.* matches to the end of the string!
The regular expression engine backtracks when necessary, i.e. when a match would otherwise fail.

Regular Expressions are Greedy
A regular expression will match as much of the target string as possible.
/2.*2/
19202122232425252627282930313233
(The match runs from the first 2 through the last 2.)
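The free-form phone number idea might be wrapped in a function along these lines. The name formatPhone is hypothetical, and the optional non-digit runs are written as \D* (capital D), which is what "optional non-digits" calls for. The sketch finishes with the greedy /2.*2/ example.

```javascript
// Hypothetical helper: normalize a 10-digit phone number, or return null.
function formatPhone(datum) {
  // Parenthesized groups store the three digit runs in parts[1] to parts[3].
  const parts = datum.match(/^\(?(\d{3})\D*(\d{3})\D*(\d{4})$/);
  if (parts === null) {
    return null; // the input does not look like a 10-digit number
  }
  return "(" + parts[1] + ") " + parts[2] + "-" + parts[3];
}

console.log(formatPhone("404-555-1234"));   // (404) 555-1234
console.log(formatPhone("4045551234"));     // (404) 555-1234
console.log(formatPhone("(404) 555-1234")); // (404) 555-1234
console.log(formatPhone("555-1212"));       // null

// Anchoring both ends rejects the over-long 7-digit input.
const anchored = "444555-12123456".search(/^\d{3}-\d{4}$/); // -1

// Greedy matching: .* takes as much as it can, then backs up to the last 2.
const digits = "19202122232425252627282930313233";
const greedy = digits.match(/2.*2/)[0];
console.log(greedy); // 2021222324252526272829303132
```

Calling match once and testing for null replaces the separate search call on the slide; both approaches accept the same inputs.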

Regular Expressions are Greedy
Consider parsing HTML with a regular expression.
    Stars by the <b>billions</b> and <b>billions</b>.
/<b>.*<\/b>/
(Friedl, J. Mastering Regular Expressions)

The ? is also the lazy modifier:
/<b>.*?<\/b>/
(Friedl, J. Mastering Regular Expressions)

Questions

IP Addresses
    4.56.123.156
/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/
    var octets = ip.match( );
Check each octet for being at most 255.
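To make the greedy versus lazy difference concrete, and to show one way the IP address check could be completed, here is a minimal sketch. The function name isValidIPv4 is an assumption for illustration, and it assumes the four-group pattern shown on the slide is the one passed to match.

```javascript
const html = "Stars by the <b>billions</b> and <b>billions</b>.";

// Greedy: .* runs to the last </b> in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? stops at the first </b> that allows a match.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // <b>billions</b> and <b>billions</b>
console.log(lazy);   // <b>billions</b>

// Hypothetical helper: four dot-separated runs of digits, each at most 255.
function isValidIPv4(ip) {
  const octets = ip.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (octets === null) {
    return false; // wrong overall shape
  }
  // octets[0] is the whole match; octets[1] to octets[4] are the groups.
  return octets.slice(1).every(function (octet) {
    return Number(octet) <= 255;
  });
}

console.log(isValidIPv4("4.56.123.156")); // true
console.log(isValidIPv4("4.56.123.256")); // false
console.log(isValidIPv4("4.56.123"));     // false
```

The pattern only checks the shape of the address; the numeric range check has to happen in ordinary code after the match.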