CSE 154 LECTURE 11: REGULAR EXPRESSIONS



Similar documents
Web Programming Step by Step

CSE 341 Lecture 28. Regular expressions. slides created by Marty Stepp

Lecture 18 Regular Expressions

Regular Expression Syntax

JAVASCRIPT AND COOKIES

PHP Tutorial From beginner to master

Python Lists and Loops

Tutorial 6 Creating a Web Form. HTML and CSS 6 TH EDITION

CS 1133, LAB 2: FUNCTIONS AND TESTING

HTML Form Widgets. Review: HTML Forms. Review: CGI Programs

Web Programming Step by Step

TCP/IP Networking, Part 2: Web-Based Control

Internet Technologies

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Web Programming with PHP 5. The right tool for the right job.

Lab 5 Introduction to Java Scripts

Form Validation. Server-side Web Development and Programming. What to Validate. Error Prevention. Lecture 7: Input Validation and Error Handling

Some Scanner Class Methods

DeltaPAY v2 Merchant Integration Specifications (HTML - v1.9)

PHP Authentication Schemes

Inserting the Form Field In Dreamweaver 4, open a new or existing page. From the Insert menu choose Form.

LAB MANUAL CS (22): Web Technology

Using Regular Expressions in Oracle

CS177 MIDTERM 2 PRACTICE EXAM SOLUTION. Name: Student ID:

Designing for Dynamic Content

WIRIS quizzes web services Getting started with PHP and Java

Lecture 4. Regular Expressions grep and sed intro

Real SQL Programming 1

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

InternetVista Web scenario documentation

CS412 Interactive Lab Creating a Simple Web Form

JavaScript By: A. Mousavi & P. Broomhead SERG, School of Engineering Design, Brunel University, UK

This script is called by an HTML form using the POST command with this file as the action. Example: <FORM METHOD="POST" ACTION="formhandler.

CSCI110: Examination information.

Variables, Constants, and Data Types

Client-side Web Engineering From HTML to AJAX

Script Handbook for Interactive Scientific Website Building

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

Regular Expressions (in Python)

CS106A, Stanford Handout #38. Strings and Chars

Advanced Web Technology 10) XSS, CSRF and SQL Injection 2

Viewing Form Results

An Introduction to Developing ez Publish Extensions

University Convocation. IT 3203 Introduction to Web Development. Pattern Matching. Why Match Patterns? The Search Method. The Replace Method

Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl

CS169.1x Lecture 5: SaaS Architecture and Introduction to Rails " Fall 2012"

Regular Expressions. Abstract

grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print

Being Regular with Regular Expressions. John Garmany Session

Regular Expressions. General Concepts About Regular Expressions

Finding XSS in Real World

Kiwi Log Viewer. A Freeware Log Viewer for Windows. by SolarWinds, Inc.

F ahrenheit = 9 Celsius + 32

Importing Lease Data into Forms Online

NewsletterAdmin 2.4 Setup Manual

Regular Expressions and Pattern Matching

Technical Specification ideal

PHP. Introduction. Significance. Discussion I. What Is PHP?

Forms, CGI Objectives. HTML forms. Form example. Form example...

Full URLs, specified in RFC 3986 have up to eight parts.

jquery Tutorial for Beginners: Nothing But the Goods

Now that we have discussed some PHP background

HTML Web Page That Shows Its Own Source Code

Tutorial: Using PHP, SOAP and WSDL Technology to access a public web service.

Passing 1D arrays to functions.

Programming Languages CIS 443

DESIGNING HTML HELPERS TO OPTIMIZE WEB APPLICATION DEVELOPMENT

JavaScript and Dreamweaver Examples

HTML Forms and CONTROLS

Version August 2016

Introduction to Python

Webair CDN Secure URLs

InPost UK Limited GeoWidget Integration Guide Version 1.1

Online signature API. Terms used in this document. The API in brief. Version 0.20,

Sorting. Lists have a sort method Strings are sorted alphabetically, except... Uppercase is sorted before lowercase (yes, strange)

Introduction to Server-Side Programming. Charles Liu

Chapter 2: Interactive Web Applications

Exploits: XSS, SQLI, Buffer Overflow

Chapter 22 How to send and access other web sites

HTML Tables. IT 3203 Introduction to Web Development

Dynamic Web-Enabled Data Collection

Cross Site Scripting (XSS) and PHP Security. Anthony Ferrara NYPHP and OWASP Security Series June 30, 2011

Direct Post Method (DPM) Developer Guide

Webapps Vulnerability Report

JavaScript Introduction

Multimedia im Netz Online Multimedia Winter semester 2015/16. Tutorial 03 Major Subject

A Tale of the Weaknesses of Current Client-side XSS Filtering

Regular Expression Denial of Service

CTIS 256 Web Technologies II. Week # 1 Serkan GENÇ

dtsearch Regular Expressions

Outline. CS 112 Introduction to Programming. Recap: HTML/CSS/Javascript. Admin. Outline

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration

Contents. 2 Alfresco API Version 1.0

CGI Programming. What is CGI?

<option> eggs </option> <option> cheese </option> </select> </p> </form>

Unit 6. Loop statements

Compiler Construction

Transcription:

CSE 154 LECTURE 11: REGULAR EXPRESSIONS

What is form validation? validation: ensuring that form's values are correct some types of validation: preventing blank values (email address) ensuring the type of values integer, real number, currency, phone number, Social Security number, postal address, email address, date, credit card number,... ensuring the format and range of values (ZIP code must be a 5-digit integer) ensuring that values fit together (user types email twice, and the two must match)

A real form that uses validation

Client vs. server-side validation Validation can be performed: client-side (before the form is submitted) can lead to a better user experience, but not secure (why not?) server-side (in PHP code, after the form is submitted) needed for truly secure validation, but slower both best mix of convenience and security, but requires most effort to program

An example form to be validated <form action="http://foo.com/foo.php" method="get"> <div> City: <input name="city" /> <br /> State: <input name="state" size="2" maxlength="2" /> <br /> ZIP: <input name="zip" size="5" maxlength="5" /> <br /> <input type="submit" /> </div> </form> HTML Let's validate this form's data on the server... output

Basic server-side validation $city = $_POST["city"]; $state = $_POST["state"]; $zip = $_POST["zip"]; if (!$city strlen($state)!= 2 strlen($zip)!= 5) { print "Error, invalid city/state/zip submitted."; } PHP basic idea: examine parameter values, and if they are bad, show an error message and abort. But: How do you test for integers vs. real numbers vs. strings? How do you test for a valid credit card number? How do you test that a person's name has a middle initial? (How do you test whether a given string matches a particular complex format?)

Regular expressions /^[a-za-z_\-]+@(([a-za-z_\-])+\.)+[a-za-z]{2,4}$/ regular expression ("regex"): a description of a pattern of text can test whether a string matches the expression's pattern can use a regex to search/replace characters in a string regular expressions are extremely powerful but tough to read (the above regular expression matches email addresses) regular expressions occur in many places: Java: Scanner, String's split method (CSE 143 sentence generator) supported by PHP, JavaScript, and other languages many text editors (TextPad) allow regexes in search/replace The site Rubular is useful for testing a regex.

Regular expressions This picture best describes regex.

Basic regular expressions /abc/ in PHP, regexes are strings that begin and end with / the simplest regexes simply match a particular substring the above regular expression matches any string containing "abc": YES: "abc", "abcdef", "defabc", ".=.abc.=.",... NO: "fedcba", "ab c", "PHP",...

Wildcards:. A dot. matches any character except a \n line break /.oo.y/ matches "Doocy", "goofy", "LooNy",... A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match /all/i matches Allison Obourn", small", JANE GOODALL",...

Special characters:, (), \ means OR /abc def g/ matches "abc", "def", or "g" There's no AND symbol. Why not? () are for grouping /(Homer Marge) Simpson/ matches "Homer Simpson" or "Marge Simpson" \ starts an escape sequence many characters must be escaped to match them literally: / \ $. [ ] ( ) ^ * +? /<br \/>/ matches lines containing <br /> tags

Quantifiers: *, +,? * means 0 or more occurrences /abc*/ matches "ab", "abc", "abcc", "abccc",... /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc",... /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz 9a",... + means 1 or more occurrences /Hi!+ there/ matches "Hi! there", "Hi!!! there",... /a(bc)+/ matches "abc", "abcbc", "abcbcbc",...? means 0 or 1 occurrences /a(bc)?/ matches "a" or "abc"

More quantifiers: {min,max} {min,max} means between min and max occurrences (inclusive) /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc" min or max may be omitted to specify any number {2,} means 2 or more {,6} means up to 6 {3} means exactly 3

Practice exercise When you search Google, it shows the number of pages of results as "o"s in the word "Google". What regex matches strings like "Google", "Gooogle", "Goooogle",...? (try it) (data) Answer: /Goo+gle/ (or /Go{2,}gle/)

Anchors: ^ and $ ^ represents the beginning of the string or line; $ represents the end /Jess/ matches all strings that contain Jess; /^Jess/ matches all strings that start with Jess; /Jess$/ matches all strings that end with Jess; /^Jess$/ matches the exact string "Jess" only /^Alli.*Obourn$/ matches AlliObourn", Allie Obourn", Allison E Obourn",... but NOT Allison Obourn stinks" or "I H8 Allison Obourn" (on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains that text)

Character sets: [] [] group characters into a character set; will match any single character from the set /[bcd]art/ matches strings containing "bart", "cart", and "dart" equivalent to /(b c d)art/ but shorter inside [], many of the modifier keys act as normal characters /what[!*?]*/ matches "what", "what!", "what?**!", "what??!",... What regular expression matches DNA (strings of A, C, G, or T)? /[ACGT]+/

Character ranges: [start-end] inside a character set, specify a range of characters with - /[a-z]/ matches any lowercase letter /[a-za-z0-9]/ matches any lower- or uppercase letter or digit an initial ^ inside a character set negates it /[^abcd]/ matches any character other than a, b, c, or d inside a character set, - must be escaped to be matched /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit

Practice Exercises What regular expression matches letter grades such as A, B+, or D-? (try it) (data) What regular expression would match UW Student ID numbers? (try it) (data) What regular expression would match a sequence of only consonants, assuming that the string consists only of lowercase letters? (try it) (data)

Escape sequences special escape sequence character sets: \d matches any digit (same as [0-9]); \D any non-digit ([^0-9]) \w matches any word character (same as [a-za-z_0-9]); \W any non-word char \s matches any whitespace character (, \t, \n, etc.); \S any non-whitespace What regular expression matches names in a "Last, First M." format with any number of spaces? /\w+,\s+\w+\s+\w\./

Regular expressions in PHP (PDF) regex syntax: strings that begin and end with /, such as "/[AEIOU]+/" function preg_match(regex, string) preg_replace(regex, replacement, string) preg_split(regex, string) description returns TRUE if string matches regex returns a new string with all substrings that match regex replaced by replacement returns an array of strings from given string broken apart using given regex as delimiter (like explode but more powerful)

PHP form validation w/ regexes $state = $_POST["state"]; if (!preg_match("/^[a-z]{2}$/", $state)) { print "Error, invalid state submitted."; } PHP preg_match and regexes help you to validate parameters sites often don't want to give a descriptive error message here (why?)

Regular expression PHP example # replace vowels with stars $str = "the quick brown fox"; $str = preg_replace("/[aeiou]/", "*", $str); # "th* q**ck br*wn f*x" # break apart into words $words = preg_split("/[ ]+/", $str); # ("th*", "q**ck", "br*wn", "f*x") # capitalize words that had 2+ consecutive vowels for ($i = 0; $i < count($words); $i++) { if (preg_match("/\\*{2,}/", $words[$i])) { $words[$i] = strtoupper($words[$i]); } } # ("th*", "Q**CK", "br*wn", "f*x") PHP

The die function die("error message text"); PHP PHP's die function prints a message and then completely stops code execution it is sometimes useful to have your page "die" on invalid input problem: poor user experience (a partial, invalid page is sent back)

The header function header("http header text"); # in general header("location: url"); # for browser redirection PHP PHP's header function can be used for several common HTTP messages sending back HTTP error codes (404 not found, 403 forbidden, etc.) redirecting from one page to another indicating content types, languages, caching policies, server info,... you can use a Location header to tell the browser to redirect itself to another page useful to redirect if the user makes a validation error must appear before any other HTML output generated by the script

Using header to redirect between pages header("location: url"); PHP $city = $_POST["city"]; $state = $_POST["state"]; $zip = $_POST["zip"]; if (!$city strlen($state)!= 2 strlen($zip)!= 5) { header("location: start-page.php"); # invalid input; redirect } PHP one problem: User is redirected back to original form without any clear error message or understanding of why the redirect occurred. (We can improve this later.)

Handling invalid data function check_valid($regex, $param) { if (preg_match($regex, $_POST[$param])) { return $_POST[$param]; } else { # code to run if the parameter is invalid die("bad $param"); } }... $sid = check_valid("/^[0-9]{7}$/", "studentid"); $section = check_valid("/^[ab][a-c]$/i", "section"); PHP Having a common helper function to check parameters is useful. If your page needs to show a particular HTML output on errors, the die function may not be appropriate.

Regular expressions in HTML forms How old are you? <input type="text" name="age" size="2" pattern="[0-9]+" title="an integer" /> <input type="submit" /> HTML output HTML5 adds a new pattern attribute to input elements the browser will refuse to submit the form unless the value matches the regex