JAVA - REGULAR EXPRESSIONS



Similar documents
Content of this lecture. Regular Expressions in Java. Hello, world! In Java. Programming in Java

Lecture 18 Regular Expressions

Programming Languages CIS 443

Regular Expression Syntax

CSE 1223: Introduction to Computer Programming in Java Chapter 2 Java Fundamentals

public static void main(string[] args) { System.out.println("hello, world"); } }

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

JAVA - QUICK GUIDE. Java SE is freely available from the link Download Java. So you download a version based on your operating system.

CS 106 Introduction to Computer Science I

java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner

Regular Expression Searching

J a v a Quiz (Unit 3, Test 0 Practice)

Scanner. It takes input and splits it into a sequence of tokens. A token is a group of characters which form some unit.

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

Using Regular Expressions in Oracle

1) Which of the following is a constant, according to Java naming conventions? a. PI b. Test c. x d. radius

Moving from CS 61A Scheme to CS 61B Java

Using PRX to Search and Replace Patterns in Text Strings

Install Java Development Kit (JDK) 1.8

Regular Expressions. The Complete Tutorial. Jan Goyvaerts

qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq

Sample CSE8A midterm Multiple Choice (circle one)

Kiwi Log Viewer. A Freeware Log Viewer for Windows. by SolarWinds, Inc.

WA2099 Introduction to Java using RAD 8.0 EVALUATION ONLY. Student Labs. Web Age Solutions Inc.

Classes and Objects in Java Constructors. In creating objects of the type Fraction, we have used statements similar to the following:

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

University Convocation. IT 3203 Introduction to Web Development. Pattern Matching. Why Match Patterns? The Search Method. The Replace Method

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Variables, Constants, and Data Types

Regular Expressions for Perl, C, PHP, Python, Java, and.net. Regular Expression. Pocket Reference. Tony Stubblebine

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

JAVA - INHERITANCE. extends is the keyword used to inherit the properties of a class. Below given is the syntax of extends keyword.

Lecture 5: Java Fundamentals III

C H A P T E R Regular Expressions regular expression

Introduction to Java

Sources: On the Web: Slides will be available on:

JDK 1.5 Updates for Introduction to Java Programming with SUN ONE Studio 4

Comp 248 Introduction to Programming

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Compiler Construction

Basic Java Constructs and Data Types Nuts and Bolts. Looking into Specific Differences and Enhancements in Java compared to C

Lecture 4. Regular Expressions grep and sed intro

Chapter 2 Basics of Scanning and Conventional Programming in Java

Pemrograman Dasar. Basic Elements Of Java

A Grammar for the C- Programming Language (Version S16) March 12, 2016

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Java Basics: Data Types, Variables, and Loops

Topic 11 Scanner object, conditional execution

JAVA - METHODS. Method definition consists of a method header and a method body. The same is shown below:

CS106A, Stanford Handout #38. Strings and Chars

Debugging. Common Semantic Errors ESE112. Java Library. It is highly unlikely that you will write code that will work on the first go

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Unit 6. Loop statements

So far we have considered only numeric processing, i.e. processing of numeric data represented

Chapter 2 Introduction to Java programming

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

Object Oriented Software Design

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Introduction to Object-Oriented Programming

Computer Programming C++ Classes and Objects 15 th Lecture

Introduction to Java. CS 3: Computer Programming in Java

Python Lists and Loops

Contents. 9-1 Copyright (c) N. Afshartous

How to Write a Simple Makefile

Handout 3 cs180 - Programming Fundamentals Spring 15 Page 1 of 6. Handout 3. Strings and String Class. Input/Output with JOptionPane.

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

JAVA - MULTITHREADING

The first program: Little Crab

03 - Lexical Analysis

Statements and Control Flow

Object Oriented Software Design

Introduction to Python

Building Java Programs

Java String Methods in Print Templates

LAB4 Making Classes and Objects

ECE 122. Engineering Problem Solving with Java

Problem 1. CS 61b Summer 2005 Homework #2 Due July 5th at the beginning of class

The following program is aiming to extract from a simple text file an analysis of the content such as:

2 The first program: Little Crab

JAVA - EXCEPTIONS. An exception can occur for many different reasons, below given are some scenarios where exception occurs.

Building Java Programs

Regular Expressions. In This Appendix

Object-Oriented Programming in Java

Third AP Edition. Object-Oriented Programming and Data Structures. Maria Litvin. Gary Litvin. Phillips Academy, Andover, Massachusetts

6.1. Example: A Tip Calculator 6-1

The Java Series. Java Essentials I What is Java? Basic Language Constructs. Java Essentials I. What is Java?. Basic Language Constructs Slide 1

COSC Introduction to Computer Science I Section A, Summer Question Out of Mark A Total 16. B-1 7 B-2 4 B-3 4 B-4 4 B Total 19

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Being Regular with Regular Expressions. John Garmany Session

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

Arrays. Introduction. Chapter 7

Regular Expression. n What is Regex? n Meta characters. n Pattern matching. n Functions in re module. n Usage of regex object. n String substitution

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

JAVA - OBJECT & CLASSES

CSCI 3136 Principles of Programming Languages

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

CompSci 125 Lecture 08. Chapter 5: Conditional Statements Chapter 4: return Statement

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Section 6 Spring 2013

Secrets of printf. 1 Background. 2 Simple Printing. Professor Don Colton. Brigham Young University Hawaii. 2.1 Naturally Special Characters

Transcription:

JAVA - REGULAR EXPRESSIONS http://www.tutorialspoint.com/java/java_regular_expressions.htm Copyright tutorialspoint.com Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn. A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data. The java.util.regex package primarily consists of the following three classes: Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument. Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object. PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern. Capturing Groups: Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression dog creates a single group containing the letters "d", "o", and "g". Capturing groups are numbered by counting their opening parentheses from left to right. In the expression (AB(C)), for example, there are four such groups: (AB(C)) A B(C) C To find out how many groups are present in the expression, call the groupcount method on a matcher object. The groupcount method returns an int showing the number of capturing groups present in the matcher's pattern. There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupcount. Example: Following example illustrates how to find a digit string from the given alphanumeric string: public static void main( String args[] ) // String to be scanned to find the pattern. String line = "This order was placed for QT3000! OK?"; String pattern = "(.*)(\\d+)(.*)"; // Create a Pattern object

Pattern r = Pattern.compile(pattern); // Now create matcher object. Matcher m = r.matcher(line); if (m.find( )) System.out.println("Found value: " + m.group(0) ); System.out.println("Found value: " + m.group(1) ); System.out.println("Found value: " + m.group(2) ); else System.out.println("NO MATCH"); Found value: This order was placed for QT3000! OK? Found value: This order was placed for QT300 Found value: 0 Regular Expression Syntax: Here is the table listing down all the regular expression metacharacter syntax available in Java: Subexpression ^ Matches Matches beginning of line. $ Matches end of line.. Matches any single character except newline. Using m option allows it to match newline as well. [...] Matches any single character in brackets. [^...] Matches any single character not in brackets \A Beginning of entire string \z End of entire string \Z End of entire string except allowable final line terminator. re* re+ re? re n re n, re n, m Matches 0 or more occurrences of preceding expression. Matches 1 or more of the previous thing Matches 0 or 1 occurrence of preceding expression. Matches exactly n number of occurrences of preceding expression. Matches n or more occurrences of preceding expression. Matches at least n and at most m occurrences of preceding expression. a b Matches either a or b. re Groups regular expressions and remembers matched text.? : re Groups regular expressions without remembering matched text.? > re Matches independent pattern without backtracking. \w Matches word characters. \W Matches nonword characters.

\s Matches whitespace. Equivalent to [\t\n\r\f]. \S Matches nonwhitespace. \d Matches digits. Equivalent to [0-9]. \D Matches nondigits. \A Matches beginning of string. \Z Matches end of string. If a newline exists, it matches just before newline. \z Matches end of string. \G Matches point where last match finished. \n Back-reference to capture group number "n" \b Matches word boundaries when outside brackets. Matches backspace 0x08 when inside brackets. \B Matches nonword boundaries. \n, \t, etc. Matches newlines, carriage returns, tabs, etc. \Q Escape quote all characters up to \E \E Ends quoting begun with \Q Methods of the Matcher Class: Here is a list of useful instance methods: Index Methods: Index methods provide useful index values that show precisely where the match was found in the input string: 1 public int start Returns the start index of the previous match. 2 public int startintgroup Returns the start index of the subsequence captured by the given group during the previous match operation. 3 public int end Returns the offset after the last character matched. 4 public int endintgroup Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. Study Methods:

Study methods review the input string and return a Boolean indicating whether or not the pattern is found: 1 public boolean lookingat Attempts to match the input sequence, starting at the beginning of the region, against the pattern. 2 public boolean find Attempts to find the next subsequence of the input sequence that matches the pattern. 3 public boolean findintstart Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. 4 public boolean matches Attempts to match the entire region against the pattern. Replacement Methods: Replacement methods are useful methods for replacing text in an input string: 1 public Matcher appendreplacementstringbuffersb, Stringreplacement Implements a non-terminal append-and-replace step. 2 public StringBuffer appendtailstringbuffersb Implements a terminal append-and-replace step. 3 public String replaceallstringreplacement Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. 4 public String replacefirststringreplacement Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. 5 public static String quotereplacementstrings Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendreplacement method of the Matcher class. The start and end Methods:

Following is the example that counts the number of times the word "cat" appears in the input string: private static final String REGEX = "\\bcat\\b"; private static final String INPUT = "cat cat cat cattie cat"; public static void main( String args[] ) Pattern p = Pattern.compile(REGEX); Matcher m = p.matcher(input); // get a matcher object int count = 0; while(m.find()) count++; System.out.println("Match number "+count); System.out.println("start(): "+m.start()); System.out.println("end(): "+m.end()); atch number 1 start(): 0 end(): 3 atch number 2 start(): 4 end(): 7 atch number 3 start(): 8 end(): 11 atch number 4 start(): 19 end(): 22 You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred. The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last character matched, plus one. The matches and lookingat Methods: The matches and lookingat methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingat does not. Both methods always start at the beginning of the input string. Here is the example explaining the functionality: private static final String REGEX = "foo"; private static final String INPUT = "fooooooooooooooooo"; private static Pattern pattern; private static Matcher matcher;

public static void main( String args[] ) pattern = Pattern.compile(REGEX); matcher = pattern.matcher(input); System.out.println("Current REGEX is: "+REGEX); System.out.println("Current INPUT is: "+INPUT); System.out.println("lookingAt(): "+matcher.lookingat()); System.out.println("matches(): "+matcher.matches()); Current REGEX is: foo Current INPUT is: fooooooooooooooooo lookingat(): true matches(): false The replacefirst and replaceall Methods: The replacefirst and replaceall methods replace text that matches a given regular expression. As their names indicate, replacefirst replaces the first occurrence, and replaceall replaces all occurrences. Here is the example explaining the functionality: private static String REGEX = "dog"; private static String INPUT = "The dog says meow. " + "All dogs say meow."; private static String REPLACE = "cat"; public static void main(string[] args) Pattern p = Pattern.compile(REGEX); // get a matcher object Matcher m = p.matcher(input); INPUT = m.replaceall(replace); System.out.println(INPUT); The cat says meow. All cats say meow. The appendreplacement and appendtail Methods: The Matcher class also provides appendreplacement and appendtail methods for text replacement. Here is the example explaining the functionality: private static String REGEX = "a* b"; private static String INPUT = "aabfooaabfooabfoob"; private static String REPLACE = "-"; public static void main(string[] args)

Pattern p = Pattern.compile(REGEX); // get a matcher object Matcher m = p.matcher(input); StringBuffer sb = new StringBuffer(); while(m.find()) m.appendreplacement(sb,replace); m.appendtail(sb); System.out.println(sb.toString()); -foo-foo-foo- PatternSyntaxException Class Methods: A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong: 1 public String getdescription Retrieves the description of the error. 2 public int getindex Retrieves the error index. 3 public String getpattern Retrieves the erroneous regular expression pattern. 4 public String getmessage Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. Loading [MathJax]/jax/output/HTML-CSS/jax.js