Using Regular Expressions in Oracle




Most of us work with string functions in SQL every day, whether for truncating a string, searching for a substring, or locating special characters. The regexp functions available in Oracle 10g can help us achieve these tasks in a simpler and faster way. I have tried to illustrate the behavior of the regexp functions with common patterns and a description of each. This document mainly focuses on the usage of patterns; the patterns can be used with any of the regular expression functions. A short extract from the Oracle documentation is also included for reference. I hope it is good enough to start with.

Okay, let's start with our first case.

1. Validate a string for alphabets only

    regexp_like('Google','^[[:alpha:]]{6}$')

In the above example we checked whether the source string consists of exactly 6 alphabetic characters, and hence we got a match for Google. Now let's try to understand the pattern '^[[:alpha:]]{6}$':

    ^            marks the start of the string
    [[:alpha:]]  specifies the alphabetic character class
    {6}          specifies the number of alphabetic characters
    $            marks the end of the string

2. Validate a string for lower case alphabets only

    regexp_like('Terminator','^([[:lower:]]{3,12})$')

Output: No Match Found

If we change the input to terminator (all lower case), we get a match. In the above example we checked whether the source string contains only lower case letters, with a string length between a minimum of 3 and a maximum of 12. The pattern is similar to the one above, except for {3,12}:

    {3,12}       specifies at least 3 letters but not more than 12
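The two validation checks above can be tried side by side in one query. This is only a minimal sketch: REGEXP_LIKE is a condition rather than a function that returns a value, so it is wrapped in a CASE expression here. The inline view name samples and the column names txt, six_letters and lower_3_to_12 are made up for illustration.

```sql
-- Hypothetical sample rows; "samples" and "txt" are illustrative names only.
WITH samples AS (
  SELECT 'Google'     AS txt FROM dual UNION ALL
  SELECT 'Terminator'        FROM dual UNION ALL
  SELECT 'terminator'        FROM dual
)
SELECT txt,
       -- Case 1: exactly six alphabetic characters
       CASE WHEN REGEXP_LIKE(txt, '^[[:alpha:]]{6}$')
            THEN 'Match Found' ELSE 'No Match Found'
       END AS six_letters,
       -- Case 2: only lower case letters, 3 to 12 of them
       CASE WHEN REGEXP_LIKE(txt, '^([[:lower:]]{3,12})$')
            THEN 'Match Found' ELSE 'No Match Found'
       END AS lower_3_to_12
FROM samples;
```

With the default case-sensitive session settings, only Google should pass the first check and only the all-lower-case terminator should pass the second.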

3. Case Sensitive Search

    regexp_like('Republic Of India','of','c')

Output: No Match Found

c stands for case sensitive. In the above example we searched for the pattern of with an exact case match, and hence we didn't find a match. Let's now search for of in the source string irrespective of case:

    regexp_like('Republic Of India','of','i')

i stands for case insensitive.

4. Match nth character

    regexp_like('ter*minator','^...[^[:alnum:]]')

In the above example we searched for a special character at the 4th position of the input string ter*minator. Let's now try to understand the pattern '^...[^[:alnum:]]':

    ^              marks the start of the string
    .              a dot signifies any character (3 dots specify any three characters)
    [^[:alnum:]]   non-alphanumeric characters (^ inside brackets specifies negation)

Note: The $ is missing from the pattern. That is because we are not concerned with anything beyond the 4th character, and hence we need not mark the end of the string. (^...[^[:alnum:]]$ would mean any three characters followed by a special character, with no characters beyond the special character.)

5. Search and replace white spaces

    select regexp_replace('Help   Earth   Stay   Green','[[:blank:]]{2,8}',' ') from dual;

Output: Help Earth Stay Green

In the above example the source string contains runs of several spaces between the words, and we replaced each run of multiple white spaces (2 to 8) with a single space. Let's now try to understand the pattern [[:blank:]]{2,8}:

    [[:blank:]]  specifies the blank character class
    {2,8}        specifies at least 2 blanks but not more than 8

Now that you are familiar with the patterns, you may be wondering about the missing ^ and $ anchors. These have been skipped on purpose. We wanted to search for multiple spaces anywhere in the source string, so we need not specify the start (^) and end ($) anchors.

Note: ^[[:blank:]]{2,8}$ would mean a string consisting of a minimum of 2 and a maximum of 8 spaces and no other characters, and we won't find a match for the above input.

6. Search for control characters

    regexp_like('Super' || chr(13) || 'Star','[[:cntrl:]]')

In the above example we searched for the presence of a carriage return in our source string (13 is the ASCII value for carriage return).

7. Extract sub strings from source string

i)
    select regexp_substr('1PSN/231_3253/ABc','[[:alnum:]]+') from dual;

Output: 1PSN

    [[:alnum:]]+   one or more alphanumeric characters (the + sign stands for one or more occurrences)

Note: I didn't include the ^ and $ anchors, as I wanted to search for the matching pattern anywhere in the source string. By default the first matching pattern is returned.

ii)
    select regexp_substr('1PSN/231_3253/ABc','[[:alnum:]]+',1,2) from dual;

Output: 231

Note the extra parameters in the expression compared to the first example.

    1   specifies that the search starts from the first character in the source string
    2   specifies the second occurrence of the matching pattern, which is 231 in the source string 1PSN/231_3253/ABc

iii)
    select regexp_substr('@@/231_3253/ABc','@*[[:alnum:]]+') from dual;

Output: 231

    @*             zero or more occurrences of @ (* stands for zero or more occurrences)
    [[:alnum:]]+   followed by one or more occurrences of alphanumeric characters

Note: In the above example Oracle looks for @ (zero or more times) immediately followed by alphanumeric characters. Since a '/' comes between @@ and 231, the match is zero occurrences of @ plus one or more occurrences of alphanumeric characters.

iv)
    select regexp_substr('1@/231_3253/ABc','@+[[:alnum:]]*') from dual;

Output: @

    @+             one or more occurrences of @
    [[:alnum:]]*   followed by zero or more occurrences of alphanumeric characters

v)
    select regexp_substr('1@/231_3253/ABc','@+[[:alnum:]]+') from dual;

Output: Null

    @+             one or more occurrences of @
    [[:alnum:]]+   followed by one or more occurrences of alphanumeric characters

In the above example there is no matching pattern in the source string, and hence the output is null.

vi)
    select regexp_substr('@1PSN/231_3253/ABc125','[[:digit:]]+$') from dual;

Output: 125

    [[:digit:]]+   one or more occurrences of digits only
    $              at the end of the string

In the above example we extracted the digits at the end of the source string.

vii)
    select regexp_substr('@1PSN/231_3253/ABc','[^[:digit:]]+$') from dual;

Output: /ABc

    [^[:digit:]]+$   one or more occurrences of non-digit characters at the end of the string

Note: ^ inside square brackets marks the negation of the class.

viii)
    select regexp_substr('Tom_Kyte@oracle.com','[^@]+') from dual;

Output: Tom_Kyte

    [^@]+   one or more occurrences of characters which are not @

The above example extracts the name part from an email address.

ix)
    select regexp_substr('1PSN/231_3253/ABc','[[:alnum:]]*',1,2) from dual;

Output: Null

    [[:alnum:]]*   zero or more alphanumeric characters

We asked for the second occurrence of alphanumeric characters in the source string, which we might expect to be 231. Isn't the output misleading? In fact it isn't. Look carefully at the pattern: it asks for the second occurrence of zero or more alphanumeric characters. The word "or" is stressed here. 1PSN is followed by a /. At the / the pattern matches zero alphanumeric characters, and that empty match counts as the second occurrence; hence the output is Null.

Now let's move to our next case.

8. Validate email

    case when REGEXP_LIKE('Tom_Kyte@oracle.com', '^([[:alnum:]]+(_?|\.))[[:alnum:]]*@[[:alnum:]]+(\.([[:alnum:]]+)){1,2}$') then 'Match Found' else 'No Match Found' end

Let's try to understand the pattern ^([[:alnum:]]+(_?|\.))[[:alnum:]]*@[[:alnum:]]+(\.([[:alnum:]]+)){1,2}$ in parts. You are aware that ^ marks the start of the string. Now let's break ([[:alnum:]]+(_?|\.)) into two parts, [[:alnum:]]+ and (_?|\.):

    [[:alnum:]]+   one or more occurrences of alphanumeric characters
    (_?|\.)        zero or one occurrence of underscore, or a dot

The ? symbol here stands for zero or one occurrence, and the | symbol represents OR. The expression (_?|\.) is a sub expression which, as a whole, represents an optional underscore or dot. The parentheses group expressions together into one sub expression. So ([[:alnum:]]+(_?|\.)) as a whole represents one or more occurrences of alphanumeric characters optionally followed by an underscore or dot.

Now that we have understood sub expressions, we can move ahead with the rest of the pattern:

    [[:alnum:]]*              followed by zero or more occurrences of alphanumeric characters
    @                         followed by @
    [[:alnum:]]+              followed by one or more occurrences of alphanumeric characters
    (\.([[:alnum:]]+)){1,2}   followed by the sub expression "dot followed by one or more alphanumeric characters", from one to a maximum of two occurrences (example: .com or .co.in)
    $                         marks the end of the string

Input: tom.kyte@oracle.com
Output: Match Found

Input: tom-kyte@oracle.co.uk
Output: No Match Found
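The email check can be run against several addresses at once. The query below is a sketch under two assumptions: the sub expression is written with the alternation bar, (_?|\.), and the inline view name emails and the column names addr and result are hypothetical.

```sql
-- Hypothetical sample addresses taken from the inputs discussed above.
WITH emails AS (
  SELECT 'Tom_Kyte@oracle.com'   AS addr FROM dual UNION ALL
  SELECT 'tom.kyte@oracle.com'           FROM dual UNION ALL
  SELECT 'tom-kyte@oracle.co.uk'         FROM dual
)
SELECT addr,
       CASE
         WHEN REGEXP_LIKE(
                addr,
                -- name part, optional _ or ., more name, @, domain, 1-2 dotted suffixes
                '^([[:alnum:]]+(_?|\.))[[:alnum:]]*@[[:alnum:]]+(\.([[:alnum:]]+)){1,2}$')
         THEN 'Match Found'
         ELSE 'No Match Found'
       END AS result
FROM emails;
```

The hyphenated address should be the only one rejected, because a hyphen is neither alphanumeric nor one of the two separators the pattern allows. Real-world addresses are more varied than this pattern admits, so treat it as a teaching example rather than a complete validator.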

Note: Did you notice the backslash before the dot? The backslash is used here as the escape character for the dot. Take extra care while testing patterns that include literal dots, as a dot (.) alone stands for any character.

9. Validate SSN

    regexp_like('987-65-4321','^[0-9]{3}-[0-9]{2}-[0-9]{4}$')

Input: 987-654-3210
Output: No Match Found

    ^          start of the string
    [0-9]{3}   three occurrences of digits from 0-9
    -          followed by a hyphen
    [0-9]{2}   two occurrences of digits from 0-9
    -          followed by a hyphen
    [0-9]{4}   four occurrences of digits from 0-9
    $          end of the string

The above pattern can also be used to validate phone numbers with a little customization.

10. Consecutive Occurrences

i) Let's search for two consecutive occurrences of the same letter from a-z in the following example.

    select regexp_substr('Republicc Of Africaa','([a-z])\1', 1,1,'i') from dual;

Output: cc

    ([a-z])   sub expression matching one character in the set a-z
    \1        back reference to whatever the first sub expression matched, i.e. the same character occurring again
    1         start from the 1st character in the string
    1         first occurrence
    i         case insensitive

ii) Now let's search for three consecutive occurrences of the same digit from 6 to 9 in the following example.

    regexp_like('Patch 10888 applied','([6-9])\1\1')
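To see the back reference at work, the occurrence argument can be varied. This is a small sketch using the same source strings; the column aliases first_pair, second_pair and triple_digit are illustrative names only.

```sql
SELECT
  -- first doubled letter, case insensitive
  REGEXP_SUBSTR('Republicc Of Africaa', '([a-z])\1', 1, 1, 'i') AS first_pair,
  -- second doubled letter: the search resumes after the first match
  REGEXP_SUBSTR('Republicc Of Africaa', '([a-z])\1', 1, 2, 'i') AS second_pair,
  -- the same digit from 6 to 9 three times in a row
  REGEXP_SUBSTR('Patch 10888 applied', '([6-9])\1\1')           AS triple_digit
FROM dual;
```

The three columns should come back as cc, aa and 888. Note that ([a-z])\1 is not the same as [a-z]{2}: the latter would accept any two letters, while the back reference insists that the second character repeats the first.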

Note: I didn't specify the position or occurrence in the above SQL. By default, Oracle searches for the first occurrence of the matching pattern from the start of the string and stops when it encounters a match. If no match is found, it stops at the end of the string.

11. Formatting Strings

    SELECT REGEXP_REPLACE('04099661234', '([[:digit:]]{3})([[:digit:]]{4})([[:digit:]]{4})', '(\1) \2-\3') as Formatted_Phone FROM dual;

Output: (040) 9966-1234

We formatted a phone number in the above example. Let's understand the match pattern and the replacement string. Our match pattern is ([[:digit:]]{3})([[:digit:]]{4})([[:digit:]]{4}):

    ([[:digit:]]{3})   3 digits
    ([[:digit:]]{4})   followed by 4 digits
    ([[:digit:]]{4})   followed by 4 digits

Why did I group the digits into sub expressions using parentheses? I could have simply searched for [[:digit:]]{11}, as my input string consists of 11 digits only. To understand the reason, let's look at the replacement string (\1) \2-\3:

    (    a literal opening parenthesis
    \1   the first sub expression in our match pattern, i.e. 040
    )    a literal closing parenthesis
    \2   the second sub expression, i.e. 9966
    -    a literal hyphen
    \3   the third sub expression, i.e. 1234

We'll see some more formatting in our next example.

    SELECT REGEXP_REPLACE('04099661234', '^([[:digit:]]{1})([[:digit:]]{2})([[:digit:]]{4})([[:digit:]]{4})$', '+91-\2-\3-\4') as Formatted_Phone FROM dual;

Output: +91-40-9966-1234

In the next example, we insert a space after every character.

    SELECT REGEXP_REPLACE('YAHOO', '(.)', '\1 ') as Output FROM dual;

Output: Y A H O O (a trailing space follows the final O)

12. Some more patterns

In the example below let's look for http:// followed by a substring of one or more alphanumeric characters and, optionally, a period (.).

    SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
    FROM dual;

Output: http://www.oracle.com/

Input: Go to http://www.oracle.co.uk/products and click on database
Output: http://www.oracle.co.uk/

    http://             the characters http://
    ([[:alnum:]]+\.?)   the sub expression: one or more occurrences of alphanumeric characters, optionally followed by a dot
    {3,4}               minimum of 3 and maximum of 4 occurrences of the above sub expression
    /?                  optionally followed by a forward slash (which is why the slash after the host name is part of the output)

Let's now extract the third value from a csv string.

    select regexp_substr('1101,Yokohama,Japan,1.5.105','[^,]+',1,3) as Output from dual;

Output: Japan

    [^,]+   one or more occurrences of non-comma characters
    1       specifies the starting position
    3       third match

Let us assume we have the source string "Why does a kid enjoy kidding with kids only?" and we want to search for either kid, kids or kidding in it.

    regexp_like('Why does a kid enjoy kidding with kids only?','kid(s|ding)*', 'i')

    kid         the characters kid
    (s|ding)*   followed by zero or more occurrences of s or ding (| stands for OR)
    i           case insensitive

I have tried to cover a few common patterns that we may find useful in our work. For more examples and information on regular expressions you may visit the following links.

1. http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
2. http://www.psoug.org/reference/regexp.html

Oracle Documentation

Available Regular Expression Functions:

    Function Name    Description
    REGEXP_LIKE      Similar to the LIKE operator, but performs regular expression matching instead of simple pattern matching
    REGEXP_INSTR     Searches a given string for a regular expression pattern and returns the position where the match is found
    REGEXP_REPLACE   Searches for a regular expression pattern and replaces it with a replacement string
    REGEXP_SUBSTR    Searches for a regular expression pattern within a given string and returns the matched substring

Syntax

    REGEXP_LIKE(srcstr, pattern [, match_option])
    REGEXP_INSTR(srcstr, pattern [, position [, occurrence [, return_option [, match_option]]]])
    REGEXP_SUBSTR(srcstr, pattern [, position [, occurrence [, match_option]]])
    REGEXP_REPLACE(srcstr, pattern [, replacestr [, position [, occurrence [, match_option]]]])

Metacharacters

You can use several predefined metacharacter symbols in the pattern matching with the functions.

    Symbol   Description
    *        Matches zero or more occurrences
    |        Alternation operator for specifying alternative matches
    ^/$      Matches the start of line and the end of line
    []       Bracket expression for a matching list, matching any one of the expressions represented in the list
    [^exp]   If the caret is inside the bracket, it negates the expression
    {m}      Matches exactly m times
    {m,n}    Matches at least m times but no more than n times
    [: :]    Specifies a character class and matches any character in that class
    \        Can have four different meanings: (1) stand for itself; (2) quote the next character; (3) introduce an operator; (4) do nothing
    +        Matches one or more occurrences
    ?        Matches zero or one occurrence
    .        Matches any character in the supported character set (except NULL)
    ()       Grouping expression (treated as a single subexpression)
    \n       Backreference expression
    [==]     Specifies equivalence classes
    [..]     Specifies one collation element (such as a multicharacter element)

Match Options

    Option   Description
    c        Case sensitive matching
    i        Case insensitive matching

Character Classes

    Option      Description
    [:alnum:]   Alphanumeric characters
    [:alpha:]   Alphabetic characters
    [:blank:]   Blank space characters
    [:cntrl:]   Control characters (nonprinting)
    [:digit:]   Numeric digits
    [:lower:]   Lowercase alphabetic characters
    [:space:]   Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
    [:upper:]   Uppercase alphabetic characters
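REGEXP_INSTR appears in the function table but is not used in the cases above. The following sketch, based only on the syntax listed in the table, shows how its position, occurrence and return_option arguments behave on source strings from the earlier cases. The column aliases are illustrative names, and the comments assume a return_option of 0 gives the position where the match starts and 1 gives the position just after it.

```sql
SELECT
  -- position of the first non-alphanumeric character (the * at position 4)
  REGEXP_INSTR('ter*minator', '[^[:alnum:]]')                AS special_pos,
  -- start of the second run of digits (231 begins at position 6)
  REGEXP_INSTR('1PSN/231_3253/ABc', '[[:digit:]]+', 1, 2)    AS second_digits_start,
  -- return_option 1: position of the character just after that run (position 9)
  REGEXP_INSTR('1PSN/231_3253/ABc', '[[:digit:]]+', 1, 2, 1) AS second_digits_end
FROM dual;
```

Pairing the two return options like this is one way to obtain both ends of a match, for example to feed them to SUBSTR.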