Jerzy Nawrocki, Introduction to informatics



Similar documents
AWK: The Duct Tape of Computer Science Research. Tim Sherwood UC Santa Barbara

awk A UNIX tool to manipulate and generate formatted data

grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013

COS 333: Advanced Programming Techniques

Regular Languages and Finite Automata

Chapter 2: Elements of Java

C H A P T E R Regular Expressions regular expression

Introduction to Lex. General Description Input file Output file How matching is done Regular expressions Local names Using Lex

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Lecture 5. sed and awk

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

The programming language C. sws1 1

Compiler I: Syntax Analysis Human Thought

THEORY of COMPUTATION

Antlr ANother TutoRiaL

Scanner. tokens scanner parser IR. source code. errors

CISC 181 Project 3 Designing Classes for Bank Accounts

Command Scripts Running scripts: include and commands

Retrieving Data Using the SQL SELECT Statement. Copyright 2006, Oracle. All rights reserved.

Module 816. File Management in C. M. Campbell 1993 Deakin University

Reading 13 : Finite State Automata and Regular Expressions

Computer Science. General Education Students must complete the requirements shown in the General Education Requirements section of this catalog.

How To Program In Scheme (Prolog)

Lecture 2: Regular Languages [Fa 14]

PHP Tutorial From beginner to master

Lecture 18 Regular Expressions

Compiler Construction

Information Systems SQL. Nikolaj Popov

Turing Machines, Part I

Management Information Systems 260 Web Programming Fall 2006 (CRN: 42459)

Outline Basic concepts of Python language

Python Programming: An Introduction to Computer Science

Moving from CS 61A Scheme to CS 61B Java

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

Advanced Query for Query Developers

IV-1Working with Commands

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

Java Basics: Data Types, Variables, and Loops

AP Computer Science Static Methods, Strings, User Input

Introduction to Python

In This Lecture. SQL Data Definition SQL SQL. Notes. Non-Procedural Programming. Database Systems Lecture 5 Natasha Alechina

Informatica e Sistemi in Tempo Reale

Fast string matching

Sources: On the Web: Slides will be available on:

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Automata and Formal Languages

Fondamenti di C++ - Cay Horstmann 1

Introduction to Python

So far we have considered only numeric processing, i.e. processing of numeric data represented

Introduction to Microsoft Jet SQL

Biinterpretability up to double jump in the degrees

Visual Logic Instructions and Assignments

Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking

Regular Expression Syntax

B A S I C S C I E N C E S

THE DEGREES OF BI-HYPERHYPERIMMUNE SETS

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

CS106A, Stanford Handout #38. Strings and Chars

Forensic Analysis of Internet Explorer Activity Files

Oracle Database 11g SQL

Python Lists and Loops

Form Validation. Server-side Web Development and Programming. What to Validate. Error Prevention. Lecture 7: Input Validation and Error Handling

Introduction to Matlab

590.7 Network Security Lecture 2: Goals and Challenges of Security Engineering. Xiaowei Yang

Database Normalization. Mohua Sarkar, Ph.D Software Engineer California Pacific Medical Center

#1-12: Write the first 4 terms of the sequence. (Assume n begins with 1.)

HP-UX Essentials and Shell Programming Course Summary

ESPResSo Summer School 2012

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Boolean Algebra Part 1

Catalan Numbers. Thomas A. Dowling, Department of Mathematics, Ohio State Uni- versity.

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

HOMEWORK # 2 SOLUTIO

CAs and Turing Machines. The Basis for Universal Computation

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

A Typing System for an Optimizing Multiple-Backend Tcl Compiler

Finite Automata and Regular Languages

CSE 1223: Introduction to Computer Programming in Java Chapter 2 Java Fundamentals

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

C++ Language Tutorial

Section 1.4 Place Value Systems of Numeration in Other Bases

How To Use Excel With A Calculator

Using Static Program Analysis to Aid Intrusion Detection

Information Systems. Administered by the Department of Mathematical and Computing Sciences within the College of Arts and Sciences.

Udacity cs101: Building a Search Engine. Extracting a Link

System Calls Related to File Manipulation

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

PART-A Questions. 2. How does an enumerated statement differ from a typedef statement?

03 - Lexical Analysis

Compilers Lexical Analysis

Programming and Software Development CTAG Alignments

D a n s k e B a n k M e s s a g e I m p l e m e n t a t i o n G u i d e. M a s t e r C a r d C o r p o r a t e C a r d T r a n s a c t i o n s

This section describes how LabVIEW stores data in memory for controls, indicators, wires, and other objects.

Transcription:

Jerzy Nawrocki Faculty of Computing & Information Sci. Poznan University of Technology jerzy.nawrocki@put.poznan.pl File conversion problem FName:John SName:Great Salary 585 Text processing and AWK FName:Ann SName:Nice Salary 7 FName SName Salary John Great 585 Ann Nice 7 Text processing & AWK (2) File conversion problem File conversion problem #include <stdio.h> #include <stdlib.h> FILE *fin; char token[2]; char gettoken(void) {int i=; char c; do {c = getc(fin); if (c == EOF) return (EOF); } while (c <! ); Solution in C: 4 lines of code Text processing & AWK (3) Solution in AWK BEGIN {FS=": ";} NR == 1 {print $1, "\t", $3, "\t", $5;} {gsub(/,/, "., $6); print $2, "\t", $4, "\t", $6;} Text processing & AWK (4) Origins of AWK Authors of AWK Bell Labs, Murray Hill (New Jersey), Foto: http://en.wikipedia.org/wiki/bell_labs Bell Labs, New Jersey (USA), 1977 AWK: Aho, Weinberger, Kernighan Platforms: Unix, MS DOS/Windows Similarity to C Text processing & AWK (5) Alfred Aho http://www.underforty.us/geeks.html Peter Weinberger Brian Kernighan Text processing & AWK (6) Text processing and AWK 1

Aim of the lecture Agenda To present: Another programming paradigm (rule-based programming) Basics of AWK Fundamentals of AWK Simplest programs Patterns Variables Text processing & AWK (7) Text processing & AWK (8) Fundamental question Input file What is text? Field Jerzy Nawrocki 4389 I1 Jane Kowalski 4378 I2 Line Fields: $1, $2, $3,... Text processing & AWK (9) Text processing & AWK (1) Program structure Execution principle pattern1 {instruction1} pattern2 {instruction2}...... Processing rule Jerzy Nawrocki Jane Kowalski Adam Malinowski pattern1 {instruction1} pattern2 {instruction2}...... Text processing & AWK (11) Text processing & AWK (12) Text processing and AWK 2

Agenda Simplest programs Fundamentals of AWK Simplest programs Patterns Variables How many fields on the output? Jerzy Nawrocki 4389 I1 Jane Kowalski 4378 I2 $4== I1 { print $2, $1; } Nawrocki Jerzy Malinowski Adam Text processing & AWK (13) Text processing & AWK (14) Simplest programs Simplest programs How many fields on the output? Jerzy Nawrocki 4389 I1 Jane Kowalski 4378 I2 What field will be first? Jerzy Nawrocki 4389 I1 Jane Kowalski 4378 I2 $4== I1 { print $2, $1; } Jerzy Nawrocki 4389 I1 Nawrocki Jerzy Kowalski Jane Malinowski Adam Text processing & AWK (15) Text processing & AWK (16) Agenda Patterns Fundamentals of AWK Simplest programs Patterns Variables Begining and end of text Relations Compound patterns Range patterns Text processing & AWK (17) Text processing & AWK (18) Text processing and AWK 3

Begining and end of text Begining and end of text Jerzy Nawrocki Jane Kowalski BEGIN { print ----- ; } $4== I2 { print $2, $1; } END { print ***** ; } ----- Kowalski Jane ***** 4389 I1 4378 I2 Text processing & AWK (19) Jerzy Nawrocki Jane Kowalski END { print ***** ; } $4== I2 { print $2, $1; } BEGIN { print ----- ; } ----- Kowalski Jane ***** 4389 I1 4378 I2 Text processing & AWK (2) Relations Compound patterns 12 11 2 11 $1 > $2 12 11 or $1==1 $2==1 && and $1==1 && $2==1! not! $1==1 Text processing & AWK (21) Text processing & AWK (22) Compound patterns Agenda Jerzy Adam 4389 I1 Adam Kowalski 4378 I2 $4== I1 && $1== Adam { print $2, $1; } Fundamentals of AWK Simplest programs Patterns Variables Malinowski Adam Text processing & AWK (23) Text processing & AWK (24) Text processing and AWK 4

http://www.math.wisc.edu/~gpslogic/ Stephen Kleene 199-1-5, Connecticut, USA 1934: Dr, Princeton Univ., (Alonzo Church) 1935: Univ. of Wisconsin- Madison (USA) 1939-4: Inst. for Advanced Study, Princeton recursion theory 199: National Medal of Sci. 1994-1-25, Madison Text processing & AWK (25) Regular expressions Arithmetic expressions Value: Text Number Value(2 3 + 3) = 9 Regular expressions Value: Text SetOfCharacterStrings Value(/Ala Ola/) = {"Ala", "Ola"} Text processing & AWK (26) $, $1, $2,.. $, $1, $2,.. Begining String ~ /^ reg_exp / $1 ~ /^I$/ $1 ~ /^I/ Text processing & AWK (27) Text processing & AWK (28) $, $1, $2,.. $, $1, $2,.. Begining String ~ /^ reg_exp / End String ~ / reg_exp $/ $3 ~ /e$/ Begining String ~ /^ reg_exp / End String ~ / reg_exp $/ Substring String ~ / reg_exp / $3 ~ /e/ Text processing & AWK (29) Text processing & AWK (3) Text processing and AWK 5

$, $1, $2,.. $, $1, $2,.. Begining String ~ /^ reg_exp / End String ~ / reg_exp $/ Substring String ~ / reg_exp / $ ~ /ne/ Text processing & AWK (31) Begining String ~ /^ reg_exp / End String ~ / reg_exp $/ Substring String ~ / reg_exp / $ ~ / wyr_reg / = / wyr_reg / /ne/ Text processing & AWK (32) Special characters Special characters. Any character [ ] Set of characters \n New line \. Dot \ Quotation \ddd Character of octal code = ddd What does it mean? /^.$/ /[123456789]/ /[-9]/ Text processing & AWK (33) Text processing & AWK (34) Complement of a set of characters Disjunction (or) reg_exp reg_exp What s the difference? [^... ] /[^-9]/ /^[-9]/ /im in/ Text processing & AWK (35) Text processing & AWK (36) Text processing and AWK 6

Parentheses and disjunction Parentheses change precedence of operators: 3*(4 + 5) = 3* 9 = 27 Distributive law: 3*(4 + 5) = 3*4 + 3*5 = 12 + 15 = 27 (4 + 5)*3 = 4*3 + 5*3 = 12 +15 = 27 Select all lines for which the first field is a number. 1 2 3 AD 1984 $1 ~ /[-9]/ Distributivity of concatenation over disjunction: /i(m n)/ = /Lond(o y)n/ = /im in/ /London Londyn/ Text processing & AWK (37) Text processing & AWK (38) Select all lines for which the first field is a number. 1 2 3 4tune: AD 1984 2 cents Select all lines for which the first field is a 1-digit number. 1 2 3 4tune: 4tune: AD 1984 2 cents 32 cents $1 ~ /[-9]/ $1 ~ /^ [-9] $/ 1 2 3 4tune: 2 cents Text processing & AWK (39) Text processing & AWK (4) Select all lines for which the first field is a 1- or 2-digit number. 1 2 3 4tune: 4tune: AD1984 2 cents 32 cents Select all lines for which the first field is a 1- or 2-digit number. 1 2 3 4tune: 4tune: 4tune: AD 1984 2 cents 32 cents 32 cents 12 euro $1 ~ /^( [-9] [-9][-9] )$/ $1 ~ /^( [-9] [-9][-9] )$/ 1 digit 2 digits 1 2 3 2 cents 32 cents Text processing & AWK (41) Text processing & AWK (42) Text processing and AWK 7

$1 ~ /^( [-9] [-9][-9] [-9][-9][-9] )$/ Select all lines for which the first field is a number. 1 2 3 4tune: 4tune: 4tune: AD 1984 2 cents 32 cents 32 cents 12 euro 1 digit 2 digits 3 digits [-9] = [-9] 1 [-9][-9] = [-9] 2 [-9][-9][-9] = [-9] 3.................. [-9] 1 [-9] 2 [-9] 3... = [-9]+ Text processing & AWK (43) Text processing & AWK (44) $1 ~ /^ [-9]+ $/ Select all lines for which the first field is a number. 1 2 3 4tune: 4tune: 4tune: AD 1984 2 cents 32 cents 32 cents 12 euro 1 2 3 2 cents 32 cents 32 cents 12 euro Text processing & AWK (45) A non-empty sequence Can be a of w s. sequence of w s. w+ = w ww www... Empty string w* = w ww www... x = x x = x w+ = w w* = w* w x w* = x x w+ w*x = x w+x w( w ww www.. )= w ww www wwww.. Text processing & AWK (46) Riddle Riddle What s its meaning? What on the output? Payment 21/11 ================ Nawrocki 16 Antczak 359 $2 ~ /^[-9][-9]*$/ $2 ~ /^[-9]+$/ Text processing & AWK (47) Text processing & AWK (48) Text processing and AWK 8

Agenda Variables Fundamentals of AWK Simplest programs Patterns Variables Variables introduced by a programmer (type: string of characters; initial value: empty string / zero) Built-in variables (standard meaning) Field variables $1, $(i+j-1),.. Text processing & AWK (49) Text processing & AWK (5) Some built-in variables NF number of fields in a row NR row number FILENAME file name with input data NR NF total 1 2 2 1 2 3 Variables If you have a hammer, everything looks like a nail {total= total + NF;} END {print "Fields: ", total; print "Rows: ", NR;} Text processing & AWK (51) Text processing & AWK (52) Input file Summary Field Jerzy Nawrocki 4389 I1 Jane Kowalski 4378 I2 Line Fields: $1, $2, $3,... Text processing & AWK (53) Text processing & AWK (54) Text processing and AWK 9

Program structure Execution principle pattern1 {instruction1} pattern2 {instruction2}...... Processing rule Jerzy Nawrocki Jane Kowalski Adam Malinowski pattern1 {instruction1} pattern2 {instruction2}...... Text processing & AWK (55) Text processing & AWK (56) Regular expressions Summary (1)* = {ε, 1, 11, 111, } ε x = x ε L 1 L 2 = {xy: x L 1 y L 2 } L (r) = { ε } L n+1 (r) = L(r) L n (r) L n (r) = {xx x: x L(r) } n L(r*) = U L n (r) n= L(r*) = {ε,x, xx, xxx, : x L(r)} Text processing & AWK (57) At last! gawk -f prog.awk <in.txt >out.txt Other features of AWK: Compound instructions (if, while,..) Dynamic arrays Built-in functions (gsub,..) Text processing & AWK (58) Literature A. Aho, B. Kernighan, P. Weinberger, The AWK Programming Language, Addison-Wesley, Reading, 1988. J. Nawrocki, W. Complak, Wprowadzenie do przetwarzania tekstów w języku AWK, Pro Dialog 2 (1994), 23-46. J. Cybulka, B. Jankowska, J.R. Nawrocki, Automatyczne przetwarzanie tekstów. AWK, Lex i YACC, Nakom, Poznań, 22. Text processing & AWK (59) Thank you for your attention! Text processing & AWK (6) Text processing and AWK 1