The l3regex package: regular expressions in TEX
|
|
|
- Virginia Helen Robbins
- 9 years ago
- Views:
Transcription
1 The l3regex package: regular expressions in TEX The L A TEX3 Project Released 2015/11/04 1 l3regex documentation The l3regex package provides regular expression testing, extraction of submatches, splitting, and replacement, all acting on token lists. The syntax of regular expressions is mostly a subset of the pcre syntax (and very close to posix), with some additions due to the fact that TEX manipulates tokens rather than characters. For performance reasons, only a limited set of features are implemented. Notably, back-references are not supported. Let us give a few examples. After \tl_set:nn \l_my_tl { That~cat. } \regex_replace_once:nnn { at } { is } \l_my_tl the token list variable \l_my_tl holds the text This cat., where the first occurrence of at was replaced by is. A more complicated example is a pattern to add a comma at the end of each word: \regex_replace_all:nnn { \w+ } { \0, } \l_my_tl The \w sequence represents any word character, and + indicates that the \w sequence should be repeated as many times as possible (at least once), hence matching a word in the input token list. In the replacement text, \0 denotes the full match (here, a word). If a regular expression is to be used several times, it can be compiled once, and stored in a regex variable using \regex_const:nn. For example, \regex_const:nn \c_foo_regex { \c{begin} \cb. (\c[^be].*) \ce. } stores in \c_foo_regex a regular expression which matches the starting marker for an environment: \begin, followed by a begin-group token (\cb.), then any number of tokens which are neither begin-group nor end-group character tokens (\c[^be].*), ending with an end-group token (\ce.). As explained in the next section, the parentheses capture the result of \c[^be].*, giving us access to the name of the environment when doing replacements. This file describes v6224, last revised 2015/11/04. [email protected] 1
2 1.1 Syntax of regular expressions Most characters match exactly themselves, with an arbitrary category code. Some characters are special and must be escaped with a backslash (e.g., \* matches a star character). Some escape sequences of the form backslash letter also have a special meaning (for instance \d matches any digit). As a rule, every alphanumeric character (A Z, a z, 0 9) matches exactly itself, and should not be escaped, because \A, \B,... have special meanings; non-alphanumeric printable ascii characters can (and should) always be escaped: many of them have special meanings (e.g., use \(, \), \?, \.); spaces should always be escaped (even in character classes); any other character may be escaped or not, without any effect: both versions will match exactly that character. Note that these rules play nicely with the fact that many non-alphanumeric characters are difficult to input into TEX under normal category codes. For instance, \\abc\% matches the characters \abc% (with arbitrary category codes), but does not match the control sequence \abc followed by a percent character. Matching control sequences can be done using the \c{ regex } syntax (see below). Any special character which appears at a place where its special behaviour cannot apply matches itself instead (for instance, a quantifier appearing at the beginning of a string), after raising a warning. Characters. \x{hh...} Character with hex code hh... \xhh Character with hex code hh. \a Alarm (hex 07). \e Escape (hex 1B). \f Form-feed (hex 0C). \n New line (hex 0A). \r Carriage return (hex 0D). \t Horizontal tab (hex 09). Character types.. A single period matches any token. \d Any decimal digit. \h Any horizontal space character, equivalent to [\ \^^I]: space and tab. \s Any space character, equivalent to [\ \^^I\^^J\^^L\^^M]. 2
3 \v Any vertical space character, equivalent to [\^^J\^^K\^^L\^^M]. Note that \^^K is a vertical space, but not a space, for compatibility with Perl. \w Any word character, i.e., alpha-numerics and underscore, equivalent to [A-Za-z0-9\_]. \D Any token not matched by \d. \H Any token not matched by \h. \N Any token other than the \n character (hex 0A). \S Any token not matched by \s. \V Any token not matched by \v. \W Any token not matched by \w. Of those,., \D, \H, \N, \S, \V, and \W will match arbitrary control sequences. Character classes match exactly one token in the subject. [...] Positive character class. Matches any of the specified tokens. [^...] Negative character class. Matches any token other than the specified characters. x-y Within a character class, this denotes a range (can be used with escaped characters). [: name :] Within a character class (one more set of brackets), this denotes the posix character class name, which can be alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, or xdigit. [:^ name :] Negative posix character class. For instance, [a-oq-z\cc.] matches any lowercase latin letter except p, as well as control sequences (see below for a description of \c). Quantifiers (repetition).? 0 or 1, greedy.?? 0 or 1, lazy. * 0 or more, greedy. *? 0 or more, lazy. + 1 or more, greedy. +? 1 or more, lazy. {n} Exactly n. {n,} n or more, greedy. {n,}? n or more, lazy. 3
4 {n, m} At least n, no more than m, greedy. {n, m}? At least n, no more than m, lazy. Anchors and simple assertions. \b Word boundary: either the previous token is matched by \w and the next by \W, or the opposite. For this purpose, the ends of the token list are considered as \W. \B Not a word boundary: between two \w tokens or two \W tokens (including the boundary). ^or \A Start of the subject token list. $, \Z or \z End of the subject token list. \G Start of the current match. This is only different from ^ in the case of multiple matches: for instance \regex_count:nnn { \G a } { aaba } \l_tmpa_int yields 2, but replacing \G by ^ would result in \l_tmpa_int holding the value 1. Alternation and capturing groups. A B C Either one of A, B, or C. (...) Capturing group. (?:...) Non-capturing group. (?...) Non-capturing group which resets the group number for capturing groups in each alternative. The following group will be numbered with the first unused group number. The \c escape sequence allows to test the category code of tokens, and match control sequences. Each character category is represented by a single uppercase letter: C for control sequences; B for begin-group tokens; E for end-group tokens; M for math shift; T for alignment tab tokens; P for macro parameter tokens; U for superscript tokens (up); D for subscript tokens (down); S for spaces; L for letters; 4
5 O for others; and A for active characters. The \c escape sequence is used as follows. \c{ regex } A control sequence whose csname matches the regex, anchored at the beginning and end, so that \c{begin} matches exactly \begin, and nothing else. \cx Applies to the next object, which can be a character, character property, class, or group, and forces this object to only match tokens with category X (any of CBEMTPUDSLOA. For instance, \cl[a-z\d] matches uppercase letters and digits of category code letter, \cc. matches any control sequence, and \co(abc) matches abc where each character has category other. \c[xyz] Applies to the next object, and forces it to only match tokens with category X, Y, or Z (each being any of CBEMTPUDSLOA). For instance, \c[lso](..) matches two tokens of category letter, space, or other. \c[^xyz] Applies to the next object and prevents it from matching any token with category X, Y, or Z (each being any of CBEMTPUDSLOA). For instance, \c[^o]\d matches digits which have any category different from other. The category code tests can be used inside classes; for instance, [\co\d \c[lo][a-f]] matches what TEX considers as hexadecimal digits, namely digits with category other, or uppercase letters from A to F with category either letter or other. Within a group affected by a category code test, the outer test can be overridden by a nested test: for instance, \cl(ab\co\*cd) matches ab*cd where all characters are of category letter, except * which has category other. The \u escape sequence allows to insert the contents of a token list directly into a regular expression or a replacement, avoiding the need to escape special characters. Namely, \u{ tl var name } matches the exact contents of the token list tl var. Within a \c{...} control sequence matching, the \u escape sequence only expands its argument once, in effect performing \tl_to_str:v. Quantifiers are not supported directly: use a group. The option (?i) makes the match case insensitive (identifying A Z with a z; no Unicode support yet). This applies until the end of the group in which it appears, and can be reverted using (?-i). For instance, in (?i)(a(?-i)b c)d, the letters a and d are affected by the i option. Characters within ranges and classes are affected individually: (?i)[y-\\] is equivalent to [YZ\[\\yz], and (?i)[^aeiou] matches any character which is not a vowel. Neither character properties, nor \c{...} nor \u{...} are affected by the i option. In character classes, only [, ^, -, ], \ and spaces are special, and should be escaped. Other non-alphanumeric characters can still be escaped without harm. Any escape sequence which matches a single character (\d, \D, etc.) is supported in character classes. If the first character is ^, then the meaning of the character class is inverted; ^ appearing anywhere else in the range is not special. If the first character (possibly following a leading ^) is ] then it does not need to be escaped since ending the range there would 5
6 make it empty. Ranges of characters can be expressed using -, for instance, [\D 0-5] and [^6-9] are equivalent. Capturing groups are a means of extracting information about the match. Parenthesized groups are labelled in the order of their opening parenthesis, starting at 1. The contents of those groups corresponding to the best match (leftmost longest) can be extracted and stored in a sequence of token lists using for instance \regex_extract_- once:nnntf. The \K escape sequence resets the beginning of the match to the current position in the token list. This only affects what is reported as the full match. For instance, \regex_extract_all:nnn { a \K. } { a123aaxyz } \l_foo_seq results in \l_foo_seq containing the items {1} and {a}: the true matches are {a1} and {aa}, but they are trimmed by the use of \K. The \K command does not affect capturing groups: for instance, \regex_extract_once:nnn { (. \K c)+ \d } { acbc3 } \l_foo_seq results in \l_foo_seq containing the items {c3} and {bc}: the true match is {acbc3}, with first submatch {bc}, but \K resets the beginning of the match to the last position where it appears. 1.2 Syntax of the replacement text Most of the features described in regular expressions do not make sense within the replacement text. Escaped characters are supported as inside regular expressions. The whole match is accessed as \0, and the first 9 submatches are accessed as \1,..., \9. Further submatches are accessed through \g{ number } where number is any nonnegative integer. If there are fewer than number capturing groups, the submatch is empty. For instance, \tl_set:nn \l_my_tl { Hello,~world! } \regex_replace_all:nnn { ([er]?l o). } { \(\0\-\-\1\) } \l_my_tl results in \l_my_tl holding H(ell--el)(o,--o) w(or--o)(ld--l)! Submatches keep the same category codes as in the original token list. The characters inserted by the replacement have category code 12 (other) by default, with the exception of space characters. Spaces inserted through \ have category code 10, while spaces inserted through \x20 or \x{20} have category code 12. The escape sequence \c allows to insert characters with arbitrary category codes, as well as control sequences. \cxy Produces the character Y (which can be given as an escape sequence such as \t for tab, or \( or \) for a parenthesis) with category code X, which must be one of CBEMTPUDSLOA. \c{ text } Produces the control sequence with csname text. The text may contain references to the submatches \0, \1, etc. 6
7 The escape sequence \u{ tl var name } allows to insert the contents of the token list with name tl var name directly into the replacement, avoiding the need to escape special characters. Within the construction \c{ text }, the \u escape sequence only expands its argument once, in effect performing \tl_to_str:v. Submatches can be used within the argument of \u. For instance, \tl_set:nn \l_my_one_tl { first } \tl_set:nn \l_my_two_tl { \emph{second} } \tl_set:nn \l_my_tl { one, two, one, one } \regex_replace_all:nnn { [^,]+ } { \u{l_my_\0_tl} } \l_my_tl results in \l_my_tl holding first,\emph{second},first,first. 1.3 Pre-compiling regular expressions If a regular expression is to be used several times, it is better to compile it once rather than doing it each time the regular expression is used. The compiled regular expression is stored in a variable. All of the l3regex module s functions can be given their regular expression argument either as an explicit string or as a compiled regular expression. \regex_new:n \regex_set:nn \regex_gset:nn \regex_const:nn \regex_new:n regex var Creates a new regex var or raises an error if the name is already taken. The declaration is global. The regex var will initially be such that it never matches. \regex_set:nn regex var { regex } Stores a compiled version of the regular expression in the regex var. For instance, this function can be used as \regex_new:n \l_my_regex \regex_set:nn \l_my_regex { my\ (simple\ )? reg(ex ular\ expression) } The assignment is local for \regex_set:nn and global for \regex_gset:nn. \regex_const:nn for compiled expressions which will never change. Use \regex_show:n \regex_show:n \regex_show:n { regex } Shows how l3regex interprets the regex. For instance, \regex_show:n {\A X Y} shows +-branch anchor at start (\A) char code 88 +-branch char code 89 indicating that the anchor \A only applies to the first branch: the second branch is not anchored to the beginning of the match. 7
8 1.4 Matching All regular expression functions are available in both :n and :N variants. The former require a standard regular expression, while the later require a compiled expression as generated by \regex_(g)set:nn. \regex_match:nntf \regex_match:nntf \regex_match:nntf { regex } { token list } { true code } { false code } Tests whether the regular expression matches any part of the token list. For instance, \regex_match:nntf { b [cde]* } { abecdcx } { TRUE } { FALSE } \regex_match:nntf { [b-dq-w] } { example } { TRUE } { FALSE } leaves TRUE then FALSE in the input stream. \regex_count:nnn \regex_count:nnn \regex_count:nnn { regex } { token list } int var Sets int var within the current TEX group level equal to the number of times regular expression appears in token list. The search starts by finding the left-most longest match, respecting greedy and ungreedy operators. Then the search starts again from the character following the last character of the previous match, until reaching the end of the token list. Infinite loops are prevented in the case where the regular expression can match an empty token list: then we count one match between each pair of characters. For instance, \int_new:n \l_foo_int \regex_count:nnn { (b+ c) } { abbababcbb } \l_foo_int results in \l_foo_int taking the value Submatch extraction \regex_extract_once:nnntf \regex_extract_once:nnntf \regex_extract_once:nnn { regex } { token list } seq var \regex_extract_once:nnntf { regex } { token list } seq var { true code } { false code } Finds the first match of the regular expression in the token list. If it exists, the match is stored as the zeroeth item of the seq var, and further items are the contents of capturing groups, in the order of their opening parenthesis. The seq var is assigned locally. If there is no match, the seq var is cleared. The testing versions insert the true code into the input stream if a match was found, and the false code otherwise. For instance, assume that you type \regex_extract_once:nnntf { \A(La)?TeX(!*)\Z } { LaTeX!!! } \l_foo_seq { true } { false } Then the regular expression (anchored at the start with \A and at the end with \Z) will match the whole token list. The first capturing group, (La)?, matches La, and the second capturing group, (!*), matches!!!. Thus, \l_foo_seq will contain the items {LaTeX!!!}, {La}, and {!!!}, and the true branch is left in the input stream. 8
9 \regex_extract_all:nnntf \regex_extract_all:nnntf \regex_extract_all:nnn { regex } { token list } seq var \regex_extract_all:nnntf { regex } { token list } seq var { true code } { false code } Finds all matches of the regular expression in the token list, and stores all the submatch information in a single sequence (concatenating the results of multiple \regex_- extract_once:nnn calls). The seq var is assigned locally. If there is no match, the seq var is cleared. The testing versions insert the true code into the input stream if a match was found, and the false code otherwise. For instance, assume that you type \regex_extract_all:nnntf { \w+ } { Hello,~world! } \l_foo_seq { true } { false } Then the regular expression will match twice, and the resulting sequence contains the two items {Hello} and {world}, and the true branch is left in the input stream. \regex_split:nnntf \regex_split:nnntf \regex_split:nnn { regular expression } { token list } seq var \regex_split:nnntf { regular expression } { token list } seq var { true code } { false code } Splits the token list into a sequence of parts, delimited by matches of the regular expression. If the regular expression has capturing groups, then the token lists that they match are stored as items of the sequence as well. The assignment to seq var is local. If no match is found the resulting seq var has the token list as its sole item. If the regular expression matches the empty token list, then the token list is split into single tokens. The testing versions insert the true code into the input stream if a match was found, and the false code otherwise. For example, after \seq_new:n \l_path_seq \regex_split:nnntf { / } { the/path/for/this/file.tex } \l_path_seq { true } { false } the sequence \l_path_seq contains the items {the}, {path}, {for}, {this}, and {file.tex}, and the true branch is left in the input stream. 1.6 Replacement \regex_replace_once:nnntf \regex_replace_once:nnntf \regex_replace_once:nnn { regular expression } { replacement } tl var \regex_replace_once:nnntf { regular expression } { replacement } tl var { true code } { false code } Searches for the regular expression in the token list and replaces the first match with the replacement. The result is assigned locally to tl var. In the replacement, \0 represents the full match, \1 represent the contents of the first capturing group, \2 of the second, etc. 9
10 \regex_replace_all:nnntf \regex_replace_all:nnntf \regex_replace_all:nnn { regular expression } { replacement } tl var \regex_replace_all:nnntf { regular expression } { replacement } tl var { true code } { false code } Replaces all occurrences of the \regular expression in the token list by the replacement, where \0 represents the full match, \1 represent the contents of the first capturing group, \2 of the second, etc. Every match is treated independently, and matches cannot overlap. The result is assigned locally to tl var. 1.7 Bugs, misfeatures, future work, and other possibilities The following need to be done now. Change user function names! Clean up the use of messages. Rewrite the documentation in a more ordered way, perhaps add a bnf? Additional error-checking to come. Currently, a{\x34} is recognized as a{4}. Cleaner error reporting in the replacement phase. Add tracing information. Detect attempts to use back-references. Test for the maximum register \c_max_register_int. Find out whether the fact that \W and friends match the end-marker leads to bugs. Possibly update \ regex_item_reverse:n. Enforce that \cc can only be followed by a match-all dot. The empty cs should be matched by \c{}, not by \c{csname.?endcsname\s?}. Code improvements to come. Change \skip to \dimen for the array of active threads, and shift the array of submatch informations so that it starts at \skip0. Optimize \c{abc} for matching a specific control sequence. Only build... once. Use \skip for the left and right state stacks when compiling a regex. Should \ regex_action_free_group:n only be used for greedy {n,} quantifier? (I think not.) Quantifiers for \u and assertions. 10
11 Improve digit grabbing for the \g escape in replacement. Allow arbitrary integer expressions for all those numbers? When matching, keep track of an explicit stack of current_state and current_- submatches. If possible, when a state is reused by the same thread, kill other subthreads. Use \dimen registers rather than \l regex_balance_tl to build \ regex_- replacement_balance_one_match:n. Reduce the number of epsilon-transitions in alternatives. Optimize simple strings: use less states (abcade should give two states, for abc and ade). [Does that really make sense?] Optimize groups with no alternative. Optimize states with a single \ regex_action_free:n. Optimize the use of \ regex_action_success: by inserting it in state 2 directly instead of having an extra transition. Optimize the use of \int_step_... functions. Groups don t capture within regexes for csnames; optimize and document. Decide and document what \c{\c{...}} should do in the replacement text, similar questions for \u. Better show for anchors, properties, and catcode tests. Does \K really need a new state for itself? When compiling, use a boolean in_cs and less magic numbers. Instead of checking whether the character is special or alphanumeric using its character code, check if it is special in regexes with \cs_if_exist tests. The following features are likely to be implemented at some point in the future. Allow \cl(abc) in replacement text. General look-ahead/behind assertions. Regex matching on external files. Conditional subpatterns with look ahead/behind: if what follows is [... ], then [... ]. (*..) and (?..) sequences to set some options. UTF-8 mode for pdftex. 11
12 Newline conventions are not done. In particular, we should have an option for. not to match newlines. Also, \A should differ from ^, and \Z, \z and $ should differ. Unicode properties: \p{..} and \P{..}; \X which should match any extended Unicode sequence. This requires to manipulate a lot of data, probably using treeboxes. The following features of pcre or Perl will probably not be implemented. \ddd, matching the character with octal code ddd; Callout with (?C...), we cannot run arbitrary user code during the matching, because the regex code uses registers in an unsafe way; Conditional subpatterns (other than with a look-ahead or look-behind condition): this is non-regular, isn t it? Named subpatterns: TEX programmers have lived so far without any need for named macro parameters. The following features of pcre or Perl will definitely not be implemented. \cx, similar to TEX s own \^^x; Comments: TEX already has its own system for comments. \Q...\E escaping: this would require to read the argument verbatim, which is not in the scope of this module. Atomic grouping, possessive quantifiers: those tools, mostly meant to fix catastrophic backtracking, are unnecessary in a non-backtracking algorithm, and difficult to implement. Subroutine calls: this syntactic sugar is difficult to include in a non-backtracking algorithm, in particular because the corresponding group should be treated as atomic. Also, we cannot afford to run user code within the regular expression matching, because of our misuse of registers. Recursion: this is a non-regular feature. non-regular feature, this requires backtracking, which is pro- Back-references: hibitively slow. Backtracking control verbs: intrinsically tied to backtracking. \C single byte in UTF-8 mode: XeTEX and LuaTEX serve us characters directly, and splitting those into bytes is tricky, encoding dependent, and most likely not useful anyways. 12
13 Index The italic numbers denote the pages where the corresponding entry is described, numbers underlined point to the definition, all others indicate the places where it is used. B \begin , 5 C cs commands: \cs_if_exist F foo commands: \l_foo_int \c_foo_regex \l_foo_seq , 6 I int commands: \int_step_ M max commands: \c_max_register_int my commands: \l_my_tl , 6, 7 R regex commands: \regex_(g)set:nn \ regex_action_free:n \ regex_action_free_group:n \ regex_action_success: \l regex_balance_tl \regex_const:nn , 7, 7 \regex_count:nnn \regex_count:nnn , 8 \regex_extract_all:nnn \regex_extract_all:nnntf \regex_extract_all:nnntf , 9 \regex_extract_once:nnn , 9 \regex_extract_once:nnntf \regex_extract_once:nnntf... 6, 8, 8 \regex_gset:nn , 7 \ regex_item_reverse:n \regex_match:nntf \regex_match:nntf , 8 \regex_new:n , 7 \regex_replace_all:nnn \regex_replace_all:nnntf \regex_replace_all:nnntf , 10 \regex_replace_once:nnn \regex_replace_once:nnntf \regex_replace_once:nnntf , 9 \ regex_replacement_balance_- one_match:n \regex_set:nn , 7, 7 \regex_show:n \regex_show:n , 7, 7 \regex_split:nnn \regex_split:nnntf \regex_split:nnntf , 9 \regular expression T TEX and L A TEX 2ε commands: \dimen , 11 \skip , 10, 10 tl commands: \tl_to_str:v , 7 tmpa commands: \l_tmpa_int
Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP
Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP address as a string and do a search. But, what if you didn
Regular Expression Syntax
1 of 5 12/22/2014 9:55 AM EmEditor Home - EmEditor Help - How to - Search Regular Expression Syntax EmEditor regular expression syntax is based on Perl regular expression syntax. Literals All characters
Regular Expressions for Perl, C, PHP, Python, Java, and.net. Regular Expression. Pocket Reference. Tony Stubblebine
Regular Expressions for Perl, C, PHP, Python, Java, and.net Regular Expression Pocket Reference Tony Stubblebine Regular Expression Pocket Reference Regular Expression Pocket Reference Tony Stubblebine
Chapter 4: Computer Codes
Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data
Being Regular with Regular Expressions. John Garmany Session
Being Regular with Regular Expressions John Garmany Session John Garmany Senior Consultant Burleson Consulting Who Am I West Point Graduate GO ARMY! Masters Degree Information Systems Graduate Certificate
Using Regular Expressions in Oracle
Using Regular Expressions in Oracle Everyday most of us deal with multiple string functions in Sql. May it be for truncating a string, searching for a substring or locating the presence of special characters.
CS106A, Stanford Handout #38. Strings and Chars
CS106A, Stanford Handout #38 Fall, 2004-05 Nick Parlante Strings and Chars The char type (pronounced "car") represents a single character. A char literal value can be written in the code using single quotes
Kiwi Log Viewer. A Freeware Log Viewer for Windows. by SolarWinds, Inc.
Kiwi Log Viewer A Freeware Log Viewer for Windows by SolarWinds, Inc. Kiwi Log Viewer displays text based log files in a tabular format. Only a small section of the file is read from disk at a time which
Compiler Construction
Compiler Construction Regular expressions Scanning Görel Hedin Reviderad 2013 01 23.a 2013 Compiler Construction 2013 F02-1 Compiler overview source code lexical analysis tokens intermediate code generation
Excel: Introduction to Formulas
Excel: Introduction to Formulas Table of Contents Formulas Arithmetic & Comparison Operators... 2 Text Concatenation... 2 Operator Precedence... 2 UPPER, LOWER, PROPER and TRIM... 3 & (Ampersand)... 4
Programming Languages CIS 443
Course Objectives Programming Languages CIS 443 0.1 Lexical analysis Syntax Semantics Functional programming Variable lifetime and scoping Parameter passing Object-oriented programming Continuations Exception
Moving from CS 61A Scheme to CS 61B Java
Moving from CS 61A Scheme to CS 61B Java Introduction Java is an object-oriented language. This document describes some of the differences between object-oriented programming in Scheme (which we hope you
So far we have considered only numeric processing, i.e. processing of numeric data represented
Chapter 4 Processing Character Data So far we have considered only numeric processing, i.e. processing of numeric data represented as integer and oating point types. Humans also use computers to manipulate
The collect package. Jonathan Sauer [email protected] 2004/09/10
The collect package Jonathan Sauer [email protected] 2004/09/10 Abstract This file describes the collect package that makes it possible to collect text for later use. Contents 1 Introduction 1 2 Usage
Regular Expressions. General Concepts About Regular Expressions
Regular Expressions This appendix explains regular expressions and how to use them in Cisco IOS software commands. It also provides details for composing regular expressions. This appendix has the following
Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
Secrets of printf. 1 Background. 2 Simple Printing. Professor Don Colton. Brigham Young University Hawaii. 2.1 Naturally Special Characters
Secrets of Professor Don Colton Brigham Young University Hawaii is the C language function to do formatted printing. The same function is also available in PERL. This paper explains how works, and how
Regular Expressions. The Complete Tutorial. Jan Goyvaerts
Regular Expressions The Complete Tutorial Jan Goyvaerts Regular Expressions: The Complete Tutorial Jan Goyvaerts Copyright 2006, 2007 Jan Goyvaerts. All rights reserved. Last updated July 2007. No part
University Convocation. IT 3203 Introduction to Web Development. Pattern Matching. Why Match Patterns? The Search Method. The Replace Method
IT 3203 Introduction to Web Development Regular Expressions October 12 Notice: This session is being recorded. Copyright 2007 by Bob Brown University Convocation Tuesday, October 13, 11:00 AM 12:15 PM
Variables, Constants, and Data Types
Variables, Constants, and Data Types Primitive Data Types Variables, Initialization, and Assignment Constants Characters Strings Reading for this class: L&L, 2.1-2.3, App C 1 Primitive Data There are eight
IBM Emulation Mode Printer Commands
IBM Emulation Mode Printer Commands Section 3 This section provides a detailed description of IBM emulation mode commands you can use with your printer. Control Codes Control codes are one-character printer
JavaScript: Introduction to Scripting. 2008 Pearson Education, Inc. All rights reserved.
1 6 JavaScript: Introduction to Scripting 2 Comment is free, but facts are sacred. C. P. Scott The creditor hath a better memory than the debtor. James Howell When faced with a decision, I always ask,
Python Loops and String Manipulation
WEEK TWO Python Loops and String Manipulation Last week, we showed you some basic Python programming and gave you some intriguing problems to solve. But it is hard to do anything really exciting until
CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013
Oct 4, 2013, p 1 Name: CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 1. (max 18) 4. (max 16) 2. (max 12) 5. (max 12) 3. (max 24) 6. (max 18) Total: (max 100)
Regular expressions are a formal way to
C H A P T E R 11 Regular Expressions 11 This chapter describes regular expression pattern matching and string processing based on regular expression substitutions. These features provide the most powerful
Lecture 18 Regular Expressions
Lecture 18 Regular Expressions Many of today s web applications require matching patterns in a text document to look for specific information. A good example is parsing a html file to extract tags
C H A P T E R Regular Expressions regular expression
7 CHAPTER Regular Expressions Most programmers and other power-users of computer systems have used tools that match text patterns. You may have used a Web search engine with a pattern like travel cancun
Unit 6. Loop statements
Unit 6 Loop statements Summary Repetition of statements The while statement Input loop Loop schemes The for statement The do statement Nested loops Flow control statements 6.1 Statements in Java Till now
Pemrograman Dasar. Basic Elements Of Java
Pemrograman Dasar Basic Elements Of Java Compiling and Running a Java Application 2 Portable Java Application 3 Java Platform Platform: hardware or software environment in which a program runs. Oracle
Programming Languages
Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately
COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing
COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing The scanner (or lexical analyzer) of a compiler processes the source program, recognizing
CS 241 Data Organization Coding Standards
CS 241 Data Organization Coding Standards Brooke Chenoweth University of New Mexico Spring 2016 CS-241 Coding Standards All projects and labs must follow the great and hallowed CS-241 coding standards.
Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)
Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating
Regular Expressions. In This Appendix
A Expressions In This Appendix Characters................... 888 Delimiters................... 888 Simple Strings................ 888 Special Characters............ 888 Rules....................... 891
Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:
Introduction to Programming and Algorithms Module 2 CS 146 Sam Houston State University Dr. Tim McGuire Introduction To Computers And Java Chapter Objectives To understand: the meaning and placement of
Introduction to Python
WEEK ONE Introduction to Python Python is such a simple language to learn that we can throw away the manual and start with an example. Traditionally, the first program to write in any programming language
Data Tool Platform SQL Development Tools
Data Tool Platform SQL Development Tools ekapner Contents Setting SQL Development Preferences...5 Execution Plan View Options Preferences...5 General Preferences...5 Label Decorations Preferences...6
Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July 2013. The JSON Data Interchange Format. Reference number ECMA-123:2009
Ecma/TC39/2013/NN 4 th Draft ECMA-XXX 1 st Edition / July 2013 The JSON Data Interchange Format Reference number ECMA-123:2009 Ecma International 2009 COPYRIGHT PROTECTED DOCUMENT Ecma International 2013
Version 2.5.0 22 August 2016
Version 2.5.0 22 August 2016 Published by Just Great Software Co. Ltd. Copyright 2009 2016 Jan Goyvaerts. All rights reserved. RegexMagic and Just Great Software are trademarks of Jan Goyvaerts i Table
We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.
LING115 Lecture Note Session #4 Python (1) 1. Introduction As we have seen in previous sessions, we can use Linux shell commands to do simple text processing. We now know, for example, how to count words.
C++ Input/Output: Streams
C++ Input/Output: Streams 1 The basic data type for I/O in C++ is the stream. C++ incorporates a complex hierarchy of stream types. The most basic stream types are the standard input/output streams: istream
ASSEMBLY LANGUAGE PROGRAMMING (6800) (R. Horvath, Introduction to Microprocessors, Chapter 6)
ASSEMBLY LANGUAGE PROGRAMMING (6800) (R. Horvath, Introduction to Microprocessors, Chapter 6) 1 COMPUTER LANGUAGES In order for a computer to be able to execute a program, the program must first be present
Introducing Oracle Regular Expressions. An Oracle White Paper September 2003
Introducing Oracle Regular Expressions An Oracle White Paper September 2003 Introducing Oracle Regular Expressions Introduction...4 History of Regular Expressions...4 Traditional Database Pattern Matching...5
Hands-on Exercise 1: VBA Coding Basics
Hands-on Exercise 1: VBA Coding Basics This exercise introduces the basics of coding in Access VBA. The concepts you will practise in this exercise are essential for successfully completing subsequent
CS 106 Introduction to Computer Science I
CS 106 Introduction to Computer Science I 01 / 21 / 2014 Instructor: Michael Eckmann Today s Topics Introduction Homework assignment Review the syllabus Review the policies on academic dishonesty and improper
ESPResSo Summer School 2012
ESPResSo Summer School 2012 Introduction to Tcl Pedro A. Sánchez Institute for Computational Physics Allmandring 3 D-70569 Stuttgart Germany http://www.icp.uni-stuttgart.de 2/26 Outline History, Characteristics,
Symbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
Base Conversion written by Cathy Saxton
Base Conversion written by Cathy Saxton 1. Base 10 In base 10, the digits, from right to left, specify the 1 s, 10 s, 100 s, 1000 s, etc. These are powers of 10 (10 x ): 10 0 = 1, 10 1 = 10, 10 2 = 100,
Python Lists and Loops
WEEK THREE Python Lists and Loops You ve made it to Week 3, well done! Most programs need to keep track of a list (or collection) of things (e.g. names) at one time or another, and this week we ll show
Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016
Master s Degree Course in Computer Engineering Formal Languages FORMAL LANGUAGES AND COMPILERS Lexical analysis Floriano Scioscia 1 Introductive terminological distinction Lexical string or lexeme = meaningful
C++ Language Tutorial
cplusplus.com C++ Language Tutorial Written by: Juan Soulié Last revision: June, 2007 Available online at: http://www.cplusplus.com/doc/tutorial/ The online version is constantly revised and may contain
Number Representation
Number Representation CS10001: Programming & Data Structures Pallab Dasgupta Professor, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Topics to be Discussed How are numeric data
JAVA - QUICK GUIDE. Java SE is freely available from the link Download Java. So you download a version based on your operating system.
http://www.tutorialspoint.com/java/java_quick_guide.htm JAVA - QUICK GUIDE Copyright tutorialspoint.com What is Java? Java is: Object Oriented Platform independent: Simple Secure Architectural- neutral
PL / SQL Basics. Chapter 3
PL / SQL Basics Chapter 3 PL / SQL Basics PL / SQL block Lexical units Variable declarations PL / SQL types Expressions and operators PL / SQL control structures PL / SQL style guide 2 PL / SQL Block Basic
Perl in a nutshell. First CGI Script and Perl. Creating a Link to a Script. print Function. Parsing Data 4/27/2009. First CGI Script and Perl
First CGI Script and Perl Perl in a nutshell Prof. Rasley shebang line tells the operating system where the Perl interpreter is located necessary on UNIX comment line ignored by the Perl interpreter End
The Clean programming language. Group 25, Jingui Li, Daren Tuzi
The Clean programming language Group 25, Jingui Li, Daren Tuzi The Clean programming language Overview The Clean programming language first appeared in 1987 and is still being further developed. It was
A couple of things involving environments
A couple of things involving environments Will Robertson 2014/05/04 v0.3 Abstract This package provides two things, one for document authors and one for macro authors. For the document authors, a new method,
java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner
java.util.scanner java.util.scanner is a class in the Java API used to create a Scanner object, an extremely versatile object that you can use to input alphanumeric characters from several input sources
How to translate VisualPlace
Translation tips 1 How to translate VisualPlace The international language support in VisualPlace is based on the Rosette library. There are three sections in this guide. It starts with instructions for
Lecture 4. Regular Expressions grep and sed intro
Lecture 4 Regular Expressions grep and sed intro Previously Basic UNIX Commands Files: rm, cp, mv, ls, ln Processes: ps, kill Unix Filters cat, head, tail, tee, wc cut, paste find sort, uniq comm, diff,
Pseudo code Tutorial and Exercises Teacher s Version
Pseudo code Tutorial and Exercises Teacher s Version Pseudo-code is an informal way to express the design of a computer program or an algorithm in 1.45. The aim is to get the idea quickly and also easy
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
Regular Expressions and Automata using Haskell
Regular Expressions and Automata using Haskell Simon Thompson Computing Laboratory University of Kent at Canterbury January 2000 Contents 1 Introduction 2 2 Regular Expressions 2 3 Matching regular expressions
Semantic Analysis: Types and Type Checking
Semantic Analysis Semantic Analysis: Types and Type Checking CS 471 October 10, 2007 Source code Lexical Analysis tokens Syntactic Analysis AST Semantic Analysis AST Intermediate Code Gen lexical errors
Data Integrator. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA
Data Integrator Event Management Guide Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Telephone: 888.296.5969 or 512.231.6000 Fax: 512.231.6010 Email: [email protected]
Introduction to Searching with Regular Expressions
Introduction to Searching with Regular Expressions Christopher M. Frenz Department of Computer Engineering Technology New York City College of Technology (CUNY) 300 Jay St Brooklyn, NY 11201 Email: [email protected]
Positional Numbering System
APPENDIX B Positional Numbering System A positional numbering system uses a set of symbols. The value that each symbol represents, however, depends on its face value and its place value, the value associated
The Pointless Machine and Escape of the Clones
MATH 64091 Jenya Soprunova, KSU The Pointless Machine and Escape of the Clones The Pointless Machine that operates on ordered pairs of positive integers (a, b) has three modes: In Mode 1 the machine adds
Commonly Used Excel Functions. Supplement to Excel for Budget Analysts
Supplement to Excel for Budget Analysts Version 1.0: February 2016 Table of Contents Introduction... 4 Formulas and Functions... 4 Math and Trigonometry Functions... 5 ABS... 5 ROUND, ROUNDUP, and ROUNDDOWN...
6.170 Tutorial 3 - Ruby Basics
6.170 Tutorial 3 - Ruby Basics Prerequisites 1. Have Ruby installed on your computer a. If you use Mac/Linux, Ruby should already be preinstalled on your machine. b. If you have a Windows Machine, you
INCIDENCE-BETWEENNESS GEOMETRY
INCIDENCE-BETWEENNESS GEOMETRY MATH 410, CSUSM. SPRING 2008. PROFESSOR AITKEN This document covers the geometry that can be developed with just the axioms related to incidence and betweenness. The full
A package for rotated objects in L A TEX
A package for rotated objects in L A TEX Robin Fairbairns Sebastian Rahtz Leonor Barroca printed January 26, 2010 Contents 1 Introduction 1 2 Usage 2 2.1 Package options............................. 2
Real SQL Programming. Persistent Stored Modules (PSM) PL/SQL Embedded SQL
Real SQL Programming Persistent Stored Modules (PSM) PL/SQL Embedded SQL 1 SQL in Real Programs We have seen only how SQL is used at the generic query interface --- an environment where we sit at a terminal
Portal Connector Fields and Widgets Technical Documentation
Portal Connector Fields and Widgets Technical Documentation 1 Form Fields 1.1 Content 1.1.1 CRM Form Configuration The CRM Form Configuration manages all the fields on the form and defines how the fields
Lecture 9. Semantic Analysis Scoping and Symbol Table
Lecture 9. Semantic Analysis Scoping and Symbol Table Wei Le 2015.10 Outline Semantic analysis Scoping The Role of Symbol Table Implementing a Symbol Table Semantic Analysis Parser builds abstract syntax
2.6 Exponents and Order of Operations
2.6 Exponents and Order of Operations We begin this section with exponents applied to negative numbers. The idea of applying an exponent to a negative number is identical to that of a positive number (repeated
Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking
ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 12 Evaluation of JFlex Scanner Generator Using Form Fields Validity Checking Ezekiel Okike 1 and Maduka Attamah 2 1 School of Computer Studies, Kampala
Java Interview Questions and Answers
1. What is the most important feature of Java? Java is a platform independent language. 2. What do you mean by platform independence? Platform independence means that we can write and compile the java
HTML Web Page That Shows Its Own Source Code
HTML Web Page That Shows Its Own Source Code Tom Verhoeff November 2009 1 Introduction A well-known programming challenge is to write a program that prints its own source code. For interpreted languages,
CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions
CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly
grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print
grep, awk and sed three VERY useful command-line utilities Matt Probert, Uni of York grep = global regular expression print In the simplest terms, grep (global regular expression print) will search input
3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
Decision Logic: if, if else, switch, Boolean conditions and variables
CS 1044 roject 3 Fall 2009 Decision Logic: if, if else, switch, Boolean conditions and variables This programming assignment uses many of the ideas presented in sections 3 through 5 of the Dale/Weems text
PROBLEMS (Cap. 4 - Istruzioni macchina)
98 CHAPTER 2 MACHINE INSTRUCTIONS AND PROGRAMS PROBLEMS (Cap. 4 - Istruzioni macchina) 2.1 Represent the decimal values 5, 2, 14, 10, 26, 19, 51, and 43, as signed, 7-bit numbers in the following binary
Introduction to Java Applications. 2005 Pearson Education, Inc. All rights reserved.
1 2 Introduction to Java Applications 2.2 First Program in Java: Printing a Line of Text 2 Application Executes when you use the java command to launch the Java Virtual Machine (JVM) Sample program Displays
Compilers Lexical Analysis
Compilers Lexical Analysis SITE : http://www.info.univ-tours.fr/ mirian/ TLC - Mírian Halfeld-Ferrari p. 1/3 The Role of the Lexical Analyzer The first phase of a compiler. Lexical analysis : process of
Xcode User Default Reference. (Legacy)
Xcode User Default Reference (Legacy) Contents Introduction 5 Organization of This Document 5 Software Version 5 See Also 5 Xcode User Defaults 7 Xcode User Default Overview 7 General User Defaults 8 NSDragAndDropTextDelay
1 Introduction. 2 An Interpreter. 2.1 Handling Source Code
1 Introduction The purpose of this assignment is to write an interpreter for a small subset of the Lisp programming language. The interpreter should be able to perform simple arithmetic and comparisons
Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar
Lexical Analysis and Scanning Honors Compilers Feb 5 th 2001 Robert Dewar The Input Read string input Might be sequence of characters (Unix) Might be sequence of lines (VMS) Character set ASCII ISO Latin-1
Excel Intermediate. Table of Contents UPPER, LOWER, PROPER AND TRIM...28
Excel Intermediate Table of Contents Formulas UPPER, LOWER, PROPER AND TRM...2 LEFT, MID, and RIGHT...3 CONCATENATE...4 & (Ampersand)...5 CONCATENATE vs. & (Ampersand)...5 ROUNDUP, and ROUNDDOWN...6 VLOOKUP...7
First Java Programs. V. Paúl Pauca. CSC 111D Fall, 2015. Department of Computer Science Wake Forest University. Introduction to Computer Science
First Java Programs V. Paúl Pauca Department of Computer Science Wake Forest University CSC 111D Fall, 2015 Hello World revisited / 8/23/15 The f i r s t o b l i g a t o r y Java program @author Paul Pauca
OpenOffice.org 3.2 BASIC Guide
OpenOffice.org 3.2 BASIC Guide Copyright The contents of this document are subject to the Public Documentation License. You may only use this document if you comply with the terms of the license. See:
Lecture 2: Regular Languages [Fa 14]
Caveat lector: This is the first edition of this lecture note. Please send bug reports and suggestions to [email protected]. But the Lord came down to see the city and the tower the people were building.
PROBLEM SOLVING SEVENTH EDITION WALTER SAVITCH UNIVERSITY OF CALIFORNIA, SAN DIEGO CONTRIBUTOR KENRICK MOCK UNIVERSITY OF ALASKA, ANCHORAGE PEARSON
PROBLEM SOLVING WITH SEVENTH EDITION WALTER SAVITCH UNIVERSITY OF CALIFORNIA, SAN DIEGO CONTRIBUTOR KENRICK MOCK UNIVERSITY OF ALASKA, ANCHORAGE PEARSON Addison Wesley Boston San Francisco New York London
Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction
Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach
Ed. v1.0 PROGRAMMING LANGUAGES WORKING PAPER DRAFT PROGRAMMING LANGUAGES. Ed. v1.0
i PROGRAMMING LANGUAGES ii Copyright 2011 Juhász István iii COLLABORATORS TITLE : PROGRAMMING LANGUAGES ACTION NAME DATE SIGNATURE WRITTEN BY István Juhász 2012. március 26. Reviewed by Ágnes Korotij 2012.
SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison
SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89 by Joseph Collison Copyright 2000 by Joseph Collison All rights reserved Reproduction or translation of any part of this work beyond that permitted by Sections
MATLAB Programming. Problem 1: Sequential
Division of Engineering Fundamentals, Copyright 1999 by J.C. Malzahn Kampe 1 / 21 MATLAB Programming When we use the phrase computer solution, it should be understood that a computer will only follow directions;
[Refer Slide Time: 05:10]
Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture
TIP: To access the WinRunner help system at any time, press the F1 key.
CHAPTER 11 TEST SCRIPT LANGUAGE We will now look at the TSL language. You have already been exposed to this language at various points of this book. All the recorded scripts that WinRunner creates when
