Text Processing In Java: Characters and Strings



Similar documents
Variables, Constants, and Data Types

CS106A, Stanford Handout #38. Strings and Chars

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

J a v a Quiz (Unit 3, Test 0 Practice)

Handout 3 cs180 - Programming Fundamentals Spring 15 Page 1 of 6. Handout 3. Strings and String Class. Input/Output with JOptionPane.

Unit 6. Loop statements

Pemrograman Dasar. Basic Elements Of Java

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Building Java Programs

CmpSci 187: Programming with Data Structures Spring 2015

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

CS 106 Introduction to Computer Science I

Introduction to Java Lecture Notes. Ryan Dougherty

Chapter 2 Basics of Scanning and Conventional Programming in Java

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

Topics. Parts of a Java Program. Topics (2) CS 146. Introduction To Computers And Java Chapter Objectives To understand:

Lecture 5: Java Fundamentals III

Web Programming Step by Step

Chapter 4: Computer Codes

Building Java Programs

Number Representation

Compiler Construction

Chapter 2 Introduction to Java programming

Stacks. Linear data structures

Introduction to Visual C++.NET Programming. Using.NET Environment

Regular Expression Syntax

Computer Programming I

CSE 1223: Introduction to Computer Programming in Java Chapter 2 Java Fundamentals

Example of a Java program

CS170 Lab 11 Abstract Data Types & Objects

Chapter 2: Elements of Java

JavaScript: Introduction to Scripting Pearson Education, Inc. All rights reserved.

Outline Basic concepts of Python language

Introduction to Java. CS 3: Computer Programming in Java

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

JAVA - QUICK GUIDE. Java SE is freely available from the link Download Java. So you download a version based on your operating system.

java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner

Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July The JSON Data Interchange Format. Reference number ECMA-123:2009

COSC282 BIG DATA ANALYTICS FALL 2015 LECTURE 2 - SEP 9

Java Programming Fundamentals

Introduction to Java

Sample CSE8A midterm Multiple Choice (circle one)

Moving from CS 61A Scheme to CS 61B Java

Java Interview Questions and Answers

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

Recursion. Definition: o A procedure or function that calls itself, directly or indirectly, is said to be recursive.

AP Computer Science Java Subset

Recursion and Recursive Backtracking

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

LAB 3 Part 1 OBJECTS & CLASSES

Masters programmes in Computer Science and Information Systems. Object-Oriented Design and Programming. Sample module entry test xxth December 2013

Conditionals (with solutions)

Iteration CHAPTER 6. Topic Summary

Basic Java Constructs and Data Types Nuts and Bolts. Looking into Specific Differences and Enhancements in Java compared to C

Building Java Programs

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

12-6 Write a recursive definition of a valid Java identifier (see chapter 2).

Programming Languages CIS 443

ASCII Code. Numerous codes were invented, including Émile Baudot's code (known as Baudot

Interaction: Mouse and Keyboard DECO1012

Secrets of printf. 1 Background. 2 Simple Printing. Professor Don Colton. Brigham Young University Hawaii. 2.1 Naturally Special Characters

Scanner. It takes input and splits it into a sequence of tokens. A token is a group of characters which form some unit.

Semantic Analysis: Types and Type Checking

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

Java programming for C/C++ developers

.NET Standard DateTime Format Strings

So far we have considered only numeric processing, i.e. processing of numeric data represented

Object Oriented Software Design

Cardinality. The set of all finite strings over the alphabet of lowercase letters is countable. The set of real numbers R is an uncountable set.

COSC Introduction to Computer Science I Section A, Summer Question Out of Mark A Total 16. B-1 7 B-2 4 B-3 4 B-4 4 B Total 19

ASCII Encoding. The char Type. Manipulating Characters. Manipulating Characters

File class in Java. Scanner reminder. Files 10/19/2012. File Input and Output (Savitch, Chapter 10)

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

C++ Language Tutorial

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited

Software Testing. Definition: Testing is a process of executing a program with data, with the sole intention of finding errors in the program.

Chulalongkorn University International School of Engineering Department of Computer Engineering Computer Programming Lab.

Week 1: Review of Java Programming Basics

Contents. 9-1 Copyright (c) N. Afshartous

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013

Pseudo code Tutorial and Exercises Teacher s Version

Overview. java.math.biginteger, java.math.bigdecimal. Definition: objects are everything but primitives The eight primitive data type in Java

Part I. Multiple Choice Questions (2 points each):

Basic Programming and PC Skills: Basic Programming and PC Skills:

The Cool Reference Manual

Finding XSS in Real World

Symbols in subject lines. An in-depth look at symbols

public static void main(string[] args) { System.out.println("hello, world"); } }

Section 6 Spring 2013

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

Course: Programming II - Abstract Data Types. The ADT Stack. A stack. The ADT Stack and Recursion Slide Number 1

Problem 1. CS 61b Summer 2005 Homework #2 Due July 5th at the beginning of class

JavaScript: Control Statements I

Massachusetts Institute of Technology 6.005: Elements of Software Construction Fall 2011 Quiz 1 October 14, Instructions

A Grammar for the C- Programming Language (Version S16) March 12, 2016

Copyright. Restricted Rights Legend. Trademarks or Service Marks. Copyright 2003 BEA Systems, Inc. All Rights Reserved.

Systems I: Computer Organization and Architecture

Sources: On the Web: Slides will be available on:

JavaScript Introduction

Member Functions of the istream Class

Transcription:

Text Processing In Java: Characters and Strings Wellesley College CS230 Lecture 03 Monday, February 5 Handout #10 (Revised Friday, February 9) Reading: Downey: Chapter 7 Problem Set: Assignment #1 due Tuesday, Feburary 13 3-1 Characters Characters are the fundamental unit of strings & files Character literals: letters: a, b,, z, A, B, Z digits: 0, 1,, 9 punctuation:,,., ;,!, (space), escaped characters: \ (single quote) \ (double quote) \\ (backslash) \n (newline) \r (carriage return) \t (tab) \b (backspace) \f (form feed) 3-2 1

ASCII Character Representations ASCII = American Standard Code for Information Exchange Defines integer representations for 2^8 = 256 characters (see http://www.asciitable.com): lowercase letters: a 97, b 98,, z 122 uppercase letters: A 65, B 66,, Z 90 digits: 0 48, 1 49,, 9 57 punctuation: 32,! 33, ( 41, ) 42, escaped chars: \n 10, \r 13, \ 34, \ 39, \\ 92 Use casts to convert between characters and integers: (int) A 65, (int) a 97 (char) 65 A, (char) 97 a Can compare characters directly via <, <=, ==,!=, >=, >: a < b true, b == b true, c <= 0 false More modern Unicode representations define integer representations for 2^16 = 65536 characters; ASCII are the first 256 of these. 3-3 Putting ASCII Representations to Work public static boolean isletter (char c) { return ( a <= c && c <= z ) ( A <= c && c <= Z ); public static boolean isdigit (char c) { return ( 0 <= c && c <= 9 ); public static boolean isuppercase (char c) { // islowercase is similar return A <= c && c <= Z ; public static char tolowercase (char c) { // touppercase is similar return isuppercase(c)? (char) (((int) c) + 32) : c; public static String tostring (char c) { return + c; public static char shiftchar (char c, int shift) { return (char) MathOps.mod(((int) c) + shift, 256); // MathOps.mod(x,m) returns x modulo m 3-4 2

Class Methods of the Character Class Many character methods are already provided as class methods of the Character class, so we needn t define them from scratch. Morever, these work on full unicode characters, not just ASCII subset. E.g.: public static boolean isletter (char c); public static boolean isdigit (char c); public static boolean isletterordigit (char c); public static boolean iswhitespace (char c); public static boolean isuppercase (char c); public static boolean islowercase (char c); public static char tolowercase (char c); public static char touppercase (char c); public static String tostring (char c); See Java API of the Character class for the complete list of methods. 3-5 Java Strings vs. Character Arrays char [] chars = { c, s, 2, 3, 0 ; String str = cs230 ; Like arrays of characters, Java strings are sequences of characters indexed from 0: str.charat(0) c,, str.charat(4) 0 chars[0] c,, chars[4] 0 However, Java Strings differ from character arrays in important ways: Strings are instances of a class (String), but arrays are not instances of any class; can invoke String instance methods on strings. Strings are immutable --- cannot be changed once created. Different notations for String literals (double quotes) and accessing elements at an index (charat() method). 3-6 3

Some String Instance Methods String s1 = delivery ; s1.length() 8 s1.charat(3) i s1.charat(8) StringIndexOutOfBoundsException s1.indexof( e ) 1 s1.indexof( q ) -1 // Indicates character is not present s1.indexof( e, 2) 5 s1.indexof( e, 9) -1 // *No* exception here! s1.indexof( live ) 2 s1.indexof( love ) -1 s1.indexof( live, 9) -1 // *No* exception here! 3-7 More String Instance Methods s1.substring(4) very s1.substring(8) s1.substring(9) StringIndexOutOfBoundsException s1.substring(2, 6) live s1.substring(4,8) very s1.substring(5,5) s1.substring(6,9) StringIndexOutOfBoundsException s1.substring(7,6) StringIndexOutOfBoundsException s1.touppercase() DELIVERY (s1.touppercase()).tolowercase() delivery s1.tochararray() { d, e, l, i, v, e, r, y 3-8 4

Even More String Instance Methods String s2 = "I say \t\"yes\"\nyou say \t\"no\"\ni say \t\"why?\""; System.out.println(s2); I say "Yes" you say "No" I say "Why?" s2.length() 40 s2.charat(7) \ s2.charat(6) \t s2.charat(12) \n s2.replace('s', 'S') "I Say \t\"yes\"\nyou Say \t\"no\"\ni Say \t\"why?\"" s2.replaceall("say", "said") "I said \t\"yes\"\nyou said \t\"no\"\ni said \t\"why?\"" s2.replaceall("[\t\n]", " ") // [\t\n] is a regular expression denoting a tab *or* a newline. "I say \"Yes\ you say \"No\ I say \"Why?\" "aabaaacadaaaae".replaceall( a+, XYZ ) XYZbXYZcXYZdXYZe // a+ is a regular expression denoting any nonempty sequence of a s "p \t \n q \n rs \t t".replaceall("\\s+", " ") p q rs t // \s+ is a regular expression denoting any nonempty sequence of whitespace // characters (e.g., spaces, tabs, newlines, carriage returns). The extra backslash // is needed to pass the backslash before the s to the regular expression checker. 3-9 Yet More String Instance Methods s2.split("\n") {"I say \t\"yes\"", "you say \t\"no\"", "I say \t\"why?\"" s2.split("[\n\t]"); {"I say ", "\"Yes\"", "you say ", "\"No\"", "I say ", "\"Why?\"" s2.split("[\n\t ]"); // [\n\t ] is a regular expression denoting a // newline *or* a tab *or* a space. {"I", "say", "", "\"Yes\"", "you", "say", "", "\"No\"", "I ", "say", "", "\"Why?\"" // The empty strings come from three space/tab combinations in the input. s2.split("\\s+"); // see notes on \s+ on the previous slide {"I", "say", "\"Yes\"", "you", "say", "\"No\"", "I ", "say", "\"Why?\"" // Using \s+ removes the empty strings "\t \na\t \nb\t \n".trim() "a\t \nb" How do we convert strings like "{ 17, -3, 42 " to the corresponding integer arrays (e.g. {17,-3,42)? (See IntArray.fromString() in utils directory.) See Java API of the String class for the complete list of String methods. 3-10 5

String Comparisons "cs230".equals("cs230") true "cs230".equals("cs230") false "cs230" == "cs230" Could be true, could be false! "prefer".compareto("radar") a negative integer "prefer".compareto("preference") a negative integer "prefer".compareto("prefer") 0 "prefer".compareto("like") a postive integer 3-11 Simple String Methods // Assume these are define in a StringOps class // first("cs230") c public static char first (String s) { // butfirst("cs230") "s230" public static String butfirst (String s) { // last("cs230") 0 public static char last (String s) { // butlast("cs230") "cs23" public static String butlast (String s) { 3-12 6

String Mapping 1 (Use Iteration) public static String touppercase (String s) { How would we modify the above to implement the following? public static String replace (String s, char oldchar, char newchar); public static String shiftstring (String s, int shift); // StringOps.shiftString("HAL", 1) "IBM" 3-13 String Mapping 2 (Use Recursion) public static String touppercase (String s) { 3-14 7

String Filtering (Iterative and Recursive) // StringOps.removeChar( delivery, e ) dlivry // StringOps.removeChar( delivery, d ) elivery // StringOps.removeChar( delivery, q ) delivery public static String removechar (String s, char c) { 3-15 String Reversal (Iterative and Recursive) // StringOps.reveree( delivery ) yreviled public static String reverse (String s) { 3-16 8