Lab 8 - Input and Output
David Woods
dwoods@scss.tcd.ie

1 Streams

Input and Output, usually abbreviated as I/O, are how we handle files in Java and other programming languages. An I/O stream is what Java uses to get data from an input source, or to put data into an output destination. Streams allow us to handle many kinds of data, including the simple primitive types (int, boolean, char, double, etc.) we've seen already, and more complex objects (String, Scanner, List, etc.). Sometimes streams will just move data from one place to another, and sometimes they will process it in some useful way before passing it on. No matter what the stream does, it will always follow the same basic model:

Reading information into a program:

    Data Source -> Input Stream -> Program

Writing information from a program:

    Program -> Output Stream -> Data Destination

We've actually already seen an example of an input stream, when we were using the Scanner class. A Scanner takes data from an input stream, which in this case is System.in, corresponding to the keyboard input (this is also known as standard in). The Scanner allows us to access the information from the input stream in our programs (using next(), nextInt(), nextLine(), etc.). Similarly, System.out is an output stream, corresponding to the output on the screen (standard out).

java.io

In order to access the classes discussed below, we will need to import them into our programs. We do this in the same way as we import the Scanner class

    import java.util.Scanner;

except that all the classes we want are inside the java.io package. We can either import the classes we want one by one, or we can use the asterisk character to import everything from the package:

    import java.io.*;

One important thing to be aware of with data streams is that things can go wrong. A file can go missing, or a stream can be interrupted (if the program is ended early, for example). We need to be able to deal with these exceptions, and we have two ways to do this:

Method 1: try / catch

We can wrap all of our code in a try block, which will try to do the tasks we want. If something goes wrong, we can catch the exception (which for these cases will be of the type IOException) and handle it in a catch block.

    try {
        // code using the stream
    } catch (IOException e) {
        // code to only be executed if the exception occurs
    }

Method 2: throws Exception

Instead of wrapping our code up, we can anticipate that a function might contain some code that could run into one of the exceptional circumstances discussed above. After the function name and parameter list, we add the keyword throws, followed by the kind of exception that we are expecting.

    public static void main(String[] args) throws IOException {
        // code using the stream
    }

Note that this seems like we're avoiding the problem of handling the exception (which we did with the catch block in the previous method). In fact, we're just moving the problem, and we'll have to worry about it elsewhere. For now, though, we don't need to worry about that too much.

2 Byte Streams

In the same way that, ultimately, we can break down every data type to ones and zeroes, grouped in 8-bit bytes, every type of stream is ultimately built on a byte stream, either InputStream or OutputStream. On their own, these aren't too useful to us, as they are very low-level, and we don't usually want to move just bytes around, but it's important to understand what's going on behind the scenes. For example, using the classes FileInputStream and FileOutputStream, we could get a stream of bytes from one file, process it in some useful way, and then write it to some other file. However, generally speaking, we will not use byte streams unless we need access to this low-level data for some reason. Since we will be mostly dealing with text data as computational linguists, we will typically use character streams to handle moving our data around.
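
As a rough illustration of the byte streams described above, and of the two ways of dealing with IOException, here is a minimal sketch that copies a file one byte at a time. The class name CopyBytes, the method name copy and the file names are made up for this example. The copy method uses throws to pass the exception on, and main is the "elsewhere" that finally handles it with try / catch. For brevity, the sketch does not guard against a failure partway through the copy, in which case the close calls would be skipped.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class CopyBytes {
    // copies every byte of one file into another
    // (hypothetical helper; it passes any IOException on to its caller)
    static void copy(String from, String to) throws IOException {
        // open the streams
        FileInputStream in = new FileInputStream(from);
        FileOutputStream out = new FileOutputStream(to);
        // grab the first byte; a result of -1 means the end of the stream
        int b = in.read();
        while (b != -1) {
            // write the byte into the output file
            out.write(b);
            // grab the next byte
            b = in.read();
        }
        // close the streams
        in.close();
        out.close();
    }

    public static void main(String[] args) {
        try {
            // assumed file names, chosen only for this example
            copy("myinput.bin", "myoutput.bin");
            System.out.println("Copy finished.");
        } catch (IOException e) {
            // only executed if something went wrong in copy
            System.out.println("Something went wrong: " + e.getMessage());
        }
    }
}
```

If the input file is missing, the program prints the error message instead of stopping with an uncaught exception.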

Stream Pipeline

You should always deal with streams in the following order:

1. open the stream
2. process the data
3. close the stream

It's important to always close your streams when you no longer need them, as doing otherwise may lead to resource and memory leaks. These can cause your computer to slow down and, in bad cases, can leave files corrupted. This goes for Scanners too, as closing one closes the input stream it's handling (System.in), as in the example below:

    public static void main(String[] args) {
        // open a Scanner stream, using the keyboard as the data source
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter some text: ");
        // get the data from the stream
        String str = scan.nextLine();
        // process the data in some way
        str = str.toLowerCase();
        System.out.println(str);
        // close the stream, preventing memory leaks!
        scan.close();
    }

3 Character Streams

The simplest character stream classes are called Reader and Writer, but as with the byte streams, we have specialised versions of these classes for use in file I/O, which is what we are most interested in at the moment: FileReader and FileWriter. These classes allow us to pass a filename as a String argument, and open up that file to access the characters of the text inside it, or write characters to a file. Note that the input stream's file must exist before trying to open it, but if the output stream's file does not exist, it will be created automatically.

    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    public class CopyChars {
        // main function can throw IOException
        public static void main(String[] args) throws IOException {
            // open the streams
            FileReader inputStream = new FileReader("myinput.txt");
            FileWriter outputStream = new FileWriter("myoutput.txt");
            // grab the first character
            int c = inputStream.read();
            // a result of -1 means the end of the stream is reached
            while (c != -1) {
                // write the character into the output file
                outputStream.write(c);
                // grab the next character
                c = inputStream.read();
            }
            // close the streams!
            inputStream.close();
            outputStream.close();
        }
    }

You'll notice that this uses an integer to store the values of the characters. This is just due to the way that the streams represent the characters internally: read() returns an int so that the value -1, which is not a valid character, can signal the end of the stream.

Getting Lines of Data

All of the above is fine for working in low-level data situations, but we normally want to deal with more than one byte or one character at a time. Typically, we're going to be dealing with Strings of text. The most straightforward way of building on what we already know is to take units of lines, rather than single characters. A line is a String of characters with some line-terminator at the end. What this terminator is will depend on the system you're using, but it is usually one of a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). We don't need to worry about which is being used most of the time, as Java can generally figure it out without our help, but if things aren't looking right, it may be due to a mismatched line-terminator.

So how do we get a line of characters using I/O streams? We need to introduce another pair of classes to do this for us: BufferedReader and PrintWriter. These classes buffer characters so that we can deal with them as Strings. We use them in conjunction with the two classes we used for simple character streams, FileReader and FileWriter.

    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.BufferedReader;
    import java.io.PrintWriter;
    import java.io.IOException;

    public class CopyLines {
        // main function can throw IOException
        public static void main(String[] args) throws IOException {
            // open the streams
            FileReader inputStream = new FileReader("myinput.txt");
            FileWriter outputStream = new FileWriter("myoutput.txt");
            // pass the basic streams to the new buffered streams
            BufferedReader reader = new BufferedReader(inputStream);
            PrintWriter writer = new PrintWriter(outputStream);
            // grab the first line
            String line = reader.readLine();
            // a result of null means the end of the stream is reached
            while (line != null) {
                // write the line into the output file
                writer.println(line);
                // grab the next line
                line = reader.readLine();
            }
            // close the streams! (the buffered streams first, then the basic ones)
            reader.close();
            writer.close();
            inputStream.close();
            outputStream.close();
        }
    }

You should also be aware that it is possible to initialise the buffered streams in a single line each:

    BufferedReader br = new BufferedReader(new FileReader("input.txt"));

Doing it this way means you only have to close the buffered stream, and it will close the character stream too:

    // closes the buffered and simple streams at once
    br.close();
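
To see the single-line initialisation in a complete program, here is a minimal sketch that reads a file line by line and writes each line to a second file, prefixed with its line number. The class name NumberLines, the method name numberLines and the file names are made up for this example. Only the outer streams are closed, since closing them closes the FileReader and FileWriter inside them as well.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class NumberLines {
    // writes each line of one file to another, prefixed with its line number
    // (hypothetical helper; it passes any IOException on to its caller)
    static void numberLines(String from, String to) throws IOException {
        // open the streams, wrapping the simple streams in a single line each
        BufferedReader reader = new BufferedReader(new FileReader(from));
        PrintWriter writer = new PrintWriter(new FileWriter(to));
        int count = 1;
        // grab the first line; a result of null means the end of the stream
        String line = reader.readLine();
        while (line != null) {
            // write the numbered line into the output file
            writer.println(count + ": " + line);
            count++;
            // grab the next line
            line = reader.readLine();
        }
        // closing the outer streams closes the simple streams too
        reader.close();
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        // assumed file names, chosen only for this example
        numberLines("myinput.txt", "mynumbered.txt");
    }
}
```

Note that readLine() returns each line without its line-terminator, and println() adds the terminator used by the system the program is running on.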

4 Lab Exercise

Please submit your solution to me by email (dwoods@scss.tcd.ie) before 23:59 on Monday 21st of November. Ensure all code is fully commented and tested, or full marks will not be given. Remember that you must recompile your code every time you make any changes.

1. Create a short text file (approximately 10 lines).
2. Create a Java program called SimpleIO.java.
3. The program should ask the user for an input file (the text file you created in the first step), and should make sure this file exists and is not a directory (you'll have to look online for these tests). It should keep asking the user for input until an existing file is given.
4. Next, the program should ask for another file name, for output. This file does not need to exist, but it must have a different name from the input file. Keep asking until this condition is satisfied.
5. Read through the input file, line by line, and count the occurrences of the word "the" in each line. Hint: split the string into an array of lowercase words.
6. For each line of input, you should print the line in the output file, followed by the number of occurrences of "the". For example, if the input line is

    The cat and the dog

the output line should be

    The cat and the dog [2]