PA2: Word Cloud (100 Points)




Due: 11:59pm, Thursday, April 16th

Overview

You will create a program that reads in a text file and outputs the most frequent unique words by using an ArrayList.

Setup

In all of the following, the > is a generic command line prompt (you do not type it).

You will need to create a new directory named pa2 in your cs8b home directory on ieng6.ucsd.edu.

    > cd
    > mkdir pa2

The first command (cd) changes your current directory to your home directory. cd stands for change directory. By default, if you do not specify a directory to change to, the command puts you in your home directory. The second command (mkdir pa2) makes a new directory named pa2. This new directory will be in your home directory since you did a cd beforehand.

Copy the provided files from the public directory by typing:

    > cp ~/../public/pa2/* ~/pa2/

Now type:

    > cd pa2

This changes your current working directory to the new pa2 directory you just created. All files associated with this programming assignment must be placed in this directory, and in general you should do all your work on this programming assignment in this pa2 directory.

Once you have created and navigated to your pa2 directory, you can run the following command:

    > ls

Your pa2 directory should now contain the following files:

    // code files
    WordPair.java           // do not change
    WordCloudTester.java    // do not change
    WordCloud.java          // the class you will modify

    // text files, will not be submitted
    commonwords.txt         // the common words in English to exclude
    small.txt               // small text file, feel free to edit and/or change
    usdeclaration.txt       // the US Declaration of Independence for testing
    usconst.txt             // the US Constitution for testing
    screenplaytig.txt       // text from The Imitation Game

    // sample output files, will not be submitted
    usconst_10.out          // correct output for running the usconst with 10 words
    usdeclaration_10.out    // correct output for running the usdeclaration with 20 words

To compare the sample output with the output of your program, you may use the diff command. If the files are the same, diff displays nothing. If they're different, you will see text on the screen starting from the line where the two files differ.

Example diff output if correct:

    $ java WordCloudTester usconst.txt 10 > myoutput.out
    $ diff myoutput.out usconst_10.out

Example diff output if incorrect:

    $ diff myoutput.out usconst_10.out
    4c4
    < States(110) President(102) United(85) State(75) Congress(57) Office(37) Law(35) Amendment(35) Person(34) House(33)
    ---
    > States(114) President(106) United(85) State(75) Congress(57) Office(37) Law(35) Amendment(35) Person(34) House(33)

For more information on diff, type: man diff

README (10 points)

You are required to provide a text file named README (NOT Readme.txt, README.pdf, README.docx, etc.) with your assignment in your pa2 directory. There should be no file extension after the file name README. Your README should include the following sections:

Program Description (3 points): Describe what the program does as if it were intended for a 5 year old or your grandmother. Do not assume your reader is a computer science major.

Short Response (7 points): Answer the following questions:

Vim related Questions:
1. How do you switch from insert mode to command mode in vim?
2. What are two ways to enter insert mode from command mode in vim?
3. a. How do you quit a file in vim?
   b. How do you quit a file in vim, without having to save the file first?
4. a. How do you save a file in vim?
   b. How do you save and quit a file in vim using a single command?

Unix/Linux related Questions:
5. a. How do you change directories from the command line?
   b. How do you change to the home directory, no matter what directory you are currently in?
6. How do you make a directory from the command line?
7. How do you show the path to the directory you are currently in?

Style (20 points)

You will be graded on your programming style for this assignment. A few suggestions and requirements for style are given below. These guidelines must be followed for all the remaining assignments. Read them carefully.

- Use reasonable in-line comments to make your code clear and readable.
- Use class headers and method header blocks to describe the purpose of your program and methods (see below). Also, use file headers.
- Every time you open a new block of code (use a '{'), indent farther. Go back to the previous level of indenting when you close the block (use a '}').
- Keep all lines less than 80 characters.
- Use 2-3 spaces for each level of indentation. Make sure each level of indentation lines up evenly.
- Use reasonable variable names.

Example:

    if (bunnies are in your house){
      if(you are not allergic to them){
        rejoice();
        playWithBunnies();
      }
      else{
        calmlyExitHouse();
        haveSomeoneMoveBunny();
      }
    }

Other options for alignment of brackets:

    if (you have glitter) {
      throwAtYourFriends();
    } else {
      getGlitter();
    }

- Use static final variables to make your code as general as possible.
- Judicious use of blank space around logical chunks of code makes your code much easier to read and debug.

- Do not use magic numbers or hard-coded numbers. This means that if you want to use a number other than 0, -1, or 1, you must give it a variable name. This makes your values understandable and lets them be changed later if need be.
- Always recompile and run your program right before turning it in, just in case you commented out some code by mistake.

You will specifically be graded on commenting, file headers, class and method headers, meaningful variable names, sufficient use of blank lines, not using more than 80 characters on a line, perfect indentation, and no magic numbers/hard-coded numbers other than 0, -1, or 1.

A note on the starter files given: There are some comments that explain what the methods do, but you should delete them and write your own comments instead. Do not mistake these comments for method headers; method headers require additional information and need to follow the format below.

Example file header comment:

    /*
     * Name: Jane-Joe Student
     * Login: cs8bxx <<< --- Use your cs8b course-specific account name
     * Date: Month Day, Year
     * File: Name of this file, for example: WordCloud.java
     * Sources of Help: ... (for example: names of people, books, websites, etc.)
     *
     * Describe what this program does here.
     */

Example class and method header comment:

    /*
     * Name: Class or method name
     * Purpose: Briefly describe the purpose of this class or method
     * Parameters: List all parameters and their types and what they represent.
     *             If no parameters, just state None.
     * Return: Specify the return type and what it represents.
     *         If no return value, just specify void.
     */

Correctness (70 points)

You are provided with the full code for WordPair.java and WordCloudTester.java. You do not need to modify these files. You will need to implement the following methods in WordCloud.java. The intended use for this class is:

1. read in the words from a file
2. strip out any common words (e.g., the, a, an) (the file with common words is given to you)
3. display the most frequently occurring words in the file

Each of the methods below corresponds to one of these steps.

25 points: void getWordsFromFile( String filename )

This method constructs an ArrayList containing a WordPair for each distinct word in the file. Your algorithm for this will be:

    for every word in the file
      search for the word in the ArrayList
      if the word is present
        increment its count
      if the word was not already in the ArrayList
        add the word to the ArrayList

Also, see the Scanner section below. As an important note, matching a word against the ArrayList should be case insensitive, and the String stored in the cloud should stay the same as the word's first occurrence.

15 points: void removeCommon( String filename )

This method reads in each word from the specified file and removes that word from the ArrayList. When removing a word, beware that you cannot iterate over a list while modifying it. (In other words, your for loop which iterates over each value should start over whenever you remove a word.) Your method need not be efficient (nested for loops are fine). Removing a word is case insensitive, as in the method before. Also, see the Scanner section below.

25 points: void printTopNWords( int n )

This method prints the top n words in the ArrayList, as determined by their frequency. To implement this method, you can iterate through the list looking for the word with the highest count. Once you've printed this word out, you can negate its count. When you are done outputting the top n words, you will need to iterate through the ArrayList again and change the negated counts back to their original values. For instance, if cat originally has the count 222, then once it has been printed out, its count should be -222. By the time the method finishes running, cat's count should be 222 again. printTopNWords must be able to execute more than once and produce the same results each time.

In the case of a tie (two words are equally frequent), you should select the word which occurred first in the original text file.

You should not sort the ArrayList. You will lose all points for this method if you sort the ArrayList.

You will also be graded on the formatting of the outputted text. If your formatting is wrong, up to 10 points will be deducted. Make sure that spaces and parentheses are consistent with what is provided in the sample output.

5 points: int findWordCount( String word )

This method takes a String and searches for it in the ArrayList. If the word is found, its count is returned. If the word isn't found, then 0 is returned. Searching for a word is case insensitive, as in the methods removeCommon and getWordsFromFile.
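The count-or-add, restart-on-remove, and negate-then-restore ideas described above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the expected WordCloud.java: the Entry class is a hypothetical stand-in for the provided WordPair (whose actual methods are not shown in this handout), the method names addWord, removeWord, and topN are invented for the sketch, and topN returns a String instead of printing so that it is easy to check. The exact output format, including line breaks, should be taken from the sample output files rather than from this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the provided WordPair class (an assumption;
// the real class's fields and methods are not shown in this handout).
class Entry {
  final String word;
  int count;

  Entry(String word) {
    this.word = word;
    this.count = 1;
  }
}

class WordCloudSketch {
  private final List<Entry> entries = new ArrayList<>();

  // Count-or-add: bump the count on a case-insensitive match, otherwise
  // append the word, keeping the spelling of its first occurrence.
  void addWord(String word) {
    for (Entry e : entries) {
      if (e.word.equalsIgnoreCase(word)) {
        e.count++;
        return;
      }
    }
    entries.add(new Entry(word));
  }

  // Restart-on-remove: after each removal the scan starts over, so the
  // loop never keeps walking a list it has just modified.
  void removeWord(String word) {
    boolean removed = true;
    while (removed) {
      removed = false;
      for (int i = 0; i < entries.size(); i++) {
        if (entries.get(i).word.equalsIgnoreCase(word)) {
          entries.remove(i);
          removed = true;
          break;
        }
      }
    }
  }

  // Case-insensitive lookup; 0 means the word is not in the list.
  int findWordCount(String word) {
    for (Entry e : entries) {
      if (e.word.equalsIgnoreCase(word)) {
        return e.count;
      }
    }
    return 0;
  }

  // Negate-then-restore: pick the largest positive count n times without
  // sorting, mark each pick by negating it, then undo the marks.
  String topN(int n) {
    StringBuilder out = new StringBuilder();
    for (int picked = 0; picked < n; picked++) {
      Entry best = null;
      for (Entry e : entries) {
        // Strict > keeps the earliest entry when counts tie.
        if (e.count > 0 && (best == null || e.count > best.count)) {
          best = e;
        }
      }
      if (best == null) {
        break; // fewer than n words remain
      }
      if (out.length() > 0) {
        out.append(' ');
      }
      out.append(best.word).append('(').append(best.count).append(')');
      best.count = -best.count;
    }
    for (Entry e : entries) {
      if (e.count < 0) {
        e.count = -e.count; // restore so a second call gives the same result
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    WordCloudSketch cloud = new WordCloudSketch();
    String text = "the cat saw the Dog and the dog chased a cat dog";
    for (String w : text.split(" ")) {
      cloud.addWord(w);
    }
    cloud.removeWord("THE");
    cloud.removeWord("and");
    cloud.removeWord("a");
    System.out.println(cloud.topN(3)); // prints "Dog(3) cat(2) saw(1)"
  }
}
```

In the usage above, saw and chased both have a count of 1, and saw is chosen because it appeared first. Calling topN a second time returns the same line because every negated count is restored before the method returns.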

Scanner

Use a Scanner object to read words from the file. The Scanner Javadocs: http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html

The basic framework to use a Scanner is:

    import java.io.File;
    import java.util.Scanner;

    // Construct a Scanner that reads in words from a file
    // (this constructor can throw FileNotFoundException)
    Scanner input = new Scanner( new File(fileName) );
    while ( input.hasNext() ) // while there are more words to be read in
    {
      String word = input.next(); // reads next string
    }

WordPair

This class is just a pairing of a String (a word) with an integer (the number of occurrences) for use in your WordCloud class. Your ArrayList will store WordPair objects. Applicable methods are provided.

ArrayLists

ArrayLists are dynamically resizing arrays. In Java, standard arrays are given a fixed size when first created. An ArrayList, by contrast, starts with an initial capacity and automatically enlarges when it becomes full. When an element is removed, the list's size shrinks accordingly. Go to the Useful Links section on the course website for a link to the Java API, where you can find information about ArrayLists and the methods they have. Some suggested methods to use are: get, size, add, and remove.

How To Test (Sample Output)

The following is an example of using the WordCloudTester on the provided file (usdeclaration.txt), requesting the top 20 words.

    > java WordCloudTester usdeclaration.txt 20
    Reading in File: usdeclaration.txt
    Removing common words
    Displaying the top 20 words
    laws(8) people(7) government(5) States(4) powers(4) assent(4) large(4) time(4) independent(4) free(4) Declaration(3) United(3) mankind(3) hold(3) rights(3) long(3) abolishing(3) usurpations(3) absolute(3) repeated(3)

Turnin

To turn in your code, navigate to your home directory and run the following command:

    > cse8bturnin pa2

You may turn in your programming assignment as many times as you like. The last submission you turn in before the deadline is the one that we will collect.

Always recompile and run your program right before turning it in, just in case you commented out some code by mistake.

Verify

To verify a previously turned in assignment:

    > cse8bverify pa2

If you are unsure whether your program has been turned in, use the verify command. We will not take any late files you forgot to turn in. Verify will help you check which files you have successfully submitted. It is your responsibility to make sure you properly turned in your assignment.

Files to be collected:

    README
    WordCloud.java
    WordCloudTester.java
    WordPair.java

The files that you turn in must have EXACTLY the same names as those above.

Extra Credit

Extra credit will be given for turning in your assignment early. You can earn up to a maximum of 3 points (3%) extra credit.

    Final Turnin Date            Extra Credit
    Tuesday, April 14 11:59pm    3pts

Note: Only your latest turnin submission will be considered for receiving extra credit. This is because each submission overrides the last one.

NO LATE ASSIGNMENTS ACCEPTED. DO NOT EMAIL US YOUR ASSIGNMENT!

Start Early and Often!