NLP Programming Tutorial 0 - Programming Basics



Similar documents
FEEG Applied Programming 3 - Version Control and Git II

Tutorial 0A Programming on the command line

Introduction to Python

MATLAB & Git Versioning: The Very Basics

Working with Docker on Microsoft Azure

Code::Blocks Student Manual

Version Control with. Ben Morgan

Git Basics. Christopher Simpkins Chris Simpkins (Georgia Tech) CS 2340 Objects and Design CS / 22

Getting Started With! CGminer/BFGminer!

Running Knn Spark on EC2 Documentation

Exercise 1: Python Language Basics

Two Best Practices for Scientific Computing

ALERT installation setup

Tutorial- Counting Words in File(s) using MapReduce

Version control. HEAD is the name of the latest revision in the repository. It can be used in subversion rather than the latest revision number.

Linux Overview. Local facilities. Linux commands. The vi (gvim) editor

Installing Java (Windows) and Writing your First Program

Command Line - Part 1

PA2: Word Cloud (100 Points)

Adafruit's Raspberry Pi Lesson 6. Using SSH

Invitation to Ezhil : A Tamil Programming Language for Early Computer-Science Education 07/10/13

CISE Research Infrastructure: Mid-Scale Infrastructure - NSFCloud (CRI: NSFCloud)

Version Control using Git and Github. Joseph Rivera

Fermilab Central Web Service Site Owner User Manual. DocDB: CS-doc-5372

Vim, Emacs, and JUnit Testing. Audience: Students in CS 331 Written by: Kathleen Lockhart, CS Tutor

Lab 2 : Basic File Server. Introduction

Source Code Management for Continuous Integration and Deployment. Version 1.0 DO NOT DISTRIBUTE

University of Toronto

Creating a Java application using Perfect Developer and the Java Develo...

CS 103 Lab Linux and Virtual Machines

Linux Development Environment Description Based on VirtualBox Structure

Dalhousie University CSCI 2132 Software Development Winter 2015 Lab 7, March 11

AdWhirl Open Source Server Setup Instructions

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

Creating Microsoft Azure Web Sites

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

B&K Precision 1785B, 1786B, 1787B, 1788 Power supply Python Library

Extreme computing lab exercises Session one

The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications.

INSTALLING AN SSH / X-WINDOW ENVIRONMENT ON A WINDOWS PC. Nicholas Fitzkee Mississippi State University

Project 5 Twitter Analyzer Due: Fri :59:59 pm

Installing C++ compiler for CSc212 Data Structures

cloud-kepler Documentation

A Short Introduction to Writing Java Code. Zoltán Majó

UBUNTU VIRTUAL MACHINE + CAFFE MACHINE LEARNING LIBRARY SETUP TUTORIAL

Mercurial. Why version control (Single users)

PROGRAMMING FOR BIOLOGISTS. BIOL 6297 Monday, Wednesday 10 am -12 pm

Introduction to Perl

WinSCP Tutorial 01/28/09: Y. Liow

CDH installation & Application Test Report

Agenda. Using HPC Wales 2

Version Control Systems (Part 2)

See the installation page

Version control with GIT

Ubuntu. Ubuntu. C++ Overview. Ubuntu. History of C++ Major Features of C++

MOOSE-Based Application Development on GitLab

What Does Tequila Have to Do with Managing Macs? Using Open Source Tools to Manage Mac OS in the Enterprise!

Computational Mathematics with Python

The Basics of FTP. Basic Order of Operations: Commands: FTP (File Transfer Protocol) allows a user to transfer files to/from a remote network site.

Tutorial Guide to the IS Unix Service

Setting up Radmind For an OSX Public Lab

Intel Do-It-Yourself Challenge Lab 2: Intel Galileo s Linux side Nicolas Vailliet

Cisco Networking Academy Program Curriculum Scope & Sequence. Fundamentals of UNIX version 2.0 (July, 2002)

Lab 1: Introduction to C, ASCII ART and the Linux Command Line Environment

How to Run Spark Application

Version control systems. Lecture 2

Hadoop Installation MapReduce Examples Jake Karnes

Introduction. Created by Richard Bell 10/29/2014

Version Control with Git. Kate Hedstrom ARSC, UAF

Theorist HT Induc0on Course Lesson 1: Se6ng up your new computer (Mac OS X >= 10.6) As of 9/27/2012

PuTTY/Cygwin Tutorial. By Ben Meister Written for CS 23, Winter 2007

BACKUP YOUR SENSITIVE DATA WITH BACKUP- MANAGER

Steps for running C-program

Lab 1 Beginning C Program

Introduction to Git. Markus Kötter Notes. Leinelab Workshop July 28, 2015

Magento Search Extension TECHNICAL DOCUMENTATION

CSCE 110 Programming I Basics of Python: Variables, Expressions, and Input/Output

Basic C Shell. helpdesk@stat.rice.edu. 11th August 2003

A SHORT INTRODUCTION TO DUPLICITY WITH CLOUD OBJECT STORAGE. Version

The Build Process. of (GNU Tools for ARM Embedded Processors)

Remote Desktop With P2P VNC

Memopol Documentation

Building a Python Plugin

Computational Mathematics with Python

Unix Guide. Logo Reproduction. School of Computing & Information Systems. Colours red and black on white backgroun

PCSpim Tutorial. Nathan Goulding-Hotta v0.1

Intro to Intel Galileo - IoT Apps GERARDO CARMONA

Computational Mathematics with Python

Version control. with git and GitHub. Karl Broman. Biostatistics & Medical Informatics, UW Madison

Version control tracks multiple versions. Configuration Management. Version Control. V Software Engineering Lecture 12, Spring 2008

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer GIT

An Introduction to Mercurial Version Control Software

CS 2112 Lab: Version Control

Using GitHub for Rally Apps (Mac Version)

Laboration 3 - Administration

How to read Temperature and Humidity from Am2302 sensor using Thingworx Edge java SKD for Raspberry Pi

Introducing Xcode Source Control

SparkLab May 2015 An Introduction to

Getting Started with Hadoop with Amazon s Elastic MapReduce

Mobile Labs Plugin for IBM Urban Code Deploy

Transcription:

NLP Programming Tutorial 0 - Programming Basics Graham Neubig Nara Institute of Science and Technology (NAIST) 1

About this Tutorial 14 parts, starting from easier topics Each time: During the tutorial: Learn something new At home: Do a programming exercise Next week: Talk about results with your neighbor Programming language is your choice Examples will be in Python, so it is recommended I can help with Python, C++, Java, Perl Working in pairs is encouraged 2

Setting Up Your Environment 3

Open a Terminal If you are on Linux or Mac From the program menu select terminal If you are on Windows Install cygwin or use ssh to log in to a Linux machine 4

Install Software (if necessary) 3 types of software: python: the programming language a text editor (gvim, emacs, etc.) git: A version control system Linux: sudo apt-get install git vim-gnome python Windows: Run cygwin setup.exe, select git, gvim, and python 5

Download the Tutorial Files from Github Use the git clone command to download the code $ git clone https://github.com/neubig/nlptutorial.git You should find this PDF in the downloaded directory $ cd nlptutorial $ ls download/00-intro/nlp-programming-en-00-intro.pdf 6

Using gvim You can use any text editor, but if you are using vim: If it is your first time, you may want to copy my vim settings file, which will make vim easier to use: $ cp misc/vimrc ~/.vimrc Open vim: $ gvim test.txt Press i to start input and write test Press escape, and type :wq to save and quit ( :w is save, :q is quit) 7

Using git You can use git to save your progress First, add the changed file $ git add test.txt And save your change $ git commit (Enter a message like added a test file ) Using git, you can do things like go back to your last commit (git reset), download the latest updates (git pull), or upload code to github (git push) 8

Basic Programming 9

Hello World! 1)Open my-program.py in an editor (gvim, emacs, gedit) $ gvim my-program.py 2) Type in the following program 3) Make the program executable $ chmod 755 my-program.py 4) Run the program $./my-program.py Hello World! 10

Main data types used Strings: hello, goodbye Integers: -1, 0, 1, 3 Floats: -4.2, 0.0, 3.14 $./my-program.py string: hello float: 2.500000 int: 4 11

if/else, for if this condition is true then do this otherwise do this for every element in this do this $./my-program.py my_variable is not 4 i == 1 i == 2 i == 3 i == 4 Be careful! 12 range(1, 5) == (1, 2, 3, 4)

Storing many pieces of data Dense Storage Index Value 0 20 1 94 2 10 3 2 4 0 5 19 6 3 Sparse Storage Index Value 49 20 81 94 96 10 104 2 or Index Value apple 20 banana 94 cherry 10 date 2 13

Arrays (or lists in Python) Good for dense storage Index is an integer, starting at 0 Make a list with 5 elements Add one more element to the end of the list Print the length of the list Print the 4 th element Loop through and print every element of the list 14

Maps (or dictionaries in Python) Good for sparse storage: create pairs of key/value add a new entry print size print one entry check whether a key exists print key/value pairs in order 15

defaultdict A useful expansion on dictionary with a default value import library default value of zero print existing key print non-existent key 16

Splitting and joining strings In NLP: often split sentences into words Split string at white space into an array of words Combine the array into a single string, separating with $./my-program.py... this is a pen 17

Functions Functions take an input, transform the input, and return an output function add_and_abs takes x and y as input add x and y together and return the absolute value call add_and_abs with x=-4 and y=1 18

Using command line arguments/ Reading files First argument Open file for reading with r Read the file one line at a time Delete the line end symbol \n If the line is not empty, print $./my-program.py test.txt 19

Testing Your Code 20

Simple Input/Output Tests Example: Program word-count.py should count the words in a file 1) Create a small input file 2) Count the words by hand, write them in an output file test-word-count-in.txt a b c b c d 3) Run the program test-word-count-out.txt a 1 b 2 c 2 d 1./word-count.py test-word-count-in.txt > word-count-out.txt 4) Compare the results $ diff test-word-count-out.txt word-count-out.txt 21

Unit Tests Write code to test each function Test several cases, and print an error if result is wrong Return 1 if all tests passed, 0 otherwise 22

ALWAYS Test your Code Creating tests: Makes you think about the problem before writing code Will reduce your debugging time drastically Will make your code easier to understand later 23

Practice Exercise 24

Practice Exercise Make a program that counts the frequency of words in a file this is a pen this pen is my pen a 1 is 2 my 1 pen 3 this 2 Test it on test/00-input.txt, test/00-answer.txt Run the program on the file data/wiki-en-train.word Report: The number of unique words The frequencies of the first few words in the list 25

Pseudo-code create a dictionary counts create a map to hold counts open a file for each line in the file split line into words for w in words if w exists in counts, add 1 to counts[w] else set counts[w] = 1 print key, value of counts 26