String Edit Distance (and intro to dynamic programming) Lecture #4 Computational Linguistics CMPSCI 591N, Spring 2006

1 String Edit Distance (and intro to dynamic programming) Lecture #4 Computational Linguistics CMPSCI 591N, Spring 2006 University of Massachusetts Amherst Andrew McCallum

2 Dynamic Programming (Not much to do with programming in the CS sense.) Dynamic programming is efficient at finding optimal solutions for problems with many overlapping sub-problems. It solves a problem by recombining solutions to sub-problems, where the sub-problems themselves may share sub-sub-problems.

3 Fibonacci Numbers

5 Calculating Fibonacci Numbers F(n) = F(n-1) + F(n-2), where F(0)=0, F(1)=1. Non-Dynamic Programming implementation:
    def fib(n):
        if n == 0 or n == 1:
            return n
        else:
            return fib(n-1) + fib(n-2)
For fib(8), how many calls to function fib(n)?
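One way to answer the question is to count the calls directly. The sketch below is an illustration, not part of the lecture code; the names fib_counting and count_calls are made up for this example.

```python
def fib_counting(n, counter):
    """Naive recursive Fibonacci that also counts every call made."""
    counter[0] += 1  # one more call to the function
    if n == 0 or n == 1:
        return n
    return fib_counting(n - 1, counter) + fib_counting(n - 2, counter)


def count_calls(n):
    """Return (F(n), number of calls needed by the naive recursion)."""
    counter = [0]  # a one-element list so the recursion can update it
    value = fib_counting(n, counter)
    return value, counter[0]


print(count_calls(8))  # (21, 67)
```

The call count obeys calls(n) = 1 + calls(n-1) + calls(n-2), so it grows about as fast as the Fibonacci numbers themselves.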

6 DP Example: Calculating Fibonacci Numbers Dynamic Programming: avoid repeated calls by remembering function values already calculated.
    table = {}
    def fib(n):
        global table
        if n in table:  # dict.has_key() no longer exists in Python 3
            return table[n]
        if n == 0 or n == 1:
            table[n] = n
            return n
        else:
            value = fib(n-1) + fib(n-2)
            table[n] = value
            return value
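The same idea can be sketched with the standard library's functools.lru_cache, which keeps the table for us. This swaps in a library decorator for the hand-written dictionary; fib_cached is a name invented for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_cached(n):
    """Fibonacci with memoization handled by lru_cache."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


print(fib_cached(8))                    # 21
print(fib_cached.cache_info().misses)   # 9: each of F(0)..F(8) is computed once
```

Each distinct argument is computed once, so the work is linear in n instead of exponential.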

7 DP Example: Calculating Fibonacci Numbers ...or alternately, in a list instead of a dictionary...
    def fib(n):
        if n < 2:  # guard: for n == 0 the table would have no slot 1
            return n
        table = [0] * (n+1)
        table[0] = 0
        table[1] = 1
        for i in range(2, n+1):
            table[i] = table[i-1] + table[i-2]
        return table[n]
We will see this pattern many more times in this course:
1. Create a table (of the right dimensions to describe our problem).
2. Fill the table, re-using solutions to previous sub-problems.

8 String Edit Distance Given two strings (sequences), return the distance between the two strings as measured by the minimum number of character edit operations needed to turn one sequence into the other. Example: Andrew vs. Amdrewz. 1. substitute m to n. 2. delete the z. Distance = 2

9 String distance metrics: Levenshtein Given strings s and t, the distance is the cost of the shortest sequence of edit commands that transforms s to t (or equivalently t to s). Simple set of operations: Copy character from s over to t (cost 0); Delete a character in s (cost 1); Insert a character in t (cost 1); Substitute one character for another (cost 1). This is Levenshtein distance.

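10 Levenshtein distance - example distance("William Cohen", "Willliam Cohon")
    s:        W I L L - I A M _ C O H E N
    t:        W I L L L I A M _ C O H O N
    edit op:  C C C C I C C C C C C C S C
    cost:     0 0 0 0 1 1 1 1 1 1 1 1 2 2
The "-" in s is an alignment gap; the cost row is the cost so far. Alignment is a little bit like a parse.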

11 Finding the Minimum What is the minimum number of operations for...? "Another fine day in the park" vs. "Anybody can see him pick the ball". Not so easy... not so clear. Not only are the strings longer, but it isn't immediately obvious where the alignments should happen. What if we consider all possible alignments by brute force? How many alignments are there?
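To get a feel for the brute-force count, here is a small sketch. It assumes an alignment is a sequence of columns, each being a character pair (copy or substitute), a character of s against a gap, or a character of t against a gap, and that different orderings of adjacent gaps count as different alignments. Under that assumption the count follows the recurrence N(i,j) = N(i-1,j-1) + N(i-1,j) + N(i,j-1), which is itself filled in with a table. The function name count_alignments is made up for this example.

```python
def count_alignments(m, n):
    """Number of column-by-column alignments of strings of length m and n."""
    # Only one alignment exists when either string is empty: all gaps.
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (table[i - 1][j - 1]   # last column pairs two characters
                           + table[i - 1][j]     # last column: character of s vs. gap
                           + table[i][j - 1])    # last column: gap vs. character of t
    return table[m][n]


print(count_alignments(2, 2))   # 13
print(count_alignments(3, 3))   # 63
s = "Another fine day in the park"
t = "Anybody can see him pick the ball"
print(count_alignments(len(s), len(t)))  # far too many to enumerate
```

The count grows exponentially with string length, which is why enumerating alignments is hopeless for sentences like these.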

12 Dynamic Program Table for String Edit Measure distance between strings PARK and SPAKE. The table has one column per character of PARK and one row per character of SPAKE. c_ij = the number of edit operations needed to align P with SP.

13 Dynamic Programming to the Rescue! How to take our big problem and chop it into building-block pieces: given some partial solution, it isn't hard to figure out what a good next immediate step is. Partial solution = "This is the cost for aligning s up to position i with t up to position j." Next step = "In order to align up to positions x in s and y in t, should the last operation be a substitute, insert, or delete?"

14 Dynamic Program Table for String Edit Measure distance between strings PARK and SPAKE. Edit operations for turning SPAKE into PARK. [Table: columns P A R K, rows S P A K E, with moves labeled delete, insert, and substitute]

15 Dynamic Program Table for String Edit Measure distance between strings PARK and SPAKE
              P    A    R    K
        c00  c01  c02  c03  c04
     S  c10  c11  c12  c13  c14
     P  c20  c21  c22  c23  c24
     A  c30  c31  ???
     K
     E

16 Dynamic Program Table for String Edit [Table: columns P A R K, rows S P A K E; the cell marked ??? is computed from its three already-filled neighbors, reached by moves labeled subst, delete, and insert]
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1)       if si = tj    //copy
           D(i-1,j-1) + 1   if si != tj   //substitute
           D(i-1,j) + 1                   //insert
           D(i,j-1) + 1                   //delete

17 Computing Levenshtein distance
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1) + d(si,tj)   //subst/copy
           D(i-1,j) + 1            //insert
           D(i,j-1) + 1            //delete
(simplify by letting d(c,d) = 0 if c = d, 1 else)
Also let D(i,0) = i (for i inserts) and D(0,j) = j.

18 Dynamic Program Table Initialized
           P  A  R  K
        0  1  2  3  4
     S  1
     P  2
     A  3
     K  4
     E  5
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1) + d(si,tj)   //substitute
           D(i-1,j) + 1            //insert
           D(i,j-1) + 1            //delete

19 Dynamic Program Table... filling in [Table: columns P A R K, rows S P A K E; the interior cells are filled in one at a time]
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1) + d(si,tj)   //substitute
           D(i-1,j) + 1            //insert
           D(i,j-1) + 1            //delete

22 Dynamic Program Table... filling in
           P  A  R  K
        0  1  2  3  4
     S  1  1  2  3  4
     P  2  1  2  3  4
     A  3  2  1  2  3
     K  4  3  2  2  2
     E  5  4  3  3  3
D(i,j) = score of best alignment from s1..si to t1..tj
       = min of:
           D(i-1,j-1) + d(si,tj)   //substitute
           D(i-1,j) + 1            //insert
           D(i,j-1) + 1            //delete
Final cost of aligning all of both strings: the bottom-right cell, 3.

23 DP String Edit Distance
    def stredit(s1, s2):
        "Calculate Levenshtein edit distance for strings s1 and s2."
        len1 = len(s1)  # vertically
        len2 = len(s2)  # horizontally
        # Allocate the table
        table = [None] * (len1 + 1)
        for i in range(len1 + 1): table[i] = [0] * (len2 + 1)
        # Initialize the table
        for i in range(1, len1 + 1): table[i][0] = i
        for i in range(1, len2 + 1): table[0][i] = i
        # Do dynamic programming
        for i in range(1, len1 + 1):
            for j in range(1, len2 + 1):
                if s2[j-1] == s1[i-1]: d = 0
                else: d = 1
                table[i][j] = min(table[i-1][j-1] + d,
                                  table[i-1][j] + 1,
                                  table[i][j-1] + 1)
        return table[len1][len2]  # final cost: bottom-right cell
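The table-filling procedure can be tried on the PARK / SPAKE example. The sketch below is a self-contained variant that returns the whole table so it can be printed; the name edit_table and the choice to put the first string on the rows are assumptions made for this illustration.

```python
def edit_table(s, t):
    """Return the full Levenshtein table; rows follow s, columns follow t."""
    table = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        table[i][0] = i          # i edits against the empty prefix of t
    for j in range(1, len(t) + 1):
        table[0][j] = j          # j edits against the empty prefix of s
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            table[i][j] = min(table[i - 1][j - 1] + d,   # copy / substitute
                              table[i - 1][j] + 1,       # step down one row
                              table[i][j - 1] + 1)       # step right one column
    return table


table = edit_table("SPAKE", "PARK")
for row in table:
    print(" ".join(str(cell) for cell in row))
print("distance:", table[-1][-1])  # distance: 3
```

Because the three costs are symmetric, swapping the two strings transposes the table but leaves the final distance unchanged.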

24 Remembering the Alignment (trace)
D(i,j) = min of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j) + 1            //insert
    D(i,j-1) + 1            //delete
A trace indicates where the min value came from, and can be used to find edit operations and/or a best alignment (there may be more than 1). [Table: trace pointers for the strings MCCOHN and COHEN]
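A minimal sketch of recovering one best alignment: instead of storing pointers, it re-derives where each minimum came from while walking back from the bottom-right cell. The function name align and the operation names are assumptions for this example; the names are chosen from the viewpoint of turning s into t, which may be the opposite of the insert/delete labels on the slide.

```python
def align(s, t):
    """Return (distance, operations) for one best way of turning s into t."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + d, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Trace back: at each cell, find a neighbor that explains the value.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        d = 0 if (i > 0 and j > 0 and s[i - 1] == t[j - 1]) else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d:
            ops.append(("copy" if d == 0 else "substitute", s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("delete", s[i - 1], "-"))   # character of s has no partner
            i -= 1
        else:
            ops.append(("insert", "-", t[j - 1]))   # character of t has no partner
            j -= 1
    ops.reverse()
    return D[n][m], ops


distance, ops = align("SPAKE", "PARK")
print(distance)
for op in ops:
    print(op)
```

When several neighbors explain a cell, this sketch always prefers the diagonal, so it reports only one of the possibly many best alignments.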

25 Three Enhanced Variants Needleman-Wunsch: variable costs. Smith-Waterman: find longest soft matching subsequence. Affine Gap Distance: make repeated deletions (insertions) cheaper. (Implement one for homework?)

26 Needleman-Wunsch distance
D(i,j) = min of:
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j) + G            //insert
    D(i,j-1) + G            //delete
d(c,d) is an arbitrary distance function on characters (e.g. related to typo frequencies, amino acid substitutability, etc.). G = gap cost. Example: William Cohen vs. Wukkuan Cigeb
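As a sketch of what a typo-aware d(c,d) might look like, the example below charges half price for substituting a key with its left or right neighbor on a QWERTY row. The cost values (0.5 and 1), the keyboard model, the boundary choice D(i,0) = i*G, and all function names are assumptions for illustration only.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ADJACENT = set()
for row in QWERTY_ROWS:
    for a, b in zip(row, row[1:]):
        ADJACENT.add((a, b))
        ADJACENT.add((b, a))


def keyboard_cost(c, d):
    """Hypothetical character distance: neighboring keys are cheap to confuse."""
    if c == d:
        return 0.0
    if (c.lower(), d.lower()) in ADJACENT:
        return 0.5
    return 1.0


def unit_cost(c, d):
    """Plain Levenshtein character distance."""
    return 0 if c == d else 1


def needleman_wunsch(s, t, d, G):
    """Edit distance with character distance function d and gap cost G."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * G
    for j in range(1, m + 1):
        D[0][j] = j * G
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),  # subst/copy
                          D[i - 1][j] + G,                          # gap
                          D[i][j - 1] + G)                          # gap
    return D[n][m]


print(needleman_wunsch("William Cohen", "Wukkuan Cigeb", unit_cost, 1))      # 8
print(needleman_wunsch("William Cohen", "Wukkuan Cigeb", keyboard_cost, 1))  # 4.0
```

All eight typos in the example are neighboring-key slips, so the keyboard-aware cost halves the distance relative to unit costs.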

27 Smith-Waterman distance Instead of looking at each sequence in its entirety, this compares segments of all possible lengths and chooses whichever segments maximize the similarity measure. For every cell the algorithm calculates all possible paths leading to it. These paths can be of any length and can contain insertions and deletions.

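28 Smith-Waterman distance
D(i,j) = min of:
    0                       //start over
    D(i-1,j-1) + d(si,tj)   //subst/copy
    D(i-1,j) + G            //insert
    D(i,j-1) + G            //delete
Here G is a positive gap cost, d(c,c) is negative (a match lowers the score), and d(c,d) is positive for c != d. [Table: example scores for the strings COHEN and MCCOHN]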
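A minimal sketch of this recurrence in the same min-cost form, where the best local match is the most negative cell anywhere in the table. The numeric values G = 1, d(c,c) = -2, and d(c,d) = +1 are assumptions chosen for illustration, as are the function and parameter names.

```python
def smith_waterman(s, t, G=1, match=-2, mismatch=1):
    """Return the best (most negative) local alignment score of s and t."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = min(0,                        # start over
                          D[i - 1][j - 1] + d,      # subst/copy
                          D[i - 1][j] + G,          # gap
                          D[i][j - 1] + G)          # gap
            best = min(best, D[i][j])
    return best


print(smith_waterman("COHEN", "MCCOHN"))  # -7: C, O, H, N match, E is skipped
print(smith_waterman("abc", "xyz"))       # 0: nothing worth aligning
```

Taking the minimum with 0 is what lets the alignment ignore poorly matching prefixes, and scanning all cells for the best value lets it ignore poorly matching suffixes.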

29 Example output from Python [Table: alignment scores for "lounge" (rows) against "s'allonger" (columns)] (My implementation of HW#, task choice #. -McCallum)

30 Affine gap distances Smith-Waterman fails on some pairs that seem quite similar: "William W. Cohen" vs. "William W. Don't call me Dubya Cohen". Intuitively, a single long insertion is cheaper than a lot of short insertions.

31 Affine gap distances Idea: Current cost of a gap of n characters: nG. Make this cost: A + (n-1)B, where A is the cost of opening a gap, and B is the cost of continuing a gap.

32 Affine gap distances
D(i,j) = max of:
    D(i-1,j-1) + d(si,tj)    //subst/copy
    IS(i-1,j-1) + d(si,tj)
    IT(i-1,j-1) + d(si,tj)
IS(i,j) = max of:    (best score in which si is aligned with a gap)
    D(i-1,j) - A
    IS(i-1,j) - B
IT(i,j) = max of:    (best score in which tj is aligned with a gap)
    D(i,j-1) - A
    IT(i,j-1) - B
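The three recurrences can be sketched as three tables filled together. This is a global-alignment sketch under stated assumptions: d(si,tj) is treated as a similarity (+1 for a match, -1 otherwise), impossible states are set to minus infinity, and the final score is the best of the three tables in the last cell. The names affine_gap_score and simple_sim, and the values A = 2 and B = 0.5, are made up for the example.

```python
NEG_INF = float("-inf")


def simple_sim(c, d):
    """Assumed similarity: reward matches, penalize mismatches."""
    return 1.0 if c == d else -1.0


def affine_gap_score(s, t, sim, A, B):
    """Best global alignment score with gap cost A + (n-1)B for a gap of n."""
    n, m = len(s), len(t)
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # si aligned with tj
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # si aligned with a gap
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # tj aligned with a gap
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = max(D[i - 1][j - 1], IS[i - 1][j - 1], IT[i - 1][j - 1])
                D[i][j] = prev + sim(s[i - 1], t[j - 1])
            if i > 0:
                IS[i][j] = max(D[i - 1][j] - A,    # open a gap
                               IS[i - 1][j] - B)   # continue a gap
            if j > 0:
                IT[i][j] = max(D[i][j - 1] - A,    # open a gap
                               IT[i][j - 1] - B)   # continue a gap
    return max(D[n][m], IS[n][m], IT[n][m])


s = "William W. Cohen"
t = "William W. Don't call me Dubya Cohen"
print(affine_gap_score(s, t, simple_sim, A=2, B=0.5))  # 4.5: one long gap is cheap
print(affine_gap_score(s, t, simple_sim, A=2, B=2))    # -24.0: linear gap cost
```

Setting B equal to A recovers the linear gap cost nG, which makes the two printed scores a direct comparison of the two cost models on the slide's example pair.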

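33 Affine gap distances as automata [Diagram: three states D, IS, and IT. Transitions into D are labeled -d(si,tj); transitions from D into IS and from D into IT are labeled -A; the self-loops on IS and IT are labeled -B.]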

34 Generative version of affine gap automata (Bilenko & Mooney, TechReport) The HMM emits pairs: (c,d) in state M, pairs (c,-) in state D, and pairs (-,d) in state I. For each state there is a multinomial distribution on pairs. The HMM can be trained with EM from a sample of pairs of matched strings (s,t). The E-step is forward-backward; the M-step uses some ad hoc smoothing.

35 Affine gap edit-distance learning: experimental results (Bilenko & Mooney) Experimental method: parse records into fields; append a few key fields together; sort by similarity; pick a threshold T and call all pairs with distance(s,t) < T duplicates, picking T to maximize F-measure.
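The threshold-picking step can be sketched as follows. The input format (a list of distance and true-label pairs), the function name best_threshold, and the toy numbers are all assumptions for illustration; they are not data from the experiments.

```python
def best_threshold(scored_pairs):
    """scored_pairs: list of (distance, is_duplicate). Return (T, F-measure).

    A pair is called a duplicate when distance < T.
    """
    total_dups = sum(1 for _, dup in scored_pairs if dup)
    # Candidate thresholds: every observed distance, plus "accept everything".
    candidates = sorted({dist for dist, _ in scored_pairs}) + [float("inf")]
    best_T, best_f = None, -1.0
    for T in candidates:
        tp = sum(1 for dist, dup in scored_pairs if dist < T and dup)
        fp = sum(1 for dist, dup in scored_pairs if dist < T and not dup)
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / total_dups if total_dups > 0 else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall > 0 else 0.0)
        if f > best_f:               # keep the smallest T among ties
            best_T, best_f = T, f
    return best_T, best_f


# Hypothetical labeled pairs: (distance, really a duplicate?)
pairs = [(0.1, True), (0.2, True), (0.3, False), (0.4, True), (0.9, False)]
print(best_threshold(pairs))
```

On the toy data the best threshold is 0.9, which accepts all three true duplicates at the price of one false positive.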

36 Affine gap edit-distance learning: experimental results (Bilenko & Mooney)

37 Affine gap edit-distance learning: experimental results (Bilenko & Mooney) Precision/recall for MAILING dataset duplicate detection

38 Affine gap distances experiments (from McCallum, Nigam, Ungar KDD) Goal is to match data like this:

39 The assignment Homework #: Start with my stredit.py code. Make some modifications. Write a little about your experiences. Some possible modifications: Implement Needleman-Wunsch, Smith-Waterman, or Affine Gap Distance. Create a little spell-checker: if the entered word isn't in the dictionary, return the dictionary word that is closest. Change the implementation to operate on sequences of words rather than characters... get an online translation dictionary, and find alignments between English & French or English & Russian! Try to learn the parameters of the function from data. (Tough.)
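As a starting point for the spell-checker idea, here is a rough sketch. It does not use stredit.py itself; edit_distance is a stand-in written for this example, closest_word is a made-up name, and the tiny word list is hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance, keeping only the previous table row in memory."""
    previous = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        current = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            d = 0 if s[i - 1] == t[j - 1] else 1
            current[j] = min(previous[j - 1] + d,   # copy / substitute
                             previous[j] + 1,       # skip a character of s
                             current[j - 1] + 1)    # skip a character of t
        previous = current
    return previous[len(t)]


def closest_word(word, dictionary):
    """Return word if it is known, else the nearest dictionary word."""
    if word in dictionary:
        return word
    # On ties, min() keeps the first of the closest words in dictionary order.
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


words = ["park", "spake", "dynamic", "distance"]  # hypothetical dictionary
print(closest_word("parc", words))      # park
print(closest_word("distanse", words))  # distance
```

Scanning the whole dictionary costs one table per entry, so a real word list would call for pruning, for example skipping entries whose length differs from the input by more than the best distance found so far.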


More information

Lecture 2, Introduction to Python. Python Programming Language

Lecture 2, Introduction to Python. Python Programming Language BINF 3360, Introduction to Computational Biology Lecture 2, Introduction to Python Young-Rae Cho Associate Professor Department of Computer Science Baylor University Python Programming Language Script

More information

6.02 Practice Problems: Routing

6.02 Practice Problems: Routing 1 of 9 6.02 Practice Problems: Routing IMPORTANT: IN ADDITION TO THESE PROBLEMS, PLEASE SOLVE THE PROBLEMS AT THE END OF CHAPTERS 17 AND 18. Problem 1. Consider the following networks: network I (containing

More information

Computational Mathematics with Python

Computational Mathematics with Python Numerical Analysis, Lund University, 2011 1 Computational Mathematics with Python Chapter 1: Basics Numerical Analysis, Lund University Claus Führer, Jan Erik Solem, Olivier Verdier, Tony Stillfjord Spring

More information

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time Skip Lists CMSC 420 Linked Lists Benefits & Drawbacks Benefits: - Easy to insert & delete in O(1) time - Don t need to estimate total memory needed Drawbacks: - Hard to search in less than O(n) time (binary

More information

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern

More information

The Advantages and Disadvantages of Network Computing Nodes

The Advantages and Disadvantages of Network Computing Nodes Big Data & Scripting storage networks and distributed file systems 1, 2, in the remainder we use networks of computing nodes to enable computations on even larger datasets for a computation, each node

More information

Recursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1

Recursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1 Recursion Slides by Christopher M Bourke Instructor: Berthe Y Choueiry Fall 007 Computer Science & Engineering 35 Introduction to Discrete Mathematics Sections 71-7 of Rosen cse35@cseunledu Recursive Algorithms

More information

Persistent Data Structures

Persistent Data Structures 6.854 Advanced Algorithms Lecture 2: September 9, 2005 Scribes: Sommer Gentry, Eddie Kohler Lecturer: David Karger Persistent Data Structures 2.1 Introduction and motivation So far, we ve seen only ephemeral

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently

More information

Lecture 2 Mathcad Basics

Lecture 2 Mathcad Basics Operators Lecture 2 Mathcad Basics + Addition, - Subtraction, * Multiplication, / Division, ^ Power ( ) Specify evaluation order Order of Operations ( ) ^ highest level, first priority * / next priority

More information

9.4. The Scalar Product. Introduction. Prerequisites. Learning Style. Learning Outcomes

9.4. The Scalar Product. Introduction. Prerequisites. Learning Style. Learning Outcomes The Scalar Product 9.4 Introduction There are two kinds of multiplication involving vectors. The first is known as the scalar product or dot product. This is so-called because when the scalar product of

More information

Introduction to Microsoft Excel 2010

Introduction to Microsoft Excel 2010 Introduction to Microsoft Excel 2010 Screen Elements Quick Access Toolbar The Ribbon Formula Bar Expand Formula Bar Button File Menu Vertical Scroll Worksheet Navigation Tabs Horizontal Scroll Bar Zoom

More information

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge Multi-Algorithm Mapping with Automatic Weight Assignment and Background Knowledge Shailendra Singh and Yu-N Cheah School of Computer Sciences Universiti Sains Malaysia 11800 USM Penang, Malaysia shai14@gmail.com,

More information

Curriculum Map. Discipline: Computer Science Course: C++

Curriculum Map. Discipline: Computer Science Course: C++ Curriculum Map Discipline: Computer Science Course: C++ August/September: How can computer programs make problem solving easier and more efficient? In what order does a computer execute the lines of code

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28 Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object

More information

Definition: A vector is a directed line segment that has and. Each vector has an initial point and a terminal point.

Definition: A vector is a directed line segment that has and. Each vector has an initial point and a terminal point. 6.1 Vectors in the Plane PreCalculus 6.1 VECTORS IN THE PLANE Learning Targets: 1. Find the component form and the magnitude of a vector.. Perform addition and scalar multiplication of two vectors. 3.

More information

Solutions to Problem Set 1

Solutions to Problem Set 1 YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467b: Cryptography and Computer Security Handout #8 Zheng Ma February 21, 2005 Solutions to Problem Set 1 Problem 1: Cracking the Hill cipher Suppose

More information

A parallel algorithm for the extraction of structured motifs

A parallel algorithm for the extraction of structured motifs parallel algorithm for the extraction of structured motifs lexandra arvalho MEI 2002/03 omputação em Sistemas Distribuídos 2003 p.1/27 Plan of the talk Biological model of regulation Nucleic acids: DN

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Computational Mathematics with Python

Computational Mathematics with Python Boolean Arrays Classes Computational Mathematics with Python Basics Olivier Verdier and Claus Führer 2009-03-24 Olivier Verdier and Claus Führer Computational Mathematics with Python 2009-03-24 1 / 40

More information

Team Builder Project

Team Builder Project Team Builder Project Software Requirements Specification Draft 2 February 2, 2015 Team:.dat ASCII 1 Table of Contents Introduction Purpose 4 Scope of Project.4 Overview.5 Business Context 5 Glossary 6

More information

1 Introduction. Dr. T. Srinivas Department of Mathematics Kakatiya University Warangal 506009, AP, INDIA tsrinivasku@gmail.com

1 Introduction. Dr. T. Srinivas Department of Mathematics Kakatiya University Warangal 506009, AP, INDIA tsrinivasku@gmail.com A New Allgoriitthm for Miiniimum Costt Liinkiing M. Sreenivas Alluri Institute of Management Sciences Hanamkonda 506001, AP, INDIA allurimaster@gmail.com Dr. T. Srinivas Department of Mathematics Kakatiya

More information

Project Scheduling. Introduction

Project Scheduling. Introduction Project Scheduling Introduction In chapter, the O and ON networks were presented, also the time and cost of individual activities based were calculated. Yet, however, we do not know how long is the total

More information

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

10. THERM DRAWING TIPS

10. THERM DRAWING TIPS 10. THERM DRAWING TIPS 10.1. Drawing Tips The THERM User's Manual describes in detail how to draw cross-sections in THERM. This section of the NFRC Simualation Training Manual presents some additional

More information

Binary Image Scanning Algorithm for Cane Segmentation

Binary Image Scanning Algorithm for Cane Segmentation Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch ricardo.castanedamarin@pg.canterbury.ac.nz Tom

More information