INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces

Size: px
Start display at page:

Download "INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces"

Transcription

1 INF4820, Algorithms for AI and NLP: More Common Lisp Vector Spaces Erik Velldal University of Oslo Sept. 4, 2012

2 Topics for today 2 More Common Lisp More data types: Arrays, sequences, hash tables, and structures. Iteration with loop Vector space models Spatial models for representing data The distributional hypothesis Semantic spaces

3 Arrays 3 Integer-indexed container (elements count from zero)? (setf *foo* (make-array 5)) #(nil nil nil nil nil)? (setf (aref *foo* 0) 42) 42? *foo* #(42 nil nil nil nil) Can be fixed-sized or adjustable. Can also represent grids of multiple dimensions:? (setf *foo* (make-array (2 5) :initial-element 0)) #(( ) ( ))? (incf (aref *foo* 1 2))

4 Arrays: Specializations and generalizations Specialized array types: strings and bit-vectors. Arrays and lists are subtypes of the abstract data type sequence. CL provides a large library of sequence functions, e.g.:? (length "foobarbaz") 9? (elt "foobarbaz" 8) #\z? (subseq "foobarbaz" 3 6) "bar"? (substitute #\x #\b "foobarbaz") "fooxarxaz"? (find 42 ( )) nil? (position 1 #( )) 0? (count 1 #( )) 3? (remove 1 ( )) (2 3 0)? (count-if # (lambda (x) (equalp (elt x 0) #\f)) ("foo" "bar" (b a z) "foom")) 2? (remove-if # evenp #( )) #(1 3 5)? (every # evenp #( )) NIL? (some # evenp #( )) T? (sort ( ) # <) ( ) 4

5 Hash tables 5 While lists are inefficient for indexing large data sets, and arrays restricted to numeric keys, hash tables efficiently handles a large number of (almost) arbitrary type keys. Any of the four (built-in) equality tests can be used for key comparison (a more restricted test = more efficient access).? (defparameter *table* (make-hash-table :test # equal)) *table*? (gethash "foo" *table*) nil? (setf (gethash "foo" *table*) 42) 42 Useful trick for testing, inserting and updating in one go (specyfing 0 as the default value):? (incf (gethash "bar" *table* 0)) 1? (gethash "bar" *table*) 1 Hash table iteration: use maphash or specialized loop directives.

6 Structures / structs 6 defstruct creates a new abstract data type with named slots. Encapsulates a group of related data (i.e. an object ). Each structure type is a new type distinct from all existing Lisp types. Defines a new constructor, slot accessors, and a type predicate.? (defstruct cd artist title) CD? (setf *foo* (make-cd :artist "Elvis" :title "Blue Hawaii")) #S(CD :ARTIST "Elvis" :TITLE "Blue Hawaii")? (listp *foo*) nil? (cd-p *foo*) t? (setf (cd-title *foo*) "G.I. Blues") "G.I. Blues"? *foo* #S(CD :ARTIST "Elvis" :TITLE "G.I. Blues")

7 If you can t see the forest for the trees or can t even see the trees for the parentheses. A Lisp specialty: Uniformity Lisp beginners can sometimes find the syntax overwhelming. What s with all the parentheses? For seasoned Lispers the beauty lies in the fact that there s hardly any syntax at all (beyond the abstract data type of lists). Lisp code is a Lisp data structure. Lisp programs are trees of sexps (sometimes compared to the abstract syntax trees created internally by the parser/compiler for other languages). Makes it easier to write code that generates code: macros.

8 Macros 8 Pitch: programs that generate programs. Macros provide a way for our code to manipulate itself (before it s passed to the compiler). Can implement transformations that allow us to extend the syntax of the language. Allows us to control (or even prevent) the evaluation of arguments. We ve already used some built-in Common Lisp macros: and, or, if, cond, defun, setf, etc. We might get back to writing macros ourselves later in the course, but for now let s just look at perhaps the best example of how macros can redefine the syntax of the language for good or for worse, depending on who you ask: loop.

9 Iteration with loop 9 We ve talked about recursion as a powerful control structure but at times iteration comes more natural. While there is always dolist and dotimes for simple iteration, the loop macro is much more versatile. (defun odd-numbers (list) (loop for number in list when (oddp number) collect number)) Illustrates the power of macros: loop is basically a mini-language for iteration. Goodbye uniformity; Different syntax based on special keywords. Lisp-guru Paul Graham on loop: one of the worst flaws in CL. But non-lispy as it may be, loop is extremely general and powerful!

10 loop: a few more examples 10? (loop for i from 10 to 50 by 10 collect i) ( )? (loop for i below 10 when (oddp i) sum i) 25? (loop for x across "foo" collect x) (#\f #\o #\o)? (loop with foo = (a b c d) for i in foo for j from 0 until (eq i c) do (format t "~a = ~a ~%" j i)) 0 = A 1 = B

11 loop: a few more examples 11? (loop for i below 10 if (evenp i) collect i into evens else collect i into odds finally (return (list evens odds))) (( ) ( ))? (loop for value being the hash-values of *my-hash* using (hash-key key) do (format t "~&~a -> ~a" key value))

12 Input and Output 12 Reading and writing is mediated through streams. The symbol t indicates the default stream, the terminal.? (format t "~a is the ~a.~%" 42 "answer") 42 is the answer. nil (read-line stream nil) reads one line of text from stream, returning it as a string. (read stream nil) reads one well-formed s-expression. The second reader argument asks to return nil on end-of-file. (with-open-file (stream "sample.txt" :direction :input) (loop for line = (read-line stream nil) while line do (format t "~a~%" line)))

13 Good Lisp style Bottom-up design In Lisp, you don t just write your program down toward the language, you also build the language up toward your program (Paul Graham; Extend the language to fit your problem! Instead of trying to solve everything with one large function: Build your program by layers of smaller functions. Eliminate repetition and patterns. Related: Define abstraction barriers. Separate the code that uses a given data abstraction from the code that implement that data abstraction. Promotes code re-use: Makes the code shorter and easier to read, debug and maintain. Somewhat more mundane: Adhere to the time-honored 80 column rule. Close multiple parens on the same line. Use Emacs auto-indentation (TAB). 13

14 And now VECTOR SPACE MODELS

15 Vector space model 15 A model for representing data based on a spatial metaphor. Each object is represented as a vector (or point) positioned in a coordinate system. Each coordinate (or dimension or axis) of the space corresponds to some descriptive and measurable property (feature) of the objects. When we want to measure the similarity of two objects, we can measure their geometrical distance/closeness in the model. Vector representations are foundational to a wide range of ML methods.

16 Semantic spaces 16 A semantic space is a vector space model where the points represent words, where the dimensions represent context of use, and where we d like the distance between points to reflect the semantic similarity of the words they represent. AKA distributional semantic models (DSM) and word space models. Some choices and issues: Usage = meaning? How do we define context? How do we define the vector values/weights? How do we measure similarity?

17 The Distributional Hypothesis 17 AKA The Contextual Theory of Meaning Meaning is use. (Wittgenstein, 1953) You shall know a word by the company it keeps. (Firth, 1968) The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities. (Harris, 1968) He was feeling seriously hung over after drinking too many shots of retawerif at the party last night.

18 Distributional methods 18 Distributional view on lexical semantics. The idea: Record contexts across large collections of texts (corpora) to characterize word meaning. Motivation: Can compare the meaning of words by comparing their contexts. No need for prior knowledge! Each word o i represented by a tuple (vector) of features f 1,..., f n, where each f j records some property of the observed contexts of o i. But before we start looking at how to compare the feature vectors, we first need to define context and word.

19 Defining context 19 Let s say we want to extract features for the target bread in: I bake bread for breakfast. Context windows Context = neighborhood of ±n words left/right of the focus word. Bag-of-Words (BoW); ignoring the linear ordering of the words. Features: {I, bake, for, breakfast} Grammatical context Context = the grammatical relations to other words. Intuition: When words combine in a construction they often impose semantic constraints on each-other. Requires deeper linguistic analysis than simple BoW approaches. Features: {dir_obj(bake), prep_for(breakfast)}

20 Defining context (cont d) 20 What is a word? Tokenization: breaking text up into words or other meaningful units. Different levels of abstraction and morphological normalization: Stop-words What to do with case, numbers, punctuation, compounds,...? Full-form words vs. stemming vs. lemmatization... It s a common strategy to filter out closed-class words or function words by using a so-called stop-list. The idea is that only content words are relevant. Example: The programmer s programs had been programmed. Full-forms: the programmer s programs had been programmed. Lemmas: the programmer s program have be program. W/ stop-list: programmer program program Stems: program program program

21 Different contexts different similarities 21 What do we mean by similar? The type of context dictates the type of semantic similarity. Relatedness vs. sameness. Or domain vs. content. Similarity in domain : {car, road, gas, service, traffic, driver, license} Similarity in content: {car, train, bicycle, truck, vehicle, airplane, buss} While broader definitions of context (e.g. sentence-level BoW) tend to give clues for domain-based relatedness, more fine-grained grammatical contexts give clues for content-based similarity.

22 Feature vectors 22 A vector space model is defined by a system of n dimensions objects are represented as real valued vectors in the space R n. Our observations of words in context must be encoded numerically: Each context feature is mapped to a dimension j [1, n]. For a given word, the value of a given feature is its number of co-occurrences for the corresponding context across our corpus. Let the set of n features describing the lexical contexts of a word o i be represented as a feature vector F(o i ) = f i = f i1,..., f in. For example, assume that the ith word is cake and the jth feature is OBJ_OF(bake), then fij = f (cake, OBJ_OF(bake)) = 4 would mean that we have observed cake to be the object of the verb bake in our corpus 4 times.

23 Word context association 23 We want our feature vectors to reflect which contexts are the most salient or relevant for each word. Problem: Raw co-occurrence frequencies alone are not good indicators of relevance. Consider the noun wine as a direct object of the verbs buy and pour: f (wine, OBJ_OF(buy)) = 14 f (wine, OBJ_OF(pour)) = 8... but the feature OBJ_OF(buy) seems more indicative of the semantics of wine than OBJ_OF(buy). Solution: Weight the counts by an association function, normalizing our observed frequencies for chance co-occurrence. There s a range of different association measures in use, and most take the form of a statistical test of dependence; e.g. pointwise mutual information, log odds ratio, the t-test, log likelihood,...

24 Pointwise Mutual Information 24 Defines the association between a feature f and an observation o as a likelihood ratio of their joint probability and the product of their marginal probabilities: P(f, o) I (f, o) = log 2 P(f )P(o) = log P(f )P(o f ) 2 P(f )P(o) P(o f ) = log 2 P(o) Perfect independence: P(f, o) = P(f )P(o) and I (f, o) = 0. Perfect dependence: If f and o always occur together then P(o f ) = 1 and I (f, o) = log 2 1/P(o). A smaller marginal probability P(o) leads to a larger association score I (f, o). Overestimates the correlation of rare events.

25 The Log Odds Ratio 25 Measures the magnitude of association between an observed object o and a feature f independently of their marginal probabilities: log θ(f, o) = log P(f, o)/p(f, o) P( f, o)/p( f, o) θ(f, o) expresses how much the chance of observing o increases when the feature f is present. log θ(f, o) > 0 means the probability of seeing o increases when f is present. log θ = 0 indicates distributional independence.

26 Negative Correlations 26 Negatively correlated pairs (f, o) are usually ignored when measuring word context associations (e.g. if log θ(f, o) < 0). Unreliable estimates about negative correlations in sparse data. Both unobserved or negatively correlated co-occurrence pairs are assumed to have zero association. We will use X = { x 1,..., x k } to denote the set of association vectors that results from applying the association weighting. That is, x i = A (f i1 ),..., A (f in ), where for example A = log θ (i.e. the log odds ratio).

27 Euclidean distance 27 Vector space models let us compute the semantic similarity of words in terms of spatial proximity. So how do we do that then? One standard metric is the Euclidean distance: d( x, y) = n ( x i y i ) 2 Computes the length (or norm) of the difference of the vectors. The Euclidean norm of a vector is: i=1 x = n i=1 x2 i = x x Intuitive interpretation: The distance between two points corresponds to the length of a straight line connecting them.

28 Euclidean distance and length bias 28 However, a potential problem with Euclidean distance is that it is very sensitive to extreme values and the length of the vectors. As vectors of words with different frequencies will tend to have different length, the frequency will also affect the similarity judgment.

29 Overcoming length bias by normalization 29 Note that, although our association weighting to some degree already normalizes the differences in frequency, words with initially long frequency vectors, will also tend to have longer association vectors. One way to reduce effect of frequency / length is to first normalize all our vectors to have unit length, i.e.: x = 1 (Can be achieved by simply dividing each element by the length.)

30 Cosine similarity Another way to deal with length bias: use the cosine measure. Computes similarity as a function of the angle between the vectors: cos( x, y) = Constant range between 0 and 1. i x i y i x i 2 y i 2 i i = x y x y Avoids the arbitrary scaling caused by dimensionality, frequency or the range of the association measure A. As the angle between the vectors shortens, the cosine approaches 1. When applied to normalized vectors, the cosine can be simplified to the dot product alone: n cos( x, y) = x y = x i y i i=1 The same relative rank order as the Euclidean distance for unit vectors! 30

31 Next Week 31 More on vector space models Dealing with sparse vectors Computing neighbor relations in the semantic space Representing classes Representing class membership Classification algorithms KNN-classification / c-means, etc. Reading: The chapter Vector Space Classification (sections ) in Manning, Raghavan & Schütze (2008);

32 31 Firth, J. R. (1968). A synopsis of linguistic theory. In F. R. Palmer (Ed.), Selected papers of j. r. firth: Longman. Harris, Z. S. (1968). Mathematical structures of language. New York: Wiley. Wittgenstein, L. (1953). Philosophical investigations. Oxford: Blackwell.

Curriculum Map. Discipline: Computer Science Course: C++

Curriculum Map. Discipline: Computer Science Course: C++ Curriculum Map Discipline: Computer Science Course: C++ August/September: How can computer programs make problem solving easier and more efficient? In what order does a computer execute the lines of code

More information

Search Engines. Stephen Shaw <[email protected]> 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information

Chapter 15 Functional Programming Languages

Chapter 15 Functional Programming Languages Chapter 15 Functional Programming Languages Introduction - The design of the imperative languages is based directly on the von Neumann architecture Efficiency (at least at first) is the primary concern,

More information

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

More information

Functional Programming. Functional Programming Languages. Chapter 14. Introduction

Functional Programming. Functional Programming Languages. Chapter 14. Introduction Functional Programming Languages Chapter 14 Introduction Functional programming paradigm History Features and concepts Examples: Lisp ML 1 2 Functional Programming Functional Programming Languages The

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

Lecture 9. Semantic Analysis Scoping and Symbol Table

Lecture 9. Semantic Analysis Scoping and Symbol Table Lecture 9. Semantic Analysis Scoping and Symbol Table Wei Le 2015.10 Outline Semantic analysis Scoping The Role of Symbol Table Implementing a Symbol Table Semantic Analysis Parser builds abstract syntax

More information

Moving from CS 61A Scheme to CS 61B Java

Moving from CS 61A Scheme to CS 61B Java Moving from CS 61A Scheme to CS 61B Java Introduction Java is an object-oriented language. This document describes some of the differences between object-oriented programming in Scheme (which we hope you

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

Mining a Corpus of Job Ads

Mining a Corpus of Job Ads Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Syntax Check of Embedded SQL in C++ with Proto

Syntax Check of Embedded SQL in C++ with Proto Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 383 390. Syntax Check of Embedded SQL in C++ with Proto Zalán Szűgyi, Zoltán Porkoláb

More information

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28 Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object

More information

Programming Languages CIS 443

Programming Languages CIS 443 Course Objectives Programming Languages CIS 443 0.1 Lexical analysis Syntax Semantics Functional programming Variable lifetime and scoping Parameter passing Object-oriented programming Continuations Exception

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

COMPUTATIONAL DATA ANALYSIS FOR SYNTAX

COMPUTATIONAL DATA ANALYSIS FOR SYNTAX COLING 82, J. Horeck~ (ed.j North-Holland Publishing Compa~y Academia, 1982 COMPUTATIONAL DATA ANALYSIS FOR SYNTAX Ludmila UhliFova - Zva Nebeska - Jan Kralik Czech Language Institute Czechoslovak Academy

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

NLP Lab Session Week 3 Bigram Frequencies and Mutual Information Scores in NLTK September 16, 2015

NLP Lab Session Week 3 Bigram Frequencies and Mutual Information Scores in NLTK September 16, 2015 NLP Lab Session Week 3 Bigram Frequencies and Mutual Information Scores in NLTK September 16, 2015 Starting a Python and an NLTK Session Open a Python 2.7 IDLE (Python GUI) window or a Python interpreter

More information

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining [email protected] Outline Predictive modeling methodology k-nearest Neighbor

More information

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction

More information

Signal Processing First Lab 01: Introduction to MATLAB. 3. Learn a little about advanced programming techniques for MATLAB, i.e., vectorization.

Signal Processing First Lab 01: Introduction to MATLAB. 3. Learn a little about advanced programming techniques for MATLAB, i.e., vectorization. Signal Processing First Lab 01: Introduction to MATLAB Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment and go over all exercises in the Pre-Lab section

More information

QUERYING THE COMPONENT DATA OF A GRAPHICAL CADASTRAL DATABASE USING VISUAL LISP PROGRAM

QUERYING THE COMPONENT DATA OF A GRAPHICAL CADASTRAL DATABASE USING VISUAL LISP PROGRAM University 1 Decembrie 1918 of Alba Iulia RevCAD 16/2014 QUERYING THE COMPONENT DATA OF A GRAPHICAL CADASTRAL DATABASE USING VISUAL LISP PROGRAM Caius DIDULESCU, Associate Professor PhD eng. - Faculty

More information

TECHNOLOGY Computer Programming II Grade: 9-12 Standard 2: Technology and Society Interaction

TECHNOLOGY Computer Programming II Grade: 9-12 Standard 2: Technology and Society Interaction Standard 2: Technology and Society Interaction Technology and Ethics Analyze legal technology issues and formulate solutions and strategies that foster responsible technology usage. 1. Practice responsible

More information

Chapter 4 One Dimensional Kinematics

Chapter 4 One Dimensional Kinematics Chapter 4 One Dimensional Kinematics 41 Introduction 1 4 Position, Time Interval, Displacement 41 Position 4 Time Interval 43 Displacement 43 Velocity 3 431 Average Velocity 3 433 Instantaneous Velocity

More information

Homework 4 Statistics W4240: Data Mining Columbia University Due Tuesday, October 29 in Class

Homework 4 Statistics W4240: Data Mining Columbia University Due Tuesday, October 29 in Class Problem 1. (10 Points) James 6.1 Problem 2. (10 Points) James 6.3 Problem 3. (10 Points) James 6.5 Problem 4. (15 Points) James 6.7 Problem 5. (15 Points) James 6.10 Homework 4 Statistics W4240: Data Mining

More information

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?

More information

Adaptive Context-sensitive Analysis for JavaScript

Adaptive Context-sensitive Analysis for JavaScript Adaptive Context-sensitive Analysis for JavaScript Shiyi Wei and Barbara G. Ryder Department of Computer Science Virginia Tech Blacksburg, VA, USA {wei, ryder}@cs.vt.edu Abstract Context sensitivity is

More information

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science Program Schedule CTech Computer Science Credits CS101 Computer Science I 3 MATH100 Foundations of Mathematics and

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

I PUC - Computer Science. Practical s Syllabus. Contents

I PUC - Computer Science. Practical s Syllabus. Contents I PUC - Computer Science Practical s Syllabus Contents Topics 1 Overview Of a Computer 1.1 Introduction 1.2 Functional Components of a computer (Working of each unit) 1.3 Evolution Of Computers 1.4 Generations

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,

More information

Data Deduplication in Slovak Corpora

Data Deduplication in Slovak Corpora Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain

More information

The C Programming Language course syllabus associate level

The C Programming Language course syllabus associate level TECHNOLOGIES The C Programming Language course syllabus associate level Course description The course fully covers the basics of programming in the C programming language and demonstrates fundamental programming

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

Glossary of Object Oriented Terms

Glossary of Object Oriented Terms Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction

More information

Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms

Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms ESSLLI 2015 Barcelona, Spain http://ufal.mff.cuni.cz/esslli2015 Barbora Hladká [email protected]

More information

Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

More information

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code 1 Introduction The purpose of this assignment is to write an interpreter for a small subset of the Lisp programming language. The interpreter should be able to perform simple arithmetic and comparisons

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing The scanner (or lexical analyzer) of a compiler processes the source program, recognizing

More information

VISUAL GUIDE to. RX Scripting. for Roulette Xtreme - System Designer 2.0

VISUAL GUIDE to. RX Scripting. for Roulette Xtreme - System Designer 2.0 VISUAL GUIDE to RX Scripting for Roulette Xtreme - System Designer 2.0 UX Software - 2009 TABLE OF CONTENTS INTRODUCTION... ii What is this book about?... iii How to use this book... iii Time to start...

More information

3. INNER PRODUCT SPACES

3. INNER PRODUCT SPACES . INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

More information

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors. 3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

TECHNICAL UNIVERSITY OF CRETE DATA STRUCTURES FILE STRUCTURES

TECHNICAL UNIVERSITY OF CRETE DATA STRUCTURES FILE STRUCTURES TECHNICAL UNIVERSITY OF CRETE DEPT OF ELECTRONIC AND COMPUTER ENGINEERING DATA STRUCTURES AND FILE STRUCTURES Euripides G.M. Petrakis http://www.intelligence.tuc.gr/~petrakis Chania, 2007 E.G.M. Petrakis

More information

Adaption of Statistical Email Filtering Techniques

Adaption of Statistical Email Filtering Techniques Adaption of Statistical Email Filtering Techniques David Kohlbrenner IT.com Thomas Jefferson High School for Science and Technology January 25, 2007 Abstract With the rise of the levels of spam, new techniques

More information

Projektgruppe. Categorization of text documents via classification

Projektgruppe. Categorization of text documents via classification Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

More information

Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007

Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007 Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007 Naïve Bayes Components ML vs. MAP Benefits Feature Preparation Filtering Decay Extended Examples

More information

Technical Report. The KNIME Text Processing Feature:

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold [email protected] [email protected] Copyright 2012 by KNIME.com AG

More information

PL / SQL Basics. Chapter 3

PL / SQL Basics. Chapter 3 PL / SQL Basics Chapter 3 PL / SQL Basics PL / SQL block Lexical units Variable declarations PL / SQL types Expressions and operators PL / SQL control structures PL / SQL style guide 2 PL / SQL Block Basic

More information

Basic Lisp Operations

Basic Lisp Operations Basic Lisp Operations BLO-1 Function invocation It is an S-expression just another list! ( function arg1 arg2... argn) First list item is the function prefix notation The other list items are the arguments

More information

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

More information

Variable Base Interface

Variable Base Interface Chapter 6 Variable Base Interface 6.1 Introduction Finite element codes has been changed a lot during the evolution of the Finite Element Method, In its early times, finite element applications were developed

More information

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016 Master s Degree Course in Computer Engineering Formal Languages FORMAL LANGUAGES AND COMPILERS Lexical analysis Floriano Scioscia 1 Introductive terminological distinction Lexical string or lexeme = meaningful

More information

Simple maths for keywords

Simple maths for keywords Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all

More information

Lesson 15 - Fill Cells Plugin

Lesson 15 - Fill Cells Plugin 15.1 Lesson 15 - Fill Cells Plugin This lesson presents the functionalities of the Fill Cells plugin. Fill Cells plugin allows the calculation of attribute values of tables associated with cell type layers.

More information

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

PGR Computing Programming Skills

PGR Computing Programming Skills PGR Computing Programming Skills Dr. I. Hawke 2008 1 Introduction The purpose of computing is to do something faster, more efficiently and more reliably than you could as a human do it. One obvious point

More information

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

CS 241 Data Organization Coding Standards

CS 241 Data Organization Coding Standards CS 241 Data Organization Coding Standards Brooke Chenoweth University of New Mexico Spring 2016 CS-241 Coding Standards All projects and labs must follow the great and hallowed CS-241 coding standards.

More information

w ki w kj k=1 w2 ki k=1 w2 kj F. Aiolli - Sistemi Informativi 2007/2008

w ki w kj k=1 w2 ki k=1 w2 kj F. Aiolli - Sistemi Informativi 2007/2008 RSV of the Vector Space Model The matching function RSV is the cosine of the angle between the two vectors RSV(d i,q j )=cos(α)= n k=1 w kiw kj d i 2 q j 2 = n k=1 n w ki w kj n k=1 w2 ki k=1 w2 kj 8 Note

More information

Magit-Popup User Manual

Magit-Popup User Manual Magit-Popup User Manual for version 2.5 Jonas Bernoulli Copyright (C) 2015-2016 Jonas Bernoulli You can redistribute this document and/or modify it under the terms of the GNU General

More information

9.4. The Scalar Product. Introduction. Prerequisites. Learning Style. Learning Outcomes

9.4. The Scalar Product. Introduction. Prerequisites. Learning Style. Learning Outcomes The Scalar Product 9.4 Introduction There are two kinds of multiplication involving vectors. The first is known as the scalar product or dot product. This is so-called because when the scalar product of

More information

Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality

Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality Measurement and Metrics Fundamentals Lecture Objectives Provide some basic concepts of metrics Quality attribute metrics and measurements Reliability, validity, error Correlation and causation Discuss

More information

Some programming experience in a high-level structured programming language is recommended.

Some programming experience in a high-level structured programming language is recommended. Python Programming Course Description This course is an introduction to the Python programming language. Programming techniques covered by this course include modularity, abstraction, top-down design,

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted

More information

5.1 Database Schema. 5.1.1 Schema Generation in SQL

5.1 Database Schema. 5.1.1 Schema Generation in SQL 5.1 Database Schema The database schema is the complete model of the structure of the application domain (here: relational schema): relations names of attributes domains of attributes keys additional constraints

More information

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words , pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

More information

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks

More information

Programming Languages

Programming Languages Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Log-Linear Models. Michael Collins

Log-Linear Models. Michael Collins Log-Linear Models Michael Collins 1 Introduction This note describes log-linear models, which are very widely used in natural language processing. A key advantage of log-linear models is their flexibility:

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; [email protected], and Dr. J.A. Dobelman

Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; hflores@rice.edu, and Dr. J.A. Dobelman Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; [email protected], and Dr. J.A. Dobelman Statistics lab will be mainly focused on applying what you have learned in class with

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Neovision2 Performance Evaluation Protocol

Neovision2 Performance Evaluation Protocol Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram [email protected] Dmitry Goldgof, Ph.D. [email protected] Rangachar Kasturi, Ph.D.

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:

More information

Course Title: Software Development

Course Title: Software Development Course Title: Software Development Unit: Customer Service Content Standard(s) and Depth of 1. Analyze customer software needs and system requirements to design an information technology-based project plan.

More information

09336863931 : provid.ir

09336863931 : provid.ir provid.ir 09336863931 : NET Architecture Core CSharp o Variable o Variable Scope o Type Inference o Namespaces o Preprocessor Directives Statements and Flow of Execution o If Statement o Switch Statement

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

SAP InfiniteInsight Explorer Analytical Data Management v7.0

SAP InfiniteInsight Explorer Analytical Data Management v7.0 End User Documentation Document Version: 1.0-2014-11 SAP InfiniteInsight Explorer Analytical Data Management v7.0 User Guide CUSTOMER Table of Contents 1 Welcome to this Guide... 3 1.1 What this Document

More information

Analysis of Binary Search algorithm and Selection Sort algorithm

Analysis of Binary Search algorithm and Selection Sort algorithm Analysis of Binary Search algorithm and Selection Sort algorithm In this section we shall take up two representative problems in computer science, work out the algorithms based on the best strategy to

More information

USC Marshall School of Business Marshall Information Services

USC Marshall School of Business Marshall Information Services USC Marshall School of Business Marshall Information Services Excel Dashboards and Reports The goal of this workshop is to create a dynamic "dashboard" or "Report". A partial image of what we will be creating

More information

C++ Programming Language

C++ Programming Language C++ Programming Language Lecturer: Yuri Nefedov 7th and 8th semesters Lectures: 34 hours (7th semester); 32 hours (8th semester). Seminars: 34 hours (7th semester); 32 hours (8th semester). Course abstract

More information

MAS 500 Intelligence Tips and Tricks Booklet Vol. 1

MAS 500 Intelligence Tips and Tricks Booklet Vol. 1 MAS 500 Intelligence Tips and Tricks Booklet Vol. 1 1 Contents Accessing the Sage MAS Intelligence Reports... 3 Copying, Pasting and Renaming Reports... 4 To create a new report from an existing report...

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

II. RELATED WORK. Sentiment Mining

II. RELATED WORK. Sentiment Mining Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

More information