Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms


Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms
ESSLLI 2015, Barcelona, Spain
http://ufal.mff.cuni.cz/esslli2015
Barbora Hladká, hladka@ufal.mff.cuni.cz
Martin Holub, holub@ufal.mff.cuni.cz
Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
ESSLLI 2015 Hladká & Holub Day 2, page 1/38

Block 2.1 Data analysis (cont'd)

Motivation No. 1
We, as students of English, want to understand the following sentences properly:
He broke down and cried when we talked to him about it.
Major cried, jabbing a finger in the direction of one heckler.
If we are not sure, we check the definitions of the verb cry in a dictionary.
ESSLLI 2015 Hladká & Holub Day 2, page 2/38

Verb Patterns Recognition
CRY -- dictionary definitions
ESSLLI 2015 Hladká & Holub Day 2, page 3/38

Verb Patterns Recognition
Based on the explanations and the examples of usage, we can recognize the two meanings of cry in the sentences:
He broke down and cried when we talked to him about it. [1]
Major cried, jabbing a finger in the direction of one heckler. [2]
ESSLLI 2015 Hladká & Holub Day 2, page 4/38

Verb Patterns Recognition

Motivation No. 2
We, as developers of natural language applications, need to recognize verb meanings automatically.
Verb Patterns Recognition (VPR) is the computational linguistic task of lexical disambiguation of verbs:
- a lexicon consists of verb usage patterns that correspond to dictionary definitions
- disambiguation is the recognition of the verb usage pattern in a given sentence
ESSLLI 2015 Hladká & Holub Day 2, page 5/38

VPR Verb patterns
CRY -- Pattern definitions

Pattern 1: [Human] cry [no object]
Explanation: [[Human]] weeps, usually because [[Human]] is unhappy or in pain
Example: His advice to stressful women was: `If you cry, don't cry alone.'

Pattern 4: [Human] cry [THAT-CL | WH-CL | QUOTE] ({out})
Explanation: [[Human]] shouts ([QUOTE]) loudly, typically in order to attract attention
Example: You can hear them screaming and banging their heads, crying that they want to go home.

Pattern 7: [Entity | State] cry [{out}] [{for} Action] [no object]
Explanation: [[Entity | State]] requires [[Action]] to be taken urgently
Example: Identifying areas which cry out for improvement, or even simply areas of muddle and misunderstanding, is by no means negative -- rather a spur to action.

E.g., pattern 1 of cry consists of a subject that is supposed to be a Human and of no object.
ESSLLI 2015 Hladká & Holub Day 2, page 6/38

VPR Getting examples
Examples for the VPR task are the output of annotation.
1 Choosing the verbs you are interested in > cry, submit
2 Defining their patterns
3 Collecting sentences with the chosen verbs
ESSLLI 2015 Hladká & Holub Day 2, page 7/38

VPR Getting examples
4 Annotating the sentences
- assign the pattern that best fits the given sentence
- if you think that no pattern matches the sentence, choose "u"
- if you do not think that the given word is a verb, choose "x"
ESSLLI 2015 Hladká & Holub Day 2, page 8/38

VPR Data
Basic statistics

            CRY                      SUBMIT
instances   250                      250
classes     1    4   7   u   x       u   1    2   4   5
frequency   131  59  13  33  14      7   177  33  12  21

ESSLLI 2015 Hladká & Holub Day 2, page 9/38

VPR Data representation

Each instance consists of an instance id, a feature vector, and a target pattern. The features come from four feature families: morphological (MS), morpho-syntactic (STA), morpho-syntactic (MST), and semantic (SEM).

instance id   feature vector   target pattern
129825        0 0 0 0 ...      1
...           0 0 0 0 ...      7

For more details, see vpr.handout posted at the course webpage.
ESSLLI 2015 Hladká & Holub Day 2, page 10/38

VPR Feature extraction

He broke down and cried when we talked to him about it.

MF_tense_vbd        1   verb in past tense                                   OK
MF_3p_verbs         1   third word preceding the verb is a verb (broke)      OK
MF_3n_verbs         1   third word following the verb is a verb (talked)     OK
...
STA.LEX_prt_none    1   there is no particle dependent on the verb           OK
STA.LEX_prep_none   1   there is no preposition dependent on the verb        OK
...
MST.GEN_n_subj      1   nominal subject of the verb                          OK
...
SEM.s.ac            1   verb's subject is Abstract (he)                      KO
...
tp                  1   true target pattern

ESSLLI 2015 Hladká & Holub Day 2, page 11/38

VPR Details on annotation

Annotation by 1 expert and 3 annotators.

verb     target classes   instances   baseline (%)   avg human accuracy (%)   perplexity 2^H(P)   kappa
CRY      1, 4, 7, u, x    250         52.4           92.2                     3.5                 0.84
SUBMIT   1, 2, 4, 5, u    250         70.8           94.1                     2.6                 0.88

- baseline is the accuracy of the classifier that always predicts the most frequent class (for CRY: 131/250 = 52.4 %)
- avg human accuracy is the average accuracy of the 3 annotators with respect to the expert's annotation
- perplexity of the target class distribution is 2^H(P), where H(P) is its entropy
- kappa is Fleiss' kappa of inter-annotator agreement
ESSLLI 2015 Hladká & Holub Day 2, page 12/38
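The perplexity column can be cross-checked against the class frequencies from the basic statistics slide; a minimal R sketch (not part of the original slides):

# class frequencies for CRY (classes 1, 4, 7, u, x)
freq <- c(131, 59, 13, 33, 14)
p <- freq / sum(freq)     # empirical class distribution
H <- -sum(p * log2(p))    # entropy H(P) of the target class
2^H                       # perplexity; evaluates to approximately 3.5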

Questions? ESSLLI 2015 Hladká & Holub Day 2, page 13/38

Data analysis (cont'd)
Deeper understanding of the task through a statistical view of the data

We exploit the data in order to predict the target value.
- Build intuition and understanding for both the task and the data
- Ask questions and search for answers in the data: What values do we see? What associations do we see?
- Do plotting and summarizing
ESSLLI 2015 Hladká & Holub Day 2, page 14/38

Analyzing distributions of values
Feature frequency

fr(A_j) = #{x_i : x_i^j > 0}

where A_j is the j-th feature, x_i is the feature vector of the i-th instance, and x_i^j is the value of A_j in x_i.
ESSLLI 2015 Hladká & Holub Day 2, page 15/38
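The fr.feature function applied on the next slide is not shown in the transcript; a minimal sketch consistent with the definition above, assuming binary/numeric feature columns:

fr.feature <- function(column) {
  # number of instances with a positive value of this feature
  sum(as.numeric(column) > 0)
}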

Analyzing distributions of values
Feature frequency

> examples <- read.csv("cry.development.csv", sep="\t")
> c <- examples[,-c(1,ncol(examples))]
> length(names(c)) # get the number of features
[1] 363
# compute feature frequencies using the fr.feature function
> ff <- apply(c, 2, fr.feature)
> table(sort(ff))
   0   1   2   3   4   5   6   7   8   9  10  12  14  15  16  20
 181  47  26  12   9   3   5   6   4   4   7   1   3   1   2   1
  21  24  25  26  28  29  30  31  32  34  35  39  41  42  46  48  49
   3   1   1   2   1   1   3   5   2   2   1   1   1   1   1   3   1
  51  55  64  65  77  82  89  92  98 138 151 176 181 217 218 245
   1   1   1   1   1   1   1   1   2   1   1   1   1   1   1   1
 247 248 249
   1   1   2
ESSLLI 2015 Hladká & Holub Day 2, page 16/38

Analyzing distributions of values
Feature frequency

[Plot: feature frequencies fr(A) of all features; VPR task: cry (feature frequency cry.r); x-axis: features, y-axis: fr(A) with ticks 0, 50, 100, 200]
ESSLLI 2015 Hladká & Holub Day 2, page 17/38

Analyzing distributions of values
Feature frequency

[Plot: features with fr(A) > 50; VPR task: cry (feature frequency cry.r): STA.GEN_any_obj, MF.2p_verbs, MF.tense_vbg, MF.tense_vbd, MF.3n_nominal, MF.2p_nominal, MF.3p_nominal, MF.2n_nominal, MF.1p_nominal, MF.1n_adverbial, MST.GEN_n_subj, STA.GEN_n_subj, MST.LEX_prep_none, STA.LEX_prep_none, STA.LEX_prt_none, MST.LEX_prt_none, STA.LEX_mark_none, STA.GEN_c_subj, STA.LEX_prepc_none, MST.LEX_prepc_none, MST.LEX_mark_none]
ESSLLI 2015 Hladká & Holub Day 2, page 18/38

Analyzing distributions of values
Entropy

# compute entropies using the entropy function
> e <- apply(c, 2, entropy)
> table(sort(round(e, 2)))
   0 0.04 0.07 0.09 0.12 0.14 0.16 0.18  0.2 0.22 0.24 0.28 0.31 0.33
 181   49   27   13    9    4    5    6    4    4    7    1    3    1
0.34  0.4 0.42 0.46 0.47 0.48 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58
   2    1    3    1    1    2    1    1    3    5    3    1    2    1
0.62 0.64 0.65 0.69 0.71 0.73 0.76 0.82 0.83 0.85 0.88 0.89 0.91 0.94
   1    1    1    1    4    1    1    1    1    1    1    1    1    1
0.95 0.97 0.99
   1    3    1
ESSLLI 2015 Hladká & Holub Day 2, page 19/38
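The entropy function is likewise not shown in the transcript; a possible sketch computing the Shannon entropy (in bits) of a feature's empirical value distribution:

entropy <- function(column) {
  p <- table(column) / length(column)   # empirical distribution of the feature's values
  -sum(p * log2(p))                     # binary features give values in [0, 1]
}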

Analyzing distributions of values
Entropy

[Plot: entropy H(A) of all features, ranging from 0.0 to 1.0; VPR task: cry (entropy cry.r)]
ESSLLI 2015 Hladká & Holub Day 2, page 20/38

Analyzing distributions of values
Entropy

[Plot: features with H(A) > 0.5; VPR task: cry (entropy cry.r): MST.GEN_infinitive, MF.3p_verbs, STA.GEN_advcl, MF.1p_verbs, STA.GEN_dobj, MST.GEN_subj_pl, MF.3n_verbs, STA.GEN_infinitive, SEM.s.H, MST.GEN_any_obj, MF.tense_vb, STA.GEN_any_obj, SEM.s.ac, STA.LEX_prep_none, MF.2p_verbs, MST.GEN_advmod, STA.GEN_advmod, MF.tense_vbd, MST.LEX_prep_none, MF.tense_vbg, MF.1n_adverbial, MF.3n_nominal, MF.2p_nominal, STA.GEN_n_subj, MF.2n_nominal, MST.GEN_n_subj, MF.3p_nominal, MF.1p_nominal]
ESSLLI 2015 Hladká & Holub Day 2, page 21/38

Association between feature and target value
Pearson contingency coefficient

[Plot: features with pcc(A,P) > 0.3; VPR task: cry (pearson contingency coefficient vpr.r): MF.2n_nominal, MF.1p_verbs, MST.LEX_prep_to, SEM.s.act, MF.3p_adjective, MF.tense_vb, MF.3n_adjective, MST.LEX_prep_in, STA.LEX_prep_in, MF.2n_verbs, MST.GEN_dobj, MF.1n_nominal, MF.3n_nominal, SEM.s.domain, SEM.s.inst, MF.1p_be, MF.1p_wh.pronoun, STA.LEX_prep_to, STA.GEN_ccomp, MST.GEN_n_subj, MST.GEN_any_obj, STA.GEN_n_subj, STA.GEN_any_obj, MF.2p_verbs, SEM.s.sem, MF.tense_vbg, MF.2n_to, SEM.o.geom, SEM.s.L, MF.tense_vbd, MF.1n_adverbial, MST.LEX_prep_none, STA.LEX_prep_none, MST.LEX_prt_none, MST.LEX_prt_out, MF.2n_adverbial, STA.LEX_prt_none, STA.LEX_prt_out, MST.LEX_prep_for, STA.LEX_prep_for]
ESSLLI 2015 Hladká & Holub Day 2, page 22/38
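The definition of the Pearson contingency coefficient did not survive the transcription. For a feature A and target P with Pearson's chi-squared statistic chi2 computed over their contingency table and n instances, the coefficient is pcc(A,P) = sqrt(chi2 / (chi2 + n)). A possible R sketch (the function name pcc is our choice):

pcc <- function(column, y) {
  tab <- table(column, y)   # contingency table: feature values vs. target classes
  # chi-squared statistic; warnings about small expected counts are suppressed
  chi2 <- suppressWarnings(chisq.test(tab)$statistic)
  sqrt(chi2 / (chi2 + sum(tab)))
}
# e.g., applied to all features: pc <- apply(c, 2, pcc, y=examples$tp)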

Association between feature and target value Conditional entropy ESSLLI 2015 Hladká & Holub Day 2, page 23/38
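The formula on this slide did not survive the transcription. The standard definition, in the notation used here, is H(P|A) = sum over a of p(a) H(P|A=a) = - sum over a,p of p(a,p) log2 p(p|a), i.e., the expected entropy of the target P after the value of the feature A has been observed.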

Association between feature and target value
Conditional entropy

# compute conditional entropies using the entropy.cond function
> ce <- apply(c, 2, entropy.cond, y=examples$tp)
> table(sort(round(ce, 2)))
   0 0.04 0.07 0.09 0.12 0.14 0.16 0.18  0.2 0.22 0.24 0.28 0.31 0.33
 181   49   27   13    9    4    5    6    4    4    7    1    3    1
0.34  0.4 0.42 0.46 0.47 0.48 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58
   2    1    3    1    1    2    1    1    3    5    3    1    2    1
0.62 0.64 0.65 0.69 0.71 0.73 0.76 0.82 0.83 0.85 0.88 0.89 0.91 0.94
   1    1    1    1    4    1    1    1    1    1    1    1    1    1
0.95 0.97 0.99
   1    3    1
ESSLLI 2015 Hladká & Holub Day 2, page 24/38
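The entropy.cond function is also not part of the transcript; a possible sketch implementing H(P|A) as defined above (argument names chosen to match the apply call):

entropy.cond <- function(column, y) {
  # H(P|A) = sum over a of p(a) * H(P | A = a)
  h <- 0
  for (a in unique(column)) {
    sel <- column == a
    p.y <- table(y[sel]) / sum(sel)   # conditional distribution p(p | a)
    h <- h + mean(sel) * (-sum(p.y * log2(p.y)))
  }
  h
}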

Association between feature and target value
Conditional entropy

[Plot: conditional entropy H(P|A) of all features, ranging from 0.0 to 0.8; VPR task: cry (entropy cry.r)]
ESSLLI 2015 Hladká & Holub Day 2, page 25/38

Association between feature and target value
Conditional entropy

[Plot: features with H(P|A) > 0.5; VPR task: cry (entropy cry.r): MST.GEN_infinitive, MF.3p_verbs, STA.GEN_advcl, MF.1p_verbs, STA.GEN_dobj, MST.GEN_subj_pl, MF.3n_verbs, STA.GEN_infinitive, SEM.s.H, MST.GEN_any_obj, MF.tense_vb, STA.GEN_any_obj, SEM.s.ac, STA.LEX_prep_none, MF.2p_verbs, MST.GEN_advmod, STA.GEN_advmod, MF.tense_vbd, MST.LEX_prep_none, MF.tense_vbg, MF.1n_adverbial, MF.3n_nominal, MF.2p_nominal, STA.GEN_n_subj, MF.2n_nominal, MST.GEN_n_subj, MF.3p_nominal, MF.1p_nominal]
ESSLLI 2015 Hladká & Holub Day 2, page 26/38

What values do we see
Analyzing distributions of values
Filter out ineffective features from the CRY data

> examples <- read.csv("cry.development.csv", sep="\t")
> n <- nrow(examples)
> ## remove the id and the target class tp
> c.0 <- examples[,-c(1,ncol(examples))]
> ## remove features with 0s only
> c.1 <- c.0[,colSums(as.matrix(sapply(c.0, as.numeric))) != 0]
> ## remove features with 1s only
> c.2 <- c.1[,colSums(as.matrix(sapply(c.1, as.numeric))) != n]
> ## remove column duplicates
> c <- data.frame(t(unique(t(as.matrix(c.2)))))
> ncol(c.0) # get the number of input features
[1] 363
> ncol(c) # get the number of effective features
[1] 168
ESSLLI 2015 Hladká & Holub Day 2, page 27/38

Methods for basic data exploration
Confusion matrix

Confusion matrices are contingency tables that display the results of classification algorithms/annotations. They enable us to perform error/difference analysis.

Example
Two annotators A1 and A2 annotated 50 sentences with cry.

         A2
A1    1   4   7   u   x
 1   24   3   1   3   0
 4    3   3   0   1   1
 7    0   2   4   0   1
 u    1   0   0   0   0
 x    0   1   0   0   2

ESSLLI 2015 Hladká & Holub Day 2, page 28/38
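In R, such a matrix is produced directly by table(); a minimal sketch using two short hypothetical vectors of pattern labels:

# each vector holds one annotator's pattern labels, sentence by sentence
A1 <- c("1", "1", "4", "7", "1", "x")
A2 <- c("1", "4", "4", "7", "1", "x")
table(A1, A2)   # rows: A1, columns: A2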

What agreement would be reached by chance?

Example 1
Assume two annotators (A1, A2), two classes (t1, t2), and the following distribution:

       t1     t2
A1    50 %   50 %
A2    50 %   50 %

Then
- the best possible agreement is 100 %
- the worst possible agreement is 0 %
- the agreement-by-chance would be 50 %
ESSLLI 2015 Hladká & Holub Day 2, page 29/38

What agreement would be reached by chance?

Example 2
Assume two annotators (A1, A2), two classes (t1, t2), and the following distribution:

       t1     t2
A1    90 %   10 %
A2    90 %   10 %

Then
- the best possible agreement is 100 %
- the worst possible agreement is 80 %
- the agreement-by-chance would be 82 %
ESSLLI 2015 Hladká & Holub Day 2, page 30/38

What agreement would be reached by chance?

Example 3
Assume two annotators (A1, A2), two classes (t1, t2), and the following distribution:

       t1     t2
A1    90 %   10 %
A2    80 %   20 %

Then
- the best possible agreement is 90 %
- the worst possible agreement is 70 %
- the agreement-by-chance would be 74 %
ESSLLI 2015 Hladká & Holub Day 2, page 31/38

Example in R
The situation from Example 3 can be simulated in R.

# N will be the sample size
> N = 10^6
# the two annotators annotate randomly with the given marginals
> A1 = sample(c(rep(1, 0.9*N), rep(0, 0.1*N)))
> A2 = sample(c(rep(1, 0.8*N), rep(0, 0.2*N)))
# percentage of their observed agreement
> mean(A1 == A2)
[1] 0.740112
# exact calculation -- just for comparison
> 0.9*0.8 + 0.1*0.2
[1] 0.74
ESSLLI 2015 Hladká & Holub Day 2, page 32/38

Cohen's kappa
Cohen's kappa was introduced by Jacob Cohen in 1960.

κ = (Pr(a) - Pr(e)) / (1 - Pr(e))

- Pr(a) is the relative observed agreement among annotators = the percentage of agreements in the sample
- Pr(e) is the hypothetical probability of chance agreement = the probability of their agreement if they annotated randomly
- κ > 0 if the proportion of agreement obtained exceeds the proportion of agreement expected by chance

Limitations
Cohen's kappa measures agreement between two annotators only; for more annotators you should use Fleiss' kappa, see http://en.wikipedia.org/wiki/Fleiss%27_kappa
ESSLLI 2015 Hladká & Holub Day 2, page 33/38

Cohen's kappa

         A2
A1    1   4   7   u   x
 1   24   3   1   3   0
 4    3   3   0   1   1
 7    0   2   4   0   1
 u    1   0   0   0   0
 x    0   1   0   0   2

Cohen's kappa: ?
ESSLLI 2015 Hladká & Holub Day 2, page 34/38
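A minimal R sketch (not part of the original slides) that evaluates the formula above for this matrix; the function name kappa.cohen is our choice:

kappa.cohen <- function(m) {
  n <- sum(m)
  pr.a <- sum(diag(m)) / n                     # observed agreement Pr(a)
  pr.e <- sum(rowSums(m) * colSums(m)) / n^2   # chance agreement Pr(e)
  (pr.a - pr.e) / (1 - pr.e)
}
m <- matrix(c(24, 3, 1, 3, 0,
              3, 3, 0, 1, 1,
              0, 2, 4, 0, 1,
              1, 0, 0, 0, 0,
              0, 1, 0, 0, 2), nrow=5, byrow=TRUE)
kappa.cohen(m)   # approximately 0.44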

Homework 2.1
Work with the SUBMIT data.
1 Filter out ineffective features from the data using the filtering rules that we applied to the CRY data.
2 Draw a plot of the conditional entropy H(P|A) for the effective features. Then focus on the features for which H(P|A) > 0.5. Comment on what you see in the plots.
ESSLLI 2015 Hladká & Holub Day 2, page 35/38

VPR vs. MOV comparison

                        MOV             VPR
type of task            regression      classification
getting examples by     collecting      annotation
# of examples           100,000         250
# of features           32              363
  categorical/binary    29/18           0/363
  numerical             3               0
output values           1 to 5          5 discrete categories

ESSLLI 2015 Hladká & Holub Day 2, page 36/38

Block 2.2 Introductory remarks on VPR classifiers ESSLLI 2015 Hladká & Holub Day 2, page 37/38

Example: Decision Tree classifier for cry, trained using a cross-validation fold.
ESSLLI 2015 Hladká & Holub Day 2, page 38/38