Statistical Pattern Recognition (CE-725) Department of Computer Engineering Sharif University of Technology



In The Name of God, The Compassionate, The Merciful

Name: Problems' Keys        Student ID#:

Statistical Pattern Recognition (CE-725)
Department of Computer Engineering, Sharif University of Technology
Final Exam Solution - Spring 2012 (150 minutes, 100+5 points)

1) Basic Concepts (15 points)

a) True or false questions: For each of the following parts, specify whether the given statement is true or false. If it is true, provide a brief explanation; otherwise, propose a counterexample.

- The kernel K(x_i, x_j) is symmetric, where x_i and x_j are the feature vectors for the i-th and j-th examples.
- Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.
- After training an SVM, we can discard all examples which are not support vectors and can still classify new examples.

b) What would happen if the activation functions at the hidden and output layers of an MLP were linear? Explain why this simpler activation function is not normally used in MLPs, although it would simplify and accelerate the calculations of the back-propagation algorithm.

a)
- True. K(x_i, x_j) = φ(x_i)·φ(x_j) = φ(x_j)·φ(x_i) = K(x_j, x_i).
- True. Since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a polynomial kernel of degree less than or equal to two.
- True. Only the support vectors affect the boundary.

b) If we use this activation function, the MLP becomes like a Perceptron (a linear classifier). To be more specific, we can write the weights from the input to the hidden layer as a matrix W_HI, the weights from the hidden to the output layer as W_OH, and the biases at the hidden and output layers as vectors b_H and b_O. Using vector and matrix multiplication, the hidden activations can be written as H = b_H + W_HI * I, and the output activations can be written as

    O = b_O + W_OH * H = b_O + W_OH * (b_H + W_HI * I) = (b_O + W_OH * b_H) + (W_OH * W_HI) * I = b_OI + W_OI * I,

where b_OI = b_O + W_OH * b_H and W_OI = W_OH * W_HI. Therefore, the same function can be computed with a simpler network with no hidden layer, using the weights W_OI and bias b_OI.
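As a quick numerical check of the collapse above (added here, not part of the original exam; the variable names mirror the derivation), the sketch below builds a random two-layer network with identity activations and verifies that the composed single-layer network with W_OI = W_OH * W_HI and b_OI = b_O + W_OH * b_H produces identical outputs.

```python
# Check that a two-layer MLP with identity activations equals one linear map.
import numpy as np

rng = np.random.default_rng(0)
W_HI = rng.normal(size=(4, 3))   # input -> hidden weights
W_OH = rng.normal(size=(2, 4))   # hidden -> output weights
b_H = rng.normal(size=4)         # hidden bias
b_O = rng.normal(size=2)         # output bias
I = rng.normal(size=3)           # an arbitrary input vector

# Two-layer network with identity activations
H = b_H + W_HI @ I
O_two_layer = b_O + W_OH @ H

# Equivalent single-layer network
W_OI = W_OH @ W_HI
b_OI = b_O + W_OH @ b_H
O_one_layer = b_OI + W_OI @ I

print(np.allclose(O_two_layer, O_one_layer))  # True
```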

2) Support Vector Machines (20 points)

Consider the following data points and labels:

    Data point      Label
    x1 = (1, 1)       1
    x2 = (2, 1)       1
    x3 = (2, 0)       1
    x4 = (1, 2)      -1
    x5 = (2, 2)      -1
    x6 = (1, -3)     -1

Suppose that we use the following embedding function to separate the two classes by a large-margin classifier:

    φ(x, y) = (x^2 + y^2, x - y)

a) Find the support vectors, visually.
b) Find the parameters of the SVM classifier (w, w_0, λ_i).
c) Introduce an embedding function from 2-D to 1-D that separates the original data points linearly.

a) The transformed data points are:

    Original          Transformed
    x1 = (1, 1)       x'1 = (2, 0)
    x2 = (2, 1)       x'2 = (5, 1)
    x3 = (2, 0)       x'3 = (4, 2)
    x4 = (1, 2)       x'4 = (5, -1)
    x5 = (2, 2)       x'5 = (8, 0)
    x6 = (1, -3)      x'6 = (10, 4)

Then x2, x4 and x6 are the support vectors.

b) From y_i(w^T x'_i + w_0) = 1 for all support vectors, we get w = (-1, 1) and w_0 = 5. The multipliers of the non-support vectors are zero; writing λ_1, λ_2, λ_3 for the multipliers of x2, x4, x6, the constraint Σ_i λ_i y_i = 0 gives λ_1 - λ_2 - λ_3 = 0. In addition, w = Σ_i λ_i y_i x'_i, so λ_1 (5, 1) - λ_2 (5, -1) - λ_3 (10, 4) = (-1, 1), and therefore (λ_1, λ_2, λ_3) = (1, 4/5, 1/5).

c) A 1-D embedding that works is the (negated, bias-free) SVM score from part (b): φ_1(x, y) = (x^2 + y^2) - (x - y). It maps the positive points to {2, 4} and the negative points to {6, 8}, so the threshold 5 separates the two classes linearly.
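A small numerical check of part (b), added here as an illustration (NumPy-based, not part of the original solution): it applies the embedding φ to the table above, confirms that x2, x4 and x6 lie exactly on the margin while the other points have margin 3, and verifies the KKT conditions for the stated multipliers.

```python
# Verify the SVM solution of part (b): margins and KKT conditions.
import numpy as np

X = np.array([(1, 1), (2, 1), (2, 0), (1, 2), (2, 2), (1, -3)], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

def phi(x):
    return np.array([x[0] ** 2 + x[1] ** 2, x[0] - x[1]])

Phi = np.array([phi(x) for x in X])

w = np.array([-1.0, 1.0])
w0 = 5.0
margins = y * (Phi @ w + w0)
print(margins)                                  # [3. 1. 3. 1. 3. 1.] -> x2, x4, x6 on the margin

lam = np.zeros(6)
lam[[1, 3, 5]] = [1.0, 4.0 / 5.0, 1.0 / 5.0]    # multipliers of the support vectors
print(np.isclose(np.sum(lam * y), 0.0))         # True: sum_i lambda_i y_i = 0
print(np.allclose((lam * y) @ Phi, w))          # True: w = sum_i lambda_i y_i phi(x_i)
```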

3) Graphical Methods (25 points)

a) Consider an HMM with three states {S1, S2, S3}, outputs {A, B}, initial state probabilities {1, 0, 0}, state transition probability matrix A, and output probability matrix B. Compute P(O_1 = B, O_2 = B, ..., O_200 = B) in the given HMM.

b) Given the following graphical model, which of the following statements are true, regardless of the conditional probability distributions?

    [Figure: a directed graphical model over the nodes A, B, C, D, E, F, G, H, I, J, K, L, M, N.]

    b1) P(D, H) = P(D) P(H)
    b2) P(A, I) = P(A) P(I)
    b3) P(A, I | G) = P(A | G) P(I | G)
    b4) P(J, G | F) = P(J | F) P(G | F)
    b5) P(J, K | L) = P(J | L) P(K | L)
    b6) P(E, C | A, G) = P(E | A, G) P(C | A, G)

a) We can write the probability as

    P(O_1 = B, ..., O_200 = B) = Σ_i P(O_1 = B, ..., O_200 = B | q_200 = S_i) P_200(S_i),

and we have

    A = | 0.5  0.25  0.25 |        B = | 0.5  0.5 |
        | 0    1     0    |            | 1    0   |
        | 0    0     1    |            | 0.5  0.5 |

(the rows of B give the probabilities of emitting A and B in states S1, S2, S3). Since S2 and S3 are absorbing, S1 and S3 emit B with probability 1/2, and S2 never emits B, we obtain

    P(O_1 = B, ..., O_200 = B | q_200 = S1) = 2^{-200}
    P(O_1 = B, ..., O_200 = B | q_200 = S2) = 0
    P(O_1 = B, ..., O_200 = B | q_200 = S3) = 2^{-200}
    P_200(S1) = 2^{-199},  P_200(S2) = P_200(S3) = (1 - P_200(S1)) / 2 = (1 - 2^{-199}) / 2.

Then

    P(O_1 = B, O_2 = B, ..., O_200 = B) = 2^{-200} · 2^{-199} + 2^{-200} · (1 - 2^{-199}) / 2 = 2^{-399} + 2^{-201} (1 - 2^{-199}).

b)
b1) True. There is no active trail on any of the possible paths from D to H (D-C-G-F-E-I-J-H, D-C-G-F-E-I-J-L-H, D-C-B-A-E-I-J-H and D-C-B-A-E-I-J-L-H).
b2) True. There is no active trail on any of the possible paths from A to I (A-E-I and A-B-C-G-F-E-I).
b3) False. There is an active trail on the path A-B-C-G-F-E-I.
b4) False. There is an active trail on the path G-C-B-A-E-I-J (E is a descendant of F).
b5) True. There is no active trail on any of the possible paths from J to K (J-M-K and J-L-K).
b6) False. There is an active trail on the path E-F-G-C.
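The HMM answer in part (a) can be confirmed with the standard forward algorithm. The sketch below (added here, assuming the matrices A and B as written above) uses exact rational arithmetic, so the comparison with the closed-form answer is exact.

```python
# Forward algorithm check: P(O_1=B, ..., O_200=B) = 2^-399 + 2^-201 (1 - 2^-199).
from fractions import Fraction as F

A = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1),    F(0)],
     [F(0),    F(0),    F(1)]]
b_B = [F(1, 2), F(0), F(1, 2)]      # probability of emitting B in S1, S2, S3
pi0 = [F(1), F(0), F(0)]            # initial state probabilities

alpha = [pi0[i] * b_B[i] for i in range(3)]        # forward variables at t = 1
for _ in range(199):                               # t = 2, ..., 200
    alpha = [b_B[j] * sum(alpha[i] * A[i][j] for i in range(3)) for j in range(3)]

prob = sum(alpha)
closed_form = F(1, 2 ** 399) + F(1, 2 ** 201) * (1 - F(1, 2 ** 199))
print(prob == closed_form)                         # True
```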

4) Expectation Maximization (20 points)

Consider a random variable x that is categorical with possible values {1, 2, ..., M}. Suppose that x is represented as a vector such that x(i) = 1 if x takes the i-th value, and Σ_i x(i) = 1. The distribution of x is represented by a mixture of discrete multinomial distributions such that

    p(x) = Σ_k π_k p_k(x)   and   p_k(x) = Π_j m_k(j)^{x(j)},

where π_k denotes the mixing coefficient for the k-th component (or the prior probability that the hidden variable z_k = 1), and m_k(j) represents the probability P(x(j) = 1 | z_k = 1). Observing data points {x_n}, n = 1, ..., N, derive the E and M steps to estimate π_k and m_k(j) for all values of k and j.

The hidden variables are the z_nk's; z_nk is a binary variable which is 1 if x_n is drawn from the k-th component.

E Step: We compute the responsibilities

    γ_nk = P(z_nk = 1 | x_n, θ) = P(x_n | z_nk = 1, θ) P(z_nk = 1 | θ) / P(x_n | θ)
         = π_k Π_j m_k(j)^{x_n(j)} / Σ_l π_l Π_j m_l(j)^{x_n(j)}.

Substituting P(x_n | z_nk = 1, θ) = Π_j m_k(j)^{x_n(j)} and P(z_nk = 1 | θ) = π_k into the complete-data likelihood gives

    P(X, Z | θ) = Π_n P(x_n, z_n | θ) = Π_n P(x_n | z_n, θ) P(z_n | θ) = Π_n Π_k [ π_k Π_j m_k(j)^{x_n(j)} ]^{z_nk},

so the complete-data log-likelihood is

    L(X, Z | θ) = ln P(X, Z | θ) = Σ_n Σ_k z_nk ( ln π_k + Σ_j x_n(j) ln m_k(j) ).

M Step: We maximize the expected complete-data log-likelihood, i.e., L with each z_nk replaced by its expectation γ_nk.

To estimate the π_k's, fixing the γ_nk's and considering the constraint Σ_k π_k = 1, we use a Lagrange multiplier and optimize the objective function

    L(π) = L(X, Z | θ) + λ ( Σ_k π_k - 1 ).

Setting the derivative to zero we have

    ∂L/∂π_k = Σ_n γ_nk / π_k + λ = 0   =>   π_k = - Σ_n γ_nk / λ.

To calculate the value of λ we use the fact that Σ_k π_k = 1, which results in λ = - Σ_k Σ_n γ_nk = -N. We complete the estimation by substituting the value of λ into the previously obtained equation for π_k. The final result is

    π_k = Σ_n γ_nk / Σ_l Σ_n γ_nl = (1/N) Σ_n γ_nk.

In the same way, to estimate the m_k(j)'s, fixing the γ_nk's and considering the constraint Σ_l m_k(l) = 1, we use a Lagrange multiplier and optimize the objective function

    L(m_k) = L(X, Z | θ) + λ ( Σ_l m_k(l) - 1 ).

Setting the derivative to zero we have

    ∂L/∂m_k(j) = Σ_n γ_nk x_n(j) / m_k(j) + λ = 0   =>   m_k(j) = - Σ_n γ_nk x_n(j) / λ.

To calculate the value of λ we use the fact that Σ_l m_k(l) = 1, which results in λ = - Σ_n γ_nk Σ_l x_n(l) = - Σ_n γ_nk (since Σ_l x_n(l) = 1). Substituting the value of λ into the previous equation for m_k(j), the final result is

    m_k(j) = Σ_n γ_nk x_n(j) / Σ_n γ_nk Σ_l x_n(l) = Σ_n γ_nk x_n(j) / Σ_n γ_nk.
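The E and M steps derived above translate directly into code. The sketch below is an assumed minimal implementation (not from the exam): gamma holds the responsibilities γ_nk, and the M-step updates are π_k = Σ_n γ_nk / N and m_k(j) = Σ_n γ_nk x_n(j) / Σ_n γ_nk; the small constants added inside the logarithms are only for numerical safety.

```python
# Minimal EM for a mixture of discrete (one-of-M) distributions.
import numpy as np

def em_multinomial_mixture(X, K, n_iter=100, seed=0):
    """X: (N, M) one-hot data matrix. Returns mixing weights pi (K,) and m (K, M)."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    pi = np.full(K, 1.0 / K)
    m = rng.dirichlet(np.ones(M), size=K)          # each row sums to 1
    for _ in range(n_iter):
        # E step: gamma_nk proportional to pi_k * prod_j m_k(j)^x_n(j)
        log_p = X @ np.log(m.T + 1e-12) + np.log(pi + 1e-12)   # shape (N, K)
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: closed-form updates derived above
        Nk = gamma.sum(axis=0)                      # effective counts per component
        pi = Nk / N
        m = (gamma.T @ X) / Nk[:, None]
    return pi, m

# Toy usage: two components over M = 3 categories.
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=200)
X = np.eye(3)[labels]                               # one-hot encode the samples
print(em_multinomial_mixture(X, K=2)[0])            # estimated mixing weights
```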

5) Clustering (15 points)

a) Assume we are trying to cluster the points 2^0, 2^1, 2^2, ..., 2^n (a total of n+1 points) using hierarchical clustering. We break ties by combining the two clusters in which the lowest number resides. For example, if the distance between clusters A and B is the same as the distance between clusters C and D, we would choose A and B as the next two clusters to combine if min{A, B} < min{C, D}, where {A, B} are the sets of numbers assigned to A and B.

a1) If we are using Euclidean distance, draw a sketch of the hierarchical clustering tree we would obtain for each of the single/complete linkage methods.

a2) Now assume we are using the distance function d(p, q) = max(p, q) / min(p, q). Which of the single/complete linkage methods will result in a different tree from the one obtained in (a1) when using this distance function? If you think that one or more of these methods will result in a different tree, sketch the new tree as well.

b) Consider the following algorithm to partition the data points into k clusters:
1. Calculate the pairwise distance d(P_i, P_j) between every two data points P_i and P_j in the set of data points to be clustered, and build a complete graph on the set of data points with edge weights corresponding to the distances.
2. Generate the Minimum Spanning Tree of the graph, i.e., choose the subset of edges E' with minimum sum of weights such that G' = (P, E') is a single connected tree.
3. Throw out the k-1 edges with the heaviest weights to generate k disconnected trees corresponding to the k clusters.
Identify which of the clustering algorithms you saw in the class corresponds to the mentioned algorithm.

a1) All linkage methods lead to the same tree, shown in Fig. a.

    [Fig. a]    [Fig. b]    (sketches of the resulting hierarchical clustering trees)

a2) Single link does not change. Complete link changes to the tree shown in Fig. b.

b) The clustering corresponds to single-link bottom-up clustering. The edges used to calculate the cluster distances for single-link bottom-up clustering correspond to the edges of the MST (since all points must be clustered, and the single-link cluster distance chooses the minimum-weight edge joining together two so-far unconnected clusters). Thus, the heaviest edge in the tree corresponds to the top-most clusters, and so on. (A numerical illustration is sketched at the end of this document.)

6) Semi-Supervised Learning (10 points)

Consider the following figure (a), which contains labeled data (class 1: filled black circles, class 2: hollow circles) and unlabeled data (squares). We would like to use two methods (re-weighting and co-training) in order to utilize the unlabeled data when training a Gaussian classifier.

    [Fig. a]    [Fig. b]    (axes: x_1 vs. x_2)

a) How can we use co-training in this case (what are the two classifiers)?

b) We would like to use re-weighting of the unlabeled data to improve the classification performance. The re-weighting method is done by placing the dashed circle (shown in figure b) on each of the labeled data points and counting the number of unlabeled data points in that circle. Next, a Gaussian classifier is run with the new weights computed.

b1) To what class (hollow circles or filled circles) would we assign the unlabeled point A if we were training a Gaussian classifier using only the labeled data points (with no re-weighting)?

b2) To what class (hollow circles or filled circles) would we assign the unlabeled point A if we were training a classifier using the re-weighting procedure described above?

a) Co-training partitions the feature space into two separate sets and uses these sets to construct independent classifiers. Here, the most natural way is to use one classifier (a Gaussian) for the x_1 axis and a second classifier (another Gaussian) for the x_2 axis.

b1) The hollow class. Note that the hollow points are much more spread out, and so the Gaussian learned for them will have a higher variance.

b2) Again, the hollow class. Re-weighting will not change the result since it is done independently for each of the two classes and will produce class centers very similar to the ones in (b1) above.

Good Luck!
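As a closing illustration for question 5(b) (added here, not part of the original exam), the sketch below builds the MST of a toy 2-D data set, removes the k-1 heaviest edges, and checks that the resulting connected components coincide with single-link agglomerative clustering cut at k clusters. It relies on SciPy's linkage, fcluster, minimum_spanning_tree, and connected_components.

```python
# MST-cut clustering vs. single-link agglomerative clustering on toy data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
P = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ((0, 0), (3, 0), (0, 3))])
k = 3

# Single-link agglomerative clustering, cut at k clusters.
sl_labels = fcluster(linkage(P, method='single'), t=k, criterion='maxclust')

# MST-based clustering: drop the k-1 heaviest MST edges, take connected components.
D = squareform(pdist(P))
mst = minimum_spanning_tree(D).toarray()
edges = np.argwhere(mst > 0)
weights = mst[mst > 0]
keep = edges[np.argsort(weights)[:len(weights) - (k - 1)]]   # keep all but the k-1 heaviest
adj = np.zeros_like(D)
adj[keep[:, 0], keep[:, 1]] = 1
_, mst_labels = connected_components(adj, directed=False)

# The two partitions should agree up to a relabeling of cluster ids.
print(len(set(zip(sl_labels, mst_labels))) == k)             # True
```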