Vector Quantization and Clustering



Vector Quantization and Clustering
- Introduction
- K-means clustering
- Clustering issues
- Hierarchical clustering
  - Divisive (top-down) clustering
  - Agglomerative (bottom-up) clustering
- Applications to speech recognition

6.345 Automatic Speech Recognition: Vector Quantization & Clustering

Acoustic Modelling
Signal Representation and Vector Quantization: Waveform -> Feature Vectors -> Symbols
- Signal representation produces a feature vector sequence
- The multi-dimensional sequence can be processed by:
  - Methods that directly model the continuous space
  - Quantizing and modelling of discrete symbols
- Main advantages and disadvantages of quantization:
  - Reduced storage and computation costs
  - Potential loss of information due to quantization

Vector Quantization (VQ)
- Used in signal compression, speech and image coding
- More efficient information transmission than scalar quantization (can achieve less than 1 bit/parameter)
- Used for discrete acoustic modelling since the early 1980s
- Based on standard clustering algorithms:
  - Individual cluster centroids are called codewords
  - The set of cluster centroids is called a codebook
  - Basic VQ is K-means clustering
  - Binary VQ is a form of top-down clustering (used for efficient quantization)
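The codeword/codebook vocabulary maps directly onto a nearest-neighbor lookup. A minimal sketch, where the codebook entries and input frames are made-up illustrative values:

```python
def quantize(x, codebook):
    """Index of the nearest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

# toy 2-D codebook with 3 codewords (illustrative values)
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
frames = [(0.2, -0.1), (3.8, 4.2), (0.9, 1.1)]

codes = [quantize(f, codebook) for f in frames]   # only indices are stored/sent
reconstructed = [codebook[c] for c in codes]      # decoder side: table lookup
assert codes == [0, 2, 1]
```

The storage saving is the point: each frame is reduced to a codebook index, at the cost of replacing the original vector with its nearest codeword.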

VQ & Clustering
- Clustering is an example of unsupervised learning:
  - The number and form of the classes $\{C_i\}$ are unknown
  - The available data samples $\{x_i\}$ are unlabeled
- Useful for discovering data structure before classification, or for tuning or adaptation of classifiers
- Results depend strongly on the clustering algorithm

Acoustic Modelling Example
(figure: clusters in feature space for the phones [t], [n], and [I])

Clustering Issues
- What defines a cluster?
  - Is there a prototype representing each cluster?
- What defines membership in a cluster?
  - What is the distance metric, $d(x, y)$?
- How many clusters are there?
  - Is the number of clusters picked before clustering?
- How well do the clusters represent unseen data?
  - How is a new data point assigned to a cluster?

K-Means Clustering
- Used to group data into K clusters, $\{C_1, \ldots, C_K\}$
- Each cluster is represented by the mean of its assigned data
- The iterative algorithm converges to a local optimum:
  - Select K initial cluster means, $\{\mu_1, \ldots, \mu_K\}$
  - Iterate until a stopping criterion is satisfied:
    1. Assign each data sample to the closest cluster: $x \in C_i$ if $d(x, \mu_i) \le d(x, \mu_j)$ for all $j \ne i$
    2. Update the K means from the assigned samples: $\mu_i = E(x)$, $x \in C_i$, $1 \le i \le K$
- A nearest-neighbor quantizer is used for unseen data
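The two-step iteration can be sketched in a few lines of plain Python; the data points and random seed are illustrative, and a real implementation would vectorize the distance computations:

```python
import random

def kmeans(data, k, n_iter=100, seed=0):
    """K-means: alternate nearest-mean assignment and mean re-estimation.

    Converges to a local optimum only; initial means are random data samples.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)
    for _ in range(n_iter):
        # 1. Assign each data sample to the closest cluster mean
        clusters = [[] for _ in range(k)]
        for x in data:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, means[j])))
            clusters[i].append(x)
        # 2. Update each mean from its assigned samples (keep old mean if empty)
        new_means = [tuple(sum(dim) / len(xs) for dim in zip(*xs)) if xs else mu
                     for xs, mu in zip(clusters, means)]
        if new_means == means:  # stopping criterion: assignments are stable
            break
        means = new_means
    return means

# two well-separated groups of illustrative 2-D points
data = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
means = kmeans(data, 2)
```

With this toy data the algorithm recovers the two group centers, roughly (0.1, 0.05) and (5.1, 5.0), regardless of which samples seed the means.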

K-Means Example: K = 3
- Random selection of 3 data samples for the initial means
- Euclidean distance metric between means and samples
(figure: the algorithm over iterations 0 through 5)

K-Means Properties
- Usually used with a Euclidean distance metric: $d(x, \mu_i) = \|x - \mu_i\|^2 = (x - \mu_i)^t (x - \mu_i)$
- The total distortion, D, is the sum of squared errors: $D = \sum_{i=1}^{K} \sum_{x \in C_i} \|x - \mu_i\|^2$
- D decreases between the n-th and (n+1)-st iterations: $D(n+1) \le D(n)$
- Also known as the Isodata, or generalized Lloyd, algorithm
- Similarities with the Expectation-Maximization (EM) algorithm for learning parameters from unlabeled data

K-Means Clustering: Initialization
- K-means converges to a local optimum:
  - The global optimum is not guaranteed
  - Initial choices can influence the final result
(figure: three different clusterings of the same data for K = 3)
- Initial K means can be chosen randomly
  - Clustering can be repeated multiple times
- Hierarchical strategies are often used to seed clusters:
  - Top-down (divisive) (e.g., binary VQ)
  - Bottom-up (agglomerative)

K-Means Clustering: Stopping Criterion
Many criteria can be used to terminate K-means:
- No changes in sample assignments
- Maximum number of iterations exceeded
- Relative change in the total distortion, D, falls below a threshold T: $1 - \frac{D(n+1)}{D(n)} < T$
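The distortion-based criterion amounts to a one-line check; the threshold value here is an arbitrary illustrative choice:

```python
def should_stop(d_prev, d_curr, threshold=1e-4):
    """Stop when the relative drop in total distortion, 1 - D(n+1)/D(n),
    falls below the threshold T."""
    return 1.0 - d_curr / d_prev < threshold

assert should_stop(100.0, 99.999)    # relative drop of 1e-5: terminate
assert not should_stop(100.0, 90.0)  # relative drop of 0.1: keep iterating
```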

Acoustic Clustering Example
- 12 clusters, seeded with agglomerative clustering
- Spectral representation based on an auditory model

Clustering Issues: Number of Clusters
- In general, the number of clusters is unknown
(figure: the same data clustered with K = 2 and with K = 4)
- Dependent on the clustering criterion, space, computation or distortion requirements, or on a recognition metric

Clustering Issues: Clustering Criterion
The criterion used to partition data into clusters plays a strong role in determining the final results

Clustering Issues: Distance Metrics
A distance metric usually has the properties:
1. $d(x, y) \ge 0$
2. $d(x, y) = 0$ iff $x = y$
3. $d(x, y) = d(y, x)$ (symmetry)
4. $d(x, y) \le d(x, z) + d(y, z)$ (triangle inequality)
5. $d(x + z, y + z) = d(x, y)$ (translation invariance)
In practice, distance metrics may not obey some of these properties, but they still serve as measures of dissimilarity

Clustering Issues: Distance Metrics
Distance metrics strongly influence cluster shapes:
- Normalized dot product: $\frac{x^t y}{\|x\| \|y\|}$
- Euclidean: $\|x - \mu_i\|^2 = (x - \mu_i)^t (x - \mu_i)$
- Weighted Euclidean: $(x - \mu_i)^t W (x - \mu_i)$ (e.g., $W = \Sigma^{-1}$)
- Minimum distance (chain): $\min d(x, x_i)$, $x_i \in C_i$
- Representation specific...
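These metrics can be written out directly; the weighted case below assumes a diagonal W (e.g., inverse per-dimension variances), and all test values are illustrative:

```python
import math

def euclidean2(x, y):
    """Squared Euclidean distance (x - y)^t (x - y)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def normalized_dot(x, y):
    """x^t y / (||x|| ||y||): cosine of the angle between x and y."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def weighted_euclidean(x, mu, w_diag):
    """(x - mu)^t W (x - mu) for a diagonal W, e.g. inverse variances."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(w_diag, x, mu))

def chain_distance(x, cluster):
    """Minimum distance from x to any member of the cluster."""
    return min(euclidean2(x, xi) for xi in cluster)

assert euclidean2((0.0, 0.0), (3.0, 4.0)) == 25.0
assert normalized_dot((2.0, 0.0), (5.0, 0.0)) == 1.0
```

Note how the choice shapes clusters: the normalized dot product ignores vector magnitude entirely, while the chain distance lets clusters grow into elongated shapes.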

Clustering Issues: Impact of Scaling
- Scaling feature vector dimensions can significantly impact clustering results
- Scaling can be used to normalize dimensions so that a simple distance metric is a reasonable criterion for similarity

Clustering Issues: Training and Test Data
- Training data performance can be arbitrarily good, e.g., $\lim_{K \to \infty} D_K = 0$
- Independent test data are needed to measure performance:
  - Performance can be measured by the distortion, D, or by some more relevant speech recognition metric
  - Robust training will degrade minimally during testing
  - Good training data closely match the test conditions
- Development data are often used for refinements, since through iterative testing they can implicitly become a form of training data

Alternative Evaluation Criterion: LPC VQ Example
(figure: wide-band spectrograms of the word "Autumn", original vs. LPC VQ resynthesis, codebook size = 1024)

Hierarchical Clustering
- Clusters data into a hierarchical class structure
- Top-down (divisive) or bottom-up (agglomerative)
- Often based on a stepwise-optimal, or greedy, formulation
- The hierarchical structure is useful for hypothesizing classes
- Used to seed clustering algorithms such as K-means

Divisive Clustering
- Creates a hierarchy by successively splitting clusters into smaller groups
- On each iteration, one or more of the existing clusters are split apart to form new clusters
- The process repeats until a stopping criterion is met
- Divisive techniques can incorporate pruning and merging heuristics, which can improve the final result

Example of Non-Uniform Divisive Clustering
(figure: tree levels 0 through 2)

Example of Uniform Divisive Clustering
(figure: tree levels 0 through 2)

Divisive Clustering Issues
- Initialization of new clusters:
  - Random selection from cluster samples
  - Selection of member samples far from the center
  - Perturb the dimension of maximum variance
  - Perturb all dimensions slightly
- Uniform or non-uniform tree structures
- Cluster pruning (due to poor expansion)
- Cluster assignment (distance metric)
- Stopping criterion:
  - Rate of distortion decrease
  - Cannot increase cluster size

Divisive Clustering Example: Binary VQ
- Often used to create a codebook of size $M = 2^B$ (a B-bit codebook)
- Uniform binary divisive clustering is used
- On each iteration, each cluster is divided in two: $\mu_i^+ = \mu_i(1 + \epsilon)$, $\mu_i^- = \mu_i(1 - \epsilon)$
- K-means is used to determine the cluster centroids
- Also known as the LBG (Linde, Buzo, Gray) algorithm
- A more efficient version runs K-means only within each binary split, and retains the tree for efficient lookup
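The split step of the LBG procedure is simple to sketch. The K-means refinement that follows each split is elided here, and the initial centroid and epsilon are placeholder values:

```python
def binary_split(means, eps=0.01):
    """LBG split: replace each centroid mu with mu*(1+eps) and mu*(1-eps)."""
    out = []
    for mu in means:
        out.append(tuple(c * (1 + eps) for c in mu))
        out.append(tuple(c * (1 - eps) for c in mu))
    return out

# grow a codebook of size M = 2^B via B rounds of split-and-refine
codebook = [(2.5, 2.5)]        # start from the global mean (illustrative value)
for _ in range(2):             # B = 2 -> M = 4 codewords
    codebook = binary_split(codebook)
    # ... a full LBG implementation would run K-means here to refine centroids ...
assert len(codebook) == 4
```

Without the interleaved K-means step the perturbed centroids stay nearly collinear; it is the refinement after each split that spreads them over the data.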

Agglomerative Clustering
- Structures N samples or seed clusters into a hierarchy
- On each iteration, the two most similar clusters are merged together to form a new cluster
- After N - 1 iterations, the hierarchy is complete
- The structure is displayed in the form of a dendrogram
- By keeping track of the similarity score when new clusters are created, the dendrogram can often yield insights into the natural grouping of the data

Dendrogram Example (One Dimension)
(figure: dendrogram with distance on the vertical axis)

Agglomerative Clustering Issues
Measuring distances between clusters $C_i$ and $C_j$ with respective numbers of tokens $n_i$ and $n_j$:
- Average distance: $\frac{1}{n_i n_j} \sum_{i,j} d(x_i, x_j)$
- Maximum distance (compact): $\max_{i,j} d(x_i, x_j)$
- Minimum distance (chain): $\min_{i,j} d(x_i, x_j)$
- Distance between two representative vectors of each cluster, such as their means: $d(\mu_i, \mu_j)$
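These linkage options plug into a brute-force agglomerative loop (quadratic per merge, so this is for illustration only; the one-dimensional samples are made up):

```python
def average_linkage(ci, cj, d):
    return sum(d(x, y) for x in ci for y in cj) / (len(ci) * len(cj))

def max_linkage(ci, cj, d):    # favors compact clusters
    return max(d(x, y) for x in ci for y in cj)

def min_linkage(ci, cj, d):    # allows chain-like clusters
    return min(d(x, y) for x in ci for y in cj)

def agglomerate(samples, linkage, d):
    """Merge the two closest clusters until one remains; return merge scores."""
    clusters = [[x] for x in samples]
    history = []
    while len(clusters) > 1:
        # find the closest pair of clusters under the chosen linkage
        score, i, j = min((linkage(clusters[i], clusters[j], d), i, j)
                          for i in range(len(clusters))
                          for j in range(i + 1, len(clusters)))
        history.append(score)
        clusters[i] += clusters[j]
        del clusters[j]
    return history

d1 = lambda x, y: abs(x - y)                       # 1-D distance
history = agglomerate([0.0, 0.4, 1.0, 5.0], min_linkage, d1)
assert [round(s, 6) for s in history] == [0.4, 0.6, 4.0]
```

The recorded merge scores are exactly the heights at which a dendrogram would join the clusters, which is why the score sequence reveals the natural grouping.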

Stepwise-Optimal Clustering
- Common to minimize the increase in total distortion on each merging iteration: stepwise-optimal, or greedy
- On each iteration, merge the two clusters which produce the smallest increase in distortion
- The distance metric for minimizing the distortion, D, is: $\sqrt{\frac{n_i n_j}{n_i + n_j}}\, \|\mu_i - \mu_j\|$
- Tends to combine small clusters with large clusters before merging clusters of similar sizes
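Minimizing this distance is equivalent to minimizing its square, $\frac{n_i n_j}{n_i + n_j}\|\mu_i - \mu_j\|^2$, the increase in total distortion from the merge. A sketch of why small clusters merge into large ones first (the cluster sizes and means are illustrative):

```python
def merge_cost(ni, mu_i, nj, mu_j):
    """Increase in total distortion from merging clusters i and j:
    (ni*nj)/(ni+nj) * ||mu_i - mu_j||^2."""
    dist2 = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    return (ni * nj) / (ni + nj) * dist2

# one sample merging into a cluster of 100 at mean distance 2...
small_big = merge_cost(1, (0.0,), 100, (2.0,))
# ...is far cheaper than merging two similar-sized clusters at the same distance
similar = merge_cost(50, (0.0,), 51, (2.0,))
assert small_big < similar
```

The size factor $\frac{n_i n_j}{n_i + n_j}$ is small whenever either cluster is small, which is what biases the greedy procedure toward absorbing small clusters early.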

Clustering for Segmentation

Speaker Clustering
- 23 female and 53 male speakers from the TIMIT corpus
- Vectors based on F1 and F2 averages for 9 vowels
- The distance $d(C_i, C_j)$ is the average of the distances between members
(figure: speaker dendrogram, distance axis 2 to 6)

Velar Stop Allophones

Velar Stop Allophones (cont'd)
(figure: wide-band spectrograms and waveforms, 0-8 kHz, for the words "keep", "cot", and "quick")

Acoustic-Phonetic Hierarchy
Clustering of phonetic distributions across 12 clusters

Word Clustering
(figure: dendrogram over an air-travel-domain vocabulary of roughly 120 words -- days, numbers, city names, airlines, and travel terms such as FARE, ROUND_TRIP, and NONSTOP -- distance axis 0 to 6)

VQ Applications
- Usually used to reduce computation
- Can be used alone for classification
- Used in dynamic time warping (DTW) and discrete hidden Markov models (HMMs)
- Multiple codebooks are used when spaces are statistically independent (product codebooks)
- Matrix codebooks are sometimes used to capture correlation between successive frames
- Used for semi-parametric density estimation (e.g., semi-continuous mixtures)

References
- X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing, Prentice-Hall, 2001.
- R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley & Sons, 2001.
- A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.
- R. Gray, "Vector Quantization," IEEE ASSP Magazine, 1(2), 1984.
- B. Juang, D. Wong, and A. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, 30(2), 1982.
- J. Makhoul, S. Roucos, and H. Gish, "Vector Quantization in Speech Coding," Proc. IEEE, 73(11), 1985.
- L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.