Understanding Grapheme

1 Understanding Grapheme
Dong Wang
January 15, 2007

2 What is Grapheme?
Understand Grapheme
Comparison Result on WSJCAM0
Grapheme and Spoken Term Detection
Spoken Term Detection Main Stream
Grapheme Based Spoken Term Detection
Future Work on Grapheme

3 Language: Multiple Layer Information Expression
A language has multiple layers for communication. Mappings between the different layers are well known.
MEANING     (a table)
WORD        TABLE
WORD FRAG   T-A-B-L-E
Pron. FRAG  t-ei-b-l
WAVE

4 Lexicon: Mapping from Word to Pronunciation
Accurate lexicon vs. obscure pronunciation
Mixture Gaussians
Context-dependent models
Multiple entries in the lexicon
Network-based pronunciation
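The contrast between a hand-built pronunciation lexicon and a grapheme lexicon can be sketched as follows. This is a minimal illustration, not the lexicon format used in the talk: the dictionary layout, the function names and the word "podcast" are assumptions, and only the TABLE / t-ei-b-l pair comes from the slides.

```python
# Hypothetical toy phoneme lexicon: each word maps to a list of pronunciations.
# Only the entry for "table" is taken from the slides; the layout is assumed.
phoneme_lexicon = {
    "table": [["t", "ei", "b", "l"]],
}


def grapheme_pronunciation(word):
    """Spell the word out: every letter becomes one grapheme unit."""
    return [ch.upper() for ch in word if ch.isalpha()]


def build_grapheme_lexicon(words):
    """Build a grapheme lexicon from a bare word list (no dictionary needed)."""
    return {w: [grapheme_pronunciation(w)] for w in words}


# "podcast" is an illustrative word that the toy phoneme lexicon lacks.
grapheme_lexicon = build_grapheme_lexicon(["table", "podcast"])
print(grapheme_lexicon["table"])   # [['T', 'A', 'B', 'L', 'E']]
```

A new word needs an expert-written or predicted entry before the phoneme lexicon can cover it, whereas the grapheme entry follows directly from the spelling.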

5 Grapheme: Thinking the Essence
It would be great if a stochastic mapping were available; that is the grapheme.
Obscure mapping between meaning and pronunciation
A bag of pronunciations
Mixture
Figure labels: Phoneme, Grapheme

6 Grapheme: Thinking the Essence
Composition of linguistic and acoustic clues
Context-dependent grapheme: a unit of phonology
Figure labels: context independent, tri-phoneme/grapheme, Phoneme, Grapheme
Thanks to Motoyuki
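A context-dependent (tri-grapheme) unit can be illustrated by expanding a spelling into left-centre+right labels. The sketch below uses the HTK-style `l-c+r` naming and word-internal context only; both choices are assumptions, since the slides do not say how the units were named or whether context crossed word boundaries.

```python
def to_trigraphemes(units):
    """Expand a unit sequence into word-internal 'left-centre+right' labels."""
    labels = []
    for i, centre in enumerate(units):
        left = units[i - 1] + "-" if i > 0 else ""
        right = "+" + units[i + 1] if i < len(units) - 1 else ""
        labels.append(left + centre + right)
    return labels


print(to_trigraphemes(list("TABLE")))
# ['T+A', 'T-A+B', 'A-B+L', 'B-L+E', 'L-E']
```

The same function works for phoneme sequences, which is why the slide treats tri-phonemes and tri-graphemes side by side.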

7 Grapheme: Thinking the Essence
Where can graphemes be used?
High dependency between graphemes and phonemes, or
phonology rules that are obeyed strictly, or
a powerful language model (or other constraints) for discrimination
Figure labels: axes Lg. and Ac.; points p1, p2, g1, g2; examples "tide" and "tid"

8 Grapheme: Thinking the Essence
Can the grapheme lexicon be refined?
Graphemes and phonemes are different sharing strategies
The grapheme lexicon can be made more concrete in some way
Difficulties in searching: how to resolve them?
Figure labels: graphemes (Gr.) E, I, A; phonemes (Ph.) ai, ei, ih, ax

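9 Grapheme System on WSJCAM0
Pure grapheme/phoneme recognizer (table columns: Phoneme, Grapheme; values not preserved in this transcript)
  Basic decoder
  + 8 letter-gram
  + lexicon constraint
Grapheme and phoneme word decoder (table columns: Phoneme, Grapheme; values not preserved in this transcript)
  Lexicon + bi-gram lattice
  + trigram rescore
  HDecode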
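The "8 letter-gram" on this slide is a language model over letters rather than words. The sketch below shows the idea with plain maximum-likelihood counts; the function names, the padding symbols and the three training words are assumptions, smoothing is omitted, and the example uses order 3 to stay readable where the slides use order 8.

```python
from collections import Counter


def train_letter_ngram(words, order):
    """Count letter n-grams and their histories, padding each word."""
    ngrams, histories = Counter(), Counter()
    for word in words:
        seq = ["<s>"] * (order - 1) + list(word.upper()) + ["</s>"]
        for i in range(order - 1, len(seq)):
            history = tuple(seq[i - order + 1:i])
            ngrams[history + (seq[i],)] += 1
            histories[history] += 1
    return ngrams, histories


def prob(ngrams, histories, history, letter):
    """Unsmoothed maximum-likelihood estimate of P(letter | history)."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0
    return ngrams[history + (letter,)] / histories[history]


# Toy training data (assumed); a real model would be trained on LM text.
ngrams, histories = train_letter_ngram(["the", "then", "this"], order=3)
print(prob(ngrams, histories, ("T", "H"), "E"))  # 2 of the 3 "TH" contexts continue with E
```

A long letter history acts much like an implicit lexicon, which is presumably why the slide lists it alongside the explicit lexicon constraint.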

10 Grapheme System on WSJCAM0
Graphemes of Letter-Pairs
Table columns: Strategy, WER (WER values not preserved in this transcript)
  Single Letter
  Single + TH/TION/SION/SH/CH/GH/PH
  Single + TH/SH/GH/PH
  Single + TH/SH/PH-F/GH
All the letter-pair schemes contain mappings like AU-O
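The letter-pair strategies treat some letter groups as single units. One way to split a word under such a scheme is greedy longest-match segmentation, sketched below. The greedy rule is an assumption (the slides do not say how words were segmented), and merges such as PH-F or AU-O are not modelled here.

```python
# Unit inventory from the "Single + TH/TION/SION/SH/CH/GH/PH" strategy.
MULTI_LETTER_UNITS = {"TION", "SION", "TH", "SH", "CH", "GH", "PH"}


def segment(word, units=MULTI_LETTER_UNITS):
    """Split a word into grapheme units, preferring the longest listed unit."""
    word = word.upper()
    max_len = max((len(u) for u in units), default=1)
    out, i = [], 0
    while i < len(word):
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(word) - i), 1, -1):
            if word[i:i + length] in units:
                out.append(word[i:i + length])
                i += length
                break
        else:
            # No multi-letter unit starts here: emit a single letter.
            out.append(word[i])
            i += 1
    return out


print(segment("nation"))      # ['N', 'A', 'TION']
print(segment("photograph"))  # ['PH', 'O', 'T', 'O', 'G', 'R', 'A', 'PH']
```

Passing a smaller set, for example `{"TH", "SH", "GH", "PH"}`, gives the segmentation for the other strategies in the table.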

11 Grapheme System on WSJCAM0
Question set for tri-grapheme
  singular *CMU uses phoneme-grapheme mapping *currently used grapheme-phoneme mapping data driven
Singular vs. phoneme-grapheme mapping question set (values not preserved in this transcript)
  Singular (9 mix)
  Phoneme Mapping (9 mix)
Simple vs. complex phoneme-grapheme mapping question set (values not preserved in this transcript)
  Simple (6 mix)
  Complex (6 mix)

12 Grapheme Based Spoken Term Detection
There are several approaches to Spoken Term Detection (STD), or Spoken Document Retrieval (SDR):
  Acoustic detection
  LVCSR-based detection
  Phoneme-lattice-based detection
  Hybrid
Features that make grapheme-based STD/SDR feasible:
  No special requirement for LVCSR accuracy
  In almost all cases, only words with a clear meaning are searched for, which provides linguistic discrimination
  It avoids word-to-phoneme conversion, which is almost inevitable in any other system

13 Grapheme Based Spoken Term Detection
Phoneme Based Detection
  P1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
  P2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
  P3: Phoneme lattice generated by HVite
Grapheme Based Detection
  G1: Word lattice generated by HVite (no higher-level LM, only a bigram word lattice)
  G2: Word lattice generated by HDecode (no pre-built word lattice, but with a higher-level LM)
  G3: Grapheme lattice generated by HVite
  G4: Grapheme lattice generated by HDecode (with an 8-gram grapheme LM)
  G5: Word lattice generated by HVite (no LM at all, just the lexicon)

14 Grapheme Based Spoken Term Detection
Most Frequent Word Detection
Table columns: HIT, False Accept, Real Occ., FOM; rows: three P systems and five G systems (row indices and values not preserved in this transcript)
The most frequent words are selected from the 5k dictionary by LM unigram frequency. Each must occur at least 3 times, and stop words are filtered out.
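The HIT, False Accept and Real Occ. columns can be obtained by comparing detected occurrences of a term against its reference occurrences. The sketch below is one plausible way to do that, not the scoring procedure from the talk: the data layout, the function name and the 0.5 second matching tolerance are all assumptions, and the FOM column is not computed.

```python
def score_detections(detections, references, tolerance=0.5):
    """Count hits and false accepts for one search term.

    detections: (utterance_id, time_in_seconds) pairs proposed by the system.
    references: (utterance_id, time_in_seconds) pairs of true occurrences.
    A detection is a hit if it lies within `tolerance` seconds of a reference
    occurrence in the same utterance that has not been matched already.
    """
    unmatched = list(references)
    hits = false_accepts = 0
    for utt, t in detections:
        match = next(
            (r for r in unmatched if r[0] == utt and abs(r[1] - t) <= tolerance),
            None,
        )
        if match is None:
            false_accepts += 1
        else:
            unmatched.remove(match)
            hits += 1
    return {"hit": hits, "false_accept": false_accepts, "real_occ": len(references)}


# Hypothetical detections and references for a single term.
result = score_detections(
    detections=[("u1", 1.0), ("u1", 5.0), ("u2", 2.1)],
    references=[("u1", 1.2), ("u2", 2.0), ("u2", 9.0)],
)
print(result)  # {'hit': 2, 'false_accept': 1, 'real_occ': 3}
```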

15 Grapheme Based Spoken Term Detection
Least Frequent Word Detection
Table columns: HIT, False Accept, Real Occ., FOM; rows: three P systems and five G systems (row indices and values not preserved in this transcript)
The least frequent words are selected from the 5k dictionary by LM unigram frequency. Each must occur at least once, and stop words are filtered out.

16 Grapheme Based Spoken Term Detection
How to handle OOV words
P3, G3 and G4 can detect OOV words directly, with no change to the results.
If the audio may be searched again, OOV words can be added to the lexicon on the fly, so G1, G2 and G5 can be used, again with no change to the results.
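Searching a grapheme lattice for an OOV word needs only the spelling of the word. The sketch below shows this on a toy lattice, assuming the lattice is given as (start node, end node, grapheme) arcs; the arc format, the function name and the lattice itself are hypothetical, and scores and time stamps are left out.

```python
# Hypothetical toy lattice: each arc is (start_node, end_node, grapheme).
ARCS = [
    (0, 1, "T"), (0, 1, "D"),
    (1, 2, "I"), (1, 2, "E"),
    (2, 3, "D"), (2, 3, "T"),
    (3, 4, "E"),
]


def find_term(arcs, term):
    """Return every node path whose arc labels spell `term`."""
    outgoing = {}
    for start, end, label in arcs:
        outgoing.setdefault(start, []).append((end, label))

    matches = []

    def extend(node, pos, path):
        if pos == len(term):
            matches.append(path)
            return
        for end, label in outgoing.get(node, []):
            if label == term[pos]:
                extend(end, pos + 1, path + [end])

    # The term may start at any node that has outgoing arcs.
    for start in outgoing:
        extend(start, 0, [start])
    return matches


print(find_term(ARCS, "TIDE"))  # [[0, 1, 2, 3, 4]]
print(find_term(ARCS, "TIDY"))  # []
```

No lexicon lookup or grapheme-to-phoneme step is involved, which is the property the slide relies on for G3 and G4.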

17 Grapheme Based Spoken Term Detection
How to handle words never seen (not in the LM)
P3 and G3 are unchanged.
G4 is affected. We deleted all training sentences containing the target words and tested again.
If the audio may be searched again, those words can be added to the vocabulary, but as UNKNOWN words in the LM, so G1 and G2 can be used. We tested only G2 in this case.
If the audio may be searched again, those words can be added to the vocabulary on the fly, so G5 can be used. The result is the same because G5 does not use an LM.
Table columns: HIT, False Accept, Real Occ., FOM; rows: one P system and four G systems (row indices and values not preserved in this transcript)
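The "never seen" condition for G4 is created by removing every training sentence that contains a target word before the grapheme LM is retrained. A minimal sketch of that filtering step follows; the function name, the whitespace tokenisation and the example sentences are assumptions.

```python
def remove_sentences_with_targets(sentences, target_words):
    """Drop every sentence that contains any target word (case-insensitive)."""
    targets = {w.lower() for w in target_words}
    return [s for s in sentences if not targets & set(s.lower().split())]


# Hypothetical LM training sentences.
training = ["The table is set", "Tide tables are useful", "Nothing here"]
filtered = remove_sentences_with_targets(training, ["table"])
print(filtered)  # ['Tide tables are useful', 'Nothing here']
```

Matching is on whole tokens, so inflected forms such as "tables" survive; whether the original experiment also removed those is not stated.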

18 Grapheme Based Spoken Term Detection
Performance Test Phase I: Recognition (recognize 3 sentences)
System  Time   Storage(k)
P1      4:
P2      0:
P3      10:02  9,753
G1      2:44   1,097
G2      0:
G3      5:54   22,679
G4      1:
G5      4:38   1,113
Grapheme systems normally generate larger lattices in a shorter time.
G4 is good at generating high-quality lattices in a short time.

19 Grapheme Based Spoken Term Detection
Performance Test Phase II: Index (index the 80 most frequent words)
System  Time     Storage(k)
P1      0:34     34,893
P2      0:24     28,008
P3      28:51    974,521
G1      0:40     116,224
G2      0:33     38,091
G3      1:35:45  2,373,737
G4      0:34     31,873
G5      1:01     75,644
Indexing time is basically determined by the lattice size.
Grapheme lattices seem faster to index, perhaps because of the single lexicon entry?
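The relation between indexing time and storage in the Phase II figures can be checked with a little arithmetic. The sketch below converts each time to seconds and divides storage by it; reading the times as minutes:seconds (or hours:minutes:seconds for G3) is an assumption, as is treating Storage(k) as a plain size in the units given on the slide.

```python
# Phase II figures from the slide: system -> (time, storage in k).
PHASE2 = {
    "P1": ("0:34", 34893),
    "P2": ("0:24", 28008),
    "P3": ("28:51", 974521),
    "G1": ("0:40", 116224),
    "G2": ("0:33", 38091),
    "G3": ("1:35:45", 2373737),
    "G4": ("0:34", 31873),
    "G5": ("1:01", 75644),
}


def to_seconds(clock):
    """Convert 'm:ss' or 'h:mm:ss' to seconds (assumed time format)."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Storage indexed per second of indexing time, per system.
rate = {name: size / to_seconds(t) for name, (t, size) in PHASE2.items()}
for name in sorted(rate, key=rate.get, reverse=True):
    print(f"{name}: {rate[name]:.0f} k per second")
```

Under these assumptions G1 and G5 index noticeably more storage per second than their phoneme counterparts, while the two largest lattices (P3 and G3) have the lowest rates.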

20 Grapheme Based Spoken Term Detection
What conclusions can we draw from these results?
As a rule, the phoneme system works well for in-vocabulary words.
Graphemes with long-span language models work well for OOV words.
If the audio may be searched again, G2 is the best way to deal with OOV words, even those never seen.
A hybrid system is obviously a promising solution.

21 Future Work
Vocabulary Refinement
  Two-pass decoder: recall the acoustic evaluation when rescoring?
  One-pass decoder: look afterward on reaching a word boundary?
Language Migration and Adaptation (thanks to Partha)
  Table columns: Pure Chinese, Pure English, English porting, After Adaptation (values not preserved in this transcript)
  Languages with a different pronunciation basis are hard to migrate between.
  Languages with different phonology rules are hard to migrate between.
  This is intrinsic to graphemes, as they are compounds of acoustic and linguistic units.

22 Future Work
Large file alignment
  The strictest language: the transcript
  Benefit from unsupervised learning
Use the WSJCAM0 grapheme system to recognize mp3 files downloaded from the internet
  Direct application
  On-line adaptation every 100 short segments
  Large number of OOVs in on-line books or conferences
  Handle bad word pieces, for example {I'd}
Grapheme-based ASR may be a powerful spider that updates itself steadily by finding suitable audio segments. In cooperation with a TEXT spider, which provides an ever larger and more up-to-date LM, it can make much human speech audio indexable without separate resources such as grapheme-to-phoneme statistics.

23 Future Work
Language Identification
The nature of the grapheme: it carries linguistic information.
Currently the most successful identifier is a phoneme decoder with a phoneme language model.
The very reason graphemes are unsuitable for language porting is the reason they are suitable for identification.

24 Final Page
We cannot hope that the grapheme is a good transcriber, but we really hope it is a good information miner...
Most of the ideas come from Simon and Joe; they can answer any questions that I do not understand!
