(12) United States Patent
Franz et al.
(10) Patent No.: US 7,555,428 B1
(45) Date of Patent: Jun. 30, 2009
(54) SYSTEM AND METHOD FOR IDENTIFYING COMPOUNDS THROUGH ITERATIVE ANALYSIS

(75) Inventors: Alexander Franz, Palo Alto, CA (US); Brian Milch, Berkeley, CA (US)
(73) Assignee: Google Inc., Mountain View, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 574 days.
(21) Appl. No.: 10/647,…
(22) Filed: Aug. 21, 2003
(51) Int. Cl.: G06F 17/21 (2006.01); G06F 17/27; G06F 17/28
(52) U.S. Cl.: …/10; 704/7; 704/9
(58) Field of Classification Search: …/7; 704/10, 9
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,842,217 A * 11/1998 Light …/101
5,867,812 A * 2/1999 Sassano 704/10
6,173,298 B1 * 1/2001 Smadja …/209
6,285,999 B1 9/2001 Page
6,349,282 B1 * 2/2002 Van Aelten et al. …/257
6,754,617 B1 * 6/2004 Ejerhed 704/9
2007/… A * 3/2007 Kaku et al. …/10

FOREIGN PATENT DOCUMENTS
JP … * 6/1996

OTHER PUBLICATIONS
Su, K., Wu, M., and Chang, J. A corpus-based approach to automatic compound extraction. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (Las Cruces, New Mexico, Jun. …, 1994). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, … *
Venkataraman, A. A statistical model for word discovery in transcribed speech. Comput. Linguist. 27, 3 (Sep. 2001), … *
Gao, J., Goodman, J., Li, M., and Lee, K. 2002. Toward a unified approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing (TALIP), 1 (Mar. 2002), DOI: …
(Continued)

Primary Examiner: Paras Shah
(74) Attorney, Agent, or Firm: Fish & Richardson P.C.

(57) ABSTRACT
A system and method for identifying compounds through iterative analysis of measure of association is disclosed. A limit on a number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. A number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, which are each provided in a vocabulary for the text corpus. At least one n-gram including a number of tokens equal to the limit is identified based on the number of occurrences. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token and the limit is adjusted.

15 Claims, 6 Drawing Sheets

[Representative drawing: FIG. 3, functional block diagram of the software modules of the compound engine.]
OTHER PUBLICATIONS
Jurafsky, D., et al. (2000). Backoff. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Hall, Jersey, pp. … *
Smadja, F. Retrieving collocations from text: Xtract. Comput. Linguist. 19, 1 (Mar. 1993), … *
Frantzi, K. T. and Ananiadou, S. Extracting nested collocations. In Proceedings of the 16th Conference on Computational Linguistics, Vol. … (Copenhagen, Denmark, Aug. 5-9, 1996). International Conference on Computational Linguistics. Association for Computational Linguistics, Morristown, NJ, DOI: … *
Seretan, V., Nerima, L., and Wehrli, E. Extraction of Multi-Word Collocations Using Syntactic Bigram Composition. In Proceedings of the International Conference on Recent Advances in NLP (RANLP-2003), Borovets, Bulgaria, pp. … *
C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, Ch. 5, MIT Press (1999).
T. Dunning, Accurate Methods for the Statistics of Surprise and Coincidence, Comp. Ling., Vol. 19, No. 1, pp. … (1993).
* cited by examiner
[Sheet 2 of 6. FIG. 2: compound server 31 with memory 32, processor 33, compound engine 34, and storage 35 holding text corpus 36 and compounds list 37. FIG. 3: compound engine 34 with corpus preparation 51, n-gram counter 52, compound finder 53, parser 54, likelihood ratio 55, iterator 56, token vocabulary 57, n-gram counts list 58, n-gram list 59, likelihood ratio list 60, documents 61.]
[Sheet 3 of 6. FIG. 4: prepare corpus (71); find compounds (72). FIG. 5: assemble text into corpus (81); parse corpus into tokens (82); construct initial token vocabulary (83).]
[Sheet 4 of 6. FIG. 6: identify length of compounds, LEN (101); identify upper limit on compounds desired, LIM (102); count n-grams (103); select one or more n-grams of length LEN from n-gram list (104); for each of one or more selected n-grams (105), determine likelihood of selected n-gram (106); next n-gram (107); select up to LIM most likely n-grams of length LEN (108); add each of most likely n-grams as compound tokens to token vocabulary (109).]
[Sheet 5 of 6. FIG. 6 (continued): reconstruct token vocabulary (110); further iterations? (111) yes: adjust compounds length LEN (112); no: return.]
[Sheet 6 of 6. FIG. 7: identify maximum n-gram length, MAX (121); select unique n-grams (122); for i = 1 to MAX (123), count number of occurrences of unique n-grams of length i using token vocabulary (124); next i (125).]
SYSTEM AND METHOD FOR IDENTIFYING COMPOUNDS THROUGH ITERATIVE ANALYSIS

FIELD OF THE INVENTION

The present invention relates in general to text analysis and, in particular, to a system and method for identifying compounds through iterative analysis.

BACKGROUND OF THE INVENTION

Although the origins of the Internet trace back to the late 1960s, the more recently developed World Wide Web ("Web"), together with the long-established Usenet, has revolutionized accessibility to untold volumes of information in stored electronic form for a worldwide audience, including written, spoken (audio), and visual (imagery and video) information, in both archived and real-time formats. The Web provides information via interconnected Web pages that can be navigated through embedded hyperlinks. The Usenet provides information in a non-interactive bulletin board format consisting of static news messages posted and retrievable by readers. In short, the Web and Usenet provide desktop access to a virtually unlimited library of information in almost every language worldwide.

Information exchange on both the Web and the Usenet operates under a client-server model. For the Web, individual clients typically execute Web browsers to retrieve and display Web pages in a graphical user environment. For the Usenet, individual clients generally execute news readers to retrieve, post, and display news messages, usually in a textual user environment. Both Web browsers and news readers interface to centralized content servers, which function as data dissemination, storage, and retrieval repositories.

News messages available via the Usenet are cataloged into specific news groups, and finding relevant content involves a straightforward search of news groups and message lists. Web content, however, is not organized in any structured manner, and search engines have evolved to enable users to find and retrieve relevant Web content, as well as news messages and other types of content.
As the amount and variety of Web content have increased, the sophistication and accuracy of search engines have likewise improved. Existing methods used by search engines are based on matching search query terms to terms indexed from Web pages. More advanced methods determine the importance of retrieved Web content using, for example, a hyperlink structure-based analysis, such as described in S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Search Engine" (1998), and in U.S. Pat. No. 6,285,999, issued Sep. 4, 2001 to Page, the disclosures of which are incorporated by reference.

Compounds frequently occur in Web content, news messages, and other types of content. A compound, sometimes also referred to as a collocation, is defined as any sequence of words that co-occur more often than by mere chance. Compounds occur in text and speech as a natural language construct and can include proper nouns, such as "San Francisco," compound nouns, such as "hot dog," and other semantic and syntactic language constructs that result in the co-occurrence of two or more words. Compounds are encountered regularly in a range of applications, including speech recognition, text classification, and search result scoring.

Recognizing compounds is difficult, especially when they occur in speech or live text. Moreover, most languages lack regular syntactic or semantic cues that would enable easy identification of compounds. In German, for instance, the first letter of each noun is capitalized, which complicates the identification of proper nouns. Similarly, the types of potential compounds can depend on the subject matter. For instance, a scientific paper could include compounds wholly different from those found in a sports column. Conventional approaches to finding compounds in text corpora typically rely on n-gram analysis, such as described in C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, Ch. 5, MIT Press (1999), the disclosure of which is incorporated by reference.
An n-gram is an occurrence of a sequence of n words or tokens. N-gram-based approaches therefore count the frequencies of individual words or tokens and the frequencies of word sequences of varying lengths. N-gram-based approaches suffer from three principal difficulties.

First, n-gram-based approaches are storage inefficient. As the number of words in each n-gram increases, the number of unique n-grams in a corpus approaches the number of words in the corpus. Storing the counts for long n-grams can require a prohibitively large amount of memory.

Second, with compounds of varying lengths, the likelihood of spurious shorter compounds being included as substrings increases. Spurious substrings of longer compounds can occur, skewing compound identification results. For example, "New York City" is a three-word compound, where the words "New," "York," and "City" are highly correlated. As a side effect, "York City" is also highly correlated, but generally does not represent a meaningful compound. "York" and "City" are only correlated in the context of the larger compound, "New York City."

Third, with compounds consisting of three or more words, the likelihood that a longer compound will contain two-word or three-word compounds as substrings increases. Spurious long compounds that contain shorter, but significant, compounds as substrings can occur. For example, "San Francisco" is a two-word compound, but "San Francisco has" is not a three-word compound. Nevertheless, n-gram-based approaches, which assume all words are independent, would erroneously identify "San Francisco has" as a three-word compound.

Therefore, there is a need for an approach to efficiently identifying compounds in a text corpus based on a measure of association, such as a likelihood of co-occurrence between the words that constitute each compound. There is a further need for an approach to forming a list of compounds through an analysis of a text corpus with minimal overlapping substrings, minimal overlapping compounds, and efficient memory utilization.
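To make the substring problem concrete, the following minimal Python sketch counts every n-gram up to a chosen length in a toy token list. The corpus and the function name `count_ngrams` are hypothetical illustrations, not part of the described system. In the toy data, "york city" is exactly as frequent as "new york city", so raw frequencies alone cannot reveal that the shorter sequence is only a fragment of the longer compound. The counter also shows why storage grows: each additional length adds roughly one more entry per token position.

```python
from collections import Counter


def count_ngrams(tokens, max_len):
    """Count every contiguous n-gram of length 1..max_len in a token list.

    Returns a Counter keyed by tuples of tokens.
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus: "york city" only ever occurs inside "new york city".
corpus = "i love new york city . new york city never sleeps . york is old".split()
counts = count_ngrams(corpus, 3)

print(counts[("new", "york", "city")])  # 2
print(counts[("york", "city")])         # 2: the fragment is just as frequent
print(counts[("york",)])                # 3
```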
SUMMARY OF THE INVENTION

The present invention provides a system and method for discovering and identifying compounds within a text corpus through iterative analysis of measures of association between constituent tokens. A text corpus is evaluated to identify a set of unique n-grams, and the frequency of occurrence of each unique n-gram is tallied. Those n-grams having a specified length are selected, and the likelihood of each selected n-gram being a compound, that is, the likelihood of collocation, is determined. In the described embodiment, the likelihood of collocation is evaluated using the likelihood ratio method, although other methodologies and approaches could be used, as would be recognized by one skilled in the art. Those n-grams most likely to constitute compounds are added to a token vocabulary, preferably up to an upper limit on the number of n-grams. The token vocabulary is reconstructed to add the new compounds and remove the constituent tokens that occur in the new compounds. The specified length is
adjusted and evaluation continues using the revised token vocabulary. In the preceding example, the n-gram "San Francisco has" would receive a relatively low score because the likelihood under the assumption that "San Francisco has" is a compound would be only slightly higher than the likelihood under the assumption that "San Francisco" is a compound but the entire n-gram is not.

An embodiment provides a system and method for finding compounds in a text corpus. A vocabulary including tokens extracted from a text corpus is built. Compounds having a plurality of lengths within the text corpus are iteratively identified. Each compound includes a plurality of tokens. A frequency of occurrence for one or more n-grams in the text corpus is evaluated. Each n-gram includes tokens selected from the vocabulary. A likelihood of collocation for one or more of the n-grams having a same length is determined. The n-grams having the highest likelihood are added as compounds to the vocabulary. The vocabulary is rebuilt based on the added compounds.

A further embodiment provides a system and method for identifying compounds through iterative analysis of measure of association. A limit on a number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. A number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, which are each provided in a vocabulary for the text corpus. At least one n-gram including a number of tokens equal to the limit is identified based on the number of occurrences. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token. The vocabulary is rebuilt based on the added compound tokens, and the limit is adjusted.
Still other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein are described embodiments of the invention by way of illustrating the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a system for identifying compounds through iterative analysis, in accordance with the present invention.
FIG. 2 is a functional block diagram showing a compound server, in accordance with the present invention.
FIG. 3 is a functional block diagram showing the software modules used by the compound engine of FIG. 2.
FIG. 4 is a flow diagram showing a method for identifying compounds through iterative analysis, in accordance with the present invention.
FIG. 5 is a flow diagram showing the routine for preparing a corpus for use in the method of FIG. 4.
FIG. 6 is a flow diagram showing the routine for finding compounds for use in the method of FIG. 4.
FIG. 7 is a flow diagram showing the routine for counting n-grams for use in the routine of FIG. 6.

DETAILED DESCRIPTION

System Overview

FIG. 1 is a block diagram showing a system 9 for identifying compounds through iterative analysis, in accordance with the present invention. A plurality of individual clients 12 are communicatively interfaced to a server 11 via an internetwork 10, such as the Internet, or other form of communications network, as would be recognized by one skilled in the art. The individual clients 12 are operated by users 19 who transact requests for Web content, news messages, other types of content, and other operations through their respective client 12.
In general, each client 12 can be any form of computing platform connectable to a network, such as the internetwork 10, and capable of interacting with application programs. Examples of individual clients include, without limitation, personal computers, digital assistants, smart cellular telephones and pagers, lightweight clients, workstations, dumb terminals interfaced to an application server, and various arrangements and configurations thereof, as would be recognized by one skilled in the art. The internetwork 10 includes various topologies, configurations, and arrangements of network interconnectivity components arranged to interoperatively couple with enterprise, wide area, and local area networks and includes, without limitation, conventionally wired, wireless, satellite, optical, and equivalent network technologies, as would be recognized by one skilled in the art.

For Web content retrieval and news message posting and retrieval, each client 12 executes a Web browser and news reader application 18 ("Browser/Reader"). Web content 25 is requested via a Web server 20 executing on the server 11. Similarly, news messages ("News Msgs") 26 are posted and retrieved via a news server 21 also executing on the server 11. In addition, speech, as communicated from a user 19 via a client 12, can be recognized through a speech recognizer 23. Search results and other types of text can be classified by a text classifier 24. Other types of server functionality can be provided, as would be recognized by one skilled in the art. Note that the Web browsing, news reading, speech recognition, and text classification functions could also be implemented separately as stand-alone applications, as are known in the art. The server 11 maintains an attached storage device 15 in which the Web content 25, news messages 26, and other content 27 are stored.
The Web content 25, news messages 26, and other content 27 could also be maintained remotely on other Web and news servers (not shown) interconnected either directly or indirectly via the internetwork 10 and which are preferably accessible by each client 12. A compound server (not shown) identifies compounds from a training corpus and creates a list of compounds, as further described below with reference to FIG. 2. The compounds list can be used by the search engine 22, speech recognizer 23, text classifier 24, and other components (not shown) on the server 11, one or more of the clients 12, or on other functional components, as would be recognized by one skilled in the art.

In a further embodiment, a search engine 22 executes on the server 11 for processing queries for Web content 25, news messages 26, and other content 27. Each query describes or identifies information, which is potentially retrievable via either the Web server 20 or news server 21. Preferably, each
query provides characteristics, typically expressed as terms, including individual words and compounds. The search engine 22, also executing on the server 11, receives each query, identifies matching Web content 25, news messages 26, and other content 27, and sends back results conforming to the query preferences. Other styles, forms, or definitions of queries, query characteristics, and related metadata are feasible, as would be recognized by one skilled in the art.

The search engine 22 preferably identifies the Web content 25, news messages 26, and other content 27 best matching the search query terms to provide high quality search results, such as described in S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Search Engine" (1998), and in U.S. Pat. No. 6,285,999, issued Sep. 4, 2001 to Page, the disclosures of which are incorporated by reference. In identifying matching Web content 25, news messages 26, and other content 27, the search engine 22 operates on information characteristics describing potentially retrievable content. Note that the functionality provided by the server 11, including the Web server 20, news server 21, search engine 22, speech recognizer 23, and text classifier 24, could be provided by a loosely or tightly coupled distributed or parallelized computing configuration, in addition to a uniprocessing environment.

The individual computer systems, including server 11 and clients 12, include general purpose, programmed digital computing devices consisting of a central processing unit (processors 13 and 16, respectively), random access memory (memories 14 and 17, respectively), non-volatile secondary storage 15 and 28, such as a hard drive or CD ROM drive, network or wireless interfaces, and peripheral devices, including user interfacing means, such as a keyboard and display. Program code, including software programs, and data are loaded into the RAM for execution and processing by the CPU, and results are generated for display, output, transmittal, or storage.

Compound Server

FIG. 2 is a functional block diagram 30 showing a compound server 31, in accordance with the present invention. The compound server 31 discovers and identifies compounds based on tokens extracted from a text corpus 36 and stores the compounds in a compounds list 37. The compound server 31 includes a compound engine 34, which identifies compounds through iterative analysis, as further described below with reference to FIG. 3. The compound server 31 maintains an attached storage device 35 in which the text corpus 36 and compounds list 37 are stored. The text corpus 36 consists of documents that include Web content, news messages, and other content, including the Web content 25, news messages 26, and other content 27 stored by the server 11 (shown in FIG. 1), as well as documents from other sources, as is known in the art.

The individual computer system, including the compound server 31, includes general purpose, programmed digital computing devices consisting of a central processing unit (processor 33), random access memory (memory 32), non-volatile secondary storage 35, such as a hard drive or CD ROM drive, network or wireless interfaces, and peripheral devices, including user interfacing means, such as a keyboard and display. Program code, including software programs, and data are loaded into the RAM for execution and processing by the CPU, and results are generated for display, output, transmittal, or storage.

Compound Server Components

FIG. 3 is a functional block diagram 50 showing the software modules used by the compound engine 34 of FIG. 2. The compound engine 34 consists of a corpus preparation component 51, n-gram counter 52, and compound finder 53. The corpus preparation component 51 and n-gram counter 52 both operate on the text corpus 36, which consists of a set of documents ("Docs") 61 that contain raw text provided as Web content, news messages, and other content.
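The work of the corpus preparation component 51 (assembling documents, stripping formatting artifacts, and building one-word tokens, as in blocks 81 to 83 of FIG. 5) can be pictured with the minimal Python sketch below. It rests on stated assumptions: HTML markup is any `<...>` tag removed with a simple regular expression, a word is a run of ASCII letters, digits, or apostrophes, and documents are simply concatenated, so n-grams may later span document boundaries. The names `prepare_corpus`, `TAG_RE`, and `WORD_RE` are hypothetical.

```python
import re

# Assumptions for this sketch: markup is any <...> tag, and a word is a run of
# ASCII letters, digits, or apostrophes. Real tokenizers are more careful.
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def prepare_corpus(documents):
    """Return (token stream, initial one-word token vocabulary) for the documents."""
    tokens = []
    for doc in documents:                      # block 81: assemble text into corpus
        text = TAG_RE.sub(" ", doc)            # strip HTML tags (formatting artifacts)
        tokens.extend(WORD_RE.findall(text))   # block 82: drop spaces and punctuation
    vocabulary = set(tokens)                   # block 83: initial token vocabulary
    return tokens, vocabulary


docs = ["<p>Visit San Francisco!</p>", "San Francisco has fog, <b>lots</b> of it."]
tokens, vocab = prepare_corpus(docs)
print(tokens)
# ['Visit', 'San', 'Francisco', 'San', 'Francisco', 'has', 'fog', 'lots', 'of', 'it']
print(len(vocab))  # 8
```

A real system might insert a boundary marker between documents so that no n-gram straddles two of them; the sketch omits this for brevity.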
The corpus preparation component 51 evaluates the text corpus 36 to construct an initial token vocabulary 57, as further described below with reference to FIG. 5. The corpus preparation component 51 includes a parser 54 that tokenizes each document 61 in the text corpus 36. Tokenizing removes white space, punctuation, and formatting artifacts to form a clean list of individual words, each of which becomes a one-word token.

The n-gram counter 52 is used by the compound finder 53 to determine the frequencies of occurrence of unique n-grams within the text corpus 36, as further described below with reference to FIG. 7. The n-gram counter 52 generates a list of n-gram counts 58, holding the number of occurrences of each n-gram, and a list of unique n-grams 59. The compound finder 53 uses the n-gram counts list 58 to determine, for each n-gram of a desired length, the likelihood that the n-gram is a compound, as further described below with reference to FIG. 6. The compound finder 53 retrieves each unique n-gram from the unique n-gram list 59. A likelihood ratio component 55 determines the likelihood of an n-gram being a compound and stores the computed likelihood ratio in a likelihood ratio list 60. The compound finder 53 identifies those unique n-grams having the highest likelihood of being compounds and generates a compounds list 37. An iterator 56 repeatedly executes the operations of the n-gram counter 52 and compound finder 53 to progressively identify compounds of varying lengths in the text corpus 36.

Method Overview

FIG. 4 is a flow diagram showing a method 70 for identifying compounds through iterative analysis, in accordance with the present invention. The method 70 is described as a sequence of process operations or steps, which can be executed, for instance, by the compound engine 34 of FIG. 2, or equivalent components. The method 70 performs two functions: preparing the text corpus 36 and generating the list of compounds 37. Accordingly, the text corpus 36 is prepared (block 71), as further described below with reference to FIG. 5. Next, compounds are found (block 72), as further described below with reference to FIG. 6. The routine then terminates.

Preparing Corpus

FIG. 5 is a flow diagram showing the routine 80 for preparing a corpus for use in the method 70 of FIG. 4. The purpose of this routine is to convert the documents 61 in the text corpus 36 into a raw set of individual words, which are stored as one-word tokens in a token vocabulary 57 (shown in FIG. 3). The routine begins by assembling the text in the individual documents 61 into the text corpus 36 (block 81). If required, the text corpus 36 is parsed into individual tokens (block 82) by removing white space, punctuation, and formatting artifacts, such as HTML tags and related extraneous data, as is known in the art. Finally, an initial token vocabulary 57 is constructed from words extracted from the text corpus 36 (block 83). The routine then returns.

Finding Compounds

FIG. 6 is a flow diagram showing the routine 100 for finding compounds for use in the method 70 of FIG. 4. The purpose of this routine is to discover and identify compounds
within the text corpus 36 based on evaluation of measures of association for each potential compound. The routine analyzes the text corpus 36 in an iterative manner. During each iteration (blocks …), a set of n-grams of a specified length having the highest likelihood of co-occurrence, that is, of being compounds, is identified. In subsequent iterations (block 111), the length of the compounds is adjusted to identify further compounds. The use of measures of association, such as likelihood hypotheses, allows spurious substrings and spurious multiple-token compounds to be avoided, and allows only those n-grams with stronger likelihoods of co-occurrence to be stored efficiently.

The routine begins by identifying an initial length for compounds and, in one embodiment, an upper limit on the number of compounds desired (blocks 101 and 102, respectively). The length of the compounds changes during subsequent iterations (block 111). The upper limit functions as a quality filter that limits the potential compounds to a fixed number of candidates, preferably those having the highest likelihood of being compounds. Next, the n-grams occurring in the text corpus 36 are identified and counted (block 103), as further described below with reference to FIG. 7. The counting of n-grams generates the list of n-gram counts 58 and the list of unique n-grams 59.

One or more of the n-grams of the specified length of compounds are selected from the list of unique n-grams 59 (block 104). Note that the entire set of n-grams of the specified length need not be selected. For example, n-grams appearing in a list of known compounds, such as city names, could be skipped for efficiency. For each of the one or more selected n-grams (block 105) for which a compound determination is desired, the likelihood of the selected n-gram being a compound is determined (block 106). Likelihood determination then continues with each of the remaining selected n-grams (block 107).
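Blocks 101 through 104 reduce to filtering the counted n-grams by the current compound length. The sketch below is one possible reading, not the patented implementation: `select_candidates`, its `known_compounds` skip list, and the extra `min_count` frequency cut-off are hypothetical names and parameters chosen for illustration (the text only says that known compounds could be skipped).

```python
from collections import Counter


def select_candidates(ngram_counts, length, known_compounds=frozenset(), min_count=1):
    """Return the n-grams of the current compound length (LEN), sorted.

    Hypothetical parameters: known_compounds models the optional skip list
    (for example, city names); min_count is an extra cut-off not in the text.
    """
    return sorted(
        gram
        for gram, count in ngram_counts.items()
        if len(gram) == length
        and count >= min_count
        and gram not in known_compounds
    )


counts = Counter({
    ("new", "york", "city"): 2,
    ("san", "francisco", "has"): 1,
    ("new", "york"): 2,
    ("new",): 2,
})
known = {("new", "york", "city")}
print(select_candidates(counts, 3, known_compounds=known))
# [('san', 'francisco', 'has')]
```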
In the described embodiment, the likelihood of a selected n-gram being a compound is determined using a measure of association known as the likelihood ratio method, as described in T. Dunning, "Accurate Methods for the Statistics of Surprise and Coincidence," Comp. Ling., Vol. 19, No. 1, pp. … (1993), the disclosure of which is incorporated by reference. Each selected n-gram is assigned a score equal to the likelihood of the observed text corpus under the assumption that the n-gram is a compound, divided by the likelihood of the observed text corpus under the assumption that the n-gram is not a compound, as expressed in equation (1):

$$\lambda = \frac{L(H_i)}{L(H_c)} \qquad (1)$$

where $L(H_i)$ is the likelihood of observing the data under an independence hypothesis $H_i$ and $L(H_c)$ is the likelihood of observing the data under a collocation hypothesis $H_c$. Assuming a binomial distribution applies, the independence hypothesis can be expressed as equation (2):

$$P(t_2 \mid t_1) = P(t_2 \mid \neg t_1) \qquad (2)$$

where $t_1$ and $t_2$ are a pair of tokens in the selected n-gram. Similarly, the collocation hypothesis can be expressed as equation (3):

$$P(t_2 \mid t_1) > P(t_2 \mid \neg t_1) \qquad (3)$$

where $t_1$ and $t_2$ are a pair of tokens in the selected n-gram.

Under the collocation hypothesis, for each sequence $S = w_1, \ldots, w_n$ of $n$ tokens in the text corpus 36, $\lambda(S)$ will be the greatest likelihood ratio found by considering all possible ways to split the n-token sequence into two contiguous parts. The n-token sequences $S$ are sorted by $\lambda(S)$, and the $K$ sequences with the lowest $\lambda(S)$ values are designated as collocations. The likelihood $L(H_c)$ under the collocation hypothesis can be computed as expressed in equation (4):

$$L(H_c) = \arg\max_{(t_1, t_2)} \frac{L(t_1, t_2 \text{ form compound})}{L(n\text{-gram does not form compound})} \qquad (4)$$

Finally, the score $\lambda$ is calculated by solving for $L(H_i)$ and $L(H_c)$. Other equations, methods, and processes for determining measures of association are feasible, as would be recognized by one skilled in the art.

Next, up to the upper limit of most likely n-grams of the specified length are selected (block 108) and added as compound tokens to the token vocabulary 57 (block 109).
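A minimal Python sketch of equations (1) through (3) follows, assuming Dunning-style binomial likelihoods for one split of an n-gram into a left and a right segment, and taking $\lambda(S)$ as the greatest ratio over all n-1 splits. Because equation (1) places $L(H_i)$ in the numerator, values are at most 1, and lower values indicate stronger evidence of collocation, consistent with designating the sequences with the lowest $\lambda(S)$ as collocations. The function names, the use of log space to avoid underflow, the approximation of the sample size by the number of token positions, and the treatment of the one-sided hypothesis in equation (3) are all assumptions of this sketch, not details taken from the patent.

```python
import math
from collections import Counter


def _log_binom(k, n, x):
    """log(x**k * (1 - x)**(n - k)); the binomial coefficient cancels in the ratio."""
    total = 0.0
    if k:
        total += k * math.log(x)
    if n - k:
        total += (n - k) * math.log(1.0 - x)
    return total


def log_lambda(c_left, c_right, c_joint, total):
    """Return log(lambda) = log L(H_i) - log L(H_c) for one two-segment split.

    c_left, c_right: corpus counts of the left and right segments;
    c_joint: count of the whole n-gram; total: number of token positions.
    """
    if c_joint == 0 or c_left >= total or c_right >= total:
        return 0.0  # degenerate counts: treat as no evidence either way
    k1, n1 = c_joint, c_left                    # right segment after the left one
    k2, n2 = c_right - c_joint, total - c_left  # right segment after anything else
    p, p1, p2 = c_right / total, k1 / n1, k2 / n2
    if p1 <= p2:
        return 0.0  # equation (3) is one-sided, so H_c fits no better than H_i
    log_h_i = _log_binom(k1, n1, p) + _log_binom(k2, n2, p)
    log_h_c = _log_binom(k1, n1, p1) + _log_binom(k2, n2, p2)
    return log_h_i - log_h_c  # <= 0; lower means more compound-like


def sequence_score(gram, counts, total):
    """log lambda(S): the greatest ratio over all n-1 two-segment splits of gram."""
    return max(
        log_lambda(counts[gram[:i]], counts[gram[i:]], counts[gram], total)
        for i in range(1, len(gram))
    )


tokens = ("san francisco has fog . san francisco is big . "
          "the fog has gone . the sun is big").split()
counts = Counter(
    tuple(tokens[i:i + n]) for n in (1, 2, 3) for i in range(len(tokens) - n + 1)
)
print(round(sequence_score(("san", "francisco"), counts, len(tokens)), 2))         # -6.39
print(round(sequence_score(("san", "francisco", "has"), counts, len(tokens)), 2))  # -1.2
```

On this toy corpus, "san francisco" scores far lower (more compound-like) than "san francisco has", whose weakest split is ("san francisco", "has"); this mirrors the "San Francisco has" example in the background.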
The token vocabulary 57 is reconstructed (block 110) using the newly added tokens in the token vocabulary 57. During reconstruction, the constituent tokens that occur in each newly added token are removed from the token vocabulary 57, and the newly added tokens are subsequently treated as single units in the text corpus 36. For example, each occurrence of the words "San" and "Francisco" in sequence will subsequently be treated as a single compound token "San Francisco." If further iterations are required (block 111), the length of the compounds is adjusted (block 112). In the described embodiment, long compounds are identified during the first iteration and progressively shorter compounds are identified in subsequent iterations. Each subsequent iteration begins with the identification and recounting of the n-grams occurring in the text corpus 36 (block 103). The n-grams must be recounted to account for the compound tokens newly added to, and the constituent tokens newly removed from, the token vocabulary 57 during the previous iteration. Upon completion of the iterations, the routine returns.

Counting N-Grams

FIG. 7 is a flow diagram showing the routine 120 for counting n-grams for use in the routine 100 of FIG. 6. The purpose of this routine is to generate the list of unique n-grams 59 and the list of n-gram counts 58. The maximum n-gram length is identified (block 121). The maximum n-gram length will equal the current compounds length used in the routine 100 to find compounds. Next, the unique n-grams having a number of tokens equal to the maximum n-gram length are selected (block 122). For each n-gram length up to the maximum n-gram length (block 123), the number of occurrences of each unique n-gram of that length is counted against the token vocabulary 57 (block 124). Counting continues for each subsequent n-gram length (block 125), after which the routine returns.
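The vocabulary reconstruction of block 110 can be illustrated with the following sketch, which rewrites a token stream so that each compound occurrence becomes a single token, then rebuilds the vocabulary by adding the compounds and removing their constituents, as the text and claim 1 describe. The greedy, longest-match-first merging strategy and the use of a space-joined string as the compound token are assumptions; the function names `merge_compounds` and `rebuild_vocabulary` are hypothetical.

```python
def merge_compounds(tokens, compounds):
    """Rewrite a token list so each compound occurrence becomes one token.

    compounds: iterable of token tuples, e.g. ("San", "Francisco").
    Matching is greedy, left to right, trying longer compounds first.
    """
    # Compounds have at least two tokens; this also rules out empty matches.
    by_length = sorted({tuple(c) for c in compounds if len(c) >= 2},
                       key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for comp in by_length:
            if tuple(tokens[i:i + len(comp)]) == comp:
                merged.append(" ".join(comp))  # the compound becomes one unit
                i += len(comp)
                break
        else:  # no compound starts here: keep the original token
            merged.append(tokens[i])
            i += 1
    return merged


def rebuild_vocabulary(vocabulary, compounds):
    """Add compound tokens and remove the constituent tokens they contain."""
    compounds = [tuple(c) for c in compounds]
    constituents = {token for comp in compounds for token in comp}
    return (set(vocabulary) - constituents) | {" ".join(c) for c in compounds}


tokens = ["Visit", "San", "Francisco", "San", "Francisco", "has", "fog"]
found = [("San", "Francisco")]
print(merge_compounds(tokens, found))
# ['Visit', 'San Francisco', 'San Francisco', 'has', 'fog']
print(sorted(rebuild_vocabulary(set(tokens), found)))
# ['San Francisco', 'Visit', 'fog', 'has']
```

Removing constituents literally, as here, also drops "San" from the vocabulary even if it appears elsewhere (for example, in "San Diego"); an implementation could instead rebuild the vocabulary from the merged token stream.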
While the invention has been particularly shown and described with reference to the embodiments thereof, those skilled in the art will understand that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.

What is claimed is:

1. A computer-implemented method for identifying compounds in text, comprising: extracting a vocabulary of tokens from text; iterating from n > 2 down to n = 2, where n decreases by one each iteration, and in each iteration performing the actions of: identifying a plurality of unique n-grams in the
13 9 text, each n-gram being an occurrence in the text of n sequen tia tokens, each token being found in the vocabuary; divid ing each n-gram into n- pairs of two adjacent segments, Where each segment consists of at east one token; for each n-gram, cacuating a ikeihood of coocation for each pair of the n- pairs of two adjacent segments of the n-gram and determining a score for the n-gram based on a owest cacu ated ikeihood of coocation for the each of the n- pairs; identifying a set of n-grams having scores above a threshod; and adding the identi?ed set of n-grams as compound tokens to the vocabuary and removing constituent tokens that occur in the added compound tokens from the vocabuary, Wherein the iterating is performed by one or more processors. 2. The method of caim 1 Where cacuating a ikeihood of coocation for each pair of segments of the n- gram comprises determining a ikeihood ratio 7» for each pair of segments that is computed in accordance With the formua: A : mm mm Where L(H.) is a ikeihood of observing HZ. under an indepen dence hypothesis, L(Hc) is a ikeihood of observing Hc under a coocation hypothesis, and H is a pair of segments. 3. The method of caim 2 Where the L(HC) is computed for each pair of segments, t1, t2, in each n-gram in accordance With the formua: L(1, [2 form compound) argrnax UH.) [(n gram does not form compound) 4. The method of caim 2 Where, for each pair of segments, t1, t2, in each n-gram, the independence hypothesis comprises P(t2 t):p(t2?) and the coocation hypothesis comprises P(r2 r)>p(r2. 5. The method of caim 1 Where identifying a puraity of unique n-grams in the text comprises skipping n-grams appearing in a ist of known compounds. 6. 
A computer readabe storage medium on Which program code is stored, Which program code, When executed by a processor, causes the processor to perform operations com prising: extracting a vocabuary of tokens from text; iterating from n>2 down to n:2 Where n decreases by one each itera tion and in each iteration performing the actions of: identify ing a puraity of unique n-grams in the text, each n-gram being an occurrence in the text of n sequentia tokens, each token being found in the vocabuary; dividing each n-gram into n- pairs of two adjacent segments, Where each segment consists of at east one token; for each n-gram, cacuating a ikeihood of coocation for each of the n- pairs of two adjacent segments of the n-gram and determining a score for the n-gram based on a owest cacuated ikeihood of coo cation for the each of the n- pairs; identifying a set of n-grams having scores above a threshod; and adding the identi?ed set of n-grams as compound tokens to the vocabu ary and removing constituent tokens that occur in the added compound tokens from the vocabuary. 7. The computer-readabe storage medium of caim 6 Where cacuating a ikeihood of coocation for each pair of segments of the n-gram comprises determining a ikeihood ratio 7» for each pair of segments that is computed in accor dance With the formua: US 7,555,428 B where L(H-) is a ikeihood of observing HZ- under an indepen dence hypothesis, L(Hc) is a ikeihood of observing Hc under a coocation hypothesis, and H is a pair of segments. 8. The computer-readabe storage medium of caim 7 Where the L(Hc) is computed for each pair of segments, t1, t2, in each n-gram in accordance With the formua: L(t1, [2 form compound) argrnax UH [(n gram does not form compound) 9. The computer-readabe storage medium of caim 7 Where, for each pair of segments, t1, t2, in each n-gram, the independence hypothesis comprises P(t2 t):p(t2 ) and the coocation hypothesis comprises P(t2 t)>p(t2?). 10. 
The computer-readabe storage medium of caim 6 Where identifying a puraity of unique n-grams in the text comprises skipping n-grams appearing in a ist of known compounds. 11. A system comprising: a computer readabe storage medium on Which a program product is stored; and one or more processors con?gured to execute the program product and perform operations comprising: extracting a vocabuary of tokens from text; iterating from n>2 down to n:2 Where n decreases by one each iteration and in each iteration perform ing the actions of: identifying a puraity of unique n-grams in the text, each n-gram being an occurrence in the text of n sequentia tokens, each token being found in the vocabuary; dividing each n-gram into n- pairs of two adjacent seg ments, Where each segment consists of at east one token; for each n-gram, cacuating a ikeihood of coocation for each of the n- pairs of two adjacent segments of the n-gram and determining a score for the n-gram based on a owest cacu ated ikeihood of coocation for the each of the n- pairs; identifying a set of n-grams having scores above a threshod; and adding the identi?ed set of n-grams as compound tokens to the vocabuary and removing constituent tokens that occur in the added compound tokens from the vocabuary. 12. The system of caim 11 Where cacuating a ikeihood of coocation for each pair of segments of the n-gram com prises determining a ikeihood ratio 7» for each pair of seg ments that is computed in accordance With the formua: where L(H-) is a ikeihood of observing HZ- under an indepen dence hypothesis, L(HC) is a ikeihood of observing Hc under a coocation hypothesis, and H is a pair of segments. 13. The system of caim 12 Where the L(Hc) is computed for each pair of segments, t1, t2, in each n-gram in accordance With the formua: L(t1, [2 form compound) argrnax UH.) [(n gram does not form compound)
14. The system of claim 12 where, for each pair of segments, $t_1$, $t_2$, in each n-gram, the independence hypothesis comprises $P(t_2 \mid t_1) = P(t_2 \mid \neg t_1)$ and the collocation hypothesis comprises $P(t_2 \mid t_1) > P(t_2 \mid \neg t_1)$.

15. The system of claim 11 where identifying a plurality of unique n-grams in the text comprises skipping n-grams appearing in a list of known compounds.

* * * * *
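The iterative method of claims 1 to 5 can be read as the loop sketched below. It is an illustrative Python sketch under stated assumptions, not the patentee's code: the likelihood of collocation for a segment pair is taken to be Dunning's log-likelihood ratio, -2 log λ with λ = L(H_i)/L(H_c) computed from binomial likelihoods; segment counts are plain n-gram counts over the current token stream, with the stream length standing in for the number of positions; and the names `find_compounds`, `log_likelihood_ratio`, `min_count`, and `threshold` are hypothetical. A pair with P(t_2 | t_1) <= P(t_2 | ¬t_1) is scored 0, matching the one-sided collocation hypothesis of claim 4.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def _log_l(k: float, n: float, x: float) -> float:
    """Binomial log-likelihood k*log(x) + (n-k)*log(1-x) without the constant
    coefficient; x is clamped so that a 0*log(0) term evaluates to 0."""
    eps = 1e-12
    x = min(max(x, eps), 1.0 - eps)
    return k * math.log(x) + (n - k) * math.log(1.0 - x)


def log_likelihood_ratio(a: NGram, b: NGram, counts: Counter, total: int) -> float:
    """-2 log(lambda), lambda = L(H_i) / L(H_c), for adjacent segments a and b.
    H_i: P(b|a) == P(b|not a); H_c: P(b|a) > P(b|not a)."""
    c1, c2, c12 = counts[a], counts[b], counts[a + b]
    p = c2 / total
    p1 = c12 / c1 if c1 else 0.0
    p2 = (c2 - c12) / (total - c1) if total > c1 else 0.0
    if p1 <= p2:  # no positive association, so no evidence for H_c
        return 0.0
    log_hi = _log_l(c12, c1, p) + _log_l(c2 - c12, total - c1, p)
    log_hc = _log_l(c12, c1, p1) + _log_l(c2 - c12, total - c1, p2)
    return 2.0 * (log_hc - log_hi)


def _merge(tokens: Sequence[str], compounds: Set[NGram]) -> List[str]:
    """Rewrite the stream so each compound occurrence becomes one space-joined token."""
    lengths = sorted({len(c) for c in compounds}, reverse=True)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in lengths:
            if tuple(tokens[i:i + n]) in compounds:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_compounds(tokens: Sequence[str], max_n: int, threshold: float,
                   min_count: int = 2,
                   known: FrozenSet[NGram] = frozenset()) -> Tuple[List[NGram], Set[str]]:
    tokens = list(tokens)
    vocab = set(tokens)
    found: List[NGram] = []
    for n in range(max_n, 1, -1):  # longest compounds first, down to n = 2
        counts: Counter = Counter()
        for k in range(1, n + 1):  # recount: the stream changed in the last iteration
            counts.update(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
        total = len(tokens)
        new: Set[NGram] = set()
        for gram, c in counts.items():
            if len(gram) != n or c < min_count or gram in known:  # claim 5: skip known
                continue
            if not all(t in vocab for t in gram):  # every token must be in the vocabulary
                continue
            # score = weakest of the n-1 ways to split the n-gram into two segments
            score = min(log_likelihood_ratio(gram[:s], gram[s:], counts, total)
                        for s in range(1, n))
            if score > threshold:
                new.add(gram)
        if new:
            found.extend(sorted(new))
            tokens = _merge(tokens, new)
            vocab -= {t for g in new for t in g}  # remove constituent tokens
            vocab |= {" ".join(g) for g in new}   # add compound tokens
    return found, vocab
```

As a usage example, with a threshold of 10.83 (the chi-square critical value for one degree of freedom at p = 0.001, since -2 log λ is asymptotically chi-square distributed) and a toy corpus in which "new" is always followed by "york" while every other n-gram occurs once, the sketch returns the single compound ("new", "york"). Taking the minimum over the n-1 splits means an n-gram is promoted only if every way of cutting it into two segments looks like a collocation.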
UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION
PATENT NO.: 7,555,428 B1 Page 1 of 1
APPLICATION NO.: 10/
DATED: June 30, 2009
INVENTOR(S): Alexander Mark Franz and Brian Milch

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Column 9, Lines at Claim 3; replace the current formula with:

$$\arg\max_{L(H_c)} \frac{L(t_1, t_2 \text{ form compound})}{L(n\text{-gram does not form compound})}$$

Column 10, Lines at Claim 8; replace the current formula with:

$$\arg\max_{L(H_c)} \frac{L(t_1, t_2 \text{ form compound})}{L(n\text{-gram does not form compound})}$$

Column 10, Line 21 at Claim 9; replace the current formula with: $P(t_2 \mid t_1) = P(t_2 \mid \neg t_1)$

Column 10, Lines at Claim 13; replace the current formula with:

$$\arg\max_{L(H_c)} \frac{L(t_1, t_2 \text{ form compound})}{L(n\text{-gram does not form compound})}$$

Signed and Sealed this Twenty-fifth Day of August, 2009
David J. Kappos
Director of the United States Patent and Trademark Office
UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION
PATENT NO.: 7,555,428 B1 Page 1 of 1
APPLICATION NO.: 10/
DATED: June 30, 2009
INVENTOR(S): Franz et al.

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

On the Title Page: The first or sole Notice should read: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 914 days.

Signed and Sealed this Twenty-eighth Day of December, 2010
David J. Kappos
Director of the United States Patent and Trademark Office
Knowedge-based Autonomous Agents for Pervasive Computing Using AgentLight Fernando L. Koch and John-Jues C. Meyer Utrecht University Project AgentLight is a mutiagent system-buiding framework targeting
More informationKey Features of Life Insurance
Key Features of Life Insurance Life Insurance Key Features The Financia Conduct Authority is a financia services reguator. It requires us, Aviva, to give you this important information to hep you to decide
More informationTERM INSURANCE CALCULATION ILLUSTRATED. This is the U.S. Social Security Life Table, based on year 2007.
This is the U.S. Socia Security Life Tabe, based on year 2007. This is avaiabe at http://www.ssa.gov/oact/stats/tabe4c6.htm. The ife eperiences of maes and femaes are different, and we usuay do separate
More informationADVANCED ACCOUNTING SOFTWARE FOR GROWING BUSINESSES
ADVANCED ACCOUNTING SOFTWARE FOR GROWING BUSINESSES Product Features 1. System 2. Saes Ledger Unimited companies with password protection User security Muti-user system: 1 user comes as standard, up to
More informationChapter 3: Investing: Your Options, Your Risks, Your Rewards
Chapter 3: Investing: Your Options, Your Risks, Your Rewards Page 1 of 10 Chapter 3: Investing: Your Options, Your Risks, Your Rewards In This Chapter What is inside a mutua fund? What is a stock? What
More informationTeach yourself Android application development - Part I: Creating Android products
Teach yoursef Android appication deveopment - Part I: Creating Android products Page 1 of 7 Part of the EE Times Network A Artices Products Course TechPaper Webinars Login Register Wecome, Guest HOME DESIGN
More informationDelhi Business Review X Vol. 4, No. 2, July - December 2003. Mohammad Talha
Dehi Business Review X Vo. 4, No. 2, Juy - December 2003 TREATMENT TMENT OF GOODWILL IN ACCOUNTING Mohammad Taha GOODWILL is usuay ony recorded in an accounting system when a company purchases an unincorporated
More informationprofessional indemnity insurance proposal form
professiona indemnity insurance proposa form Important Facts Reating To This Proposa Form You shoud read the foowing advice before proceeding to compete this proposa form. Duty of Discosure Before you
More informationeg Enterprise vs. a Big 4 Monitoring Soution: Comparing Tota Cost of Ownership Restricted Rights Legend The information contained in this document is confidentia and subject to change without notice. No
More informationWHITE PAPER UndERsTAndIng THE VAlUE of VIsUAl data discovery A guide To VIsUAlIzATIons
Understanding the Vaue of Visua Data Discovery A Guide to Visuaizations WHITE Tabe of Contents Executive Summary... 3 Chapter 1 - Datawatch Visuaizations... 4 Chapter 2 - Snapshot Visuaizations... 5 Bar
More information