Scalable Biomedical Named Entity Recognition: Investigation of a Database-Supported SVM Approach


Mona Soliman Habib* and Jugal Kalita
Department of Computer Science, University of Colorado, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, USA
* Contact author: [email protected]

Abstract: This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features, and presents two implementations to address these issues. The NER domain chosen for these experiments is the biomedical publications domain, selected for its importance and inherent challenges. The performance results of a set of experiments conducted using existing binary and multi-class SVM with increasing training data sizes are examined and compared to results obtained using our new implementations. Our baseline machine learning approach eliminates prior language or domain-specific knowledge and achieves good out-of-the-box accuracy measures that are comparable to those obtained using more complex approaches. The training time of multi-class SVM is reduced by several orders of magnitude, which would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. The first implementation, SVM-PerfMulti, is a new instantiation of SVM-Struct v3.0 built as a standalone C executable. The second implementation, SVM-MultiDB, is an embedded database solution for both binary and multi-class SVM, built as a server-side extension of PostgreSQL.

Index Terms: Named entity recognition, support vector machines, database extension, bioinformatics.

I. INTRODUCTION

Named entity recognition (NER) is one of the important tasks in information extraction, which involves the identification and classification of words or sequences of words denoting a concept or entity. Examples of named entities in general text are names of persons, locations, or organizations. Domain-specific named entities are those terms or phrases that denote concepts relevant to one particular domain.
For example, protein and gene names are named entities which are of interest to the domain of molecular biology and medicine. The massive growth of textual information available in the literature and on the Web necessitates the automation of identification and management of named entities in text.

The task of identifying named entities in a particular language is often accomplished by incorporating knowledge about the language taxonomy in the method used. In the English language, such knowledge may include capitalization of proper names, known titles, common prefixes or suffixes, part-of-speech tagging, and/or identification of noun phrases in text. Techniques that rely on language-specific knowledge may not be suitable for porting to other languages. Moreover, the composition of named entities in literature pertaining to specific domains follows different rules in each, which may or may not benefit from those relevant to general NER.

In previous work [8], a simple architecture that eliminates language and domain-specific knowledge from the named entity recognition process is applied to the English biomedical entity recognition task, as a baseline for other languages and domains. NER in the biomedical field remains a challenging task due to growing nomenclature, ambiguity in the left boundary of entities caused by descriptive naming, difficulty of manually annotating large sets of training data, and strong overlap among different entities, to cite a few of the NER challenges in this domain. The approach used reduces the pre- and post-processing of the textual data to a minimum and capitalizes on SVM's strong generalization ability to classify the named entities. The accuracy measures achieved are comparable to those obtained using more complex techniques, which encourages us to explore ways to improve the scalability of multi-class support vector machines.

In this paper, we present a new instantiation of SVM-Struct v3.0 [13] that extends the improved binary SVM algorithm SVM-Perf [14] with a multi-class cutting plane algorithm that reduces the multi-class training time by several orders of magnitude.
We refer to the new multi-class implementation as SVM-PerfMulti. We also present a new database-supported implementation that incorporates the learning and optimization algorithms of both SVM-Perf and SVM-PerfMulti. For the sake of simplicity, we will refer to the binary classification component of the database implementation as SVM-PerfDB and will refer to the multi-class component as SVM-MultiDB. Finally, the results of a set of scalability experiments using existing and new SVM solutions are reported. These experiments use binary and multi-class SVM with a large set of real-world data from the biomedical literature.

In Section II, the theory of binary and multi-class support vector machines is briefly introduced. Section III describes the experimental design and summarizes the results of a baseline experiment conducted during the previous work [8] in order to assess the feasibility of our language and domain-independent machine learning NER approach using SVM and high-dimensional features. The baseline experiment design reduces pre-processing to feature extraction and eliminates the use of prior language or domain knowledge. The training time and performance results of the baseline experiment are vastly improved using the new SVM-PerfMulti and SVM-MultiDB implementations, which achieve good out-of-the-box accuracy and performance measures. We briefly describe the new SVM-Perf structural SVM formulation and the cutting plane algorithm of SVM-PerfMulti in Section IV. The database architecture and schema of SVM-PerfDB and SVM-MultiDB are presented in Section V. Finally, we report the results of several sets of single-class and multi-class scalability tests using existing and new SVM implementations and increasing training data size in Section VI.

II. SUPPORT VECTOR MACHINES

The Support Vector Machine (SVM) is a powerful machine learning tool based on firm statistical and mathematical foundations concerning generalization and optimization theory. SVM is based on Vapnik's statistical learning theory [28] and falls at the intersection of kernel methods and maximum margin classifiers. Support vector machines have been successfully applied to many real-world problems such as face detection, intrusion detection, handwriting recognition, information extraction, and others.

The Support Vector Machine is an attractive method due to its high generalization capability and its ability to handle high-dimensional input data. Compared to neural networks or decision trees, SVM does not suffer from the local minima problem, it has fewer learning parameters to select, and it produces stable and reproducible results. However, SVM suffers from slow training, especially with non-linear kernels and with large input data size. Support vector machines are primarily binary classifiers. Extensions to multi-class problems are most often performed by combining several binary machines in order to produce the final multi-classification results. The more difficult problem of training one SVM to classify all classes uses much more complex optimization algorithms and is much slower to train than binary classifiers.

A. Binary Support Vector Classification

Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. Many applications take advantage of binary classification tasks, where the answer to some question is either a yes or a no. The mathematical foundation of Support Vector Machines and the underlying Vapnik-Chervonenkis dimension (VC dimension) is described in detail in the literature covering statistical learning theory [2, 3, 12, 28] and many other sources.

The main objective of support vector machines is to find the optimal hyperplane separating positive and negative examples by maximizing the margin between the two classes. In mathematical terms, the problem is to find $f(x) = w \cdot x + b$ with maximal margin, such that, for labels $y_i \in \{-1, +1\}$:

$y_i (w \cdot x_i + b) = 1$ for data points that are support vectors
$y_i (w \cdot x_i + b) > 1$ for other data points

Assuming a linearly separable dataset, the task of learning the coefficients $w$ and $b$ of the support vector machine $f(x) = w \cdot x + b$ reduces to solving the following constrained optimization problem: find $w$ and $b$ that minimize

$$\frac{1}{2}\, w \cdot w \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \; \forall i \tag{1}$$

In the non-linearly separable case, the margin maximization technique may be relaxed by a degree of error in the separation. Slack variables $\xi_i$ are introduced to represent the error degree for each input data point. The optimization goal in this case is to maximize the margin while minimizing the slack variables, i.e., to find $w$ and $b$ that minimize

$$\frac{1}{2}\, w \cdot w + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; \forall i \tag{2}$$

B. Multi-class Support Vector Classification

For classification problems with multiple classes, different approaches are developed in order to decide whether a given data point belongs to one of the classes or not. The most common approaches are those that combine several binary classifiers and use a voting technique to make the final classification decision. These include: One-Against-All [28], One-Against-One [17], Directed Acyclic Graph (DAG) [22], and the Half-Against-Half method [19].
A more complex approach is one that attempts to build one Support Vector Machine that separates all classes at the same time. In this section, these multi-class SVM approaches are briefly introduced. Fig. 1 compares the decision boundaries for three classes using a One-Against-All SVM, a One-Against-One SVM, and an All-Together SVM [2]. Overlapping areas represent unclassifiable regions, where a point x is either positively identified as belonging to more than one class, or is negatively identified relative to all classes.

Fig. 1 Comparison of Multi-Class Boundaries

1) One-Against-All Multi-Class SVM

One-Against-All [28] is the earliest and simplest multi-class SVM. For a k-class problem, One-Against-All maximizes k hyperplanes separating each class from all the rest by constructing k binary SVMs. The ith SVM is trained with all the samples from the ith class against all the samples from the other classes. To classify a sample x, x is evaluated by all of the k SVMs and the label of the class that has the largest value of the decision function is selected.
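The One-Against-All decision rule just described can be sketched in a few lines. This is a minimal illustration, assuming linear decision functions and hand-picked (not trained) weights; the names `decision_value`, `one_against_all_predict`, and the toy `models` dictionary are hypothetical and are not part of any SVM package discussed in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical container: class label -> (weight vector w, bias b) of one binary SVM.
LinearModel = Tuple[List[float], float]


def decision_value(model: LinearModel, x: List[float]) -> float:
    """Linear decision function f(x) = w . x + b of a single binary machine."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_against_all_predict(models: Dict[str, LinearModel], x: List[float]) -> str:
    """Evaluate x with all k binary machines; return the label with the largest value."""
    return max(models, key=lambda label: decision_value(models[label], x))


# Toy weights chosen by hand for illustration only (not the result of any training run).
models = {
    "protein": ([1.0, 0.0], -0.5),
    "DNA": ([0.0, 1.0], -0.5),
    "other": ([-1.0, -1.0], 0.5),
}

prediction = one_against_all_predict(models, [1.0, 0.0])
```

For a k-class problem this evaluates k decision functions per sample, one per binary machine.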

2) One-Against-One or Pairwise SVM

One-Against-One [17] constructs one binary machine between each pair of classes. For a k-class problem, it constructs $k(k-1)/2$ binary classifiers. To classify a sample x, the sample is evaluated by each of the $k(k-1)/2$ machines. Each machine votes for one of its two classes, and the class that receives the most votes is chosen as the classification of x.

3) Half-Against-Half SVM

Half-Against-Half multi-class SVM [19] is useful for problems where there is a close similarity between groups of classes. Using Half-Against-Half SVM, a binary classifier is built that evaluates one group of classes against another group. The trained model consists of at most $2^{\lceil \log_2 k \rceil}$ binary SVMs. To classify a sample x, this technique identifies the group of classes where the sample x belongs, then continues to evaluate x with a subgroup, and so on, until the final class label is found. The classification process is similar to a decision tree that requires at most $\lceil \log_2 k \rceil$ evaluations.

4) All-Together or All-At-Once SVM

An All-Together multi-classification approach is computationally more expensive yet usually more accurate than all other multi-classification methods. Hsu and Lin [9] note that "as it is computationally more expensive to solve multi-class problems, comparisons of these methods using large-scale problems have not been seriously conducted". The experiments reported in this paper are an attempt to classify a large-scale problem using this approach. The All-Together approach builds one SVM that maximizes all separating hyperplanes at the same time. Training data representing all classes is used to generate the trained model. With this approach, there are no unclassifiable regions as each data point belongs to some class represented in the training dataset. Fig. 1 illustrates the elimination of unclassifiable regions in this case.

The All-Together multi-class SVM poses a complex optimization problem as it maximizes all decision functions at the same time [5]. The idea is similar to the One-Against-All approach.
It constructs k two-class rules where the mth function $w_m \cdot \phi(x) + b$ separates training vectors of the class m from the other vectors. There are k decision functions but all are obtained by solving one problem. The primal formulation of the optimization problem [2, 9] is to find:

$$\min_{w_m,\, \xi_i} \; \frac{1}{2} \sum_{m=1}^{k} w_m \cdot w_m + C \sum_{i=1}^{l} \xi_i \tag{3}$$

$$\text{s.t.} \quad w_{y_i} \cdot \phi(x_i) - w_m \cdot \phi(x_i) \ge e_i^m - \xi_i, \quad i = 1, \ldots, l$$

where $l$ is the number of training examples, $e_i^m \equiv 0$ if $y_i = m$, and $e_i^m \equiv 1$ if $y_i \ne m$. The decision function is

$$\arg\max_{m = 1, \ldots, k} \; w_m \cdot \phi(x)$$

Algorithms to decompose the problem [9] and to solve the optimization problem [26] have been developed; however, the All-Together multi-class SVM approach remains a daunting task. Training is very slow, which makes the approach so far unusable for real-world problems with a large data set and/or a high number of classes. In this paper, we present an improved multi-class solution and use it with a large set of real-world data to identify biomedical named entities.

C. SVM Scalability and Usability Challenges

Bennett and Campbell [4] discuss the common usability and scalability issues of support vector machines. In this section we summarize the SVM scalability challenges noted in the literature and in practice, which include:
• Optimization requires $O(n^3)$ time and $O(n^2)$ memory for single-class training, where $n$ is the input size (depending on the algorithm used).
• Multi-class training time is much higher, especially for All-Together optimization and a larger number of classes.
• Slow training, especially with large input datasets and/or non-linear kernels.

In addition to the scalability issues, tuning support vector machines requires the selection of a suitable kernel function and model parameters. Model parameters are often selected using a grid search, cross-validation, or heuristic-based methods. Selection of a suitable kernel function for the problem at hand is another designer-determined factor.
Moreover, management and organization of learning and classification results in a manner that fosters their reusability and integration with other pre- or post-processing modules is currently not easily achievable.

III. BASELINE EXPERIMENTS

Our baseline experiment [8] aims to identify biomedical named entities using Support Vector Machines (SVM) [28], due to their generalization capability and their ability to handle high-dimensional feature and input space. The training and testing data use the JNLPBA-04 shared task [16] data, which is a subset of the GENIA annotated corpus [15] of MEDLINE articles. The names of proteins, cell lines, cell types, DNA and RNA entities are previously labeled. The named entities are often composed of a sequence of words. The training data includes 2,000 annotated abstracts (consisting of 492,551 tokens). The testing data includes 404 abstracts (consisting of 101,039 tokens) annotated for the same classes of entities. The fraction of positive examples with respect to the total number of tokens in the training set varies from about 0.2% to about 6%. Basic statistics about the data sets as well as the absolute and relative frequencies for named entities within each set can be found in [16].

The training and test data pre-processing involves morphological and contextual feature extraction only, using the JFEX software [6]. No language-specific pre-processing such as part-of-speech or noun phrase tagging is used. No dictionaries, gazetteers, or other domain-specific knowledge are used. The generated feature space is very large, including over a million different features. All features are binary, i.e., each feature denotes whether the current token possesses this feature (one) or not (zero). The morphological features extracted include checking whether a token is capitalized, is numeric, is punctuation, is all in uppercase, is all in lowercase, is a single character, is a special character, includes a hyphen, includes a slash, is alphanumeric, or contains caps and digits, as well as a general regular expression summarizing word shape. In addition, a contextual collocation of tokens active over three positions around the token itself is used in order to provide a moving window of consecutive tokens which describes the context of the token relative to its surroundings.

A comparison of the performance of our baseline multi-class experiment to other systems using SVM for biomedical NER is presented in [8]. Using a low value of the regularization factor C=0.01, the overall recall measure achieved is 62.43%, with a precision measure of 64.50%, and a final F-score of 63.45%. While the baseline experiment achieved performance measures that are comparable to those obtained using more complex approaches such as [6, 25, 29], the training time using the All-Together multi-class SVM implementation, SVM-Multiclass [5, 26], was very high. Learning with the complete training dataset completed in 97 hours on a Xeon quad-processor 3.6 GHz machine.

TABLE I
SVM-PERFMULTI MULTI-CLASS EXPERIMENT RESULTS VS. SYSTEMS USING JNLPBA-04 DATA

System          Overall Performance (Recall / Precision / F-Score)
Zhou [30]       76.0 / 69.4 / 72.6
Habib           67.9 / 66.4 / 67.2
Giuliano [6]    64.4 / 69.8 / 67.0
Song [25]       67.8 / 64.8 / 66.3
Rössler [23]    67.4 / 61.0 / 64.0
Habib [8]       62.3 / 64.5 / 63.4
Park [21]       66.5 / 59.8 / 63.0
Lee [18]        50.8 / 47.6 / 49.1
Baseline [16]   52.6 / 43.6 / 47.7

The training time is reduced by several orders of magnitude using our new SVM-PerfMulti cutting plane algorithm, briefly described in Section IV (see Table III for a sample comparison). The performance measures are also improved to reach an overall recall measure of 67.9, with a precision of 66.4 and an F-score of 67.2, an out-of-the-box improvement of almost 4 points in F-score. Protein named entities are identified with a recall, precision, and F-score.
This places the performance of SVM-PerfMulti in second place as compared to other published results. Table I compares the performance measures attained with different systems. It is important to note that our out-of-the-box performance is attained with no pre- or post-processing other than feature extraction and labeling. No external knowledge is used. The baseline performance in [30] is 60.3 F-score, which is boosted to 72.6 during post-processing with the use of additional dictionaries to relabel misclassified entities. We consider our performance of 67.2 F-score to be a baseline measure, which may be boosted in post-processing with the use of external dictionaries and lists of known named entities, if available.

IV. IMPROVED LINEAR SVM TRAINING TIME

While conducting the scalability experiments, presented in detail in Section VI, we examine the differences between the learning algorithm of SVM-Light [10-12] and that of SVM-Perf [13, 14, 26, 27] in terms of total training time, memory usage, and number of support vectors. Since SVM-Multiclass [5, 26] uses SVM-Light's learning algorithm, and given the similar observations noted in the experiments using SVM-Light and SVM-Multiclass, we explore the possibility of extending SVM-Perf's structural learning algorithm to the multi-class classification problem.

A. SVM-Perf Structural Formulation

SVM-Perf [14] improves the training time for linear binary classification problems by using a new SVM formulation that combines constraints from the original SVM formulation (2):

$$\min_{w,\, \xi \ge 0} \; \frac{1}{2}\, w \cdot w + C \xi \tag{4}$$

$$\text{s.t.} \quad \forall c \in \{0,1\}^n : \; \frac{1}{n}\, w \cdot \sum_{i=1}^{n} c_i y_i x_i \;\ge\; \frac{1}{n} \sum_{i=1}^{n} c_i - \xi$$

Each constraint vector $c$ in the structural formulation corresponds to the sum of a subset of the constraints in (2), where the entries $c_i$ select the subset and $\frac{1}{n}\sum_i c_i$ can be seen as the maximum fraction of training errors possible over each subset [14]. Only one slack variable $\xi$ is shared across all constraints; it is an upper bound on the fraction of training errors. SVM-Perf's training algorithm iteratively constructs a set of most violated constraints (one per iteration) and adds it to the working set used by the optimization algorithm.
The algorithm repeats until no more constraints can be found that are violated by more than the desired precision $\epsilon$. After each iteration, the cutting plane algorithm finds the most violated constraint, which corresponds to

$$c = \arg\max_{c \in \{0,1\}^n} \left\{ \frac{1}{n} \sum_{i=1}^{n} c_i - \frac{1}{n} \sum_{i=1}^{n} c_i\, y_i (w \cdot x_i) \right\} \tag{5}$$

where

$$c_i = \begin{cases} 1 & \text{if } y_i (w \cdot x_i) < 1 \\ 0 & \text{otherwise} \end{cases}$$

B. SVM-PerfMulti Cutting Plane Algorithm

In order to investigate the potential improvement in training for multi-class learning using the learning algorithms implemented in SVM-Perf [14] for training linear machines, we developed an initial prototype. The preliminary experiments resulted in a tremendous improvement of the training time while achieving the same or better accuracy measures as SVM-Multiclass. A sample of the training time improvement is reported in Table III. The initial prototype is motivated by the observation of the number of support vectors produced in the binary case using SVM-Light and in the multi-class case using SVM-Multiclass. During the initial scalability experiments, we noted that the number of support vectors in both cases is $O(n^{0.8})$ with respect to the training data size $n$. Using SVM-Perf's combined constraints learning algorithm, the number of support vectors in the binary case is reduced from several thousands to less than a hundred.

SVM-PerfMulti [7] expands the structural formulation of SVM-Perf to solve the multi-class classification case. Using the one-slack formulation, the linear multi-class optimization problem in (3) is replaced by

$$\min_{w_m,\, \xi} \; \frac{1}{2} \sum_{m=1}^{k} w_m \cdot w_m + C \xi \tag{6}$$

$$\text{s.t.} \quad w_{y_i} \cdot x_i - w_m \cdot x_i \ge e_i^m - \xi, \quad i = 1, \ldots, n$$

For the multi-class problem, we use a stack of constraint vectors $c = (c_1, c_2, \ldots, c_k)$, where $k$ is the number of classes. The algorithm iteratively finds the stacked vector of most violated constraints and repeats until no more constraints can be found which are violated by more than the desired precision $\epsilon$. The cutting plane algorithm finds the difference between the classification score of the correct class, $w_{y_i} \cdot x_i$, and the best classification score among all other classes, $w_m \cdot x_i$, where $m \ne y_i$. A constraint is violated if the difference is greater than a fraction of the combined training error. To accelerate the learning process, the algorithm increases the training error threshold value after each iteration using an acceleration factor, thereby increasing the gap between the violated and the non-violated constraints. The acceleration factor is a fraction of the maximum correct classification score, which provides a reasonable indication of the decision boundary. Acceleration can be disabled by using a zero-valued factor. The default acceleration factor in SVM-PerfMulti is

TABLE II
SVM-PERFMULTI & SVM-MULTIDB PERFORMANCE MEASURES PER NAMED ENTITY TYPE

Named Entity     Performance (Recall / Precision / F-Score)
Protein          / /
DNA              / /
RNA              / /
Cell Type        / /
Cell Line        / /
Overall          / /
Correct Right    / /
Correct Left     / /

V. DATABASE-SUPPORTED SVM IMPLEMENTATION

In this section, we present an SVM solution assisted by a special database schema and embedded database modules. The solution incorporates the learning and optimization algorithms of SVM-Perf and SVM-PerfMulti. The database schema design allows storage of input data, evolving training model(s), precomputed kernel outputs and dot products, and output data.
The aim of this approach is to improve scalability by reducing the online memory requirements and to foster SVM usability by providing a framework for easy reusability and manageability of the learning environment and experimentation results.

Using a relational database to support SVM has been attempted in [24], and a more complete yet different solution is included in the Oracle 10g data mining product (ODM) [20]. MySvmDB [24] addresses the high memory requirements by using a relational database to store the input data and parameters. It does not handle the computational time limitations. In fact, communicating constantly with the database system is known to negatively impact the performance due to the cost of fetching stored data. The only SVM database implementation that tackles usability and scalability issues is Oracle 10g's commercialized SVM integration into the Oracle Data Mining (ODM) product [20]. Oracle's approach to reducing the number of data points considered for training uses adaptive learning, where a small model is built and then used to classify all input data. New informative data points are selected from all remaining input data and the process is repeated until convergence or until reaching the maximum allowed number of support vectors. Our approach does not reduce the input data size, in order to evaluate the efficacy of the database-embedded modules in providing a scalable solution. Oracle's multi-class implementation uses a one-against-all classification method where several binary machines are built and scoring is performed by voting for the best classification. The number of binary machines in this case is equal to the number of classes in the training data. Our SVM-MultiDB approach uses all-together training and classification, where only one machine is built and used for classification. The new embedded database modules supporting both the single-class case and the all-together multi-class case can be used to implement the other multi-class learning approaches combining binary machines, if needed.
Building a growing list of previously identified and annotated named entities will be made possible by the database repository, which would provide a valuable resource to constantly improve the classification performance. The evolving gazetteer list can be used during pre-processing or post-processing to annotate and/or correct the classification of newly discovered named entities, thereby boosting the overall performance.

A. SVM Database Architecture

The PostgreSQL [1] open-source database management system is chosen due to its rich features, adherence to standards, and the flexible options to extend the DBMS via internal or embedded functions. In order to reduce the communication overhead with the database backend, we extend the database server with embedded C functions. This also provides a better integration of all components. Database triggers are used for frequently updated values to ensure data integrity and improve the potential parallelization of the learning and database processes. Fig. 2 presents the architecture used in the current implementation. Pre-processing (feature extraction) and post-processing (evaluation) modules are kept outside of the database modules for simplicity. Additional supporting modules exist to import/export training and test examples, import/export training model(s), and trigger functions to compute derived data fields. To improve the usability of the SVM solution, we will provide a web-based user-friendly interface that allows the user to define new learning problems and parameters, import/export training and testing data and/or model(s), and monitor the execution of the learning process.

Fig. 2 Database Architecture with Embedded SVM (components: pre-processing, feature extraction, embedded database modules for SVM training and classification, data repository, class labeling and evaluation)

For the sake of simplicity, we will refer to the binary classification component of the database implementation as SVM-PerfDB and will refer to the multi-class component as SVM-MultiDB. The embedded database modules are written in C using PostgreSQL's Server Programming Interface (SPI) to access and manipulate the data. The main objectives of using a database-supported solution are:
• Use improved learning algorithms in order to reduce the training time. This is achieved by using SVM-Perf and SVM-PerfMulti as a basis of the implementation.
• Reduce the online memory requirements by storing input examples and generated constraint vectors. A known concern of using a database in place of memory-based data structures is the potentially negative impact on computational time due to the need for frequent access to permanent storage. To remedy this adverse effect, one needs to minimize the need to refetch data, possibly by storing intermediate results of smaller size in memory.
• Improve usability of the SVM solution and provide a practical framework for SVM learning and classification.

B. SVM-MultiDB Database Schema

The database schema design of SVM-MultiDB aims to provide a practical framework for binary and multi-class SVM learning and classification. Fig. 3 presents the main database schema.
The objectives of the schema are the following:
• Reduce online memory needs of a learning exercise by storing input examples and generated constraint vectors.
• Provide a way to store training and/or testing example datasets independent of a learning exercise.
• Be able to define multiple SVM experiments using the same training and/or testing datasets.
• Be able to use a subset of existing datasets for a learning experiment. This would be useful to conduct a grid search of the best learning parameters.
• Be able to reuse the same SVM exercise definition with different learning parameters.
• Be able to label the same example dataset differently in different learning exercises. For example, the same dataset may be used for binary classification of different named entities or for multi-class classification using a different number of classes as part of individual learning exercises, without the need to reload the example dataset.
• Provide a way to store intermediate kernel evaluations and dot products of constraint feature vectors.
• Provide a way to store the generated most violated constraint vectors and learned model(s) for future classification use and potentially for new incremental learning algorithms.
• Be able to classify different test datasets at any time using existing learned model(s).
• Easily maintain the definition of learning experiments and parameters and examine their results.

As presented in Fig. 3, the main table definitions supporting SVM learning and classification are the following:
• Example_set: defines a new set of training and/or testing examples.
• Example: individual example input vectors that belong to a given example set. Note that the label stored is the original textual label, e.g., B-protein, and not a given class number, in order to facilitate the use of class identifications within different exercises.
• Label_set: defines a set of class labels.
• Label: individual textual label to class number mapping that belongs to a given label_set.
• Kernel: a lookup table of valid kernel types.
• SVM: defines a selection of all or part of an example set and a given label set, to be used for a learning exercise.
• Learn_Param: defines a set of learning parameters.
• SVM_Learn: defines a specific learning exercise for an SVM definition and a set of learning parameters.
• SVM_Model: a set of generated constraints belonging to a given learning exercise. Support vectors that are selected for the final learned model are marked using the selected Boolean field. Computed alphas of the final learned model for each selected support vector are stored.
• SVM_Model_Kernel: may be used to store kernel evaluations (dot products for the linear case) in order to reduce computational redundancy and the need to refetch feature vectors for kernel computation.
• SVM_Classify: defines a classification exercise.
• Classified_example: stores computed predictions of classified examples for a given classification exercise.
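To illustrate why examples keep their textual label while class numbers live in a separate label set, the following sketch builds a trimmed-down version of four of the tables listed above and labels the same example set in two different ways. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and every column name, the `class_numbers` helper, and the toy rows are hypothetical, since the actual SVM-MultiDB column definitions are not given in the text.

```python
import sqlite3

# Hypothetical, simplified columns; only the table names come from the schema description.
SCHEMA = """
CREATE TABLE example_set (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE example (
    id INTEGER PRIMARY KEY,
    example_set_id INTEGER REFERENCES example_set(id),
    text_label TEXT,   -- original textual label, e.g. 'B-protein'
    features TEXT      -- ids of active binary features, space separated
);
CREATE TABLE label_set (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE label (
    label_set_id INTEGER REFERENCES label_set(id),
    text_label TEXT,
    class_number INTEGER,
    PRIMARY KEY (label_set_id, text_label)
);
"""


def class_numbers(conn, example_set_id, label_set_id):
    """Map the stored textual labels of one example set to class numbers of one label set."""
    rows = conn.execute(
        """
        SELECT e.id, l.class_number
        FROM example AS e
        JOIN label AS l
          ON l.text_label = e.text_label AND l.label_set_id = ?
        WHERE e.example_set_id = ?
        ORDER BY e.id
        """,
        (label_set_id, example_set_id),
    )
    return [class_number for _, class_number in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO example_set VALUES (1, 'toy set')")
conn.executemany(
    "INSERT INTO example VALUES (?, 1, ?, ?)",
    [(1, "B-protein", "3 17"), (2, "O", "5"), (3, "B-DNA", "3 8")],
)
conn.executemany(
    "INSERT INTO label_set VALUES (?, ?)",
    [(1, "protein vs. rest"), (2, "multi-class")],
)
conn.executemany(
    "INSERT INTO label VALUES (?, ?, ?)",
    [
        (1, "B-protein", 1), (1, "O", -1), (1, "B-DNA", -1),  # binary labeling
        (2, "B-protein", 1), (2, "B-DNA", 2), (2, "O", 3),    # multi-class labeling
    ],
)

binary_classes = class_numbers(conn, example_set_id=1, label_set_id=1)
multi_classes = class_numbers(conn, example_set_id=1, label_set_id=2)
```

The same three stored examples yield a binary labeling and a multi-class labeling without being reloaded, which is the reuse the schema objectives call for.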

Fig. 3 SVM-MultiDB Database Schema

Fig. 4 SVM-Perf and SVM-PerfMulti Memory Usage vs. Data Size (memory size in MBytes vs. number of training examples, 64-bit; top to bottom: SVM-PerfMulti, SVM-Perf with C=1.0, C=0.14, C=0.01)

C. Tradeoff of Training Time vs. Online Memory Needs

Using the SVM structural formulation for either binary or multi-class learning, the training time is improved by combining feature vectors into vector(s) of most violated constraints. The generated vectors require larger memory, as the size of each constraint vector is O(f) in the binary case and O(fm) in the multi-class case, where f is the number of features in the training set and m is the number of classes. For example, for a training set with 1,000,000 features and 10 classes, and assuming 8 bytes per feature to store the feature number and its weight, a binary constraint vector may need up to 8 MB of memory while a multi-class vector may need up to 80 MB. These estimates constitute a worst-case scenario, where all features are represented in each vector for all classes. In practice, using the JNLPBA-04 training dataset with over a million features and 11 classes, the multi-class constraint vector size was about 0.5 MB. Fig. 4 presents the online memory requirements with varying training data size for binary training (using regularization factor C=0.01, 0.14, and 1.0) as well as for the multi-class training. Note that although the multi-class constraint vector size is potentially 11 times larger in this experiment, the actual memory needs are less than this estimate, and not much larger than that needed for binary classification with a larger regularization factor C.

By examining the time spent in different parts of the learning algorithm, it is noted that about 50% of the time is spent computing argmax to find the most violated constraints using the original input vectors, and the other 50% is spent optimizing the model using the constraint vectors.
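The worst-case constraint vector sizes quoted in this subsection follow from simple arithmetic. The short sketch below reproduces them, assuming (as in the text) 8 bytes per stored feature and decimal megabytes; the function name and its parameters are hypothetical and serve only as an illustration.

```python
def worst_case_constraint_bytes(num_features, num_classes=1, bytes_per_feature=8):
    """Upper bound on one constraint vector's size: every feature present for every class."""
    return num_features * num_classes * bytes_per_feature


MB = 1_000_000  # decimal megabyte, matching the 8 MB / 80 MB figures in the text

binary_mb = worst_case_constraint_bytes(1_000_000) / MB                  # O(f) case
multi_mb = worst_case_constraint_bytes(1_000_000, num_classes=10) / MB   # O(fm) case
```

These are upper bounds only; the constraint vectors observed on the JNLPBA-04 data were far sparser, at about 0.5 MB for the multi-class case.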
We will examine the impact of storing each vector type in the database on the overall training time.

1) Effect of Fetching Examples from Database

SVM-MultiDB provides configurable example caching with three different options: no caching (i.e., examples are always fetched from the database), full caching (all examples are prefetched into memory), and a limited cache size where a predefined number of example records is fetched as needed. As expected, increasing the cache size reduces the time impact up to a certain size, after which longer prefetch times start to add overhead. However, since the example vectors are requested only once and in a sequential manner to compute the most violated constraint, keeping example vectors in the database and fetching them as needed had a minimal impact on training time in the binary case and almost no impact in the multi-class case. A comparison of the training time using an example cache size of 500 is presented in Fig. 5 for the binary case, and in Fig. 6 and Fig. 7 for the multi-class case.

2) Effect of Fetching Support Vectors from Database

Constraint and support vectors occupy more memory than example vectors, and maintaining them in the database would yield a large saving of online memory. However, the existing C implementations of SVM-Perf and SVM-PerfMulti require frequent access to the feature vectors during the optimization process, mostly to compute kernel products and update linear weights. Moreover, using a variable support vector cache size may not be useful due to the frequent non-sequential access to the support vectors. An initial direct port of the C implementations to a database implementation without an in-memory support vector cache negatively impacted the overall training time, as expected. An efficient database implementation requires optimizing access to the constraint feature vectors by caching kernel evaluations in memory and minimizing the number of loops used to compute other intermediate results. An initial test of kernel product caching (no constraint caching) resulted in about a 50% time reduction.

VI. SVM SCALABILITY EXPERIMENTS

In this section, the results of several sets of scalability experiments using single-class and multi-class SVM are examined. These experiments use the same training and test datasets described in Section III. The datasets represent a real-world problem, namely biomedical named entity recognition, to identify the names of proteins, DNA, RNA, cell lines, and cell types in biomedical abstracts. The approach used promotes language and domain independence by eliminating the use of prior language-specific and domain-specific knowledge. Pre-processing of the training and test datasets is limited to extracting morphological and contextual features describing words in the biomedical abstracts and representing each word with a high-dimensional binary vector. The input dimensionality of the training data exceeds a million features. The training data is composed of 492,551 examples and the test data includes 101,039 tokens.

The scalability experiments train single-class and multi-class support vector machines using chunks of the training dataset with increasing size. The trained model is then used to classify named entities in the complete test dataset. The training time is noted in each experiment, as well as the number of support vectors and the accuracy measures achieved. Several sets of experiments are conducted using different training data sizes, which include:

- Single-class experiments identifying protein names using Thorsten Joachims' popular SVM-Light [10-12], and a regularization factor C=0.01 and 0.1.
- Single-class experiments identifying protein names only using the new SVM implementation, SVM-Perf [13, 14, 26, 27], and a regularization factor C=0.01, 0.14, and 0.1.
- Single-class experiments identifying protein names only using our database-embedded solution, SVM-PerfDB, and a regularization factor C=0.01, 0.14, and 0.1.
- Multi-class experiments identifying all five named entities (protein, DNA, RNA, cell line, cell type) using Joachims' multi-class implementation, SVM-Multiclass [5, 26], with a regularization factor C=0.01.
- Multi-class experiments identifying all five named entities using our new multi-class instantiation, SVM-PerfMulti [7], with an acceleration factor of [value missing].
- Multi-class experiments identifying all five named entities using our database-embedded solution, SVM-MultiDB, with an acceleration factor of [value missing].

The training data chunks range from 1,000 examples to 492,551 examples (the complete training dataset). Each set of experiments consists of 51 tests. All experiments use a linear kernel and a margin error of 0.1. The tests run on an Intel Core 2 quad-processor 2.66 GHz machine and a Xeon quad-processor 3.6 GHz machine. Running the same test on both machines resulted in similar training times.

A. Single-Class Results

Using SVM-Light [10-12], a single-class support vector machine is trained to recognize protein name sequences. The trained machine is then used to classify proteins in the test data. Since no pre-processing was performed on the training and testing data besides feature extraction, the positive examples in the data sets remained scarce. Training the SVM-Light machine with the complete training dataset and a regularization factor C=0.01 completed in about 28.5 minutes. The recall, precision, and F-score achieved in this case are 62.72, 56.12, and [value missing], respectively. Increasing C to 1.0 raised the training time to about 269 minutes, and improved the accuracy measures to 68.92, 58.58, and [value missing], respectively. The same set of experiments is repeated using SVM-Perf [13, 14, 26, 27], which improves the training time of linear machines to be linear w.r.t. the training data size. The training time improvement using SVM-Perf is several orders of magnitude as compared to that using SVM-Light, with the same classification results when trained with the same learning parameters. Fig. 5 compares the training time using SVM-Light, SVM-Perf, and SVM-PerfDB with the same data and learning parameters. The training time using SVM-Light is polynomial, O(n^2), while being linear using SVM-Perf and SVM-PerfDB. The number of support vectors using SVM-Light is O(n^0.8) w.r.t. the training data size n.
However, using SVM-Perf, the number of support vectors was only a very small fraction of the training data size and increased only slightly with increased data size. The reduced number of support vectors is the main basis for the improved training time of SVM-Perf. The best recall, precision, and F-score measures are achieved using C=0.14 and are 73.10, 62.30, and 67.27, respectively, where learning completes in about 15 minutes.

B. Multi-Class Results

The SVM-Multiclass implementation by T. Joachims is based on [5] and uses a different quadratic optimization algorithm described in [26]. Hsu and Lin [9] note that as it is computationally more expensive to solve multi-class problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multi-class SVM in one step, a much larger optimization problem is required, so up to now experiments are limited to small data sets. The multi-class experiments presented herein attempt to solve a real-world large-scale problem using an All-Together classification method. The training data is composed of 11 classes, where each named entity is represented by two classes, one denoting the beginning of an entity and the other denoting a continuation token within the same entity, in addition to one class denoting non-named-entity tokens.
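The 11-class layout described above corresponds to a begin/inside/outside tagging scheme over the five entity types. A minimal sketch, assuming label strings such as B-protein and I-protein and a span-based annotation format (the label names, the helper functions, and the sample tokens are all illustrative assumptions; the text does not give the authors' exact encoding):

```python
# Entity types named in the text; the string spellings are assumptions.
ENTITY_TYPES = ["protein", "DNA", "RNA", "cell_line", "cell_type"]


def build_label_set(entity_types):
    """One outside class plus a begin and a continuation class per type."""
    labels = ["O"]
    for etype in entity_types:
        labels.append("B-" + etype)
        labels.append("I-" + etype)
    return labels


def tag_tokens(tokens, entities):
    """Assign one class per token.

    entities is a list of (start, end_exclusive, type) spans over tokens.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # continuation tokens
    return tags


labels = build_label_set(ENTITY_TYPES)
print(len(labels))
print(tag_tokens(["IL-2", "gene", "expression"], [(0, 2, "DNA")]))
```

With five entity types this yields 2 x 5 + 1 = 11 classes, matching the class count stated in the text.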

To explore the scalability issues of the All-Together multi-class SVM implementation, a series of experiments using different training data sizes is conducted with a low value for the C learning parameter, equal to 0.01. The training time with 1,000 examples was [value missing] seconds, and it increased considerably with increased data size to reach more than 416,000 seconds (about 4.8 days) on the same machine. The SVM-Multiclass [5, 26] implementation is based on the learning implementation in SVM-Light [10-12]. The training time remains polynomial, O(n^2), w.r.t. the training data size n, with a factor of O(k^2) increase in time as compared to the single-class SVM-Light time, where k is the number of classes. The training time required for All-Together multi-class training makes this approach prohibitive for large datasets. SVM-PerfMulti and SVM-MultiDB address this issue by using an improved cutting plane algorithm in conjunction with the linear learning algorithm of SVM-Perf. Table III and Fig. 6 compare the training time using the three methods. Fig. 7 takes a closer look at the impact of example caching in SVM-MultiDB using a cache size of 500 examples. Note the minimal impact on training time in this case.

TABLE III. COMPARISON OF SVM-PERFMULTI, SVM-MULTIDB, AND SVM-MULTICLASS TRAINING TIME (SECONDS) VS. TRAINING DATA SIZE
Columns: Training Data Size; SVM-Multiclass; SVM-PerfMulti; SVM-MultiDB (Examples Cache Size=500). [Table values missing.]

Fig. 8 presents the impact of the training data size on the multi-class classification performance measures in terms of precision, recall, and F(β=1)-score. Fig. 9 presents the protein classification performance variation with training data size using the multi-class approach. Note that the protein performance measures in this case are superior to the best achieved using binary classification. The final protein F-score in the binary case with C=0.14 is [value missing] as compared to an F-score of [value missing] in the multi-class case.

Fig. 5. Comparison of SVM-Light (top), SVM-PerfDB (middle), and SVM-Perf (bottom) training time (seconds) vs. training data size (cache size = 500).

Fig. 6. Comparison of SVM-Multiclass (top), SVM-PerfMulti, and SVM-MultiDB training time (seconds) vs. training data size.

Fig. 7. Comparison of SVM-MultiDB (ending higher) and SVM-PerfMulti training time (seconds) vs. training data size.

Fig. 8. SVM-MultiDB overall performance (%) vs. training data size, C=0.01 (recall, F-score, and precision measures are very close).

VII. CONCLUSION AND FUTURE WORK

In this paper, we present an improved multi-class cutting plane algorithm that extends the new SVM structural formulation [14] to improve multi-class linear training time. We also present a database-supported implementation of the structural binary and multi-class algorithms that aims to combine the enhanced training time with a reduction of online memory needs, and to provide a practical and usable framework for SVM learning.

Fig. 9. SVM-MultiDB protein performance (%) vs. training data size, C=0.01 (from top to bottom: recall, F-score, precision).

A series of experiments is presented in order to explore the scalability issues associated with solving the named entity recognition problem using multi-class support vector machines and high-dimensional features, and to compare the results obtained using the different learning methods. Baseline experiment results have shown that the proposed language and domain-independent approach is capable of successfully recognizing and classifying named entities with reasonable accuracy measures. The new SVM-PerfMulti cutting plane algorithm offers good out-of-the-box performance measures achieved in a training time that is several orders of magnitude faster than SVM-Multiclass.

The new database implementation of the binary and multi-class SVM classifiers offers a practical framework for SVM learning that fosters reusability of learned model(s) and enhanced usability of the solution. Fetching stored example vectors from the database is shown to have minimal impact on the training time. However, frequent access to stored constraint vectors negatively impacts training time, unless an efficient in-memory caching of kernel products and intermediate results is used. Initial attempts to cache kernel evaluations have proven to be effective. Future work will attempt to minimize access to the constraint vectors during the optimization process, thereby providing further reduction in online memory requirements.

REFERENCES

[1] PostgreSQL Open Source Database. PostgreSQL Global Development Group.
[2] S. Abe, Support Vector Machines for Pattern Classification. London: Springer-Verlag.
[3] E. Alpaydin, Introduction to Machine Learning. Cambridge, MA: The MIT Press.
[4] K. P. Bennett and C. Campbell, "Support Vector Machines: Hype or Hallelujah?," SIGKDD Explor. Newsl., vol. 2, pp. 1-13.
[5] K. Crammer and Y. Singer, "On the Algorithmic Implementation of Multi-class SVMs," Journal of Machine Learning Research, vol. 2.
[6] C. Giuliano, A. Lavelli, et al., Simple Information Extraction (SIE). ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica.
[7] M. S. Habib, "Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines," Intl Journal of Computational Intelligence, vol. 4.
[8] M. S. Habib and J. Kalita, "Language and Domain-Independent Named Entity Recognition: Experiment using SVM and High-Dimensional Features," in Proc. of the 4th Biotechnology and Bioinformatics Symposium (BIOT-2007), Colorado Springs, CO.
[9] C.-W. Hsu and C.-J. Lin, "A Comparison of Methods for Multi-Class Support Vector Machines," IEEE Transactions on Neural Networks, vol. 13.
[10] T. Joachims, "Making Large-Scale SVM Learning Practical," in Advances in Kernel Methods - Support Vector Learning, Chapter 11, B. Schölkopf, C. Burges, and A. Smola, Eds. MIT Press.
[11] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," in Proc. of the European Conference on Machine Learning.
[12] T. Joachims, Learning to Classify Text Using Support Vector Machines. Norwell, MA: Kluwer Academic.
[13] T. Joachims, "A Support Vector Method for Multivariate Performance Measures," in Proc. of the International Conference on Machine Learning (ICML).
[14] T. Joachims, "Training Linear SVMs in Linear Time," in Proc. of the ACM Conf. on Knowledge Discovery and Data Mining (KDD).
[15] J.-D. Kim, T. Ohta, et al., "GENIA Corpus--Semantically Annotated Corpus for Bio-Textmining," Bioinformatics, vol. 19, Suppl. 1.
[16] J.-D. Kim, T. Ohta, et al., "Introduction to the Bio-Entity Recognition Task at JNLPBA," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.
[17] U. H.-G. Kreßel, "Pairwise Classification and Support Vector Machines," in Advances in Kernel Methods: Support Vector Learning. Cambridge, MA: MIT Press, 1999.
[18] K.-J. Lee, Y.-S. Hwang, et al., "Biomedical Named Entity Recognition using Two-Phase Model Based on SVMs," Journal of Biomedical Informatics, vol. 37.
[19] H. Lei and V. Govindaraju, "Half-Against-Half Multi-class Support Vector Machines," in Proc. of the 6th International Workshop on Multiple Classifier Systems, Seaside, CA, USA.
[20] B. L. Milenova, J. S. Yarmus, et al., "SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines," in Proc. of the 31st International Conference on Very Large Data Bases, Trondheim, Norway.
[21] K.-M. Park, S.-H. Kim, et al., "Incorporating Lexical Knowledge into Biomedical NE Recognition," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.
[22] J. C. Platt, N. Cristianini, et al., "Large Margin DAGs for Multiclass Classification," in Advances in Neural Information Processing Systems, vol. 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA: MIT Press, 2000.
[23] M. Rössler, "Adapting an NER-System for German to the Biomedical Domain," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.
[24] S. Rüping, "Support Vector Machines in Relational Databases," in Pattern Recognition with Support Vector Machines - First International Workshop.
[25] Y. Song, E. Kim, et al., "POSBIOTM-NER in the Shared Task of BioNLP/NLPBA 2004," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.
[26] I. Tsochantaridis, T. Hofmann, et al., "Support Vector Learning for Interdependent and Structured Output Spaces," in Proc. of the 21st Intl Conf. on Machine Learning (ICML), Alberta, Canada.
[27] I. Tsochantaridis, T. Joachims, et al., "Large Margin Methods for Structured and Interdependent Output Variables," Journal of Machine Learning Research (JMLR), vol. 6.
[28] V. N. Vapnik, Statistical Learning Theory. New York, NY: John Wiley & Sons.
[29] G. Zhou, "Recognizing Names in Biomedical Texts using Hidden Markov Model and SVM plus Sigmoid," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.
[30] G. Zhou and J. Su, "Exploring Deep Knowledge Resources in Biomedical Name Recognition," in Proc. of the 2004 Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA'2004), Geneva, Switzerland.


where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Multiple Representations for Pattern Exploration with the Graphing Calculator and Manipulatives

Multiple Representations for Pattern Exploration with the Graphing Calculator and Manipulatives Douglas A. Lapp Multiple Represetatios for Patter Exploratio with the Graphig Calculator ad Maipulatives To teach mathematics as a coected system of cocepts, we must have a shift i emphasis from a curriculum

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

ODBC. Getting Started With Sage Timberline Office ODBC

ODBC. Getting Started With Sage Timberline Office ODBC ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

summary of cover CONTRACT WORKS INSURANCE

summary of cover CONTRACT WORKS INSURANCE 1 SUMMARY OF COVER CONTRACT WORKS summary of cover CONTRACT WORKS INSURANCE This documet details the cover we ca provide for our commercial or church policyholders whe udertakig buildig or reovatio works.

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

How to read A Mutual Fund shareholder report

How to read A Mutual Fund shareholder report Ivestor BulletI How to read A Mutual Fud shareholder report The SEC s Office of Ivestor Educatio ad Advocacy is issuig this Ivestor Bulleti to educate idividual ivestors about mutual fud shareholder reports.

More information

AP Calculus AB 2006 Scoring Guidelines Form B

AP Calculus AB 2006 Scoring Guidelines Form B AP Calculus AB 6 Scorig Guidelies Form B The College Board: Coectig Studets to College Success The College Board is a ot-for-profit membership associatio whose missio is to coect studets to college success

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

QUADRO tech. FSA Migrator 2.6. File Server Migrations - Made Easy

QUADRO tech. FSA Migrator 2.6. File Server Migrations - Made Easy QUADRO tech FSA Migrator 2.6 File Server Migratios - Made Easy FSA Migrator Cosolidate your archived ad o-archived File Server data - with ease! May orgaisatios struggle with the cotiuous growth of their

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

7.1 Finding Rational Solutions of Polynomial Equations

7.1 Finding Rational Solutions of Polynomial Equations 4 Locker LESSON 7. Fidig Ratioal Solutios of Polyomial Equatios Name Class Date 7. Fidig Ratioal Solutios of Polyomial Equatios Essetial Questio: How do you fid the ratioal roots of a polyomial equatio?

More information

Domain 1: Configuring Domain Name System (DNS) for Active Directory

Domain 1: Configuring Domain Name System (DNS) for Active Directory Maual Widows Domai 1: Cofigurig Domai Name System (DNS) for Active Directory Cofigure zoes I Domai Name System (DNS), a DNS amespace ca be divided ito zoes. The zoes store ame iformatio about oe or more

More information

Effective Hybrid Intrusion Detection System: A Layered Approach

Effective Hybrid Intrusion Detection System: A Layered Approach I. J. Computer Network ad Iformatio Security, 2015, 3, 35-41 Published Olie February 2015 i MECS (http://www.mecs-press.org/) DOI: 10.5815/ijcis.2015.03.05 Effective Hybrid Itrusio Detectio System: A Layered

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Subject CT5 Contingencies Core Technical Syllabus

Subject CT5 Contingencies Core Technical Syllabus Subject CT5 Cotigecies Core Techical Syllabus for the 2015 exams 1 Jue 2014 Aim The aim of the Cotigecies subject is to provide a groudig i the mathematical techiques which ca be used to model ad value

More information

TruStore: The storage. system that grows with you. Machine Tools / Power Tools Laser Technology / Electronics Medical Technology

TruStore: The storage. system that grows with you. Machine Tools / Power Tools Laser Technology / Electronics Medical Technology TruStore: The storage system that grows with you Machie Tools / Power Tools Laser Techology / Electroics Medical Techology Everythig from a sigle source. Cotets Everythig from a sigle source. 2 TruStore

More information

LEASE-PURCHASE DECISION

LEASE-PURCHASE DECISION Public Procuremet Practice STANDARD The decisio to lease or purchase should be cosidered o a case-by case evaluatio of comparative costs ad other factors. 1 Procuremet should coduct a cost/ beefit aalysis

More information

Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification

Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification 1882 J. Chem. If. Comput. Sci. 2003, 43, 1882-1889 Compariso of Support Vector Machie ad Artificial Neural Network Systems for Drug/Nodrug Classificatio Evgey Byvatov, Uli Fecher, Jes Sadowski, ad Gisbert

More information

AP Calculus BC 2003 Scoring Guidelines Form B

AP Calculus BC 2003 Scoring Guidelines Form B AP Calculus BC Scorig Guidelies Form B The materials icluded i these files are iteded for use by AP teachers for course ad exam preparatio; permissio for ay other use must be sought from the Advaced Placemet

More information

Baan Service Master Data Management

Baan Service Master Data Management Baa Service Master Data Maagemet Module Procedure UP069A US Documetiformatio Documet Documet code : UP069A US Documet group : User Documetatio Documet title : Master Data Maagemet Applicatio/Package :

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 [email protected] Abstract:

More information

Domain 1 - Describe Cisco VoIP Implementations

Domain 1 - Describe Cisco VoIP Implementations Maual ONT (642-8) 1-800-418-6789 Domai 1 - Describe Cisco VoIP Implemetatios Advatages of VoIP Over Traditioal Switches Voice over IP etworks have may advatages over traditioal circuit switched voice etworks.

More information

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY

Optimize your Network. In the Courier, Express and Parcel market ADDING CREDIBILITY Optimize your Network I the Courier, Express ad Parcel market ADDING CREDIBILITY Meetig today s challeges ad tomorrow s demads Aswers to your key etwork challeges ORTEC kows the highly competitive Courier,

More information

Professional Networking

Professional Networking Professioal Networkig 1. Lear from people who ve bee where you are. Oe of your best resources for etworkig is alumi from your school. They ve take the classes you have take, they have bee o the job market

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

Research Article Sign Data Derivative Recovery

Research Article Sign Data Derivative Recovery Iteratioal Scholarly Research Network ISRN Applied Mathematics Volume 0, Article ID 63070, 7 pages doi:0.540/0/63070 Research Article Sig Data Derivative Recovery L. M. Housto, G. A. Glass, ad A. D. Dymikov

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

Configuring Additional Active Directory Server Roles

Configuring Additional Active Directory Server Roles Maual Upgradig your MCSE o Server 2003 to Server 2008 (70-649) 1-800-418-6789 Cofigurig Additioal Active Directory Server Roles Active Directory Lightweight Directory Services Backgroud ad Cofiguratio

More information

ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC

ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC 8 th Iteratioal Coferece o DEVELOPMENT AND APPLICATION SYSTEMS S u c e a v a, R o m a i a, M a y 25 27, 2 6 ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC Vadim MUKHIN 1, Elea PAVLENKO 2 Natioal Techical

More information

FEATURE BASED RECOGNITION OF TRAFFIC VIDEO STREAMS FOR ONLINE ROUTE TRACING

FEATURE BASED RECOGNITION OF TRAFFIC VIDEO STREAMS FOR ONLINE ROUTE TRACING FEATURE BASED RECOGNITION OF TRAFFIC VIDEO STREAMS FOR ONLINE ROUTE TRACING Christoph Busch, Ralf Dörer, Christia Freytag, Heike Ziegler Frauhofer Istitute for Computer Graphics, Computer Graphics Ceter

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Agency Relationship Optimizer

Agency Relationship Optimizer Decideware Developmet Agecy Relatioship Optimizer The Leadig Software Solutio for Cliet-Agecy Relatioship Maagemet supplier performace experts scorecards.deploymet.service decide ware Sa Fracisco Sydey

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

FACIAL EXPRESSION RECOGNITION BASED ON CLOUD MODEL

FACIAL EXPRESSION RECOGNITION BASED ON CLOUD MODEL FACIAL EXPRESSION RECOGNITION BASED ON CLOUD MODEL Hehua Chi a, Liahua Chi b *, Meg Fag a, Juebo Wu c a Iteratioal School of Software, Wuha Uiversity, Wuha 430079, Chia - [email protected] b School of Computer

More information

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series utomatic Tuig for FOREX Tradig System Usig Fuzzy Time Series Kraimo Maeesilp ad Pitihate Soorasa bstract Efficiecy of the automatic currecy tradig system is time depedet due to usig fixed parameters which

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Design and Implementation of a Publication Database for the Vienna University of Technology

Design and Implementation of a Publication Database for the Vienna University of Technology Desig ad Implemetatio of a Publicatio Database for the Viea Uiversity of Techology Karl Riedlig Istitute of Idustrial Electroics ad Material Sciece, TU Wie, A-040 Viea [email protected] Abstract:

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand [email protected]

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

Prescribing costs in primary care

Prescribing costs in primary care Prescribig costs i primary care LONDON: The Statioery Office 13.50 Ordered by the House of Commos to be prited o 14 May 2007 REPORT BY THE COMPTROLLER AND AUDITOR GENERAL HC 454 Sessio 2006-2007 18 May

More information

E-Plex Enterprise Access Control System

E-Plex Enterprise Access Control System Eterprise Access Cotrol System Egieered for Flexibility Modular Solutio The Eterprise Access Cotrol System is a modular solutio for maagig access poits. Employig a variety of hardware optios, system maagemet

More information