Web Object Indexing Using Domain Knowledge *


Muyuan Wang
Department of Automation, Tsinghua University, Beijing, China
(86-10)
[email protected]

Zhiwei Li, Lie Lu, Wei-Ying Ma
Microsoft Research Asia, Sigma Center, Haidian District, Beijing, China
(86-10)
{t-zl, llu, wyma}@microsoft.com

Naiyao Zhang
Department of Automation, Tsinghua University, Beijing, China
(86-10)
[email protected]

ABSTRACT
A web object is defined to represent any meaningful object embedded in web pages (e.g. images, music) or pointed to by hyperlinks (e.g. downloadable files). Users usually search for information about a certain object, rather than for a web page containing the query terms. To facilitate web object searching and organizing, in this paper we propose a novel approach to web object indexing, which discovers the inherent structure information of a web object with the help of domain knowledge. In our approach, Layered LSI spaces are built for the hierarchically structured domain knowledge, in order to emphasize the specific semantics and term space of each layer of the domain knowledge. Then, the web object representation is constructed by hyperlink analysis and further pruned to remove noise. Finally, the structure attributes of the web object are extracted from the knowledge document that best matches the web object. Our approach also indicates a new way to use trustworthy Deep Web knowledge to help organize the dispersed information of the Surface Web.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining; I.7.m [Document and Text Processing]: Miscellaneous; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Clustering, Selection Process.

General Terms
Algorithms, Performance, Experimentation.

Keywords
Web object, indexing, domain knowledge, latent semantic indexing, link analysis, confidence propagation, information retrieval

1. INTRODUCTION
Contemporary web search engines are mainly page oriented; that is, their indexing granularity is the web page. As a result, they are only able to provide search results in the form of ranked web pages with respect to the user's query. However, in many cases, users want to search for information about a certain object rather than for a web page. For example, users may use the query "artist Beatles" to search for the biography of the Beatles, their albums and songs, instead of the web pages that merely contain the query terms. To meet such an information need, an object-oriented search engine is required. Consequently, it is expected that some integral technique could be developed to index web objects.

In this paper, web objects are defined to represent those meaningful objects embedded in web pages (e.g. images), or pointed to by hyperlinks (e.g. song streaming, downloadable files). Usually, the surrounding text (including anchor text) can preliminarily describe a web object. Complementary information about the web object may also be present in the neighboring pages that are connected to it by hyperlinks. Figure 1 illustrates two examples of web objects, with or without descriptions in the surrounding text. Figure 1(a) shows a book object, with little valuable information in the surrounding text. The information about its author and publisher, and the introduction of the author, are found in the pages hyperlinked with it. Figure 1(b) illustrates a song object with the descriptions of its containing album and performer in the neighboring pages. In a broad sense, almost everything on the web can be regarded as some kind of web object, including virtual objects or concepts described in web pages (e.g. a music review or book review).

Figure 1. Some examples of web object

Copyright is held by the author/owner(s). Conference 05, Month 1 2, 2005, City, State, Country. Copyright 2005 ACM /00/0004 $5.00.

Web objects usually lie in some implicit structural organization. For example, a song object can constitute a layer in the hierarchical structure of artist-album-song, as Figure 2 shows. A structured organization of web objects provides a good directory that helps users browse web resources that are widely dispersed.

Figure 2. Hierarchical organization of a music database, which includes three layers: artist, album and song. In some Deep Web sites, each node is a web page; relationships between them are presented by hyperlinks.

To facilitate web object searching and organizing, in this paper we propose a general framework to index web objects and, further, to organize them based on their implicit structure. To achieve this objective, we must solve two major difficulties:

1. Lack of information. Usually, there is little descriptive textual information around web objects. The information may be noisy and insufficient to represent web objects. It is critical to enrich the description of the target web object and prune the noisy information.

2. Difficulty in identifying structure. Even if the description of a web object is sufficient and has been extracted, extracting the structure information of the web object remains a difficult problem. Automatic discovery of implicit structure is very challenging. Although some hierarchical clustering methods have been proposed to detect semantic structure, the obtained hierarchy is usually not meaningful for a target web object that is domain specific [15]. It will also meet some traditional obstacles such as polysemy and synonymy, due to the varying word usage in different web pages. Moreover, the traditional wrapper-based information extraction approaches are not suitable in these cases, since a wrapper is usually suited to only one kind of web page and cannot adapt to all kinds of web pages, which are usually diverse in their design format.

To deal with these problems, we propose a novel non-clustering and non-wrapper approach to index web objects in various web pages, by integrating link analysis and domain knowledge.
Hyperlink analysis is used to enrich the textual description of the web object, and domain knowledge is used to help with noise removal and with the construction of structure attributes for domain-specific web objects. Currently, many domain knowledge databases are available and well organized in Deep Web sites, such as All Music Guide (for music) [8] and Amazon (for books), where the domain knowledge is usually organized hierarchically, and each node is described by an individual web page. These databases provide meaningful and domain-specific structures for web objects, and thus can be greatly helpful in web object indexing. With hierarchical domain knowledge, web objects can be indexed using the structure information contained in the knowledge document that best matches the web object. From this point of view, our proposal provides a promising way to use trustworthy Deep Web information to help organize the information of the Surface Web.

The rest of this paper is organized as follows: Section 2 presents the proposed framework. Section 3 describes our layered indexing scheme for hierarchical domain knowledge. Section 4 proposes our approach to web object representation through text and link analysis. Section 5 presents the matching process between web objects and the documents in the knowledge database. Finally, the proposed approach is evaluated in Section 6, and an example application in the music domain is presented in Section 7. Related work and conclusions are given in Section 8 and Section 9.

2. FRAMEWORK
Figure 3 illustrates our proposed framework for web object indexing. It is mainly composed of three steps: knowledge space building, web object representation, and web object indexing.

Step One: Knowledge Space Building. In the first step, the domain knowledge is indexed by Latent Semantic Indexing (LSI), for further use in web object representation and identification. A knowledge database is usually hierarchical or tree-like (as shown in Figure 2), where different layers have different semantics. Therefore, it is a better choice to represent knowledge layer by layer.
Thus, in this step, knowledge is indexed in each layer separately, and Layered LSI spaces are built for the domain knowledge.

Step Two: Web Object Representation. In this step, textual information about the web object is extracted and pruned. Firstly, a preliminary neighborhood graph is constructed around the target web object with hyperlink analysis, in order to enrich the description of the web object. Then, the content of each web page is projected onto the Layered LSI spaces of the domain knowledge, to remove the words that are irrelevant to the web object. In order to obtain a better web object representation, the neighborhood graph is further pruned to remove noisy pages. After pruning, all web pages in the pruned neighborhood graph are combined to get a new description vector of the target web object.

Step Three: Web Object Indexing. The major task in this step is to identify the structure attributes of the target web object based on its description vector and the domain knowledge. To accomplish this task, the similarity between the description vector of the web object and each knowledge document is first measured in the Layered LSI spaces; then a process of structure-based confidence propagation is performed to select the knowledge document that best matches the web object. The corresponding structure attributes in the knowledge document are then used to index the web object.

Although the proposed framework emphasizes utilizing the domain knowledge and the corresponding hierarchical structure, our approach is still feasible for a non-hierarchical structure, which can be considered as a special case of hierarchical structures with the layer number equal to one. In this simplified case, our approach becomes equivalent to the method proposed in [4].

* This work was performed when the first author was a visiting student at Microsoft Research Asia
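The layer-by-layer treatment in Step One presupposes that the knowledge documents can be grouped by layer while keeping their parent and child links. A minimal sketch of such a structure is given below; the class `KnowledgeDoc`, its fields, and the helper `docs_by_layer` are hypothetical names introduced only for illustration and are not taken from the implementation described in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/children
class KnowledgeDoc:
    """One knowledge document (a record or page) in a 1-based layer of the hierarchy."""
    name: str
    layer: int
    text: str = ""
    parent: Optional["KnowledgeDoc"] = None
    children: List["KnowledgeDoc"] = field(default_factory=list)

    def add_child(self, name: str, text: str = "") -> "KnowledgeDoc":
        """Attach a document one layer below this one and return it."""
        child = KnowledgeDoc(name, self.layer + 1, text, parent=self)
        self.children.append(child)
        return child


def docs_by_layer(roots: List[KnowledgeDoc]) -> Dict[int, List[KnowledgeDoc]]:
    """Group every document reachable from `roots` by layer number."""
    layers: Dict[int, List[KnowledgeDoc]] = {}
    stack = list(roots)
    while stack:
        doc = stack.pop()
        layers.setdefault(doc.layer, []).append(doc)
        stack.extend(doc.children)
    return layers


# Example hierarchy: artist -> album -> song, as in Figure 2.
artist = KnowledgeDoc("The Beatles", layer=1)
album = artist.add_child("Yesterday and Today")
song = album.add_child("Yesterday")
layers = docs_by_layer([artist])
```

Each entry of `layers` could then be indexed on its own, which is the property the Layered LSI spaces rely on; a non-hierarchical knowledge base corresponds to a single layer.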

Figure 3. The framework of the two-step indexing engine

It is noted that, in this paper, we mainly focus on web object indexing, and assume that web objects have been detected. In practice, web object detection is usually domain specific and thus can be solved with some simple heuristic rules.

3. KNOWLEDGE SPACE BUILDING
Both web object descriptions and the knowledge database are textual, but they are usually written by different authors. This leads to different term spaces. Consequently, directly using domain knowledge to identify a web object is not feasible. In order to solve this problem, in our approach, Layered Latent Semantic Indexing is used to index the documents in the knowledge database. LSI is able to deal with the polysemy and synonymy problems to some degree, as indicated in many text retrieval applications [6].

3.1 Layered LSI of Knowledge Database
Usually, each layer of the hierarchically structured knowledge represents a different concept and has a different term space. For instance, the three layers in music domain knowledge represent the concepts of artist, album and song, respectively; and the term "Beatles" is obviously more likely to appear at the artist layer than at the album or song layer. Moreover, the scale of the term space of each layer is also different. Thus, if all layers are indexed together in one LSI space, the layer with the smaller term space will be overwhelmed by the larger one. That means the term importance in the corresponding layer will be unfairly reduced. Therefore, in our approach, we index each layer independently and thus compose Layered LSI spaces.

In our approach, the textual content contained in each record or page of the knowledge database is considered as a knowledge document. These documents are tokenized to construct a term-document co-occurrence matrix. To improve the effectiveness of LSI, TFIDF is used as the weighting function in the co-occurrence matrix instead of term frequency only. In our approach, the simplest TFIDF formula is employed, as:

$$\mathrm{TFIDF}(w) = f(w)\,\log\frac{N}{|D(w)|} \qquad (1)$$

where $w$ denotes a term, $f(w)$ represents the term frequency, $N$ is the dimension of the term space, and $D(w)$ stands for the set of documents that contain term $w$.

Supposing there are $c$ layers in the knowledge database, and the weighted term-document matrix in each layer is denoted as $A_i$ ($i = 1, \dots, c$), each matrix can be decomposed by SVD as

$$A_i = U_i \Sigma_i V_i^T \qquad (2)$$

Then, the largest $k$ singular values are selected to construct the latent semantic structure of $A_i$, denoted as $A_i^k$:

$$A_i^k = U_i^k \Sigma_i^k (V_i^k)^T \qquad (3)$$

Thus, each document (or query) $q$ can be represented as $(\Sigma_i^k)^{-1} (U_i^k)^T q$ in the new semantic space.

3.2 Discussion
Domain knowledge is further used in web object representation and web object indexing. In these modules, all web pages (describing the target web object) are mapped to the Layered LSI spaces, in order to remove irrelevant words that are out of domain. As mentioned above, each layer of domain knowledge has its unique semantics and term space. Thus, when we project each page into the LSI space of one layer, to some degree only the words that accord with the corresponding semantics are amplified, while other irrelevant words are suppressed or removed. Therefore, in the further processing, the similarity between two web pages, or between a web object and a knowledge document, is computed in the Layered LSI spaces. This is more reasonable than direct comparison in the original space. Meanwhile, such similarity is measured in the LSI space of each layer separately. That is, the comparison is made from different semantic aspects. The basic idea behind this process is that we can use the information from all of these aspects to cross-verify the final decision. Our experiments also indicate the effectiveness of this method.

4. WEB OBJECT REPRESENTATION
The preliminary information about the web object can be found in the web page containing it or in the corresponding web block [15]. However, this information is usually insufficient to describe the web object and may also contain many noisy words.
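As an illustration of the per-layer weighting in Section 3.1, the TFIDF formula of equation (1) can be sketched in a few lines. This is a minimal sketch, not the C++ implementation used in the experiments: the function names are hypothetical, documents are assumed to be already tokenized, and N defaults to the number of documents in the layer (the usual IDF convention), although it can be passed explicitly. The SVD step of equations (2) and (3) is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tfidf_vectors(docs: List[List[str]], n: Optional[int] = None) -> List[Dict[str, float]]:
    """Weight each tokenized document with TFIDF(w) = f(w) * log(N / |D(w)|)."""
    if n is None:
        n = len(docs)  # assumption: N defaults to the number of documents in the layer
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # |D(w)|: number of documents containing w
    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)  # f(w): raw term frequency in this document
        vectors.append({w: f * math.log(n / doc_freq[w]) for w, f in term_freq.items()})
    return vectors


def layered_tfidf(layers: Dict[str, List[List[str]]]) -> Dict[str, List[Dict[str, float]]]:
    """Weight every layer on its own, so a small layer is not swamped by a large one."""
    return {name: tfidf_vectors(docs) for name, docs in layers.items()}
```

Because each layer is weighted independently, a term that occurs in every document of one layer receives zero weight there without affecting its weight in any other layer.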
In order to improve the web object representation, a neighborhood graph around the web object is constructed with text and hyperlink analysis. Neighborhood pages are considered because they can usually help verify or complement the information about the web object. In this section, we first present the algorithm for neighborhood graph building and then address how to prune the noisy pages, which are irrelevant to the target web object.

4.1 Neighborhood Graph Building
To build a neighborhood graph, the web page or the web block containing the target web object is taken as the graph centre, and all the web pages are connected according to their hyperlinks. The approach used for neighborhood graph building is based on the technique proposed by Harmandas [4]. That is, it constructs an undirected graph, where each web page is taken as a node and each hyperlink as an edge, as illustrated in Figure 4. In the following sections, the web page (or web block) containing the target web object is referred to as the container for simplicity, while the pages linked with the container are called neighborhood pages.

Figure 4. A 2-layer neighborhood graph with 1- and 2-step pages

In general, the fewer steps a page is from the container, the more closely the corresponding node is related to the target web object. Therefore, the number of graph layers is an important parameter in web object representation. With a larger number of neighborhood layers, more information can be used to describe the web object, but more irrelevant (noisy) information is also introduced. We discuss its selection in the experiments.

4.2 Neighborhood Graph Pruning
The neighborhood graph is assumed to contain more information describing the target web object than the container itself. However, the graph may contain many noisy pages, which do not contribute to our task and can even lead to 'topic drift' [1], such as the 'contact us' page and the 'how to buy' page. They need to be removed before further processing.

To remove the noisy pages, the similarity between each neighborhood page and the container is calculated, assuming that the pages similar to the container are more relevant to the web object. In our approach, the cosine measure is used to measure the similarity between each neighborhood page and the container in each layer's LSI space. The similarity in the j-th layer represents the confidence that the corresponding page shares semantics with the container from the view of the j-th layer. Traditionally, the pages with small similarity can be considered as noisy pages [1].
In our experiments, a portion of the pages with the smallest similarity in each layer is taken as noisy pages. The threshold is called the cutoff percent in this paper and is determined experimentally. In this way, we get a candidate set of noisy pages from each layer of the domain knowledge. Since different pages have different semantics or functions, they may get different confidence in different layers. Therefore, only the intersection of these candidate sets is confirmed as noisy pages and pruned from the neighborhood graph. For special cases where the knowledge database is not hierarchical, e.g., one that only consists of some plain text documents, our pruning scheme is still applicable with the layer number equal to 1.

Once the noisy pages in the neighborhood graph have been removed, we can integrate the textual information of all remaining pages to better describe the web object. In our approach, the weighted centroid of the neighborhood graph is simply used as the description vector of the web object, which is defined as

$$C = \sum_{D \in G} w_D \, D \qquad (4)$$

where $G$ is the pruned neighborhood graph, $D$ is a web page in $G$, and $w_D$ is the weighting of web page $D$. The weighting is set according to the number of path steps from the corresponding web page to the container. As a general rule, the fewer steps a page is from the container, the higher its weighting. In our approach, the weighting is experimentally defined as in equation (5) and then normalized to sum to one,

$$w_D \propto \frac{1}{\log(d + 2)} \qquad (5)$$

where $d$ is the corresponding number of path steps, and the constant 2 is added heuristically in order to avoid a zero denominator.

5. WEB OBJECT INDEXING
At this point, we have a description vector of the target web object in the Layered LSI spaces. The vector is then used to extract more exact structure information about the target web object, by finding the knowledge document that best matches the web object. Thus, the problem is converted to textual information matching, that is, matching between the web object representation and each document in the domain knowledge. Finally, the structure data obtained from the knowledge document is used to index web objects.
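The pruning rule of Section 4.2 and the weighted centroid of equations (4) and (5), which together produce this description vector, can be sketched as follows. This is a minimal sketch under stated assumptions: the per-layer cosine similarities to the container are taken as given, the cutoff percent is expressed as a fraction, pages are identified by hypothetical string keys, and at least one layer is present. The function names are illustrative only.

```python
import math
from typing import Dict, List, Set


def noisy_candidates(similarity: Dict[str, float], cutoff: float) -> Set[str]:
    """Pages in the `cutoff` fraction with the smallest similarity in one layer."""
    ranked = sorted(similarity, key=similarity.get)
    return set(ranked[: int(len(ranked) * cutoff)])


def confirmed_noise(sim_per_layer: List[Dict[str, float]], cutoff: float) -> Set[str]:
    """Only pages that are candidates in every layer are confirmed as noise."""
    return set.intersection(*(noisy_candidates(sim, cutoff) for sim in sim_per_layer))


def weighted_centroid(vectors: Dict[str, List[float]], steps: Dict[str, int]) -> List[float]:
    """Equation (4), with weights 1 / log(d + 2) from equation (5) normalized to sum to one."""
    raw = {page: 1.0 / math.log(steps[page] + 2) for page in vectors}
    total = sum(raw.values())
    dim = len(next(iter(vectors.values())))
    centroid = [0.0] * dim
    for page, vec in vectors.items():
        weight = raw[page] / total
        for i, x in enumerate(vec):
            centroid[i] += weight * x
    return centroid
```

Since the weights are normalized, the base of the logarithm does not change the result. With a single layer, `confirmed_noise` reduces to plain thresholding on one similarity ranking.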
5.1 Object-knowledge Matching
To achieve object-knowledge matching, the similarity between the description vector and each knowledge document in each layer is calculated using the cosine measure. Usually, a web object belongs to only one layer of the knowledge database. Therefore, our goal can be achieved simply by finding the candidate document that is most similar to the target web object in the corresponding target layer of the knowledge database.

However, this method might not be accurate enough, since the similarity score only provides some confidence, and noise would harm the matching accuracy. To deal with this issue, the hierarchical structure of the domain knowledge is utilized, and each candidate document in the target layer is re-scored or re-ranked by confirming with its relatives (including ancestors and offspring). For example, as Figure 5 shows, when we re-rank the confidence of node T (a candidate document), the confidences of its ancestors (nodes 1-2) and offspring (nodes 3-7) are all incorporated. This process is called confidence propagation. The confidences from the candidate document's relatives represent the similarities between the candidate document and the target web object in the corresponding semantic attributes (layers). The final score is thus cross-verified from various semantic aspects.

The confidence propagation from the relatives of a candidate document can be divided into two directions: propagating from its ancestor (root) nodes (top-down), and propagating from its offspring nodes (bottom-up), as Figure 5 illustrates. In top-down propagation, we directly sum the confidence scores of the upper layers on the path from the root to the candidate document. This is feasible since the path from the root to the candidate document is unique. However, the bottom-up propagation path from its offspring is not unique. We could also sum the scores of all paths from the candidate document to its offspring for confidence propagation. However, this is usually unfair, since candidate documents with many offspring would be over-weighted. To deal with this problem, in our approach a candidate document with a big sub-tree is penalized by dividing by the number of its children.

Figure 5. Web object matching and confidence propagation

The detailed formula of confidence propagation is shown in equation (6), which sequentially combines the information from upper and lower layers, and normalizes the information coming from lower layers.

$$
\begin{aligned}
D(r_i) &= s(r_i) + \sum_{k \in par(r_i)} D(k), && 1 \le i < TL \\
U(r_i) &= s(r_i) + \sum_{k \in ch(r_i)} \frac{U(k)}{NC(r_i)}, && TL < i < L \\
S(r_{TL}) &= s(r_{TL}) + \sum_{k \in par(r_{TL})} D(k) + \sum_{l \in ch(r_{TL})} \frac{U(l)}{NC(r_{TL})}
\end{aligned}
\qquad (6)
$$

where $s(\cdot)$ represents the original confidence of a document, $S(\cdot)$ is the updated confidence score of a document, and $U(\cdot)$ and $D(\cdot)$ are the confidences propagated bottom-up and top-down, respectively. $r$ is a knowledge document and its subscript represents its layer number. $L$ is the number of layers of the domain knowledge and $TL$ is the target layer. $par(\cdot)$ and $ch(\cdot)$ stand for the sets of parents and children of a document, respectively, and $NC(\cdot)$ is the number of its children.

After confidence propagation, the candidate document with the highest similarity score $S(r_{TL})$ is chosen as the optimal document, which best describes the structure attributes of the target web object. The properties of this web object are further used in indexing and organization. It is noted that, when the knowledge database is not hierarchical, no confidence propagation is needed.

5.2 Indexing Scheme
Once a web object is matched with a knowledge document, the trustworthy domain knowledge can be used to further index and organize web objects. This can be divided into two cases:

1. If the domain knowledge is a fully-structured database, the record attributes of the knowledge document can be used to represent the structure information of the web object and then to index it. For example, if a song object is matched with a knowledge document in which the corresponding metadata, such as artist, album, genre and release year, are available, these structure data can be further used to index the song object. Furthermore, many applications can be developed based on these structure indices.

2. If the domain knowledge is not a structured database, that is, the knowledge document is not a record but plain text, we can use the text to describe the web object better. Then, a traditional indexing scheme from text search/retrieval can be used to index the web objects.

6. EXPERIMENTS
In this section, we evaluate the performance of the proposed approach in the music domain, as follows:

1. We evaluate the overall performance of our approach with respect to various settings of the neighborhood graph, such as the graph layer and the cutoff percent mentioned in Section 4. We also look into the performance on different web objects.

2. We compare our approach with the algorithm proposed in [4] (note that the algorithm in [4] is just a specific version of our approach, with the cutoff percent equal to 0).

3. We evaluate the effectiveness of each module of our proposed framework, including domain knowledge indexing, the neighborhood graph in web object representation, and confidence propagation in web object matching.

In order to measure the precision of our approach, the process of web object indexing is viewed as an IR problem. That is, the target web object is taken as a query, the knowledge documents are ranked according to their confidences, and the top N are returned. Thus, the precision at the top N results is used to evaluate the performance, as

$$P@N = \frac{\left|\{\, q \in Q : R_q^N \text{ contains the correct document for } q \,\}\right|}{|Q|} \qquad (7)$$

where $Q$ is the query set and $q$ is a query, $R_q^N$ is the set of top N results corresponding to query $q$, and $|\cdot|$ denotes the count. In our experiments, P@1, P@5, and P@10 are used for performance evaluation.
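The P@N measure of equation (7) can be sketched as below. This is a minimal sketch assuming that each query (a target web object) has exactly one correct knowledge document in the manually annotated ground truth; the function name, the dictionary layout and the document identifiers are hypothetical.

```python
from typing import Dict, List


def precision_at_n(rankings: Dict[str, List[str]], truth: Dict[str, str], n: int) -> float:
    """Fraction of queries whose ground-truth document appears in the top n results."""
    hits = sum(1 for query, ranked in rankings.items() if truth[query] in ranked[:n])
    return hits / len(rankings)


# Two queries with ranked knowledge documents and their annotated correct documents.
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4", "d6"]}
truth = {"q1": "d1", "q2": "d4"}
p_at_1 = precision_at_n(rankings, truth, 1)
p_at_2 = precision_at_n(rankings, truth, 2)
```

Under this reading, P@N can only grow with N, which matches the way P@1, P@5 and P@10 are compared in the following sections.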
All our algorithms are implemented in C++, and experiments are run on two workstations with two 3.0 GHz Intel CPUs each and 3.0 GB of memory. For SVD, we use SVDPACK, which can deal with large-scale sparse matrices [9][13].

6.1 Data Preparation
Music is one of the most valuable kinds of objects on the web, and many mature music databases exist as Deep Web sites, which can be used conveniently in our experiments. Therefore, our approach is performed and evaluated in the music domain. In this section, we present the data preparation for the domain knowledge and the testing music objects, which are further divided into song objects, album objects and artist objects.

6.1.1 Domain Knowledge Preparation
In our approach, the domain knowledge about music is collected from All Music Guide [8], which is a Deep Web site. The artist pages, album pages and song pages are downloaded, and then organized in a three-layered, tree-like structure, in which the layers represent artist, album, and song, respectively, as shown in Figure 2. The details of our collected domain knowledge are shown in Table 1. Overall, there are pages, including song pages, 4635 album pages and 1599 artist pages.

Table 1. The details of our domain knowledge tree
Type        #Documents
Artist      1599
Album       4635
Song
#Overall

A specific HTML parser is developed to rebuild the knowledge hierarchy from the crawled pages. The parser is used first to explore the implicit semantic relationships between pages, and second to extract specific keywords, such as artist name, album title, and genre, for further web object indexing and other applications.

After the knowledge is ready, LSI is used to index it. In the text retrieval domain, many researchers have reported that keeping dimensions is a good trade-off between the performance and the computational complexity of LSI. In our experiments, the dimension is set to 200 accordingly. Based on our algorithms, the three layers of the knowledge tree are indexed separately. For comparison, we also use the non-layered LSI, by combining all documents of the three layers into a whole corpus.

6.1.2 Music Objects Collection
As mentioned above, we mainly focus on web object indexing, and assume that web objects have been detected. Since few works address this task, in our experiments web objects are detected with some simple heuristic rules. For example, a song object is usually associated with some file-type indicators, such as the file extension 'rm' or 'wma', and keywords such as "listen".

In order to collect the testing web objects, we first collect the top queries from one week of the search log of All Music Guide, which include hundreds of songs, albums and artists, respectively. The queries not contained in our knowledge tree are removed, so that each potential music object has corresponding structure information in the domain knowledge. Then, we submit the valid queries to Google to obtain some related web pages, and crawl pages by using them as seeds. Among the crawled pages, only those that contain hyperlinks pointing to music files (with special extensions, such as rm, mp3, wma) are labeled as music object pages.
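The labeling rule above, which keeps a crawled page only if it links to a file with a music extension, can be sketched with the Python standard library as follows. This is an illustrative sketch rather than the crawler used in the experiments; the names `LinkCollector`, `music_links` and `is_music_object_page` are hypothetical, and only the three extensions named in the text are checked.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

MUSIC_EXTENSIONS = (".rm", ".mp3", ".wma")  # the extensions named in the text


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def music_links(html: str) -> List[str]:
    """Return hyperlinks whose URL path ends with a music file extension."""
    parser = LinkCollector()
    parser.feed(html)
    return [h for h in parser.links if urlparse(h).path.lower().endswith(MUSIC_EXTENSIONS)]


def is_music_object_page(html: str) -> bool:
    """A page is labeled as a music object page if it links to at least one music file."""
    return bool(music_links(html))
```

Checking the parsed URL path rather than the raw string keeps query strings such as `?id=3` from hiding the extension.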
Table 2 shows the details of our testing music objects. A 2-layer neighborhood graph is crawled for each music object. We do not consider 3-step pages (in the third layer), since the number of web pages increases explosively while a majority of the pages are irrelevant to the target music object. We also limit the total number of pages to under 500 in neighborhood graph building, in order to limit the computation.

Table 2. The details of music objects in the testing set
Object Type    #Object    #Average pages
Artist
Album
Song

Finally, the ground truth is manually annotated for evaluation.

6.2 Overall Performance
In the first experiment, we evaluate the overall performance of the proposed approach on the testing song objects. In summary, our approach is configured with the following modules: domain knowledge representation using Layered LSI spaces, neighborhood graph building and pruning, and hierarchical web object matching. However, the performance is also related to two parameters in the construction of the neighborhood graph: the graph layer and the cutoff percent.

Figure 6 illustrates the system performance (P@5) on song object indexing, for various graph layers and cutoff percents. In our experiments, the graph layer is chosen as 1 or 2, due to the explosive number of irrelevant 3-step pages; and the cutoff percent ranges from 25% to 100%, with an interval of 5%.

Figure 6. Overall performance with respect to various graph layers and cutoff percents

From Figure 6, it can be seen that more than 80% of the web objects can be correctly found in the returned top 5 knowledge documents, with a 60% cutoff in the 1-layer graph and 85% in the 2-layer graph. It can also be seen that, as the cutoff percent increases, the performance first increases and then decreases. This is reasonable: when the cutoff percent is small, noisy information mainly affects the performance, while when the cutoff percent is large, relevant information may also be filtered out so that the precision decreases. Moreover, the best performance using the 2-layer graph is 4% better than that of the 1-layer graph.
However, a majority (85%) of the pages in the 2-layer neighborhood graph are considered as noise and removed. Although this means some waste in page crawling and parsing, in the following experiments we still use the 2-layer graph (including the container, 1-step and 2-step pages) with the optimal cutoff of 85% as our default setting, in order to get higher performance.

In our experiments, we also find that the performance of our approach depends heavily on the descriptive information of the song object. In order to look into the performance on different testing song objects, we manually classify them into three sets: self-descriptive, multi-page descriptive and non-descriptive objects. A self-descriptive object contains sufficient information in its container page for a human to tell its structure properties; a multi-page descriptive object has to be identified from the container combined with the neighborhood pages; and a non-descriptive object does not contain sufficient discriminative information in its neighborhood graph.

Figure 7. Precision curves of different sets of song objects

Figure 7 illustrates the performance for the different sets of song objects. From the figure, we find that the performance on the self-descriptive set first increases and then stays almost the same as the cutoff percent grows, while the performance on the multi-page descriptive set has an increasing phase and a decreasing phase, with the optimal cutoff equal to 85%. This reflects the trade-off between the noisy information and the complementary information contained in the neighborhood pages. However, the performance on the non-descriptive set is always poor. This is intuitive, since no sufficient information is provided in the neighboring pages.

6.3 Look Into Each Module
The above section evaluates the overall performance of our proposed approach. In this section, we evaluate the effectiveness of each module, corresponding to each step in Section 2, including Layered LSI spaces for domain knowledge, the neighborhood graph, and confidence propagation.

6.3.1 Domain Knowledge with Layered LSI Indexing
In our approach, domain knowledge is used, and Layered LSI spaces are built to represent it, since different layers of domain knowledge usually have different meanings and term spaces. In order to evaluate the effectiveness of domain knowledge and Layered LSI, an experiment is designed to compare methods with different usages of knowledge in the module of neighborhood graph pruning (keeping the other modules the same). The compared methods include 1) Layered LSI on domain knowledge, 2) one-layer LSI on domain knowledge, and 3) no domain knowledge (that is, comparing pages in the neighborhood graph directly [1]).

The performance comparison of these three approaches is shown in Figure 8. The performance of Layered LSI indexing is clearly the best. It is about 5% better than that without LSI, and about 3% better than one-layer indexing.
This not only indicates that knowledge is helpful in web object representation, but also that layered indexing can obtain better results.

Figure 8. Precision comparison of different indexing approaches

6.3.2 Neighborhood Graph for Object Description
In Section 4, the neighborhood graph is built to enrich the description of the target web object, since the container page usually does not provide sufficient discriminative information. It is especially suitable for multi-page descriptive objects. Figure 9 illustrates the performance of multi-page descriptive object indexing, comparing 1) a 1-layer neighborhood graph (with the optimal cutoff of 60%), 2) a 2-layer neighborhood graph (with the optimal cutoff of 85%), and 3) no neighborhood graph (only the container page).

Figure 9. Performance comparisons with/without neighborhood graph for multi-page descriptive song objects

It can be seen that, when neighborhood pages are considered, the performance is dramatically improved. For example, P@1 and P@5 are almost doubled after using the 1-layer graph or the 2-layer graph.

6.3.3 Confidence Propagation for Object Matching
In Section 5, confidence propagation is performed to re-score the similarity between a candidate document and the target web object. After propagation, the confidence of a candidate document is verified by confirming with its relatives, and consequently better results are obtained. Figure 10 illustrates the performance comparison with and without confidence propagation in the module of web object matching.

Figure 10. Performance comparisons with/without confidence propagation
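The confidence propagation of equation (6) in Section 5.1, whose effect is compared here, can be sketched on a small tree as follows. This is a minimal sketch under stated assumptions: the knowledge base is a tree (each document has at most one parent), a leaf document contributes only its own confidence s(r), and the class `Node` and the function names are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A knowledge document with its original confidence s(r)."""
    name: str
    s: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, s: float) -> "Node":
        child = Node(name, s, parent=self)
        self.children.append(child)
        return child


def top_down(node: Optional[Node]) -> float:
    """D(r): sum of original confidences on the unique path from the root down to r."""
    return 0.0 if node is None else node.s + top_down(node.parent)


def bottom_up(node: Node) -> float:
    """U(r): own confidence plus the children's propagated confidence divided by NC(r)."""
    if not node.children:  # assumption: a leaf contributes only s(r)
        return node.s
    return node.s + sum(bottom_up(c) for c in node.children) / len(node.children)


def propagated_score(candidate: Node) -> float:
    """S(r_TL) = s(r_TL) + D(parent) + sum over children of U(child) / NC(r_TL)."""
    return bottom_up(candidate) + top_down(candidate.parent)


# Two candidate albums under one artist; album B has the higher original score.
artist = Node("artist", 0.5)
album_a = artist.add("album A", 0.4)
album_b = artist.add("album B", 0.6)
album_a.add("song 1", 0.9)
album_a.add("song 2", 0.1)
album_b.add("song 3", 0.2)
```

In this toy tree, album A overtakes album B after propagation because its songs match the web object better on average, while dividing by the number of children keeps a large sub-tree from dominating by size alone.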

It can be seen that, with confidence propagation, precision is improved by about 5%-7%. This indicates that confidence propagation is also very helpful in web object indexing.

6.4 Performance on Different Music Objects

To explore the performance on music objects that belong to the same domain but are located in different layers of the knowledge database, we also evaluate the performance on album objects and artist objects. The detailed comparisons are illustrated in Figure 11.

Figure 11 Performance on music objects in different layers

The performance on artist and album objects is much better than that on song objects, especially for P@1. There are two reasons. First, due to the distinct page properties in the music domain, many more terms are usually used to describe an artist or album than a song. In other words, much more information is available to distinguish artists and albums. Second, the information in a song page is usually not discriminative. After combining with neighborhood pages, the description of a song object overlaps heavily with those of other songs from the same album. As a result, it is usually difficult to discriminate between songs from the same album, while it is relatively easy to identify an album or artist object. Therefore, the P@1 of song objects is much lower than that of albums and artists; for P@10, however, the values are very similar, since an album usually has only about 10 songs.

7. APPLICATION: MUSIC SEARCH

The proposed web object indexing technology can be used in various domains. To illustrate further use of indexed web objects, an example music search prototype is presented in this section. The prototype is designed to organize the numerous music resources dispersed on the Surface Web. With the structured indices obtained using our approach, the search results are organized according to their implicit structure. Moreover, all the related information discovered from the knowledge documents can also be associated with the search results and presented to users. Figure 12 illustrates a snapshot of the prototype system.
With the query "yesterday", the returned results are organized according to their artist, album and year. As shown in the left panel of Figure 12, the different versions of "Yesterday" and similar songs are clustered by artist, such as the Beatles, Boyz II Men and Cilla Black, and then further classified into albums, such as Sessions and Yesterday and Today. As our experiments show, although it is difficult to identify a song, the accuracy of album and artist identification is relatively high. Therefore, organizing the search results by artist and album is feasible with our proposed approach. Moreover, the related information of the current category, which is obtained from the matched knowledge documents, is shown in the right panel of Figure 12, including song details, album information and track list. The indexed web objects, which are obtained from the Surface Web and associated with the current category, are also listed in the Web Objects block. The prototype provides more related and better organized information than contemporary web music search engines. The obtained structure is more meaningful in the domain than those obtained by traditional semantic clustering approaches [14].

8. RELATED WORK

Previous work on object-oriented indexing can be traced back to the image retrieval framework proposed by Harmandas et al. [4], which presents a model for Web image retrieval. The model is based on combining the text content and hypertext in a neighborhood graph. However, their definition of objects is limited to web images, and they did not consider finding the underlying structures of web objects either. There are also some efforts to obtain music structure information from Deep Web sites. For example, Windows Media Player 10 can extract the album information of a song from MSN Music, based on other metadata embedded in the ID3 tag.

Using a connected cluster of web pages to improve web search has also been addressed in some work.
Kleinberg's HITS [2] is an important early effort that uses link analysis to rank documents, although it depends only on the in/out-degrees of links. Bharat and Henzinger [1] identified a problem of HITS, topic drift, and proposed to resolve it by content analysis in the local graph. This modification achieved remarkably better precision in topic distillation. By manipulating the weights of pages, Chang et al. [3] created customized authority lists by adding more weight to the documents that users are interested in. More recently, researchers proposed using connectivity analysis to improve the term weighting scheme in a hyperlinked environment [5]. However, none of them considered using domain knowledge to improve the combination of link analysis and content analysis.

Similar to our work (matching a web object representation against a knowledge database), "DB+IR" [10][11][12] was proposed in the database field to search structured databases with unstructured queries. Unlike traditional SQL (Structured Query Language) technologies, some "DB+IR" works regard database records as plain text and use IR technologies to index and search the database. Some researchers have also proposed improving search precision by using the structure of the database. An example is XSEarch [12], which utilizes the structure of the XML DOM tree and replaces the traditional IDF with ILF (Inverse Leaf Frequency) to penalize common terms in leaf nodes.

9. CONCLUSION

In this paper, we propose a novel approach to indexing web objects by using the corresponding domain knowledge. Although our method is applied and evaluated only in the music domain in this paper, we believe that it is feasible for general web object indexing. We summarize our approach as follows:

1. Domain knowledge from traditional databases or Deep Web sites can be used to index web objects and help build a domain-specific structure. This method overcomes the disadvantages of traditional approaches to hierarchical clustering and automatic ontology building.

2. Indexing a hierarchical knowledge database layer by layer is a good strategy, which emphasizes each layer's concepts and suppresses noise effectively. Moreover, this method enables us to compare objects from various aspects (various attributes of an object). It is an improvement over traditional approaches that use knowledge.

3. Confidence propagation can effectively utilize the relationships among the nodes of the knowledge tree. Each node's confidence can be amplified or suppressed by the confidences of its relatives. It also provides an enhancement for traditional IR methods.

In future work, we will apply our approach to more domains to build efficient systems. We will also develop a better approach to web object detection, and use web information extraction technologies to obtain better descriptions of the web objects.

10. REFERENCES

[1] K. Bharat and M. R. Henzinger. Improved Algorithms for Topic Distillation in a Hyperlinked Environment. In Proceedings of the SIGIR Conference on Information Retrieval, 1998.
[2] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 1999.
[3] H. Chang, D. Cohn, and A. McCallum. Creating customized authority lists. In Proceedings of the Seventeenth International Conference on Machine Learning.
[4] V. Harmandas, M. Sanderson, and M. D. Dunlop. Image retrieval by hypertext links. In Proceedings of the SIGIR Conference on Information Retrieval (SIGIR 97), 1997.
[5] K. Sugiyama, K. Hatano, M. Yoshikawa, and S. Uemura. Refinement of TF-IDF Schemes for Web Pages using their Hyperlinked Neighboring Pages. In Proceedings of the Fourteenth Conference on Hypertext and Hypermedia, 2003.
[6] S. Deerwester, S. Dumais, et al. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 1990.
[7] WordNet 2.0: A lexical database for the English language.
[8] All Music Guide.
[9] B. Michael, D. Theresa, et al. SVDPACKC (Version 1.0) User's Guide. Technical Report UT-CS, University of Tennessee.
[10] B. Y. Ricardo and M. P. Consens. The continued saga of DB-IR integration. The 30th International Conference on Very Large Databases (VLDB) Tutorial, 2004.
[11] V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. In Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002.
[12] S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A Semantic Search Engine for XML. In Proceedings of the 29th International Conference on Very Large Databases (VLDB), 2003.
[13] SVDPACK.
[14] B. Liu, C. W. Chin, and H. T. Ng. Mining Topic-Specific Concepts and Definitions on the Web. In Proceedings of the Twelfth International World Wide Web Conference (WWW2003), May 2003.
[15] D. Cai, S. Yu, J. Wen, and W.-Y. Ma. VIPS: a Vision-based Page Segmentation Algorithm. MSR-TR.

Figure 12 A snapshot of a prototype music search system, utilizing the indexing created by our proposed approach
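The artist/album organization visible in the left panel of Figure 12 can be sketched in a few lines. This is a hedged illustration only: it assumes each indexed web object already carries the artist and album attributes extracted from its matched knowledge document. The record fields (`url`, `artist`, `album`), the function `group_results`, and the sample URLs are hypothetical and are not part of the prototype.

```python
from collections import defaultdict
from typing import Dict, List


def group_results(results: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Group web object URLs by artist, then by album."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        grouped[r["artist"]][r["album"]].append(r["url"])
    # Convert the nested defaultdicts to plain dicts so that later lookups
    # of missing keys raise KeyError instead of silently creating entries.
    return {artist: dict(albums) for artist, albums in grouped.items()}


# Made-up sample records; the URLs and the placeholder album are illustrative.
results = [
    {"url": "http://example.com/a.mp3", "artist": "The Beatles",
     "album": "Yesterday and Today"},
    {"url": "http://example.com/b.mp3", "artist": "The Beatles",
     "album": "Sessions"},
    {"url": "http://example.com/c.mp3", "artist": "Cilla Black",
     "album": "(album unknown)"},
]

tree = group_results(results)
for artist, albums in tree.items():
    print(artist, sorted(albums))
```

Grouping by artist first and album second mirrors the observation in Section 6.4 that artist and album identification is more reliable than song identification.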

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Semantic Link Analysis for Finding Answer Experts *

Semantic Link Analysis for Finding Answer Experts * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology [email protected] Brook Wu New Jersey Insttute of Technology [email protected] ABSTRACT Ths

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce WSE-ntegrator: An Automatc ntegrator of Web Search nterfaces for E-Commerce Ha He, Wey Meng Dept. of Computer Scence SUNY at Bnghamton Bnghamton, NY 13902 {hahe,meng}@cs.bnghamton.edu Clement Yu Dept.

More information

Design and Development of a Security Evaluation Platform Based on International Standards

Design and Development of a Security Evaluation Platform Based on International Standards Internatonal Journal of Informatcs Socety, VOL.5, NO.2 (203) 7-80 7 Desgn and Development of a Securty Evaluaton Platform Based on Internatonal Standards Yuj Takahash and Yoshm Teshgawara Graduate School

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

Using Content-Based Filtering for Recommendation 1

Using Content-Based Filtering for Recommendation 1 Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, [email protected] 2 Unversty of

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS Yunhong Xu, Faculty of Management and Economcs, Kunmng Unversty of Scence and Technology,

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of

More information

Updating the E5810B firmware

Updating the E5810B firmware Updatng the E5810B frmware NOTE Do not update your E5810B frmware unless you have a specfc need to do so, such as defect repar or nstrument enhancements. If the frmware update fals, the E5810B wll revert

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

http://wrap.warwick.ac.uk

http://wrap.warwick.ac.uk Orgnal ctaton: Crstea, Alexandra I. and De Moo, A. (2003) Desgner adaptaton n adaptve hypermeda authorng. In: Internatonal Conference on Informaton Technology : Codng and Computng (ITCC 2003), Las Vegas,

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

An Approach to Automatically Constructing Domain Ontology 1

An Approach to Automatically Constructing Domain Ontology 1 An Approach to Automatcally Constructng Doman Ontology 1 Tngtng He 1 2 3 Xaopeng Zhang 1 3 Xnghuo Ye 1 3 1 Department of Computer Scence, Huazhong ormal Unversty 430079 Wuhan, Chna 2 Software College of

More information

Overview of monitoring and evaluation

Overview of monitoring and evaluation 540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

A Fast Incremental Spectral Clustering for Large Data Sets

A Fast Incremental Spectral Clustering for Large Data Sets 2011 12th Internatonal Conference on Parallel and Dstrbuted Computng, Applcatons and Technologes A Fast Incremental Spectral Clusterng for Large Data Sets Tengteng Kong 1,YeTan 1, Hong Shen 1,2 1 School

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

How To Predct On The Web For Hfmd

How To Predct On The Web For Hfmd Proceedngs of the Twenty-Second Internatonal Jont Conference on Artfcal Intellgence Predctng Epdemc Tendency through Search Behavor Analyss Danqng Xu, Yqun Lu, Mn Zhang, Shaopng Ma, Anq Cu, Lyun Ru State

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

How To Analyze News From A News Report

How To Analyze News From A News Report , pp. 385-396 http://dx.do.org/10.14257/jmue.2014.9.11.37 Topc Sentment Analyss n Chnese News Ouyang Chunpng, Zhou Wen +, Yu Yng, Lu Zhmng and Yang Xaohua School of Computer Scence and Technology, Unversty

More information

Exploiting Recommendation on Social Media Networks

Exploiting Recommendation on Social Media Networks Internatonal Journal of Scence and Research IJSR) ISSN Onln: 2319-7064 Index Coperncus Value 2013): 6.14 Impact Factor 2013): 4.438 Explotng Recommendaton on Socal Meda Networs Swat A. Adhav 1, Sheetal

More information

Fast Fuzzy Clustering of Web Page Collections

Fast Fuzzy Clustering of Web Page Collections Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,

More information

Approaches to Text Mining for Clinical Medical Records

Approaches to Text Mining for Clinical Medical Records Approaches to Text Mnng for Clncal Medcal Records Xaohua Zhou and Hyol Han College of Informaton Scence and Technology Drexel Unversty Phladelpha, PA 19104 [email protected] [email protected] Isaac

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1 Send Orders for Reprnts to [email protected] The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,

More information

A Data Mining-Based OLAP Aggregation of. Complex Data: Application on XML Documents

A Data Mining-Based OLAP Aggregation of. Complex Data: Application on XML Documents 1 Runnng head: A DATA MINING-BASED OLAP AGGREGATION A Data Mnng-Based OLAP Aggregaton of Complex Data: Applcaton on XML Documents Radh Ben Messaoud, Omar Boussad, Sabne Loudcher Rabaséda {rbenmessaoud

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

























