An Approach to Automatically Constructing Domain Ontology 1
|
|
|
- Ernest Walsh
- 9 years ago
- Views:
Transcription
1 An Approach to Automatcally Constructng Doman Ontology 1 Tngtng He Xaopeng Zhang 1 3 Xnghuo Ye Department of Computer Scence, Huazhong ormal Unversty Wuhan, Chna 2 Software College of Tsnghua Unversty Beng, Chna 3 atonal Language Resources Montor & Research Center (network meda) Wuhan, Chna [email protected] [email protected] [email protected] Abstract. In ths paper we present an approach to mnng doman-dependent ontologes usng term extracton and relatonshp dscovery technology. There are two man nnovatons n our approach. One s extractng terms usng log-lkelhood rato, whch s based on the contrastve probablty of term occurrence n doman corpus and background corpus. The other s fusng together nformaton from multple knowledge sources as evdences for dscoverng partcular semantc relatonshps among terms. In the experment, we also mprove the tradtonal k-medods algorthm for mult-level clusterng. We have appled our approach to produce an ontology for the doman of computer scence and obtaned promsng results. 1 Introducton Ontologes have become an mportant means for structurng knowledge and buldng knowledge-ntensve systems. Ontologes have shown ther usefulness n applcaton areas such as ntellgent nformaton ntegraton, nformaton retreval and natural language processng, to name but a few. For ths purpose, efforts have been made to facltate the ontology engneerng process, n partcular the acquston of ontologes from doman texts [1]. Constructng an ontology s an extremely laborous effort. Even wth some reuse of core knowledge from an Upper Model, the task of creatng an ontology for a partcular doman has a hgh cost, ncurred for each new doman. Tools that could automate, or sem-automate, the constructon of ontologes for dfferent domans could dramatcally reduce the knowledge creaton cost. One approach to developng such tools s to rely on nformaton mplct n collectons of texts n a partcular doman. If t were possble to automatcally extract terms and ther semantc relatons from the text corpus, doman ontology could be bult convenently. Ths would be more cost-effectve than havng a human develop the ontology from scratch [2][3]. Doman specfc ontologes are useful n systems nvolved wth artfcal reasonng, and nformaton retreval. Ontologes gve such systems a vocabulary of terms, and concepts relatng one term to another. In ths paper, we present a method that automatcally mnng an ontology from any large corpus n a specfc doman, to support data ntegraton and nformaton retreval tasks. The nduced ontology conssts of doman concepts only related by parent-chld lnks, not ncludng more specalzed relatons. Our approach s comprsed of two man phases: term extracton and relatonshp dscovery. In the former phase, meanngful terms are extracted usng LLR formula. In the latter one, parent-chld relatons among terms are nduced through multlevel clusterng. The remander of the paper s as follows. In Secton 2, we wll dscuss the related works n ths feld. In Secton 3, we wll gve an overvew of the overall system archtecture and explan the means and progress of ontology constructon. Subsequently, an example wll show some promsng results we have 1 The paper s supported by atonal atural Scence Foundaton of Chna (SFC), (Grant o ; Grant o Grant o ;); Mnstry of educaton of Chna, Research Proect for Scence and technology, (Grant o ). 150
2 obtaned when applyng our mechansms for mnng ontologes from text, together wth our analyss. Ths wll be presented n Secton 4. In Secton 5, we conclude and menton our future work. 2 Prevous Work The exstng approaches to ontology nducton nclude those that start from structured data, mergng ontologes or database schemas (Doan et al. 2002). Other approaches use natural language data, sometmes ust by analyzng the corpus (Sanderson and Croft 1999), (Caraballo 1999) or by learnng to expand Wordet wth clusters of terms from a corpus, e.g., (Gru et al. 2003). Informaton extracton approaches that nfer labeled relatons ether requre substantal hand-created lngustc or doman knowledge, e.g., (Craven and Kumlen 1999) (Hull and Gomez 1993), or requre human-annotated tranng data wth relaton nformaton for each doman (Craven et al. 1998). For automatc ontology constructon, Govnd and Chakravarth have presented n 2001 an approach that extracts ontology from text documents by Sngular Value Decomposton (SVD), whch s a pure statstcal analyss method, as compared to heurstc and rule based methods. They adopt Latent Semantc Indexng (LSI), whch attempts to catch term-term statstcal references by replacng the document space wth lower dmensonal concept space. Ther method s convnced of ts smplcty because t s based on farly precse mathematcal foundaton. It s effectve but lmted wth precson. What s more, t doesn t fgure out the exact relatons among terms [4]. In year of 2001, Dekang Ln and Patrck Pantel proposed a method for doman concepts dscovery based on a clusterng algorthm called CBC (Clusterng by Commttee). They generally regard a concept as a cluster of terms. It ust deals wth only one aspect of the whole progress of ontology nducton [5]. 3 Automatc Ontology Constructng 3.1 Archtecture An overall archtecture for doman-ndependent ontology nducton s shown n Fgure 1. The documents of doman corpora are preprocessed to remove segments such as pctures and specfc symbols and then flter the stopwords. ext, terms are extracted based on word segmentaton and syntactc taggng. Subsequently semantc relatons between pars of terms are derved usng multlevel clusterng wth evdences from multple knowledge sources. Data Cleanup Term Dscovery Doman Corpus Removng Symbols Flterng Stopwords Parsng Taggng Term Scorng Term Extracton Relatonshp Dscovery Ontology Output Mult-level Clusterng Smlarty Computng Chnese Howet Fg. 1. System Archtecture 151
3 3.2 Term Dscovery In ths secton, our goal s to dentfy doman-relevant terms from a collecton of doman specfc texts. For term extracton, there exst several approaches. One s based on PAT-TREE, whch has the advantage n extractng terms (phrases) of any length because t could avod word segmentaton for Chnese. But t needs a large amount of documents to acheve excellent precson. Another approach to term extracton s C/C-Value method proposed by Frantz and Ananadou (1999), whch combnes lngustcs and statstcs methods and makes a progress. But t has lmtaton because ts lngustcs knowledge s only for Englsh. In ths paper, we propose a promsng approach to extractng doman-relevant terms. Terms are scored for doman-relevance based on the assumpton that f a term occurs sgnfcantly more n a doman corpus than n a background corpus, then the term s clearly doman relevant. As an example, n Table 1 we compare the number of documents contanng the terms 算 法 (algorthm), 内 存 (memory), 路 由 器 (Router) and 数 据 库 (database) n a doman-dependent corpus of computer scence ncludng 200 Chnese perodcal papers, compared to a larger Modern Chnese Corpus as the background corpus. Table 1. Term IDF n doman and background corpora 算 法 内 存 路 由 器 数 据 库 Total Doman Corpus Background Corpus Dfference We can observe from Table 1 that the terms 算 法 (algorthm), 内 存 (memory), 路 由 器 (Router) and 数 据 库 (database) occur sgnfcantly more n the doman corpus than n the Background corpus. We consder they are doman-dependent. To estmate the doman relevancy of a term, we use the log-lkelhood rato (LLR) gven by -2 log2 ( Ho (p;k1,n1,k2,n2 ) / Ha ( p1,p2;n1,k1,n2,k2 ) ) (1) LLR measures the extent to whch a hypotheszed model of the dstrbuton of cell counts, Ha, dffers from the null hypothess, Ho (namely, that the percentage of documents contanng ths term s the same n both corpora). We use a bnomal model for Ho and Ha. Here, p=(k1+k2)/(n1+n2), p1=k1/n1, p2=k2/n2, k1 s the number of documents contanng the term n the doman corpus, k2 s the number of documents contanng the term n the background corpus, n1 s the total number of documents n the doman corpus, n2 s the total number of documents n the background corpus [6]. 3.3 Relatonshp Dscovery In ths secton, we try to mne the parent-chld lnks among terms usng multlevel clusterng based on term smlarty. We fuse together nformaton from multple knowledge sources as evdences for partcular semantc relatonshps among terms. For clusterng we adopt mproved k-medods algorthm, whch s flat and effectve. The process of nducng relatons s as follows. Frst the clusterng algorthm s used to obtan top-level clusters. Subsequently, for each top-level cluster, we use the clusterng algorthm agan to gan clusters of second level. By analogy, we can fnd multlevel clusters that mply parent-chld relatonshps among terms. In experment the depth of clusterng level s restrcted. 152
4 3.3.1 Term Smlarty Computaton The preprocessed documents are analyzed and the frequency matrx known as Term-Document matrx s produced as result of the analyss. For a Term-Document matrx M[m][n], each term can be represented as the followng vector: T r = (M[][0], M[][1],, M[][k],, M[][n]) (2) We can compute the cosne smlarty between a par of terms gven by r r r r T T (3) cos( T, T ) = r r T T The cosne smlartes between several term pars are shown n Table 2. Table 2. Cosne smlartes of term pars Term1 Term2 Cosne Smlarty 缓 冲 区 (buffer) 磁 盘 (dsk) 缓 冲 区 (buffer) 主 存 (man store) 内 存 (memory) 磁 盘 (dsk) 服 务 器 (server) 网 络 (network) It can be observed that the cosne smlarty s lkely to satsfy the concurrent frequency of term pars. It reflects the context correlaton of terms but mght have a preudce aganst semantc correlaton among terms. For nstance, the cosne smlarty computed above between 磁 盘 (dsk) and 数 据 (data) s ust , whch devates from the experental value. In order to exact smlarty, we use Howet as another knowledge resource, whch s a Chnese thesaurus by Zhendong Dong and Qang Dong (2001). We combne the smlarty of terms n Howet as a supplement wth the cosne smlarty to estmate the correlaton of term pars. The Howet smlarty that nvolves unknown words s assgned to 0. Table 3 shows some Howet smlartes of term pars. Table 3. Howet smlartes of term pars Term1 Term2 Howet Smlarty 缓 冲 区 (buffer) 磁 盘 (dsk) 缓 冲 区 (buffer) 主 存 (man store) 内 存 (memory) 磁 盘 (dsk) 服 务 器 (server) 网 络 (network) The fnal formula of smlarty computaton s gven by r r r r r r sma ( T, T ) + α smb ( T, T ) (4) sm ( T, T ) = 2 153
5 r r Where sma T, T ) ( r r s the cosne smlarty and smb T, T ) ( s the Howet smlarty; Meanwhle, α s an adustment argument. The adusted smlarty between term pars s gven n Table 4 (α =0.5). Table 4. Adusted smlartes of term pars Term1 Term2 Smlarty 缓 冲 区 (buffer) 磁 盘 (dsk) 缓 冲 区 (buffer) 主 存 (man store) 内 存 (memory) 磁 盘 (dsk) 服 务 器 (server) 网 络 (network) Clusterng Algorthm In order to make t converge faster and result n better clusters n accordance wth the orgnal dstrbuton of terms, we mproved the tradtonal k-medods algorthm. When decdng the new center of a cluster, we frst choose top-p terms closest to the old center n the cluster and take the term closest to the mean of the p terms as the new center. The mproved algorthm s descrbed as follows: 1. Choose m terms as the centers of clusters randomly: C1, C 2,, C,, ; C m 2. Compute the smlarty for each term to each cluster center. Assgn the term nto the closest cluster. 3. Determne the new center of each cluster as follows: Compute the average smlarty of terms n cluster usng the formula: 1 avgsm[ ] = n n k = 1 sm( T k, C ) Where n s the number of terms n cluster. Pck out p terms most closest to the cluster center decded by avgsm [ ] p = m max_ avgsm Where max_ avgsm s the maxmum of m average smlartes for m clusters. Compute the mean of the above p terms and choose the term closest to t as the new center of cluster. 4. Go to step 2 only f m cluster centers s not stable. 5. End and result n m clusters. (5) (6) 4 Experments and Evaluaton We have appled our approach to produce ontology n computer scence doman. The doman corpus contans 200 Chnese theses (3.12 M) retreved from professonal publcatons. The background corpus we used s the Modern Chnese Corpus ( ) whch conssts of documents (33.5 M). 154
6 4.1 Term Dscovery In experment, we have compared the effect of the two scorng methods LLR and TF.IDF. Table 5 shows the scores of some terms n relatve percentage. The boldfaced terms s dentfed by LLR to be doman-relevant. It can be nferred from the experment results that LLR excels TF.IDF. Table 5. Relatve scores of both LLR and TF.IDF Term Doman DF Background DF LLR score TF.IDF score 路 由 器 总 线 控 制 器 端 口 缓 存 请 求 线 路 质 量 屏 蔽 A term lst of 286 doman-relevant terms (key terms) has been manually constructed to support the subsequent evaluaton. We estmate the results normally based on Recall, Precson and F-measure, whch are defned as follows. Recall(R) The rato of correct terms number to key terms number n manual lst gven by R = correct key (7) Precson(P) The rato of correct terms number to extracted terms number gven by P = correct correct + ncorrect (8) F-measure(F) F-measure s defned as a combnaton of recall and precson: F = 2 RP (9) R + P These three values vary dependng on the threshold of LLR. Fgure 2 shows the varety of F-measure, recall and precson along wth LLR. 155
7 Fg. 1. F-measure, recall and precson by varyng the threshold of LLR 4.2 Relatonshp Dscovery Accordng to fgure 2, we chose 94 as the threshold of LLR and obtaned 216 doman-relevant terms. The depth of clusterng level was restrcted to 2. We have ganed 5 top-level clusters and 15 second level clusters. The precson of the top-level clusters acheved 76.7%. The average precson of the second-level clusters acheved 70.3%. The parent-chld relatonshp among terms and clusters s shown n fgure 3. Each cluster s named by the most frequent term n t. 网 络 路 由 器 路 径 主 干 网 速 率 中 继 广 域 网 延 迟 分 组 交 换 交 换 机 局 域 网 交 换 网 接 入 网 策 略 关 网 键 词 网 格 网 络 拓 扑 骨 干 网 络 搜 索 搜 索 引 擎 网 页 脚 本 主 机 万 维 网 网 站 网 址 带 宽 阻 塞 端 口 监 听 信 息 网 服 务 器 防 火 墙 流 量 计 算 中 心 域 名 主 页 服 务 器 拷 贝 电 子 邮 件 邮 件 邮 箱 日 志 数 据 流 语 音 共 享 信 息 港 智 能 性 信 息 港 因 特 网 智 能 化 电 话 网 宽 带 调 制 解 调 频 率 段 数 字 网 计 算 机 进 程 计 算 机 体 系 结 构 总 线 硬 件 接 口 中 断 控 制 器 寄 存 器 运 算 时 间 段 数 据 共 享 实 时 系 统 调 度 者 进 程 子 系 统 分 系 统 硬 盘 计 算 机 芯 片 大 型 机 巨 型 磁 盘 软 盘 光 盘 主 存 吞 吐 软 驱 外 存 内 存 存 储 器 存 储 缓 冲 缓 存 输 出 驱 动 驱 动 器 适 配 器 网 卡 参 数 矩 阵 数 组 指 针 字 符 串 参 数 静 态 访 问 编 译 器 变 量 字 符 标 量 流 程 图 流 程 结 构 图 逻 辑 图 运 算 符 编 程 控 制 台 编 程 源 程 序 程 序 包 子 程 序 程 序 性 源 代 码 向 量 编 译 调 试 调 试 器 程 序 员 瓶 颈 数 据 包 程 地 址 输 入 提 示 符 键 盘 终 止 符 示 意 图 主 程 序 操 作 符 空 格 数 据 位 读 取 存 取 缓 冲 区 地 址 计 时 器 序 程 序 编 程 者 查 询 分 析 器 数 据 库 数 据 表 冗 余 备 份 标 识 符 计 数 器 初 始 化 程 序 语 句 表 达 式 操 作 数 图 形 切 换 图 形 界 面 请 求 文 本 框 图 标 按 钮 视 图 标 识 号 软 件 软 件 软 件 包 构 建 复 用 重 构 组 件 构 件 重 用 性 模 块 封 装 性 封 装 继 承 私 有 实 例 对 象 信 息 信 息 加 密 令 牌 信 息 论 用 户 名 口 令 登 录 信 息 信 道 二 进 制 编 码 解 码 局 部 译 码 补 码 误 码 纠 错 空 串 算 算 法 队 列 索 引 字 节 标 记 字 段 算 法 二 分 法 数 据 项 回 溯 堆 栈 节 点 调 度 优 先 级 虚 拟 法 执 行 交 换 轮 转 流 水 号 时 序 线 性 执 行 标 识 符 Fg. 2. Parent-chld relatonshps among terms 156
8 5 Concluson In ths paper we present an approach to automatcally mnng doman-dependent ontologes from corpora based on term extracton and relatonshp dscovery technology. There are two man nnovatons n our approach. One s extractng terms usng log-lkelhood rato, whch s based on the contrastve probablty of term occurrence n doman corpus and background corpus. The other s fusng together nformaton from multple knowledge sources as evdences for dscoverng partcular semantc relatonshps among terms. In our experment, we mprove the tradtonal k-medods algorthm for mult-level clusterng. We have appled our approach to produce an ontology for the doman of computer scence and obtaned promsng results. In the future work, we wll consder developng heurstc methods or rules to label the exact relatons among terms and fndng out a more effectve ontology evaluaton methodology. Reference 1. Indereet Man.Automatcally Inducng Ontologes from Corpora. Proceedngs of CompuTerm 2004: 3rd Internatonal Workshop on Computatonal Termnology, OLIG'2004, Geneva. 2. A. Maedche and S. Staab. Mnng ontology from text. 12th Internatonal Workshop on Knowledge Engneerng and Knowledge Management,2000 (EKAW'2000). 3. A. Maedche and S. Staab. The TEXT-TO-OTO Ontology Learnng Envronment. Software Demonstraton at ICCS-2000-Eght Internatonal Conference on Conceptual Structures. August 14-18, 2000, Darmstadt, Germany. 4. Dr.Sadanand Srvastava, Dr.James Gl de Lamadrd. Extractng an ontology from a document usng Sngular Value Decomposton. ADMI Ln, D. and Pantel, P Inducton of semantc classes from natural language text. In Proceedngs of SIGKDD-01. pp SanFrancsco, CA. 6. Dunnng.T. Accurate Methods for the Statstcs of Surprse and Concdence, Computatonal Lngustcs, 19(1); 61-74, March Chakravarth S Velvadapu, Document Ontology Extractor, Appled research n Computer scence, Fall A. Maedche and S. Staab. Dscoverng conceptual relatons from text. In Proceedngs of ECAI IOS Press, Amsterdam, D. Faure and C. edellec. A corpus-based conceptual clusterng method for verb frames and ontology acquston. In LREC workshop on adaptng lexcal and corpus resources to sublanguages and applcatons, Granada, Span, Doan, A., Madhavan, J., Domngs, P. and Halevy, A Learnng to Map between Ontologes on the Semantc Web. WWW
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis
The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.
Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION
Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble
What is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
Web Object Indexing Using Domain Knowledge *
Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct
Semantic Link Analysis for Finding Answer Experts *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG
1 Example 1: Axis-aligned rectangles
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton
Study on Model of Risks Assessment of Standard Operation in Rural Power Network
Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,
An Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure
Mining Multiple Large Data Sources
The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan
How To Analyze News From A News Report
, pp. 385-396 http://dx.do.org/10.14257/jmue.2014.9.11.37 Topc Sentment Analyss n Chnese News Ouyang Chunpng, Zhou Wen +, Yu Yng, Lu Zhmng and Yang Xaohua School of Computer Scence and Technology, Unversty
A neuro-fuzzy collaborative filtering approach for Web recommendation. G. Castellano, A. M. Fanelli, and M. A. Torsello *
Internatonal Journal of Computatonal Scence 992-6669 (Prnt) 992-6677 (Onlne) Global Informaton Publsher 27, Vol., No., 27-39 A neuro-fuzzy collaboratve flterng approach for Web recommendaton G. Castellano,
SIMPLE LINEAR CORRELATION
SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.
14.74 Lecture 5: Health (2)
14.74 Lecture 5: Health (2) Esther Duflo February 17, 2004 1 Possble Interventons Last tme we dscussed possble nterventons. Let s take one: provdng ron supplements to people, for example. From the data,
Assessing Student Learning Through Keyword Density Analysis of Online Class Messages
Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology [email protected] Brook Wu New Jersey Insttute of Technology [email protected] ABSTRACT Ths
Forecasting the Direction and Strength of Stock Market Movement
Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems
Context-aware Mobile Recommendation System Based on Context History
TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.12, No.4, Aprl 2014, pp. 3158 ~ 3167 DOI: http://dx.do.org/10.11591/telkomnka.v124.4786 3158 Context-aware Moble Recommendaton System Based on Context
Can Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang
Searching for Interacting Features for Spam Filtering
Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software
A Performance Analysis of View Maintenance Techniques for Data Warehouses
A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao
PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS
PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS Yunhong Xu, Faculty of Management and Economcs, Kunmng Unversty of Scence and Technology,
THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION
Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh
Performance Analysis and Coding Strategy of ECOC SVMs
Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School
Statistical algorithms in Review Manager 5
Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes
Using Content-Based Filtering for Recommendation 1
Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, [email protected] 2 Unversty of
Exploiting Recommendation on Social Media Networks
Internatonal Journal of Scence and Research IJSR) ISSN Onln: 2319-7064 Index Coperncus Value 2013): 6.14 Impact Factor 2013): 4.438 Explotng Recommendaton on Socal Meda Networs Swat A. Adhav 1, Sheetal
Research on Transformation Engineering BOM into Manufacturing BOM Based on BOP
Appled Mechancs and Materals Vols 10-12 (2008) pp 99-103 Onlne avalable snce 2007/Dec/06 at wwwscentfcnet (2008) Trans Tech Publcatons, Swtzerland do:104028/wwwscentfcnet/amm10-1299 Research on Transformaton
Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending
Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success
A DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul
WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce
WSE-ntegrator: An Automatc ntegrator of Web Search nterfaces for E-Commerce Ha He, Wey Meng Dept. of Computer Scence SUNY at Bnghamton Bnghamton, NY 13902 {hahe,meng}@cs.bnghamton.edu Clement Yu Dept.
Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)
Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton
Implementation of Deutsch's Algorithm Using Mathcad
Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"
Comparison of Domain-Specific Lexicon Construction Methods for Sentiment Analysis
, pp.152-156 http://d.do.org/10.14257/astl.2016.135.38 Comparson of Doman-Specfc Lecon Constructon Methods for Sentment Analyss Myeong So Km 1, Jong Woo Km 2,3 and Cu Jng 4 1 Department of Mathematcs,
Approaches to Text Mining for Clinical Medical Records
Approaches to Text Mnng for Clncal Medcal Records Xaohua Zhou and Hyol Han College of Informaton Scence and Technology Drexel Unversty Phladelpha, PA 19104 [email protected] [email protected] Isaac
Set. algorithms based. 1. Introduction. System Diagram. based. Exploration. 2. Index
ISSN (Prnt): 1694-0784 ISSN (Onlne): 1694-0814 www.ijcsi.org 236 IT outsourcng servce provder dynamc evaluaton model and algorthms based on Rough Set L Sh Sh 1,2 1 Internatonal School of Software, Wuhan
Performance Analysis of Energy Consumption of Smartphone Running Mobile Hotspot Application
Internatonal Journal of mart Grd and lean Energy Performance Analyss of Energy onsumpton of martphone Runnng Moble Hotspot Applcaton Yun on hung a chool of Electronc Engneerng, oongsl Unversty, 511 angdo-dong,
Luby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
Title Language Model for Information Retrieval
Ttle Language Model for Informaton Retreval Rong Jn Language Technologes Insttute School of Computer Scence Carnege Mellon Unversty Alex G. Hauptmann Computer Scence Department School of Computer Scence
Network Security Situation Evaluation Method for Distributed Denial of Service
Network Securty Stuaton Evaluaton Method for Dstrbuted Denal of Servce Jn Q,2, Cu YMn,2, Huang MnHuan,2, Kuang XaoHu,2, TangHong,2 ) Scence and Technology on Informaton System Securty Laboratory, Bejng,
8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by
6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng
RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.
ICSV4 Carns Australa 9- July, 007 RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) [email protected] Abstract
Discriminative Improvements to Distributional Sentence Similarity
Dscrmnatve Improvements to Dstrbutonal Sentence Smlarty Yangfeng J School of Interactve Computng Georga Insttute of Technology [email protected] Jacob Esensten School of Interactve Computng Georga Insttute
A Simple Approach to Clustering in Excel
A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa
PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12
14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed
1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.
HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher
Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy
Fnancal Tme Seres Analyss Patrck McSharry [email protected] www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton
Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1
Send Orders for Reprnts to [email protected] The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,
COMPUTER SUPPORT OF SEMANTIC TEXT ANALYSIS OF A TECHNICAL SPECIFICATION ON DESIGNING SOFTWARE. Alla Zaboleeva-Zotova, Yulia Orlova
Internatonal Book Seres "Informaton Scence and Computng" 29 COMPUTE SUPPOT O SEMANTIC TEXT ANALYSIS O A TECHNICAL SPECIICATION ON DESIGNING SOTWAE Alla Zaboleeva-Zotova, Yula Orlova Abstract: The gven
Semantic Matchmaking for Job Recruitment: An Ontology-Based Hybrid Approach
Semantc Matchmang for Job Recrutment: An Ontology-Based Hybrd Approach Maryam Fazel-Zarand Department of Computer Scence, Unversty of Toronto 10 Kng's College Road, Toronto, ON, M5S 3G4, CANADA Emal: [email protected]
Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council
Usng Supervsed Clusterng Technque to Classfy Receved Messages n 137 Call Center of Tehran Cty Councl Mahdyeh Haghr 1*, Hamd Hassanpour 2 (1) Informaton Technology engneerng/e-commerce, Shraz Unversty (2)
A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers
Journal of Computatonal Informaton Systems 7: 13 (2011) 4740-4747 Avalable at http://www.jofcs.com A Load-Balancng Algorthm for Cluster-based Mult-core Web Servers Guohua YOU, Yng ZHAO College of Informaton
320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:
Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*
Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of
An Alternative Way to Measure Private Equity Performance
An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate
Dynamic Fuzzy Pattern Recognition
Dynamc Fuzzy Pattern Recognton Von der Fakultät für Wrtschaftswssenschaften der Rhensch-Westfälschen Technschen Hochschule Aachen zur Erlangung des akademschen Grades enes Doktors der Wrtschafts- und Sozalwssenschaften
Traffic-light a stress test for life insurance provisions
MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax
Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications
Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng
STATISTICAL DATA ANALYSIS IN EXCEL
Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 14-01-013 [email protected] Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for
NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION
NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State
Lecture 2: Single Layer Perceptrons Kevin Swingler
Lecture 2: Sngle Layer Perceptrons Kevn Sngler [email protected] Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses
I I I I DIAGNOSIS AS A NOTION OF GRAMMAR. to deal with them. Mitchell Marcus Artificial Intelligence Laboratory M.I.T.
DAGNOSS AS A NOTON OF GRAMMAR Mtchell Marcus Artfcal ntellgence Laboratory M..T. Ths paper wll sketch an approach to natural language parsng based on a new concepton of what makes up a recognton grammar
Fast Fuzzy Clustering of Web Page Collections
Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,
Enterprise Master Patient Index
Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an
How To Calculate The Accountng Perod Of Nequalty
Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.
A Secure Password-Authenticated Key Agreement Using Smart Cards
A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,
Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation
Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The
Overview of monitoring and evaluation
540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng
Calculation of Sampling Weights
Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample
Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77-866 Logcal Development Of Vogel s Approxmaton Method (LD- An Approach To Fnd Basc Feasble Soluton Of Transportaton
Statistical Approach for Offline Handwritten Signature Verification
Journal of Computer Scence 4 (3): 181-185, 2008 ISSN 1549-3636 2008 Scence Publcatons Statstcal Approach for Offlne Handwrtten Sgnature Verfcaton 2 Debnath Bhattacharyya, 1 Samr Kumar Bandyopadhyay, 2
How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S
S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta
Binomial Link Functions. Lori Murray, Phil Munz
Bnomal Lnk Functons Lor Murray, Phl Munz Bnomal Lnk Functons Logt Lnk functon: ( p) p ln 1 p Probt Lnk functon: ( p) 1 ( p) Complentary Log Log functon: ( p) ln( ln(1 p)) Motvatng Example A researcher
benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).
REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or
Project Networks With Mixed-Time Constraints
Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
A Novel Adaptive Load Balancing Routing Algorithm in Ad hoc Networks
Journal of Convergence Informaton Technology A Novel Adaptve Load Balancng Routng Algorthm n Ad hoc Networks Zhu Bn, Zeng Xao-png, Xong Xan-sheng, Chen Qan, Fan Wen-yan, We Geng College of Communcaton
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable
NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6
PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has
Using an Adaptive Fuzzy Logic System to Optimise Knowledge Discovery in Proteomics
Usng an Adaptve Fuzzy Logc System to Optmse Knowledge Dscovery n Proteomcs James Malone, Ken McGarry and Chrs Bowerman School of Computng and Technology Sunderland Unversty St. Peter s Way, Sunderland,
Design and Development of a Security Evaluation Platform Based on International Standards
Internatonal Journal of Informatcs Socety, VOL.5, NO.2 (203) 7-80 7 Desgn and Development of a Securty Evaluaton Platform Based on Internatonal Standards Yuj Takahash and Yoshm Teshgawara Graduate School
Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic
Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange
RequIn, a tool for fast web traffic inference
RequIn, a tool for fast web traffc nference Olver aul, Jean Etenne Kba GET/INT, LOR Department 9 rue Charles Fourer 90 Evry, France [email protected], [email protected] Abstract As networked
1. Measuring association using correlation and regression
How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a
Vehicle Detection and Tracking in Video from Moving Airborne Platform
Journal of Computatonal Informaton Systems 10: 12 (2014) 4965 4972 Avalable at http://www.jofcs.com Vehcle Detecton and Trackng n Vdeo from Movng Arborne Platform Lye ZHANG 1,2,, Hua WANG 3, L LI 2 1 School
