An Approach to Automatically Constructing Domain Ontology 1

Tingting He 1,2,3, Xiaopeng Zhang 1,3, Xinghuo Ye 1,3

1 Department of Computer Science, Huazhong Normal University, 430079 Wuhan, China
2 Software College of Tsinghua University, 102201 Beijing, China
3 National Language Resources Monitoring & Research Center (network media), 430079 Wuhan, China

tthe@mail.ccnu.edu.cn, zhangxiaopeng@mails.ccnu.edu.cn, yxhez@tom.com

1 The paper is supported by the National Natural Science Foundation of China (NSFC) (Grants No. 60496323, No. 60375016 and No. 10071028) and by the Ministry of Education of China, Research Project for Science and Technology (Grant No. 105117).

Abstract. In this paper we present an approach to mining domain-dependent ontologies using term extraction and relationship discovery technology. There are two main innovations in our approach. One is extracting terms using the log-likelihood ratio, which is based on the contrastive probability of term occurrence in a domain corpus and a background corpus. The other is fusing together information from multiple knowledge sources as evidence for discovering particular semantic relationships among terms. In the experiment, we also improve the traditional k-medoids algorithm for multi-level clustering. We have applied our approach to produce an ontology for the domain of computer science and obtained promising results.

1 Introduction

Ontologies have become an important means for structuring knowledge and building knowledge-intensive systems. They have shown their usefulness in application areas such as intelligent information integration, information retrieval and natural language processing, to name but a few. For this purpose, efforts have been made to facilitate the ontology engineering process, in particular the acquisition of ontologies from domain texts [1]. Constructing an ontology is an extremely laborious effort. Even with some reuse of core knowledge from an Upper Model, creating an ontology for a particular domain has a high cost, incurred anew for each domain. Tools that could automate, or semi-automate, the construction of ontologies for different domains could dramatically reduce the knowledge creation cost. One approach to developing such tools is to rely on information implicit in collections of texts in a particular domain.

If it were possible to automatically extract terms and their semantic relations from a text corpus, a domain ontology could be built conveniently. This would be more cost-effective than having a human develop the ontology from scratch [2][3]. Domain-specific ontologies are useful in systems involved with artificial reasoning and information retrieval. Ontologies give such systems a vocabulary of terms, and concepts relating one term to another. In this paper, we present a method that automatically mines an ontology from any large corpus in a specific domain, to support data integration and information retrieval tasks. The induced ontology consists of domain concepts related only by parent-child links, not including more specialized relations. Our approach comprises two main phases: term extraction and relationship discovery. In the former, meaningful terms are extracted using the LLR formula. In the latter, parent-child relations among terms are induced through multi-level clustering. The remainder of the paper is as follows. In Section 2, we discuss related work in this field. In Section 3, we give an overview of the overall system architecture and explain the means and progress of ontology construction. Subsequently, an example will show some promising results we have
obtained when applying our mechanisms for mining ontologies from text, together with our analysis. This is presented in Section 4. In Section 5, we conclude and mention our future work.

2 Previous Work

The existing approaches to ontology induction include those that start from structured data, merging ontologies or database schemas (Doan et al. 2002). Other approaches use natural language data, sometimes just by analyzing the corpus (Sanderson and Croft 1999), (Caraballo 1999), or by learning to expand WordNet with clusters of terms from a corpus, e.g., (Girju et al. 2003). Information extraction approaches that infer labeled relations either require substantial hand-created linguistic or domain knowledge, e.g., (Craven and Kumlien 1999), (Hull and Gomez 1993), or require human-annotated training data with relation information for each domain (Craven et al. 1998).

For automatic ontology construction, Govind and Chakravarthi presented in 2001 an approach that extracts an ontology from text documents by Singular Value Decomposition (SVD), a purely statistical analysis method, as compared to heuristic and rule-based methods. They adopt Latent Semantic Indexing (LSI), which attempts to catch term-term statistical references by replacing the document space with a lower-dimensional concept space. Their method is valued for its simplicity because it rests on a fairly precise mathematical foundation. It is effective but limited in precision. What is more, it does not figure out the exact relations among terms [4]. In 2001, Dekang Lin and Patrick Pantel proposed a method for domain concept discovery based on a clustering algorithm called CBC (Clustering By Committee). They generally regard a concept as a cluster of terms; this deals with only one aspect of the whole process of ontology induction [5].

3 Automatic Ontology Construction

3.1 Architecture

The overall architecture for domain-independent ontology induction is shown in Figure 1. The documents of the domain corpus are preprocessed to remove segments such as pictures and specific symbols and then to filter out stopwords.

Next, terms are extracted based on word segmentation and syntactic tagging. Subsequently, semantic relations between pairs of terms are derived using multi-level clustering with evidence from multiple knowledge sources.

Fig. 1. System architecture (Domain Corpus → Data Cleanup: removing symbols, filtering stopwords → Term Discovery: parsing and tagging, term scoring, term extraction → Relationship Discovery: similarity computing with Chinese HowNet, multi-level clustering → Ontology Output)

3.2 Term Discovery

In this section, our goal is to identify domain-relevant terms from a collection of domain-specific texts. Several approaches to term extraction exist. One is based on the PAT-tree; it has the advantage of extracting terms (phrases) of any length because it can avoid word segmentation for Chinese, but it needs a large number of documents to achieve excellent precision. Another is the C-value/NC-value method proposed by Frantzi and Ananiadou (1999), which combines linguistic and statistical methods and makes progress, but is limited because its linguistic knowledge covers only English.

In this paper, we propose a promising approach to extracting domain-relevant terms. Terms are scored for domain relevance based on the assumption that if a term occurs significantly more often in a domain corpus than in a background corpus, the term is clearly domain-relevant. As an example, Table 1 compares the number of documents containing the terms 算法 (algorithm), 内存 (memory), 路由器 (router) and 数据库 (database) in a domain-dependent corpus of computer science consisting of 200 Chinese periodical papers against a larger Modern Chinese Corpus used as the background corpus.

Table 1. Term DF in the domain and background corpora

                     算法   内存   路由器   数据库    Total
Domain Corpus         89     78      73       69      200
Background Corpus      7      0       4        6    12000
Difference            82     78      69       63   -11800

We can observe from Table 1 that these four terms occur significantly more often in the domain corpus than in the background corpus; we consider them domain-dependent. To estimate the domain relevance of a term, we use the log-likelihood ratio (LLR) given by

    -2 log2 ( H0(p; k1,n1,k2,n2) / Ha(p1,p2; k1,n1,k2,n2) )    (1)

LLR measures the extent to which a hypothesized model of the distribution of cell counts, Ha, differs from the null hypothesis H0 (namely, that the percentage of documents containing the term is the same in both corpora). We use a binomial model for both H0 and Ha.
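With k1, n1, k2, n2 and the rates p, p1, p2 defined as in the text, the contrastive score can be sketched in a few lines of Python (a minimal sketch; the function names are ours, not the paper's):

```python
import math

def _binom_ll(k, n, x):
    # log-likelihood of k successes in n Bernoulli trials with rate x,
    # using the convention 0 * log(0) = 0
    ll = 0.0
    if k > 0:
        ll += k * math.log(x)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - x)
    return ll

def llr_score(k1, n1, k2, n2):
    """-2 log of the likelihood ratio H0/Ha under binomial models.

    k1/n1: documents containing the term / total documents, domain corpus
    k2/n2: the same counts for the background corpus
    """
    p = (k1 + k2) / (n1 + n2)  # common rate under H0
    p1 = k1 / n1               # domain rate under Ha
    p2 = k2 / n2               # background rate under Ha
    return 2.0 * (_binom_ll(k1, n1, p1) + _binom_ll(k2, n2, p2)
                  - _binom_ll(k1, n1, p) - _binom_ll(k2, n2, p))

# counts for 内存 (memory) from Table 1: a strongly domain-skewed term
print(llr_score(78, 200, 0, 12000))
```

The natural logarithm is used here; the base-2 logarithm of formula (1) only rescales the score, so term rankings and thresholding are unaffected.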
Here p = (k1+k2)/(n1+n2), p1 = k1/n1 and p2 = k2/n2, where k1 is the number of documents containing the term in the domain corpus, k2 is the number of documents containing the term in the background corpus, n1 is the total number of documents in the domain corpus, and n2 is the total number of documents in the background corpus [6].

3.3 Relationship Discovery

In this section, we mine the parent-child links among terms using multi-level clustering based on term similarity. We fuse together information from multiple knowledge sources as evidence for particular semantic relationships among terms. For clustering we adopt an improved k-medoids algorithm, which is flat and effective. The process of inducing relations is as follows. First, the clustering algorithm is used to obtain the top-level clusters. Then, for each top-level cluster, the clustering algorithm is applied again to obtain the clusters of the second level. By analogy, we can find multi-level clusters that imply parent-child relationships among terms. In the experiment, the depth of the clustering hierarchy is restricted.

3.3.1 Term Similarity Computation

The preprocessed documents are analyzed, and the frequency matrix known as the term-document matrix is produced as the result of the analysis. For a term-document matrix M[m][n], each term can be represented as the following vector:

    T_i = (M[i][0], M[i][1], ..., M[i][k], ..., M[i][n])    (2)

We can compute the cosine similarity between a pair of terms:

    cos(T_i, T_j) = (T_i · T_j) / (|T_i| |T_j|)    (3)

The cosine similarities between several term pairs are shown in Table 2.

Table 2. Cosine similarities of term pairs

Term1              Term2              Cosine Similarity
缓冲区 (buffer)     磁盘 (disk)            0.436248
缓冲区 (buffer)     主存 (main store)      0.579771
内存 (memory)       磁盘 (disk)            0.434059
服务器 (server)     网络 (network)         0.17585

It can be observed that the cosine similarity tends to track the co-occurrence frequency of term pairs. It reflects the contextual correlation of terms but can be biased against their semantic correlation. For instance, the cosine similarity computed as above between 磁盘 (disk) and 数据 (data) is just 0.0663552, which deviates from what experience suggests. To obtain a more exact similarity, we use HowNet, a Chinese thesaurus by Zhendong Dong and Qiang Dong (2001), as another knowledge resource. We combine the similarity of terms in HowNet, as a supplement, with the cosine similarity to estimate the correlation of term pairs. The HowNet similarity of any pair involving unknown words is set to 0. Table 3 shows some HowNet similarities of term pairs.

Table 3. HowNet similarities of term pairs

Term1              Term2              HowNet Similarity
缓冲区 (buffer)     磁盘 (disk)            0.149333
缓冲区 (buffer)     主存 (main store)      0.000000
内存 (memory)       磁盘 (disk)            0.149333
服务器 (server)     网络 (network)         0.285714

The final formula of similarity computation is given by

    sim(T_i, T_j) = sim_a(T_i, T_j) + α · sim_b(T_i, T_j)    (4)

where sim_a(T_i, T_j) is the cosine similarity, sim_b(T_i, T_j) is the HowNet similarity, and α is an adjustment parameter. The adjusted similarities between term pairs are given in Table 4 (α = 0.5).

Table 4. Adjusted similarities of term pairs

Term1              Term2              Similarity
缓冲区 (buffer)     磁盘 (disk)            0.510913
缓冲区 (buffer)     主存 (main store)      0.579771
内存 (memory)       磁盘 (disk)            0.508724
服务器 (server)     网络 (network)         0.318716

3.3.2 Clustering Algorithm

To make it converge faster and produce clusters that better match the original distribution of terms, we improved the traditional k-medoids algorithm. When deciding the new center of a cluster, we first choose the top-p terms closest to the old center in the cluster and take the term closest to the mean of those p terms as the new center. The improved algorithm is described as follows:

1. Randomly choose m terms as the cluster centers C_1, C_2, ..., C_i, ..., C_m.
2. Compute the similarity of each term to each cluster center and assign the term to the closest cluster.
3. Determine the new center of each cluster as follows:
   - Compute the average similarity of the terms in cluster i:

         avgsim[i] = (1/n_i) Σ_{k=1}^{n_i} sim(T_k, C_i)    (5)

     where n_i is the number of terms in cluster i.
   - Pick out the p_i terms closest to the cluster center, with p_i decided by avgsim[i]:

         p_i = m · avgsim[i] / max_avgsim    (6)

     where max_avgsim is the maximum of the m average similarities of the m clusters.
   - Compute the mean of the above p_i terms and choose the term closest to it as the new center of the cluster.
4. Go to step 2 if the m cluster centers are not yet stable.
5. End, resulting in m clusters.

4 Experiments and Evaluation

We have applied our approach to produce an ontology in the computer science domain. The domain corpus contains 200 Chinese theses (3.12 MB) retrieved from professional publications. The background corpus is the Modern Chinese Corpus (1999-2001), which consists of 12000 documents (33.5 MB).
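The combined similarity of formula (4) and the improved center-update step of Section 3.3.2 can be sketched as follows (a NumPy sketch under our own naming; for simplicity p is passed in directly rather than derived per cluster as in formula (6)):

```python
import numpy as np

def cosine(a, b):
    # cosine similarity of two term vectors (rows of the term-document matrix)
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def combined_sim(cos_sim, hownet_sim, alpha=0.5):
    # formula (4): cosine similarity supplemented by HowNet similarity
    return cos_sim + alpha * hownet_sim

def update_center(members, center, p):
    """Improved k-medoids center update: take the p members most similar
    to the old center, then return the member closest to their mean."""
    sims = [cosine(v, center) for v in members]
    top_p = np.argsort(sims)[::-1][:p]                 # p closest members
    mean_vec = np.mean([members[i] for i in top_p], axis=0)
    best = max(range(len(members)),
               key=lambda i: cosine(members[i], mean_vec))
    return members[best]
```

Alternating the assignment step with this center update until the m centers stop moving yields the top-level clusters; running the same loop inside each cluster yields the second level.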

4.1 Term Discovery

In the experiment, we compared the effect of two scoring methods, LLR and TF.IDF. Table 5 shows the scores of some terms as relative percentages. The boldfaced terms are those identified by LLR as domain-relevant. It can be inferred from the experimental results that LLR outperforms TF.IDF.

Table 5. Relative scores of LLR and TF.IDF

Term      Domain DF   Background DF   LLR score   TF.IDF score
路由器        73             4           98.6          66.8
总线          56             1           99.9          70.4
控制器        35             7           96.5          94.5
端口          25             5           93.1          65.3
缓存          36             2           94.6          53.5
请求          17           188           67.5          72.6
线路          31           232           56.2          65.7
质量          37          3725           42.4          99.9
屏蔽           7             8           15.3          48.2

A list of 286 domain-relevant terms (key terms) was constructed manually to support the subsequent evaluation. We evaluate the results using Recall, Precision and F-measure, defined as follows.

Recall (R): the ratio of the number of correct terms to the number of key terms in the manual list:

    R = correct / key    (7)

Precision (P): the ratio of the number of correct terms to the number of extracted terms:

    P = correct / (correct + incorrect)    (8)

F-measure (F): a combination of recall and precision:

    F = 2RP / (R + P)    (9)

These three values vary with the threshold of LLR. Figure 2 shows how F-measure, recall and precision vary with the LLR threshold.
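Formulas (7)-(9) translate directly into code (a trivial sketch; the counts in the usage line are made-up for illustration, not the paper's results):

```python
def evaluate(correct, incorrect, key_terms):
    """Recall, precision and F-measure, formulas (7)-(9)."""
    r = correct / key_terms
    p = correct / (correct + incorrect)
    f = 2 * r * p / (r + p) if r + p > 0 else 0.0
    return r, p, f

# hypothetical run: 216 extracted terms, 180 of them in the 286-term key list
r, p, f = evaluate(180, 36, 286)
print(r, p, f)
```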

Fig. 2. F-measure, recall and precision by varying the threshold of LLR

4.2 Relationship Discovery

According to Figure 2, we chose 94 as the threshold of LLR and obtained 216 domain-relevant terms. The depth of the clustering hierarchy was restricted to 2. We obtained 5 top-level clusters and 15 second-level clusters. The precision of the top-level clusters reached 76.7%, and the average precision of the second-level clusters reached 70.3%. The parent-child relationships among terms and clusters are shown in Figure 3. Each cluster is named by its most frequent term.

Fig. 3. Parent-child relationships among terms (multi-level cluster tree of Chinese computer-science terms; top-level clusters include 网络 (network), 计算机 (computer), 软件 (software), 信息 (information) and 算法 (algorithm), each expanded into second-level clusters of related terms)

5 Conclusion

In this paper we presented an approach to automatically mining domain-dependent ontologies from corpora based on term extraction and relationship discovery technology. There are two main innovations in our approach. One is extracting terms using the log-likelihood ratio, which is based on the contrastive probability of term occurrence in a domain corpus and a background corpus. The other is fusing together information from multiple knowledge sources as evidence for discovering particular semantic relationships among terms. In our experiment, we improved the traditional k-medoids algorithm for multi-level clustering. We applied our approach to produce an ontology for the domain of computer science and obtained promising results. In future work, we will consider developing heuristic methods or rules to label the exact relations among terms and finding a more effective ontology evaluation methodology.

References

1. Inderjeet Mani. Automatically Inducing Ontologies from Corpora. In Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology, COLING 2004, Geneva.
2. A. Maedche and S. Staab. Mining ontology from text. In 12th International Workshop on Knowledge Engineering and Knowledge Management (EKAW 2000), 2000.
3. A. Maedche and S. Staab. The TEXT-TO-ONTO Ontology Learning Environment. Software demonstration at ICCS-2000, Eighth International Conference on Conceptual Structures, August 14-18, 2000, Darmstadt, Germany.
4. Sadanand Srivastava and James Gil de Lamadrid. Extracting an ontology from a document using Singular Value Decomposition. ADMI 2001.
5. Lin, D. and Pantel, P. 2001. Induction of semantic classes from natural language text. In Proceedings of SIGKDD-01, pp. 317-322, San Francisco, CA.
6. Dunning, T. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1):61-74, March 1993.
7. Chakravarthi S. Velvadapu. Document Ontology Extractor. Applied Research in Computer Science, Fall 2001.
8. A. Maedche and S. Staab. Discovering conceptual relations from text. In Proceedings of ECAI-2000. IOS Press, Amsterdam, 2000.
9. D. Faure and C. Nédellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, Granada, Spain, 1998.
10. Doan, A., Madhavan, J., Domingos, P. and Halevy, A. 2002. Learning to Map between Ontologies on the Semantic Web. WWW 2002.