Finding Structured Data from Unstructured Data for Question Answering
2014 International Conference on Information, Communication Technology and System (ICTS 2014, Surabaya, Indonesia)

Dewi W. Wardani (1), Titik Musyarofah (2)
(1, 2) Informatics Department, Sebelas Maret University, Surakarta, Central Java, Indonesia
(1) dww_ok@uns.ac.id, (2) cheerful.rainbow@gmail.com

Abstract

Most research on automatic Question Answering systems uses unstructured data as the data source. The use of structured data for Question Answering has been largely neglected. While unstructured data yields an answer in the form of a sentence snippet or a list of snippets, structured data can produce a precise and concise answer. Structured data is of good quality and carries non-trivial information, so it is very useful for obtaining a precise answer. In our opinion, structured data can be found in unstructured data and used from there, and using it for Question Answering is a fairly novel idea. We therefore propose a new idea for finding structured data in unstructured data for Indonesian-language Question Answering, in order to obtain precise and concise answers. Our approach achieved an accuracy of 85.37%.

Keywords: question answering, structured data, unstructured data.

I. INTRODUCTION

Most research on automatic Question Answering uses unstructured data, such as web pages, as the data source, and returns answers as a sentence snippet or a list of snippets. Bag-of-words retrieval is popular among developers of automatic Question Answering systems [1]. According to [2], a Question Answering system produces the final answer for the user automatically. Question Answering aims at retrieving precise information from a large collection of documents [3].

However, processing unstructured data is recognized as one of the problems in information technology [4]. As the example in Figure 1 shows, the result of a question on Wikipedia (Indonesian language) is a bag of words, not a precise answer; we have to read further to find the precise answer ourselves. Further processing of unstructured data is needed to make it more useful, all the more so because about 80% to 85% of data is stored in unstructured format [5].

Fig 1. Result of question on unstructured data

Precise and concise answers are needed to satisfy the user, and such answers can probably only be given by structured data. This is because structured data usually contains high-quality, non-trivial information, whereas unstructured data returns snippets or a bag of words, and users must read those snippets to find the precise answer. Our previous research [6] demonstrated that using structured data in information retrieval returns more relevant and more highly ranked results than bag-of-words on a sentence retrieval task. One disadvantage of structured data is that retrieving information from it is neither easy nor flexible. Only non-naive users who know the schema and know how to write a formal query language can search structured data.

In our opinion, structured data can be obtained from within unstructured information itself. Since far more information and data is unstructured, this idea is quite interesting. We do not need to obtain a lot of ready-made structured data such as tables or databases, or to build a knowledge base, which takes a lot of time. In this work, we propose a new idea for finding structured data in unstructured data (documents). We use documents in the Indonesian language, likewise for Question Answering. We limit the input to questions that refer to a date, in formal and informal form. The approach uses pronoun attributes and a word similarity approach to calculate the similarity between the question and the snippets. A structured approach is also used: structured data is found in the unstructured data and then used as the data source.

II. RELATED WORKS

Our previous research [6] integrated structured and unstructured data. Structured data, which has been neglected by many Question Answering systems and search engines, is very useful: it is used to provide precise and concise answers. Users do not care in which kind of resource the relevant information is found; they just want better answers to their questions [6].

Question Answering for the Indonesian language has been widely studied [7, 2, 8, 9]. In [10], a pattern-based approach to an Indonesian Question Answering system was proposed; a pattern-based approach is a form of rule-based approach to categorizing questions. In [7], a machine learning approach for an Indonesian Question Answering system was proposed, with SVM as the machine learning algorithm. A question with kapan (when) as the interrogative word is always a date question [7]. In [8], a Question Answering system was developed using a rule-based method on the text of the Quran (the holy book of Muslims) in the Indonesian language.

Question Answering on structured data has also been done in [11]: using ontology and textual entailment, it can be used for one language and across languages. In [1], an approach to retrieval for Question Answering was presented that applies structured retrieval techniques to the types of text annotations that Question Answering systems use. In [6], a new idea was developed to improve the accuracy on complex questions by integrating structured data, in the form of simple relational databases, with unstructured data, in the form of web pages. In general, a Question Answering system architecture is composed of six phases, namely Question Analysis, Document Collection Preprocessing, Candidate Document Selection, Candidate Document Analysis, Answer Extraction, and finally Response Generation [12].

III. CONSIDERED PROBLEM AND IDEA

Automatic Question Answering systems use unstructured data, hence the answer is returned as a list of snippets. Only structured data can provide a precise answer, yet structured data is not widely used in question answering. Our main idea is that structured data can be found in unstructured data (documents). We want to provide both structured and unstructured data from the same resource.
The unstructured form is simply the document itself, and the structured data is obtained with our proposed idea.

Fig 2. Annotation structured data from unstructured data

Figure 2 is an example of obtaining structured data from unstructured data. The annotated date, 31 Desember 1799, is linked to the information around the date, VOC resmi dibubarkan pemerintah Belanda. In this initial work, we only consider structured information that refers to dates, and likewise only date questions. We cover not only the question word kapan but also question words often used in informal form, such as kapankah and tanggal berapa, and question words that begin with a preposition, such as pada tanggal berapa. Most previous research uses only prefix question words. In this research we also allow suffix and confix question words. For the date-related experiment we used documents from the history domain because they contain many dates. For example:

a. Kapankah Indonesia merdeka? (Prefix question)
b. Indonesia merdeka pada tanggal berapa? (Suffix question)
c. Sejak kapan Indonesia merdeka? (Confix question)

IV. METHODOLOGY

The proposed idea consists of three main steps: providing the resource, question analysis, and finding the answer (candidate selection and matching snippet rows).

A. Providing The Resource

The main task of this part is obtaining structured data from unstructured data. During tokenizing we do not split at the dot (.) that follows an abbreviation of a name, title, form of address, position, or rank, as in Table 1.

TABLE 1. INDONESIAN LANGUAGE ABBREVIATION

Abbreviation        Example
Name                M., Moh., Muh., etc.
Title Name          Dr., Drs., Ir., Prof., etc.
Address             Mr., Mrs., Ms., etc.
Position or Rank    KH., R., etc.

In this initial work, we only obtain structured data that refers to date information. Since the data domain is history, the date is one of the most important pieces of information in history question answering.
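The two preprocessing ideas above, abbreviation-aware sentence splitting and locating the when question word, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_sentences` and `classify_question` are hypothetical, the abbreviation set contains only the examples from Table 1, and the regular expression only approximates the question words the paper lists for when questions.

```python
import re

# Abbreviations from Table 1 whose trailing dot must not end a sentence.
ABBREVIATIONS = {"M.", "Moh.", "Muh.", "Dr.", "Drs.", "Ir.", "Prof.",
                 "Mr.", "Mrs.", "Ms.", "KH.", "R."}

# Approximation of the when question words (kapan, tanggal berapa, pada tahun, ...).
QUESTION_WORD = re.compile(
    r"\b(?:(?:pada|setiap)\s+)?"
    r"(?:kapan|tanggal(?:\s+berapa)?|tahun(?:\s+berapa)?|bulan)(?:kah)?\b"
    r"|\bpada\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Split on '.', '?' or '!' at the end of a token, skipping known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".?!" and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


def classify_question(question):
    """Return 'prefix', 'suffix' or 'confix' by the position of the question word."""
    body = question.strip().rstrip("?").strip()
    match = QUESTION_WORD.search(body)
    if match is None:
        return None  # no when question word found
    if match.start() == 0:
        return "prefix"
    if match.end() == len(body):
        return "suffix"
    return "confix"
```

Under this sketch, the three example questions a, b and c above are classified as prefix, suffix and confix respectively. A real system would need a fuller abbreviation list, since a sentence that genuinely ends in an abbreviation such as "R." is not split here.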
This structured data is not like a knowledge base: it is obtained from the data itself, whereas a knowledge base obtains its knowledge from experts and takes a long time to build. Figure 3 shows an example of obtaining structured data that refers to a date from unstructured data. To generate structured information that answers a when question, date formats such as those in Table 2 must be recognized. We consider a non-pronoun and a pronoun approach. Pronoun attributes are required in the discovery of structured information: if a sentence contains a pronoun such as those in Table 3, it is considered related to the previous sentence, so the previous sentence is also taken into the snippet.

In the example in Figure 3, we first recognize the date format in the unstructured data, and then get the related snippet from the sentence that contains the date and, through the pronoun approach, from the sentences around it. First we recognize the date 29 Oktober, and then we get the snippet that contains the date. We obtain structured data as follows:

Date : 29 Oktober
Snippet : Mereka juga minta TKR mengosongkan kota Bandung bagian utara, paling lambat tanggal 29 Oktober
In the next paragraph, we obtain the structured data:

Date : 2 Mei 1889
Snippet : RM Suwardi Suryaningrat lahir pada tanggal 2 Mei 1889 di Yogyakarta

In the other example, the recognized date is 3 Juli 1922, and the snippet that contains the date has the pronoun Beliau. Beliau is the Indonesian pronoun for he or she: Beliau mendirikan Taman Siswa tanggal 3 Juli 1922 dengan tujuan memajukan pendidikan bangsa Indonesia. Beliau refers to Ki Hajar Dewantoro, who is mentioned in the previous sentence. This means we must also consider the previous sentence, Namanya lebih dikenal dengan Ki Hajar Dewantoro. In the end we obtain the structured data:

Date : 3 Juli 1922
Snippet : Namanya lebih dikenal dengan Ki Hajar Dewantoro. Beliau mendirikan Taman Siswa tanggal 3 Juli 1922 dengan tujuan memajukan pendidikan bangsa Indonesia.

Fig 3. Example of finding structured data from unstructured data using the non-pronoun approach (above) and the pronoun approach (below)

The pseudocode of this approach is as follows:

Input  : Sentence[i]
Output : Snippet
Step   :
Begin
  if there is a persona pronoun in Sentence[i] then
    if there is a pointer pronoun in Sentence[i+1] then
      Snippet = Sentence[i-1] + Sentence[i] + Sentence[i+1];
    else
      Snippet = Sentence[i-1] + Sentence[i];
    end if
  else
    if there is a pointer pronoun in Sentence[i+1] then
      Snippet = Sentence[i] + Sentence[i+1];
    else
      Snippet = Sentence[i];
    end if
  end if
end

To generate structured information that refers to a date, the date formats used in the Indonesian language must be recognized, as in Table 2.

TABLE 2. DATE FORMATS

Format:
[year]
[month] [year]
[day] [month]
[day] [month] [year]
[year]-[year]
[year] sampai [year]
[year] sampai dengan [year]
[day]-[day] [month] [year]
[day] [month] sampai [day] [month] [year]

Example: 1512; Maret; Agustus; 1 Januari; sampai; sampai dengan; Oktober; Agustus sampai 2 November 1949
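The date recognition and the pronoun rule from the pseudocode above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names `DATE_PATTERN`, `build_snippet` and `extract_records` are hypothetical, the pattern covers only the single-date formats of Table 2 (not the ranges with "sampai" or a hyphen), the persona pronoun set holds only the free-standing forms of Table 3 (the clitics ku-, -ku, kau-, -mu and -nya are skipped), and the pointer pronoun set {ini, itu} is an assumption because the paper does not list pointer pronouns.

```python
import re

MONTHS = ("Januari|Februari|Maret|April|Mei|Juni|Juli|Agustus|"
          "September|Oktober|November|Desember")

# Single-date formats only: [day] [month] [year], [day] [month], [month] [year], [year].
DATE_PATTERN = re.compile(
    rf"\b(?:\d{{1,2}}\s+(?:{MONTHS})(?:\s+\d{{4}})?|(?:{MONTHS})\s+\d{{4}}|\d{{4}})\b"
)

# Free-standing personal pronouns from Table 3.
PERSONA_PRONOUNS = {"saya", "aku", "daku", "engkau", "kamu", "anda", "dikau",
                    "ia", "dia", "beliau", "kami", "kita", "kalian", "mereka"}
# Assumption: Indonesian demonstratives stand in for the "pointer pronouns".
POINTER_PRONOUNS = {"ini", "itu"}


def has_pronoun(sentence, pronouns):
    """True if any whole word of the sentence is in the given pronoun set."""
    words = re.findall(r"[A-Za-z]+", sentence.lower())
    return any(word in pronouns for word in words)


def build_snippet(sentences, i):
    """Apply the pronoun rule to the dated sentence at index i."""
    parts = [sentences[i]]
    if i > 0 and has_pronoun(sentences[i], PERSONA_PRONOUNS):
        parts.insert(0, sentences[i - 1])   # pronoun points back to the previous sentence
    if i + 1 < len(sentences) and has_pronoun(sentences[i + 1], POINTER_PRONOUNS):
        parts.append(sentences[i + 1])      # next sentence points back to this one
    return " ".join(parts)


def extract_records(sentences):
    """Return one {date, snippet} record per date found in the sentences."""
    records = []
    for i, sentence in enumerate(sentences):
        for match in DATE_PATTERN.finditer(sentence):
            records.append({"date": match.group(0),
                            "snippet": build_snippet(sentences, i)})
    return records
```

Run on the two Ki Hajar Dewantoro sentences quoted above, this sketch yields a single record whose date is 3 Juli 1922 and whose snippet joins both sentences, matching the structured data shown in the example. Matching pronouns as whole words avoids false hits such as "dia" inside "Indonesia".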
The pronouns of the Indonesian language are listed in Table 3.

TABLE 3. PRONOUNS IN INDONESIAN LANGUAGE [13]

Persona   Single                                  Plural
First     Saya, aku, daku, ku-, -ku               Kami, kita
Second    Engkau, kamu, anda, dikau, kau-, -mu    Kalian, kamu (sekalian), Anda (sekalian)
Third     Ia, dia, beliau, -nya                   Mereka, -nya

B. Question Analysis

Questions that refer to time take several forms, listed in Table 4. Table 5 gives some example questions that refer to when in the Indonesian language.

TABLE 4. QUESTION TYPE REFERS TO WHEN IN INDONESIAN LANGUAGE

Prefix:
Kapan (kah) [ ]
Tanggal berapa (kah) [ ]
Tahun berapa (kah) [ ]
Pada tanggal berapa (kah) [ ]
Pada tahun berapa (kah) [ ]
Setiap tanggal berapa (kah) [ ]

Suffix and Confix:
[ ] kapan / [ ] kapan [ ]
[ ] tanggal / [ ] tanggal [ ]
[ ] tahun / [ ] tahun [ ]
[ ] tanggal berapa / [ ] tanggal berapa [ ]
[ ] tahun berapa / [ ] tahun berapa [ ]
[ ] pada tanggal / [ ] pada tanggal [ ]
[ ] pada tanggal berapa / [ ] pada tanggal berapa [ ]
[ ] pada tahun / [ ] pada tahun [ ]
[ ] pada tahun berapa / [ ] pada tahun berapa [ ]
[ ] pada bulan / [ ] pada bulan [ ]
[ ] pada / [ ] pada [ ]
[ ] setiap tanggal / [ ] setiap tanggal [ ]
[ ] tahun berapakah / [ ] tahun berapakah [ ]

TABLE 5. EXAMPLE OF QUESTION TYPE REFERS TO WHEN IN INDONESIAN LANGUAGE

No   Question
1    Kapan Tugu Proklamasi didirikan?
2    Kapankah VOC mulai bangkrut?
3    Tanggal berapakah BPUPKI melakukan sidang kedua?
4    Pada tahun berapakah Bangsa Portugis datang ke Indonesia?
5    Pertempuran Medan Area terjadi pada tanggal?
6    Serangan Umum Satu Maret terjadi pada?
7    Taman Siswa didirikan kapan?
8    Setiap tanggal berapakah diperingati hari lahir Pancasila?

The general forms of those questions are:

[qt] + ... + ?
... + [qt] + ?
... + [qt] + ... + ?

where qt is a when question word.

C. Finding Answer

There are two main steps in finding the answer. The first is a simple matching of the question tokens against a list of index words to find candidate snippets. Each snippet that contains at least one token of the question becomes a candidate. The similarity between each candidate snippet and the question is then measured with a simple approach. Formula 1 is derived from cosine similarity, a well-known method that is often used to calculate the similarity of documents [14].

$$\mathrm{sim}(q, s_i) = \frac{\sum_{j} w_{q,j}\, w_{i,j}}{\sqrt{\sum_{j} w_{q,j}^{2}}\,\sqrt{\sum_{j} w_{i,j}^{2}}} \qquad (1)$$

where q is the question, s_i is snippet i, w_{q,j} is the weight of term j in the question, and w_{i,j} is the weight of term j in snippet i.

The second step sorts the candidate snippets; the snippet with the highest similarity value is taken as the answer to the question. Which snippet has the highest value changes with the given question. A simple general SQL statement is used to display the ranked candidate snippets and to display the exact answer (date) from the snippet that has the highest similarity value.

V. EXPERIMENT

As the dataset we use electronic history books from [15]. From these e-books we get more than 200 snippets. We asked a human expert to create factoid questions together with their answers based on the e-books, which gave about 123 questions covering all the question types described in the previous section. In the experiment, we use a correctness metric to measure the accuracy of the answers by comparing the system answer with the human expert answer, as described in Formula 2.

$$\mathrm{accuracy} = \frac{s_r}{s} \times 100\% \qquad (2)$$

where accuracy is in percent, s_r is the number of relevant answers, and s is the number of all questions. Over all the tests, we got 105 relevant answers and 18 non-relevant answers, hence an accuracy of about 85.37%. This is a promising accuracy, and it indicates that discovering structured data from unstructured data can be quite useful in question answering. Table 6 shows example experimental results, with both correct and incorrect answers.
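The candidate selection, cosine ranking and accuracy computation described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' system: the paper ranks snippets through SQL and does not state how term weights are computed, so raw term frequency is an assumption here, and the names `tokenize`, `rank_snippets` and `accuracy` are hypothetical. Records are assumed to be dictionaries with `date` and `snippet` keys.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def cosine_similarity(question_tokens, snippet_tokens):
    """Formula 1 with raw term frequencies as weights (an assumption)."""
    wq, ws = Counter(question_tokens), Counter(snippet_tokens)
    dot = sum(wq[t] * ws[t] for t in wq)
    norm = (math.sqrt(sum(v * v for v in wq.values()))
            * math.sqrt(sum(v * v for v in ws.values())))
    return dot / norm if norm else 0.0


def rank_snippets(question, records):
    """Keep snippets sharing at least one token with the question, best first."""
    q_tokens = tokenize(question)
    scored = []
    for record in records:
        s_tokens = tokenize(record["snippet"])
        if set(q_tokens) & set(s_tokens):  # candidate selection
            scored.append((cosine_similarity(q_tokens, s_tokens), record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def accuracy(relevant, total):
    """Formula 2: share of relevant answers, in percent."""
    return relevant / total * 100
```

The date of the top-ranked record is the answer returned to the user. With the counts reported in the experiment, `accuracy(105, 123)` evaluates to about 85.37, which agrees with the figure stated in the abstract.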
Fig 4. Framework of our approach

TABLE 6. THE EXAMPLE OF EXPERIMENTAL RESULT

Columns: Question, Expert, Proposed Approach, T/F

Questions:
1. Kapan dilakukan penyerangan Kedua untuk mengusir Portugis di Kerajaan Demak?
2. Kapan terjadi perlawanan kaum Padri yang dipimpin oleh Tuanku Imam Bonjol?
3. Tanggal berapa Sekutu membebaskan tentaranya yang ditawan di kamp-kamp Belanda?
4. Tahun berapa Belanda melakukan penarikan iuran kepada masyarakat Indonesia?
5. Pada tanggal berapa TKR mengepung Ambarawa?
6. Pada tahun berapakah Tuanku Imam Bonjol Wafat?
7. Sekutu mendarat di Belawan, Medan tanggal?
8. Serikat Islam melakukan berbagai gerakan pemogokan tahun?
9. Pusat Tenaga Rakyat (Putera) didirikan tanggal berapa?
10. Perang Dunia II terjadi pada tahun?

Answer cells as extracted (expert answer, proposed approach answer): 1513; 10 Oktober, 10 Oktober; 12 Desember, 12 Desember 1864; Oktober, Oktober; 1 Maret, Maret. Two F marks survive in the T/F column.

VI. CONCLUSION AND FUTURE WORKS

We have proposed how to obtain structured data from unstructured data and how to use both in Indonesian-language QA in the history domain. The accuracy reached 85.37%, which is a promising result. This research is still limited in the style of the questions: as initial research, we restricted it to questions that refer to a date. In future work, we expect to expand the questions beyond date (when) questions to all kinds of questions, such as what, where, who, why and how.

ACKNOWLEDGMENT

We would like to thank the Ministry of Education of Indonesia, which provided [15] and let the authors use its data (electronic books), and [16], which let the authors use the Xpdf open source software. We also thank Puji Darwati from the History Department, our historian, for providing questions in the Indonesian history domain.

REFERENCES

[1] M. W. Bilotti et al., "Structured Retrieval for Question Answering," SIGIR'07 Proceedings, Amsterdam.
[2] H. Toba, "Analisis Semantik dengan Representasi First Order Logic dalam Sistem Tanya Jawab," University of Indonesia, Bandung.
[3] M. R. Kangavari, S. Ghandchi, and M. Golpour, "Information Retrieval: Improving Question Answering Systems by Query Reformulation and Answer Validation," World Academy of Science, Engineering and Technology, Iran.
[4] R. Blumberg and S. Atre, "The Problem with Unstructured Data," Information Management Magazine, accessed 30 April.
[5] A. Harbison and P. Ryan, "The Problem of Analysing Unstructured Data," Grant Thornton International Ltd, Ireland.
[6] D. W. Wardani, "Finding structured and unstructured features to improve the search result of complex question," National Cheng-Kung University.
[7] A. Purwarianti, M. Tsuchiya, and S. Nakagawa, "A Machine Learning Approach for Indonesian Question Answering System," University of Technology, Japan.
[8] M. D. Anggaeny, "Implementasi Question Answering System dengan Metode Rule Based pada Terjemahan Al Qur'an Surat Al Baqarah," Institut Pertanian Bogor.
[9] R. Mahendra, S. D. Larasati, and R. Manurung, "Extending an Indonesian Semantic Analysis-based Question Answering System with Linguistic and World Knowledge Axioms."
[10] H. Toba and M. Adriani, "Pattern Based Approach in Indonesian Question Answering System," University of Indonesia, Bandung.
[11] B. Sacaleanu et al., "Entailment-based Question Answering for Structured Data," Companion volume: Posters and Demonstrations, Manchester.
[12] D. Grossman, "Information Retrieval," accessed 26 April 2011.
[13] A. Moeliono, Tata Bahasa Baku Bahasa Indonesia. Jakarta: Balai Pustaka, 1988.
[14] D. L. Lee, H. Chuang, and K. Seamons, "Document Ranking and the Vector-Space Model," IEEE Software.
[15]
[16]
More informationInformation of Online Registration, Payment and IELTS Test Implementation
Information of Online Registration, Payment and IELTS Test Implementation What you need 1. Original ID (Passport/Indonesia ID - KTP) You will need to UPLOAD this document 2. VISA / Master Card or Pay Pall
More informationATLAS (Application for Tracking and Scheduling) as Location Guide and Academic Schedulling at Campus YSU (Yogyakarta State University)
International Journal of Computer and Communication Engineering, Vol. 3, No. 4, July 4 ATLAS (Application for Tracking and Scheduling) as Location Guide and Academic Schedulling at Campus YSU (Yogyakarta
More informationA Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical
More informationFair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing
Research Inventy: International Journal Of Engineering And Science Vol.2, Issue 10 (April 2013), Pp 53-57 Issn(e): 2278-4721, Issn(p):2319-6483, Www.Researchinventy.Com Fair Scheduling Algorithm with Dynamic
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationApplying Performance Dashboard in Hospitals
, pp. 213-220 http://dx.doi.org/10.14257/ijseia.2015.9.1.19 Applying Performance Dashboard in Hospitals Andre M. R. Wajong Industrial Engineering, Faculty of Engineering, Bina Nusantara University (Syahdan
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationKybots, knowledge yielding robots German Rigau IXA group, UPV/EHU http://ixa.si.ehu.es
KYOTO () Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organization http://www.kyoto-project.eu/ Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU
More informationCOMPETENCY-BASED TEACHING AND LEARNING AT SENIOR HIGH SCHOOLS
COMPETENCY-BASED TEACHING AND LEARNING AT SENIOR HIGH SCHOOLS Anik Nunuk Wulyani Jurusan Sastra Inggris Fak. Sastra Universitas Negeri Malang Abstract: This article discusses practical ideas teachers,
More informationSearch Engine Based Intelligent Help Desk System: iassist
Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com
More informationArchitecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationAssisting bug Triage in Large Open Source Projects Using Approximate String Matching
Assisting bug Triage in Large Open Source Projects Using Approximate String Matching Amir H. Moin and Günter Neumann Language Technology (LT) Lab. German Research Center for Artificial Intelligence (DFKI)
More informationSINAI at WEPS-3: Online Reputation Management
SINAI at WEPS-3: Online Reputation Management M.A. García-Cumbreras, M. García-Vega F. Martínez-Santiago and J.M. Peréa-Ortega University of Jaén. Departamento de Informática Grupo Sistemas Inteligentes
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationFacilitating Knowledge Intelligence Using ANTOM with a Case Study of Learning Religion
Facilitating Knowledge Intelligence Using ANTOM with a Case Study of Learning Religion Herbert Y.C. Lee 1, Kim Man Lui 1 and Eric Tsui 2 1 Marvel Digital Ltd., Hong Kong {Herbert.lee,kimman.lui}@marvel.com.hk
More informationNetwork Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
More informationFolksonomies versus Automatic Keyword Extraction: An Empirical Study
Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk
More informationEnhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects
Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com
More informationAn efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi
International Conference on Applied Science and Engineering Innovation (ASEI 2015) An efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi Institute of Computer Forensics,
More informationSemantically Enhanced Web Personalization Approaches and Techniques
Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,
More informationEfficiently Identifying Inclusion Dependencies in RDBMS
Efficiently Identifying Inclusion Dependencies in RDBMS Jana Bauckmann Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany bauckmann@informatik.hu-berlin.de
More informationImproving Web Page Readability by Plain Language
www.ijcsi.org 315 Improving Web Page Readability by Plain Language Walayat Hussain 1, Osama Sohaib 2 and Arif Ali 3 1 Department of Computer Science, Balochistan University of I.T. Engineering and Management
More informationInteractive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
More informationMEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK
MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK 1 K. LALITHA, 2 M. KEERTHANA, 3 G. KALPANA, 4 S.T. SHWETHA, 5 M. GEETHA 1 Assistant Professor, Information Technology, Panimalar Engineering College,
More informationAn Approach to support Web Service Classification and Annotation
An Approach to support Web Service Classification and Annotation Marcello Bruno, Gerardo Canfora, Massimiliano Di Penta, and Rita Scognamiglio marcello.bruno@unisannio.it, canfora@unisannio.it, dipenta@unisannio.it,
More informationMULTI AGENT-BASED DISTRIBUTED DATA MINING
MULTI AGENT-BASED DISTRIBUTED DATA MINING REECHA B. PRAJAPATI 1, SUMITRA MENARIA 2 Department of Computer Science and Engineering, Parul Institute of Technology, Gujarat Technology University Abstract:
More informationBig Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
More informationA Framework for Personalized Healthcare Service Recommendation
A Framework for Personalized Healthcare Service Recommendation Choon-oh Lee, Minkyu Lee, Dongsoo Han School of Engineering Information and Communications University (ICU) Daejeon, Korea {lcol, niklaus,
More informationSIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON Essam S. Hanandeh, Department of Computer Information System, Zarqa University, Zarqa, Jordan Hanandeh@zu.edu.jo ABSTRACT The massive
More informationA Rule-Based Short Query Intent Identification System
A Rule-Based Short Query Intent Identification System Arijit De 1, Sunil Kumar Kopparapu 2 TCS Innovation Labs-Mumbai Tata Consultancy Services Pokhran Road No. 2, Thane West, Maharashtra 461, India 1
More informationHow to write a technique essay? A student version. Presenter: Wei-Lun Chao Date: May 17, 2012
How to write a technique essay? A student version Presenter: Wei-Lun Chao Date: May 17, 2012 1 Why this topic? Don t expect others to revise / modify your paper! Everyone has his own writing / thinkingstyle.
More informationThe University of Amsterdam s Question Answering System at QA@CLEF 2007
The University of Amsterdam s Question Answering System at QA@CLEF 2007 Valentin Jijkoun, Katja Hofmann, David Ahn, Mahboob Alam Khalid, Joris van Rantwijk, Maarten de Rijke, and Erik Tjong Kim Sang ISLA,
More informationUnderstanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
More informationENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES
International Journal of Computer Engineering & Technology (IJCET) Volume 7, Issue 2, March-April 2016, pp. 24 29, Article ID: IJCET_07_02_003 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=7&itype=2
More informationEvaluation of Sub Query Performance in SQL Server
EPJ Web of Conferences 68, 033 (2014 DOI: 10.1051/ epjconf/ 201468033 C Owned by the authors, published by EDP Sciences, 2014 Evaluation of Sub Query Performance in SQL Server Tanty Oktavia 1, Surya Sujarwo
More informationA Survey on Web Mining From Web Server Log
A Survey on Web Mining From Web Server Log Ripal Patel 1, Mr. Krunal Panchal 2, Mr. Dushyantsinh Rathod 3 1 M.E., 2,3 Assistant Professor, 1,2,3 computer Engineering Department, 1,2 L J Institute of Engineering
More informationTECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING
TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING Sunghae Jun 1 1 Professor, Department of Statistics, Cheongju University, Chungbuk, Korea Abstract The internet of things (IoT) is an
More informationImplementing Heuristic Miner for Different Types of Event Logs
Implementing Heuristic Miner for Different Types of Event Logs Angelina Prima Kurniati 1, GunturPrabawa Kusuma 2, GedeAgungAry Wisudiawan 3 1,3 School of Compuing, Telkom University, Indonesia. 2 School
More informationA GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing
A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing LAW VI JEJU 2012 Bayu Distiawan Trisedya & Ruli Manurung Faculty of Computer Science Universitas
More informationDevelopment of Geospatial Dashboard with Analytic Hierarchy Processing for the Expansion of Branch Office Location
Development of Geospatial Dashboard with Analytic Hierarchy Processing for the Expansion of Branch Office Location Adrian Nuradiansyah 1, Indra Budi 2 1 Technische Universität Dresden, Dresden, 01069,
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More information11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
More informationDATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
More informationHybIdx: Indexes for Processing Hybrid Graph Patterns Over Text-Rich Data Graphs Technical Report
HybIdx: Indexes for Processing Hybrid Graph Patterns Over Text-Rich Data Graphs Technical Report Günter Ladwig Thanh Tran Institute AIFB, Karlsruhe Institute of Technology, Germany {guenter.ladwig,ducthanh.tran}@kit.edu
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationEvaluation of Bayesian Spam Filter and SVM Spam Filter
Evaluation of Bayesian Spam Filter and SVM Spam Filter Ayahiko Niimi, Hirofumi Inomata, Masaki Miyamoto and Osamu Konishi School of Systems Information Science, Future University-Hakodate 116 2 Kamedanakano-cho,
More informationHELPDESK SYSTEM DESIGN AND DEVELOPMENT IN A UNIVERSITY BASED ON ITIL V3 FRAMEWORK (CASE STUDY: AL AZHAR INDONESIA UNIVERSITY)
HELPDESK SYSTEM DESIGN AND DEVELOPMENT IN A UNIVERSITY BASED ON ITIL V3 FRAMEWORK (CASE STUDY: AL AZHAR INDONESIA UNIVERSITY) Endang Ripmiatin 1), Arum Fitriati Informatics Engineering Department, Faculty
More information