PROCEEDINGS OF THE 10TH ANNUAL INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION IN COMPUTER SCIENCE 2014
IEEE Sponsor
With financial support of the Central Strategic Development Fund of NBU
July 2014, Albena, Bulgaria
Chairmen: Ivan Landjev (Bulgaria), Rumen Stainov (Germany) and Lou Chitkushev (USA)
General Secretaries: Petya Assenova (Bulgaria), Vijay Kanabar (USA)
CSECS 2014, The 10th Annual International Conference on Computer Science and Education in Computer Science, July 2014, Albena, Bulgaria

SOME IMPROVEMENTS OF THE OPEN TEXT SUMMARIZER ALGORITHM USING HEURISTICS

Filip ANDONOV, Velina SLAVOVA
NBU, Computer Science Department

Abstract: A number of heuristics to improve the method used by the Open Text Summarizer library are proposed.

Keywords: automatic summary generation

ACM Classification Keywords: Natural language processing, Text analysis
Introduction

Open Text Summarizer is an implementation of a grammar-agnostic method for creating a summary of a text. Although the method is very simple, the idea behind it is powerful enough to compete, in terms of the quality of its results, with much more complicated methods. Still, the fact that it is independent of the language of the text leaves room for improvement by adding other heuristics without compromising (much) its language independence.

State of the art

Nowadays the Internet provides a vast ocean of unstructured information in text form. Harvesting this data and analyzing it for a given purpose is generally approached in two main ways: data mining and data structuring (semantic technologies). The second approach is the orthodox one. Unfortunately, most of the data is still in unstructured form, so for practical reasons the data mining approach is preferable for now. The reasons for processing all this data range from marketing and business intelligence, through research, to intelligence and military purposes.

Automatic generation of summaries is not new [Luhn, 1958]. Many tools implementing different methods exist. Popular areas of research on this topic are latent semantic analysis [Olmos et al, 2009], clustering [Amini et al, 2005], evolutionary algorithms [Alguliev and Aliguliyev, 2009], and hidden Markov models [Conroy and Oleary, 2001]. These methods are applied to tasks such as web search, document mining and opinion mining, all of which basically make the netizen's life easier when dealing with large texts that contain little information of importance to a given person. It is easily observable that, in order to get results, researchers use sophisticated methods and scientific instruments that are based on analytical tools and are grammar-specific. Nevertheless, the performance of these instruments is not perfect.

In recent years the widespread use of user-created web content on the one hand, and the decline of printed media on the other, have made opinion recognition an attractive topic. It turned out that the rapid spread of opinions in social media could topple governments and spark revolutions. The problem is that the texts on the Internet are too many and too long. We assume that a text summary will give concentrated information about the polarity of the expressed opinion.

The approach

The aim is to create a simple tool based on the general regularities observed in language expression. These are not necessarily studied and described, as they are not the subject of grammar or other branches of linguistics, but they are observable, which means statistically detectable. For example, in discourse, when one needs to express an opinion, one more frequently uses the concepts and features that are in the focus of what is meant to be expressed. This led to the idea of building the tool around word-form scores. We think that abstraction-based summarization is hard enough to be more of an exercise in scientific gymnastics than a practical solution, so we focus on extraction-based summarization.

Figure 1. Basic scheme (diagram elements: Text; Concepts and features frequency; Center of the saying; Text filtering; Summary)
There are two main steps: detection of the focus of the saying, and creation of the summary by means of generating regular language expressions.

There are two main approaches to analyzing texts. The first (abstraction-based) tries to analyze the text and rephrase it in a concise way. This is what humans do when writing a summary. The other (extraction-based) tries to extract key sentences from the text and combine them into a structured but shorter text. One famous algorithm that uses this approach is TextRank. Because we use an extraction-based method, the text of the summary is not generated but filtered from the original text.

The detection of the focus

The main heuristic here is the one used in the Open Text Summarizer (OTS). It basically says that the most frequent words not included in the stop-word list are the keywords of the text, and that sentences are scored by the number of occurrences of the keywords in them. The word forms that express the focus, however, belong to different parts of speech, so we suggest detecting them by means of a dictionary. Unfortunately, the dictionary approach is not perfect: in English and in many other languages, different parts of speech can have the same word form, for example "walk" as a noun and "walk" as a verb. Still, the goal here is not to achieve perfect detection, because the algorithm we are trying to improve does not use information about the parts of speech at all. After this step the different sub-forms are stemmed, because for frequency counts it is the basic form that matters.

The next step is to put the part-of-speech information to use by applying different weights to the words. [Nicholls and Song, 2009] have shown that nouns are the center of conceptualization, so we give them higher weights, lower ones for verbs and even lower ones for adjectives. We had to use heuristics in order to fit the weights better.

Table 1

Part of speech    Weight
Verb              0.5
Noun              1.0
Adjective         0.2
Unknown           1.0

We then count the number of occurrences of all the words in the text (as OTS dictates), but with the weights applied. Instead of directly using the number of occurrences as a measure of a word's importance, we use another heuristic, similar to the idea of measuring entropy. The OTS heuristic is that the more frequently a word is used in the text, the more important it is. On the other hand, summarization algorithms typically use some form of a stop-word list. The idea is that some words that are very common in all texts ("the" in English, for example) do not contribute any meaning to the text's topic, so we remove them in order not to contaminate the top positions in the frequency list. However, this idea can be stretched further: if we have a large language corpus, we can determine which words are common to all texts, so that even if they are common in our text, they hold no discriminative power.
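The weighted counting step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the part-of-speech dictionary and the `toy_stem` function are hypothetical stand-ins for real resources, and only the weights are taken from Table 1.

```python
from collections import Counter
import re

# Weights from Table 1; words missing from the dictionary count as "unknown".
POS_WEIGHTS = {"verb": 0.5, "noun": 1.0, "adjective": 0.2, "unknown": 1.0}

# Hypothetical toy resources; a real system would load full lists.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}
POS_DICTIONARY = {"cat": "noun", "cats": "noun", "mat": "noun",
                  "sat": "verb", "black": "adjective"}


def toy_stem(word):
    """Crude placeholder stemmer: strips a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def weighted_counts(text):
    """Return {stem: weighted frequency} for the non-stop-words in text."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in STOP_WORDS:
            continue
        # Look up the part of speech on the surface form, then stem.
        pos = POS_DICTIONARY.get(word, "unknown")
        counts[toy_stem(word)] += POS_WEIGHTS[pos]
    return counts


counts = weighted_counts("The black cat sat on the mat. The cats are black.")
```

In this toy input, "cat" and "cats" collapse into one stem and each occurrence counts with the noun weight, while the two occurrences of the adjective "black" contribute far less. The order of operations (dictionary lookup first, stemming second) follows the description above.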
Table 2

                                   Words frequent in our text    Words not frequent in our text
Words frequent in all texts        Not important                 Not important
Words not frequent in all texts    Important                     Not important

Thus we use the language corpus to determine the frequency of a word in all texts and then use the formula below to assign a score to it:

\[
\text{Word\_score} = \frac{\text{number of occurrences in the text} \,/\, \text{maximal number of occurrences in the text}}{\text{number of occurrences in the global word list (all texts)} \,/\, \text{maximal number of occurrences in all texts}}
\]

Text generation

In order to avoid the need for grammatical knowledge and the creation of Chomsky's trees, the entities we work with are whole sentences. To every sentence in the text we assign a score, calculated by summing the scores of all the words it consists of. Thus the more frequent words a sentence contains, and the more frequently those words occur in the text, the higher the score of the sentence.

Now we have a score for every sentence in the text. The original OTS algorithm simply takes the sentences with the highest scores and puts them in the summary. However, a lurking problem of this naïve approach is that some sentences are connected, as they contain references to things in previous sentences. To minimize the problem of such severed co-reference chains, we use a list of words proven to be conductors of co-reference. Here we also use two additional heuristics. The first is that the most important conductors are located at the beginning of a sentence. The second is that, because a person's internal concept buffer is limited, the further into the sentence the link word is, the less likely it is to refer to a previous sentence rather than to a concept in the current one.

Personal pronouns (link words such as I, he, she, it, we, you, etc.) bring a score of 7 if they are the first word in the sentence, 6 if they are the second word, and so on. Other pronouns and link words such as this, that, these, such, there, but, etc. are scored the same way as the personal pronouns, but the score is halved. We apply all these rules to assign a second score to each sentence: a co-reference score. The final stage of our method is to choose the sentences that have the highest score, or whose next sentence has a co-reference score of 7 or more.

Conclusion

We focused our efforts on improving a simple approach based on general rules without compromising its core idea of being grammar-agnostic. We do this by using additional linguistic information while avoiding the need for full sentence-structure analysis. The quality of the results we observed in preliminary tests was satisfactory, and we plan a large-scale experiment with language specialists. The main advantages of the method are its relatively high speed and its low demand for computational resources.
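The word score, sentence score, co-reference score and final selection described in the text can be sketched in Python as follows. This is an illustrative reading, not the authors' code, and it rests on several assumptions: the word score is read as the ratio of the two relative frequencies; a word unseen in the corpus is treated as occurring once; link-word scores within a sentence are summed; and the selection rule is interpreted as "when a kept sentence has a co-reference score of 7 or more, also keep the sentence before it". All function names are hypothetical.

```python
import re

# Link words named in the text; the source lists end with "etc.".
PERSONAL_PRONOUNS = {"i", "he", "she", "it", "we", "you"}
OTHER_LINK_WORDS = {"this", "that", "these", "such", "there", "but"}


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def word_score(count_in_text, max_in_text, count_in_corpus, max_in_corpus):
    """Relative frequency in the text divided by relative frequency in the corpus."""
    text_ratio = count_in_text / max_in_text
    # Assumption: a word unseen in the corpus is treated as occurring once.
    corpus_ratio = max(count_in_corpus, 1) / max_in_corpus
    return text_ratio / corpus_ratio


def sentence_score(sentence, word_scores):
    """Sum of the scores of the words in the sentence (unscored words add 0)."""
    return sum(word_scores.get(word, 0.0) for word in tokenize(sentence))


def coref_score(sentence, max_score=7):
    """7 for a link word in first position, 6 in second, ...; halved for non-personal ones."""
    total = 0.0
    for position, word in enumerate(tokenize(sentence)):
        base = max(max_score - position, 0)
        if word in PERSONAL_PRONOUNS:
            total += base
        elif word in OTHER_LINK_WORDS:
            total += base / 2
    return total


def select_sentences(sentences, scores, top_k, threshold=7):
    """Keep the top_k sentences plus predecessors they appear to refer to."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_k])
    for i in sorted(chosen, reverse=True):
        # Walk back while the kept sentence seems to point at the previous one.
        while i > 0 and coref_score(sentences[i]) >= threshold and i - 1 not in chosen:
            chosen.add(i - 1)
            i -= 1
    return [sentences[i] for i in sorted(chosen)]


sentences = ["The cat sat on the mat.", "It was asleep.", "Dogs bark."]
summary = select_sentences(sentences, scores=[1.0, 3.0, 0.5], top_k=1)
```

With these hand-picked scores only the second sentence makes the top of the ranking, but it opens with "It", so its predecessor is kept as well and the pronoun does not lose its referent. A different reading of the selection rule (for example, keeping every sentence whose successor passes the threshold regardless of whether that successor is selected) would only change `select_sentences`.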
Bibliography

[Alguliev and Aliguliyev, 2009] Rasim Alguliev, Ramiz Aliguliyev. Evolutionary Algorithm for Extractive Text Summarization. Intelligent Information Management, 1, 2009.
[Amini et al, 2005] Massih R. Amini, Nicolas Usunier, Patrick Gallinari. Advances in Information Retrieval. Springer, Berlin, Heidelberg, 2005.
[Conroy and Oleary, 2001] John M. Conroy and Dianne P. O'Leary. Text summarization via hidden Markov models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 2001.
[Luhn, 1958] H. P. Luhn. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, Vol. 2, No. 2, 1958.
[Nicholls and Song, 2009] Chris Nicholls and Fei Song. Improving sentiment analysis with part-of-speech weighting. Machine Learning and Cybernetics, 2009 International Conference on. Vol. 3. IEEE, 2009.
[Olmos et al, 2009] Ricardo Olmos, José A. León, Guillermo Jorge-Botana, and Inmaculada Escudero. New algorithms assessing short summaries in expository texts using latent semantic analysis. Behavior Research Methods, 41 (3), 2009.
Authors' Information

Filip ANDONOV, PhD, Chief Assistant Professor, Department of Computer Science, New Bulgarian University, fandonov@nbu.bg. Major Fields of Scientific Research: Multicriteria Optimization, Semantic Technologies

Velina SLAVOVA, PhD, Prof. in Computer Science, Department of Computer Science, New Bulgarian University, vslavova@nbu.bg. Major Fields of Scientific Research: AI, Cognitive Science
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
More informationONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
More informationExploring the use of Big Data techniques for simulating Algorithmic Trading Strategies
Exploring the use of Big Data techniques for simulating Algorithmic Trading Strategies Nishith Tirpankar, Jiten Thakkar tirpankar.n@gmail.com, jitenmt@gmail.com December 20, 2015 Abstract In the world
More informationDATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
More information2013 IOS Press. This document is published in:
This document is published in: Bossé, E. et al. (eds.) (2013) Prediction and Recognition of Piracy Efforts Using Collaborative Human-Centric Information Systems, Proceedings of the NATO Advanced Study
More informationAn Innovative Way for Mining Clinical and Administrative Healthcare Data
An Innovative Way for Mining Clinical and Administrative Healthcare Data Siu Hung Keith Lo and Maiga Chang School of Computing and Information Systems, Athabasca University, Canada keithshlo@yahoo.com,
More informationWeb Data Mining: A Case Study. Abstract. Introduction
Web Data Mining: A Case Study Samia Jones Galveston College, Galveston, TX 77550 Omprakash K. Gupta Prairie View A&M, Prairie View, TX 77446 okgupta@pvamu.edu Abstract With an enormous amount of data stored
More informationSelf-Improving Supply Chains
Self-Improving Supply Chains Cyrus Hadavi Ph.D. Adexa, Inc. All Rights Reserved January 4, 2016 Self-Improving Supply Chains Imagine a world where supply chain planning systems can mold themselves into
More informationCS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage
CS 6740 / INFO 6300 Advanced d Language Technologies Graduate-level introduction to technologies for the computational treatment of information in humanlanguage form, covering natural-language processing
More informationAssociation rules for improving website effectiveness: case analysis
Association rules for improving website effectiveness: case analysis Maja Dimitrijević, The Higher Technical School of Professional Studies, Novi Sad, Serbia, dimitrijevic@vtsns.edu.rs Tanja Krunić, The
More informationLatent Dirichlet Markov Allocation for Sentiment Analysis
Latent Dirichlet Markov Allocation for Sentiment Analysis Ayoub Bagheri Isfahan University of Technology, Isfahan, Iran Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer
More informationFREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT
FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT ANURADHA.T Assoc.prof, atadiparty@yahoo.co.in SRI SAI KRISHNA.A saikrishna.gjc@gmail.com SATYATEJ.K satyatej.koganti@gmail.com NAGA ANIL KUMAR.G
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationThe Oxford Learner s Dictionary of Academic English
ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students
More informationQualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1
Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Introduction Electronic Commerce 2 is accelerating dramatically changes in the business process. Electronic
More informationSentiment analysis for news articles
Prashant Raina Sentiment analysis for news articles Wide range of applications in business and public policy Especially relevant given the popularity of online media Previous work Machine learning based
More informationCOURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationOptimizing the relevancy of Predictions using Machine Learning and NLP of Search Query
International Journal of Scientific and Research Publications, Volume 4, Issue 8, August 2014 1 Optimizing the relevancy of Predictions using Machine Learning and NLP of Search Query Kilari Murali krishna
More informationThe Ontology and Architecture for an Academic Social Network
www.ijcsi.org 22 The Ontology and Architecture for an Academic Social Network Moharram Challenger Computer Engineering Department, Islamic Azad University Shabestar Branch, Shabestar, East Azerbaijan,
More informationImproving SAS Global Forum Papers
Paper 3343-2015 Improving SAS Global Forum Papers Vijay Singh, Pankush Kalgotra, Goutam Chakraborty, Oklahoma State University, OK, US ABSTRACT Just as research is built on existing research, the references
More informationRanked Keyword Search in Cloud Computing: An Innovative Approach
International Journal of Computational Engineering Research Vol, 03 Issue, 6 Ranked Keyword Search in Cloud Computing: An Innovative Approach 1, Vimmi Makkar 2, Sandeep Dalal 1, (M.Tech) 2,(Assistant professor)
More informationAnalysis of Social Media Streams
Fakultätsname 24 Fachrichtung 24 Institutsname 24, Professur 24 Analysis of Social Media Streams Florian Weidner Dresden, 21.01.2014 Outline 1.Introduction 2.Social Media Streams Clustering Summarization
More informationPREDICTING MARKET VOLATILITY FEDERAL RESERVE BOARD MEETING MINUTES FROM
PREDICTING MARKET VOLATILITY FROM FEDERAL RESERVE BOARD MEETING MINUTES Reza Bosagh Zadeh and Andreas Zollmann Lab Advisers: Noah Smith and Bryan Routledge GOALS Make Money! Not really. Find interesting
More informationLearning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
More informationIntelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives
Intelligent Search for Answering Clinical Questions Coronado Group, Ltd. Innovation Initiatives Search The Way You Think Copyright 2009 Coronado, Ltd. All rights reserved. All other product names and logos
More information