IT services for analyses of various data samples
|
|
|
- Iris Townsend
- 10 years ago
- Views:
Transcription
1 IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Cybernetics and Artificial Intelligence, Letná 9/B, Košice, Slovakia {jan.paralic, frantisek.babic, martin.sarnovsky, peter.butka, cecilia.havrilova, miroslava.muchova, michal.puheim, martin.mikula, Abstract. Nowadays efficient processing and analysis of various data samples is becoming an important means how to obtain a competitive advantage on the market. In this situation analytical services available through the cloud represent an interesting solution how to offer a variety of methods and algorithms in and easy-usable form. Set of services described in this paper were designed and implemented based on multiannual research and project activities in domains as text mining, distributive and parallel computing, sentiment analysis, topic modelling and data science in general. We present thee main subsets of services with a short description of the used approaches and technologies. Implemented methods and algorithms have been continuously tested and deployed within previous national or EU projects, dissertation or master thesis, etc. Keywords: data, analysis, services 1 Introduction An important condition for the proper functioning and efficient performance of the presented services is a technical infrastructure providing necessary computing power and data capacity. We continuously build our own computing environment in which we can not only deploy and test our services, but we are also able to offer them as a SaaS (Software as a Service) for any other potential users. Some basic characteristics of the proposed architecture are the following: Private cloud managed with CoreOS, a lightweight Linux distribution that focuses on managing Linux containers. Web services decoupled into containers using Docker, software for automating deployment of applications into Linux containers. OS and application is combined together into software container, which can then be launched inside virtualization software. REST-like web services with dedicated Web portal for user interaction. Programmatic calls are handled through the REST-like API, a current de facto standard for web services, which provides simpler alternative to the SOAP 82
2 ISBN and WS-* standards. User interactions are conducted through a web application that itself uses the same web services in the background. Programming languages as Java, C#, Python and R for business logic and analytics. 2 Analytical platform In this section each common subset of services is described in more details: Effective management, storage and analyses of large collections of text documents using a sufficiently powerful computing platform of a private cloud infrastructure. Analyses of transaction data from electronic shops in order to provide recommendations for customers based on their buying behavior and/or buying behavior of previous customers with similar characteristics. Analyses of textual data, e.g. data from web discussions to identify overall customer satisfaction with given products or identification of major topics occurring in given collection of textual data as well as sentiment of their authors about particular topics. 2.1 Big text data analysis Services for big text data analysis are designed and implemented in line with current state-of-the-art technologies and frameworks including newly designed and implemented methods and algorithms, accessible through web portal or API [8]. The backend implementation is based on the JBOWL library for text mining and supporting (i.e. indexation and preprocessing) services [5]. It is an internally developed Java library for text mining tasks. Particular methods were re-implemented into the distributed versions using the GridGain API [3, 4]. GridGain is the framework for distributed applications development, including the real-time big data analytical applications. User interface is implemented using JSP (Java Server Pages) and interactive visualizations of the models are implemented in the Processing framework. We offer following: Services for management and manipulation with text document collections services for dataset manipulation including dataset management. Services for indexing, complex statistical text analyses and preprocessing tasks services for preprocessing of text documents including various preprocessing methods such as stopwords removal, stemming or several weighting scheme computing methods. Services for classification models building implemented in distributed versions algorithms for classification model building, following classifier are implemented to utilize the distributed computing resources by using the GridGain framework for distributed computing: decision tree, K-NN an boosting compound classifier. 83
3 Services for clustering of the text documents in distributed versions algorithms for clustering models building, similar to classification models, implemented using GridGain: K-Means and GHSOM [7]. 2.2 Process and event log mining Next subset of services deals with the behavioral analysis of IT portal users (such as e-shop customers, social network users etc.). Actions of these users are usually mirrored by access and event logs (e.g. access to the IT portal, participation within the campaign, display of a specific product in e-shop, etc.). Our services can analyze these logs and extract various types of knowledge (e.g. classification rules, segmentation and clustering based on similar behavior, behavior patterns and recommendations). The recommender system provides user specific information which can be used for marketing campaigns, web personalized recommendations, advertisement etc. In a simple scenario, the system may predict the products which certain user is likely to buy. In this case, data about single user (user id, item id, ratings etc.) are loaded into the web GUI of our recommender system. A set of algorithms analyzes this data and produces a single data file, specific to the corresponding user. The file contains information about items which may be potentially interesting for the user. The system utilizes various algorithms, such as Matrix Factorization, Item Based k-nn, User Based knn, Weighted Regularized Matrix Factorization and Bayesian Personalized Ranking Matrix Factorization. The algorithms used perform a collaborative approach in which several models are created. Each individual model is built using a different method and represents a specific personal assessment of the user. Produced models are further tested, mutually compared and finally combined into a single hybrid model which combines a variety of recommendation techniques with goal to achieve the best performance possible. The system is designed and implemented using the RapidMiner analytics platform. An accompanying web GUI is implemented in PHP language. Results of the behavior rule mining service are in form of prediction rules usable for decision-making and support in areas such as management, marketing, customer segmentation, classification, behavior prediction etc. The service processes user data and event logs by means of data aggregation, clustering, classification and prediction. The aggregations are created using a predefined set of operators (such as count, sum, frequency etc.) and the results are filtered using Hierarchical Agglomerative Clustering leaving only the most relevant aggregated data (the predictors). The metric used for clustering is based on correlation coefficients obtained using either Pearson s product-moment correlation coefficient (for numeric event attributes) or Pearson's chi-square test of independence (for nominal event attributes). Finally, a decision tree is created using the aggregated data and a set of rules is extracted from it. This set is sorted according to the number of data examples to which a single rule applies correctly. Only the most significant rules are returned. Both components are implemented as web services communicating via JSON messages. 84
4 ISBN Sentiment and theme analysis These services provide automatic detection of textual document themes with the filtering option, i.e. access to relevant articles only. For search engines it can be implemented as an extension with possibility to search documents by their themes instead of words matching. In the product sales area and in discussions about products the themes detection is able to recognize main topics that interest customers at most. For public sector the detection of document themes can be used as good tool for e.g. detection of main politicians affairs and bring the reflection of the public persons. System for topic modeling [9] is created as a library in Java. It is supported by two frameworks Gate and Mallet. This system is able to process input documents automatically and display discovered topics with their description at output. Required number of output topics can be passed as input parameter by the user or can be automatically estimated by the system. For the topic modelling we used Latent Dirichlet Allocation (LDA) methodology. Dictionary based approach [2] was used for discussion polarity detection. It uses lexicons, which contain words useful for classification. It was created lexicon for opinion classification, which contains around 1200 words in nominative plural. Words have assigned strength of polarity. They are divided into 4 groups (positive, negative, opposite and intensification). These words are then used for text classification into positive or negative class. Algorithm compares words in text with words in dictionary. Final polarity value of a sentence is computed as the sum of values of all polarity words in this sentence. Final text polarity value depends on values of its sentences. 3 Conclusion The aim of our set of analytical services is not to compete with big analytical platforms supported by the most important vendors and actors in this domain. We provide it as customized approach for selected business case, e.g. on the level of medium or small companies that need to have an easy to use solution without necessary deeper knowledge about implemented methods or algorithms. On the other hand, it is possible to modify and improve the available services within own research activities. Acknowledgment. The work presented in this paper was partially supported by the Slovak Grant Agency of Ministry of Education and Academy of Science of the Slovak Republic under grant No. 1/1147/12 (50%) and as the result of the Project implementation: University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: , supported by the Research & Development Operational Programme funded by the ERDF (50%). References 1. Blei, D. M., Ng, A. Y., Jordan, M. I.: Latent Dirichlet Allocation. In: Journal of Machine Learning Research 3, 2003, pp
5 2. Mikula, M., Machová, K.: Classification of opinion in conversational content. In: IEEE SAMI 2015 Proceedings, Herľany, Slovensko, 2015, pp Butka et al.: Distributed task-based execution engine for support of text-mining processes. In: IEEE SAMI 2009 Proceedings, Herľany, Slovensko, 2009, pp Bednár, P., Butka, P.: Task-based execution engine for JBOWL. In: WIKT 2008 Proceedings, Smolenice, Bratislava, STU, 2009, pp Bednár, P., Butka, P., Paralič, J.: Java library for support of text mining and retrieval. In: Znalosti 2005, Stará Lesná, VŠB-TU Ostrava, 2005, pp Bednár, P., Sarnovský, M., Demko, V.: RDF vs. NoSQL databases for the Semantic Web applications. In: IEEE SAMI 2014 Proceedings, Herľany, Slovensko, 2014, pp Sarnovský, M.: Design and implementation of Interactive visualization of GHSOM clustering algorithm for text mining tasks. In: International Journal of Research in Information Technology, Vol. 2, No. 7 (2014), pp Sarnovský, M.: Design and implementation of the cloud based application for text mining tasks. In: Data Mining and Knowledge Engineering, Vol. 6, No. 6 (2014), pp Smatana, M. et al.: Active learning enhanced semi-automatic annotation tool for aspectbased sentiment analysis. In: IEEE SISY 2013 Proceedings, Subotica, Serbia, 2013, pp
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Search and Data Mining: Techniques. Introduction Anna Yarygina Boris Novikov
Search and Data Mining: Techniques Introduction Anna Yarygina Boris Novikov Data Analytics: Conference Sections Fundamentals for data analytics Mechanisms and features Big Data Huge data Target analytics
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
USING BIG DATA FOR INTELLIGENT BUSINESSES
HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 2015 Brasov, 28-30 May 2015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC USING BIG DATA FOR INTELLIGENT
KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: [email protected]
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: [email protected] Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
A Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence
Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.
The Prophecy-Prototype of Prediction modeling tool
The Prophecy-Prototype of Prediction modeling tool Ms. Ashwini Dalvi 1, Ms. Dhvni K.Shah 2, Ms. Rujul B.Desai 3, Ms. Shraddha M.Vora 4, Mr. Vaibhav G.Tailor 5 Department of Information Technology, Mumbai
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business
BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business Instructor: Kunpeng Zhang ([email protected]) Lecture-Discussions:
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
Master Specialization in Knowledge Engineering
Master Specialization in Knowledge Engineering Pavel Kordík, Ph.D. Department of Computer Science Faculty of Information Technology Czech Technical University in Prague Prague, Czech Republic http://www.fit.cvut.cz/en
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
CUSTOMER Presentation of SAP Predictive Analytics
SAP Predictive Analytics 2.0 2015-02-09 CUSTOMER Presentation of SAP Predictive Analytics Content 1 SAP Predictive Analytics Overview....3 2 Deployment Configurations....4 3 SAP Predictive Analytics Desktop
City Data Pipeline. A System for Making Open Data Useful for Cities. [email protected]
City Data Pipeline A System for Making Open Data Useful for Cities Stefan Bischof 1,2, Axel Polleres 1, and Simon Sperl 1 1 Siemens AG Österreich, Siemensstraße 90, 1211 Vienna, Austria {bischof.stefan,axel.polleres,simon.sperl}@siemens.com
KNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 22/2013, ISSN 1642-6037 medical diagnosis, ontology, subjective intelligence, reasoning, fuzzy rules Hamido FUJITA 1 KNOWLEDGE-BASED IN MEDICAL DECISION
Hexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
Big Data Analytics and Healthcare
Big Data Analytics and Healthcare Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Data Sources Structured
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
How To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
Analysis Tools and Libraries for BigData
+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Application of Predictive Model for Elementary Students with Special Needs in New Era University
Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia
MicroStrategy Course Catalog
MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY
Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects
Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database
Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
Big Data Architect Certification Self-Study Kit Bundle
Big Data Architect Certification Bundle This certification bundle provides you with the self-study materials you need to prepare for the exams required to complete the Big Data Architect Certification.
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging
Full-text Search in Intermediate Data Storage of FCART
Full-text Search in Intermediate Data Storage of FCART Alexey Neznanov, Andrey Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia [email protected],
Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
OPC COMMUNICATION IN REAL TIME
OPC COMMUNICATION IN REAL TIME M. Mrosko, L. Mrafko Slovak University of Technology, Faculty of Electrical Engineering and Information Technology Ilkovičova 3, 812 19 Bratislava, Slovak Republic Abstract
Understanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis
Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Derek Foo 1, Jin Guo 2 and Ying Zou 1 Department of Electrical and Computer Engineering 1 School of Computing 2 Queen
Ensembles and PMML in KNIME
Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany [email protected]
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Cleaned Data. Recommendations
Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Table of Contents. Chapter No. 1 Introduction 1. iii. xiv. xviii. xix. Page No.
Table of Contents Title Declaration by the Candidate Certificate of Supervisor Acknowledgement Abstract List of Figures List of Tables List of Abbreviations Chapter Chapter No. 1 Introduction 1 ii iii
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
CLASSIFICATION AND CLUSTERING METHODS IN THE DECREASING OF THE INTERNET COGNITIVE LOAD
Acta Electrotechnica et Informatica No. 2, Vol. 6, 2006 1 CLASSIFICATION AND CLUSTERING METHODS IN THE DECREASING OF THE INTERNET COGNITIVE LOAD Kristína MACHOVÁ, Ivan KLIMKO Department of Cybernetics
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper
IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and
Text Analytics Software Choosing the Right Fit
Text Analytics Software Choosing the Right Fit Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Text Analytics World San Francisco, 2013 Agenda Introduction Text Analytics Basics
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Statistical Feature Selection Techniques for Arabic Text Categorization
Statistical Feature Selection Techniques for Arabic Text Categorization Rehab M. Duwairi Department of Computer Information Systems Jordan University of Science and Technology Irbid 22110 Jordan Tel. +962-2-7201000
An Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
A Grid Architecture for Manufacturing Database System
Database Systems Journal vol. II, no. 2/2011 23 A Grid Architecture for Manufacturing Database System Laurentiu CIOVICĂ, Constantin Daniel AVRAM Economic Informatics Department, Academy of Economic Studies
A Near Real-Time Personalization for ecommerce Platform Amit Rustagi [email protected]
A Near Real-Time Personalization for ecommerce Platform Amit Rustagi [email protected] Abstract. In today's competitive environment, you only have a few seconds to help site visitors understand that you
RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS
ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for
IBM Social Media Analytics
IBM Social Media Analytics Analyze social media data to better understand your customers and markets Highlights Understand consumer sentiment and optimize marketing campaigns. Improve the customer experience
ifinder ENTERPRISE SEARCH
DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data
Text Analytics Beginner s Guide Extracting Meaning from Unstructured Data Contents Text Analytics 3 Use Cases 7 Terms 9 Trends 14 Scenario 15 Resources 24 2 2013 Angoss Software Corporation. All rights
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2171322/ Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition Description: This book reviews state-of-the-art methodologies
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
