IT services for analyses of various data samples


Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky

Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Cybernetics and Artificial Intelligence, Letná 9/B, Košice, Slovakia
{jan.paralic, frantisek.babic, martin.sarnovsky, peter.butka, cecilia.havrilova, miroslava.muchova, michal.puheim, martin.mikula,

Abstract. Nowadays, efficient processing and analysis of various data samples is becoming an important means of obtaining a competitive advantage on the market. In this situation, analytical services available through the cloud offer an attractive way to provide a variety of methods and algorithms in an easy-to-use form. The set of services described in this paper was designed and implemented on the basis of multiannual research and project activities in domains such as text mining, distributed and parallel computing, sentiment analysis, topic modelling and data science in general. We present three main subsets of services with a short description of the approaches and technologies used. The implemented methods and algorithms have been continuously tested and deployed within previous national or EU projects, dissertations, master's theses, etc.

Keywords: data, analysis, services

1 Introduction

An important condition for the proper functioning and efficient performance of the presented services is a technical infrastructure that provides the necessary computing power and data capacity. We are continuously building our own computing environment, in which we can not only deploy and test our services but also offer them as SaaS (Software as a Service) to other potential users. Some basic characteristics of the proposed architecture are the following:

- Private cloud managed with CoreOS, a lightweight Linux distribution that focuses on managing Linux containers.
- Web services decoupled into containers using Docker, software for automating the deployment of applications inside Linux containers. The OS environment and the application are combined into a software container, which can then be launched inside the virtualization software.
- REST-like web services with a dedicated web portal for user interaction. Programmatic calls are handled through the REST-like API, the current de facto standard for web services, which provides a simpler alternative to the SOAP and WS-* standards. User interactions are conducted through a web application that itself uses the same web services in the background.
- Programming languages such as Java, C#, Python and R for business logic and analytics.

2 Analytical platform

In this section, each common subset of services is described in more detail:

- Effective management, storage and analysis of large collections of text documents using a sufficiently powerful computing platform based on a private cloud infrastructure.
- Analysis of transaction data from electronic shops in order to provide recommendations for customers based on their buying behavior and/or the buying behavior of previous customers with similar characteristics.
- Analysis of textual data, e.g. data from web discussions, to identify overall customer satisfaction with given products, or identification of the major topics occurring in a given collection of textual data as well as the sentiment of their authors about particular topics.

2.1 Big text data analysis

Services for big text data analysis are designed and implemented in line with current state-of-the-art technologies and frameworks, including newly designed and implemented methods and algorithms, and are accessible through a web portal or an API [8]. The backend implementation is based on JBOWL, an internally developed Java library for text mining and supporting services (i.e. indexing and preprocessing) [5]. Particular methods were re-implemented as distributed versions using the GridGain API [3, 4]. GridGain is a framework for developing distributed applications, including real-time big data analytical applications. The user interface is implemented using JSP (Java Server Pages), and interactive visualizations of the models are implemented in the Processing framework. We offer the following:

- Services for management and manipulation of text document collections: services for dataset manipulation, including dataset management.
- Services for indexing, complex statistical text analyses and preprocessing tasks: services for preprocessing text documents, including various preprocessing methods such as stopword removal, stemming and several methods for computing term-weighting schemes.
- Services for building classification models: distributed versions of algorithms for building classification models. The following classifiers are implemented so as to utilize distributed computing resources through the GridGain framework: decision tree, k-NN and a boosting compound classifier.

- Services for clustering text documents: distributed versions of algorithms for building clustering models, implemented with GridGain in the same way as the classification models: K-Means and GHSOM [7].

2.2 Process and event log mining

The next subset of services deals with the behavioral analysis of IT portal users (such as e-shop customers, social network users, etc.). The actions of these users are usually mirrored in access and event logs (e.g. access to the IT portal, participation in a campaign, display of a specific product in an e-shop, etc.). Our services can analyze these logs and extract various types of knowledge (e.g. classification rules, segmentation and clustering based on similar behavior, behavior patterns and recommendations).

The recommender system provides user-specific information that can be used for marketing campaigns, personalized web recommendations, advertising, etc. In a simple scenario, the system may predict the products that a certain user is likely to buy. In this case, data about a single user (user ID, item ID, ratings, etc.) are loaded into the web GUI of our recommender system. A set of algorithms analyzes these data and produces a single data file specific to the corresponding user. The file contains information about items that may be of potential interest to the user. The system utilizes various algorithms, such as Matrix Factorization, Item-Based k-NN, User-Based k-NN, Weighted Regularized Matrix Factorization and Bayesian Personalized Ranking Matrix Factorization. The algorithms follow a collaborative approach in which several models are created. Each individual model is built using a different method and represents a specific assessment of the user's preferences. The resulting models are then tested, compared with each other and finally combined into a single hybrid model, which combines a variety of recommendation techniques with the goal of achieving the best possible performance.
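The paper does not describe how the individual recommenders are implemented inside RapidMiner. As a rough illustration of one of the listed techniques, item-based k-NN, the following Python sketch scores the items a user has not yet rated by a similarity-weighted average of that user's ratings of the k most similar items. Cosine similarity, the `{user: {item: rating}}` input format and the function names `item_similarities` and `recommend` are assumptions made for this example only.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity between every ordered pair of distinct items.

    ratings: {user: {item: rating}} (hypothetical input format).
    Returns {(item_a, item_b): similarity}.
    """
    # Re-index the ratings by item: {item: {user: rating}}
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating

    # Norm of each item's rating vector over all users who rated it
    norms = {i: math.sqrt(sum(r * r for r in v.values())) for i, v in by_item.items()}
    sims = {}
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            # Only users who rated both items contribute to the dot product
            common = set(by_item[a]) & set(by_item[b])
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            denom = norms[a] * norms[b]
            sims[(a, b)] = dot / denom if denom else 0.0
    return sims


def recommend(ratings, user, k=2, top_n=3):
    """Predict scores for the user's unrated items with item-based k-NN
    and return the top_n (item, score) pairs, best first."""
    sims = item_similarities(ratings)
    rated = ratings[user]
    candidates = {i for items in ratings.values() for i in items} - set(rated)
    scores = {}
    for item in candidates:
        # The k items most similar to the candidate among those the user rated
        neighbours = sorted(((sims[(item, j)], j) for j in rated), reverse=True)[:k]
        weight = sum(s for s, _ in neighbours)
        if weight > 0:
            scores[item] = sum(s * rated[j] for s, j in neighbours) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Each recommender of this kind yields its own score list; a hybrid model as described above could, for instance, combine such lists with weights tuned on held-out data, although the paper does not state how its models are combined.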
The recommender system is designed and implemented using the RapidMiner analytics platform; an accompanying web GUI is implemented in PHP.

The results of the behavior rule mining service take the form of prediction rules usable for decision-making support in areas such as management, marketing, customer segmentation, classification and behavior prediction. The service processes user data and event logs by means of data aggregation, clustering, classification and prediction. The aggregations are created using a predefined set of operators (such as count, sum, frequency, etc.), and the results are filtered using Hierarchical Agglomerative Clustering, leaving only the most relevant aggregated data (the predictors). The metric used for clustering is based on correlation coefficients obtained using either Pearson's product-moment correlation coefficient (for numeric event attributes) or Pearson's chi-square test of independence (for nominal event attributes). Finally, a decision tree is built from the aggregated data and a set of rules is extracted from it. This set is sorted according to the number of data examples to which each rule applies correctly, and only the most significant rules are returned. Both components are implemented as web services communicating via JSON messages.
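The predictor-filtering step can be pictured with a small Python sketch. The paper only states that the clustering metric is derived from Pearson's correlation coefficient (numeric attributes) or Pearson's chi-square test (nominal attributes); everything else below is an assumption for illustration: the distance is 1 - |association|, the chi-square statistic is scaled to [0, 1] as Cramér's V, clusters are merged with average linkage until the closest pair is farther apart than a threshold, and from each cluster the predictor most associated with the target is kept. All function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson product-moment correlation of two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cramers_v(x, y):
    """Chi-square statistic of independence for two nominal sequences,
    scaled to [0, 1] (Cramer's V, an assumed normalization)."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    chi2 = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            chi2 += (cxy[(a, b)] - expected) ** 2 / expected
    k = min(len(cx), len(cy)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def association(x, y):
    """|Pearson r| for numeric pairs, Cramer's V otherwise
    (simplification: mixed pairs are treated as nominal)."""
    if all(isinstance(v, (int, float)) for v in (*x, *y)):
        return abs(pearson(x, y))
    return cramers_v(x, y)


def cluster_predictors(columns, threshold=0.5):
    """Average-linkage agglomerative clustering of aggregated predictors.

    columns: {name: list of values}; distance = 1 - association.
    Merging stops once the closest pair of clusters is farther than threshold.
    """
    names = list(columns)
    dist = {(a, b): 1.0 - association(columns[a], columns[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def select_predictors(columns, target, threshold=0.5):
    """Keep one representative per cluster: the predictor most associated with target."""
    return [max(cluster, key=lambda name: association(columns[name], target))
            for cluster in cluster_predictors(columns, threshold)]
```

With such a filter, two aggregated predictors that are almost linearly related fall into the same cluster, so only one of them would be passed on to the decision-tree step.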

2.3 Sentiment and theme analysis

These services provide automatic detection of the themes of textual documents with a filtering option, i.e. access to relevant articles only. In search engines, theme detection can be implemented as an extension that allows documents to be searched by their themes instead of by word matching. In product sales and in discussions about products, theme detection can recognize the main topics that interest customers most. In the public sector, the detection of document themes can serve, e.g., as a tool for detecting the main affairs of politicians and for reflecting how public figures are perceived.

The topic modelling system [9] is implemented as a Java library supported by two frameworks, GATE and MALLET. It processes input documents automatically and outputs the discovered topics together with their descriptions. The required number of output topics can be passed by the user as an input parameter or estimated automatically by the system. For topic modelling, we used Latent Dirichlet Allocation (LDA).

A dictionary-based approach [2] was used for detecting the polarity of discussions. It uses lexicons containing words useful for classification. A lexicon for opinion classification was created, containing around 1,200 words in the nominative plural. Each word is assigned a polarity strength, and the words are divided into four groups (positive, negative, opposite and intensification). These words are then used to classify texts into a positive or a negative class: the algorithm compares the words in a text with the words in the dictionary, the final polarity value of a sentence is computed as the sum of the values of all polarity words in that sentence, and the final polarity value of the text depends on the values of its sentences.

3 Conclusion

The aim of our set of analytical services is not to compete with the big analytical platforms supported by the most important vendors and actors in this domain. Instead, we offer a customized approach for selected business cases, e.g. for small and medium-sized companies that need an easy-to-use solution that does not require deeper knowledge of the implemented methods or algorithms. On the other hand, the available services can be modified and improved within one's own research activities.

Acknowledgment. The work presented in this paper was partially supported by the Slovak Grant Agency of the Ministry of Education and Academy of Science of the Slovak Republic under grant No. 1/1147/12 (50%) and as the result of the project implementation: University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: , supported by the Research & Development Operational Programme funded by the ERDF (50%).

References

1. Blei, D. M., Ng, A. Y., Jordan, M. I.: Latent Dirichlet Allocation. In: Journal of Machine Learning Research 3, 2003, pp.

2. Mikula, M., Machová, K.: Classification of opinion in conversational content. In: IEEE SAMI 2015 Proceedings, Herľany, Slovakia, 2015, pp.
3. Butka, P. et al.: Distributed task-based execution engine for support of text-mining processes. In: IEEE SAMI 2009 Proceedings, Herľany, Slovakia, 2009, pp.
4. Bednár, P., Butka, P.: Task-based execution engine for JBOWL. In: WIKT 2008 Proceedings, Smolenice, Bratislava, STU, 2009, pp.
5. Bednár, P., Butka, P., Paralič, J.: Java library for support of text mining and retrieval. In: Znalosti 2005, Stará Lesná, VŠB-TU Ostrava, 2005, pp.
6. Bednár, P., Sarnovský, M., Demko, V.: RDF vs. NoSQL databases for the Semantic Web applications. In: IEEE SAMI 2014 Proceedings, Herľany, Slovakia, 2014, pp.
7. Sarnovský, M.: Design and implementation of Interactive visualization of GHSOM clustering algorithm for text mining tasks. In: International Journal of Research in Information Technology, Vol. 2, No. 7 (2014), pp.
8. Sarnovský, M.: Design and implementation of the cloud based application for text mining tasks. In: Data Mining and Knowledge Engineering, Vol. 6, No. 6 (2014), pp.
9. Smatana, M. et al.: Active learning enhanced semi-automatic annotation tool for aspect-based sentiment analysis. In: IEEE SISY 2013 Proceedings, Subotica, Serbia, 2013, pp.


Big Data Architect Certification Self-Study Kit Bundle

Big Data Architect Certification Self-Study Kit Bundle Big Data Architect Certification Bundle This certification bundle provides you with the self-study materials you need to prepare for the exams required to complete the Big Data Architect Certification.

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database

More information

Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis

Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Derek Foo 1, Jin Guo 2 and Ying Zou 1 Department of Electrical and Computer Engineering 1 School of Computing 2 Queen

More information

HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction

HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction Federico Facca, Alessandro Gallo, federico@grafedi.it sciack@virgilio.it

More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes

A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes Ravi Anand', Subramaniam Ganesan', and Vijayan Sugumaran 2 ' 3 1 Department of Electrical and Computer Engineering, Oakland

More information

INTERACTIVE AUDIENCE SELECTION TOOL FOR DISTRIBUTING A MOBILE CAMPAIGN

INTERACTIVE AUDIENCE SELECTION TOOL FOR DISTRIBUTING A MOBILE CAMPAIGN INTERACTIVE AUDIENCE SELECTION TOOL FOR DISTRIBUTING A MOBILE CAMPAIGN Talya Porat, Lihi Naamani-Dery, Lior Rokach and Bracha Shapira Deutsche Telekom Laboratories at Ben Gurion University Beer Sheva,

More information

CUSTOMER Presentation of SAP Predictive Analytics

CUSTOMER Presentation of SAP Predictive Analytics SAP Predictive Analytics 2.0 2015-02-09 CUSTOMER Presentation of SAP Predictive Analytics Content 1 SAP Predictive Analytics Overview....3 2 Deployment Configurations....4 3 SAP Predictive Analytics Desktop

More information

A CONCEPT FOR A SMART WEB PORTAL DEVELOPMENT IN INTELLIGENCE INFORMATION SYSTEM BASED ON SOA

A CONCEPT FOR A SMART WEB PORTAL DEVELOPMENT IN INTELLIGENCE INFORMATION SYSTEM BASED ON SOA A CONCEPT FOR A SMART WEB PORTAL DEVELOPMENT IN INTELLIGENCE INFORMATION SYSTEM BASED ON SOA Jugoslav Achkoski Vladimir Trajkovik Nevena Serafimova Military Academy General Mihailo Apostolski Faculty of

More information

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Distributed Framework for Data Mining As a Service on Private Cloud

Distributed Framework for Data Mining As a Service on Private Cloud RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

More information

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323 Machine Learning and Cloud Computing trends, issues, solutions Daniel Pop HOST Workshop 2012 Future plans // Tools and methods Develop software package(s)/libraries for scalable, intelligent algorithms

More information

Table of Contents. Chapter No. 1 Introduction 1. iii. xiv. xviii. xix. Page No.

Table of Contents. Chapter No. 1 Introduction 1. iii. xiv. xviii. xix. Page No. Table of Contents Title Declaration by the Candidate Certificate of Supervisor Acknowledgement Abstract List of Figures List of Tables List of Abbreviations Chapter Chapter No. 1 Introduction 1 ii iii

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Master Thesis Proposal

Master Thesis Proposal Master Thesis Proposal Web Data Extraction of University Staff Competencies Edin Zildzo, 1125449 Supervisor: Ao.Univ.Prof.Dr. Jürgen Dorn Septemeber 11, 2014 1 Problem Statement Web data extraction is

More information

Big Data Analytics and Healthcare

Big Data Analytics and Healthcare Big Data Analytics and Healthcare Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Data Sources Structured

More information

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,

More information

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM Sugandha Agarwal 1, Pragya Jain 2 1,2 Department of Computer Science & Engineering ASET, Amity University, Noida,

More information

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS.

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

Big Data for Satellite Business Intelligence

Big Data for Satellite Business Intelligence Big Data for Satellite Business Intelligence GSAW 2015 Loic COULET, Kratos ISE 2015 by Kratos ISE. Published by The Aerospace Corporation with permission. Who s talking? Computer Science Passionate Kratos

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

Supply chain intelligence: benefits, techniques and future trends

Supply chain intelligence: benefits, techniques and future trends MEB 2010 8 th International Conference on Management, Enterprise and Benchmarking June 4 5, 2010 Budapest, Hungary Supply chain intelligence: benefits, techniques and future trends Zoltán Bátori Óbuda

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

Cleaned Data. Recommendations

Cleaned Data. Recommendations Call Center Data Analysis Megaputer Case Study in Text Mining Merete Hvalshagen www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 10 Bloomington, IN 47404, USA +1 812-0-0110

More information

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization 2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information

Ensembles and PMML in KNIME

Ensembles and PMML in KNIME Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany First.Last@Uni-Konstanz.De

More information

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers 60 Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Application of Predictive Model for Elementary Students with Special Needs in New Era University

Application of Predictive Model for Elementary Students with Special Needs in New Era University Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Client Overview. Engagement Situation. Key Requirements

Client Overview. Engagement Situation. Key Requirements Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision

More information

IT Infrastructure of Data Center Services Based on ITIL

IT Infrastructure of Data Center Services Based on ITIL IT Infrastructure of Data Center Services Based on ITIL Kazuo Tomoda Fujitsu s data center services have been received favorably by customers and are growing steadily. As customers businesses become more

More information

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated

More information