BIG DATA PROCESSING FOR DECISION MAKING




UDC 004.67

K.K. Nurlybayeva 1, G.T. Balakayeva 2
(1 al-Farabi Kazakh National University, Almaty, Kazakhstan, Kalamkas.nurlybayeva@gmail.com; 2 al-Farabi Kazakh National University, Almaty, Kazakhstan)

Abstract. Mining large amounts of data is a growing problem. This article describes methods and techniques aimed at solving it, including several Data Mining algorithms. The article examines the latest developments in data analysis, as well as the benefits of analyzing large volumes of data for businesses. It also describes proposals for optimizing data processing systems and integrating them into a single infrastructure for more rapid and "smart" business decision making.

Key words. Big data, Data mining, regression, classification, association, OLAP.

The growth of data volume is now a major problem. The term big data refers to data whose volume exceeds what existing information systems can handle. Additional methods and technologies for processing data are needed once its volume reaches terabytes or petabytes. Algorithms that are suitable for small amounts of data are not appropriate for big data: they are not fast or efficient enough. A great deal of information has been collected in the data warehouses of the world's enterprises and companies [1]. The volume of information keeps growing each year, and a number of problems remain open:

- Storing data incurs financial costs for equipment, maintenance, backups, etc.
- Data processing is becoming more complex and consumes more and more resources.

Nevertheless, big data analysis can be very beneficial for most of the interested parties. Big data is of interest to analysts who make decisions relying upon historical data [2].
They build the reports they need in order to obtain the information required for analysis and decision making. In addition, there are companies that are interested in obtaining benefits from the information stored in systems that process big data. This article examines the latest developments in data analysis, as well as the benefits of analyzing large volumes of data for businesses. It also describes proposals for optimizing data processing systems and integrating them into a single infrastructure for more rapid and "smart" business decision making.

The growth in the amount of information is a consequence of improved data recording and service technologies in a variety of fields. The activity of almost every enterprise is accompanied by the registration of clients' information, for example in medical, commercial, industrial, and scientific organizations. The question is what this information is needed for. A raw flow of information is of no interest without appropriate processing and analysis. Many analytical tools are available nowadays; several years ago, however, the capability to handle such amounts of data either did not exist or was very expensive [3]. New and evolving analytical processing technologies now make possible what was not possible before. Examples include:

- New systems that are able to process a wide variety of unstructured data.
- Improved analytical capabilities, including predictive and text analytics.
- Operational business intelligence that improves business flexibility by enabling automated real-time actions and intraday decision making.
- Cloud computing services.

A system for processing big data should combine these technologies to enable new solutions that can bring significant benefits to the business [4]. In addition, to handle big data the system should offer a wide range of new analytical technologies and business possibilities.
Examples include technologies such as:

- Design of predictive models
- Fraud detection
- Risk analysis
- Construction of situation rooms
- On-line analytical processing, etc.

As world experience shows, storage systems and business process management should be reorganized according to the needs of the company. In some cases, for example, there is no need to keep raw data. The data saved in the database should therefore be preprocessed and transformed, which reduces the storage space and its cost. Data mining technologies should be implemented in this case.

Mathematical statistics was formerly the primary tool for data analysis. However, given the problems associated with processing large volumes of data, statistical methods alone are not sufficient for analysis. Statistical methods are useful mainly for checking hypotheses (verification-driven data mining) and for "rough" exploratory analysis, which is the foundation of online analytical processing (OLAP).

Data mining technology has a wide range of uses; it can be applied wherever data is present. The main areas where Data mining is especially important are marketing, credit scoring, fraud detection, any type of forecasting, etc. [5]. Six methods of Data mining should be mentioned:

- Association
- Sequence
- Classification
- Clustering
- Forecasting
- Regression

Association takes place when several events are related to each other. Data mining technologies make it possible to determine patterns in the form of association rules, which can then be used to build a knowledge base for decision making systems. A sequence is a chain of events that are related in time. Classification assigns an individual event to one of the existing classes by determining the number of its class. Clustering finds a finite number of clusters, or classes, that divide the set of events into non-intersecting subsets. Forecasting is used in many fields, for example to estimate the future benefit from a new product.
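To make the idea of association rules more concrete, the following minimal sketch counts how often pairs of items occur together in a small set of transactions. The measures `support` and `confidence`, the thresholds, the function name `association_rules`, and the sample baskets are assumptions introduced for illustration; they are common in the association rule literature but are not taken from this article.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return pairwise rules (lhs, rhs, support, confidence).

    support    = share of transactions containing both items
    confidence = share of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore repeated items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in both directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sample data: four shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
]

rules = association_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules that pass both thresholds could then be stored as entries of a knowledge base. Real systems usually consider item sets larger than pairs and use dedicated algorithms to keep the search tractable on large data.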
Regression analysis is a statistical process for estimating the relationships among variables [6]. In probability theory and mathematical statistics, regression is the dependence of the mean value of a random variable on one or several other variables. In contrast to a purely functional dependence $y = f(x)$, where each value of the independent variable $x$ corresponds to a unique value of the dependent variable $y$, a regression dependence allows each value of $x$ to correspond to different values of $y$, because the dependence is random in nature. If some value of $x$ corresponds to a set of values $\{y_1, y_2, \dots, y_n\}$, then the dependence of their arithmetic mean on $x$ is a statistical regression:

$$\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n}$$

In probability theory, the study of regression is based on the fact that random variables $X$ and $Y$ with a joint probability distribution are linked by a probabilistic dependence: for every fixed value $X = x$, the value of $Y$ is a random variable with a certain conditional probability distribution that depends on $x$. The regression of $Y$ on $X$ is determined by the conditional expectation of $Y$, calculated under the condition that $X = x$:

$$E(Y \mid X = x) = u(x)$$

The equation $y = u(x)$ is the regression equation. Regression lines have the following remarkable property: among all real functions $f(x)$, the minimum of the expectation $E[(Y - f(X))^2]$ is attained for $f(x) = u(x)$. This means that the regression of $Y$ on $X$ provides the best representation, in this sense, of $Y$ by $X$. This property allows regression to be used for predicting the value of $Y$ from $X$. In other words, if $Y$ is not directly observed and the experiment allows only $X$ to be recorded, then $u(x)$ can be used as the predicted value of $Y$.

The simplest case is when the regression dependence of $Y$ on $X$ is linear, for example $E(Y \mid X = x) = b_0 + b_1 x$, where $b_0$ and $b_1$ are the regression coefficients. In practice, the regression coefficients in the equation $y = u(x)$ are unknown, and they are estimated from the observed data.
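As an illustration of estimating the coefficients of a linear regression from observed data, the sketch below uses ordinary least squares for one explanatory variable. The article does not state which estimation method the authors use, so least squares is an assumption here, as are the names `fit_line`, `predict`, `b0` (intercept), `b1` (slope), and the sample observations.

```python
def fit_line(xs, ys):
    """Estimate intercept b0 and slope b1 of y = b0 + b1 * x by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Predicted value of Y for a given x on the fitted line."""
    return b0 + b1 * x


# Hypothetical observations, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print(f"b0={b0:.3f}, b1={b1:.3f}, prediction at x=6: {predict(b0, b1, 6):.3f}")
```

The fitted line plays the role of the regression equation: once the coefficients are estimated, a value of Y that was not observed can be predicted from a recorded value of X.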

Figure 1. Regression line

Regression is widely used in analytical techniques to solve various business problems, such as forecasting (sales, exchange rates, and equities), estimating business indicators from the observed values of other indicators (scoring), identifying relationships between indicators, etc.

Differences of Data Mining from other methods of data analysis

Traditional methods of data analysis (statistical methods) and OLAP (Online Analytical Processing) focus mainly on verifying pre-formulated hypotheses (verification-driven data mining) and on "rough" exploratory analysis, which underpins OLAP, whereas one of the main aims of Data Mining is to find non-obvious relationships. Data Mining tools can find such patterns on their own and can also build their own hypotheses about relationships. Since formulating hypotheses about relationships is the most difficult task, the advantage of Data Mining over other methods of analysis is clear. Most statistical methods for identifying relationships in data rely on averaging over the sample, which leads to operations on values that do not exist in the data, whereas Data Mining operates on the real values. OLAP is better suited to a retrospective understanding of historical data, while Data Mining relies on historical data to answer questions about the future.
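The contrast can be sketched with a small example: an OLAP-style roll-up summarizes historical facts along chosen dimensions, and a sample average may be a value that never occurs in the data. The function `roll_up`, the field names, and the sales figures are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum a measure over all facts that share the same dimension values."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact table: one row per region and quarter.
facts = [
    {"region": "North", "quarter": "Q1", "sales": 100.0},
    {"region": "North", "quarter": "Q2", "sales": 140.0},
    {"region": "South", "quarter": "Q1", "sales": 80.0},
    {"region": "South", "quarter": "Q2", "sales": 60.0},
]

# Retrospective summaries of the kind an OLAP tool produces.
by_region = roll_up(facts, ["region"], "sales")
by_quarter = roll_up(facts, ["quarter"], "sales")

# The sample mean is a derived value; here it matches no recorded sale.
mean_sale = sum(row["sales"] for row in facts) / len(facts)
mean_is_observed = any(row["sales"] == mean_sale for row in facts)

print(by_region)
print(by_quarter)
print(mean_sale, mean_is_observed)
```

Such summaries describe what has already happened. A predictive model of the kind discussed in the regression section would instead be fitted to the individual records in order to estimate future values.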

Figure 2. Main stages of data processing

Prospects of Data Mining technology

The potential of Data Mining provides a tremendous opportunity for expanding the frontiers of the technology. The development of Data Mining concerns the following areas:

- selecting types of subject areas, which will make it easier to formalize the solution of the Data Mining tasks related to these areas;
- establishing formal languages and logical means by which reasoning can be formalized, and automated tools that will solve Data Mining problems in specific subject areas;
- developing Data Mining methods that are able not only to extract patterns from data but also to form theories based on empirical data;
- closing the significant gap between the capabilities of Data Mining tools and the theoretical achievements in this field.

The development of Data Mining technology is evidently directed mostly toward areas related to business. In the short term, Data Mining products may become as ordinary and necessary as e-mail and may be used, for example, to find the lowest prices on certain goods or the cheapest tickets. In the long term, the future of Data Mining is really interesting: intelligent agents may find new treatments for various diseases and a new understanding of the nature of the universe. However, Data Mining also carries a potential danger: more and more information, including information of a private nature, becomes available through the worldwide network, and more and more knowledge can be extracted from it.

Areas where the use of Data Mining technology is likely to be successful have these features:

- they require decisions based on knowledge;
- they have a changing environment;
- they have accessible, sufficient, and meaningful data;
- they provide high returns from the right decisions.

There are several points of view on Data Mining nowadays. Supporters of one of them consider it a mirage that distracts from classical analysis.
Supporters of another view accept Data Mining as an alternative to the traditional approach to analysis. There is also a middle position, which considers the possibility of combining the latest achievements in the field of Data Mining with classical statistical analysis of data.

Data Mining technology is constantly evolving and is attracting increasing interest both from the scientific world and from those who apply the technology in business. Integrating new technologies such as Data Mining into a single infrastructure will help to achieve more rapid and smart business decision making.

REFERENCES

1. Bryant R.E., Katz R.H., Lazowska E.D. Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society. 2008. V. 8.
2. Bollier D. The Promise and Peril of Big Data. Washington: The Aspen Institute, 2010. 23 p.
3. Heemink A. Mathematical Theory of Data Processing in Models (Data Assimilation Problems). 2008. Vol. 1. 202 p.
4. Moran W., La Scala B. Measurements in Mathematical Modeling and Data Processing. 2008. Vol. 1. 284 p.
5. Nurlybayeva K.K., Balakayeva G.T. Simulation of Large Data Processing for Smarter Decision Making. AWERProcedia Information Technology and Computer Science. 3rd World Conference on Information Technology WCIT-2012, 2012. Vol. 03, 2013. P. 1253-1257.
6. Nurlybayeva K.K., Balakayeva G.T. Processing of large amounts of data on a credit scoring example using neural network technology. Safety and Security Engineering V. 2013. P. 165-171.

Nurlybayeva K.K., Balakayeva G.T. Big data processing for decision making
Summary (Kazakh-language abstract). At present there is a growing problem of processing increasing volumes of data. This article is devoted to the description of methods and techniques aimed at solving these problems. Some data mining algorithms are described in the article. The article examines the latest developments in data analysis and the benefits of analyzing large volumes of data for business. It also describes proposals for optimizing the processing system and integrating them into a single infrastructure for faster adoption of "smart" business decisions.
Key words: Big data, data processing, regression, classification, association, OLAP.

Nurlybayeva K.K., Balakayeva G.T. Large volumes of data for decision making
Summary (Russian-language abstract). At present there is a growing problem of processing large volumes of data. This article is devoted to the description of methods and techniques aimed at solving these problems. Some data mining algorithms are described in the article. The article examines the latest developments in data analysis, as well as the benefits of analyzing large volumes of data for business. It also describes proposals for optimizing the data processing system and integrating them into a single infrastructure for faster adoption of "smart" business decisions.
Key words: Big data, data processing, regression, classification, association, OLAP.