Breaking Out of the Black-Box: Research Challenges in Data Mining

Padhraic Smyth
Information and Computer Science
University of California, Irvine CA 92697

1 Introduction

Database researchers, statisticians, and "data owners" often have quite different views of data. In a database context the traditional goal has been to provide a general and flexible data management framework, with less concern about the content of the data. Statisticians, on the other hand, have traditionally focused primarily on issues of data modeling and inference, with relatively little concern for where the data physically reside or how the data will be accessed. Data owners, in turn, tend to be more focused on using the data as a means to an end: business data owners want to increase revenue by developing better predictive models, and scientific data owners typically want to develop insight into the phenomena generating the data. In this brief position paper we take a broad-scale view of "what people do with their data" (the data owner's perspective) and use this viewpoint to identify current opportunities and challenges for data mining research.

2 The Process of Data Mining

The data mining process is often characterized as a multi-stage iterative process involving data selection, data cleaning, application of data mining algorithms, evaluation, and so forth. Here we adopt a somewhat different process-oriented view and break it down into five basic steps:

1. Exploring and Preprocessing: the initial steps of exploring, visualizing, and querying the data, to gain insight into the data in an interactive manner. Preprocessing steps such as variable selection, data focusing, and data validation can also be included in these initial steps.

2. Modeling: the steps involved in (a) selecting the model representations that we seek to fit to the data (e.g., a tree, a linear function, a probability density model, etc.), (b) selecting the score functions that score different models with respect to the data, and (c) specifying the computational methods and algorithms to optimize the score function (e.g., greedy local search). These "components" combined together specify the data mining algorithm to be used. The components may be "precompiled" into a specific algorithm (e.g., CART or C4.5 decision tree implementations) or may be integrated in a "customized" manner for a specific application (much more common in the sciences).

3. Mining: the step (often repeated) of actually running a particular data mining algorithm on a particular data set.

4. Evaluating: the step (often ignored) of critically evaluating the quality of the output of the data mining algorithm from Step 3, both the predictions of the model and the interpretation of the fitted model itself.

5. Deploying: the step (rarely achieved) of putting a model from a data mining algorithm into routine predictive use, e.g., using the model continuously in real-time for scoring customers visiting an ecommerce Web site. A challenging (and under-appreciated) technical issue in this context is how and when models should be updated for such "continuous data stream" applications.

3 Two Extremes of Data Mining

Given the vast numbers of different users of data analysis tools, across different application disciplines, it is clearly dangerous to make broad generalizations about data analysis and data mining! Nonetheless, we will now consider two "prototype" users of data mining tools in the general context of the 5-step data mining process above. These two prototypes are in some respects at opposite ends of a hypothetical spectrum of approaches to data mining.

3.1 The Business Data Miner

The first prototype data miner is "The Business-Person," or "BP" for short. BP typically deals with large numbers of customers, for example in retail consumer or financial services environments. BP's main goal is to use data mining techniques for prediction of customer behavior to gain competitive advantage, e.g., using decision trees to try to identify promising customers for a marketing campaign. BP's approach to data mining is often strictly constrained by a number of factors: cost factors (the cost of data mining development and deployment should not exceed the anticipated gains in revenue), integration factors (the deployed predictive model may need to be integrated into an existing transaction-oriented environment), and time constraints (any potential competitive advantage may depend critically on being able to develop and deploy a predictive model quickly).
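To make the three modeling components of Step 2 concrete in BP's customer-targeting setting, the following is a minimal sketch, not a production method. It assumes a one-split decision "stump" as the model representation, the count of misclassified training examples as the score function, and an exhaustive scan over candidate thresholds as the search method. The function name `fit_stump` and the customer data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def fit_stump(xs: List[float], ys: List[int]) -> Tuple[float, int]:
    """Fit the rule 'predict 1 if x > threshold' and return (threshold, errors).

    Representation: a single threshold on one variable (a one-split tree).
    Score function: number of misclassified training examples.
    Search method: exhaustive scan over the observed values (plus -inf).
    """
    best_threshold, best_errors = float("-inf"), len(ys) + 1
    for threshold in [float("-inf")] + sorted(set(xs)):
        # Score this candidate model on the training data.
        errors = sum(int(x > threshold) != y for x, y in zip(xs, ys))
        # Keep the first candidate that strictly improves the score.
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical data: past spend per customer and whether each one
# responded (1) or did not respond (0) to an earlier campaign.
spend = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
responded = [0, 0, 0, 1, 1, 1]

threshold, errors = fit_stump(spend, responded)
print(f"target customers with spend > {threshold} (training errors: {errors})")
```

Swapping any one component (a different representation, a cost-weighted score, a greedy rather than exhaustive search) yields a different algorithm, which is the sense in which the components jointly specify the data mining algorithm.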
In terms of the 5-step process from the last section, the BP may not have much time for exploration and modeling, but instead will often use "off-the-shelf" tools such as decision trees to quickly generate results, i.e., they may jump quickly to Step 3 of data mining. Step 4, evaluation, is becoming increasingly important in practical applications, where evaluation may go well beyond the use of simple validation data sets. For example, ecommerce retailers (such as Amazon.com) and financial services companies (such as CapitalOne) conduct designed experiments on random subsets of consumers to get more precise and realistic evaluation of new predictive methods.

If one asks the BP what their most pressing problems are, they will almost certainly not say that they need a slightly more accurate decision tree algorithm, or a slightly faster association rule algorithm. Instead their most pressing problem is that of managing the overall process: how should features be defined for time-dependent data? which models are most appropriate? how much data is needed for training? will a random sample of 100k customers from a database of 10 million be "good enough"? what kind of a system can be put in place to update the models as new data arrives? and so on.

3.2 The Science Data Miner

The other prototype we consider is the "Science Person" (SP for short), which is in some sense at the other end of a hypothetical spectrum of data miners. The SP might be an atmospheric scientist, investigating global climate change patterns via analysis of spatio-temporal gridded observations of temperature, pressure, and wind-speed, taken on a global grid on a daily basis for the past 30 years.

Or the SP might be working in computational biology, exploring a large gene expression data set and its relation to cancer.

The SP's data mining approach is quite different to that of the BP. SP typically spends significant time in Steps 1 and 2: exploring, visualizing, defining alternative models, and so forth. For example, in atmospheric science, the data can be "shaped" in different ways via numerous preprocessing techniques such as principal components analysis, time-series smoothing, and so forth. A single model may take 6 months or a year to develop and only be evaluated once on a particular data set. SP's primary goal is to explore model space in such a manner as to better understand the phenomena generating the data: generating better predictions is merely a stepping stone on the path to identifying better models. Thus, being able to "get inside the black box" is absolutely critical to the SP. A decision tree may provide useful predictions, but ultimately the SP will want to understand precisely what is making the tree work.

In this context, it is not surprising that traditional statistical models, "generative models" for the data, have tended to be more widely accepted by SPs than "black box" algorithms such as neural networks, trees, and so forth. For example, hidden Markov models (HMMs) have been very successful in protein sequence alignment because they take a generic model structure (the HMM) and integrate domain-specific knowledge into the model to provide a scientifically plausible model-based approach for sequence alignment, clustering, and so forth.

The SP is typically much less constrained (in terms of time, cost justifications, etc.) than the BP. However, while the BP can use aggregate population metrics (such as classification accuracy, squared error, lift, etc.)
to evaluate models, the evaluation process for the SP is typically much more subjective in the sense that (a) it is not only the quality of the predictions that matters but also the structure of the learned model itself, and (b) the knowledge captured by the learned model must be evaluated relative to what is already known to the SP and to the SP's research community. The effect of this is to make the overall data mining process much more interactive and human-centered: models are carefully constructed and teased apart in a continuous labor-intensive cycle of models being proposed, evaluated, discarded, revised, verified, and so on.

4 Research Challenges in Data Mining

In the context of the discussion above, there are several challenges that appear to be worthy of attention for data mining in the coming years.

4.1 A General Grand Challenge for Data Mining

A grand vision for data mining is the development of general-purpose data mining software environments that assist the user in the overall process of data mining (such as the 5 steps described earlier). The software would ideally help the data miner navigate through the space of possible exploratory steps, modeling steps, algorithm choices, evaluation metrics, and deployment options. The current state of affairs is that for many applications (from ecommerce to climate modeling) the branching factor in terms of selecting specific methods is so high that most novice users are bewildered by the space of possible choices that they can make in the data mining process. The conventional solution to date (e.g., in commercial data mining packages) is typically to support a few standard methods and algorithms at each step. Clearly this can severely constrain how we model our data and in the extreme may be entirely inappropriate for the scientific data miner (where time and space are often important enough that they must be explicitly accounted for in any model).
Development of such a software environment is clearly quite a challenging problem. Statisticians have been thinking about such approaches for quite some time, i.e., general purpose environments for "programming with data" (e.g., Chambers, 1998) as well as graphical model environments that provide flexible and general-purpose high-level languages for model construction (e.g., Gilks, Thomas, and Spiegelhalter, 1994). However, these tools are primarily intended for use by statisticians. To get BP and SP domain experts to use statistical algorithms on a routine basis we need to develop a "next generation" of interactive user-centered data exploration tools. If we don't, the current situation will continue, where only a very small set of algorithms and models is widely used, and the broader spectrum of modeling and algorithmic techniques is accessible to only a small subset of data miners skilled in these techniques.

4.2 Challenges for Business Applications

Grand Business Challenges

Self-Tuning Data Mining Algorithms: "turn-key" data mining tools that require minimal intervention and tuning from data mining experts. A problem with current data mining algorithms is that they can often require a team of experienced PhD-level researchers to "baby sit" a data mining algorithm to get reasonable results in practice. For data mining solutions to be economically effective in large-scale operational business environments they will need a degree of autonomy beyond what we currently have available. Of course it is not clear that it is even possible to achieve such autonomy, but clearly there are important resource-management issues that need to be considered in this context. For example, a decision-theoretic autonomous agent framework for data mining could be very useful. Such a data mining algorithm could use utilities/probabilities to autonomously decide how much historical data to store, when to update the model, how much data to use in training, which models to use, how to validate and test the models, and so forth.
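As a toy illustration of the decision-theoretic idea for one of these choices (when to update the model), the sketch below picks the action with the highest expected utility. Everything in it is an assumption made for illustration: the two states ("drift" and "stable"), the probability `p_drift` that customer behavior has changed, the utility values, and the function name `best_action`. A real agent would have to estimate these quantities from data, which is where the difficulty lies.

```python
from typing import Dict


def best_action(p_drift: float, utilities: Dict[str, Dict[str, float]]) -> str:
    """Return the action with the highest expected utility.

    utilities[action]["drift"] is the payoff of the action if customer
    behavior has drifted; utilities[action]["stable"] is the payoff if not.
    """
    def expected_utility(action: str) -> float:
        u = utilities[action]
        return p_drift * u["drift"] + (1.0 - p_drift) * u["stable"]

    return max(utilities, key=expected_utility)


# Hypothetical payoffs (arbitrary units): keeping a stale model is costly
# if behavior has drifted; retraining always incurs a fixed cost.
utilities = {
    "keep_model": {"drift": -100.0, "stable": 0.0},
    "retrain": {"drift": -20.0, "stable": -20.0},
}

print(best_action(0.1, utilities))  # expected utilities: -10 vs -20
print(best_action(0.5, utilities))  # expected utilities: -50 vs -20
```

With these made-up numbers the agent keeps the model while drift is unlikely and retrains once the expected loss from a stale model exceeds the retraining cost.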
Building such an agent is of course a rather challenging assignment, given the real-time nature of the environment, the scale of data that is typically involved, and the uncertainties that abound.

Specific Business Challenges

Modeling Time: predictive learning algorithms for time-dependent streams of customer data. For example, with clickstream data the current approaches in practice largely involve converting clickstreams into feature vectors so that vector-based algorithms such as decision trees can be utilized. While this is a good engineering approach that takes advantage of existing tools, it cannot take account of many critical aspects of temporal data such as periodicity, seasonality, non-stationarity, and so forth. Models and algorithms that can incorporate these time aspects of customer behavior will tend to be more useful and valuable in the long run. (For an example of such models for predictive modeling with clickstream ecommerce data, see Moe and Fader (2000).)

Personalization from Sparse Data: integration of ideas from Bayesian statistics into customer profiling and prediction, e.g., the use of hierarchical Bayesian models for borrowing strength given huge numbers of customers but with very little data for each on average. Such modeling approaches are routinely used in statistics but are virtually unknown in data mining at present.

4.3 Challenges for Science Applications

Grand Scientific Challenges

Scalable Exploration: The importance of the exploration step is easy to underestimate: in many scientific applications this is where the bulk of time on data analysis is actually spent, on

tasks such as visualization and clustering, leading to basic theory formation and hypothesis generation. Notions such as outliers, unusual patterns, and trends can play a particularly important role in scientific discovery. Work in database research can play an important role here: intelligent caching for efficient visualization, novel techniques for querying spatio-temporal data, multi-resolution data structures for efficient access to data, and so forth. See Musick and Critchlow (1999) for a general discussion of the role of data management in this context.

As a specific example, in the atmospheric sciences significant research effort is currently expended on constructing and interpreting complex general circulation models (GCMs) of the Earth's atmosphere, oceans, and landmass. These are very complex physical models (including wind and ocean current dynamics, atmospheric chemistry, polar ice-cap models, and so forth) that are used to simulate multivariate gridded measurements over the Earth at roughly daily intervals for a 100-year time-period. The returned data is a huge spatio-temporal multivariate field. Scientists believe these models are quite accurate (at least in terms of long-term climatic time-scales). The models are widely used to investigate basic hypotheses about global warming phenomena, for example: if carbon emissions were increased at rate x, how might this propagate through to polar ice-caps, and in turn would this have any effect on numbers and intensities of winter storms?

Despite the success in improving the quality of the underlying GCMs, the methods available for exploring the resultant simulation data tend to be rather primitive. There is no equivalent of CART or C4.5 for easy modeling and exploration. Scientists typically investigate the data in a quite manual fashion by plotting various grids at certain times, and various summary statistics over time. There are clear research opportunities for data miners.
However, the problems are quite challenging in that the raw underlying grid data are not the phenomena that the scientists are interested in, but rather the evolution and characteristics of "coherent structures" (such as storms, eddies, and so forth). Furthermore, once detected, these objects have varying spatial and temporal extent, making clustering (for example) a non-trivial problem. Nonetheless, despite these challenges, there are significant opportunities for data mining research in a broad sense: coupling ideas from pattern recognition and computer vision (for object detection and tracking), from data mining algorithms (quantifying novelty), and from data access and management (how to carry this out in an efficient manner given the data sizes).

Specific Scientific Challenges

Models and Languages for Spatio-Temporal Data: Much scientific data involves spatial and temporal data (and sometimes both). "Non-vector" data sets are inherently more demanding to work with than multivariate vector data from a modeling viewpoint because the branching factor in modeling choice is very high. For example, the modeler must make many decisions about how much memory (in a temporal problem) should be modeled, what representations to choose, etc. While more conventional multivariate data also can have a high branching factor (variable selection, variable pre-processing, etc.), this factor can be amplified when time, space, and hierarchies are introduced. Thus, there is a general need for "intelligent assistants" and high-level languages that support scientific data miners as they navigate through large spaces of models and algorithms.

Pattern-Finding and Prior Knowledge: techniques and algorithms that can search massive databases to find unusual structures that are both novel in the context of what is already generally known to the scientist and are useful in some sense. A key word here is "structure": simple associations (such as produced by association rule algorithms) tend to be of limited value for many scientific data sets. Instead, computational biologists are often interested in local subsequences in DNA sequences (motifs) that are significantly different from the background DNA distribution (e.g., Pevzner and Sze, 2000). Similarly, astronomers who study galaxy formation are often particularly interested in finding objects in radio-images of the deep-sky that are morphologically different to known stars and galaxies. While many data mining algorithms focus on searching for global models that describe a data set in its entirety, searching for local structure (such as motif-finding) is quite common in scientific applications, where only a small portion of the data is of interest. The research challenges are significant: how does one incorporate the prior knowledge of the scientist in an effective manner? what is the appropriate score function for patterns? how does one solve the search problem when potentially looking for a "needle-in-a-haystack"? Is there a general theory for such pattern-finding algorithms or is each application relatively unique?

5 Conclusions

Research in data mining as currently practiced is good at developing specific "black box" algorithms (such as learning algorithms for decision trees, naive Bayes, support vector machines, and so forth). But the "black box" algorithms are only a part of the overall landscape of data mining practice. We also need to be aware of how our tools are used in real applications. Ideally data mining research should focus more on what happens traditionally both before (exploration and modeling) and after (evaluation and deployment) the actual execution of a specific modeling or pattern-finding algorithm.
These steps in the process often involve inherently hard problems but also present interesting research opportunities that have significant potential scientific and economic impact.

Acknowledgements

Writing of this paper was supported in part by research awards from the following organizations: NSF (IRI ), NIST Advanced Technology Program, KLA-Tencor, Microsoft Research, and Lawrence Livermore National Laboratories.

References

Chambers, J. M. (1998) Programming with Data: A Guide to the S Language, Seattle, WA: Mathsoft.

Gilks, W. R., Thomas, A., and Spiegelhalter, D. J. (1994) A language and program for complex Bayesian modeling, The Statistician, 43, 169–178.

Moe, W. W., and Fader, P. S. (2000) Capturing evolving visit behavior in clickstream data, Working Paper Number 00-003, Wharton School of Business, University of Pennsylvania.

Musick, R., and Critchlow, T. (1999) Practical lessons in supporting large-scale computational science, SIGMOD Record, 28(4), 49–57.

Pevzner, P., and Sze, S.-H. (2000) Combinatorial approaches to finding subtle signals in DNA sequences, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, Menlo Park, CA: AAAI Press, pp. 269–278.

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

More information

Bayesian Predictive Profiles with Applications to Retail Transaction Data

Bayesian Predictive Profiles with Applications to Retail Transaction Data Bayesian Predictive Profiles with Applications to Retail Transaction Data Igor V. Cadez Information and Computer Science University of California Irvine, CA 92697-3425, U.S.A. icadez@ics.uci.edu Padhraic

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) 305 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

More information

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

1. Introduction MINING AND TRACKING EVOLVING WEB USER TRENDS FROM LARGE WEB SERVER LOGS. Basheer Hawwash and Olfa Nasraoui

1. Introduction MINING AND TRACKING EVOLVING WEB USER TRENDS FROM LARGE WEB SERVER LOGS. Basheer Hawwash and Olfa Nasraoui MINING AND TRACKING EVOLVING WEB USER TRENDS FROM LARGE WEB SERVER LOGS Basheer Hawwash and Olfa Nasraoui Knowledge Discovery and Web Mining Lab Dept. of Computer Engineering and Computer Science University

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Statistical Models in Data Mining

Sargur N. Srihari, University at Buffalo, The State University of New York, Department of Computer Science and Engineering, Department of Biostatistics. Flood of

Let the data speak to you. Look Who's Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

CS535 Big Data. Let the data speak to you. Medication Adherence Score: How likely people are to take their medication, based on: How long people have lived at the same address

DATA ANALYTICS USING R

Duration: 90 Hours. Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

A Lightweight Solution to the Educational Data Mining Challenge

Kun Liu, Yan Xing. Faculty of Automation, Guangdong University of Technology, Guangzhou, 510090, China. catch0327@yahoo.com yanxing@gdut.edu.cn

Data, Measurements, Features

Middle East Technical University, Dep. of Computer Engineering, 2009, compiled by V. Atalay. What do you think of when someone says Data? We might abstract the idea that data are

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

(See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

Introduction to Data Mining

Bioinformatics. Ying Liu, Ph.D., Laboratory for Bioinformatics, University of Texas at Dallas, Spring 2008. Motivation: Why data mining? What is data mining? Data Mining: On what

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Conference overview: 1. Overview of KDD and data mining 2. Data

not possible or was possible at a high cost for collecting the data.

Data Mining and Knowledge Discovery: Generating knowledge from data. Knowledge Discovery Data Mining White Paper. Organizations collect a vast amount of data in the process of carrying out their day-to-day

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

Bayesian networks - Time-series models - Apache Spark & Scala

Dr John Sandiford, CTO Bayes Server. Data Science London Meetup - November 2014. Contents: Introduction, Bayesian networks, Latent variables, Anomaly

A Review of Data Mining Techniques

Available online at www.ijcsmc.com. International Journal of Computer Science and Mobile Computing, A Monthly Journal of Computer Science and Information Technology. IJCSMC, Vol. 3, Issue. 4, April 2014,

Data mining and official statistics

Quinta Conferenza Nazionale di Statistica. Gilbert Saporta, President of the Société française de statistique. Roma, 15, 16, 17 November 2000, Palazzo dei Congressi, Piazzale

Protein Protein Interaction Networks

Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks. Young-Rae Cho, Ph.D., Assistant Professor, Department of Computer Science, Baylor University. My Definition of Bioinformatics

Data Mining + Business Intelligence. Integration, Design and Implementation

ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

Forecasting Solar Power with Adaptive Models A Pilot Study

Dr. James W. Hall 1. Introduction Expanding the use of renewable energy sources, primarily wind and solar, has become a US national priority.

Assessing Data Mining: The State of the Practice

2003 Herbert A. Edelstein, Two Crows Corporation, 10500 Falls Road, Potomac, Maryland 20854, www.twocrows.com, (301) 983-3555. Objectives: Separate myth from reality

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Collection, storage and extraction of business value from data generated from a variety of sources are

Customer Classification And Prediction Based On Data Mining Technique

Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

MS1b Statistical Data Mining

Yee Whye Teh, Department of Statistics, Oxford. http://www.stats.ox.ac.uk/~teh/datamining.html Outline: Administrivia and Introduction, Course Structure, Syllabus, Introduction to

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

Mary-Elizabeth ( M-E ) Eddlestone, Principal Systems Engineer, Analytics, SAS Customer Loyalty, SAS Institute, Inc. AGENDA: Overview/Introduction to Data Mining

Big Data Text Mining and Visualization. Anton Heijs

Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel. Treparel Information Solutions BV, Delftechpark

Data Mining and KDD: A Shifting Mosaic. Joseph M. Firestone, Ph.D. White Paper No. Two. March 12, 1997

The Idea of Data Mining: Data Mining is an idea based on a simple analogy.

Data Mining Solutions for the Business Environment

Database Systems Journal vol. IV, no. 4/2013. Ruxandra PETRE, University of Economic Studies, Bucharest, Romania. ruxandra_stefania.petre@yahoo.com Over

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Outline: 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

Social Media Mining. Data Mining Essentials

Introduction: Data production rate has been increased dramatically (Big Data) and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers

Data Mining and Database Systems: Where is the Intersection?

Surajit Chaudhuri, Microsoft Research. Email: surajitc@microsoft.com 1 Introduction The promise of decision support systems is to exploit enterprise

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP

Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

CHAPTER-24 Mining Spatial Databases

24.1 Introduction 24.2 Spatial Data Cube Construction and Spatial OLAP 24.3 Spatial Association Analysis 24.4 Spatial Clustering Methods 24.5 Spatial Classification

Using D2K Data Mining Platform for Understanding the Dynamic Evolution of Land-Surface Variables

Praveen Kumar 1, Peter Bajcsy 2, David Tcheng 2, David Clutter 2, Vikas Mehra 1, Wei-Wen Feng 2, Pratyush

A Proposal for the use of Artificial Intelligence in Spend-Analytics

Mark Bishop, Sebastian Danicic, John Howroyd and Andrew Martin. Our core team: Mark Bishop PhD studied Cybernetics and Computer Science

Better planning and forecasting with IBM Predictive Analytics

IBM Software, Business Analytics, SPSS Predictive Analytics. Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and

Tools for Managing and Measuring the Value of Big Data Projects

Abstract: Big Data and analytics focused projects have undetermined scope and changing requirements at their core. There is high risk of loss

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi. Department of Medical Informatics,

Statistics for BIG data

Statistics for Big Data: Are Statisticians Ready? Dennis Lin, Department of Statistics, The Pennsylvania State University. John Jordan and Dennis K.J. Lin (ICSA Bulletin 2014). Before

Taming the Internet of Things: The Lord of the Things

Kirk Borne, @KirkDBorne. School of Physics, Astronomy, & Computational Sciences, College of Science, George Mason University, Fairfax, VA

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

2014. Table of Contents: 1. Department Description 2. Program Requirements and Procedures

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

S. Jiang, Huawei Technologies Co., Ltd. Internet-Draft. Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS. JothiKumar.R 1, Sivabalan.R.V 2. 1 Research scholar, Noorul Islam University, Nagercoil, India. Assistant Professor, Adhiparasakthi College

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

Data Mining and Pattern Recognition for Large-Scale Scientific Data

Chandrika Kamath, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, October 15, 1998. We need an effective

The KDD Process for Extracting Useful Knowledge from Volumes of Data

Knowledge Discovery in Databases creates the context for developing the tools needed to control the flood of data facing organizations that depend on ever-growing databases of business, manufacturing, scientific,

Clustering and scheduling maintenance tasks over time

Per Kreuger, 2008-04-29, SICS Technical Report T2008:09. Abstract: We report results on a maintenance scheduling problem. The problem consists of allocating

Smart Grid Data Analytics for Decision Support

Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA. Prakash.Ranganathan@engr.und.edu, 701-777-4431

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Overview: This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

DATA MINING CLASSIFICATION

FABRICIO VOZNIKA, LEONARDO VIANA. INTRODUCTION: Nowadays there is a huge amount of data being collected and stored in databases everywhere across the globe.

Statistical Analysis and Visualization for Cyber Security

Joanne Wendelberger, Scott Vander Wiel. Statistical Sciences Group, CCS-6, Los Alamos National Laboratory. Quality and Productivity Research Conference

DATA MINING - SELECTED TOPICS

Peter Brezany, Institute for Software Science, University of Vienna. E-mail: brezany@par.univie.ac.at MINING SPATIAL DATABASES. Spatial Database Systems: SDBSs offer spatial

CHAPTER-29 Data Mining, System Products and Research Prototypes

29.1 How to Choose a Data Mining System 29.2 Data, mining functions and methodologies: 29.3 Coupling data mining with database and/or data

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Data are interesting because they help us understand the world. Genomics: Massive Amounts

Hurwitz ValuePoint: Predixion

Predixion: VICTORY INDEX CHALLENGER. Marcia Kaufman, COO and Principal Analyst; Daniel Kirsch, Principal Analyst. The Hurwitz Victory Index Report: Predixion is one of 10 advanced analytics vendors included in

Knowledge Discovery from patents using KMX Text Analytics

Dr. Anton Heijs, anton.heijs@treparel.com, Treparel. Abstract: In this white paper we discuss how the KMX technology of Treparel can help searchers

Machine Learning with MATLAB David Willingham Application Engineer

2014 The MathWorks, Inc. Goals: Overview of machine learning; Machine learning models & techniques available in MATLAB; Streamlining the

Study and Analysis of Data Mining Concepts

M.Parvathi, Head/Department of Computer Applications, Senthamarai college of Arts and Science, Madurai, TamilNadu, India / Dr. S.Thabasu Kannan, Principal, Pannai College

Fairfield Public Schools

Mathematics: AP Statistics. BOE Approved 04/08/2014. Critical Areas of Focus: AP Statistics is a rigorous course that offers advanced students an opportunity

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

Simmi Bagga 1, Satinder Kaur 2. 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA. E-mail: simmibagga12@gmail.com

Data Mining Using SAS Enterprise Miner 7.1

Lorne Rothman, Lorne.rothman@sas.com, Principal Statistician, SAS Institute (Canada) Inc. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining The

01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.

(International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models

Lecture 11: Graphical Models for Inference

So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

(See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

5th August 2015. Contents: Introduction, Dendrogram, Tree Map, Heat Map, Raw Group Data. For an online, interactive version of the visualisations

ICT Perspectives on Big Data: Well Sorted Materials

3 March 2015. Contents: Introduction, Dendrogram, Tree Map, Heat Map, Raw Group Data. For an online, interactive version of the visualisations in

Doctor of Philosophy in Computer Science

Background/Rationale: The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

Data Mining System, Functionalities and Applications: A Radical Review

Dr. Poonam Chaudhary, System Programmer, Kurukshetra University, Kurukshetra. Abstract: Data Mining is the process of locating potentially

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Georgiana Marin. Abstract: In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

Azure Machine Learning, SQL Data Mining and R

Day-by-day Agenda. Prerequisites: No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all: