Data Mining Jargon. Bob Muenchen The Statistical Consulting Center
Data mining is the automated search for useful patterns in data. It uses tools from many different disciplines, each of which has its own technical jargon. This document defines the jargon that is most widely used. A similar document, which translates neural networking jargon into statistical terms, can be found at ftp.sas.com/neural/jargon.

If you need assistance, call the Helpdesk, send e-mail to stathelp@utk.edu, or stop by the SCC walk-in support area at 200 Stokely Management Center. All UT students, faculty, and staff researchers can get up to 10 hours of free assistance for their statistical computing needs each semester. See oit.utk.edu/scc for details. We also offer training each semester. See web.utk.edu/~training for details.

Analytics - the tools of data mining. The major categories of analytics are cluster analysis, decision trees, neural networks, statistical models and association analysis. Analytics that deal with the future are called Predictive Analytics. Since accurate information about the future is so valuable, some view predictive analytics as the core mission of data mining.
Artificial Intelligence - the field of science that studies how to make computers intelligent. It consists mainly of the fields of Machine Learning (neural networks and decision trees) and expert systems.
Artificial Neural Network (ANN) - see Neural Networks.
Association Analysis - a data mining tool that discovers combinations of options and their frequency of co-occurrence. For example, 80% of people who buy paint also buy brushes. It is essentially a method of rule induction in which all variables are viewed as targets. When applied to products purchased, it is called market basket analysis.
Back Office - the part of the company that customers don't see, which is run using data stored in an Enterprise Resource Planning system and a Supply Chain Management system.
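The Association Analysis entry above can be illustrated with a minimal sketch. The baskets are made-up data, and the names `rule_stats`, "support" and "confidence" are the usual textbook labels rather than terms defined in this glossary; the data were chosen so that the paint-to-brushes rule reproduces the 80% figure from the entry.

```python
# Hypothetical market baskets (made-up data for illustration only).
baskets = [
    {"paint", "brushes"},
    {"paint", "brushes", "tape"},
    {"paint", "brushes"},
    {"paint", "brushes", "rollers"},
    {"paint", "tape"},
    {"brushes"},
    {"tape", "rollers"},
]

def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    """
    n_total = len(baskets)
    n_antecedent = sum(1 for b in baskets if antecedent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# "People who buy paint also buy brushes"
support, confidence = rule_stats(baskets, "paint", "brushes")
print(f"paint -> brushes: support={support:.2f}, confidence={confidence:.2f}")

# Every item is treated as a possible target, so score all ordered pairs.
items = sorted(set().union(*baskets))
all_rules = {(a, c): rule_stats(baskets, a, c)
             for a in items for c in items if a != c}
```

Real market basket tools also consider rules with several items on the left-hand side and prune rare combinations; this sketch only scores single-item pairs.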
Black Box - a term used to describe a model which, although it may work well, is too complex for people to understand. Usually expressible only as a long series of incomprehensible equations, as in neural network models.
Business Intelligence (BI) - making better decisions through the use of objective analysis. The four main BI tools are Report & Query, Online Analytical Processing (OLAP), Visualization and Data Mining.
Champion Model - the model that best solves the data mining problem.
Classify - to develop a model that places records into known categories, e.g. defaulted on loan or not.
Class Discovery - a term used in biological data mining to refer to unsupervised training or cluster analysis.
Class Prediction - a term used in biological data mining to refer to supervised training with a categorical target variable.
Cluster Analysis - developing a model that discovers categories of similar records. Usually performed as a prelude to other analyses. Also called unsupervised training.
Concatenate - combining datasets or marts so that their columns are aligned and new rows are added. Columns are aligned either by using the same column headings or by manual column-by-column matching, for example when the column called ID in one table is called SSN in another.
CRM - see Customer Relationship Management.
Curse of Dimensionality - refers to the fact that the more variables you study, the larger your dataset needs to be to have a chance at modeling the larger space. The number of observations needed grows much faster than the number of variables; the relationship is often described as exponential. For example, to model 10 variables at once, 100 observations may be barely sufficient, but to model 10 times as many variables would require at least 100 times as many observations, and so on. Removing irrelevant variables and removing redundant variables are the two easiest ways to fight the curse.
Customer Relationship Management (CRM) - the process of studying and interacting with customers to maximize profits. Luckily, ensuring customer satisfaction is a key way to maximize profitability, but cutting service to unprofitable customers is also involved. This is such a popular use of analysis in business that companies such as SPSS that once said their business was statistical analysis now say it is CRM. CRM is one of the three main areas to which data mining is applied: supply chain management (SCM), enterprise resource planning (ERP) and customer relationship management (CRM).
Data Access - reading the data for analysis. This may include inputting the data from a flat file, translating a copy of some data from a database or warehouse so that the data mining software can analyze it, or defining a way to read the data directly using a standard such as Open Database Connectivity (ODBC).
Data Conversion - performing a one-time translation of data from its original format (perhaps stored in a database) into the format used by a data mining package. Example conversion tools are DBMScopy, StatTransfer and Data Junction. See also Data Access and ETL (Extract, Transform and Load).
Data Cube - a data structure of aggregated values summarized for a combination of pre-selected categorical variables, e.g. number of items sold and their total cost for each time period, region and product. This structure is required for the high-speed analysis of summaries that is done in Online Analytical Processing (OLAP). Also called a Multidimensional Database or MDDB.
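As a rough sketch of the Data Cube entry above, the following builds the aggregated cells for the example it gives (items sold and total cost for each time period, region and product). The records and the name `build_cube` are hypothetical; real OLAP servers store cubes in specialized structures rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical detail records: (period, region, product, items_sold, cost)
sales = [
    ("Q1", "East", "paint", 10, 150.0),
    ("Q1", "East", "paint", 5, 75.0),
    ("Q1", "West", "brushes", 20, 60.0),
    ("Q2", "East", "brushes", 8, 24.0),
    ("Q2", "West", "paint", 4, 60.0),
]

def build_cube(records):
    """Aggregate detail records into cells keyed by (period, region, product).

    Each cell holds [total items sold, total cost].
    """
    cube = defaultdict(lambda: [0, 0.0])
    for period, region, product, items, cost in records:
        cell = cube[(period, region, product)]
        cell[0] += items
        cell[1] += cost
    return dict(cube)

cube = build_cube(sales)
print(cube[("Q1", "East", "paint")])  # the two Q1/East/paint records combined
```

Once the cells exist, a query only has to look up or add a few pre-computed numbers, which is where the speed of OLAP comes from.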
Data Management - all of the tasks required to manage data, such as correcting data entry errors, estimating values of missing data, and subsetting or combining sets of data.
Data Mart - a small data warehouse that is focused on a single area such as a research project or a single department such as sales or accounting. Ideally, all marts in an organization should be compatible, but they often differ in structure and file format.
Data Model - has several very different meanings. To a data miner, it can mean two different things. It can refer to the structure a database administrator chooses for storing the data in a database or data warehouse, that is, how the tables relate to each other. It can also refer to the way a given database program requires data to be stored, for example in relational or hierarchical form. If a data analyst uses this term, it refers to the rules or formulas that describe relationships among the variables (see Modeling).
Data Quality (DQ) - addresses the issues of getting the right measures, ensuring that the measures are timely and accurate, that editing is done with controls to prevent errors, that manipulations such as formulas are accurate and documented, and that the data is accurately described. Also known as Information Quality or IQ.
Data Table - a collection of data measurements organized into a rectangle of columns (also called fields) and rows. Columns contain a single measure, such as blood pressure, for all sampling units. Columns are also called variables, vectors or attributes. Rows contain the measures for a given sampling unit, such as all medical information for a person. Rows are also called observations, cases, records or instances.
Data Visualization - see Visualization.
Data Warehouse - a static copy of a database that has been optimized for analysis, or denormalized. For example, storing a customer's address with each purchase he or she makes may waste computer space, but it makes it very easy to find the mean sales for any given geographic region without knowing the location of the address table. Doing analysis on a data warehouse also prevents analyses from interfering with ongoing data collection.
Database - a collection of data organized for efficient use in a continuously updated situation, such as frequent sales or reservations. Far and away the most common type of database is a relational database, in which a collection of data tables is stored and the tables are related, or linked, by a common key. A key is a column or collection of columns that uniquely identifies a row, such as Social Security Number. A database is optimized for online transaction processing (e.g. selling products, entering patient information), not for analysis. The optimized state of the database is called normalized or third-normal form. Briefly, it removes redundant information, such as the full customer address on every sale, by storing it in a separate table.
Database Administrator (DBA) - the person responsible for organizing the data for an organization. Tasks include changing the structure of databases to optimize speed, and of data warehouses to optimize data mining efficiency. In any large organization, this is the person you will need to work with to gain access to the data. He or she will be using many of the terms in this handout when you meet!
Decision List - see Decision Trees.
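The contrast between a normalized database and a denormalized warehouse, described in the Database and Data Warehouse entries above, can be sketched with Python's built-in sqlite3 module. The table names, column names and values are hypothetical; the point is only that the warehouse table repeats the region on every sale so that an analysis needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: the customer's region is stored once, in its own table.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,
                        customer_id INTEGER,
                        amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 30.0);

    -- Denormalized warehouse copy: the region is repeated on every sale.
    CREATE TABLE warehouse_sales AS
        SELECT s.sale_id, s.amount, c.region
        FROM sales AS s
        JOIN customers AS c ON s.customer_id = c.customer_id;
""")

# The analysis now reads a single table and needs no knowledge of customers.
mean_by_region = dict(conn.execute(
    "SELECT region, AVG(amount) FROM warehouse_sales GROUP BY region"))
conn.close()
print(mean_by_region)
```

The join that builds `warehouse_sales` is done once, when the warehouse is loaded, instead of in every analysis.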
Decision Support Software (DSS) - any software that uses analysis to improve decision making. Also called Decision Support System.
Decision Trees - a method of finding rules (rule induction) that divide the data into subgroups that are as similar as possible with regard to a target variable. See the example below for a tree that predicts survival rates for heart attack victims in an emergency room setting (made up for simplicity's sake). The whole training dataset of 100 patients is called the root node. It is divided logically into subgroups called branches that are further subdivided into other branches or, finally, leaves. The process of continuing to subdivide the groups is called recursive partitioning. Decision trees are the most popular method of displaying rules. If this sequence of rules is written out in English (or a computer language) it is called a decision list. If the complete set of steps required to reach each decision is written out so that the rules no longer need to be read in sequence, it is called a rule set. If the decision tree predicts a categorical outcome such as purchase or not, it is called a classification tree. If it predicts a continuous variable such as dollar amount purchased, it is called a regression tree. The most popular decision tree models are called Chi-squared Automatic Interaction Detection (CHAID), Classification and Regression Trees (CART) and C4.5/C5.0.

[Figure: example decision tree. The split values and the "% Died" figures below the root were lost in transcription.]
  Root node: Patients, 40% Died, 60% Lived
  First split, on Blood Pressure: "<" branch 70% Lived; ">" branch 30% Lived
  Second split, on Cholesterol (four leaves, in the order shown in the figure): "<" 90% Lived; ">" 80% Lived; "<" 20% Lived; ">" 10% Lived

Dependent Variable - see Target Variable.
Drill-Down - a request for more detailed information, usually made by double-clicking on a number or a part of a graph. For example, a table may show average salaries of professors broken down by department and gender. Drilling down on a cell of that table might display that relationship for each campus. The opposite of roll-up.
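One step of the recursive partitioning described in the Decision Trees entry above can be sketched as a search for the split value that makes the two subgroups as pure as possible. The glossary does not say which purity measure the named algorithms use; this sketch assumes Gini impurity, one common choice, and the patient data and function names are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value < threshold' split."""
    best = None
    # Skip the smallest value: splitting there would leave the left branch empty.
    for threshold in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical patients: blood pressure and outcome (1 = lived, 0 = died)
blood_pressure = [80, 85, 90, 95, 110, 120, 130, 140]
lived = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, impurity = best_split(blood_pressure, lived)
print(f"split at blood pressure < {threshold}, weighted impurity {impurity:.2f}")
```

A full tree would apply `best_split` again inside each branch, over every candidate input variable, until the leaves are pure enough or too small to divide further.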
DSS - see Decision Support Software.
Ensemble Model - a model that combines the results of several types of models. For example, a prediction could use the average of the estimates from a decision tree, a neural network and a statistical model.
Enterprise Resource Planning (ERP) - software that stores the core operational data of a business, such as sales, receivables and payables, in a database. One of the three main areas to which data mining is applied: supply chain management (SCM), enterprise resource planning (ERP) and customer relationship management (CRM).
ERP - see Enterprise Resource Planning.
Estimate - to develop a model that finds an approximate value for a continuous variable, e.g. sales, blood pressure.
ETL or ETML - stands for Extract, Transform, Move and Load, the steps required to gain access to data for analysis. Since the Move step is the easiest, the M is often left out. ETL is an important subset of Data Management.
Executive Information System (EIS) - systems any decision maker can use with little training to do ad-hoc analyses, often using OLAP.
Expert Systems - systems that can solve a problem by incorporating rules obtained manually from human experts. You describe the problem and let the system choose how best to solve it. Examples in the area of analysis include SigmaStat, DecisionTime and the SPSS Statistics Coach. Decisions are rather flaky at the moment, but improving.
Flat File - data stored in a standard format used to move data from one program to another. Windows and Macintosh call this a Text Only or Text With Line Breaks file. It may also be called an ASCII, EBCDIC (on large IBM computers) or Unicode file.
Front Office - the part of a company that customers interact with. Customer data is critical to business profitability, so it is frequently mined.
Heuristic - see Modeling.
IQ - see Data Quality.
Imputation - the process of estimating the values of missing data prior to analysis.
Independent Variable - see Input Variables.
Information Quality - see Data Quality.
Input Variables - the variables thought to be related to, predict or cause the target variable. In data mining, almost any variable that is not the target variable is a candidate for an input variable.
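The Imputation entry above does not name a method. As a minimal sketch, the simplest one replaces each missing value with the mean of the observed values; the function name `impute_mean`, the use of `None` as the missing-value marker and the data are all assumptions made for illustration.

```python
def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical blood pressure readings with two missing values
blood_pressure = [120.0, None, 130.0, 110.0, None]
filled = impute_mean(blood_pressure)
print(filled)
```

Mean imputation keeps every record usable but understates the variability of the column, so data mining packages usually offer model-based alternatives as well.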
Join - a database procedure that pools the information stored in different tables so that it can be better analyzed.
Key Performance Indicator (KPI) - a very important variable. In business, it is a measure that is critically important to the overall functioning of the organization.
KPI - see Key Performance Indicator.
Lift - a measure used to compare different data mining models. Essentially it is a measure of how much better off you are with the model than without. For example, if 2% of the customers you mail a catalog to would make a purchase, but 10% would make a purchase when the model is used to select catalog recipients, then lift is 10/2 or 5.
Machine Learning - models that enable the computer to improve its performance through experience, especially rule induction. The definition of learning is so loose that, although rarely mentioned in this context, statistical estimation could also be considered learning. Modeling is roughly synonymous with machine learning.
Market Basket Analysis - see Association Analysis.
Mart - see Data Mart.
MDDB - see Data Cube.
Measurement Scale - the level of detail in a variable. The measurement scale helps determine the role of the variable in an analysis. Types include:
  Single-valued - variables, or constants, that result from selecting subsets.
  Binary - variables that have only two values, such as male/female or purchased/didn't purchase.
  Nominal - variables that contain category memberships such as political party. Also called categorical, class, group, symbolic or qualitative variables.
  Ordinal - variables that contain values that have order, such as small, medium, large.
  Interval or continuous - variables that have meaningful intervals, such as a weight interval of 110 pounds to 120 being the same as 120 to 130. Interval-level variables are also called numeric or scale variables.
Merge - combining datasets or marts so that their rows are aligned and new columns are added. Row alignment is often done using a key such as an ID number.
Metadata - data about the data. Examples are:
  Column names such as gender, height.
  Column labels containing descriptors to embellish output. Entire questions from questionnaires are common labels.
  Formats that describe what values mean, such as 0=Female, 1=Male. Also called value labels or codes.
  Missing value codes, if other than blank, e.g. 999.
  Scale of each column: nominal, ordinal, interval.
  Formulas or recoding steps that were followed.
  Documentation such as who, what, where, when and why the data were collected.
MOLAP - see Online Analytical Processing.
Modeling - generally refers to the process of developing rules which can classify or predict with an estimated level of precision. The rules may be in the form of a series of logical statements or mathematical formula(s).
  Statistical models are equations that have been mathematically derived to provide the best or optimal description of relationships that involve straight lines, smooth curves, group membership or clusters of similar cases. The solution to these equations usually requires simplifying assumptions about the nature of the data that will not fit every dataset.
  Heuristic models use methods that have been empirically shown to work well, but which have not been shown to be the best or optimal solution. Heuristic models usually make comparatively few assumptions about the nature of the data. Decision trees are an example of an analysis based on heuristics, while discriminant analysis is based on an optimal method (which assumes the data follows a multivariate normal distribution).
Multidimensional Database - see Data Cube.
Neural Networks - models that mimic the brain through systems of equations. They learn by being trained with a dataset. Unfortunately, what they learn is conveyed by a series of complex mathematical formulas. These formulas may work well but not explain much about the process they model. See Black Box.
ODBC - see Open Database Connectivity.
OLAP - see Online Analytical Processing.
OLE DB - an open standard for gaining access to the data stored in a multidimensional database. Most OLAP products use OLE DB to access the data.
OLTP - see Online Transaction Processing.
Online Analytical Processing (OLAP) - software that quickly displays interactive tables or graphs of pre-selected variables, such as sales aggregated by time, region, state, store and product line. A two-dimensional slice of the data might show mean sales broken down by region and product line. Clicking on region might drill down to further divide those numbers by state. OLAP is often not considered data mining, since it involves only simple tables and graphs that display what is happening rather than the analytics that can help determine why it is happening or what may happen in the future. However, it is very widely discussed in the data mining literature. OLAP usually displays only sums, counts and means. This is because means at any level of breakdown can be calculated from sums of sums and sums of counts. However, the median of an aggregate is not the aggregate of the medians, which is why medians and percentiles cannot be used with the data structure OLAP requires (a multidimensional database). This is a major limitation of the method. OLAP is occasionally referred to as MOLAP because it runs very quickly using a Multidimensional database. When OLAP is used with a standard relational database, it is called ROLAP. ROLAP is usually hundreds of times slower than MOLAP.
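The reason OLAP can roll up means but not medians, as explained in the Online Analytical Processing entry above, can be checked with a small sketch. The sales figures and state codes are made up; each cube cell is assumed to keep only a sum and a count.

```python
from statistics import median

# Hypothetical sales for two states that belong to one region
sales_by_state = {"TN": [1, 2, 300], "KY": [50, 60]}

# A cube cell keeps only (sum, count) for each state.
cells = {state: (sum(v), len(v)) for state, v in sales_by_state.items()}

# The regional mean is recoverable from the cells alone:
# sum of sums divided by sum of counts.
region_sum = sum(s for s, _ in cells.values())
region_count = sum(n for _, n in cells.values())
region_mean = region_sum / region_count

# The regional median is not: it needs the detail records.
all_sales = [x for v in sales_by_state.values() for x in v]
true_median = median(all_sales)
median_of_medians = median(median(v) for v in sales_by_state.values())

print(region_mean, true_median, median_of_medians)
```

Here the mean computed from the cells matches the mean of the detail records, while the median of the state medians differs from the true regional median.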
Online Transaction Processing (OLTP) - the efficient execution of frequent database transactions used to collect data or run a business. These transactions are recorded in a database rather than a data warehouse and are not suitable for analysis until they have been transferred to a warehouse and restructured for efficient analysis.
Open Database Connectivity (ODBC) - a widely used standard for extracting data from a data warehouse to use for analysis.
Over-fitted Model - a model which has become so complex that it applies only to the dataset upon which it was developed. Another term for this is over-parameterized.
Predictive Analytics - see Analytics.
Profit - the amount of profit made in a specific modeling situation, calculated by estimating the cost of each type of error: assuming the model is right when it is wrong, and vice versa. It can be used in business for obvious reasons but also in other areas. For example, in medicine you could assign a cost to concluding a patient has a treatable disease when they do not (antibiotic treatment = $35) versus concluding they do not have it when they actually do (treated after complications set in = $2,300 hospital stay). The point of maximum profit would show you the best way to use the model.
Qualitative Data - depending on the context, this term may refer to text data such as messages or to categorical variables such as gender.
Query - the process of asking a database questions. Often done in an ad-hoc, interactive way.
Regression Analysis - a family of statistical models that includes fitting straight lines (linear regression), smoothly curving lines (polynomial regression) or more sharply curving lines (nonlinear regression), as well as models that predict group membership (logistic regression). The main type of data mining that these generalized linear models do not do is clustering.
Relational Database - see Database.
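The medical example in the Profit entry above can be turned into a small calculation. The two costs come from the entry; the patient scores, the candidate cutoffs and the function name `total_cost` are hypothetical. The sketch minimizes total error cost, which is the same as finding the point of maximum profit.

```python
COST_FALSE_POSITIVE = 35     # treated but not ill: antibiotic (from the glossary example)
COST_FALSE_NEGATIVE = 2300   # ill but not treated: hospital stay (from the glossary example)

# Hypothetical model output: (predicted probability of disease, actually has disease)
patients = [(0.05, 0), (0.10, 0), (0.20, 1), (0.30, 0), (0.60, 1), (0.90, 1)]

def total_cost(cutoff, patients):
    """Total cost of errors if everyone scoring at or above the cutoff is treated."""
    cost = 0
    for score, has_disease in patients:
        treated = score >= cutoff
        if treated and not has_disease:
            cost += COST_FALSE_POSITIVE
        elif not treated and has_disease:
            cost += COST_FALSE_NEGATIVE
    return cost

cutoffs = [0.1, 0.2, 0.5, 0.8]
costs = {c: total_cost(c, patients) for c in cutoffs}
best_cutoff = min(costs, key=costs.get)
print(costs, "best cutoff:", best_cutoff)
```

Because a missed case costs far more than an unnecessary prescription, the cheapest cutoff is a low one that treats some healthy patients in order to miss none of the ill ones.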
Report - a basic listing of database information, which may consist of individual values, sums, counts or means. Often done in a pre-planned, static way.
Return on Investment (ROI) - the money you saved by doing data mining. It is above and beyond the usually high expenses of purchasing a data mining package, learning to use it and then using it to solve a problem.
ROI - see Return on Investment.
ROLAP - see Online Analytical Processing.
Roll-Up - the process of aggregating numeric data. For example, a series of tables may show professor salaries broken down by department and gender at each campus. The campuses could be rolled up to create a single table of salaries broken down by department and gender for all campuses combined. The opposite of drill-down.
Rule Induction - see Decision Trees.
Rule Set - see Decision Trees.
SCM - see Supply Chain Management.
Scoring - applying a model to new data, usually to predict values of continuous variables such as amount of purchase, or group memberships such as survive/die.
Sequel - see Structured Query Language.
SRM - stands for Supplier Relationship Management. See Supply Chain Management.
Statistical Models - see Modeling.
Structured Query Language (SQL) - the basic language used in almost all databases. It allows you to search a database for basic information, such as listing certain records or totaling sums or counts. It also lets you select subsets or samples, or perform joins. The basic form is SELECT vars FROM tablename WHERE logical condition is true. Often pronounced sequel.
Supervised Learning - the process of developing a model that has a target variable, such as sales or survival, to supervise it. This is opposed to unsupervised learning, which is developing a model without a target, or finding clusters or groups of similar records in your data. When the target variable is categorical, biologists would call this Class Prediction and statisticians would call it Discriminant Analysis or Logistic Regression (two different methods of achieving a similar result).
Supplier Relationship Management (SRM) - see Supply Chain Management.
Supply Chain Management (SCM) - the process of studying and interacting with suppliers to maximize profits. Also called supplier relationship management (SRM). One of the three main areas to which data mining is applied: supply chain management (SCM), enterprise resource planning (ERP) and customer relationship management (CRM).
Target Variable - the main variable of interest in a data mining project. A business example is the amount of each sale; a medical example is cure/no cure. Also called the predicted, supervisor or dependent variable.
Test Data - data used only once, at the end of the data mining project, to see if the best or champion model generalizes to completely new data.
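The basic form given in the Structured Query Language entry above (SELECT vars FROM tablename WHERE condition) can be tried with Python's built-in sqlite3 module. The `sales` table, its columns and its values are hypothetical and exist only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("East", 50.0), ("West", 30.0)])

# Listing certain records: SELECT vars FROM tablename WHERE condition
east_rows = conn.execute(
    "SELECT id, amount FROM sales WHERE region = ? ORDER BY id",
    ("East",)).fetchall()

# Totaling counts and sums over the same subset
count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales WHERE region = ?",
    ("East",)).fetchone()

conn.close()
print(east_rows, count, total)
```

The `?` placeholders are sqlite3's way of passing values into a query; the SQL keywords themselves are the same in most relational databases.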
Text Data - written descriptions, e.g. open-ended survey questions, interviews, customer complaints; also called qualitative data.
Text File - see Flat File.
Text Mining - the process of automatically finding the key concepts contained in text data. It may also find clusters of similar documents. The numeric output containing presence/absence of each concept and cluster membership is often passed on to a data mining step where it is combined with other numeric data for further analysis.
Training Data - the data which is used to develop models that will, if done properly, work well on new sets of data.
Unsupervised Training/Learning - the process of developing a model that does not have a target variable. This boils down to finding clusters of similar cases within the data. It would usually be followed by another analysis that does involve a target variable. Biologists would call this class discovery. Statisticians would call it cluster analysis.
Validation Data - during data mining, each step of each model developed using the training data is tested using this data, to discover the point at which the model becomes overly specific to that single set of data, i.e. becomes an over-fitted model.
Variable - see Data Table.
Visualization - the use of dynamic, interactive graphical displays to search for useful patterns in data.
Warehouse - see Data Warehouse.
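The Training Data, Validation Data and Test Data entries describe three separate sets of records. A minimal sketch of creating them by random assignment follows; the 60/20/20 proportions, the fixed seed and the function name `split_records` are assumptions, since the glossary does not prescribe how the data should be divided.

```python
import random

def split_records(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the records and split them into training, validation and test sets.

    Whatever is left after the training and validation shares becomes test data.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# 100 hypothetical record IDs
train, valid, test = split_records(range(100))
print(len(train), len(valid), len(test))
```

Models are fitted on `train`, compared and tuned on `valid`, and the champion model is checked once on `test` at the very end.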
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
More informationIntroduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationBusiness Intelligence, Analytics & Reporting: Glossary of Terms
Business Intelligence, Analytics & Reporting: Glossary of Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ad-hoc analytics Ad-hoc analytics is the process by which a user can create a new report
More informationFoundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationData Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
More informationData Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationBENEFITS OF AUTOMATING DATA WAREHOUSING
BENEFITS OF AUTOMATING DATA WAREHOUSING Introduction...2 The Process...2 The Problem...2 The Solution...2 Benefits...2 Background...3 Automating the Data Warehouse with UC4 Workload Automation Suite...3
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationData Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationSilvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationBuilding Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu
Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
More informationWeek 3 lecture slides
Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data