Introduction to Data Mining And Predictive Analytics!
|
|
|
- Edwina Hubbard
- 10 years ago
- Views:
Transcription
1 Introduction to Data Mining And Predictive Analytics!! Paul Rodriguez, Ph.D.! Predictive Analytics Center of Excellence,! San Diego Supercomputer Center! University of California, San Diego!
2 PACE: Closing the gap between Government, Industry and Academia! Bridge the Industry and Academia Gap Provide Predictive Analytics Services Data Mining Repository of Very Large Data Sets Inform, Educate and Train Predictive Analysis Center of Excellence Foster Research and Collaboration Develop Standards and Methodology High Performance Scalable Data Mining PACE is a non-profit, public educational organization! o To promote, educate and innovate in the area of Predictive Analytics! o To leverage predictive analytics to improve the education and well being of the global population and economy! o To develop and promote a new, multi-level curriculum to broaden participation in the field of predictive analytics!!!
3 Introduction to! WHAT IS DATA MINING?!
4 Necessity is the Mother of Invention! Data explosion! Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories! n We are drowning in data, but starving for knowledge! (John Naisbitt, 1982)! 4
5 Necessity is the Mother of Invention! Data explosion! n..explosion of heterogeneous, multi-disciplinary, Earth Science data has rendered traditional means of integration and analysis ineffective, necessitating the application of new analysis methods for synthesis, comparison, visualization (Hoffman et al, Data Mining in Earth System Science 2011)! 5
6 What does Data Mining Do? Explores Your Data! Finds Patterns! Performs Predictions!
7 Data Mining perspective Top-Down: Hypothesis Driven Analytical Tools Bottom-Up: Data Driven Surface Shallow Corporate Data Hidden SQL tools for simple queries and reporting Statistical & OLAP tools for summaries and analysis Data Mining methods for knowledge discovery 7
8 Query Reporting OLAP Data Mining Extraction of data Descriptive summaries Discovery of hidden patterns, information Information Analysis Insight knowledge and prediction Who purchased the product in the last 2 quarters? What is an average income of the buyers per quarter by district? What is the relationship between customers and their purchases? 8
9 What Is Data Mining in practice?! Combination of AI and statistical analysis to discover information that is hidden in the data! associations (e.g. linking purchase of pizza with beer)! sequences (e.g. tying events together: marriage and purchase of furniture)! classifications (e.g. recognizing patterns such as the attributes of employees that are most likely to quit)! forecasting (e.g. predicting buying habits of customers based on past patterns)! 9
10 Multidisciplinary Field! Database Technology Statistics Machine Learning Data Mining Visualization Artificial Intelligence Other Disciplines 10
11 TAXONOMY! Predictive Methods! Use some variables to predict some unknown or future values of other variables! Descriptive Methods! Find human interpretable patterns that describe the data! Supervised (outcomes or labels provided)vs. Unsupervised (just data)!!!!
12 What can we do with Data Mining?! Exploratory Data Analysis! Cluster analysis/segmentation! Predictive Modeling: Classification and Regression! Discovering Patterns and Rules! Association/Dependency rules! Sequential patterns! Temporal sequences! Anomaly detection! 12
13 13 Data Mining Applications! Science: Chemistry, Physics, Medicine, Energy! Biochemical analysis, remote sensors on a satellite, Telescopes star galaxy classification, medical image analysis! Bioscience! Sequence-based analysis, protein structure and function prediction, protein family classification, microarray gene expression! Pharmaceutical companies, Insurance and Health care, Medicine! Drug development, identify successful medical therapies, claims analysis, fraudulent behavior, medical diagnostic tools, predict office visits! Financial Industry, Banks, Businesses, E-commerce! Stock and investment analysis, identify loyal customers vs. risky customer, predict customer spending, risk management, sales forecasting!
14 Data Mining Tasks! n Concept/Class description: Characterization and discrimination n Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions; normal vs. fraudulent behavior n Association (correlation and causality) n Multi-dimensional interactions and associations age(x, ) ^ income(x, 60-90K ) à buys(x, TV ) Hospital(area code) ^ procedure(x) ->claim (type) ^ claim(cost) 14
15 Data Mining Tasks! n Classification and Prediction n Finding models (functions) that describe and distinguish classes or concepts for future prediction n Example: classify countries based on climate, or classify cars based on gas mileage, fraud based on claims information, energy usage based on sensor data n Presentation: n If-THEN rules, decision-tree, classification rule, neural network n Prediction: Predict some unknown or missing numerical values 15
16 Data Mining Tasks! Cluster analysis! Class label is unknown: Group data to form new classes! Example: cluster claims or providers to find distribution patterns of unusual behavior! Clustering based on the principle: maximizing the intraclass similarity and minimizing the interclass similarity! 16
17 Data Mining Tasks n Outlier analysis n Outlier: a data object that does not comply with the general behavior of the data n Mostly considered as noise or exception, but is quite useful in fraud detection, rare events analysis n Trend and evolution analysis n Trend and deviation: regression analysis n Sequential pattern mining, periodicity analysis 17
18 KDD Process! Database Selection Transformation Data Preparation Training Data Data Mining Model, Patterns Evaluation, Verification 18
19 Steps of a KDD Process (1) n Learning the application domain: n relevant prior knowledge and goals of application n Creating a target data set: data selection n Data cleaning and preprocessing: (may take 60% of effort!) n Data reduction and transformation: n Find useful features, dimensionality/variable reduction, representation n Choosing functions of data mining n summarization, classification, regression, association, clustering 19
20 Steps of a KDD Process (2) n Choosing functions of data mining n summarization, classification, regression, association, clustering n Choosing the mining algorithm(s) n Data mining: search for patterns of interest n Pattern evaluation and knowledge presentation n visualization, transformation, removing redundant patterns, etc. n Use and integration of discovered knowledge 20
21 Learning and Modeling Methods! Decision Tree Induction (C4.5, J48)!!! Regression Tree Induction (CART, MP5)! Multivariate Adaptive Regression Splines (MARS)!! Clustering (K-means, EM, Cobweb)!!! Artificial Neural Networks (Backpropagation, Recurrent)! Support Vector Machines (SVM)! Various other models! 21
22 Decision Tree Induction! Method for approximating discrete-valued functions! robust to noisy/missing data! can learn non-linear relationships! inductive bias towards shorter trees!! 22
23 Decision Tree Induction!! Applications:! medical diagnosis ex. heart disease! analysis of complex chemical compounds! classifying equipment malfunction! risk of loan applicants! Boston housing project price prediction! fraud detection!! 23
24 Decision Tree Example!
25 Regression Tree Induction! Why Regression tree?! Ability to:! Predict continuous variable! Model conditional effects! Model uncertainty! 25
26 Regression Trees! Continuous goal variables! Induction by means of an efficient recursive partitioning algorithm! Uses linear regression to select internal nodes! Quinlan,
27 Clustering Basic idea: Group similar things together Unsupervised Learning Useful when no other info is available K-means Partitioning instances into k disjoint clusters Measure of similarity 27
28 Clustering! X X X X 28
29 Kmeans Results from 45,000 NYTimes articles cluster means shown with coordinates determining fontsize 7 viable clusters found 29
30 Artificial Neural Networks (ANNs)!! Network of many simple units Main Components Inputs Hidden layers Outputs 30 Adjusting weights of connections Backpropagation
31 Evaluation! Error on the training data vs. performance on future/unseen data! Simple solution! Split data into training and test set! Re-substitution error! error rate obtained from the training data! Three sets! training data, validation data, and test data!
32 Training and Testing! Test set! set of independent instances that have not been used in formation of classifier in any way! Assumption! data contains representative samples of the underlying problem! Example: classifiers built using customer data from two different towns A and B! To estimate performance of classifier from town in completely new town, test it on data from B!
33 Error Estimation Methods! Cross-validation! Partition in K disjoint clusters! Train k-1, test on remaining! Leave-one-out Method! Bootstrap! Sampling with replacement!
34 Data Mining Challenges!! Computationally expensive to investigate all possibilities! Dealing with noise/missing information and errors in data! Mining methodology and user interaction! Mining different kinds of knowledge in databases! Incorporation of background knowledge! Handling noise and incomplete data! Pattern evaluation: the interestingness problem! Expression and visualization of data mining results! 34
35 Data Mining Heuristics and Guide! Choosing appropriate attributes/input representation! Finding the minimal attribute space! Finding adequate evaluation function(s)! Extracting meaningful information! Not overfitting! 35
36 Model Recommendations! Usually no absolute choice and no silver bullets! (otherwise we wouldn t be here)! Start with simple methods! Consider trade off as you go more complex! Find similar application examples! (what works in this domain)! Find paradigmatic examples for models! (what works for this model)! Goals and Expectations!!!
37 Available Data Mining Tools! COTs:! n IBM Intelligent Miner! n SAS Enterprise Miner! n Oracle ODM! n Microstrategy! n Microsoft DBMiner! n Pentaho! n Matlab! n Teradata! Open Source:! n WEKA! n KNIME! n Orange! n RapidMiner! n NLTK! n R! n Rattle! 37
38 On the Importance of Data Prep! Garbage in, garbage out! A crucial step of the DM process! Could take 60-80% of the whole data mining effort!
39 Preprocessing Input/Output! Inputs:! raw data! Outputs:! two data sets: training and test (if available)! Training further broken into training and validation!
40 Variables and Features terms! Variables and their transformations are features! Instance labels are outcomes or dependent variables! Set of instances comprise the input data matrix! Often represented as a single flat file! Data Matrix size is often refer by N observations and P variables! Large N Small P => usually solvable! Small N Large P => not always solvable directly,!!!!!needs heuristics!
41 Types of Measurements! Nominal (names)! Categorical (zip codes)! Qualitative (unordered, non-scalar) Ordinal (H,M,L)! Real Numbers! May or may not have a! Natural Zero Point?! Quantitative (ordered, scalar) If not comparisons are OK but not multiplication (e.g. dates)!
42 Know variable properties! Variables as objects of study! explore characteristics of each variable:! typical values, min, max, range etc.! entirely empty or constant variables can be discarded! explore variable dependencies! Sparsity! missing, N/A, or 0?! Monotonicity! increasing without bound, e.g. dates, invoice numbers! new values not in the training set!
43 Data Integration Issues! When combining data from multiple sources:! integrate metadata (table column properties)! the same attribute may have different names in different databases! e.g.a.patient-id B.patient-number!! Using edit-distance can help resolve entities!! Detecting and resolving data value conflicts! e.g. attribute values may have different scales or nominal labels!
44 Noise in Data! Noise is unknown error source! often assumed random Normal distribution! often assumed independent between instances! Approaches to Address Noise! Detect suspicious values and remove outliers! Smooth by averaging with neighbors! Smooth by fitting the data into regression functions!
45 Missing Data! Important to review statistics of a missing variable! Are missing cases random?! Are missing cases related to some other variable?! Are other variables missing data in same instances?! Is there a relation between missing cases and outcome variable?! What is frequency of missing cases wrt all instances?!
46 Approaches to Handle Missing Data! Ignore the tuple: usually done when class label is missing or if there s plenty of data! Use a global constant to fill in ( impute ) the missing value:!(or a special unknown value for nominals)! Use the attribute mean to impute the missing value! Use the attribute mean to impute all samples belonging to the same class! Use an algorithm to impute missing value! Use separate models with/without values!
47 Variable Enhancements! Make analysis easy for the tool! if you know how to deduce a feature, do it yourself and don t make the tool find it out! to save time and reduce noise! include relevant domain knowledge! Getting enough data! Do the observed values cover the whole range of data?!
48 Data Transformation: Normalizations! Mean center!! z-score! z xnew score = = x mean(x) x mean(x) std(x) Scale to [0 1]! xnew = x min(x) max(x)- min(x) log scaling! xnew = log(x)!
49 VariableTransformation Summary! Smoothing: remove noise from data! Aggregation: summarization! Introduce/relabel/categorize variable values! Normalization: scaled to fall within a small, specified range! Attribute/feature construction!
50 Pause!
51 Data Exploration and Unsupervised Learning with Clustering! Paul F Rodriguez,PhD! San Diego Supercomputer Center! Predictive Analytic Center of Excellence!!
52 Clustering Idea! Given a set of data can we find a natural grouping?! X 2 X 1
53 Why Clustering! A good grouping implies some structure! In other words, given a good grouping,! we can then:! Interpret and label clusters! Identify important features! Characterize new points by the closest cluster (or nearest neighbors)! Use the cluster assignments as a compression or summary of the data!!
54 Clustering Objective! Objective: find subsets that are similar within cluster and dissimilar between clusters! Similarity defined by distance measures! Euclidean distance! Manhattan distance! Mahalanobis! (Euclidean w/dimensions! rescaled by variance)!!
55 Kmeans Clustering! A simple, effective, and standard method!!start with K initial cluster centers!!loop:!!!assign each data point to nearest cluster center!!!calculate mean of cluster for new center!!stop when assignments don t change! Issues:!!How to choose K?!!How to choose initial centers?!!will it always stop?!!
56 Kmeans Example! For K=1, using Euclidean distance, where will the cluster center be?! X 2 X 1
57 Kmeans Example! For K=1, the overall mean minimizes Sum Squared Error (SSE), aka Euclidean distance!
58 Kmeans Example! K=1 K=2 K=3 K=4 As K increases individual points get a cluster
59 Choosing K for Kmeans! Total Within Cluster SSE K=1 to 10 - Not much improvement after K=2 ( elbow )
60 Kmeans Example more points! How many clusters should there be?
61 Choosing K for Kmeans! Total Within Cluster SSE K=1 to 10 - Smooth decrease at K 2, harder to choose - In general, smoother decrease => less structure
62 Kmeans Guidelines! Choosing K:! Elbow in total-within-cluster SSE as K=1 N! Cross-validation: hold out points, compare fit as K=1 N! Choosing initial starting points:! take K random data points, do several Kmeans, take best fit! Stopping:! may converge to sub-optimal clusters! may get stuck or have slow convergence (point assignments bounce around), 10 iterations is often good!!
63 Kmeans Example uniform! K=1 K=2 K=3 K=4
64 Choosing K - uniform! Total Within Cluster SSE K=1 to 10 - Smooth decrease across K => less structure
65 Kmeans Clustering Issues! Deviations:! Dimensions with large numbers may dominate distance metrics! Kmeans doesn t model group variances explicitly (so groups with unequal variances may get blurred)! Outliers:! Outliers can pull cluster mean, K-mediods uses median instead of mean!!!
66 Soft Clustering Methods! Fuzzy Clustering! Use weighted assignments to all clusters! Weights depend on relative distance! Find min weighted SSE! Expectation-Maximization:! Mixture of multivariate Gaussian distributions! Mixture weights are missing! Find most likely means & variances, for the expectations of the data given the weights!!
67 Kmeans unequal cluster variance!
68 Choosing K unequal distributions! Total Within Cluster SSE K=1 to 10 - Smooth decrease across K => less structure
69 EM clustering! Selects K=2 (using Bayesian Information Criterion) Handles unequal variance
70 Kmeans computations! Distance of each point to each cluster center! For N points, D dimensions: each loop requires N*D*K operations!! Update Cluster centers! only track points that change, get change in cluster center! On HPC:! Distance calculations can be partitioned data across dimension!
71 Dimensionality Reduction via Principle Components! Idea: Given N points and P features (aka dimensions), can we represent data with fewer features:! Yes, if features are constant! Yes, if features are redundant! Yes, if features only contribute noise (conversely, want features that contribute to variations of the data)!
72 Dimensionality Reduction via Principle Components! PCA:! Find set of vector (aka factors) that describe data in alternative way! First component is the vector that maximizes the variance of data projected onto that vector! K-th component is orthogonal to all k-1 previous components!
73 PCA on 2012 Olympic Althetes Height by Weight scatter plot Height- cm (mean centered)! Idea: Can you rotate the axis so that the data lines up on one axis as much as possible? Or, find the direction in space with maximum variance of the data? Weight- Kg (mean centered)
74 Height- cm (mean centered) PCA on 2012 Olympic Althetes Height by Weight scatter plot PC2! PC1 Projection of (145,5) to PCs Total Variance Conserved: Var in Weight + Var in Height = Var in PC1 + Var in PC2 In general: Var in PC1> Var in PC2> Var in PC3 Weight- Kg (mean centered)
75 SVD (=PCA for non-square matrix)! Informally, X NxP = U NxP X S PxP X V PxP!!!!!!!= X X! Taking k<p => X NxP ~ U NxK X S KxK X V KxP! U corresponds to observations in k-dimensional factor space! (eg each row in U is a point in K dimensional space, AKA factor loadings for data)! V corresponds to measurements in K factor space! (eg each row in V is a point in k-dim. factor space, AKA factor loadings for indepented variable)! V corresponds to factors in original space!! (eg each row in V is a point in P dimensional data space)!
76 Principle Components! Can choose k heuristically as approximation improves,!or choose k so that 95% of data variance accounted! aka Singular Value Decomposition!!PCA on square matrices only!!svd, more general! Works for numeric data only! In contrast, clustering reduces to categorical groups! In some cases, k PCs k clusters!!
77 Summary! Having no label doesn t stop you from finding structure in data! Unsupervised methods are somewhat related!
78 Earth System Sciences Example! An Investigation Using SiroSOM for the Analysis of QUEST Stream-Sediment and Lake-Sediment Geochemical Data (Fraser & Hodgkinson 2009) at! Quest_Out_Data.csv)!! Summary: identify patterns and relationships anomalous data promote resource assessment and exploration! Data: 15,000 rows x 42 measures (and metadata)!
79 Data location and spreadsheet!
80 N=15,000 x P=42 correlation plot!
81 Standard Deviation of Columns! Standard Deviation
82 Kmeans SSE as K=1..30! K=1 30; Authors used 20
83 K=20, heat map of centroids! Minerals
84 Try Using Matrix Factorization (SVD), use 3 factors, then take Kmeans on those! K=1 30, Data is N=15000 x 3 factors
85 Plots of all data points on 2 factors at a time (the U matrix from svd() (you could also plot V matrix)
86 PC1xPC2 with Cluster Coded!
87 PC1xPC3 with Cluster Coded!
88 Kmeans vs SiroSOM w/kmeans! Fraser: used SOM (self organizing map, a non-linear, iteractive matrix factorization technique), then Kmeans Kmeans, k=20 on raw data gives 58% overlap with article 3 SVD factors, then Kmeans k=20 gives 29% overlap 10 SVD factors, then Kmeans K=20 gives 38% overlap SOM w/kmeans cluster My cluster assigment
89 Plotting in Physical Space! Fraser (l) cluster assignment vs Kmeans alone (r)!
90 End! If time, hierarchical clustering next!
91 Incremental & Hierarchical Clustering! Start with 1 cluster (all instances) and do splits!!!!or! Start with N clusters (1 per instance) and do merges!! Can be greedy & expensive in its search!!some algorithms might merge & split!!algorithms need to store and recalculate distances! Need distance between groups!!in constrast to K-means!
92 Incremental & Hierarchical Clustering! Result is a hierarchy of clusters! displayed as a dendrogram tree! Useful for tree-like interpretations! syntax (e.g. word co-occurences)! concepts (e.g. classification of animals)! topics (e.g. sorting Enron s)! spatial data (e.g. city distances)! genetic expression (e.g. possible biological networks)! exploratory analysis!
93 Incremental & Hierarchical Clustering! Clusters are merged/split according to distance or utility measure! Euclidean distance (squared differences)! conditional probabilities (for nominal features)!! Options to choose which clusters to Link! single linkage, mean, average (w.r.t. points in clusters)!!(may lead to different trees, depending on spreads)! Ward method (smallest increase within cluster variance)! change in probability of features for given clusters!
94 Linkage options! e.g. single linkage (closest to any cluster instance)! Cluster1 Cluster2 e.g. mean (closest to mean of all cluster instances)! Cluster1 Cluster2
95 Linkage options (cont )! e.g. average (mean of pairwise distances)! Cluster1 Cluster2 e.g. Ward s method (find new cluster with min. variance)! Cluster1 Cluster2
96 Hierarchical Clustering Demo! 3888 Interactions among 685 proteins! From Hu et.al. TAP dataset bacteriome/dataset/)! b0009 b b0009 b b0014 b b0014 b b0014 b b0014 b b0014 b b0015 b
97 Hierarchical Clustering Demo! Essential R commands: >d=read.table("hu_tap_ppi.txt"); >str(d) #show d structure 'data.frame': 3888 obs. of 3 variables: $ V1: Factor w/ 685 levels "b0009","b0014",..: $ V2: Factor w/ 536 levels "b0011","b0014",..: $ V3: num >fs =c(d[,1],d[,2]); #combine factor levels >str(fs) int [1:7776] Note: strings read as factors
98 Hierarchical Clustering Demo! Essential R commands: C =matrix(0,p,p); #Connection matrix (aka Adjacency matrix) IJ =cbind(d[,1],d[,2]) #factor level is saved as Nx2 list of i-th,j-th protein for (i in 1:N) {C[IJ[i,1],IJ[i,2]]=1;} #populate C with 1 for connections install.packages('igraph') library('igraph') gc=graph.adjacency(c,mode="directed") plot.graph(gc,vertex.size=3,edge.arrow.size=0,vertex.label=na) or just plot(.
99 Hierarchical Clustering Demo! hclust with single distance: chaining!! d2use=dist(c,method="binary") fit <- hclust(d2use, method= single") plot(fit) the cluster distance hen 2 are combined Items that cluster first
100 Hierarchical Clustering Demo! hclust with Ward distance: spherical clusters!
101 Hierarchical Clustering Demo! Where height change looks big, cut off tree! groups <- cutree(fit, k=7) rect.hclust(fit, k=7, border="red")
102 Hierarchical Clustering Demo! Kmeans vs Hierarchical:! Lots of overlap! despite that Kmeans not have binary distance option!!! Hierarchical Group Assigment Kmeans cluster assigment 1! 2! 3! 4! 5! 6! 7! 1! 26! 0! 0! 293! 0! 19! 23! 2! 2! 1! 0! 9! 0! 0! 46! 3! 72! 0! 0! 27! 0! 2! 1!! groups <- cutree(fit, k=7) Kresult=kmeans(d2use,7,10 table(kresult$cluster,group
Introduction To Predictive Analytics And Data Mining!
Introduction To Predictive Analytics And Data Mining!! Natasha Balac, Ph.D.! Predictive Analytics Center of Excellence, Director! San Diego Supercomputer Center! University of California, San Diego! PACE
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Neural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm [email protected] Rome, 29
Introduction to Data Mining
Bioinformatics Ying Liu, Ph.D. Laboratory for Bioinformatics University of Texas at Dallas Spring 2008 Introduction to Data Mining 1 Motivation: Why data mining? What is data mining? Data Mining: On what
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
Data Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
Data Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
Principles of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
An Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Chapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based
Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
Analytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Clustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 [email protected] What is Learning? "Learning denotes changes in a system that enable
Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
Data Warehousing and Data Mining
Data Warehousing and Data Mining Winter Semester 2010/2011 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper [email protected] DM Lecturer: Mouna Kacimi [email protected] http://www.inf.unibz.it/dis/teaching/dwdm/index.html
DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011
DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data
ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam
ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
BUSINESS ANALYTICS. Data Pre-processing. Lecture 3. Information Systems and Machine Learning Lab. University of Hildesheim.
Tomáš Horváth BUSINESS ANALYTICS Lecture 3 Data Pre-processing Information Systems and Machine Learning Lab University of Hildesheim Germany Overview The aim of this lecture is to describe some data pre-processing
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster
Environmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Distances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI
Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA ([email protected]) Faculty of Computer Science, University of Indonesia Objectives
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
Data Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca [email protected] [email protected]
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Data Mining: Data Preprocessing. I211: Information infrastructure II
Data Mining: Data Preprocessing I211: Information infrastructure II 10 What is Data? Collection of data objects and their attributes Attributes An attribute is a property or characteristic of an object
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Clustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
A Survey on Pre-processing and Post-processing Techniques in Data Mining
, pp. 99-128 http://dx.doi.org/10.14257/ijdta.2014.7.4.09 A Survey on Pre-processing and Post-processing Techniques in Data Mining Divya Tomar and Sonali Agarwal Indian Institute of Information Technology,
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems
Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
Unsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining
Introduction to Artificial Intelligence G51IAI An Introduction to Data Mining Learning Objectives Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees
Foundations of Artificial Intelligence. Introduction to Data Mining
Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Data Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
DATA MINING - SELECTED TOPICS
DATA MINING - SELECTED TOPICS Peter Brezany Institute for Software Science University of Vienna E-mail : [email protected] 1 MINING SPATIAL DATABASES 2 Spatial Database Systems SDBSs offer spatial
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
