Introduction to Large Databases & Data Mining

Tips for Assembling Your Data Analysis Toolbox for the 22nd Century
Jim Heasley, Institute for Astronomy, 10/05/12

Outline I
- Relational databases & BIG DATA
- Big data volumes require a new data-handling paradigm
- Advantages of a relational database: organization of data, data integrity
- SQL: Structured (and almost standard) Query Language for queries
- What a database is not

Outline II
- Data mining: what is it?
- Common data mining tasks
- (FREE) tools available to you to perform many of these tasks

Outline III
- Examples, imagined & real
- If we only had time travel
- Things one might start to do with Pan-STARRS data (right now)

RELATIONAL DATABASES

Basic Definitions
- Database: a collection of related data organized to provide information.
- Data: known facts that can be recorded and have an implicit meaning. Data are often integrated from several sources and stored in a standard format for use by multiple applications.
- Database Management System (DBMS): a software package/system that facilitates the creation and maintenance of a computerized database.
- Database System: the DBMS software together with the data itself and the hardware on which it runs. Sometimes the applications are also included.

Two Approaches
Generally, there are two approaches to extracting information from data:
- the file processing approach, using file-based software programs
- the database approach, using a DBMS

File Processing Approach
[Diagram: application programs 1 through n, each containing its own instructions and its own data.]
- Each application program has a specific purpose.
- Each program uses its own data.
Issues: data redundancy; redundant processes/interfaces; data integrity; ease of maintenance; consistency; security (preserving a valuable company asset, access control).

Motivation for Databases
Data is a very important asset of an organization. Databases maintain data independently of the application programs:
- to avoid redundant data and redundant processes/interfaces;
- to enable ease of maintenance, sharing of data, and data access control.

Database Approach
[Diagram: application programs 1 through n contain only instructions; a shared DBMS holds the data and the metadata.]
The DBMS is general-purpose software that is self-describing: it contains both the data and the metadata (i.e., data about data).

Main Characteristics of the Database Approach
- Self-describing nature of a database system: a DBMS catalog stores the description of a particular database (e.g., data structures, types, and constraints).
- Insulation between programs and data, called program-data independence.
- Data abstraction: a data model is used to hide storage details and present users with a conceptual view of the database.
- Support of multiple views of the data: each user may see a different view of the database, which describes only the data of interest to that user.
- Concurrent execution.
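The "self-describing" property can be made concrete with a minimal sketch using Python's built-in `sqlite3` module. SQLite is an assumption here, standing in for whatever DBMS you actually use, and the `objects` table is hypothetical. SQLite keeps its catalog in the `sqlite_master` table, so you can query the database's own description with ordinary SQL.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL)")

# The catalog (sqlite_master in SQLite) stores the database's own description.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Column definitions are metadata too; field 1 of each table_info row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(objects)")]

print(catalog)  # [('table', 'objects')]
print(columns)  # ['obj_id', 'ra', 'dec']
```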

Characteristics of a DBMS
- Data is integrated, shared, persistent, and self-describing.
- Abstraction: program and data independence.
- Multiple views of the data: different users need different kinds of information.

Advantages of Using the Database Approach
- Controlling redundancy
- Sharing of data among multiple users
- Restricting unauthorized access to data
- Providing persistent storage for program objects
- Providing storage structures (e.g., indexes) for efficient query processing
- Providing backup and recovery services
- Providing multiple interfaces to different classes of users
- Representing complex relationships among data
- Enforcing integrity constraints
- Drawing inferences and actions from the stored data using deductive and active rules
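The following minimal sketch illustrates two of these advantages. It again assumes SQLite via Python's `sqlite3` module and a hypothetical `objects` table. First, the DBMS itself refuses a row that violates an integrity constraint. Second, an index is added as a storage structure without changing any query's results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK clause is an integrity constraint enforced by the DBMS.
conn.execute(
    "CREATE TABLE objects ("
    " obj_id INTEGER PRIMARY KEY,"
    " dec REAL NOT NULL CHECK (dec BETWEEN -90 AND 90))"
)
# An index is a storage structure for faster queries; it does not change results.
conn.execute("CREATE INDEX idx_objects_dec ON objects (dec)")

conn.execute("INSERT INTO objects (obj_id, dec) VALUES (1, 12.5)")
try:
    # A declination outside [-90, 90] is impossible, so the DBMS rejects the row.
    conn.execute("INSERT INTO objects (obj_id, dec) VALUES (2, 123.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_rows = conn.execute("SELECT COUNT(*) FROM objects WHERE dec > 0").fetchone()[0]
print(rejected, n_rows)  # True 1
```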

Additional Advantages of the Database Approach
- Re-use of data across multiple applications
- Data structure and access can be changed without changing applications
- Enforcement of standards and computation of statistics
- Improved responsiveness and productivity

Additional Implications of Using the Database Approach
- Potential for enforcing standards
- Reduced application development time
- Flexibility to change data structures
- Availability of current information, extremely important for online transaction systems such as airline, hotel, and car reservations
- Economies of scale

Disadvantages of the Database Approach
- Complexity
- Size (of software and application)
- Cost
- Performance
- Risk of (spectacular!) failures

When Not to Use a DBMS
Main inhibitors (costs) of using a DBMS:
- high initial investment and possible need for additional hardware;
- overhead for providing generality, security, concurrency control, recovery, and integrity functions.
When a DBMS may be unnecessary:
- if the database and applications are simple, well defined, and not expected to change;
- if access to data by multiple users is not required.
When no DBMS may suffice:
- if the database system is not able to handle the complexity of the data because of modeling limitations;
- if the database users need special operations not supported by the DBMS.

Database Logic
Operations within the database are governed by standard set theory and logic. New types of databases built on fuzzy sets, fuzzy logic, and fuzzy measures are currently the subject of active research, but are not (as yet) widely available. The two key set operations of interest in databases are INTERSECTION and UNION (called the same in the database world). Loosely speaking, the JOIN plays the role of INTERSECTION: a JOIN on a shared key keeps only the rows that match in both tables.
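The difference between the two operations shows up with two small hypothetical survey catalogs that share an object identifier. This sketch assumes SQLite via Python's `sqlite3` module; the table and column names are illustrative only. SQL also has a separate `INTERSECT` operator. The JOIN is the operation that relates rows across tables and brings columns from both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical survey catalogs sharing an object identifier.
conn.executescript("""
    CREATE TABLE survey_a (obj_id INTEGER, mag_a REAL);
    CREATE TABLE survey_b (obj_id INTEGER, mag_b REAL);
    INSERT INTO survey_a VALUES (1, 17.2), (2, 18.9), (3, 20.1);
    INSERT INTO survey_b VALUES (2, 18.7), (3, 19.8), (4, 21.0);
""")

# JOIN: only objects present in BOTH catalogs, with columns from each.
joined = conn.execute(
    "SELECT a.obj_id, a.mag_a, b.mag_b FROM survey_a AS a "
    "JOIN survey_b AS b ON a.obj_id = b.obj_id ORDER BY a.obj_id"
).fetchall()

# UNION: every identifier that appears in EITHER catalog, duplicates removed.
union = conn.execute(
    "SELECT obj_id FROM survey_a UNION SELECT obj_id FROM survey_b ORDER BY obj_id"
).fetchall()

print(joined)  # [(2, 18.9, 18.7), (3, 20.1, 19.8)]
print(union)   # [(1,), (2,), (3,), (4,)]
```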

Structured Query Language
The user usually interacts with the database by expressing what she/he wants to accomplish as a request in SQL. Note that SQL tells the database what you want done, but not how to do it. There are many helpful SQL tutorials available on the web. An excellent introduction is available at www2.aao.gov.au/2dfgrs/public/release/database/sql_intro.pdf. This introduction is sufficiently vanilla that it will get you started despite the minor variations between different flavors of SQL.

The Schema
The logical schema defines how attributes are assigned to the various tables and defines the keys (indexes) that help tie the tables together. A user must understand the logical schema.
The physical schema defines how the data tables are stored on the physical storage media (e.g., disks). Users generally do not need to know the physical schema, although system developers must exploit it to maximize the performance of their system.
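The following sketch shows how keys in the logical schema tie tables together. It assumes SQLite and two hypothetical tables, `objects` and `detections`. The `REFERENCES` clause belongs to the logical schema. The index is a physical-schema choice that affects speed but not results. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical logical schema: every detection points at one object through obj_id.
conn.executescript("""
    CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE detections (
        det_id INTEGER PRIMARY KEY,
        obj_id INTEGER NOT NULL REFERENCES objects (obj_id),
        band TEXT,
        mag REAL
    );
""")
conn.execute("INSERT INTO objects VALUES (1, 150.1, 2.2)")
conn.executemany(
    "INSERT INTO detections (obj_id, band, mag) VALUES (?, ?, ?)",
    [(1, "g", 19.3), (1, "r", 18.8)],
)

try:
    # obj_id 99 does not exist in objects, so this orphan row is refused.
    conn.execute("INSERT INTO detections (obj_id, band, mag) VALUES (99, 'i', 18.1)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Physical-schema choice: the index speeds up lookups by obj_id but leaves results unchanged.
conn.execute("CREATE INDEX idx_detections_obj ON detections (obj_id)")
per_object = conn.execute(
    "SELECT obj_id, COUNT(*) FROM detections GROUP BY obj_id"
).fetchall()
print(orphan_accepted, per_object)  # False [(1, 2)]
```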

User Queries
Users write queries to the database in a declarative query language, usually some form of SQL. These queries build requests for information stored in the database's tables, often making use of relationships inherent in the data (e.g., intersections between different tables).

The SQL SELECT Command
The most frequently used SQL command (by typical users) is SELECT, which is used to get (i.e., select) data from the database tables. The basic syntax of the SELECT command is:
SELECT (list of attributes you want)
FROM (list of tables containing them)
WHERE (list of limiting/restricting conditions)
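Here is the three-part SELECT syntax run against a hypothetical `objects` table, assuming SQLite via Python's `sqlite3` module. The `?` placeholders pass the limiting values in safely rather than pasting them into the query string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of objects with positions and an r-band magnitude.
conn.execute("CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, r_mag REAL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, 10.5, -0.3, 17.9), (2, 11.2, 0.4, 21.3), (3, 12.0, 0.9, 19.6)],
)

# SELECT (attributes) FROM (tables) WHERE (restricting conditions)
rows = conn.execute(
    "SELECT obj_id, r_mag FROM objects WHERE r_mag < ? AND dec > ? ORDER BY obj_id",
    (20.0, 0.0),
).fetchall()
print(rows)  # [(3, 19.6)]
```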

What a Database Isn't!
While the column arrangement of attributes in database tables might remind the user of a spreadsheet program like Excel, a database is not a computing engine. Further, because of the nature of SQL, the user's query defines only what data is wanted, not how to get it. That includes how the database may choose to execute any numerical operations the user embeds in the query.
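One small illustration shows the database applying its own arithmetic rules, assuming SQLite. Dividing two integers inside a query gives an integer result, which a spreadsheet user might not expect. Other SQL flavors may follow different rules, so check the documentation of the system you are using.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The same division typed into a query, evaluated by SQLite's own typing rules.
int_ratio, real_ratio = conn.execute("SELECT 7 / 2, 7 / 2.0").fetchone()
print(int_ratio, real_ratio)  # 3 3.5
```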

DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES
[Diagram: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.]

"The purpose of computing is insight, not numbers."
Richard Hamming, in the preface to his 1962 text on numerical methods.

What Is Data Mining?
Finding (meaningful) patterns in data:
- Classification
- Association rules
- Cluster analysis
- Anomaly detection
- Regression
Data mining tools have been used extensively in:
- biology, genetics, and medical research (bioinformatics)
- business and economics
- ecology and resource management
- engineering
- literature
- music
- voice and facial recognition

Don't Re-invent the Wheel!

Relationship between Databases & Data Mining
Databases are often a key component in data mining; data warehouses commonly provide the information needed by the mining tools. However, the actual data mining operations are usually executed outside the database itself. Databases are excellent information servers but are not good compute engines!

Classification: Definition
- Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
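The training/test workflow can be sketched in plain Python. The nearest-centroid classifier here is a deliberately simple stand-in, not a method the talk prescribes. The two-class data are synthetic and hypothetical. The point is the split: accuracy is measured only on records the model never saw during training.

```python
import random

def nearest_centroid_fit(records):
    """records: list of (feature_tuple, class_label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(model, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical labeled records: two well-separated classes in two dimensions.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "star") for _ in range(50)]
data += [((rng.gauss(5, 1), rng.gauss(5, 1)), "galaxy") for _ in range(50)]
rng.shuffle(data)

# The training set builds the model; the held-out test set measures its accuracy.
train, test = data[:70], data[70:]
model = nearest_centroid_fit(train)
correct = sum(nearest_centroid_predict(model, f) == label for f, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```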

Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
[Table: market-basket transactions]
Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
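Association rules are usually scored by two measures. Support is how often the items occur together across all transactions. Confidence is how often the right-hand side appears when the left-hand side does. The minimal sketch below uses a small, hypothetical set of baskets. Computing confidence as a ratio of counts keeps the arithmetic exact.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= basket for basket in transactions)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)

# Hypothetical market-basket transactions, for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

for lhs, rhs in [({"Diaper"}, {"Beer"}), ({"Beer", "Bread"}, {"Milk"})]:
    s = support(lhs | rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Here {Diaper} → {Beer} has support 0.60 and confidence 0.75. As the slide warns, that says only that the items co-occur, not that one causes the other.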

What Is Cluster Analysis?
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups: intra-cluster distances are minimized, while inter-cluster distances are maximized.
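k-means (Lloyd's algorithm) is one common way to pursue these goals: it minimizes the within-cluster sum of squared distances. Below is a minimal plain-Python sketch on hypothetical 2-D points. Note that the number of clusters k, here the length of the initial `centroids` list, must be chosen in advance.

```python
def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points; repeat until stable."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(c)  # leave an empty cluster's centroid where it was
        if new == centroids:
            break  # assignments no longer change
        centroids = new
    return centroids, clusters

# Hypothetical points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (9, 9)])
print(centroids)  # [(0.5, 0.5), (8.5, 8.5)]
```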

Anomaly/Outlier Detection
What are anomalies/outliers? The set of data points that are considerably different from the remainder of the data.
Variants of anomaly/outlier detection problems:
- Given a database D, find all the data points x ∈ D with anomaly scores greater than some threshold t.
- Given a database D, find all the data points x ∈ D having the top-n largest anomaly scores f(x).
- Given a database D containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D.
Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection.
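The minimal sketch below implements the three variants on hypothetical one-dimensional data, using the absolute z-score as an assumed anomaly score f(x). One caveat applies: an extreme outlier inflates the standard deviation it is measured against. That is why robust scores based on the median are often preferred in practice.

```python
import statistics

def zscores(data):
    """Anomaly score f(x) = |x - mean| / stdev, computed over the whole data set D."""
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    return [abs(x - mu) / sigma for x in data]

def score_against(reference, x):
    """Score a new test point x relative to mostly-normal reference data D."""
    mu, sigma = statistics.mean(reference), statistics.stdev(reference)
    return abs(x - mu) / sigma

D = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 14.7, 9.7]  # hypothetical measurements
scores = zscores(D)

t = 2.0
above_threshold = [x for x, s in zip(D, scores) if s > t]         # variant 1: score > t
n = 2
top_n = [x for s, x in sorted(zip(scores, D), reverse=True)[:n]]  # variant 2: top-n scores
reference = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7]
new_score = score_against(reference, 14.7)                        # variant 3: test point vs D

print(above_threshold, top_n)  # [14.7] [14.7, 9.7]
print(f"score of test point: {new_score:.1f}")
```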

Regression (Prediction)
Regression is the process of finding a function that describes the data in order to predict continuous numerical values (rather than the discrete class labels produced by classification). Numerous approaches for developing the desired function exist, including classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. Prediction also encompasses the identification of distribution trends based on the available data.
Both classification and prediction may need to be preceded by relevance analysis, which attempts to identify the attributes or features that do not contribute to the classification or prediction process; these attributes can then be excluded from the analysis. A common technique for reducing the number of features is principal component analysis. Strictly speaking, it replaces the original attributes with a smaller set of uncorrelated combinations rather than discarding individual ones.
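The simplest instance of "finding a function" for prediction is an ordinary least-squares straight line. Below is a plain-Python sketch on hypothetical data. The closed-form slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical measurements.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)
prediction = a + b * 5.0  # predict y at a new x value
print(f"a={a:.2f} b={b:.2f} prediction at x=5: {prediction:.2f}")  # a=1.06 b=1.97 ... 10.91
```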

Machine Learning

Data Mining Environments
There are a large number of data mining software packages available, both commercial and open source, and a search of the internet quickly identifies them. A comprehensive review of these packages is far beyond the scope of this talk, so I will restrict my comments to several well-known packages used for data analysis and mining: the R statistical analysis package, Matlab (and its open-source work-alike Octave), and the data mining packages Weka and scikit-learn.

The R Project for Statistical Computing
project.org/
R, also called GNU S, is a strongly functional language and environment for statistically exploring data sets and making many kinds of graphical displays of data. It has very strong statistical tools, and the basic system has been greatly expanded by packages developed by its user community.

Matlab (Octave)
MATLAB, a commercial product from MathWorks, is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical modeling. http://
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It is an open-source work-alike of MATLAB. http://

Weka (Waikato Environment for Knowledge Analysis) is a well-known suite of machine learning software that supports several typical data mining tasks, particularly data preprocessing, clustering, classification, regression, visualization, and feature selection. Its techniques are based on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes. Weka provides access to SQL databases using Java Database Connectivity (JDBC) and can process the result returned by a database query. Its main user interface is the Explorer, but the same functionality can be accessed from the command line or through the component-based Knowledge Flow interface. http://

scikit-learn is a Python module integrating classic machine learning algorithms in the tightly knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering. Tools are available for supervised and unsupervised learning, model selection, datasets, and feature extraction. http://scikit-learn.org/stable/

Pluses, Minuses, Observations
The R and Weka software both have large communities that contribute to extending their functionality by developing new add-on packages. Further, R and Weka can be interfaced via the RWeka package. There are many excellent online tutorials for these packages, and Weka itself is well described in the text Data Mining: Practical Machine Learning Tools and Techniques by Witten, Frank, & Hall. This text provides both a good underpinning of the methods and practical tutorial information. (The text is available as an e-book.)
scikit-learn, while still fairly new (current release is version 0.7), has a very impressive collection of tools and an extensive user guide. The software is written in Python. My main reservation about this software is that while the user guide presents many examples, it implicitly assumes that the user knows a great deal about the field of data mining. This may leave new users somewhat in over their heads when trying to determine exactly which tool best serves their needs.

EXAMPLES IMAGINARY & REAL

How could we have helped this lady?


Or these gentlemen?

Or him?

Pan-STARRS Opportunities
The PS1 Small Area Survey (SAS), covering an area of 81 deg², overlaps with SDSS Stripe 82. In addition to the deep Stripe 82 database, the images from this region have been examined by the Citizen Science team known as Galaxy Zoo. This interesting overlap of resources provides data for some exciting data mining experiments.
Star/galaxy classification (or, more precisely, star/galaxy/QSO classification) is an ongoing challenge for the PS1 science teams. While this work has been reasonably successful, the efforts thus far seem to have attempted to get by with the simplest possible classification approach. What might happen if we performed a classification exercise using a wide range of IPP measurements (e.g., PSF, Kron, and Petrosian magnitudes, Petrosian radii, and various moments measured in individual frames and stacks), with SDSS and Galaxy Zoo data providing the classification truth?
A similar analysis, using visual inspection of the images to identify artifacts in the PS1 images and/or stacks, might provide a robust garbage-rejection process. Not necessarily glamorous, but definitely important.

Empirical Photo-Z Methods
- Artificial neural networks
- Support vector machines
- Self-organizing maps
- Gaussian process regression
- Kernel regression
- Linear/nonlinear polynomial fitting
- Instance-based learning & nearest neighbors
- Boosted decision trees
- Regression trees
And these are just the ones I've found so far!
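As an illustration of the instance-based learning and nearest-neighbors entry, here is a minimal k-nearest-neighbor photo-z sketch. The colors, redshifts, and k = 3 are hypothetical. A real training set would pair measured colors with spectroscopic redshifts. The estimate for a new galaxy is the average redshift of the k training galaxies closest to it in color space.

```python
import math

def knn_photoz(train_colors, train_z, query, k=3):
    """Average the redshifts of the k training galaxies whose colors are
    closest (Euclidean distance in color space) to the query's colors."""
    dists = sorted((math.dist(query, c), z) for c, z in zip(train_colors, train_z))
    nearest = [z for _, z in dists[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical training set: (g-r, r-i) colors with known redshifts.
train_colors = [(0.40, 0.20), (0.50, 0.25), (0.45, 0.22),
                (1.20, 0.60), (1.30, 0.70), (1.25, 0.65)]
train_z = [0.10, 0.12, 0.11, 0.45, 0.50, 0.48]

z_est = knn_photoz(train_colors, train_z, query=(0.47, 0.23))
print(f"estimated photo-z: {z_est:.2f}")  # estimated photo-z: 0.11
```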

Galaxy Clusters?
We all know the best way to identify clusters of galaxies is from their X-ray emission. Unfortunately, current X-ray surveys don't provide sufficient sky coverage and depth to do this. Optical surveys have sufficient depth but suffer from background issues, overlapping foreground and background clusters, etc. It has long been hoped that in large-scale optical surveys such as Pan-STARRS and LSST, we will be able to use Photo-Z values to distinguish real clusters from accidental clustering of galaxies and from overlapping clusters at different distances. (Some of the PS1 partners in Taiwan are working on this problem.)

Galaxy Clusters: Can Data Mining Help?
While there is a plethora of data mining techniques for finding clusters within data, most are probably not well suited for finding galaxy clusters. Many methods start off by assuming that one knows how many clusters are present in a given region. Clearly this is not the case with our problem. Further, we need to deal with the fact that in the 3-D representation we have much larger uncertainty along the line of sight, owing to the limited accuracy of the Photo-Z measurements. Some interesting work in this area has used a friends-of-friends approach. I think this could be generalized to include better background discrimination, including the Photo-Z distribution.
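Here is a minimal friends-of-friends sketch, assuming each galaxy is given as sky coordinates plus a Photo-Z value. It uses separate transverse (`b_perp`) and line-of-sight (`b_los`) linking lengths. That is one simple way to allow for the larger line-of-sight uncertainty; it is an assumption for illustration, not a published method. A fixed `b_los` in redshift also ignores how photo-z errors vary with redshift. Unlike k-means, the number of groups is not specified in advance. Pairs are tested by brute force in O(N²) time and linked with a union-find structure.

```python
def friends_of_friends(galaxies, b_perp, b_los):
    """Group galaxies given as (x, y, photo_z). Two galaxies are 'friends' if
    their transverse separation is < b_perp AND their Photo-Z difference is
    < b_los; a group is any chain of friends (a connected component)."""
    parent = list(range(len(galaxies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(galaxies)):
        xi, yi, zi = galaxies[i]
        for j in range(i + 1, len(galaxies)):
            xj, yj, zj = galaxies[j]
            transverse = ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
            if transverse < b_perp and abs(zi - zj) < b_los:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(galaxies)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical galaxies: a chain of three at z ~ 0.3, one projected onto the
# chain but at a different Photo-Z, and one isolated galaxy.
galaxies = [(0.0, 0.0, 0.30), (0.5, 0.0, 0.31), (1.0, 0.1, 0.29),
            (0.5, 0.1, 0.60), (5.0, 5.0, 0.30)]
groups = friends_of_friends(galaxies, b_perp=0.6, b_los=0.05)
print(groups)  # [[0, 1, 2], [3], [4]]
```

Galaxies 0 and 2 are not direct friends but end up in the same group through galaxy 1, which is the defining "friend of a friend" behavior.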

PAU


More information

Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za

Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za Reflec1ons on the role of corpora and big data in e- lexicography in rela1on to end user informa1on needs CILC 2015 7th Interna1onal

More information

MSc Data Science at the University of Sheffield. Started in September 2014

MSc Data Science at the University of Sheffield. Started in September 2014 MSc Data Science at the University of Sheffield Started in September 2014 Gianluca Demar?ni Lecturer in Data Science at the Informa?on School since 2014 Ph.D. in Computer Science at U. Hannover, Germany

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering Map- reduce, Hadoop and The communica3on bo5leneck Yoav Freund UCSD / Computer Science and Engineering Plan of the talk Why is Hadoop so popular? HDFS Map Reduce Word Count example using Hadoop streaming

More information

An Introduc@on to Big Data, Apache Hadoop, and Cloudera

An Introduc@on to Big Data, Apache Hadoop, and Cloudera An Introduc@on to Big Data, Apache Hadoop, and Cloudera Ian Wrigley, Curriculum Manager, Cloudera 1 The Mo@va@on for Hadoop 2 Tradi@onal Large- Scale Computa@on Tradi*onally, computa*on has been processor-

More information

Big Data and Health Insurance Product Selec6on (and a few other applica6on) Jonathan Kolstad UC Berkeley and NBER

Big Data and Health Insurance Product Selec6on (and a few other applica6on) Jonathan Kolstad UC Berkeley and NBER Big Data and Health Insurance Product Selec6on (and a few other applica6on) Jonathan Kolstad UC Berkeley and NBER Introduc6on Applica6ons of behavioral economics in health SeIng where behavioral assump6ons

More information

BENCHMARKING V ISUALIZATION TOOL

BENCHMARKING V ISUALIZATION TOOL Copyright 2014 Splunk Inc. BENCHMARKING V ISUALIZATION TOOL J. Green Computer Scien

More information

The Data Reservoir. 10 th September 2014. Mandy Chessell FREng CEng FBCS Dis4nguished Engineer, Master Inventor Chief Architect, Informa4on Solu4ons

The Data Reservoir. 10 th September 2014. Mandy Chessell FREng CEng FBCS Dis4nguished Engineer, Master Inventor Chief Architect, Informa4on Solu4ons Mandy Chessell FREng CEng FBCS Dis4nguished Engineer, Master Inventor Chief Architect, Solu4ons The Reservoir 10 th September 2014 A growing demand Business Teams want Open access to more informa4on More

More information

ECBDL 14: Evolu/onary Computa/on for Big Data and Big Learning Workshop July 13 th, 2014 Big Data Compe//on

ECBDL 14: Evolu/onary Computa/on for Big Data and Big Learning Workshop July 13 th, 2014 Big Data Compe//on ECBDL 14: Evolu/onary Computa/on for Big Data and Big Learning Workshop July 13 th, 2014 Big Data Compe//on Jaume Bacardit jaume.bacardit@ncl.ac.uk The Interdisciplinary Compu/ng and Complex BioSystems

More information

Help Framework. Ticket Management Ticket Resolu/on Communica/ons. Ticket Assignment Follow up Customer - communica/on System updates Delay management

Help Framework. Ticket Management Ticket Resolu/on Communica/ons. Ticket Assignment Follow up Customer - communica/on System updates Delay management Help for JD Edwards Our Help Framework Ticket qualifica/on Ticket crea/on Ticket Rou/ng Closures L1 issues Resolu/on KG SOPs Co- ordinate Ticket Assignment Follow up Customer - communica/on System updates

More information

Project Overview. Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome

Project Overview. Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome Project Overview Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome Cloud-TM at a glance "#$%&'$()!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&!"'!()*+!!!!!!!!!!!!!!!!!!!,-./01234156!("*+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&7"7#7"7!("*+!!!!!!!!!!!!!!!!!!!89:!;62!("$+!

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Effec%ve AX 2012 Upgrade Project Planning and Microso< Sure Step. Arbela Technologies

Effec%ve AX 2012 Upgrade Project Planning and Microso< Sure Step. Arbela Technologies Effec%ve AX 2012 Upgrade Project Planning and Microso< Sure Step Arbela Technologies Why Upgrade? What to do? How to do it? Tools and templates Agenda Sure Step 2012 Ax2012 Upgrade specific steps Checklist

More information

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements

More information

Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson

Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson Speaker Bio Dave Larson Solu6ons Architect with Freeit Data Solu6ons In the IT industry for over

More information

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Alex Pinto Chief Data Scien2st Niddel / MLSec Project @alexcpsec @MLSecProject @NiddelCorp Agenda Security Singularity

More information

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,

More information

1 File Processing Systems

1 File Processing Systems COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.

More information

DEEP FILM ACCESS Project (Digital Transforma4ons in the Arts and Humani4es: Big Data) February 2014 April 2015

DEEP FILM ACCESS Project (Digital Transforma4ons in the Arts and Humani4es: Big Data) February 2014 April 2015 DEEP FILM ACCESS Project (Digital Transforma4ons in the Arts and Humani4es: Big Data) February 2014 April 2015 Dr Sarah Atkinson (PI) s.a.atkinson@brighton.ac.uk Interdisciplinary Principal Inves4gator:

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

Project Por)olio Management

Project Por)olio Management Project Por)olio Management Important markers for IT intensive businesses Rest assured with Infolob s project management methodologies What is Project Por)olio Management? Project Por)olio Management (PPM)

More information

Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on

Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classifica6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output

More information

Social Media Analy.cs (SMA)

Social Media Analy.cs (SMA) Social Media Analy.cs (SMA) Emanuele Della Valle DEIB - Politecnico di Milano emanuele.dellavalle@polimi.it hap://emanueledellavalle.org What's social media? haps://www.youtube.com/watch?v=sgniiud_oqg

More information

Protec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology

Protec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology Protec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology Alexey Kirichenko, F- Secure Corpora7on ICT SHOK, Future Internet program 30.5.2012 Outline 1. Security WP (WP6) overview

More information

An Integrated Approach to Manage IT Network Traffic - An Overview Click to edit Master /tle style

An Integrated Approach to Manage IT Network Traffic - An Overview Click to edit Master /tle style An Integrated Approach to Manage IT Network Traffic - An Overview Click to edit Master /tle style Agenda A quick look at ManageEngine Tradi/onal Traffic Analysis Techniques & Tools Changing face of Network

More information

B2B Offerings. Helping businesses op2mize. Infolob s amazing b2b offerings helps your company achieve maximum produc2vity

B2B Offerings. Helping businesses op2mize. Infolob s amazing b2b offerings helps your company achieve maximum produc2vity B2B Offerings Helping businesses op2mize Infolob s amazing b2b offerings helps your company achieve maximum produc2vity What is B2B? B2B is shorthand for the sales prac4ce called business- to- business

More information

Database Security. Sarajane Marques Peres, Ph.D. University of São Paulo www.each.usp.br/sarajane

Database Security. Sarajane Marques Peres, Ph.D. University of São Paulo www.each.usp.br/sarajane Database Security Sarajane Marques Peres, Ph.D. University of São Paulo www.each.usp.br/sarajane Based on Elsmari x Navathe / Silberschatz, Korth, Sudarshan s books Types of security Legal and ethical

More information

SDN- based Mobile Networking for Cellular Operators. Seil Jeon, Carlos Guimaraes, Rui L. Aguiar

SDN- based Mobile Networking for Cellular Operators. Seil Jeon, Carlos Guimaraes, Rui L. Aguiar SDN- based Mobile Networking for Cellular Operators Seil Jeon, Carlos Guimaraes, Rui L. Aguiar Background The data explosion currently we re facing with has a serious impact on current cellular networks

More information

How To Understand Cloud Compueng

How To Understand Cloud Compueng Data Management in the Cloud Introduc)on (Lecture 1) Do one thing every day that scares you. Eleanor Roosevelt 1 Data Management in the Cloud LOGISTICS AND ORGANIZATION 2 Kris)n TuCe FAB 115-09 Personnel

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Discovering Computers Fundamentals, 2010 Edition. Living in a Digital World

Discovering Computers Fundamentals, 2010 Edition. Living in a Digital World Discovering Computers Fundamentals, 2010 Edition Living in a Digital World Objec&ves Overview Discuss the importance of project management, feasibility assessment, documenta8on, and data and informa8on

More information

An Introduction to WEKA. As presented by PACE

An Introduction to WEKA. As presented by PACE An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/

More information

March 10 th 2011, OSG All Hands Mee6ng, Network Performance Jason Zurawski Internet2 NDT

March 10 th 2011, OSG All Hands Mee6ng, Network Performance Jason Zurawski Internet2 NDT March 10 th 2011, OSG All Hands Mee6ng, Network Performance Jason Zurawski Internet2 NDT Agenda Tutorial Agenda: Network Performance Primer Why Should We Care? (15 Mins) GeNng the Tools (10 Mins) Use of

More information

Splunk for Data Science

Splunk for Data Science Copyright 2014 Splunk Inc. Splunk for Data Science Tom LaGa=a Data Scien@st, Splunk Olivier de Garrigues Sr Prof Services Consultant, Splunk Disclaimer During the course of this presenta@on, we may make

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

BIG DATA AND INVESTIGATIVE ANALYTICS

BIG DATA AND INVESTIGATIVE ANALYTICS The New Fron+er BIG DATA AND INVESTIGATIVE ANALYTICS A Publication of Infobright Table of Contents Introduc+on 3 Chapter 1: What Is Inves+ga+ve Analy+cs?. 4 Chapter 2: Top Five Requirements for Inves+ga+ve

More information

Program Model: Muskingum University offers a unique graduate program integra6ng BUSINESS and TECHNOLOGY to develop the 21 st century professional.

Program Model: Muskingum University offers a unique graduate program integra6ng BUSINESS and TECHNOLOGY to develop the 21 st century professional. Program Model: Muskingum University offers a unique graduate program integra6ng BUSINESS and TECHNOLOGY to develop the 21 st century professional. 163 Stormont Street New Concord, OH 43762 614-286-7895

More information

CS 91: Cloud Systems & Datacenter Networks Failures & Replica=on

CS 91: Cloud Systems & Datacenter Networks Failures & Replica=on CS 91: Cloud Systems & Datacenter Networks Failures & Replica=on Types of Failures fail stop : process/machine dies and doesn t come back. Rela=vely easy to detect. (oien planned) performance degrada=on:

More information

DTCC Data Quality Survey Industry Report

DTCC Data Quality Survey Industry Report DTCC Data Quality Survey Industry Report November 2013 element 22 unlocking the power of your data Contents 1. Introduction 3 2. Approach and participants 4 3. Summary findings 5 4. Findings by topic 6

More information

An Overview of Database management System, Data warehousing and Data Mining

An Overview of Database management System, Data warehousing and Data Mining An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,

More information

Data Warehouses and NoSQL Sharing Administra6ve Informa6on

Data Warehouses and NoSQL Sharing Administra6ve Informa6on Data Warehouses and NoSQL Sharing Administra6ve Informa6on Carmen Barandela So-ware Engineer CERN / GS AIS October 24 28, 2011 JINR/CERN Grid and Management Informa6on Systems Agenda Data Warehouses in

More information

Data Mining. Yeow Wei Choong Anne Laurent

Data Mining. Yeow Wei Choong Anne Laurent Data Mining Yeow Wei Choong Anne Laurent Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce purchases at department/ grocery stores Bank/Credit Card

More information

What Do Our Data Tell Us: Two Reports Examining Correla;ons in Utah Data

What Do Our Data Tell Us: Two Reports Examining Correla;ons in Utah Data What Do Our Data Tell Us: Two Reports Examining Correla;ons in Utah Data SUSAN LOVING, TRANSITION SPECIALIST UTAH STATE OFFICE OF EDUCATION SUSAN.LOVING@SCHOOLS.UTAH.GOV 1 Disclaimer This presenta-on is

More information

Managing Variability in Software Architectures 1 Felix Bachmann*

Managing Variability in Software Architectures 1 Felix Bachmann* Managing Variability in Software Architectures Felix Bachmann* Carnegie Bosch Institute Carnegie Mellon University Pittsburgh, Pa 523, USA fb@sei.cmu.edu Len Bass Software Engineering Institute Carnegie

More information

Introduction Predictive Analytics Tools: Weka

Introduction Predictive Analytics Tools: Weka Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface

More information

San Jacinto College Banner & Enterprise Applica5on Review Task Force Report. November 01, 2011 FINAL

San Jacinto College Banner & Enterprise Applica5on Review Task Force Report. November 01, 2011 FINAL San Jacinto College Banner & Enterprise Applica5on Review Task Force Report November 01, 2011 FINAL 1 Content Review goal and approach 3 Barriers to effec5ve use of Banner: Consultant observa5ons 10 Consultant

More information

Networked Virtual Spaces and Clouds. Magda El Zarki UC Irvine

Networked Virtual Spaces and Clouds. Magda El Zarki UC Irvine Networked Virtual Spaces and Clouds Magda El Zarki UC Irvine Outline Introduc6on to Networked Virtual Environments (NVE) Networked Virtual Environment Architectures Quality of Experience Clouds and real

More information

Mission. To provide higher technological educa5on with quality, preparing. competent professionals, with sound founda5ons in science, technology

Mission. To provide higher technological educa5on with quality, preparing. competent professionals, with sound founda5ons in science, technology Mission To provide higher technological educa5on with quality, preparing competent professionals, with sound founda5ons in science, technology and innova5on, commi

More information

From Big Data to Value

From Big Data to Value From Big Data to Value The Power of Master Data Management 2.0 Sergio Juarez SVP Elemica EMEA & LATAM Reveal Oct 2014 Agenda Master Data Management Why Now? What To Do? How To Do It? What s Next? Today

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel

More information

Introduction to Database Systems

Introduction to Database Systems Introduction to Database Systems A database is a collection of related data. It is a collection of information that exists over a long period of time, often many years. The common use of the term database

More information

Let s Get Nerdy: Inside Tips on Florida s Workers Compensa:on with a Dose of PEOs. Meet Your Presenter. Going Beyond the Basics.

Let s Get Nerdy: Inside Tips on Florida s Workers Compensa:on with a Dose of PEOs. Meet Your Presenter. Going Beyond the Basics. Let s Get Nerdy: Inside Tips on Florida s Workers Compensa:on with a Dose of PEOs Going Beyond the Basics Meet Your Presenter Frank Pennachio Co-founder Partner Oceanus Partners Author, Speaker and Sales

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

Data Science And Big Data Analytics Course

Data Science And Big Data Analytics Course Data Science And Big Data Analytics Course Copyright 2014 EMC Corpora3on. All Rights Reserved. Introduc3on and Course Agenda 1 Introduc3on and Course Agenda 2 Introduc3on and Course Agenda The following

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information