Introduction to Large Databases & Data Mining
1 Introduction to Large Databases & Data Mining: Tips for Assembling Your Data Analysis Toolbox for the 22nd Century. 10/05/12, Jim Heasley, Institute for Astronomy
2 Outline I: Relational Databases & BIG DATA. Big data volumes require a new data handling paradigm. Advantages of a relational database: organization of data; data integrity. SQL: Structured (and almost standard) Query Language for queries. What a database is not.
3 Outline II: Data mining. What is it? Common data mining tasks. (FREE) tools available to you to perform many of these tasks.
4 Outline III: Examples, Imagined & Real. If we only had time travel. Things one might start to do with Pan-STARRS data (right now).
5 RELATIONAL DATABASES
6 Basic Definitions. Database: A collection of related data organized to provide information. Data: Known facts that can be recorded and have an implicit meaning; often integrated from several sources and stored in a standard format for use by multiple applications. Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database. Database System: The DBMS software together with the data itself and the hardware upon which it runs. Sometimes, the applications are also included.
7 Two approaches. Generally, there are two approaches to extracting information from data: the file processing approach (file-based software programs) and the database approach (DBMS).
8 File processing approach. [Diagram: application programs 1 through n, each with its own instructions and its own data.] Each application program has a specific purpose, and each program uses its own data. Issues: data redundancy; redundant processes/interfaces; data integrity; ease of maintenance; consistency; security (preservation of a valuable company asset, access control).
9 Motivation for databases. Data is a very important asset of an organization. Motivations for databases: to maintain data independently from application programs; to avoid redundant data and redundant processes/interfaces; to enable ease of maintenance, sharing of data, and data access control.
10 Database approach. [Diagram: application programs 1 through n, each holding only instructions, reach the shared data and metadata through the DBMS.] A DBMS is general-purpose software and is self-describing: it contains both data and metadata (i.e., data about data).
11 Main Characteristics of the Database Approach. Self-describing nature of a database system: A DBMS catalog stores the description of a particular database (e.g., data structures, types, and constraints). Insulation between programs and data: Called program-data independence. Data abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database. Support of multiple views of the data: Each user may see a different view of the database, which describes only the data of interest to that user. Concurrent executions.
12 Characteristics of DBMS. Data is integrated, shared, persistent, and self-describing. Abstraction: program and data independence. Multiple views of the data: different users need different kinds of information.
13 Advantages of Using the Database Approach. Controlling redundancy. Sharing of data among multiple users. Restricting unauthorized access to data. Providing persistent storage for program objects. Providing storage structures (e.g., indexes) for efficient query processing. Providing backup and recovery services. Providing multiple interfaces to different classes of users. Representing complex relationships among data. Enforcing integrity constraints. Drawing inferences and actions from the stored data using deductive and active rules.
14 Additional advantages of the database approach. Reuse of data across multiple applications. Data structure and access can be changed without changing applications. Enforcement of standards and computation of statistics. Improved responsiveness and productivity.
15 Additional Implications of Using the Database Approach. Potential for enforcing standards. Reduced application development time. Flexibility to change data structures. Availability of current information: extremely important for online transaction systems such as airline, hotel, and car reservations. Economies of scale.
16 Disadvantages of the database approach. Complexity. Size (of software and application). Cost. Performance. Risk of (spectacular!) failures.
17 When not to use a DBMS. Main inhibitors (costs) of using a DBMS: high initial investment and possible need for additional hardware; overhead for providing generality, security, concurrency control, recovery, and integrity functions. When a DBMS may be unnecessary: if the database and applications are simple, well defined, and not expected to change; if access to data by multiple users is not required. When no DBMS may suffice: if the database system is not able to handle the complexity of data because of modeling limitations; if the database users need special operations not supported by the DBMS.
18 Database Logic. Operations within the database are governed by standard set theory and logic. New types of databases that are built upon fuzzy sets, fuzzy logic, and fuzzy measure are currently the subject of active research, but are not (as yet) widely available. The two key set operations of interest in databases are INTERSECTION (the JOIN) and UNION (called the same in the DB world).
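The two set operations named on the Database Logic slide can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not anything from the talk: the tables survey_a and survey_b, their columns, and their values are all made up for illustration. A JOIN on a shared key returns only the rows whose key appears in both tables (the intersection of the key sets), while UNION returns every key that appears in either table.

```python
import sqlite3

# Hypothetical toy tables; names and values are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE survey_a (obj_id INTEGER PRIMARY KEY, mag_a REAL);
    CREATE TABLE survey_b (obj_id INTEGER PRIMARY KEY, mag_b REAL);
    INSERT INTO survey_a VALUES (1, 18.2), (2, 19.5), (3, 20.1);
    INSERT INTO survey_b VALUES (2, 19.4), (3, 20.3), (4, 21.0);
""")

# JOIN: only objects present in both tables (intersection on obj_id).
joined = con.execute("""
    SELECT a.obj_id, a.mag_a, b.mag_b
    FROM survey_a AS a
    JOIN survey_b AS b ON a.obj_id = b.obj_id
    ORDER BY a.obj_id
""").fetchall()

# UNION: every obj_id that appears in either table, duplicates removed.
united = con.execute("""
    SELECT obj_id FROM survey_a
    UNION
    SELECT obj_id FROM survey_b
    ORDER BY obj_id
""").fetchall()

print(joined)
print(united)
```

Objects 2 and 3 are the only ones in both toy tables, so they are the only rows the JOIN returns; the UNION lists objects 1 through 4 once each.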
19 Structured Query Language. The user usually interacts with the database by expressing what she or he wants to accomplish as a request in SQL. Note that SQL tells the database what you want to do, but not how to do it. There are many helpful tutorials about SQL available on the web. An excellent introduction is available at www2.aao.gov.au/2dfgrs/public/release/database/sql_intro.pdf. This introduction is sufficiently vanilla that it will get you started despite the minor variations between different flavors of SQL.
20 The Schema. The logical schema defines how attributes are assigned to various tables and the definition of keys (indexes) that help to tie tables together. A user must have an understanding of the logical schema. The physical schema defines how the data tables are stored on the physical storage media (e.g., disks). Generally, users do not need to know the physical schema, although the system developers must leverage it to maximize the performance of their system.
21 User Queries. Users develop queries to the database in a declarative query language, usually some form of SQL, that builds requests for information stored in the database's tables, often making use of internal relationships inherent in the data (e.g., intersections between different tables).
22 The SQL SELECT Command. The most frequently used SQL command (by typical users) is the SELECT command. This is used to get (i.e., select) data from the database tables. The basic syntax of the SELECT command is: SELECT (list of attributes you want) FROM (list of tables containing them) WHERE (list of limiting/restricting conditions).
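As a concrete instance of the SELECT ... FROM ... WHERE pattern, here is a small sketch using Python's built-in sqlite3 module. The objects table, its column names (obj_id, ra, decl, r_mag), and its rows are hypothetical and chosen only to make the example self-contained; a real survey database will have its own schema.

```python
import sqlite3

# Hypothetical table of catalog objects; all names and values are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, decl REAL, r_mag REAL);
    INSERT INTO objects VALUES
        (1, 10.5, -0.3, 17.9),
        (2, 10.7,  0.1, 21.4),
        (3, 11.2,  0.4, 19.2);
""")

# SELECT (attributes) FROM (table) WHERE (restricting condition).
# The ? placeholder passes the magnitude limit as a parameter.
rows = con.execute(
    "SELECT obj_id, r_mag FROM objects WHERE r_mag < ? ORDER BY r_mag",
    (20.0,),
).fetchall()

print(rows)
```

The query states only which rows are wanted (objects brighter than the limit); how the database finds them is left to the DBMS.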
23 What a Database Isn't! While the column arrangement of attributes in database tables might remind the user of a spreadsheet program like Excel, a database is not a computing engine. Further, because of the nature of SQL, the user's query simply defines what data is wanted, not how to get it. That also includes how the database may choose to execute numerical operations the user embeds in the query.
24 DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES. [Diagram: data mining at the center, drawing on database technology, statistics, machine learning, visualization, information science, and other disciplines.]
25 "The purpose of computing is insight, not numbers." Richard Hamming, in the preface to his 1962 text on numerical methods.
26 What is Data Mining? Finding (meaningful) patterns in data: classification, association rules, cluster analysis, anomaly detection, regression. Data mining tools have been used extensively in: biology, genetics, and medical research (bioinformatics); business and economics; ecology and resource management; engineering; literature; music; voice and facial recognition.
27 Don't Reinvent the Wheel!
28 Relationship between Databases & Data Mining. Databases are often a key component in data mining. One often finds data warehouses providing the information needed by the mining tools. However, one usually finds that the actual data mining operations are executed outside the database itself. Databases are excellent information servers but are not good compute engines!
29 Classification: Definition. Given a collection of records (the training set), each record contains a set of attributes, one of which is the class. Find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
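The train/test procedure in the classification definition can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the records are synthetic, the two classes "A" and "B" are made up, and the model is a 1-nearest-neighbor rule picked only because it is short. Packages such as Weka or scikit-learn provide far better tools for the same workflow.

```python
import random

random.seed(0)

# Synthetic, made-up records: (attribute_1, attribute_2, class_label).
def make_records(n, center, label):
    return [(random.gauss(center[0], 0.5), random.gauss(center[1], 0.5), label)
            for _ in range(n)]

records = make_records(50, (0.0, 0.0), "A") + make_records(50, (5.0, 5.0), "B")
random.shuffle(records)

# Hold out 30% of the records as the test set; the rest builds the model.
split = int(0.7 * len(records))
train, test = records[:split], records[split:]

def predict(x, y):
    """Assign the class of the closest training record (1-nearest neighbor)."""
    nearest = min(train, key=lambda r: (r[0] - x) ** 2 + (r[1] - y) ** 2)
    return nearest[2]

# Accuracy is measured only on records the model has never seen.
correct = sum(predict(x, y) == label for x, y, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The important design point is that `predict` only ever looks at `train`; the test records are used solely to score the result.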
30 Association Rule Mining. Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction. Example: market basket transactions. Examples of association rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}. Implication means co-occurrence, not causality!
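Rules such as {Diaper} → {Beer} are usually ranked by two numbers, support and confidence. The sketch below computes both in plain Python. The five transactions are an illustrative market basket, not a table taken from the slide, and the helper names support and confidence are my own.

```python
# Illustrative market-basket transactions (made up for this example).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"Diaper", "Beer"})
rule_confidence = confidence({"Diaper"}, {"Beer"})
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

In this toy basket, Diaper and Beer appear together in 3 of 5 transactions, and 3 of the 4 Diaper transactions also contain Beer. Neither number says anything about why the items co-occur.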
31 What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups. Intra-cluster distances are minimized; inter-cluster distances are maximized.
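To make "intra-cluster" and "inter-cluster" distances concrete, here is a small k-means sketch in plain Python (math.dist needs Python 3.8 or later). The points are synthetic, the choice of k = 2 and of the starting centroids is arbitrary, and k-means is just one of many clustering methods; it is used here only because it is compact.

```python
import math
import random

random.seed(1)

# Synthetic 2-D points drawn around two made-up centers.
points = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(40)] +
          [(random.gauss(6, 0.5), random.gauss(6, 0.5)) for _ in range(40)])

def mean(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

# k-means with k = 2, started from the first and last points (arbitrary choice).
centroids = [points[0], points[-1]]
for _ in range(10):
    groups = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    centroids = [mean(g) for g in groups]

# Average distance of a point to its own centroid, and centroid separation.
intra = sum(math.dist(p, centroids[i]) for i in range(2) for p in groups[i]) / len(points)
inter = math.dist(centroids[0], centroids[1])
print(f"mean intra-cluster distance: {intra:.2f}, inter-centroid distance: {inter:.2f}")
```

A good clustering shows a small intra-cluster distance relative to the separation between clusters. Note that k had to be supplied up front, which is exactly the assumption that is not always available in practice.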
32 Anomaly/Outlier Detection. What are anomalies/outliers? The set of data points that are considerably different from the remainder of the data. Variants of anomaly/outlier detection problems: (1) Given a database D, find all the data points x ∈ D with anomaly scores greater than some threshold t. (2) Given a database D, find all the data points x ∈ D having the top n largest anomaly scores f(x). (3) Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D. Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection.
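The three problem variants can be illustrated with one simple anomaly score. In this sketch the score f(x) is the distance from the median in units of the median absolute deviation (MAD); that choice, the data values, the threshold t = 5, and n = 2 are all assumptions made for the example, not something prescribed by the slide.

```python
import statistics

# Made-up measurements; most are "normal", two are deliberately extreme.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 9.7, -4.0, 10.1]

med = statistics.median(data)
mad = statistics.median(abs(x - med) for x in data)  # median absolute deviation

def score(x):
    """Anomaly score f(x): distance from the median in units of the MAD."""
    return abs(x - med) / mad

# Variant 1: all points whose score exceeds a threshold t.
t = 5.0
above_threshold = [x for x in data if score(x) > t]

# Variant 2: the n points with the largest scores.
n = 2
top_n = sorted(data, key=score, reverse=True)[:n]

# Variant 3: score a new test point against the (mostly normal) data set.
test_score = score(10.4)

print(above_threshold, top_n, round(test_score, 2))
```

The median and MAD are used instead of the mean and standard deviation because the outliers themselves would otherwise inflate the scale they are measured against.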
33 Regression (Prediction). Regression is the process of finding a function that describes the data for the purpose of predicting numerical (typically continuous) data values. Numerous approaches for developing the desired function exist, including classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks. Prediction also encompasses the identification of distribution trends based on the available data. Both classification and prediction may need to be preceded by relevance analysis, which attempts to identify those attributes or features that do not contribute to the classification or prediction process. These attributes can then be excluded from the analysis. A common relevance analysis technique is principal component analysis.
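The simplest "mathematical formula" approach to regression is a straight-line least-squares fit. The sketch below does it in plain Python; the (x, y) pairs are made up to lie near y = 2x + 1, and the names slope, intercept, and predict are illustrative.

```python
# Made-up (x, y) pairs lying close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a straight line y = slope * x + intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict a numerical value for a previously unseen x."""
    return slope * x + intercept

print(f"y = {slope:.2f} x + {intercept:.2f}; prediction at x = 5: {predict(5.0):.2f}")
```

Once the function is fitted, it can be evaluated at values of x that were never in the data, which is the "prediction" part of the slide title.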
34 Machine Learning
35 Data Mining Environments. There are a large number of data mining software packages available, both commercial and open source. A search of the internet can quickly identify these. A comprehensive review of these packages is far beyond the scope of what we can deal with in this talk, so I will restrict my comments here to several well-known packages used for data analysis and mining: the R statistical analysis package, Matlab (and the open source work-alike Octave), and the data mining packages Weka and Scikits.Learn.
36 The R Project for Statistical Computing. project.org/ R, also called GNU S, is a strongly functional language and environment to statistically explore data sets and make many graphical displays of data. Very strong statistical tools. The basic system has been greatly expanded by the addition of packages developed by its user community.
37 Matlab (Octave). MATLAB, a commercial product from MathWorks, is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical modeling. http:// GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It is an open source work-alike version of MATLAB. http://
38 Weka (Waikato Environment for Knowledge Analysis) is a well-known suite of machine learning software that supports several typical data mining tasks, particularly data preprocessing, clustering, classification, regression, visualization, and feature selection. Its techniques are based on the hypothesis that the data is available as a single flat file or relation, where each data point is labeled by a fixed number of attributes. Weka provides access to SQL databases utilizing Java Database Connectivity and can process the result returned by a database query. Its main user interface is the Explorer, but the same functionality can be accessed from the command line or through the component-based Knowledge Flow interface. http://
39 scikit-learn is a Python module integrating classic machine learning algorithms in the tightly knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering. Tools are available for supervised & unsupervised learning, model selection, datasets, and feature extraction. http://scikit-learn.org/stable/
40 Pluses, Minuses, Observations. The R and Weka software both have large communities that contribute to extending their functionality through the development of new add-on packages. Further, R and Weka can be interfaced via the RWeka package. There are many excellent online tutorials for these packages, and Weka itself is well described in the text Data Mining: Practical Machine Learning Tools and Techniques by Witten, Frank, & Hall. This text provides both a good underpinning of the methods and practical tutorial information. (The text is available as an e-book.) Scikits.learn, while still fairly new (current release is version 0.7), has a very impressive collection of tools and an extensive user guide. The software is written in Python. My main reservation about this software is that while the user guide presents many examples, there is an implicit assumption that the user knows a great deal about the field of data mining. This may leave new users somewhat in over their heads in trying to determine exactly which tool best serves their need.
41 EXAMPLES: IMAGINARY & REAL
42 How could we have helped this lady?
45 Or these gentlemen?
47 Or him?
48 Pan-STARRS Opportunities. The PS1 Small Area Survey (SAS), covering an area of 81 deg², overlaps with the SDSS Stripe 82. In addition to the deep Stripe 82 database, the images from this region have been examined by the citizen science team known as the Galaxy Zoo. This interesting overlap of resources provides data for some exciting data mining experiments. Star-galaxy classification (or more precisely, star-galaxy-QSO classification) is an ongoing challenge for the PS1 science teams. While this work has been reasonably successful, the efforts thus far seem to have attempted to get by with the simplest possible classification approach. What might happen if we performed a classification exercise wherein we use a wide range of IPP measurements (e.g., PSF, Kron, and Petrosian magnitudes, Petrosian radii, various moments measured in individual frames and stack) with SDSS and Galaxy Zoo data providing classification truth? A similar analysis, using visual inspection of the images to identify artifacts in the PS1 images and/or stacks, might provide a robust garbage rejection process. Not necessarily glamorous but definitely important.
49 Empirical Photo-Z Methods: artificial neural networks; support vector machines; self-organizing maps; Gaussian process regression; kernel regression; linear/nonlinear polynomial fitting; instance-based learning & nearest neighbors; boosted decision trees; regression trees. And these are just the ones I've found so far!
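Of the methods in this list, nearest neighbors is the easiest to sketch. The example below is a toy in plain Python and uses entirely synthetic data: the single "color" attribute, the linear color-redshift relation inside fake_color, the noise level, and k = 10 are all invented for illustration and do not describe any real survey or any PS1 pipeline. Real empirical photo-z work uses several colors and a spectroscopic training sample.

```python
import random

random.seed(2)

# Entirely synthetic: a made-up monotonic "color" vs. redshift relation with
# noise, standing in for galaxies that have spectroscopic redshifts.
def fake_color(z):
    return 0.5 + 1.5 * z + random.gauss(0.0, 0.02)

train_z = [random.uniform(0.0, 1.0) for _ in range(500)]
train = [(fake_color(z), z) for z in train_z]

def photo_z(color, k=10):
    """Estimate redshift as the mean redshift of the k nearest training colors."""
    nearest = sorted(train, key=lambda cz: abs(cz[0] - color))[:k]
    return sum(z for _, z in nearest) / k

true_z = 0.4
estimate = photo_z(fake_color(true_z))
print(f"true z = {true_z}, estimated z = {estimate:.3f}")
```

The method is "empirical" in the sense that no physical model of galaxy spectra appears anywhere; the estimate comes purely from training objects with similar measured attributes.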
50 Galaxy Clusters? We all know the best way to identify clusters of galaxies is from their X-ray emission. Unfortunately, current X-ray surveys don't provide sufficient sky & depth coverage to do this. Optical surveys have sufficient depth but suffer from background issues, overlapping foreground & background clusters, etc. It has long been hoped that in large-scale optical surveys such as Pan-STARRS and LSST, we will be able to use Photo-Z values to sort out real clusters from accidental clustering of galaxies, and overlapping clusters at different distances. (Some of the PS1 partners in Taiwan are working on this problem.)
51 Galaxy Clusters: Can Data Mining Help? While there is a plethora of data mining techniques for finding clusters within data, most are probably not well suited for finding galaxy clusters. Many methods start off by assuming that one knows how many clusters are present in a given region. Clearly this is not the case with our problem. Further, we need to deal with the fact that in the 3-D representation, we have much larger uncertainty along the line of sight due to the limited accuracy of the Photo-Z measures. Some interesting work in this area has made use of a friends-of-friends approach. I think this could be generalized to include better background discrimination including the Photo-Z distribution.
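A friends-of-friends grouping can be sketched in plain Python with a union-find structure. This is a minimal illustration under stated assumptions, not the method used in the work mentioned above: the galaxy coordinates are made up, units are arbitrary, and the two linking lengths LINK_SKY and LINK_Z are hypothetical values. Using a separate linking length along the line of sight is one simple way to acknowledge that the photo-z coordinate is much less certain than the position on the sky.

```python
import math

# Made-up galaxies: (x, y) on the sky plane plus a photo-z-like
# line-of-sight coordinate z. Units are arbitrary.
galaxies = [
    (0.0, 0.0, 0.30), (0.1, 0.0, 0.32), (0.1, 0.1, 0.29),   # group 1
    (5.0, 5.0, 0.60), (5.1, 5.0, 0.62),                      # group 2
    (0.05, 0.05, 0.90),                # projected onto group 1, but at a different z
]

LINK_SKY = 0.3   # hypothetical linking length on the sky
LINK_Z = 0.05    # hypothetical linking length along the line of sight

parent = list(range(len(galaxies)))

def find(i):
    """Return the representative of the group containing galaxy i."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def are_friends(a, b):
    sky_sep = math.hypot(a[0] - b[0], a[1] - b[1])
    return sky_sep < LINK_SKY and abs(a[2] - b[2]) < LINK_Z

# Link every pair of friends; friends of friends end up in the same group.
for i in range(len(galaxies)):
    for j in range(i + 1, len(galaxies)):
        if are_friends(galaxies[i], galaxies[j]):
            parent[find(i)] = find(j)

groups = {}
for i in range(len(galaxies)):
    groups.setdefault(find(i), []).append(i)
group_sizes = sorted(len(members) for members in groups.values())
print(group_sizes)
```

Unlike k-means-style methods, nothing here requires the number of clusters in advance; the groups fall out of the linking lengths. The all-pairs loop is quadratic, so a real catalog would need a spatial index to find candidate friends.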
52 PAU
ECBDL 14: Evolu/onary Computa/on for Big Data and Big Learning Workshop July 13 th, 2014 Big Data Compe//on Jaume Bacardit jaume.bacardit@ncl.ac.uk The Interdisciplinary Compu/ng and Complex BioSystems
More informationHelp Framework. Ticket Management Ticket Resolu/on Communica/ons. Ticket Assignment Follow up Customer - communica/on System updates Delay management
Help for JD Edwards Our Help Framework Ticket qualifica/on Ticket crea/on Ticket Rou/ng Closures L1 issues Resolu/on KG SOPs Co- ordinate Ticket Assignment Follow up Customer - communica/on System updates
More informationProject Overview. Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome
Project Overview Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome Cloud-TM at a glance "#$%&'$()!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&!"'!()*+!!!!!!!!!!!!!!!!!!!,-./01234156!("*+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&7"7#7"7!("*+!!!!!!!!!!!!!!!!!!!89:!;62!("$+!
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationEffec%ve AX 2012 Upgrade Project Planning and Microso< Sure Step. Arbela Technologies
Effec%ve AX 2012 Upgrade Project Planning and Microso< Sure Step Arbela Technologies Why Upgrade? What to do? How to do it? Tools and templates Agenda Sure Step 2012 Ax2012 Upgrade specific steps Checklist
More informationHunk & Elas=c MapReduce: Big Data Analy=cs on AWS
Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements
More informationTexas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson
Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson Speaker Bio Dave Larson Solu6ons Architect with Freeit Data Solu6ons In the IT industry for over
More informationSecure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)
Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Alex Pinto Chief Data Scien2st Niddel / MLSec Project @alexcpsec @MLSecProject @NiddelCorp Agenda Security Singularity
More informationWhat is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
More information1 File Processing Systems
COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.
More informationDEEP FILM ACCESS Project (Digital Transforma4ons in the Arts and Humani4es: Big Data) February 2014 April 2015
DEEP FILM ACCESS Project (Digital Transforma4ons in the Arts and Humani4es: Big Data) February 2014 April 2015 Dr Sarah Atkinson (PI) s.a.atkinson@brighton.ac.uk Interdisciplinary Principal Inves4gator:
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationData Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
More informationProject Por)olio Management
Project Por)olio Management Important markers for IT intensive businesses Rest assured with Infolob s project management methodologies What is Project Por)olio Management? Project Por)olio Management (PPM)
More informationPa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on
Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classifica6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output
More informationSocial Media Analy.cs (SMA)
Social Media Analy.cs (SMA) Emanuele Della Valle DEIB - Politecnico di Milano emanuele.dellavalle@polimi.it hap://emanueledellavalle.org What's social media? haps://www.youtube.com/watch?v=sgniiud_oqg
More informationProtec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology
Protec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology Alexey Kirichenko, F- Secure Corpora7on ICT SHOK, Future Internet program 30.5.2012 Outline 1. Security WP (WP6) overview
More informationAn Integrated Approach to Manage IT Network Traffic - An Overview Click to edit Master /tle style
An Integrated Approach to Manage IT Network Traffic - An Overview Click to edit Master /tle style Agenda A quick look at ManageEngine Tradi/onal Traffic Analysis Techniques & Tools Changing face of Network
More informationB2B Offerings. Helping businesses op2mize. Infolob s amazing b2b offerings helps your company achieve maximum produc2vity
B2B Offerings Helping businesses op2mize Infolob s amazing b2b offerings helps your company achieve maximum produc2vity What is B2B? B2B is shorthand for the sales prac4ce called business- to- business
More informationDatabase Security. Sarajane Marques Peres, Ph.D. University of São Paulo www.each.usp.br/sarajane
Database Security Sarajane Marques Peres, Ph.D. University of São Paulo www.each.usp.br/sarajane Based on Elsmari x Navathe / Silberschatz, Korth, Sudarshan s books Types of security Legal and ethical
More informationSDN- based Mobile Networking for Cellular Operators. Seil Jeon, Carlos Guimaraes, Rui L. Aguiar
SDN- based Mobile Networking for Cellular Operators Seil Jeon, Carlos Guimaraes, Rui L. Aguiar Background The data explosion currently we re facing with has a serious impact on current cellular networks
More informationHow To Understand Cloud Compueng
Data Management in the Cloud Introduc)on (Lecture 1) Do one thing every day that scares you. Eleanor Roosevelt 1 Data Management in the Cloud LOGISTICS AND ORGANIZATION 2 Kris)n TuCe FAB 115-09 Personnel
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationDiscovering Computers Fundamentals, 2010 Edition. Living in a Digital World
Discovering Computers Fundamentals, 2010 Edition Living in a Digital World Objec&ves Overview Discuss the importance of project management, feasibility assessment, documenta8on, and data and informa8on
More informationAn Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
More informationMarch 10 th 2011, OSG All Hands Mee6ng, Network Performance Jason Zurawski Internet2 NDT
March 10 th 2011, OSG All Hands Mee6ng, Network Performance Jason Zurawski Internet2 NDT Agenda Tutorial Agenda: Network Performance Primer Why Should We Care? (15 Mins) GeNng the Tools (10 Mins) Use of
More informationSplunk for Data Science
Copyright 2014 Splunk Inc. Splunk for Data Science Tom LaGa=a Data Scien@st, Splunk Olivier de Garrigues Sr Prof Services Consultant, Splunk Disclaimer During the course of this presenta@on, we may make
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationBIG DATA AND INVESTIGATIVE ANALYTICS
The New Fron+er BIG DATA AND INVESTIGATIVE ANALYTICS A Publication of Infobright Table of Contents Introduc+on 3 Chapter 1: What Is Inves+ga+ve Analy+cs?. 4 Chapter 2: Top Five Requirements for Inves+ga+ve
More informationProgram Model: Muskingum University offers a unique graduate program integra6ng BUSINESS and TECHNOLOGY to develop the 21 st century professional.
Program Model: Muskingum University offers a unique graduate program integra6ng BUSINESS and TECHNOLOGY to develop the 21 st century professional. 163 Stormont Street New Concord, OH 43762 614-286-7895
More informationCS 91: Cloud Systems & Datacenter Networks Failures & Replica=on
CS 91: Cloud Systems & Datacenter Networks Failures & Replica=on Types of Failures fail stop : process/machine dies and doesn t come back. Rela=vely easy to detect. (oien planned) performance degrada=on:
More informationDTCC Data Quality Survey Industry Report
DTCC Data Quality Survey Industry Report November 2013 element 22 unlocking the power of your data Contents 1. Introduction 3 2. Approach and participants 4 3. Summary findings 5 4. Findings by topic 6
More informationAn Overview of Database management System, Data warehousing and Data Mining
An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,
More informationData Warehouses and NoSQL Sharing Administra6ve Informa6on
Data Warehouses and NoSQL Sharing Administra6ve Informa6on Carmen Barandela So-ware Engineer CERN / GS AIS October 24 28, 2011 JINR/CERN Grid and Management Informa6on Systems Agenda Data Warehouses in
More informationData Mining. Yeow Wei Choong Anne Laurent
Data Mining Yeow Wei Choong Anne Laurent Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce purchases at department/ grocery stores Bank/Credit Card
More informationWhat Do Our Data Tell Us: Two Reports Examining Correla;ons in Utah Data
What Do Our Data Tell Us: Two Reports Examining Correla;ons in Utah Data SUSAN LOVING, TRANSITION SPECIALIST UTAH STATE OFFICE OF EDUCATION SUSAN.LOVING@SCHOOLS.UTAH.GOV 1 Disclaimer This presenta-on is
More informationManaging Variability in Software Architectures 1 Felix Bachmann*
Managing Variability in Software Architectures Felix Bachmann* Carnegie Bosch Institute Carnegie Mellon University Pittsburgh, Pa 523, USA fb@sei.cmu.edu Len Bass Software Engineering Institute Carnegie
More informationIntroduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
More informationSan Jacinto College Banner & Enterprise Applica5on Review Task Force Report. November 01, 2011 FINAL
San Jacinto College Banner & Enterprise Applica5on Review Task Force Report November 01, 2011 FINAL 1 Content Review goal and approach 3 Barriers to effec5ve use of Banner: Consultant observa5ons 10 Consultant
More informationNetworked Virtual Spaces and Clouds. Magda El Zarki UC Irvine
Networked Virtual Spaces and Clouds Magda El Zarki UC Irvine Outline Introduc6on to Networked Virtual Environments (NVE) Networked Virtual Environment Architectures Quality of Experience Clouds and real
More informationMission. To provide higher technological educa5on with quality, preparing. competent professionals, with sound founda5ons in science, technology
Mission To provide higher technological educa5on with quality, preparing competent professionals, with sound founda5ons in science, technology and innova5on, commi
More informationFrom Big Data to Value
From Big Data to Value The Power of Master Data Management 2.0 Sergio Juarez SVP Elemica EMEA & LATAM Reveal Oct 2014 Agenda Master Data Management Why Now? What To Do? How To Do It? What s Next? Today
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001
ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel
More informationIntroduction to Database Systems
Introduction to Database Systems A database is a collection of related data. It is a collection of information that exists over a long period of time, often many years. The common use of the term database
More informationLet s Get Nerdy: Inside Tips on Florida s Workers Compensa:on with a Dose of PEOs. Meet Your Presenter. Going Beyond the Basics.
Let s Get Nerdy: Inside Tips on Florida s Workers Compensa:on with a Dose of PEOs Going Beyond the Basics Meet Your Presenter Frank Pennachio Co-founder Partner Oceanus Partners Author, Speaker and Sales
More informationMachine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
More informationData Science And Big Data Analytics Course
Data Science And Big Data Analytics Course Copyright 2014 EMC Corpora3on. All Rights Reserved. Introduc3on and Course Agenda 1 Introduc3on and Course Agenda 2 Introduc3on and Course Agenda The following
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More information