Big Data Analytics in Astrophysics. Katharina Morik, Informatik 8, TU Dortmund
|
|
|
- Charla Rich
- 10 years ago
- Views:
Transcription
1 Big Data Analytics in Astrophysics Katharina Morik, Informatik 8, TU Dortmund
2 Overview Short introduction to the Collaborative Research Center SFB 876 Zooming into one of its projects: C3 Astrophysical Data Analysis IceCube Magic, FACT Tools for data analysis, streaming data Science today is based on data. Data analysis is intrinsically tied to the scientific process. Active Galactic Nuclei
3 Collaborative Research Center SFB 876: Providing Information by Resource-Constrained Data Analysis Already granted:
4 SFB 876: Cyberphysical Systems Ubiquitous systems Smart phones Multimedia Navigation Support of elder people Sensor measurements Factories Medicine Science Goals: Analysis and prediction available: Anytime, Everywhere!
5 SFB 876: Big Data Analytics Statistical analysis: Carefully collected data Sophisticated analysis. Data mining: Large data sets gathered for other purposes than analysis Simple, reliable models. Big data analytics: Large volume, velocity, variety data New storage and processing architectures. Goals: Scalable algorithms, Algorithms for new architectures
6 SFB 876: Resource-aware algorithms Cyberphysical systems produce big data. Big data analytics delivers data summaries, predictions. Foundations Beyond runtime and sample complexity! Future: memory- and energyefficient analytics. Collaborative research center SFB 876 with 13 projects, 20 professors and about 50 Ph D students.
7 Christian Sohler, theoretical CS Sangkyun Lee, Data Analysis CS Olaf Spinczyk, Embedded Systems CS Alexander Schramm, Biomedicine Christian Wietfeld, Electrical Engin. Michael Schreckenberg, Traffic Physics Jochen Deuse, Machining Jian-Jia Chen, Embedded Systems CS Kristian Kersting, Data Analysis CS Katharina Morik, Data Analysis CS Heinrich Müller, Visual Analytics CS Petra Mutzel, Algorithm Engin. CS Jörg Rahnenführer, Statistics Peter Marwedel, Embedded Systems CS Katja Ickstadt, Statistics Tim Ruhe, Astrophysics Wolfgang Rhode, Astrophysics Bernhard Spaan, Particle physics Jens Teubner, Databases CS Michael Ten Hompel, Logistics Project leaders
8 Overview Short introduction to the Collaborative Research Center SFB 876 Zooming into one of its projects: C3 Astrophysical Data Analysis IceCube Magic, FACT Tools for data analysis, streaming data Science today is based on data. Data analysis is intrinsically tied to the scientific process. Active Galactic Nuclei
9 Big Data Analytics in Astrophysics High Energy Astrophysical Phenomena Neutrinos Gamma rays Instrumentation and Methods for Astrophysics Telescopes Data Analysis IceCube Collaboration Magic I, II, FACT Project C3 of SFB 876 Prof. Dr. Dr. Wolfgang Rhode Prof. Dr. Katharina Morik Dr. Tim Ruhe Big Data is Volume, Velocity, Variety. Big Data helps if rare events are outliers. Here, we are looking for the needle in the haystack: neutrinos and gamma rays are dominated by other particles. Data analysis tool RapidMiner Feature selection MRMR Streams framework
10 Structure of the remaining talk the data analysis cycle Data Analysis is not just one step in the overall scientific investigation. Data Analysis accompanies the scientific investigation. Data Analytics is interdisciplinary in nature: Physics Statistics Data management Machine learning Software/Algorithm engineering Complexity Theory Insight Physics questions
11 Data Analysis Tools - Requirements Coverage: Support the overall cycle Amenability: Easy to use for all involved scientists Rapidity: Fast creation of analysis process Comparability: international scientific terminology Reproducibility: storing the analysis process with all parameters in a small format. Not just the modeling step. Not just for statisticians, computer scientists, physicists but for all of them. Not a bunch of programs/libraries to be used in programming. Not a new name with a little variance for known methods that have a theoretic basis. Not asking the programmer whether he remembers which parameters succeeded.
12 RapidMiner Coverage: Support the overall cycle Amenability: Easy to use for all involved scientists Rapidity: Fast creation of analysis process Comparability: international scientific terminology Reproducibility: storing the analysis process with all parameters in a small format, re-run, exchange. If 115 data transformations and 264 modeling operators are not enough: Write your own operator and plug it into RapidMiner!
13 Physics Questions Explaining Birth and Death of Stars What happens when particles collide and how, exactly? High Energy Astrophysical Phenomena Neutrinos Gamma rays Instrumentation and Methods for Astrophysics Telescopes Data Analysis Insight Physics questions
14 Death of a star 1054 Chinese astronomers detect a supernova. Its remnant is the crab nebula Jocelyn Bell, Antony Hewish discover a pulsar, which comes from a rotating neutron star. The heat of the dying star has turned electrons and protons to neutrons. Neutron stars are composed of neutrons. This releases a flux of neutrinos. Neutron stars may produce gamma-rays.
15 Neutrino genesis b - - decay: n p + e - + n e Neutron becomes Proton Electron, Anti-Electron- Neutrino b + - decay: p n + e + + n e Proton becomes Neutron Positron, Electron-Neutrino Neutrinos accompanying Electron Mu-meson (myon or muon) Tauon Myon neutrino collides with proton of hydrogen resulting in a positive pion and a negative myon.
16 Physics Questions Explaining Birth and Death of Stars What happens when particles collide and how, exactly? High Energy Astrophysical Phenomena Neutrinos Gamma rays Instrumentation and Methods for Astrophysics Telescopes Data Analysis Insight Physics questions
17 Faster than light Cherenkov radiation has been observed by Marie Curie and interpreted by Pawel Cherenkov. If a loaded particle in a medium moves faster than light, the blue Cherenkov light shows up. The medium can be: Water Ice Air. A "faster than light" neutrino discovery was actually the result of a loose cable. So the scientific community can rest easy. CERN sent a beam of neutrinos to the Gran Sasso laboratory, OPERA.
18 Breakthrough of the year 2013: IceCube The IceCube project has been awarded the 2013 Breakthrough of the Year by the British magazine Physics World. Detection of 28 neutrinos from outside the Milky Way 2 neutrinos with 1 PeV (Ernie and Bert), one even 2 PeV (Big Bird). Model the distribution of all neutrinos (frequency of neutrinos at all energies)! Separate neutrinos from other particles. Interaction in detector cosmic neutrinos IceCube collaboration: 36 research institutes in 8 countries, Francis Halzen, University of Wisconsin-Madison, Olga Botner, University of Uppsala (speakers) Atmospheric neutrinos IceCube Magic Interaction in orbit proton
19 Magic and FACT Cherenkov Telescopes Air shower is visible in the camera for about 150 Nanoseconds pixels of the camera sampled at 2 GHz rate. Hardware filter decides about storage. Records of sampled voltages of camera pixels. Model the distribution of gamma rays Separate gamma from other rays
20 Data understanding What do the data look like? How to access the data? Do we need real-time access? Physics questions Data understanding Insight
21 Data volume challenging data analysis IceCube at the South Pole: within the holes, there are Digital Optical Modules (DOM) mounted, that record Cherenkov light. The trace of muons through the 3D array of DOMs shows also the direction of incoming neutrinos. 1 Terabyte per day = 1 Mio. Gigabyte A satellite takes about 10 years to transport the data of a year to the University Wisconsin. A ship transports the data on hard discs in 28 days. No real-time analysis. Ships are faster than satellites for data transmission.
22 Skewed distribution challenging data analysis Calibration, cleaning Feature extraction Signal separation Energy estimation A simulator provides labeled observations. Gamma rays of high energy are rare events as opposed to hadrons, ratio 1 to MAGIC I (2003) and MAGIC II (2009) La Palma, Roque de los Muchachos FACT telescope, same type, same place
23 FACT-Viewer
24 Streams framework for real-time processing Streams framework offers highlevel view of streaming data (Bockermann,Blom 2012). There is a plugin for RapidMiner. It incorporates MOA for the analysis of streaming data. It compiles into Storm for execution, if wanted. The Fact-tools are a set of Java classes that provide an extension to the streams framework for reading and processing data from the FACT telescope.
25 Data understanding what is to be learned? IceCube: Overall spectrum of atmospheric neutrinos from 100 Giga ev to 1 Peta ev demands better background rejection. MAGIC, FACT: Better separation of hadrons and gamma rays. Reproducible results. Insight Physics questions Data understanding Tim Ruhe, Katharina Morik, Wolfgang Rhode Application of RapidMiner in Neutrino Astronomy in: Hofmann, Klinkenberg (eds) RapidMiner Data Mining Use Cases and Business Analytics Applications, CRC Press 2014 Mark Aartsen Katharina Morik, Wolfgang Rhode, Tim Ruhe Development of a general Analysis and Unfolding Scheme and its Application to Measure the Energy Spectrum of Atmospheric Neutrinos with IceCube in: Eur.Phys.Journal 2014
26 Data preparation training samples and features Samples Balanced stratified sample enhances learning. Training: signal neutrino events Labeling by simulation. IceCube 59 strings measurements of 346 days raw events after manual cuts No true labels given, but estimate according to distribution of neutrinos. Insight Physics questions Data preparation
27 Data preparation Feature selection Too many features hide the true pattern and slow down modeling. Redundant features may lead to wrong results. Wrong: Two features with the same meaning two times an impact. Right: Each feature should have half an impact. Quality of a feature set is given by the performance of learning using the feature set. Insight Physics questions Data preparation
28 Q -- MRMR Minimum Redundancy Maximum Relevance (MRMR) Start with empty feature set. Add the one with lowest redundancy to already chosen features D(x,x) and highest relevance w.r.t. label R(x,y) Efficient implementation by Benjamin Schowe in RapidMiner s Feature Selection Extension 1 Q R( x, y) D( x, x) j x in F j Mark Hall Correlation Based Feature Selection 1999
29 Feature selection in RapidMiner 2
30 Overfitting to a sample? Measuring stability! Split the training data into m sets. Do feature selection on m- 1 sets. Use the selected features for learning and test the learner s performance. We do that m times. One sample leads to the feature set A, another to the feature set B, both select k features out of n given ones. Jaccard Index J A B A B
31 Feature transformation and extraction Right representation eases understanding and modeling, e.g., time series: deterministic, with outlier, level change y t y t time t y t+1 y t HR t timet y t+1 y t U.Gather, M. Bauer time t y t
32 Feature selection and extraction in IceCube 200 features given, diverse constructions over raw data. Feature selection using MRMR particularly successful: Redundant features in the data, e.g., the zenith angle or its consequence is obtained from several reconstruction algorithms. 25 features selected with stability over 0.8. Creating features by binning (spectrum unfolding) and learning Tim Ruhe s method. Novel spectrum unfolding: Partition attribute values into bins (intervals) Intervals become labled classes Run multiclass learning output confidence of each class for each example ex 1 ex n bin 1 bin r Neutrino? conf Learn classifier from these data.
33 Binning in RapidMiner Numerical data Discretized into 4 bins Discretized to equal frequency bins
34 Modeling tasks classification and regression Given observations x with labels y {(x 1,y 1 ), (x n,y n )} with binominal y (classification) with real-valued y (regression) Find f(x)=y such that the error is minimized. RapidMiner offers 167 learning algorithms for classification and regression. IceCube: y in {neutrino, not neutrino} Algorithm should be robust, scalable, parallel! Insight Physics questions Modeling
35 Types of Models Lazy Modeling, Local Models K-NeirestNeighbors Additive Models Decision Trees Linear Models Linear Regression Support Vector Machine Bayesian Models
36 Ensemble Methods here: for decision trees Ensemble: Take many models and decide according to the majority vote! Algorithm Training: For l decision trees (parallel): (1) Take a sample from the data (2) Take a subset of features (3) Choose the best feature according to minimal entropy Split according to feature If purity not ok, goto (3). Weka Random Forest in RapidMiner parallel Breiman 2001, Machine Learning Journal
37 Evaluation Training and testing on different samples is necessary in order to estimate the true error. Testing on the same set would overestimate the quality of the model. Best test: leave one out requires for n observations n runs of modeling. This is not efficient. Crossvalidation: split into m partitions, use m-1 for training and test on the unused one. Output the average of all tests. Fair. Insight Physics questions Evaluation
38 Cross validation in RapidMiner Your measurement is meaningless without knowledge of the error. Walter Levin 1 2 2
39 IceCube Results We enhanced the neutrino recognition by 62% (IceCube collaboration and Morik 2014). Quality cuts lead from full data D to D, rejecting the easy 91.4% of background. Random Forest leads from D to D so that % of background muons are rejected. At this background rejection atmospheric neutrino events were detected in 346 days of IceCube 59. Insight Physics questions
40 IceCube Results For the first time, the spectrum of atmospheric neutrinos could be reconstructed up to an energy level 1 PeV. Now, theoretical astrophysics can build upon this. Physics questions Insight
41 Empirical work for theory development En [GeV cm F n s sr -1 ] log (E [GeV]) 10 n IC-59 n m Unfolding AMANDA n m unfolding IC-79 n e flux Gaisser H3a Gaisser H3a No Charm Frejus n m Frejus n e Frejus n m model Frejus n e model Honda n e Insight Physics questions Evaluation
42 Interdisciplinary work
43 Big Data Analytics in Astrophysics High Energy Astrophysical Phenomena Neutrinos Gamma rays Instrumentation and Methods for Astrophysics Telescopes Data Analysis IceCube Collaboration Magic I, II, FACT Project C3 of SFB 876 Prof. Dr. Dr. Wolfgang Rhode Prof. Dr. Katharina Morik Dr. Tim Ruhe Big Data is Volume, Velocity, Variety. Big Data helps if rare events are outliers. Here, we are looking for the needle in the haystack: neutrinos and gamma rays are dominated by other particles. Data analysis tool RapidMiner Feature selection MRMR Streams framework
44 Through-Put Performance of FACT Tools FACT records 60 events per second. Each events amounts to 3 Megabyte of raw data. 180MB/second are to be processed! Average processing time in milliseconds at a log scale shows the overall process ending with a classifier application.
45 Conclusion Data Analysis is needed in order to make good use of big data (in physics). We have seen the streams framework that processes telescope data in real-time. Choosing the right representation is the key to excellent results. We have seen a stable MRMR feature selection. We have seen a learning process producing features for learning unfolding. RapidMiner supports the overall cycle Select a method by a click! Store the process for documentation and exchange.
46
47 Further work The unfolding method could use other binning methods: Reduce entropy within a bin. Tauon neutrinos are hard to recognize, but we ll try! Real-time analysis could support the array of Cherenkov telescopes! S.G.Djorgovski (Caltech, USA) Astronomy has become an immensely data-rich field. There is a need for powerful DM/KDD tools.
48 SFB 876: A projects combine cyberphysical systems with big data analytics A1: new algorithms for ubiquitous systems, which are memory- and energy-aware, Integer probabilistic graphical models App prediction for energy savings A2: theoretical basis for memoryaware streaming clustering A3: methods from embedded systems for performance enhancements of R A4: platform for the analysis of resource consumption A6: Social network analysis Data stream analysis Exploiting sparseness Approximation Parallelism (GPU)
49 SFB 876: B projects focus on cyberphysical systems B1 Breath analysis B2 Virus detection B3 Industrie 4.0 Quality prediction Steel production B4 Traffic prognosis New Sensors Direct control Real-time application of learned models
50 SFB 876: C projects focus on very large data C1 High dimensional microarray data C3 Very high frequent astrophysical data C4 Regression for large scale high dimensional data C5 High capacity particle physics (CERN) Neuroblastoma outcome prediction Feature selection Storage and filtering
Monday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
Olga Botner, Uppsala. Photo: Sven Lidström. Inspirationsdagar, 2015-03-17
Olga Botner, Uppsala Photo: Sven Lidström Inspirationsdagar, 2015-03-17 OUTLINE OUTLINE WHY NEUTRINO ASTRONOMY? WHAT ARE NEUTRINOS? A CUBIC KILOMETER DETECTOR EXTRATERRESTRIAL NEUTRINOS WHAT NEXT? CONCLUSIONS
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data
100 001 010 111 From Raw Data to 10011100 Actionable Insights with 00100111 MATLAB Analytics 01011100 11100001 1 Access and Explore Data For scientists the problem is not a lack of available but a deluge.
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
PoS(ICRC2015)865. FACT-Tools Streamed Real-Time Data Analysis
K. A. Brügge b, M. L. Ahnen a, M. Balbo c, M. Bergmann d, A. Biland a, C. Bockermann e, T. Bretz a, J. Buss b, D. Dorner d, S. Einecke b, J. Freiwald b, C. Hempfling d, D. Hildebrand a, G. Hughes a, W.
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
Maschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
Image Compression through DCT and Huffman Coding Technique
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
Robot Perception Continued
Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
SIPAC. Signals and Data Identification, Processing, Analysis, and Classification
SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is
High Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
Discovery of neutrino oscillations
INSTITUTE OF PHYSICS PUBLISHING Rep. Prog. Phys. 69 (2006) 1607 1635 REPORTS ON PROGRESS IN PHYSICS doi:10.1088/0034-4885/69/6/r01 Discovery of neutrino oscillations Takaaki Kajita Research Center for
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
The accurate calibration of all detectors is crucial for the subsequent data
Chapter 4 Calibration The accurate calibration of all detectors is crucial for the subsequent data analysis. The stability of the gain and offset for energy and time calibration of all detectors involved
8.1 Radio Emission from Solar System objects
8.1 Radio Emission from Solar System objects 8.1.1 Moon and Terrestrial planets At visible wavelengths all the emission seen from these objects is due to light reflected from the sun. However at radio
Big Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy [email protected] Data Availability or Data Deluge? Some decades
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Big Data. Introducción. Santiago González <[email protected]>
Big Data Introducción Santiago González Contenidos Por que BIG DATA? Características de Big Data Tecnologías y Herramientas Big Data Paradigmas fundamentales Big Data Data Mining
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
The OPERA Emulsions. Jan Lenkeit. Hamburg Student Seminar, 12 June 2008. Institut für Experimentalphysik Forschungsgruppe Neutrinophysik
The OPERA Emulsions Jan Lenkeit Institut für Experimentalphysik Forschungsgruppe Neutrinophysik Hamburg Student Seminar, 12 June 2008 1/43 Outline The OPERA experiment Nuclear emulsions The OPERA emulsions
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
Data Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Learning from Big Data in
Learning from Big Data in Astronomy an overview Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ From traditional astronomy 2 to Big Data
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
Master of Science in Computer Science
Master of Science in Computer Science Background/Rationale The MSCS program aims to provide both breadth and depth of knowledge in the concepts and techniques related to the theory, design, implementation,
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
Getting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand [email protected] ABSTRACT Ensemble Selection uses forward stepwise
Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research [email protected]
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research [email protected] Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
Big Data Analytics. Chances and Challenges. Volker Markl
Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
Scalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
Visualisatie BMT. Introduction, visualization, visualization pipeline. Arjan Kok Huub van de Wetering ([email protected])
Visualisatie BMT Introduction, visualization, visualization pipeline Arjan Kok Huub van de Wetering ([email protected]) 1 Lecture overview Goal Summary Study material What is visualization Examples
Calorimetry in particle physics experiments
Calorimetry in particle physics experiments Unit n. 8 Calibration techniques Roberta Arcidiacono Lecture overview Introduction Hardware Calibration Test Beam Calibration In-situ Calibration (EM calorimeters)
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
Neutron Stars. How were neutron stars discovered? The first neutron star was discovered by 24-year-old graduate student Jocelyn Bell in 1967.
Neutron Stars How were neutron stars discovered? The first neutron star was discovered by 24-year-old graduate student Jocelyn Bell in 1967. Using a radio telescope she noticed regular pulses of radio
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet)
HUAWEI Advanced Data Science with Spark Streaming Albert Bifet (@abifet) Huawei Noah s Ark Lab Focus Intelligent Mobile Devices Data Mining & Artificial Intelligence Intelligent Telecommunication Networks
Statistical Challenges with Big Data in Management Science
Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED
Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS
Ionospheric Research with the LOFAR Telescope
Ionospheric Research with the LOFAR Telescope Leszek P. Błaszkiewicz Faculty of Mathematics and Computer Science, UWM Olsztyn LOFAR - The LOw Frequency ARray The LOFAR interferometer consist of a large
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
PROGRAM DIRECTOR: Arthur O Connor Email Contact: URL : THE PROGRAM Careers in Data Analytics Admissions Criteria CURRICULUM Program Requirements
Data Analytics (MS) PROGRAM DIRECTOR: Arthur O Connor CUNY School of Professional Studies 101 West 31 st Street, 7 th Floor New York, NY 10001 Email Contact: Arthur O Connor, [email protected] URL:
The VHE future. A. Giuliani
The VHE future A. Giuliani 1 The Cherenkov Telescope Array Low energy Medium energy High energy 4.5o FoV 7o FoV 10o FoV 2000 pixels 2000 pixels 2000 pixels ~ 0.1 ~ 0.18 ~ 0.2-0.3 The Cherenkov Telescope
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Doctor of Philosophy in Computer Science
Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Big Data: Image & Video Analytics
Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Information about the T9 beam line and experimental facilities
Information about the T9 beam line and experimental facilities The incoming proton beam from the PS accelerator impinges on the North target and thus produces the particles for the T9 beam line. The collisions
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
