Big Data: Quelques Enjeux Techniques
|
|
|
- Lizbeth Sullivan
- 10 years ago
- Views:
Transcription
1 Big Data: Quelques Enjeux Techniques Essai de Typologie des Problèmes de Big Analytics J.F. Marcotorchino VP, Scientific Director, GBU SIX Thales Communications & Security
2 The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version / Split BIG DATA/BIG ANALYTICS
3 3 / Definitions Big Data: All the technologies and techniques that help scaling Large File Storage (virtual) Distributed processing (Hadoop) / Map-reduce NoSQL databases / simple & complex query Big Analytics: Techniques that are executed on a BigData infrastructure and have the following properties: Adaptation of ad hoc techniques (statistics-learning) to this environment Scales Linearly (O(N) or O(NLog(N)) order of magnitude or subject to heavy potential parallelization Linearization is mandatory either at criteria level or at constraints polytopes level Use special type of learning techniques through dimensions reduction. The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version 7.0.8
4 4 / Les 4 V The 4 V Challenge Volume : Large Storage Capacity are available now NAS type (Network Attached Storage): Virtualized Storage Cloud Computing Velocity: Large Demand for Immediate results Stream Analytics for SEP/ CEP (Stream &Complex event processing) In memory Computations adapted to Key-Value stores Variety: Large Diversity of Heterogeneous Data Types Structured Data (classical DB entries) or Semi Structureed Data (Images with meta data added) Unstructured Data: Text, Speech, Raw Images etc Value: Intrinsic Value of the couple «Data/Information» is now recognized by Business companies la (((*valeur «α N» (α entier) on doit répartir les calculs sur α machines pour conserver The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version 7.0.8
5 5 / Some Confusions to Avoid Do not confound : Combinatorial Complexity vs Indexing complexity, difficulty of IT computations vs the management of huge data volumes (HPC vs BIG DATA) In the first case: It is not the data amount per se which is a drawback, but the intrinsic combinatorial structure of the problem to solve: Example: solutions (Berendt -Tassa estimate 2010) to explore for clustering a set of N=10000 objects or individuals. Nevertheless N=10000 is not a huge amount In the second case: It is the data amount itself which poses a problem, through the structure of the indexing and storing architectures. (Difficulty due to the scalability constraints) The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version 7.0.8
6 6 / How to address Scalability Problems Scalability by «Linearization» VS Scalability by «Parallelization» The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version In the First Mode : If for a population of N objects the needed computing time is T, in case of a linear algorithm it will take a computing time αt if the population size jumps up from N to α N. In the Second Mode: If an algorithm dedicated to a population size N can be processed on a SINGLE machine within a time T, then if the la population scales up to α N (α integer ), computations can be distributed on «α» machines to keep a computing time equal to : T Combination of both modes is the best possible approach (if suitable)
7 An Operational Characterization of Big Analytics Methods Big Data Analytics : «Extended» VS «Intrinsic» cases «Extended» Case: Possible use of the NoSQL storing architectures, or new SQL ones Exhaustive Analysis of the whole data set is not mandatory at all «Analytic Sampling» or «Big Sampling» are sufficient in most cases: e.g: Customers Segmentation, CRM, Cross selling, Churn & Attrition Analysis, Intrusions Analysis or HUMS (Health & Usage Monitoring Systems). The remaining set of the population except «samples» is processed by «inferential segmentation» or by «linear assignment»
8 An Operational Characterization of Big Analytics Methods Big Data Analytics : «Extended» VS «Intrinsic» cases «Intrinsic» Case: It is mandatory to rely on the full data set (exhaustivity ), even if avoiding to do it, is still remaining a research topic No a priori knowledge, or partial knowledge of the population structure Data are stored through NoSQL architectures using the adequate correspondence formats (example for graphs DB: Neo4j, FlockDB ( open source distributed, fault-tolerant graph database for managing data at scale., chosen by Twitter) To manage the exhaustivity constraint, obligation to use heuristics or meta heuristics based upon linear iterations, or parallelization through distributed computations
9 Some NoSQL DB Types Amazon) DynamoDB (Amazon Key Value Stores Column Oriented DB (Faceboo Facebook) BigTable (Google Google) Document Oriented DB Complex grows like E Rel E = nb. of Entities Rel = average relationships / entity Infinity DB Graph Data Bases Neo4j
10 BIG DATA CONCEPTUAL FOUNDATIONS [Brewer CAP Assignment] Availability Voldemort CA AP It is impossible to satisfy the 3 items choose 2 CouchDB Consistancy CP Partition Tolerence HBase direction ou services MemcacheDB /Bekerley DB
11 Some ideas for solving Intrinsic Big Analytics approaches Use mainly exhaustive methods (if possible no statistical sampling) (Data Driven vs Hypothesis Driven ) Affinity Analysis & Sequential Patterns (pure linear matchings scalar products) Use Classifiers with linear criteria Practice Iterative Queries R 2 I 2 : Requêtage Récursif Itératif Intelligent (application de deux techniques en alternance: Similarité Régularisée + Clustering «on the fly») Unsupervised Clustering (no a priori) (Extending «No K-Means» approaches using linear relational criteria) Text mining (word spotting) Reticular Data Analysis (Social Nets, Huge IT Networks) Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de Thales THALES 2012 Tous Droits réservés Modèle trtp version Routing procedures, Modularizations, Dynamic Topology
12 The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. THALES Template trtp version / BIG ANALYTICS TYPOLOGY
13 Tentative structuring of Big Analytics Approaches Learning &Neural Nets Self Encoded and Hourglass Shaped Neural Nets Reticular Data Structuring Social Networks Communities detection Level of Problem Complexity MDL Learning Models Learning Model for unsupervised Classif Limited Layers Neural Nets Supervised Rule Based Classification BiClass SVM Naïve Bayes Networks Multi Classes SVM MOLAP and XOLAP Classical BI Data Mining Image & Video Analytics Unsupervised Clustering Reticular Visual Analytics Parallel Coordinates Large Networks Topological Design Faces &Pattern Recognition Piecewise Linear Regression Sequential Patterns Recognition & Affinity Analysis Vector Matching Structuring Lack of Population Knowledge
14 An Example of Intrinsic Big Analytics Problem: Graphs Modularity Krebs Graph on American Politics S. Mandal (MIT) Liberal Centrist Conservative Girvan-Newman s Quadratic formulation modularity of network is the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random. ( Deviation to Independence ) Maximizing modularity rigorously may be NP-hard Use heuristic approaches MIT Heuristic Algo: Construct the modularity matrix and find its largest eigenvalue and eigenvector Partition network into two parts based on signs of elements in the largest eigenvector Repeat for each part If a proposed split does not cause modularity to increase, declare subgraph indivisible and do not split it When entire graph consists of indivisible subgraphs, stop Typical running time O(N 2 log N) for a sparse graph Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de Thales THALES 2012 Tous Droits réservés Modèle trtp version 7.1.0
15 By relational transform we turn the criterion into a linear function subject to linear constraints X ij X ji = 0 (i,j) (Symmetry) X ii = 1 i (Reflexivity) X ij + X jk X ik 1 (i,j,k) (Transitivity) X ij {0,1} (Binarity) Idea : relying on the locally linear «Louvain» algorithm (Blondel- Guillaume) (Univ Louvain/UPMC LIP6), use the Linear Relational Form O(N LogN ) We can do more: using the genericity of the Louvain s algo we can use better linear criteria than the Girvan-Newman s one based on Optimal Transport justifications e.g:«deviation to Indetermination» (Patricia Conde- Cespèdes ) Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de Thales THALES 2012 Tous Droits réservés Modèle trtp version 7.1.0
16 Big Analytics :Some Topics of Interest Big Analytics for Cyber-Security Big Analytics for Smart Transport Big Analytics for National Security Big Analytics for maintenance: Components for attack detection and investigation (Intelligent IDS from normalized log analytics, IS passive and dynamic mapping, logs analytics, cyber Intelligence) Attack detection from relational & content data, intelligent IDS and sandbox coupling, Intelligent coupling with IS passive and dynamic mapping Big Data platform for logs analytics, visual analytics Business Analytics Web portal for passenger behaviour and profile understanding, traffic anomaly detection: New components and use cases focused on mobility Approach based on space-time queries, BI, early warning engine, Big Analytics and optimization technics for Smart City Fraud detection Social Web Intelligence for National Security : Cyber-infringement detection and investigation SNA :social mining, crisis management Maritime security: predictive analysis & anomaly detection E-border: Big Analytics on passengers logs applications to vehicle, radar, weapon systems, transport HUMS :(Health & Usage Monitoring Systems) Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de Thales THALES 2012 Tous Droits réservés Modèle trtp version 7.1.0
17 Big Analytics innovation trends at medium range horizon Coupling Auto-Encoders Neural Nets with Predictive Modeling for features extraction Opening the «Data Streaming Processing» (real time) to more sophisticated and powerful analytical tools Towards real life CEP Coupling «Genetic Algorithms» with «Relational linear transforms» Linearization procedures In Networks Analysis, addressing the complexity of dynamic graphs modeling. Dynamic Modularization Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de Thales THALES 2012 Tous Droits réservés Modèle trtp version 7.1.0
Applications for Big Data Analytics
Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
Infrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
Cloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
Data Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
COMPANY POLICIES TO PREVENT CORRUPTION: A FRENCH EXAMPLE
Ce document ne peut être reproduit, modifié, adapté, publié, traduit, d'une quelconque façon, en tout ou partie, ni divulgué à un tiers sans l'accord préalable et écrit de THALES. 1 THALESGROUP.COM COMPANY
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
Big Data-ready, Secure & Sovereign Cloud
Copernicus Big Data Workshop Big Data-ready, Secure & Sovereign Cloud A Technology Enabler for Copernicus Data Innovation March 14 th, 2014 Brussels F. BOUJEMAA R&D Manager E. MICONNET - Head of Cyber
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
SURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
Hadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)
Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
HPC ABDS: The Case for an Integrating Apache Big Data Stack
HPC ABDS: The Case for an Integrating Apache Big Data Stack with HPC 1st JTC 1 SGBD Meeting SDSC San Diego March 19 2014 Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox [email protected] http://www.infomall.org
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis
, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying
2015 Analyst and Advisor Summit. Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist
2015 Analyst and Advisor Summit Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist Agenda Key Facts Offerings and Capabilities Case Studies When to Engage
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Mining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Massive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
Predictive Analytics. Noam Zeigerson, CTO
Predictive Analytics Noam Zeigerson, CTO Agenda The Predictive Analytics Need Innovative Technologies Business Solutions The problem: Inconsistent stream of revenue Available Data Sources ERP data Web
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
BIG DATA IN BUSINESS ENVIRONMENT
Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania [email protected] 2 Faculty
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Big Data Storage Architecture Design in Cloud Computing
Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,
Data-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
The Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
Analyze It use cases in telecom & healthcare
Analyze It use cases in telecom & healthcare Chung Min Chen, VP of Data Science The views and opinions expressed in this presentation are those of the author and do not necessarily reflect the position
The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia
The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit
Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.
Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
This Symposium brought to you by www.ttcus.com
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights
DATA EXPERTS We accelerate research and transform data to help you create actionable insights WE MINE WE ANALYZE WE VISUALIZE Domains Data Mining Mining longitudinal and linked datasets from web and other
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Databases 2 (VU) (707.030)
Databases 2 (VU) (707.030) Introduction to NoSQL Denis Helic KMI, TU Graz Oct 14, 2013 Denis Helic (KMI, TU Graz) NoSQL Oct 14, 2013 1 / 37 Outline 1 NoSQL Motivation 2 NoSQL Systems 3 NoSQL Examples 4
Addressing Open Source Big Data, Hadoop, and MapReduce limitations
Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
BIG DATA TOOLS. Top 10 open source technologies for Big Data
BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.
Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
The 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
