Computing Issues for Big Data Theory, Systems, and Applications
|
|
|
- Penelope Webster
- 10 years ago
- Views:
Transcription
1 Computing Issues for Big Data Theory, Systems, and Applications Beihang University Chunming Hu Big Data Summit, with CyberC 2013 October 10, Beijing, China.
2 Bio of Myself Chunming Hu Got my Ph.D in 2006, Beihang University Associate Professor at Institute of Advanced Computing Technology (ACT), School of Computer Science, Beihang University Research Interests Service Computing( ) Grid Computing( ) Cloud Infrastructure and System Virtualization (2008-) Network System Virtualization Cloud-Client Computing Distributed Systems for Data Processing 2
3 Agenda Understanding the Big Data Background Computing Issues (4V 3I) Big Data Research at Beihang University RCBD, lead by Wenfei Fan 973 Project on Big Data, lead by Beihang University Early Experience on Big Data BD-Tractable by Preprocessing Performance Model for Hadoop Distributed Graph Pattern Matching Event Early Detection via Social Data Summary 3
4 Big Data in Cyberspace Large Scale with Rapid Updates Social Networks 4 Micro-blogger Provider in China: 800M Users, 200M tweets everyday, 20M+ Photos. Data doubled every 18 months Data in Cyberspace IDC Report : 2009: 0.8ZB 2012: 2.7 ZB 2020(E): 35ZB IDC Report Internet Search Baidu:1PB log data per Day. Handling 1000PB Google:Processing 20PB data everyday 1PB data in DVD: ~25km 1ZB=1PB 10 6 Airplane 15,000m Chomolung ma 8,800m 4
5 4V Features in Big Data Volume Velocity Variety Value In PB or EB Distributed data Dynamic Changes Updated constantly Heterogeneous Semi-structured or unstructured Biz opportunity Sensitive Data Data Deluge Wikipedia large and complex datasets, which is quite difficult to process using existing data management tools, and traditional data processing applications 5
6 Big Data Changes [Population] Statistical distribution Hypothesis Testing Sample Data Sampling in Statistics The Real World Knowledge New Statistical Theory & Math Tools
7 Big Data Changes [Population] Statistical distribution Hypothesis Testing Sample Data Sampling in Statistics The Real World Knowledge Modal based Prediction Mining Learning Problem -Specific Sample Data Pre-processing [Population ] Multiple Large Datasets Log, Device Capture Human Replay New Statistical Theory & Math Tools New Computing Theory & Algorithm Method? Large Scale Distributed Computing Infrastructure
8 Big Data Changes [Population] Statistical distribution Hypothesis Testing Sample Data Sampling in Statistics The Real World Knowledge Modal based Prediction? Mining Learning How to find Knowledge, rules Problem -Specific Sample Data Pre-processing? Multiple Large Datasets How to Re-sampling, Dimension Reduction? Log, Device Capture Human Replay Data Error? Deviation? New Statistical Theory & Math Tools New Computing Theory & Algorithm Method? Large Scale Distributed Computing Infrastructure
9 Focusing on the Computing 4-V Inexact 用 户 强 交 互 性 跨 多 通 道 快 Datasets are inexact: Noisy, Erros. Target are inexact. Eg. to find the macro trends. Avoid exact result to reduce cost Inexact but acceptable Results
10 Focusing on the Computing 4-V Inexact Incremental 用 户 强 交 互 性 跨 多 通 道 快 Data arrives continuesly Online/Realtime processing Hard to get an Static View of Data Batch/Full data is not enough
11 Focusing on the Computing 4-V Features of Big Data Computing Inexact Incremental Inductive 用 户 强 交 互 性 跨 多 通 道 快 Multi-source Datasets References between Datasets 973 Use the data correlations to adjust the errors Transfer Learning
12 Questions on Big Data Computing Question 1: Is there new Theory for Big Data Computing? Problem Algorithm Data Good: PTIME Bad: NP-Hard Ugly: PSPACE-hard, or EXPTIME-hard, undecidable Undecidable Problem In-feasible Problem approximate Algorithms (in PTIME) Decidable Problem BDuntractabl e Feasible Problem BD- Tractable 12
13 Questions on Big Data Computing Question 2: Is Hadoop a must to data processing? Different Computing requirements, User Scenarios Different Algorithm Design methods MapReduce (OSDI 2004) Handling All Data in a Distributed way Increm ental 3I MR is not the only solution Incremental Computing: Percolator by Google (OSDI 2010) New Algorithm Methods Resampling Query preserving data compression Partial evaluation and distributed processing Top-k query and terminating 13
14 Questions on Big Data Computing Question 3: How to handle the data computing algorithms in an operatable manner? Application specific Features Analysis Data Pattern, Arriving Rate, Query, General Purpose vs. Specific Purpose Domain-specific Knowledges Data Mining and ML Methods Distributed system Offline/Online Batch/Incremental/Streaming In-memory Computing 14
15 Agenda Understanding the Big Data Background Computing Issues (4V 3I) Big Data Research at Beihang University RCBD, lead by Wenfei Fan 973 Project on Big Data, lead by Beihang University Early Experience on Big Data BD-Tractable by Preprocessing Performance Model for Hadoop Distributed Graph Pattern Matching Event Early Detection via Social Data Summary 15
16 RCBD: International Research Centre on Big Data International Research Centre on Big Data (Founded in Sept 2012) Beihang U. U. Edinburgh HKUST U.Pennsylvania Baidu
17 973 Project on Big Data Computing Theory on Big Data ( ) a 973 Project Funding from Ministry of Science & Technology, with 8 institutional organization Focusing on the Computing Issue of Big Data Processing 17
18 973 Project on Big Data WP5.Pilot Applications (Social Data, Internet Search Engine Data) WP4.Data Mining and Analyzing for Big Data WP3.Energy Efficient Distributed Data Processing WP1. Data Model and Understanding (Semantic/Visulization) WP2.Computing Complexity Theory and Algorithms Design 18
19 Agenda Understanding the Big Data Background Computing Issues (4V 3I) Big Data Research at Beihang University RCBD, lead by Wenfei Fan 973 Project on Big Data, lead by Beihang University Early Experience on Big Data BD-Tractable by Preprocessing Performance Model for Hadoop Distributed Graph Pattern Matching Event Early Detection via Social Data Summary 19
20 Some Early Experiences Theory BD-Tractable via Data Preprocessing Systems Performance Model for Hadoop Graph Pattern Matching Applications Using Mood to Detect Sudden Occurrence (Early Event Detection via Social Data) 20
21 BD-Tractable with Preprocessing Polynomial time queries become intractable on big data We want to be able to tell what queries are feasible on big data NP and beyond PTIME BD-tractable not BD-tractable BD-tractable queries: queries feasible on big data 21
22 BD-Tractable with Preprocessing How do we dealing with SQL querys on a large DATABASE? Scan through all the records? NO!! Using Index to get better query performance! B-Tree index, from O(n) to O(logn) Query Optimizations! Two steps of computing Set up the index : preprocessing Doing query on the index 22
23 BD-Tractable with Preprocessing A class Q of queries is BD-tractable if there exists a PTIME preprocessing function such that for any database D on which queries of Q are defined, D = (D) hence D is of polynomial size for all possible queries rewriting Q in Q defined on D, Q(D) can be computed by evaluating Q on D in parallel polylog time (NC) parallel log k ( D, Q ) Q 1 ( (D)) Q 2 ( (D)) D (D) Does it work? If a linear scan of D could be done in log( D ) time: 15 seconds when D is of 1 PB instead of 1.99 days 18 seconds when D is of 1 EB rather than 5.28 years BD-tractable queries are feasible on big data 23
24 Performance Model for Hadoop Accurate performance prediction can help optimize the MR jobs Cost-based scheduling strategies Query Optimization Multi-processing steps in MR Basic Idea: Benchmarking and Profiling on target machine Find out the parameters for the model 24
25 Performance Model for Hadoop User Program Hive Query Cost Predictor Progress Indicator Job Client MRPacker+ Cost Analyzor Parameter Optimizer Cost-based Scheduler JobTracker Other Scheduler Performance Sub System Performancer JHisCollector JSampler TaskTracker CDataCollector JDataCollector Task Task Task HDFS Data Node HDFS Master Computing Node Sub System Modules Hadoop Packages Modified New Data Flow
26 Distributed Graph Pattern Matching Graph patter matching Providing evaluation algorithms and optimizations for graph simulation in a distributed setting 26
27 Early Event Detection via Social Data Social Data reflect the physical world Event Detection via Social Data Event Population Trends Motion plays important role in social media How to detect the user motion through the weibo text?? 27
28 Early Event Detection via Social Data Classification 95 motion icons selected from 1000 icons Use the text with motion icons as the training sets 28
29 Early Event Detection via Social Data Abnormal event detection Mood Search 29
30 icome: a Competition for Big Data 30
31 Agenda Understanding the Big Data Background Computing Issues (4V 3I) Big Data Research at Beihang University RCBD, lead by Wenfei Fan 973 Project on Big Data, lead by Beihang University Early Experience on Big Data BD-Tractable by Preprocessing Performance Model for Hadoop Distributed Graph Pattern Matching Event Early Detection via Social Data Summary 31
32 Summary Big Data: from 4V to 3I Inexact Incremental Inductive Application-Driven Vertical Integration Theory, Algorithm, Distributed Systems, Mining & ML Methods Open Data Community Get more data from industry/government 32
33 Acknowledgement Part of the slides borrowed from Prof. Wenfei Fan at RCBD, Prof. Ke Xu at NLSDE, Beihang University Prof. Shuai Ma, Dr. Xuelian Lin at ACT, Beihang University References Wenfei Fan, FlorisGeerts, Frank Neven. Making Queries Tractable on Big Data with Preprocessing, VLDB Lin Xuelian, MengZide, XuChuan and Wang Meng.A Practical Performance Model for HadoopMapReduce. IEEE International Conference on Cluster Computing Workshops (ClusterW), 2012 Shuai Ma, Yang Cao, JinpengHuai, Tianyu Wo. Distributed Graph Pattern Matching, WWW Shuai Ma, Yang Cao, Wenfei Fan, JinpengHuai, Tianyu Wo. Capturing Topology in Graph Pattern Matching, VLDB Jichang Zhao, Li Dong, Junjie Wu, KeXu: MoodLens: an emoticon-based sentiment analysis system for chinese tweets. KDD
34 Thank You! School of Computer Science, Beihang University Chunming Hu Big Data Summit, with CyberC 2013 October 10, Beijing, China.
Open source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
UPS battery remote monitoring system in cloud computing
, pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology
BIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
Advances in Natural and Applied Sciences
AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Clustering Algorithm Based On Hadoop for Big Data 1 Jayalatchumy D. and
Telecom Data processing and analysis based on Hadoop
COMPUTER MODELLING & NEW TECHNOLOGIES 214 18(12B) 658-664 Abstract Telecom Data processing and analysis based on Hadoop Guofan Lu, Qingnian Zhang *, Zhao Chen Wuhan University of Technology, Wuhan 4363,China
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Applied research on data mining platform for weather forecast based on cloud storage
Applied research on data mining platform for weather forecast based on cloud storage Haiyan Song¹, Leixiao Li 2* and Yuhong Fan 3* 1 Department of Software Engineering t, Inner Mongolia Electronic Information
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
Massive Data Query Optimization on Large Clusters
Journal of Computational Information Systems 8: 8 (2012) 3191 3198 Available at http://www.jofcis.com Massive Data Query Optimization on Large Clusters Guigang ZHANG, Chao LI, Yong ZHANG, Chunxiao XING
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Log Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, [email protected] Amruta Deshpande Department of Computer Science, [email protected]
Introduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme
Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,
Task Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
http://www.paper.edu.cn
5 10 15 20 25 30 35 A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
The Improved Job Scheduling Algorithm of Hadoop Platform
The Improved Job Scheduling Algorithm of Hadoop Platform Yingjie Guo a, Linzhi Wu b, Wei Yu c, Bin Wu d, Xiaotian Wang e a,b,c,d,e University of Chinese Academy of Sciences 100408, China b Email: [email protected]
Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved
Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Perspective Big Data Framework for Healthcare using Hadoop
A Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
Deep Learning Meets Heterogeneous Computing. Dr. Ren Wu Distinguished Scientist, IDL, Baidu [email protected]
Deep Learning Meets Heterogeneous Computing Dr. Ren Wu Distinguished Scientist, IDL, Baidu [email protected] Baidu Everyday 5b+ queries 500m+ users 100m+ mobile users 100m+ photos Big Data Storage Processing
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Big Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
Energy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
Design of Electric Energy Acquisition System on Hadoop
, pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University
E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms
E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
International Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
Fast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
An efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi
International Conference on Applied Science and Engineering Innovation (ASEI 2015) An efficient Join-Engine to the SQL query based on Hive with Hbase Zhao zhi-cheng & Jiang Yi Institute of Computer Forensics,
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
ANALYSING THE FEATURES OF JAVA AND MAP/REDUCE ON HADOOP
ANALYSING THE FEATURES OF JAVA AND MAP/REDUCE ON HADOOP Livjeet Kaur Research Student, Department of Computer Science, Punjabi University, Patiala, India Abstract In the present study, we have compared
NextGen Infrastructure for Big DATA Analytics.
NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
HadoopRDF : A Scalable RDF Data Analysis System
HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
CloudRank-D:A Benchmark Suite for Private Cloud Systems
CloudRank-D:A Benchmark Suite for Private Cloud Systems Jing Quan Institute of Computing Technology, Chinese Academy of Sciences and University of Science and Technology of China HVC tutorial in conjunction
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Research on IT Architecture of Heterogeneous Big Data
Journal of Applied Science and Engineering, Vol. 18, No. 2, pp. 135 142 (2015) DOI: 10.6180/jase.2015.18.2.05 Research on IT Architecture of Heterogeneous Big Data Yun Liu*, Qi Wang and Hai-Qiang Chen
Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia
Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
Survey on Scheduling Algorithm in MapReduce Framework
Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India
IMPLEMENTATION OF P-PIC ALGORITHM IN MAP REDUCE TO HANDLE BIG DATA
IMPLEMENTATION OF P-PIC ALGORITHM IN MAP REDUCE TO HANDLE BIG DATA Jayalatchumy D 1, Thambidurai. P 2 Abstract Clustering is a process of grouping objects that are similar among themselves but dissimilar
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Chase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Research of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
Keywords Stock Exchange Market, Clustering, Hadoop, Map-Reduce
Volume 6, Issue 5, May 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Stock Exchange Trend
BPOE Research Highlights
BPOE Research Highlights Jianfeng Zhan ICT, Chinese Academy of Sciences 2013-10- 9 http://prof.ict.ac.cn/jfzhan INSTITUTE OF COMPUTING TECHNOLOGY What is BPOE workshop? B: Big Data Benchmarks PO: Performance
The Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
Fault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014
Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the
Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)
Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we
Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics
Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics P.Saravana kumar 1, M.Athigopal 2, S.Vetrivel 3 Assistant Professor, Dept
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
Big Data Analytics Opportunities and Challenges
Big Data Analytics Opportunities and Challenges Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Hadoop
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
A Hadoop MapReduce Performance Prediction Method
A Hadoop MapReduce Performance Prediction Method Ge Song, Zide Meng, Fabrice Huet, Frederic Magoules, Lei Yu and Xuelian Lin University of Nice Sophia Antipolis, CNRS, I3S, UMR 7271, France Ecole Centrale
Big Data Analytics Hadoop and Spark
Big Data Analytics Hadoop and Spark Shelly Garion, Ph.D. IBM Research Haifa 1 What is Big Data? 2 What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software
BIG DATA ANALYSIS USING RHADOOP
BIG DATA ANALYSIS USING RHADOOP HARISH D * ANUSHA M.S Dr. DAYA SAGAR K.V ECM & KLUNIVERSITY ECM & KLUNIVERSITY ECM & KLUNIVERSITY Abstract In this electronic age, increasing number of organizations are
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
Efficient Data Replication Scheme based on Hadoop Distributed File System
, pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
A Big Data Analytical Framework For Portfolio Optimization Abstract. Keywords. 1. Introduction
A Big Data Analytical Framework For Portfolio Optimization Dhanya Jothimani, Ravi Shankar and Surendra S. Yadav Department of Management Studies, Indian Institute of Technology Delhi {dhanya.jothimani,
Testing 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
NP-complete? NP-hard? Some Foundations of Complexity. Prof. Sven Hartmann Clausthal University of Technology Department of Informatics
NP-complete? NP-hard? Some Foundations of Complexity Prof. Sven Hartmann Clausthal University of Technology Department of Informatics Tractability of Problems Some problems are undecidable: no computer
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank
Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»
AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP
AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP Asst.Prof Mr. M.I Peter Shiyam,M.E * Department of Computer Science and Engineering, DMI Engineering college, Aralvaimozhi.
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
Brave New World: Hadoop vs. Spark
Brave New World: Hadoop vs. Spark Dr. Kurt Stockinger Associate Professor of Computer Science Director of Studies in Data Science Zurich University of Applied Sciences Datalab Seminar, Zurich, Oct. 7,
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Metaheuristics in Big Data: An Approach to Railway Engineering
Metaheuristics in Big Data: An Approach to Railway Engineering Silvia Galván Núñez 1,2, and Prof. Nii Attoh-Okine 1,3 1 Department of Civil and Environmental Engineering University of Delaware, Newark,
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
Computing at Scale: Resource Scheduling Architectural Evolution and Introduction to Fuxi System
Computing at Scale: Resource Scheduling Architectural Evolution and Introduction to Fuxi System Renyu Yang( 杨 任 宇 ) Supervised by Prof. Jie Xu Ph.D. student@ Beihang University Research Intern @ Alibaba
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010
Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce
Extending Hadoop beyond MapReduce
Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core
A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud
A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud Thuy D. Nguyen, Cynthia E. Irvine, Jean Khosalim Department of Computer Science Ground System Architectures Workshop
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
