Problems to store, transfer and process the Big Data 6/2/2016 GIANG TRAN - TTTGIANG2510@GMAIL.COM 1




Problems to store, transfer and process the Big Data
Course: Computing Clusters, Grids, and Clouds
Lecturer: Andrey Shevel
ITMO University, Saint Petersburg
6/2/2016 GIANG TRAN - TTTGIANG2510@GMAIL.COM

Outline
1. Introduction
2. Why big data
3. Big data characteristics
4. Big data problems
   - Storing
   - Transferring
   - Processing
5. Conclusion

1. Introduction
- Big data is defined as data sets that are so large and complex that traditional database management concepts and tools are inadequate.
- Big data is being generated by multiple sources, such as social media, systems, sensors and mobile devices, at an alarming velocity, volume and variety.
- Big data is a combination of structured, semi-structured, unstructured, homogeneous and heterogeneous data.

2. Why big data
- A massive amount of data is being generated from various sources every day. For example, Facebook processes 500 TB of data daily.
- 80% of the world's data is unstructured.
- Companies use data analytics for competitive advantage:
  1. Faster and better decision making
  2. Understand customers
  3. Optimize business processes
  4. Prevent threats and fraud
  5. Capitalize on new sources of revenue

3. Big data characteristics


4. Problems - Storage
- Current data management technologies cannot satisfy the needs of big data, and storage capacity is growing much more slowly than the data itself.

| Data set/domain | Description |
| Large Hadron Collider/Particle Physics (CERN) | 13-15 petabytes in 2010 |
| Internet Communications (Cisco) | 667 exabytes in 2013 |
| Social Media | 12+ terabytes of tweets every day and growing. Average retweets are 144 per tweet. |
| Human Digital Universe | 7.9 zettabytes in 2015 |
| Others | RFIDs, smart electric meters, 4.6 billion camera phones with GPS |

4. Problems - Storage
- Big data is heterogeneous.
- Previous computer algorithms are not able to effectively store big data.
- How to re-organize data?

4. Problems - Storage
- The crucial requirements of big data storage:
  1. It can handle very large amounts of data and keep scaling to keep up with data growth.
  2. It can provide the input/output operations per second (IOPS) necessary to deliver data to analytics tools.
- Hyperscale computing environments for big data storage
- Hadoop and Cassandra as analytics engines
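The two storage requirements above interact: a cluster has to be sized for whichever of capacity or IOPS runs out first. A minimal sketch of that sizing arithmetic follows. The function name `nodes_needed` and all per-node figures are hypothetical values chosen for illustration, not numbers from the slides or from any vendor.

```python
import math


def nodes_needed(data_tb: float, required_iops: float,
                 node_capacity_tb: float, node_iops: float) -> int:
    """Return how many storage nodes satisfy BOTH capacity and IOPS needs.

    All inputs are illustrative assumptions supplied by the caller.
    """
    by_capacity = math.ceil(data_tb / node_capacity_tb)  # nodes to hold the data
    by_iops = math.ceil(required_iops / node_iops)       # nodes to serve the load
    return max(by_capacity, by_iops)                     # the tighter constraint wins


if __name__ == "__main__":
    # Hypothetical workload: 1000 TB of data, 50,000 IOPS for analytics tools,
    # on nodes assumed to offer 48 TB and 2,000 IOPS each.
    print(nodes_needed(1000, 50_000, 48, 2_000))
```

With these assumed figures the IOPS requirement, not the raw capacity, determines the node count, which is why the slide lists both requirements separately.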

4. Problems - Transfer
- Conventional methods of transferring data:
  - Transfer via the network, using TCP-based transfer methods (FTP, HTTP)
  - Use a storage medium
- Current communication networks are unsuitable for such a massive volume of big data.
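A rough calculation shows why network transfer becomes a bottleneck. The sketch below estimates the ideal transfer time for a given volume and link speed. The 1 Gbit/s link and the decimal unit definitions are assumptions for illustration, and the estimate ignores protocol overhead and congestion, so real transfers would take longer.

```python
TERABYTE = 10**12           # bytes, decimal definition (assumption)
GIGABIT_PER_SECOND = 10**9  # bits per second
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time in days to move size_bytes over a link of the given speed."""
    seconds = (size_bytes * 8) / link_bits_per_second  # bytes -> bits, then divide
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 12 TB is the daily tweet volume quoted in the storage slide.
    days = transfer_days(12 * TERABYTE, GIGABIT_PER_SECOND)
    print(f"12 TB over 1 Gbit/s: about {days:.2f} days")
```

Under these assumptions, one day's worth of tweets needs a little more than one day to cross a single 1 Gbit/s link, so the link could never catch up with the source.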

4. Problems - Transfer
Solutions:
1. Process the data in place and transmit only the resulting information
2. Perform triage on the data and transmit only that data which is critical to downstream analysis
3. Parallel transmission techniques used on the internet
4. NICE Model for Big Data transfers
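The first two solutions can be illustrated with a small sketch: summarize the data where it lives, or filter it down to the critical records, and compare the size of what would be sent. The sensor-style readings, the threshold, and the helper names (`summarize`, `triage`, `payload_size`) are all hypothetical, and JSON is used only as a convenient way to measure payload size.

```python
import json
import random
from statistics import mean


def summarize(readings):
    """Process in place: reduce the raw readings to a few summary values."""
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


def triage(readings, threshold):
    """Keep only the readings considered critical for downstream analysis."""
    return [r for r in readings if r > threshold]


def payload_size(obj) -> int:
    """Size in bytes of the object when serialized as JSON."""
    return len(json.dumps(obj).encode("utf-8"))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    readings = [rng.gauss(20.0, 2.0) for _ in range(100_000)]

    print("raw bytes:     ", payload_size(readings))
    print("summary bytes: ", payload_size(summarize(readings)))
    print("triaged bytes: ", payload_size(triage(readings, 26.0)))
```

The trade-off is that a summary or a triaged subset discards detail, so this approach only fits cases where the downstream analysis is known in advance.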

4. Problems - Processing
- Extracting real-time information from a large stream of data remains difficult.
- Traditional serial algorithms are inefficient for big data.
- Processing big data requires extensive parallel processing and new analytics algorithms to provide timely and actionable information.
  - Application parallelization
  - Divide-and-conquer approach
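The divide-and-conquer approach can be sketched as a word count: split the input into chunks, count each chunk independently in a separate process, then merge the partial results. This is a minimal single-machine illustration using Python's standard library, not the slides' own method and not how Hadoop is implemented. The function names and the choice of four workers are assumptions.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide: cut the list into at most n_chunks pieces of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Conquer: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Combine: add the partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(lines, workers=4):
    """Run the divide, conquer and combine steps across worker processes."""
    chunks = split_into_chunks(lines, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    sample = ["big data is big", "data needs storage", "storage needs scaling"] * 1000
    print(parallel_word_count(sample).most_common(3))
```

Because each chunk is counted without reference to the others, the same structure can in principle be spread across many machines. On small inputs the cost of starting processes outweighs the benefit.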

5. Conclusion
- Big data is not a new concept, but it is very challenging.
- The problems seem solvable in the near term, but they present long-term challenges that require a lot of research.
- Big data calls for a scalable storage index and a distributed approach to retrieve the required results in near real time.
