Principles of E-Commerce I: Business and Technology. (PoE1) Focus: Big Data Platforms Prof. Roberto V. Zicari

1 Principles of E-Commerce I: Business and Technology. (PoE1) Focus: Big Data Platforms Prof. Roberto V. Zicari with support of Todor Ivanov, Marten Rosselli and Dr. Karsten Tolle 2015 SS

2 Principles of E-Commerce I (PoE1) Focus: Big Data Platforms Responsible: Prof. Roberto V. Zicari with support of Todor Ivanov, Marten Rosselli and Dr. Karsten Tolle Time and location: Tuesday 10:15-11:45, SR 11 (Informatikgebäude) Wednesday 10:15-11:45, SR 307 (Informatikgebäude) Goethe University Frankfurt Institute for Computer Science DBIS

3 Basic information Webpage Frankfurt Big Data Lab: Homepage DBIS: Attention: We will try to announce changes and news on the webpage. Please check it before each lecture.

4 Hands-on This is a hands-on course. The exercise and lecture parts are mixed as needed. Please make sure that you bring at least one notebook per two persons to each course slot. (If this is a problem, send an e-mail.)

5 How to get the CPs (6) and the final score Each participant needs to complete four practical assignments and give a presentation of the results. Details will follow. Registration: With the first assignment, we will collect your data (name, matriculation number, and degree program) for the registration to this course.

6 Schedule (preliminary, please check the Web site!)
Lecturers: Prof. Dott. Ing. Roberto V. Zicari - Intro; Todor Ivanov - Hadoop; Dr. Karsten Tolle - GraphDBs; Marten Rosselli - NoSQL; Students - Presentations
Tuesday                                             Wednesday
April 2015  -                                       4/15/2015  Intro to Big Data
April 2015  Intro to Hadoop 1 - HDFS & MapReduce    4/22/2015  Intro to Hadoop 2 - HDFS & MapReduce
April 2015  Hadoop Ecosystem 1                      4/29/2015  Hadoop Ecosystem
May 2015    Data Acquisition 1                      5/6/2015   Data Acquisition
May 2015    Graphs, GraphDBs                        5/13/2015  Semantic Web, LOD
May 2015    Intro to Pig 1                          5/20/2015  Intro to Pig
May 2015    Student Presentations 1                 5/27/2015  Student Presentations
June 2015   Advanced Pig 1                          6/3/2015   Intro to Hive
June 2015   Intro to Hive 2                         6/10/2015  Advanced Hive
June 2015   NoSQL                                   6/17/2015  NoSQL
June 2015   NoSQL - Exercise                        6/24/2015  NoSQL - Exercise
June 2015   Impala                                  7/1/2015   New Big Data Technologies - Spark
July 2015   NoSQL                                   7/8/2015   NoSQL
July 2015   Student Presentations 3                 7/15/2015  Student Presentations 4

7 Ringvorlesung (lecture series) Focus: Big Data, Internet of Things and Data Science. Series of 10 guest lectures, open to all. Course start/end: Thursday, to Thursday, Time: every Thursday, 14:15-15:45 Location: Robert-Mayer-Straße 11-15, Room SR 307 (Informatikgebäude) Webpage: Frankfurt Big Data Lab:

8 Ringvorlesung Schedule (Speaker: Title)
- Prof. Roberto V. Zicari, Frankfurt Big Data Lab, Goethe University Frankfurt: Big Data: A Data Driven Society?
- Prof. Nikos Korfiatis, Assistant Professor of Business Analytics, Norwich Business School, University of East Anglia, UK: Big Data and Regulation
- Dr. Alexander Zeier, Managing Director, Globally for In-Memory Solutions at Accenture: In-Memory Technologies and Applications: S/4 HANA
- Jörg Besier, Managing Director at Accenture, Digital Delivery Lead ASG: Towards a data-driven economy. How Big Data fuels the digital economy
- Klaas Wilhelm Bollhoefer, Chief Data Scientist, The unbelievable Machine Company: Introduction to Data Science
- Prof. Dr. Katharina Morik, TU Dortmund University: Big Data Analytics in Astrophysics
- Prof. Hans Uszkoreit, Scientific Director, German Research Center for Artificial Intelligence (DFKI): Smart Data Web: Value chains for industrial applications
- Matthew Eric Bassett, Director and Co-Founder of Gower Street Analytics, former Director, Data Science at NBCUniversal International, UK: Data Science and the future of the movie business
- Prof. Dr. Christoph Schommer, University of Luxembourg: Algorithms for Data Privacy
- Thomas Jarzombek, Member of the German Bundestag: Big Data and its challenges for today's politics

9 Big Data slogans "Big Data: The next frontier for innovation, competition, and productivity" (McKinsey Global Institute). "Data is the new gold" (Open Data Initiative, European Commission, which aims at opening up Public Sector Information).

10 What Data? The term "Big Data" refers to large amounts of different types of data produced with high velocity from a high number of various types of sources. Handling today's highly variable and real-time datasets requires new tools and methods, such as powerful processors, software and algorithms. The term "Open Data" refers to a subset of data, namely to data made freely available for re-use to everyone for both commercial and non-commercial purposes. "Linked Data" is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.

11 This is Big Data. Every day, 2.5 quintillion bytes (= 2.5 exabytes) of data are created. This data comes from digital pictures, videos, posts to social media sites, intelligent sensors, purchase transaction records, and cell phone GPS signals, to name a few. In 2013, estimates reached 4 zettabytes of data generated worldwide (*). (*) Mary Meeker and Liang Wu, Internet Trends, Kleiner Perkins Caufield & Byers, 2013.

12 Source /12/the-data- explosion-in minute-by-minuteinfographic/

13 How Big is Big Data?
1 megabyte = 1,000,000 bytes = 10^6 bytes
1 gigabyte = 10^9 bytes
1 terabyte = 1,000,000,000,000 bytes = 10^12 bytes
1 petabyte = 1,000 terabytes (TB) = 10^15 bytes
1 exabyte = 10^18 bytes
1 zettabyte = 1,000,000,000,000,000,000,000 bytes = 10^21 bytes
Imagine that every person (320,590,000) in the United States took a digital photo every second of every day for over a month. All of those photos put together would equal about one zettabyte (*). (*) BIG DATA: SEIZING OPPORTUNITIES, PRESERVING VALUES, Executive Office of the President, MAY, The White House, Washington.
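As a rough plausibility check of the photo comparison above, the sketch below computes the photo size that the claim implies. The unit table uses decimal (SI) prefixes. Reading "over a month" as 31 days is an assumption made only for this illustration, and the names `UNITS` and `implied_photo_size_bytes` are hypothetical.

```python
# Decimal (SI) storage units expressed in bytes.
UNITS = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

US_POPULATION = 320_590_000       # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 31                         # assumption: "over a month" read as 31 days


def implied_photo_size_bytes(total_bytes=UNITS["ZB"]):
    """Return the average photo size implied by the one-zettabyte claim."""
    photos = US_POPULATION * SECONDS_PER_DAY * DAYS
    return total_bytes / photos


size = implied_photo_size_bytes()
print(f"{size / UNITS['MB']:.2f} MB per photo")
```

Under these assumptions the implied size is a little over one megabyte per photo, which is a plausible size for a compressed digital photo.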

14 Another Definition of Big Data "Big Data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze (McKinsey Global Institute). This definition is not stated in terms of a fixed data size (data sets will grow over time), and it varies by sector (ranging from a few dozen terabytes to multiple petabytes).

15 Examples of gigabyte-sized storage (source: Wikipedia) One hour of standard-definition television (SDTV) video at 2.2 Mbit/s is approximately 1 GB. Seven minutes of high-definition television (HDTV) video at Mbit/s is approximately 1 GB. 114 minutes of uncompressed Compact Disc (CD) quality audio at 1.4 Mbit/s is approximately 1 GB. A single-layer DVD-R optical disc can hold about 4.7 GB. A dual-layer Blu-ray optical disc can hold about 50 GB.

16 Examples of the use of the terabyte (source: Wikipedia) Audio: One terabyte of audio recorded at CD quality contains approx hours of audio. Climate science: In 2010, the German Climate Computing Centre (DKRZ) was generating TB of data per year. Video: Released in 2009, the 3D animated film Monsters vs. Aliens used 100 TB of storage during development. Astronomy: The Hubble Space Telescope collected more than 45 terabytes of data in its first 20 years of observations. Historical Internet traffic: In 1993, total Internet traffic amounted to approximately 100 TB for the year. As of June 2008, Cisco Systems estimated Internet traffic at 160 TB/s (which, assuming the rate stays constant, comes to about 5 zettabytes for the year). In other words, the amount of Internet traffic per second in 2008 exceeded all of the Internet traffic in
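The traffic figures above can be checked with a few lines of arithmetic. The sketch below converts the quoted rate of 160 TB/s into zettabytes per year, assuming a constant rate and a 365-day year. The function name `yearly_traffic_zb` is hypothetical.

```python
TB = 10**12
ZB = 10**21
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: non-leap year


def yearly_traffic_zb(tb_per_second):
    """Convert a constant rate in TB/s into zettabytes per year."""
    return tb_per_second * TB * SECONDS_PER_YEAR / ZB


RATE_2008_TB_PER_S = 160     # Cisco estimate quoted on the slide
TRAFFIC_1993_TB = 100        # total for the whole year 1993, from the slide

yearly = yearly_traffic_zb(RATE_2008_TB_PER_S)
print(f"{yearly:.2f} ZB per year")

# One second of 2008 traffic compared with the 1993 yearly total.
print(RATE_2008_TB_PER_S > TRAFFIC_1993_TB)
```

The result is close to the 5 zettabytes stated on the slide, and one second of the 2008 rate is larger than the 100 TB quoted for all of 1993.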

17 Examples of the use of the petabyte (source: Wikipedia) Databases: Teradata Database 12 has a capacity of 50 petabytes of compressed data. Data mining: In August 2012, Facebook's Hadoop clusters included the largest single HDFS cluster known, with more than 100 PB of physical disk space in a single HDFS filesystem. Yahoo stores 2 petabytes of data on behavior. Telecommunications (usage): AT&T transfers about 30 petabytes of data through its networks each day. Internet: Google processed about 24 petabytes of data per day in 2009. Data storage system: In August 2011, IBM was reported to have built the largest storage array ever, with a capacity of 120 petabytes.

18 Examples of the use of the petabyte (source: Wikipedia) Photos: As of January 2013, Facebook users had uploaded over 240 billion photos, with 350 million new photos every day. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 960 billion images and an estimated 357 petabytes of storage. Music: One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require 2000 years to play. Games: World of Warcraft uses 1.3 petabytes of storage to maintain its game. Physics: Experiments in the Large Hadron Collider produce about 15 petabytes of data per year. Climate science: The German Climate Computing Centre (DKRZ) has a storage capacity of 60 petabytes of climate data.

19 The Internet of Things The Internet of Things is a term used to describe the ability of devices to communicate with each other using embedded sensors that are linked through wired and wireless networks. These devices could include your thermostat, your car, or a pill you swallow so the doctor can monitor the health of your digestive tract. These connected devices use the Internet to transmit, compile, and analyze data (*) (*) BIG DATA: SEIZING OPPORTUNITIES, PRESERVING VALUES Executive Office of the President, MAY The White House, Washington. 19

20 What is Big Data supposed to create? Value (McKinsey Global Institute):
- Creating transparency
- Discovering needs, exposing variability, improving performance
- Segmenting customers
- Replacing/supporting human decision making with automated algorithms
- Innovating new business models, products, services

22 The value of big data Source: php#null

23 What is Data Science? (source: http://datascience.nyu.edu/what-is-data-science/) Data science involves using automated methods to analyze massive amounts of data and to extract knowledge from them. One way to consider data science is as an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modeling, statistics, analytics, and mathematics.

24 Big Data Analytics (source: Analytics/ba-p/) Descriptive Analytics The purpose of descriptive analytics is simply to summarize and tell you what happened. It is the simplest class of analytics that you can use to reduce big data into much smaller, consumable bites of information. It computes descriptive statistics (i.e. counts, sums, averages, percentages, min, max and simple arithmetic) that summarize certain groupings or filtered versions of the data, which are typically simple counts of some events. They are mostly based on standard aggregate functions in databases.
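To make the idea of descriptive analytics concrete, here is a minimal sketch that computes the aggregates named above (count, sum, average, min, max, percentage) per group, much like a SQL GROUP BY would. The event data and the function name `describe` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical purchase events: (customer segment, order value).
events = [
    ("retail", 20.0),
    ("retail", 35.0),
    ("business", 120.0),
    ("business", 80.0),
    ("retail", 5.0),
]


def describe(events):
    """Group events by segment and compute simple descriptive statistics."""
    groups = defaultdict(list)
    for segment, value in events:
        groups[segment].append(value)

    total = len(events)
    return {
        segment: {
            "count": len(values),
            "sum": sum(values),
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "percent": 100.0 * len(values) / total,  # share of all events
        }
        for segment, values in groups.items()
    }


summary = describe(events)
for segment, stats in summary.items():
    print(segment, stats)
```

Each output row reduces many raw events to a handful of numbers, which is the "smaller, consumable bites" idea in practice.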

25 Big Data Analytics (source: Analytics/ba-p/) Predictive Analytics The purpose of predictive analytics is NOT to tell you what will happen in the future. It cannot do that. In fact, no analytics can do that. Predictive analytics can only forecast what might happen in the future, because all predictive analytics are probabilistic in nature.

26 Examples of Non-Temporal Predictive Analytics One example of non-temporal predictive analytics is a model that uses someone's existing social media activity data (data we have) to predict his/her potential to influence (data we don't have). Another well-known example of non-temporal predictive analytics in social analytics is sentiment analysis. (source: http://community.lithium.com/t5/science-of-social-blog/big-data-reduction-2-understanding-predictive-Analytics/ba-p/)

27 Big Data Analytics (source:) 3. Prescriptive Analytics Prescriptive analytics not only predicts a possible future, it predicts multiple futures based on the decision maker's actions. A prescriptive model can be viewed as a combination of multiple predictive models running in parallel, one for each possible input action.

28 This predictive model must have two more added components in order to be prescriptive (source:):
- Actionable: The data consumers must be able to take actions based on the predicted outcome of the model.
- Feedback System: The model must have a feedback system that tracks the adjusted outcome based on the action taken. This means the predictive model must be smart enough to learn the complex relationship between the user's action and the adjusted outcome through the feedback data.
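The structure described above (one predictive model per possible action, plus a feedback loop) can be sketched in a few lines. This is only an illustrative toy, not a method from the cited source: each per-action "model" is a simple running average, and the class names, action names, and numbers are all hypothetical.

```python
class ActionModel:
    """Toy predictive model for one action: a running average of outcomes."""

    def __init__(self, prior):
        self.estimate = prior   # initial guess, counted as one observation
        self.count = 1

    def predict(self):
        return self.estimate

    def update(self, observed):
        # Incremental mean: move the estimate toward the observed outcome.
        self.count += 1
        self.estimate += (observed - self.estimate) / self.count


class PrescriptiveModel:
    """One predictive model per possible action, plus a feedback system."""

    def __init__(self, priors):
        self.models = {action: ActionModel(p) for action, p in priors.items()}

    def predict_all(self):
        # "Multiple futures": one predicted outcome per action.
        return {action: m.predict() for action, m in self.models.items()}

    def recommend(self):
        # Actionable: pick the action with the best predicted outcome.
        predictions = self.predict_all()
        return max(predictions, key=predictions.get)

    def feedback(self, action, observed):
        # Feedback system: learn from the outcome of the action taken.
        self.models[action].update(observed)


model = PrescriptiveModel({"discount": 100.0, "free_shipping": 90.0})
first = model.recommend()
model.feedback("discount", 40.0)    # the discount performed worse than predicted
second = model.recommend()
print(first, second)
```

After the disappointing feedback, the estimate for the first action drops and the recommendation switches, which is the learning loop the slide asks for.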

29 How will Big Data be used? Combining data is the real value for corporations: 90% corporate data, 10% social media data. Sensor data has just begun (e.g. smart meters). Big Data is a key basis of competition and growth for individual firms (McKinsey Global Institute).

31 Examples of Big Data Use Cases: Log Analytics; Fraud Detection; Social Media and Sentiment Analysis; Risk Modeling and Management.

33 Big Data can generate financial value (*) across sectors, e.g. health care, public sector administration, global personal location data, retail, and manufacturing (McKinsey Global Institute). (*) Note: but it could be more than that!

34 Limitations Shortage of the talent necessary for organizations to take advantage of big data: very few PhDs; knowledge in statistics, machine learning, and data mining; managers and analysts who make decisions by using insights from big data. Source: McKinsey Global Institute

35 Smart Data? Big data provides the infrastructure for economically storing and processing unprecedented amounts of data. But undigested big data (e.g. terabytes of raw logs) and the technology required for it (e.g. Hadoop, Cassandra, etc.) are pretty much inaccessible to the average business person. There is a huge disconnect between what big data provides and what businesses need. Smart data is how you can fill the gap. Source: php#null

37 Big Data: What are the consequences? "Any technological or social force that reaches down to affect the majority of society's members is bound to produce a number of controversial topics" (John Bittner, 1977). But what are the true consequences of a society being reshaped by systematically building on data analytics?

38 Big Data: Challenges 1. Data 2. Process 3. Management 38

39 Data Challenges Volume: dealing with the size of it. In the year 2000, 800,000 petabytes (PB) of data were stored in the world (source: IBM). This is expected to grow to 35 zettabytes (ZB). Twitter generates 7+ terabytes (TB) of data every day, Facebook 10 TB. Scale and performance requirements strain conventional databases. Scalability has three aspects: data volume, hardware size, and concurrency.

40 Analytics Data Platform for Big Data Mike Carey (EDBT Keynote 2012):
Big Data in the Database World (early 1980s till now)
- Parallel databases: shared-nothing architecture, declarative set-oriented nature of relational queries, divide-and-conquer parallelism (e.g. Teradata)
- Re-implementation of relational databases (e.g. HP/Vertica, IBM/Netezza, Teradata/Aster Data, EMC/Greenplum)
Big Data in the Systems World (late 1990s)
- Apache Hadoop (inspired by Google GFS and MapReduce, contributed by large Web companies, e.g. Yahoo!, Facebook)
- Google BigTable
- Amazon Dynamo

41 Data Challenges Variety: handling a multiplicity of types, sources and formats. Sensors, smart devices, social collaboration technologies. Data is not only structured, but also raw, semi-structured, and unstructured data from web pages, web log files (click stream data), search indexes, e-mails, documents, sensor data, etc.

42 Structured Data
Employee
EmpNo  Ename  DeptNo  DeptName
100    Bob    10      Marketing
200    Bob    20      Purchasing
150    Peter  10      Marketing
170    Doug   20      Purchasing
105    John   10      Marketing

43 Clickstream Data Clickstream data is an information trail a user leaves behind while visiting a website. It is typically captured in semi-structured website logs, for example:
fcrawler.looksmart.com - - [26/Apr/2000:00:00: ] "GET /contacts.html HTTP/1.0" "-" "FAST-WebCrawler/2.1-pre2"
fcrawler.looksmart.com - - [26/Apr/2000:00:17: ] "GET /news/news.html HTTP/1.0" "-" "FAST-WebCrawler/2.1-pre2"
ppp931.on.bellglobal.com - - [26/Apr/2000:00:16: ] "GET /download/windows/asctab31.zip HTTP/1.0" "http://www.htmlgoodies.com/downloads/freeware/webdevelopment/15.html" "Mozilla/4.7 [en]c-sympa (Win95; U)"
[26/Apr/2000:00:23: ] "GET /pics/wpaper.gif HTTP/1.0" "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[26/Apr/2000:00:23: ] "GET /asctortf/ HTTP/1.0" "http://search.netscape.com/computers/data_formats/document/text/rtf" "Mozilla/4.05 (Macintosh; I; PPC)"
[26/Apr/2000:00:23: ] "GET /pics/5star2000.gif HTTP/1.0" "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[26/Apr/2000:00:23: ] "GET /pics/5star.gif HTTP/1.0" "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[26/Apr/2000:00:23: ] "GET /pics/a2hlogo.jpg HTTP/1.0" "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[26/Apr/2000:00:23: ] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
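Log lines like the ones above become usable for analysis once they are parsed into fields. The sketch below parses one line in the common "combined" web server log layout with a regular expression. The sample line is hypothetical and complete (host, timestamp, status code, and size are invented), because the entries on the slide are truncated.

```python
import re
from datetime import datetime

# Combined log layout: host ident user [time] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical, fully populated sample line (not taken from the slide).
sample = (
    'crawler.example.com - - [26/Apr/2000:00:00:12 -0400] '
    '"GET /contacts.html HTTP/1.0" 200 4595 "-" "ExampleCrawler/1.0"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"])
```

Returning `None` for lines that do not match keeps the parser tolerant of damaged or partial entries, which are common in real log files.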

44 Potential Uses of Clickstream Data (source) One of the original uses of Hadoop at Yahoo was to store and process their massive volume of clickstream data. Now enterprises can use the Hortonworks Data Platform (HDP) to refine and analyze clickstream data. They can then answer business questions such as: What is the most efficient path for a site visitor to research a product, and then buy it? What products do visitors tend to buy together, and what are they most likely to buy in the future? Where should I spend resources on fixing or enhancing the user experience on my website?

45 Variety (cont.) The following techniques all require powerful analytics on many petabytes of semi-structured Web data:
- A/B testing: two versions (A and B) are compared, which are identical except for one variation that might affect a user's behavior.
- Sessionization: behavioral analytics focuses on how and why users behave as they do by grouping events into sessions. A session ID is created and stored every time a user visits your web page or mobile application.
- Bot detection: a bot is formed when a computer gets infected.
- Pathing analysis: in statistics, the description of directed dependencies among a set of variables (related to multiple regression analysis).
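Sessionization, as described above, can be sketched as follows: events of the same user are sorted by time and split into a new session whenever the gap between two consecutive events exceeds a timeout. The 30-minute timeout, the event layout `(user_id, timestamp, url)`, and the function name `sessionize` are assumptions made for this illustration.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60   # assumption: 30 minutes of inactivity ends a session


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, unix_timestamp, url) events into per-user sessions."""
    by_user = defaultdict(list)
    for user, timestamp, url in events:
        by_user[user].append((timestamp, url))

    sessions = {}
    for user, hits in by_user.items():
        hits.sort()                       # order each user's events by time
        user_sessions = [[hits[0]]]
        for previous, current in zip(hits, hits[1:]):
            if current[0] - previous[0] > gap:
                user_sessions.append([current])     # start a new session
            else:
                user_sessions[-1].append(current)   # continue the session
        sessions[user] = user_sessions
    return sessions


# Hypothetical clickstream events.
events = [
    ("u1", 0, "/"),
    ("u1", 600, "/product"),
    ("u1", 4000, "/checkout"),   # more than 30 minutes after the previous hit
    ("u2", 100, "/"),
]
sessions = sessionize(events)
print({user: len(s) for user, s in sessions.items()})
```

On real clickstream volumes the same grouping would run as a distributed job, but the per-user logic stays the same.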

46 Twitter (source: "What Twitter's Made Of" by Paul Ford, Bloomberg Businessweek, November 11-17) A tweet is short: 140 characters. If you open up one tweet and look inside, you find (via an Application Programming Interface, API), e.g.:
- Identity of the creator (bot or human)
- Location from which it originated
- Date and time it went out
- Number of people who read the tweet, fav'd the tweet, number of retweets, etc.
We call this metadata. You can access this information by requesting an API key from Twitter (a fast, automated procedure); you get a Web address and access it as raw data for computers to read.

47 Twitter: Example of metadata. Coordinates part of the tweet: this value contains geographical information, latitude and longitude (in a format called GeoJSON, a public open standard). Place part of the tweet: specific, named locations (multiple coordinates, i.e. polygons over the surface of the earth).

48 Twitter With this metadata (places and time), by applying some math one can, for example, reveal how far one tweeter is from another, or learn when people are active in social media engagement.
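One example of "applying some math" is the great-circle distance between the coordinates of two tweets. The sketch below uses the haversine formula on GeoJSON points, where the coordinate order is longitude first, then latitude. The two sample points (approximate city centers of Frankfurt and Berlin) are hypothetical stand-ins for tweet coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two GeoJSON Point objects."""
    lon1, lat1 = point_a["coordinates"]   # GeoJSON order: [longitude, latitude]
    lon2, lat2 = point_b["coordinates"]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical tweet coordinates (approximate city centers).
frankfurt = {"type": "Point", "coordinates": [8.6821, 50.1109]}
berlin = {"type": "Point", "coordinates": [13.4050, 52.5200]}

distance = haversine_km(frankfurt, berlin)
print(f"{distance:.0f} km")
```

The haversine formula treats the Earth as a sphere, so the result is an approximation that is adequate for questions like "how far apart are two tweeters".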

49 Twitter: More metadata.
- withheld_copyright: if set to true, the tweet is withheld because of trouble over copyright.
- withheld_in_countries: list of countries in which the tweet is banned.
- possibly_sensitive: if set to true, the tweet links to potentially offensive things such as nudity, violence, or medical procedures (a user can check a box in his profile so that tweets are automatically flagged).

50 Example of search indexes (source: https://cloudant.com/) Search indexes are defined by a JavaScript function. This function is run over all of your documents, in a similar manner to a view's map function, and defines the fields that your search can query.
A simple search function:
    function(doc) {
      index("name", doc.name);
    }
Defining an analyzer:
    "indexes": {
      "mysearch": {
        "analyzer": "whitespace",
        "index": "function(doc){... }"
      }
    }

51 Sensor Data: Analyze Machine and Sensor Data (source: http://hortonworks.com/hadoop-tutorial/how-to-analyze-machine-and-sensor-data/) A sensor is a device that measures a physical quantity and transforms it into a digital signal. Sensors are always on, capturing data at a low cost, and powering the Internet of Things. A key task with sensor data is separating signal from noise. Potential uses of sensor data:
- Monitoring machines or infrastructure such as ventilation equipment, bridges, energy meters, or airplane engines. This data can be used for predictive analytics, to repair or replace these items before they break.
- Monitoring natural phenomena such as meteorological patterns, underground pressure during oil extraction, or patient vital statistics during recovery from a medical procedure.
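A very simple way to separate signal from noise in a sensor stream is to smooth the readings before acting on them. The sketch below applies a trailing moving average and raises an alert only when the smoothed value crosses a threshold, so that a single noisy spike is ignored. The readings, the window size, the threshold, and the function names are hypothetical and chosen only for illustration.

```python
from collections import deque


def moving_average(readings, window=3):
    """Trailing moving average; early values use the readings seen so far."""
    buffer = deque(maxlen=window)
    smoothed = []
    for value in readings:
        buffer.append(value)
        smoothed.append(sum(buffer) / len(buffer))
    return smoothed


def first_alert(readings, threshold, window=3):
    """Index of the first smoothed reading above the threshold, or None."""
    for index, value in enumerate(moving_average(readings, window)):
        if value > threshold:
            return index
    return None


# Hypothetical engine temperatures: one noisy spike, then a real upward trend.
temperatures = [70, 71, 95, 70, 72, 80, 85, 90]
alert_index = first_alert(temperatures, threshold=80)
print(alert_index)
```

Here the isolated spike does not trigger the alert, while the sustained rise at the end does; a real predictive maintenance system would use more robust models, but the principle is the same.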

52 Raw data (source) Raw data, also known as source data or atomic data, is information that has not been processed in order to be displayed in any sort of presentable form. The raw form may look very unrecognizable and be nearly meaningless without processing, but it may also be in a form that some can interpret, depending on the situation. This data can be processed manually or by a machine. In some cases, raw data may be nothing more than a series of numbers. The way those numbers are sequenced, however, and sometimes even the way they are spaced, can be very important information. A computer may interpret this information and give a readout that then may make sense to the reader. Binary code is a good example of raw data. Taken by itself as a printout, binary code does very little for the computer user, or at least for the vast majority of users. When it is processed through a computer, on the other hand, it provides more understandable information. In fact, binary code is typically the source code for everything a computer user sees.

53 Sensor data logged to a text file. Imported data into Excel (source: Memos From the Cube)

54 Data Challenges (cont.) Velocity: reacting to the flood of information in the time required by the application. Stream computing: e.g. "Show me all people who are currently living in the Bay Area flood zone", continuously updated by GPS data in real time (IBM). Challenge: "The change of the data structure; the consumer no longer has control over the source of data creation; this requires the concept of late binding; it also poses a major challenge with regard to governance and data quality; with the shift of the transformation of data from ETL to at-time-of-consumption, the ETL knowledge must be given to every consumer; tools will have to help with that." (Thomas Fastner, eBay) Combining multiple data sets.

55 Data Challenges (cont.) Personally Identifiable Information: much of this information is about people. "Can we extract enough information to help people without extracting so much as to compromise their privacy? Partly, this calls for effective industrial practices. Partly, it calls for effective oversight by Government. Partly, perhaps mostly, it requires a realistic reconsideration of what privacy really means." (Paul Miller) Right to be forgotten: 1,000 people a day ask Google to remove search links (145,000 requests have been made in the European Union, covering 497,000+ web links).

56 Data Challenges (cont.) Data dogmatism: "Analysis of big data can offer quite remarkable insights, but we must be wary of becoming too beholden to the numbers. Domain experts and common sense must continue to play a role. E.g. it would be worrying if the healthcare sector only responded to flu outbreaks when Google Flu Trends told them to." (Paul Miller)

57 Process Challenges The challenges with deriving insight include:
- Capturing data,
- Aligning data from different sources (e.g., resolving when two objects are the same),
- Transforming the data into a form suitable for analysis,
- Modeling it, whether mathematically, or through some form of simulation,
- Understanding the output, visualizing and sharing the results.
(Laura Haas, IBM Research)

58 Management Challenges Data Privacy, Security, and Governance. - ensuring that data is used correctly (abiding by its intended uses and relevant laws), - tracking how the data is used, transformed, derived, etc, - and managing its lifecycle. Many data warehouses contain sensitive data such as personal data. There are legal and ethical concerns with accessing such data. So the data must be secured and access controlled as well as logged for audits (Michael Blaha). 58

59 Big Data: Data Platforms "In the Big Data era the old paradigm of shipping data to the application isn't working any more. Rather, the application logic must come to the data or else things will break: this is counter to conventional wisdom and the established notion of strata within the database stack." Hadoop: Processing moves to where the data is! Data management: "With terabytes, things are actually pretty simple; most conventional databases scale to terabytes these days. However, try to scale to petabytes and it's a whole different ball game." (Florian Waas, previously at Pivotal)
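The idea that processing moves to where the data is can be illustrated with a tiny, single-process simulation of the MapReduce pattern. Each list below stands in for the data stored on one node: a counting function runs "locally" on every partition, and only the small partial results are merged centrally. This is a conceptual sketch, not Hadoop code, and the partitions and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical log lines; each inner list stands for data held on a different node.
partitions = [
    ["GET /a", "GET /b", "GET /a"],
    ["GET /a", "GET /c"],
]


def local_count(lines):
    """Map step: runs where the data lives and returns a small partial result."""
    return Counter(line.split()[1] for line in lines)


def merge(left, right):
    """Reduce step: combine two partial results."""
    return left + right


# Only the compact partial counts travel over the network, not the raw lines.
partials = [local_count(partition) for partition in partitions]
totals = reduce(merge, partials, Counter())
print(dict(totals))
```

The raw data never leaves its partition; what gets shipped is the function and the aggregated counts, which is why this approach scales to data volumes that cannot be moved to a single application server.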

60 Big Data Analytics In order to analyze Big Data, the current state of the art is a parallel database or NoSQL data store with a Hadoop connector. There are concerns about performance issues arising from the transfer of large amounts of data between the two systems. The use of connectors could introduce delays and data silos, and increase TCO. What about existing Data Warehouses?

61 Which Analytics Platform for Big Data? NoSQL (document store, key-value store, ...), NewSQL, in-memory DB, Hadoop, Data Warehouses, plus scripts, workflows, and ETL-like data transformations. Are we going back to Federated Databases? This just seems like too many moving parts.

64 High Performance High Functionality Big Data Software Stack - Geoffrey Fox, Judy Qiu, Shantenu Jha, Indiana and Rutgers University xascale.org.bdec/files/whitepapers/fo x.pdf 64

66 Build your own database Spanner: Google's Globally-Distributed Database. Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. Source: "Spanner: Google's Globally-Distributed Database", published in the Proceedings of OSDI'12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October 2012. Recipient of the Jay Lepreau Best Paper Award.

67 Google AdWords Ecosystem One shared database backing Google's core AdWords business. Legacy DB: sharded MySQL.
- Critical applications driving Google's core ad business: 24/7 availability, even with data center outages
- Consistency required: can't afford to process inconsistent data; eventual consistency too complex and painful
- Scale: 10s of TB, replicated to 1000s of machines
F1: A new database, built from scratch, designed to operate at Google scale, without compromising on RDBMS features. Co-developed with a new lower-level storage system, Spanner.
- Better scalability
- Better availability
- Equivalent consistency guarantees
- Equally powerful SQL query

68 Google F1 - A Hybrid Database F1 is a hybrid database combining the scalability of Bigtable with the usability and functionality of SQL databases. Key ideas:
- Scalability: auto-sharded storage
- Availability & consistency: synchronous replication
- High commit latency: can be hidden
- Hierarchical schema
- Protocol buffer column types
- Efficient client code
A scalable database without going NoSQL. Source: "F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business", Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong, SIGMOD, May 22,

69 Hadoop Limitations Hadoop can give powerful analysis, but it is fundamentally a batch-oriented paradigm. The missing piece of the Hadoop puzzle is accounting for real time changes. Apache Hadoop YARN (MapReduce 2.0 (MRv2)) is a sub-project of Hadoop at the Apache Software Foundation that takes Hadoop beyond batch to enable broader data-processing. 69

70 Replacing Hadoop
- Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley (https://spark.apache.org).
- Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache Spark. It offers a unified platform for building Big Data pipelines, from ETL to exploration and dashboards, to advanced analytics and data products.
- The Stratosphere project (TU Berlin, Humboldt University, Hasso Plattner Institute; www.stratosphere.eu) contributes to Apache Flink, a platform for efficient, distributed, general-purpose data processing (flink.incubator.apache.org).
- The ASTERIX project (UC Irvine, started 2009): four years of R&D involving researchers at UC Irvine, UC Riverside, and Oracle Labs. The AsterixDB code base currently consists of over 250K lines of Java code that has been co-developed by project staff and students at UCI and UCR. It is open source under an Apache-style license.

71 Which Language for Analytics? There is a trend in using SQL for analytics and integration of data stores. (e.g. SQL-H, Teradata QueryGrid) Is this good? 71

72 Graphs and Big Data (source: http://neo4j.com/developer/graph-database/) The breadth of problems requiring graph analytics is growing rapidly:
- Large Network Systems
- Social Networks
- Packet Inspection
- Natural Language Understanding
- Semantic Search and Knowledge Discovery
- CyberSecurity


More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

Big Data: A data-driven society?

Big Data: A data-driven society? Big Data: A data-driven society? Roberto V. Zicari Goethe University Frankfurt Director Big Data Lab Frankfurt http://www.bigdata.uni-frankfurt.de Editor ODBMS.org and ODBMS Industry Watch www.odbms.org

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP The Big Deal about Big Data Mike Skinner, CPA CISA CITP HORNE LLP Mike Skinner, CPA CISA CITP Senior Manager, IT Assurance & Risk Services HORNE LLP Focus areas: IT security & risk assessment IT governance,

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to

More information

Data Management in SAP Environments

Data Management in SAP Environments Data Management in SAP Environments the Big Data Impact Berlin, June 2012 Dr. Wolfgang Martin Analyst, ibond Partner und Ventana Research Advisor Data Management in SAP Environments Big Data What it is

More information

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Ramesh Bhashyam Teradata Fellow Teradata Corporation bhashyam.ramesh@teradata.com

Ramesh Bhashyam Teradata Fellow Teradata Corporation bhashyam.ramesh@teradata.com Challenges of Handling Big Data Ramesh Bhashyam Teradata Fellow Teradata Corporation bhashyam.ramesh@teradata.com Trend Too much information is a storage issue, certainly, but too much information is also

More information

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Here comes the flood Tools for Big Data analytics. Guy Chesnot -June, 2012

Here comes the flood Tools for Big Data analytics. Guy Chesnot -June, 2012 Here comes the flood Tools for Big Data analytics Guy Chesnot -June, 2012 Agenda Data flood Implementations Hadoop Not Hadoop 2 Agenda Data flood Implementations Hadoop Not Hadoop 3 Forecast Data Growth

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

Data Warehouse design

Data Warehouse design Data Warehouse design Design of Enterprise Systems University of Pavia 10/12/2013 2h for the first; 2h for hadoop - 1- Table of Contents Big Data Overview Big Data DW & BI Big Data Market Hadoop & Mahout

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

The Big Data Paradigm Shift. Insight Through Automation

The Big Data Paradigm Shift. Insight Through Automation The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.

More information

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

BIG Big Data Public Private Forum

BIG Big Data Public Private Forum DATA STORAGE Martin Strohbach, AGT International (R&D) THE DATA VALUE CHAIN Value Chain Data Acquisition Data Analysis Data Curation Data Storage Data Usage Structured data Unstructured data Event processing

More information

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 12

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 12 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 12 Big Data Management II (NoSQL Databases / CouchDB) Chapter 20: Abiteboul et. Al. + http://guide.couchdb.org/

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Hadoop and its Usage at Facebook. Dhruba Borthakur dhruba@apache.org, June 22 rd, 2009

Hadoop and its Usage at Facebook. Dhruba Borthakur dhruba@apache.org, June 22 rd, 2009 Hadoop and its Usage at Facebook Dhruba Borthakur dhruba@apache.org, June 22 rd, 2009 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed on Hadoop Distributed File System Facebook

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置

More information

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY WA2192 Introduction to Big Data and NoSQL Web Age Solutions Inc. USA: 1-877-517-6540 Canada: 1-866-206-4644 Web: http://www.webagesolutions.com The following terms are trademarks of other companies: Java

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Big Data Big Data/Data Analytics & Software Development

Big Data Big Data/Data Analytics & Software Development Big Data Big Data/Data Analytics & Software Development Danairat T. danairat@gmail.com, 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica jcampbell@vertica.com Big

More information

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait CSC590: Selected Topics BIG DATA & DATA MINING Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait Agenda Introduction What is Big Data Why Big Data? Characteristics of Big Data Applications of Big Data Problems

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications Big Data Big Data: Introduction and Applications August 20, 2015 HKU-HKJC ExCEL3 Seminar Michael Chau, Associate Professor School of Business, The University of Hong Kong Ample opportunities for business

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information