Big Data. On Distributed Systems to a Distributed Artificial Intelligence. Food for Financials, Maastricht, The Netherlands, June 10th, 2014

1 Big Data: On Distributed Systems to a Distributed Artificial Intelligence (Extended Version incl. Some Banking and Financial Market Aspects)
Food for Financials, Maastricht, The Netherlands, June 10th, 2014
Univ.-Prof. Dr. rer. nat. Sabina Jeschke, IMA/ZLW & IfU, Faculty of Mechanical Engineering, RWTH Aachen University

2 Outline
I. A Bit of a Jump into the Deep End
II. In Search of a Definition: From Google Search and the 3Vs of Big Data, Finally Ending up with Distributed Systems and Artificial Intelligence
III. Expansion: Big Facts and High Figures, and Applications in the Banking and Financial Markets
IV. Methods and Concepts behind the Trend: About Challenges, their Solutions and Open Issues, and the Players in the Game
V. A Look Ahead: From Big Data to Cyber Physical Systems, the Internet of Things and Industry 4.0
VI. Summary

3 A bit of a jump into the deep end: Google Trends, the basics
Google Trends: understanding interests. A search term is analyzed relative to the total search volume, across various regions of the world and in various languages.
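The idea of analyzing a search term relative to total search volume can be sketched in a few lines. This is only an illustration: the function name `relative_interest` and the rescaling to a 0-100 range (peak period = 100) are assumptions made for this example, not details taken from the slide.

```python
def relative_interest(term_counts, total_counts):
    """Share of a term in total search volume per period, rescaled so the peak is 100.

    term_counts[i]  : searches for the term in period i (hypothetical input)
    total_counts[i] : all searches in period i, assumed to be greater than 0
    """
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        # The term was never searched: interest is zero everywhere.
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# The term triples in absolute numbers in period 2, but total volume also grows,
# so its relative interest only doubles.
print(relative_interest([10, 30, 20], [1000, 1500, 1000]))
```

Normalizing by total volume is what makes regions and time periods with very different overall search activity comparable.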

4 A bit of a jump into the deep end: Google Flu, predicting the future (predicting the spread of diseases)
It all started with the flu [Google Correlate 2011].
Google Flu Trends can identify the actual flu trend 7-10 days earlier than the official data of the Centers for Disease Control and Prevention (CDC) [Helft 2008].

5 A bit of a jump into the deep end: Pandemics, exploring new patterns of complex scenarios
A circle model to foresee and to analyze pandemics [Brockmann and Helbing 2013].
Computational work conducted at Northwestern University has led to a new mathematical theory for understanding the global spread of epidemics. [ScienceDaily 2013]
The spreading takes place on the worldwide air transportation network of more than 4000 airports and direct links. [Brockmann/Helbing 2013]
Is the spread of infectious diseases complex, or does it just look complex? [Erickson 2013]
Using data on flights, trains, etc., the cities are rearranged: the distances between places and countries are adjusted depending on the flight connections. The result is simple: a circular wave, like the one a stone produces in water.

6 A bit of a jump into the deep end: Predicting human behavior, election forecast
How Nate Silver won the election with data science [Smith 2012]: many data sources, using the past, consistent models, understanding limitations.
The man behind the forecast: Nate Silver (born January 13, 1978)
2008 Presidential Election: 49 out of 50 states correct
2012 Presidential Election: 50 out of 50 states correct
2013 Academy Awards: 3 of 4 winners correct

7 A bit of a jump into the deep end: IBM's Watson in action
Challenge: building a computer system that could compete at the human champion level in real time in the American TV quiz show Jeopardy! [Ferrucci et al. 2010]
Watson is an artificial intelligence capable of answering questions in natural language.
What is Watson? Represented by IBM's Smarter Planet logo, Watson is ten racks of ten Power 750 servers. Watson's life began five years before the show as a Grand Challenge for IBM (like Deep Blue and Blue Gene before).
"I, for one, welcome our new computer overlords." Ken Jennings' response to losing an exhibition Jeopardy match to Watson

8 A bit of a jump into the deep end: Change of society, Google replacing grandparents?
"Grandparent" used to be synonymous with a spring of knowledge, called upon to pass down treasures of information to new generations. [Emling 2013]
Grandparents' knowledge: informal knowledge, everyday life experience, incl. common sense.
The survey of 1,500 grandparents found that children are increasingly using the internet to answer simple questions. [Telegraph 2013]
Towards the next steps in artificial intelligence: Google, from an expert system to a machine with common sense?
Google Trends [Biermann 2013], Trending How To, United States: 1. How to Tie a Tie 2. How to File 3. How to Get a Passport 4. How to Blog 5. How to Knit 6. How to Kiss 7. How to Flirt 8. How to Whistle 9. How to Unjailbreak 10. How to Vader

10 In search of a definition: Let's ask Google
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
Big Data refers to technologies and initiatives that involve data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to address efficiently. Said differently, the volume, velocity or variety of data is too great. But today, new technologies make it possible to realize value from Big Data.
Every day, we create 2.5 quintillion bytes of data - so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

11 Key factors and characteristics: The gist of the matter
What are the main characteristics of data in Big Data?
Volume (data at rest): terabytes to exabytes of existing data to process
Velocity (data in motion): streaming data, milliseconds to seconds to respond, e.g. in high-frequency trading
Variety (data in many forms): structured, unstructured, text, multimedia
Veracity (data in doubt): uncertainty due to data inconsistency, incompleteness, ambiguities, latency, deception, approximations
The 3Vs of Big Data [Gardner 2001] [adapted from Data Science Center]

12 The crux of the matter: Big Data induces intelligence, from Big Data to Smart Data
The Big Data analysis pipeline transfers big data (many ) into smart data (meaningful data), accumulates intelligence from information fragments, and is a pipeline of aggregating (artificial) intelligence.
Pipeline stages: Acquisition/Recording, Extraction/Cleaning/Annotation, Integration/Aggregation/Representation, Analysis/Modeling, Interpretation

13 Further characteristics: Big Data is distributed
Generated by a distributed world: in multiple domains, applications and users generate data that is (partially) Big Data. Hence, Big Data is generated by a distributed world.
Stored in distributed file systems: Big Data is structured and unstructured (variability) and its size is enormous (volume). Distributed file systems are required to reliably scale to petabytes of data and thousands of machines.
Analyzed by distributed computing: the requirements of Big Data analytics regarding volume and velocity can only be satisfied by distributed computing solutions.

14 Facts about the infrastructure of Big Data: About servers and storage
Storing and processing Big Data is a matter of servers and cores.
"We have something over a million servers in our datacenter infrastructure. Google is bigger, Amazon is a little bit smaller." Steve Ballmer, Microsoft's Worldwide Partner Conference, 2013
An unofficial estimate puts the number of Google servers at more than 2 million. [Pearn 2012]
It is estimated that Google owns more than 2% of all the world's servers. [intac 2010]
For Amazon Web Services, it is estimated that Amazon has at least 454,400 servers in seven data center hubs around the globe. [Miller 2012]

15 The way so far and beyond: Two worlds coming together
[Diagram: Distributed Systems (FAST DS: distributed sources, distributed storage, distributed computing) and Artificial Intelligence (SMART AI: natural language analysis, prediction, smart data) come together through Big Data (volume, velocity, variety, veracity; social media data) to form Distributed Artificial Intelligence (real-time capability, autonomy).]

17 Evolution of the term Big Data: Is there a definitive date?
In parallel: the term "software" was established in 1958 in an article in The American Mathematical Monthly written by John Tukey.
What about the term Big Data? Is there such a definitive date?
"We call this the problem of big data." The term was first mentioned in a research article in 1997 [Cox and Ellsworth 1997].
Google Search Trends for Big Data: "Google Trends for Big Data shows an explosive growth in popularity of this term, starting around 2011." Gregory Piatetsky-Shapiro, Editor at KDnuggets

18 Evolution of Big Data as a research topic: Taking a look at the published papers
Big Data in research emerged around 2008 [Halevi and Moed 2012].
First appearance of the term: in 1970, in an article on atmospheric and oceanic soundings.
Until 2000: led by computer engineering, but also present in areas such as building materials, electric and telecommunications.
Since 2000: the field is led by computer science, followed by engineering and mathematics.
[Chart: number of Big Data papers per year, by decade]

19 Evolution of Big Data as a research topic and the related disciplines
Big Data research is addressed by multiple disciplines [Halevi and Moed 2012].
The top subject area in Big Data research is computer science.
Other disciplines investigate the topic as well (like engineering and mathematics).
Some areas expected to be evident show no significant growth (like chemistry, energy and humanities). In fact, there is a growing interest in the development of infrastructure for e-science for the humanities.
[Chart: Big Data papers per subject area. Areas shown: Computer Science; Engineering; Mathematics; Business, Management and Accounting; Physics and Astronomy; Biochemistry, Genetics and Molecular Biology; Social Science; Materials Science; Medicine; Decision Sciences; Multidisciplinary; Arts and Humanities]

20 Evolution of Big Data in banking and financial markets: How banking and financial market players see Big Data
"The industry [financial institutions] has been analyzing structured information for many years, but the new growth now is in unstructured data." [Andy Hirst, senior director of Industry Marketing for SAP, at the International SAP Conference for Financial Services in July, 2013]
"The faster a bank can analyze data, the better the predictive value." [Bryan Yurcan, associate editor for Bank Systems and Technology, 2013]
[Chart 1: Creating a competitive advantage for financial markets firms. Chart 2: Big data activities in banking and financial markets compared with the global figures, split into: pilot and implementation of big data activities; planning big data activities; have not begun big data activities.] [Analytics: The real-world use of big data in financial services, IBM, 2012]

21 Evolution of Big Data in banking and financial markets: Use cases of Big Data in banking and financial markets
1 Fraud Detection: By 2016, 25 percent of large global companies will have adopted big data analytics for at least one security or fraud detection use case and will achieve a positive return on investment within the first six months of implementation. [Gartner Business Intelligence & Analytics Summit 2014]
2 Compliance / Monitoring: Dodd-Frank Act, Solvency II and EMIR define new requirements regarding documentation and monitoring. New deal monitoring systems emerge that are based on Big Data technology.
3 Customer Segmentation: What products are they most likely to be interested in? How can they be persuaded toward the right product for both the Financial Institution and themselves? [ ] All of these relate to customer segmentation, but a much more dynamic way to segment customers than have historically been employed. [Oracle, Big Data Analytics: Financial Services Industry Use Cases, 2012]
4 Risk and Trading Analytics
The use cases are a summarization of [SAP AG, Top 5 Big Data Uses Cases in Banking and Financial Services, 2014] and [Ruchi Verma and Sathyan R. Mani, Infosys, Use of Big Data Technologies in Capital Markets, 2012].
The term big data [ ] made its way into compliance, internal audit and fraud risk management-related publications. [ ] 72% of respondents believe that emerging big data technologies can play a key role in fraud prevention and detection. Yet only 7% of respondents are aware of any specific big data technologies, and only 2% of respondents are actually using them. [Ernst & Young, Big risks require big data thinking, Global Forensic Data Analytics Survey 2014]

22 Evolution of Big Data in banking and financial markets: Customer segmentation
The idea of segmentation: One of the essential topics taught in introductory marketing courses is the concept of market segmentation, which is the division of a market into groups of consumers that share one or more characteristics. [Steve Offsey, CEO MarketBuildr, 2012]
Segmentation in the age of Big Data, understanding the customer's DNA:
From B2C segmentation (geographic, demographic, ) to more complex segmentation models.
New types of segments can be derived due to new types of data (activity-based, social network profiles, social influence and sentiment data).
The result is dynamic micro-segments (identified by data mining and artificial intelligence algorithms).
In fact, we believe that a combination of psychology and data science is the only way for marketing leaders to unlock value from insights that are unlikely to be found purely through data mining. [Punchh Launches Big Data Customer Segmentation 2013] [Big Data: from mining to meaning, Sandra Pickering]
There's much buzz around big data everywhere. But the big problem is that until now, big data analytics has mostly been about tools for large companies with big budgets.

23 Evolution of Big Data in banking and financial markets: Sense & Response systems using Twitter tweets
Trading analytics, changing trading strategies: Capital markets have evolved from simple strategies, like 1980s-paired models, to the intricate gaming strategies of today. Trading strategies have started including unstructured data. [Ruchi Verma and Sathyan R. Mani, Infosys, Use of Big Data Technologies in Capital Markets, 2012]
Example: evolution of a Sense & Response system to handle responsiveness. Social media meets financial services.
Dataminr transforms the Twitter stream into actionable alerts, identifying the most relevant information in real-time for clients in Finance, News and the Public Sector. [dataminr.com, 2014]
On November 11, 2013, a few minutes after 8 a.m. EST, news leaked out from a Canadian newspaper that Blackberry's $4.7 billion buyout had collapsed. Wall Street wouldn't find out for a full 180 seconds, when the newswires picked up the report in real time. [Brian O'Connell, Can Tweets And Facebook Posts Predict Stock Behavior?, 2014]
On March 8, a Royal Caribbean cruise ship arrived in Port Everglades, Florida, with 105 passengers and three crew members sick with norovirus. When that news broke, it sent Royal Caribbean Cruises Ltd.'s share prices tumbling by 2.9%. But Dataminr clients had the news 48 minutes earlier. [Stan Alcorn, Twitter Can Predict The Stock Market, If You're Reading The Right Tweets, 2013]

25 Challenge Scale: Data-parallel models, Flynn classification of distributed systems
This is all about parallelization and distributed computation.
Do the CPUs apply the same instruction to all data, or different ones? (Single Instruction vs. Multiple Instruction)
Do all CPUs have their own storage, or are they sharing it? (Single Data vs. Multiple Data)
SISD: Single Instruction, Single Data. A regular desktop computer (one single CPU; not a distributed system!)
MISD: Multiple Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MIMD: Multiple Instruction, Multiple Data
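The two questions on this slide span a 2x2 grid, which can be written down as a tiny lookup. The sketch below is illustrative only; the function name `flynn_class` and the use of stream counts as inputs are assumptions made for this example.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class (SISD, SIMD, MISD or MIMD) for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one instruction and one data stream")
    instruction = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instruction}I{data}D"


print(flynn_class(1, 1))   # single CPU working on a single data stream
print(flynn_class(1, 8))   # one instruction applied to many data elements
print(flynn_class(4, 4))   # independent processors with independent data
```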

26 Challenge Scale: MapReduce, the first major data-processing paradigm
Example: word frequency in a collection of documents. An extremely high degree of parallelization is possible.
Input: Doc 1 = word1 word2 word2 word3; Doc 2 = word2 word2 word1 word3
Map (word recognition, parallel handling of several documents): Doc 1 yields (word1, 1) (word2, 1) (word2, 1) (word3, 1); Doc 2 yields (word2, 1) (word2, 1) (word1, 1) (word3, 1)
Summarize, group, share (re-arrangement of results by words; before: by documents): (word1, (1,1)) (word2, (1,1,1,1)) (word3, (1,1))
Reduce (summarizing all findings by words): (word1, 2) (word2, 4) (word3, 2)
Result: the frequency of the different words in the collection of documents
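The word-count example on this slide can be replayed as a small single-process sketch. It only mirrors the three phases; the function names are chosen for this illustration, and a real MapReduce system would run the map and reduce calls in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word found in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Summarize/group: re-arrange the pairs by word instead of by document."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return dict(grouped)


def reduce_phase(word, counts):
    """Reduce: sum up all findings for one word."""
    return word, sum(counts)


def word_count(documents):
    # Each map_phase call is independent, which is what makes the model parallelizable.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


docs = ["word1 word2 word2 word3", "word2 word2 word1 word3"]
print(word_count(docs))  # {'word1': 2, 'word2': 4, 'word3': 2}
```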

27 Challenge Scale: The Google Story, data analysis beyond MapReduce [M. Braun, TU Berlin, 2013]
The Google File System (published by Google in 2003): a large distributed file system; files are split into chunks and stored in a redundant fashion.
MapReduce (published by Google in 2004): THE distributed data-processing paradigm, highly parallelizable.
Bigtable (published by Google in 2006): high-performance NoSQL database incl. timestamps, thus keeping old versions (history).
Percolator (published by Google in 2010): describes how the web search index is kept up to date, acting on top of Bigtable.
Pregel (published by Google in 2010): mining graph data; a system for large-scale graph processing.
Dremel (published by Google in 2010): basis for online visualizations. It acts on nested, JSON-like objects instead of tables with fixed fields; core element of BigQuery.
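The remark that Bigtable stores timestamps and thereby keeps old versions can be illustrated with a toy in-memory structure. This is not the Bigtable API: the class `VersionedTable` and its methods are hypothetical names invented for this sketch, which only shows the idea of cells addressed by row, column and timestamp.

```python
class VersionedTable:
    """Toy table whose cells keep every written value together with its timestamp."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value), oldest first

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])

    def get(self, row, column, at=None):
        """Return the newest value, or the newest value written at or before `at`."""
        versions = self._cells.get((row, column), [])
        candidates = [value for ts, value in versions if at is None or ts <= at]
        return candidates[-1] if candidates else None

    def history(self, row, column):
        return list(self._cells.get((row, column), []))


table = VersionedTable()
table.put("com.example", "contents", "<html>v1</html>", timestamp=1)
table.put("com.example", "contents", "<html>v2</html>", timestamp=5)
print(table.get("com.example", "contents"))        # newest version
print(table.get("com.example", "contents", at=3))  # version valid at time 3
```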

28 Challenge Scale: Dremel, fast analysis of nested data [Tech Mortal, 2013]
Speed matters most!
Background: for large amounts of data, batch processing is becoming slower and slower. As an alternative, dialogue-based processing structures are desired, which focus the search on the relevant part of the data.
GOAL: real-time interactive analysis of massive datasets; queries with response times below 20 minutes.
METHOD: columnar storage for fast data scanning; tree architecture for dispatching queries and aggregating results across huge computer clusters; SQL-like queries, more realistic for a speed increase.
TODAY: Dremel is the core element of Google's BigQuery engine, realized as SaaS in a cloud. Google BigQuery allows users to conduct big data analysis with no need to operate a data center.
[Figure: nested data structure of records r1 and r2 with fields A to E; from record-oriented back to column-oriented storage]
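The step from record-oriented to column-oriented storage of nested data can be sketched as follows. This is a deliberately simplified illustration: the helper names `flatten` and `to_columns` are assumptions, missing fields are stored as `None`, and repeated fields are not handled at all, so it does not reproduce Dremel's actual column encoding.

```python
def flatten(record, prefix=""):
    """Turn a nested record into a flat mapping from dotted field path to value."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(records):
    """Store each field path as its own column (one value per record)."""
    flat_records = [flatten(record) for record in records]
    paths = sorted({path for record in flat_records for path in record})
    return {path: [record.get(path) for record in flat_records] for path in paths}


records = [
    {"id": 1, "name": {"first": "Ada", "lang": "en"}},
    {"id": 2, "name": {"first": "Bob"}},
]
columns = to_columns(records)
# An aggregate over one field only needs to scan that single column.
print(sum(columns["id"]))
print(columns["name.lang"])
```

The benefit shows up in the last lines: a query that touches one field reads one column instead of every complete record.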

29 Comparison of hypes: NoSQL database models
NoSQL: "not only SQL" (not: "no SQL")
Background: relational databases are efficient for many but small transactions, or for large transactions with rare writing processes. They perform badly when it comes to many transactions AND many writing processes at the same time.
Data model: beyond the SQL standards based on relational databases, tree-like structures etc. come into play (no need to press data into structures where they do not fit); documents with nested structures instead of tables.
Horizontal scalability: easy, since built on distributed architectures, simply by adding additional nodes.
Famous examples: Google BigTable, Amazon Dynamo, several open source products such as MongoDB.
Media types: text, binary large objects (picture, video, audio), nested structures such as JSON objects, YAML etc.
Improvements: flexible schema and automatic indexes, and optimized for full-text search.
History: development since 1998 (older name: document-oriented databases).
Current development: standard SQL databases are integrating additional NoSQL features.
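Horizontal scalability rests on spreading keys across nodes. A minimal sketch of hash-based partitioning is given below; the function names and the key format are made up for this illustration, and the simple modulo placement is an assumption of the sketch rather than how any particular product works.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Pick a node for a key from a stable hash, so every client computes the same placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys, node_count):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in range(node_count)}
    for key in keys:
        shards[node_for(key, node_count)].append(key)
    return shards


keys = [f"customer-{i}" for i in range(100)]
for node, stored in distribute(keys, 4).items():
    print(node, len(stored))
```

With plain modulo placement, adding a node moves most keys to a different node. Systems such as Amazon Dynamo use consistent hashing to keep that reshuffling small.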

30 Triumph of Big Data computing: Natural language processing, IBM's DeepQA and Watson
Artificial intelligence meets Big Data!
Problem: the open-domain question-answering problem.
Domains: information retrieval, natural language processing, knowledge representation, reasoning, machine learning, computer-human interfaces.
Demo: X ran this? If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.
Background: Consider, for example, the Computer in Star Trek. Taken to its ultimate form, broad and accurate open-domain question answering may well represent a crowning achievement for the field of Artificial Intelligence (AI). [IBM 2011]
Solution: DeepQA, a massively parallel probabilistic evidence-based architecture. 3 years of effort; 20 researchers and software engineers; operated within the winner's cloud; 3 seconds response time. [Ferrucci et al. 2010]

31 Technology timeline: The early years ( )
Data sources: web data, social data
Technologies shown: Berkeley DB, Apache Lucene, Tokyo Cabinet, Nutch, Apache Solr, Google Search, Google File System, Google MapReduce
Larry Page and Sergey Brin begin their research project at Stanford (later: Google).
Doug Cutting writes Lucene, an open source search framework (key component of Nutch, Solr and Elasticsearch).
Larry Page and Sergey Brin present the large-scale hypertextual web search engine Google.
Doug Cutting and Mike Cafarella present Nutch (search engine, formerly part of Lucene).
Google releases a paper about applying MapReduce on large data clusters.
Legend: red box = open source; blue box = Google. Categories: data handling / storage, search, high-level language, distributed computation. Adapted from [Outliers 2013]

32 Technology timeline: Today (since 2005)
Data sources: mobile data
Technologies shown: Apache Hadoop, Apache Pig, MongoDB, Apache Hive, Disco Project, Elasticsearch, Apache Storm, DynamoDB, OrientDB, CouchDB, Google BigTable, Apache HBase, Redis, Apache Spark, Apache Cassandra, Google Dremel
Doug Cutting and Mike Cafarella write Hadoop, a framework derived from Google's MapReduce.
Pig, originated by Yahoo, became an open source project of the Apache Foundation.
Google releases a paper (Melnik et al.) about applying Dremel and BigQuery.
Apache Storm, an event processor and distributed computation framework written in the Clojure programming language, is released; it provides functionality similar to MapReduce.
Legend: red box = open source; blue box = Google. Categories: data handling / storage, search, high-level language, distributed computation. Adapted from [Outliers 2013]

34 A look ahead: About the potential of Big Data in social science
NSF Strategic Plan: The revolution in information and communications technologies is another major factor influencing the conduct of 21st century research. New cyber tools for collecting, analyzing, communicating, and storing information are transforming the conduct of research and learning. One aspect of the information technology revolution is the data deluge, shorthand for the emergence of massive amounts of data and the changing capacity of scientists and engineers to maintain and analyze it.
The new availability of data presents a huge potential for researchers in social science.
Peter Doorn, director at Data Archiving and Network Services: In social science research, there is a great tradition of survey methodology with people doing interviews about all kinds of ideas people may have. However, a new approach is to do things like a sentiment analysis on Twitter posts, for example. This is a totally new way of getting knowledge about what is going on in society.

35 A paradigm shift to computational social science: Big Data opens new opportunities for humanities
Big Data can help to reduce Runkel & McGrath's three-horned dilemma [Chang et al. 2013].
The three horns of Runkel/McGrath's framework are realism, generality and precision. Because of the limitations of the traditional data collection methods, no method can be general, realistic and precise all at the same time. Research methodologies are dilemmatic because of the nature of data collection performed in each research approach. [Kaufmann & Wood 2003]
[Figure adapted from [Chang, Kauffman and Kwon, 2013]: research strategies (judgement tasks, field experiments, formal theory, computer simulations) placed between obtrusive and unobtrusive research operations and between universal and particular behavior systems. Annotations: control is more fully achieved with realism and generality; little control is given up outside the lab; lower in costs; much easier to do now; realism is supported by control, generality more fully than before.]
Overcoming scholastic differences between qualitative and quantitative research methods. Enhances Grounded Theory.
(We) should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. [Halevy et al. 2009]

36 A bit of a jump into the deep end: Predicting human behavior, transparent consumers
How Target figured out a teen girl was pregnant before her father did.
Unique Target ID: each interaction with the retailer is assigned to that ID.
Customer profiles: clustering customers into groups, for example to identify disruptions in life (e.g. weddings, job changes and pregnancy).
Group of pregnant customers: targeted with a coupon campaign.
Andrew Pole, statistician working for Target: Pole identified about 25 products that allowed him to assign each customer a pregnancy prediction score and the estimated due date.

37 A look ahead: Robot recruiting, "Don't call us, we'll call you"
In more and more companies, computer algorithms are part of the hiring of new workers [Handelsblatt 03/2014]. CV data are combined with success data of the particular company or field.
Germany: about 40%; USA: > 90%
"Great developers are everywhere, and Gild can prove it."
On fairness: humans and computers use different mental models. Software-based selection is incapable of analyzing true motivation, extraordinary engagement etc. Talents might be overlooked or lost.
But: selection shows a higher degree of equal opportunities regarding gender, age, culture, etc. Selection shows a higher degree of tolerance with respect to disruptions in the CV.

38 Multi- and interdisciplinary challenge: The Sexiest Job of the 21st Century
Transforming data into business value: the data scientist. It's a high-ranking professional with the training and curiosity to make discoveries in the world of big data. [Davenport and Patil 2012]
A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It's almost like a Renaissance individual who really wants to learn and bring change to an organization. [IBM 2014]
Demand for people with deep expertise in data analysis [McDonnell 2011]. Towards artificial intelligence: and with a sound knowledge of machine learning, natural language processing, etc.
[Chart, in thousands: 2008 employment (156), forecast of graduates (+161), adjustments (reemployment & attrition), projected 2018 supply, and the resulting gap in %]

39 Facts about the digital universe: Some facts on data in the digital universe
Fact 1: From 2005 to 2020, the digital universe will grow by a factor of 300 (more than 5,200 gigabytes for every man, woman, and child). From now until 2020, the digital universe will about double every two years. [IDC 2012]
Fact 2: The investment in spending on IT considered the "infrastructure" of the digital universe will grow by 40% between 2012 and [ ]. As a result, the investment per GB will drop from $2.00 to $0.20. [IDC 2012]
Fact 3: A majority of the information in the digital universe, 68% in 2012, is created by consumers [ ]. Yet enterprises have liability or responsibility for nearly 80%. They deal with issues of copyright, privacy, and compliance. [IDC 2012]
Fact 4: Only a tiny fraction of the digital universe has been explored for analytic value. By 2020, as much as 33% of the digital universe will contain information that might be valuable if analyzed. [IDC 2012]
[Figure: The Digital Universe [IDC 2012]. Consumer-generated (1,934 EB), enterprise touch (2,225 EB), overlap (1,342 EB); share of the digital universe that is useful if tagged & analyzed]
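The two growth statements in Fact 1 can be checked against each other with a little arithmetic. The sketch below assumes smooth exponential growth over the whole period, which is a simplification made for this example; the helper names are invented here.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly factor that compounds to `total_factor` over `years`."""
    return total_factor ** (1 / years)


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(annual_factor)


yearly = annual_growth_factor(300, 2020 - 2005)
print(f"yearly growth factor: {yearly:.3f}")
print(f"doubling time: {doubling_time_years(yearly):.2f} years")

# Doubling every two years between 2012 and 2020 means four doublings.
print(2 ** ((2020 - 2012) / 2))
```

A factor of 300 over 15 years corresponds to roughly 46% growth per year and a doubling time of about 1.8 years, which fits the statement that the digital universe about doubles every two years.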

40 The fourth industrial (r)evolution: Big Data meets Industry, everybody & everything is networked
"The first three industrial revolutions came about as a result of mechanisation, electricity and IT. The introduction of the Internet of Things is ushering in a fourth industrial revolution. Industry 4.0 will address and solve some of the challenges facing the world today such as resource and energy efficiency, urban production and demographic change." Henning Kagermann et al., acatech, 2013
[Images: vision of the Wireless Next Generation System (WiNGS) Lab at the University of Texas at San Antonio, Dr. Kelley; Weidmüller, Vision Industrial Revolution 4.0: intelligently networked, self-controlling manufacturing systems]
Around 1750, 1st industrial revolution: mechanical production, systematically using the power of water and steam.
Around 1900, power revolution: centralized electric power infrastructure; mass production by division of labor.
Around 1970, digital revolution: digital computing and communication technology, enhancing systems' intelligence.
Today, information revolution: everybody and everything is networked; networked information as a huge brain.

41 Cyber-Physical Systems: Towards complex and networked socio-technical systems
Let's have a look: communication, consumer, energy, infrastructure, health care, manufacturing, military, robotics, transportation. [CAR2CAR, 2011] and [ConnectSafe, 2011]

42 The fourth industrial (r)evolution: Not restricted to industry, Cyber Physical Systems in all areas
Back to: "The earth converted into a huge brain" (Tesla, 1926)
Integrating complex information from multiple heterogeneous sources opens multiple possibilities for optimization, e.g. energy consumption, security services and rescue services, as well as increasing the quality of life.
Examples: building automation, smart metering, smart grid, room automation, smart environment, and more.

44 Summary: Big Data, underneath the hype
So what exactly is the real revolution? It's not data. It's being data-driven. Collect and organize data and use tools to extract information and gain insights. Quickly and effectively test the derived hypothesis to prove cause-and-effect. The real data revolution won't be a sugar-coated miracle pill that anyone can adopt simply by buying some software, hiring a data scientist, and a cloud full of data. Many organizations will be usurped by new competitors who grow up natively with this new worldview. [Brinker 2013]
Collect from distributed sources and organize in distributed systems: social media and crowd data collection; distributed storage, querying and analysis.
Derive and test hypotheses and analyze for insights: hypothesis/evidence scoring based on evidence models; sense and response; predictive analysis.
Distributed systems, technically and socially. Artificial intelligence: smart computer systems. Together: distributed artificial intelligence (e.g. the Google Car).

45 Thank you! Univ.-Prof. Dr. rer. nat. Sabina Jeschke, Head of Institute Cluster IMA/ZLW & IfU. Co-authored by: Dr.-Ing. Tobias Meisen, Institute Cluster IMA/ZLW & IfU.

46 Prof. Dr. rer. nat. Sabina Jeschke. Born in Kungälv/Sweden; 1991 Birth of Son Björn-Marcel; Studies of Physics, Mathematics, Computer Sciences, TU Berlin; 1994 NASA Ames Research Center, Moffett Field, CA/USA; 10/1994 Fellowship Studienstiftung des Deutschen Volkes; 1997 Diploma Physics; Research Fellow, TU Berlin, Institute for Mathematics; Lecturer, Georgia Institute of Technology, GA/USA; Project leadership, TU Berlin, Institute for Mathematics; 04/2004 Ph.D. (Dr. rer. nat.), TU Berlin, in the field of Computer Sciences; from 2004 Set-up and leadership of the Multimedia-Center at the TU Berlin; Juniorprofessor New Media in Mathematics & Sciences & Director of the Media-center MuLF, TU Berlin; Univ.-Professor, Institute for IT Service Technologies (IITS) & Director of the Computer Center (RUS), Department of Electrical Engineering, University of Stuttgart; since 06/2009 Univ.-Professor, Institute for Information Management in Mechanical Engineering (IMA) & Center for Learning and Knowledge Management (ZLW) & Institute for Management Cybernetics (IfU), RWTH Aachen University; since 10/2011 Vice dean of the department of Mechanical Engineering, RWTH Aachen University; since 03/2012 Chairwoman VDI Aachen.

Big Data. On Distributed Systems to a Distributed Artificial Intelligence. Aachener Dienstleistungsforum Aachen, Germany, March 26 th, 2014

Big Data. On Distributed Systems to a Distributed Artificial Intelligence. Aachener Dienstleistungsforum Aachen, Germany, March 26 th, 2014 Big Data On Distributed Systems to a Distributed Artificial Intelligence Aachener Dienstleistungsforum Aachen, Germany, March 26 th, 2014 Univ.-Prof. Dr. rer. nat. Sabina Jeschke IMA/ZLW & IfU Faculty

More information

ehumanities From big data and digital technologies to new and/or enhanced methods in humanities and social sciences

ehumanities From big data and digital technologies to new and/or enhanced methods in humanities and social sciences ehumanities From big data and digital technologies to new and/or enhanced methods in humanities and social sciences ICT-Workshop ehumanities Aachen, Germany, April 9 th, 2014 Univ.-Prof. Dr. rer. nat.

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Getting to Know Big Data

Getting to Know Big Data Getting to Know Big Data Dr. Putchong Uthayopas Department of Computer Engineering, Faculty of Engineering, Kasetsart University Email: putchong@ku.th Information Tsunami Rapid expansion of Smartphone

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

How To Use Hadoop For Gis

How To Use Hadoop For Gis 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Science Overview Why, What, How, Who Outline Why Data Science?

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Doing Multidisciplinary Research in Data Science

Doing Multidisciplinary Research in Data Science Doing Multidisciplinary Research in Data Science Assoc.Prof. Abzetdin ADAMOV CeDAWI - Center for Data Analytics and Web Insights Qafqaz University aadamov@qu.edu.az http://ce.qu.edu.az/~aadamov 16 May

More information

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

Industry 4.0 and Big Data

Industry 4.0 and Big Data Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and

More information

Smarter Planet evolution

Smarter Planet evolution Smarter Planet evolution 13/03/2012 2012 IBM Corporation Ignacio Pérez González Enterprise Architect ignacio.perez@es.ibm.com @ignaciopr Mike May Technologies of the Change Capabilities Tendencies Vision

More information

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

Big Data Mining: Challenges and Opportunities to Forecast Future Scenario

Big Data Mining: Challenges and Opportunities to Forecast Future Scenario Big Data Mining: Challenges and Opportunities to Forecast Future Scenario Poonam G. Sawant, Dr. B.L.Desai Assist. Professor, Dept. of MCA, SIMCA, Savitribai Phule Pune University, Pune, Maharashtra, India

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Big Analytics: A Next Generation Roadmap

Big Analytics: A Next Generation Roadmap Big Analytics: A Next Generation Roadmap Cloud Developers Summit & Expo: October 1, 2014 Neil Fox, CTO: SoftServe, Inc. 2014 SoftServe, Inc. Remember Life Before The Web? 1994 Even Revolutions Take Time

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address

More information

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

Big Data + Predictive Analytics = Actionable Business Insights: Consider Big Data as the Most Important Thing for Business since the Internet

Big Data + Predictive Analytics = Actionable Business Insights: Consider Big Data as the Most Important Thing for Business since the Internet Big Data + Predictive Analytics = Actionable Business Insights: Consider Big Data as the Most Important Thing for Business since the Internet Adapted from the forthcoming book, Business Innovation in the

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Big Data Analytics in Space Exploration and Entrepreneurship

Big Data Analytics in Space Exploration and Entrepreneurship Space Society of Silicon Valley Big Data Analytics in Space Exploration and Entrepreneurship Tiffani Crawford, PhD January 14, 2015 Big Data Analytics Data Characteristics Large quantities of many data

More information

White Paper: Hadoop for Intelligence Analysis

White Paper: Hadoop for Intelligence Analysis CTOlabs.com White Paper: Hadoop for Intelligence Analysis July 2011 A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and

More information

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6 Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...

More information

Big Data & Analytics for Semiconductor Manufacturing

Big Data & Analytics for Semiconductor Manufacturing Big Data & Analytics for Semiconductor Manufacturing 半 導 体 生 産 におけるビッグデータ 活 用 Ryuichiro Hattori 服 部 隆 一 郎 Intelligent SCM and MFG solution Leader Global CoC (Center of Competence) Electronics team General

More information

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait CSC590: Selected Topics BIG DATA & DATA MINING Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait Agenda Introduction What is Big Data Why Big Data? Characteristics of Big Data Applications of Big Data Problems

More information

Introduction to Big Data the four V's

Introduction to Big Data the four V's Chapter 1: Introduction to Big Data the four V's This chapter is mainly based on the Big Data script by Donald Kossmann and Nesime Tatbul (ETH Zürich) Big Data Management and Analytics 15 Goal of Today

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

A Strategic Approach to Unlock the Opportunities from Big Data

A Strategic Approach to Unlock the Opportunities from Big Data A Strategic Approach to Unlock the Opportunities from Big Data Yue Pan, Chief Scientist for Information Management and Healthcare IBM Research - China [contacts: panyue@cn.ibm.com ] Big Data or Big Illusion?

More information

Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada

Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada What is big data? Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada 1 2011 IBM Corporation Agenda The world is changing What

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

Big Data and Industrial Internet

Big Data and Industrial Internet Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP The Big Deal about Big Data Mike Skinner, CPA CISA CITP HORNE LLP Mike Skinner, CPA CISA CITP Senior Manager, IT Assurance & Risk Services HORNE LLP Focus areas: IT security & risk assessment IT governance,

More information

BIG DATA MARKETING: THE NEXUS OF MARKETING, ANALYSTS, AND IT

BIG DATA MARKETING: THE NEXUS OF MARKETING, ANALYSTS, AND IT BIG DATA MARKETING: THE NEXUS OF MARKETING, ANALYSTS, AND IT The term Big Data is definitely a leading contender for the marketing buzz-phrase of 2012. On November 11, 2011, a Google search on the phrase

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

BIG DATA I N B A N K I N G

BIG DATA I N B A N K I N G $ BIG DATA IN BANKING Table of contents What is Big Data?... How data science creates value in Banking... Best practices for Banking. Case studies... 3 7 10 1. Fraud detection... 2. Contact center efficiency

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline

More information

From Data to Foresight:

From Data to Foresight: Laura Haas, IBM Fellow IBM Research - Almaden From Data to Foresight: Leveraging Data and Analytics for Materials Research 1 2011 IBM Corporation The road from data to foresight is long? Consumer Reports

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Collection, storage and extraction of business value from data generated from a variety of sources are

Big Data Explained. An introduction to Big Data Science.

Presentation Agenda: What is Big Data. Why learn Big Data. Who is it for. How to start learning Big Data. When to learn it. Objective and Benefits of

Big Data Introduction, Importance and Current Perspective of Challenges

International Journal of Advances in Engineering Science and Technology 221. Available online at www.ijaestonline.com ISSN: 2319-1120

The Future of Data Management

The Future of Data Management with Hadoop and the Enterprise Data Hub. Amr Awadallah (@awadallah), Cofounder and CTO. Cloudera Snapshot: Founded 2008, by former employees of Employees Today ~ 800 World Class

Executive Summary. Introduction. Defining Big Data. The Importance of Big Data. Building a Big Data Platform.

Infrastructure Requirements. Solution Spectrum. Oracle's Big Data

Application Development. A Paradigm Shift

Application Development for the Cloud: A Paradigm Shift. Ramesh Rangachar, Intelsat. 2012 by Intelsat. Published by The Aerospace Corporation with permission. Motivation for the

Is Big Data a Big Deal? What Big Data Does to Science

Netherlands eScience Center. Wilco Hazeleger. Student @ Wageningen University and Reading University Meteorology PhD @ Utrecht University,

A New Era Of Analytic

Penang egovernment Seminar 2014. Megat Anuar Idris, Head, Project Delivery, Business Analytics & Big Data. Agenda: Overview of Big Data; Case Studies on Big Data; Big Data Technology Readiness

Educational Opportunities in Big Data

Could current Big Gaps in Talent fill the void and Big Market Demand? Dr. KRS Murthy Dr.Sri.Murthy@Gmail.Com BigDataExpert@Gmail.Com (408)-464-3333 Big Gaps in Big

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

Big Data Analytics. Tom Haughey, InfoModel, LLC, 868 Woodfield Road, Franklin Lakes, NJ 07417, 201 755 3350, tom.haughey@infomodelusa.com

Big Data. Fast Forward. Putting data to productive use

What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

An interdisciplinary model for analytics education

Raffaella Settimi, PhD, School of Computing, DePaul University. Drew Conway's Data Science Venn Diagram: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Hadoop Big Data for Processing Data and Performing Workload

Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3. 1 M Tech Student, 2 Associate Professor, 3 Professor & Head (PG), of Computer

Hadoop Ecosystem by Rahim A.

History of Hadoop: Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
