Defining Architecture Components of the Big Data Ecosystem


Slide 1. Defining Architecture Components of the Big Data Ecosystem
Yuri Demchenko, SNE Group, University of Amsterdam
2nd BDDAC2014 Symposium, CTS2014 Conference, May 2014, Minneapolis, USA

Slide 2. Outline
- Big Data and Data Intensive Science as a new technology wave
- The Fourth Paradigm
- Big Data definition: from 6 Vs to 5 parts
- Big Data technology drivers: where do the data come from? What are the Big Data drivers?
- Big Data: paradigm change and new challenges
- From Big Data to All-Data
- Moving to data-centric service models
- Defining the Big Data Architecture Framework (BDAF)
- Big Data Infrastructure (BDI) and Big Data Analytics infrastructure/tools
- Summary and discussion
Big Data Architecture Framework

Slide 3. Big Data and Security Research at System and Network Engineering, University of Amsterdam
- Long-time research and development on infrastructure services and facilities
  - High-speed optical networking and data intensive applications
  - Semantic description of infrastructure and network services
  - Collaborative systems, Grid, Clouds and currently Big Data
- Focus on infrastructure definition and services
  - Software Defined Infrastructure based on Cloud/Intercloud technologies
  - Dynamically provisioned security infrastructure and services
- NIST Big Data Working Group
  - Contribution to the Reference Architecture, Big Data Definition and Taxonomy, Big Data Security
- Research Data Alliance (RDA)
  - Interest Group on Education and Skills Development on Data Intensive Science
  - Big Data Analytics Interest Group
- Big Data Interest Group at UvA
  - Non-formal but active, meets every two weeks or monthly
  - Provided input to NIST BD-WG and RDA activities

Slide 4. Technology Definitions and Timeline: Overview
- Service Oriented Architecture (SOA): first proposed in 1996 and revived with the advent of Web Services
  - Currently an industry standard and widely used
  - Provided a conceptual basis for Web Services development
- Computer Grids: initially proposed in 1998 and finally shaped in 2003 with the Open Grid Services Architecture (OGSA) by the Open Grid Forum (OGF)
  - Currently remain a collaborative environment
  - Migrating to cloud and inter-cloud platforms
- Cloud Computing: initially proposed in 2008
  - Now in a productive phase
  - Defined new features, capabilities and operational/usage models, and actually provided guidance for new technology development
  - Originated from the Service Computing domain and is focused on service management
- Big Data and Data Intensive Science: yet to be defined
  - Involves more components and processes to be included in the definition
  - Can be better defined as an Ecosystem where data are the main driving component
  - Need to define the Big Data properties and expected technology capabilities, and to provide guidance/vision for future technology development

Slide 5. Visionaries and Drivers: Seminal Works, High-Level Reports, Activities
- The Fourth Paradigm: Data-Intensive Scientific Discovery. By Jim Gray, Microsoft; edited by Tony Hey, et al.
- Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data. October
- NIST Big Data Working Group (NBD-WG)
- ISO/IEC JTC1 Big Data Study Group (SGBD)
- AAA Study: Study on AAA Platforms For Scientific data/information Resources in Europe, TERENA, UvA, LIBER, UnivDeb.

Slide 6. The Fourth Paradigm of Scientific Research
1. Theory and logical reasoning
2. Observation or experiment
   - E.g. Newton observed apples falling to develop his theory of mechanics
   - But Galileo Galilei made experiments with objects falling from the Leaning Tower of Pisa
3. Simulation of theory or model
   - Digital simulation can prove a theory or model
4. Data-driven scientific discovery (aka Data Science)
   - More data beats hypothesised theory

Slide 7. Gartner Technology Hype Cycle (October 2013)
[Figure: Gartner Hype Cycle with the positions of Big Data and Cloud Computing marked.]

Slide 8. Our (SNE) Big Data Technology Research Cycle
[Figure: SNE research activities plotted against the hype cycle, with time markers for Cloud Computing (2012), Big Data (mid/end 2013) and 2014 (mid/end). Activity labels: remote following of BD technology; EU Study AAA for Research Data; main research in Cloud/Intercloud; mastering component technologies; education course development; active research into Big Data domain definition; building community links; active and productive research; teaching on Big Data technologies/infrastructure; new style of technology development; technology consolidation.]

Slide 9. Big Data Definitions Overview
- IDC definition of Big Data (conservative and strict approach): "A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis"
- Gartner definition: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."
  - Termed a 3-part definition, not a 3V definition
- "Big Data: a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques." From "The Big Data Long Tail" blog post by Jason Bloomberg (Jan 17, 2013)
- "Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it." Ed Dumbill, program chair for the O'Reilly Strata Conference

Slide 10. Improved: 6 (5+1) Vs of Big Data
Adopted in general by NIST BD-WG
- Volume: Terabytes; Records/Archives; Tables, Files; Distributed
- Velocity: Batch; Real/near-time; Processes; Streams
- Variety: Structured; Unstructured; Multi-factor; Probabilistic; Linked; Dynamic
- Value: Correlations; Statistical; Events; Hypothetical
- Veracity: Trustworthiness; Authenticity; Origin, Reputation; Availability; Accountability
- Variability: Changing data; Changing model; Linkage
Generic Big Data properties: Volume, Variety, Velocity (the commonly accepted 3Vs of Big Data)
Acquired properties (after entering the system): Value, Veracity, Variability
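The split between generic and acquired properties can be captured as a small lookup table. The sketch below is a hypothetical Python encoding of this slide: the dictionary name SIX_VS, the "generic"/"acquired" phase labels and the helper functions are illustrative assumptions, while the facet lists are copied from the figure.

```python
# Hypothetical encoding of the 6 (5+1) Vs from the slide.
# "generic"  = properties the data already has at its origin;
# "acquired" = properties the data obtains after entering the system.
SIX_VS = {
    "Volume":      {"phase": "generic",  "facets": ["Terabytes", "Records/Archives", "Tables, Files", "Distributed"]},
    "Velocity":    {"phase": "generic",  "facets": ["Batch", "Real/near-time", "Processes", "Streams"]},
    "Variety":     {"phase": "generic",  "facets": ["Structured", "Unstructured", "Multi-factor",
                                                    "Probabilistic", "Linked", "Dynamic"]},
    "Value":       {"phase": "acquired", "facets": ["Correlations", "Statistical", "Events", "Hypothetical"]},
    "Veracity":    {"phase": "acquired", "facets": ["Trustworthiness", "Authenticity", "Origin, Reputation",
                                                    "Availability", "Accountability"]},
    "Variability": {"phase": "acquired", "facets": ["Changing data", "Changing model", "Linkage"]},
}


def properties_in_phase(phase: str) -> list[str]:
    """Return the V names classified under the given phase, in table order."""
    return [name for name, info in SIX_VS.items() if info["phase"] == phase]


def describe(name: str) -> str:
    """One-line summary of a single V and its facets."""
    info = SIX_VS[name]
    return f"{name} ({info['phase']}): " + "; ".join(info["facets"])


print(properties_in_phase("generic"))   # → ['Volume', 'Velocity', 'Variety']
print(properties_in_phase("acquired"))  # → ['Value', 'Veracity', 'Variability']
```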

Slide 11. Big Data Definition: From 6V to 5 Parts (1)
(1) Big Data Properties: 5V
  - Volume, Variety, Velocity, Value, Veracity
  - Additionally: Data Dynamicity (Variability)
(2) New Data Models
  - Data lifecycle and variability
  - Data linking, provenance and referral integrity
(3) New Analytics
  - Real-time/streaming analytics, interactive and machine learning analytics
(4) New Infrastructure and Tools
  - High performance computing, storage, network
  - Heterogeneous multi-provider services integration
  - New data-centric (multi-stakeholder) service models
  - New data-centric security models for trusted infrastructure and data processing and storage
(5) Source and Target
  - High velocity/speed data capture from a variety of sensors and data sources
  - Data delivery to different visualisation and actionable systems and consumers
  - Fully digitised input and output, (ubiquitous) sensor networks, full digital control

Slide 12. Big Data Definition: From 6V to 5 Parts (1)
(1) Big Data Properties: 5V
  - Volume, Variety, Velocity, Value, Veracity
  - Additionally: Data Dynamicity (Variability)
(2) New Data Models
  - Data linking, provenance and referral integrity
  - Data lifecycle and variability/evolution
(3) New Analytics
  - Real-time/streaming analytics, interactive and machine learning analytics
(4) New Infrastructure and Tools
  - High performance computing, storage, network
  - Heterogeneous multi-provider services integration
  - New data-centric (multi-stakeholder) service models
  - New data-centric security models for trusted infrastructure and data processing and storage
(5) Source and Target
  - High velocity/speed data capture from a variety of sensors and data sources
  - Data delivery to different visualisation and actionable systems and consumers
  - Fully digitised input and output, (ubiquitous) sensor networks, full digital control

Slide 13. Big Data Definition: From 6V to 5 Parts (2)
Refining the Gartner definition:
"Big data is (1) high-volume, high-velocity and high-variety information assets that demand (3) cost-effective, innovative forms of information processing for (5) enhanced insight and decision making"
Big Data (Data Intensive) Technologies target the processing of (1) high-volume, high-velocity, high-variety data (sets/assets) to extract the intended data value and ensure high veracity of the original data and the obtained information; such data demand cost-effective, innovative forms of data and information processing (analytics) for enhanced insight, decision making and process control. All of these demand (should be supported by) new data models (supporting all data states and stages during the whole data lifecycle) and new infrastructure services and tools that also allow obtaining (and processing) data from a variety of sources (including sensor networks) and delivering data in a variety of forms to different data and information consumers and devices.
(1) Big Data Properties: 5V
(2) New Data Models
(3) New Analytics
(4) New Infrastructure and Tools
(5) Source and Target

Slide 14. Big Data Nature: Origin and Target (Consumers)
[Figure: Big Data origins and target uses joined by Data Transformation, annotated with Volume, Velocity, Variety & Value, Veracity, Variability.]
- Big Data origin: science; Internet, Web; industry; business; living environment, cities; social media and networks; healthcare; telecom/infrastructure
- Big Data target use: scientific discovery; new technologies; manufacturing, processes, transport; personal services, campaigns; living environment support; healthcare support; social networking

Slide 15. Big Data Technology Drivers (1)
- Modern e-science in search of new knowledge
  - Scientific experiments and tools are becoming bigger and heavily based on data processing and mining
  - The long tail of science
- Traditional data-intensive industry
  - Genomic research, drug development, healthcare
  - High-tech industry, CAD/CAM, weather/climate, etc.
- Customer-facing industry and companies
  - Advertisement, retail business, service delivery
- Intelligence and security
- Network/infrastructure management
  - Network monitoring, intrusion detection, troubleshooting

Slide 16. The Long Tail of Science (aka "Dark Data")
- Collectively, Long Tail science generates a lot of data
- Estimated at over 1 PB per year, and growing fast with the proliferation of new technology
- Rule: 20% of users generate 80% of data, but not necessarily 80% of knowledge
Source: Dennis Gannon (Microsoft), NIST Big Data Workshop, 2012

Slide 17. Big Data Technology Drivers: Technology Loop
- Technology loop (known as the Jevons Paradox): increased efficiency in processing current demand creates new uses and increases demand even more
- Elastic demand for work: a doubling of fuel efficiency more than doubles the work demanded, increasing the amount of fuel used; the Jevons paradox occurs
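The elastic-demand argument can be checked with a little arithmetic. The sketch assumes a constant-elasticity demand curve, an illustrative model that is not part of the slide: work demanded is W = A * c^(-e), where c is the fuel cost per unit of work (fuel price divided by efficiency) and e is the price elasticity of demand. Fuel used is W divided by efficiency, so doubling efficiency multiplies fuel use by 2^(e - 1), which exceeds 1 exactly when e > 1. The function names and default parameter values are hypothetical.

```python
def work_demanded(efficiency: float, fuel_price: float = 1.0,
                  scale: float = 100.0, elasticity: float = 1.5) -> float:
    """Assumed constant-elasticity demand: W = scale * cost_per_unit_work ** (-elasticity)."""
    cost_per_unit_work = fuel_price / efficiency
    return scale * cost_per_unit_work ** (-elasticity)


def fuel_used(efficiency: float, **kwargs) -> float:
    """Fuel consumed = work demanded / efficiency."""
    return work_demanded(efficiency, **kwargs) / efficiency


# Compare inelastic (0.5) and elastic (1.5) demand when efficiency doubles from 1 to 2.
for elasticity in (0.5, 1.5):
    work_ratio = work_demanded(2.0, elasticity=elasticity) / work_demanded(1.0, elasticity=elasticity)
    fuel_ratio = fuel_used(2.0, elasticity=elasticity) / fuel_used(1.0, elasticity=elasticity)
    print(f"elasticity {elasticity}: work x{work_ratio:.2f}, fuel x{fuel_ratio:.2f}")
```

With elasticity 1.5, doubling efficiency multiplies the work demanded by about 2.83 and fuel use by about 1.41, which is the Jevons case described on the slide; with elasticity 0.5, fuel use falls to about 0.71 of its previous level.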

Slide 18. Big Data Technology Drivers (2)
- Managing public campaigns, e.g. elections, public relations
  - The growing volume of public opinion stored on platforms such as Twitter, Google, Facebook, etc. provides enough intelligence to influence campaign development, timing, geography and even the colour of the campaign signs
  - Twitter was a major source of data aggregation for the Republican race in the US
  - Multimillion-dollar contract for data management and collection services awarded on May 1, 2013 to Liberty Work to build an advanced list of voters
- Article "In Data We Trust" by T. Edsall in The New York Times
- Book: In Data We Trust: How Customer Data is Revolutionising Our Economy (Aug 2012)
- A strategy for tomorrow's data world

Slide 19. NIST Big Data Working Group (NBD-WG) and ISO/IEC JTC1 Study Group on Big Data (SGBD)
- NBD-WG: started June; weekly calls, open participation, mailing list
- Targeted formal delivery of a set of NIST documents in Autumn 2014:
  - Volume 1: NIST Big Data Definitions
  - Volume 2: NIST Big Data Taxonomies
  - Volume 3: NIST Big Data Use Case & Requirements (co-chair Geoffrey Fox)
  - Volume 4: NIST Big Data Security and Privacy Requirements
  - Volume 5: NIST Big Data Architectures White Paper Survey
  - Volume 6: NIST Big Data Reference Architecture
  - Volume 7: NIST Big Data Technology Roadmap
- ISO/IEC Study Group on Big Data (SGBD)
  - Term: December 2013 to September 2014
  - Extends NIST BD-WG activity and scope
  - 2nd meeting hosted in Amsterdam, May 2014

Slide 20. 2nd ISO/IEC SGBD Meeting, May 2014: Discussions and Results
- Two-day workshop, May 2014
- EU and NL focus, UvA activities
- Refining the Big Data technology definition and the Big Data Architecture definition
- New items proposed:
  - Big Data market aspects
  - Data ownership (including during data lifecycle/staging and aggregation)
  - Opacity (obfuscation) of data linkage during processing
  - Data linkage

Slide 21. From Big Data to All-Data: Paradigm Change
- Breaking, paradigm-changing factors: data storage and processing; security; identification and provenance
- Traditional model: BIG Storage and BIG Computer with a FAT pipe; move compute to data vs move data to compute
- New paradigm: continuous data production; continuous data processing; DataBus as a data container and protocol
[Figure: distributed Big Data storage and distributed compute and analytics (the "Big Computer") linked through a data abstraction, the Data Bus and an infrastructure abstraction, feeding visualisation, presentation and action; annotated "Move or not to move? Network?"]
- DataBus: (1) data container; (2) metadata, state; (3) data transfer protocol
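The three DataBus elements (data container; metadata and state; data transfer protocol) can be illustrated with a small Python sketch. The class name DataBusMessage, the length-prefixed JSON header and the SHA-256 integrity check are assumptions made for illustration; the slide names the three elements but does not specify a wire format.

```python
import hashlib
import json
import struct
from dataclasses import dataclass, field


@dataclass
class DataBusMessage:
    """Hypothetical DataBus unit: (1) data container plus (2) metadata and state."""
    payload: bytes                                # (1) the data itself
    metadata: dict = field(default_factory=dict)  # (2) descriptive metadata
    state: str = "raw"                            # (2) lifecycle state, e.g. "raw", "processed"

    def to_wire(self) -> bytes:
        """(3) Assumed transfer format: 4-byte header length, JSON header, raw payload."""
        header = {
            "metadata": self.metadata,
            "state": self.state,
            "length": len(self.payload),
            "sha256": hashlib.sha256(self.payload).hexdigest(),
        }
        header_bytes = json.dumps(header, sort_keys=True).encode("utf-8")
        return struct.pack(">I", len(header_bytes)) + header_bytes + self.payload

    @classmethod
    def from_wire(cls, frame: bytes) -> "DataBusMessage":
        """Parse a frame and verify that the payload arrived complete and unmodified."""
        (header_len,) = struct.unpack(">I", frame[:4])
        header = json.loads(frame[4:4 + header_len].decode("utf-8"))
        payload = frame[4 + header_len:]
        if len(payload) != header["length"]:
            raise ValueError("truncated payload")
        if hashlib.sha256(payload).hexdigest() != header["sha256"]:
            raise ValueError("payload integrity check failed")
        return cls(payload=payload, metadata=header["metadata"], state=header["state"])


msg = DataBusMessage(b"temperature=21.5", {"source": "sensor-42"}, "raw")
received = DataBusMessage.from_wire(msg.to_wire())
print(received == msg)  # → True
```

Keeping metadata and state in the same frame as the payload is one way to let the data carry its own context between storage and compute, rather than relying on the host that stores it.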

Slide 22. Moving to Data-Centric Models and Technologies
- Current IT and communication technologies are host-based or host-centric
  - Any communication or processing is bound to the host/computer that runs the software
  - Especially in security: all security models are host/client based
- Big Data requires new data-centric models
  - Data location, search, access
  - Data integrity and identification
  - Data lifecycle and variability
  - Data-centric (declarative) programming models
  - Data-aware infrastructure to support new data formats and data-centric programming models
  - Data-centric security and access control
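One common way to make identification and integrity properties of the data rather than of the host is content addressing, where an object's identifier is derived from a hash of its bytes. This technique is not named on the slide; the ContentStore class below is a hypothetical, in-memory Python sketch of the idea, and the host and object names are illustrative.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store that addresses data by content, not by host."""

    def __init__(self):
        self._objects = {}   # content id -> bytes
        self._replicas = {}  # content id -> set of hosts holding a copy

    @staticmethod
    def content_id(data: bytes) -> str:
        """Identifier derived from the data itself (SHA-256 digest)."""
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def put(self, data: bytes, host: str) -> str:
        cid = self.content_id(data)
        self._objects[cid] = data
        self._replicas.setdefault(cid, set()).add(host)
        return cid

    def get(self, cid: str) -> bytes:
        data = self._objects[cid]
        # Integrity is checkable by anyone: re-hash and compare with the identifier.
        if self.content_id(data) != cid:
            raise ValueError("integrity check failed")
        return data

    def locate(self, cid: str) -> set:
        """Data location becomes a lookup by identifier rather than a fixed address."""
        return set(self._replicas.get(cid, set()))


store = ContentStore()
cid_a = store.put(b"genome-chunk-17", "host-a")
cid_b = store.put(b"genome-chunk-17", "host-b")
print(cid_a == cid_b, sorted(store.locate(cid_a)))  # → True ['host-a', 'host-b']
```

Because the identifier depends only on the bytes, the same object stored on two hosts has a single name, and a consumer can verify integrity without trusting the host it came from.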

Slide 23. Defining the Big Data Architecture Framework
- Existing attempts address architecture issues in a traditional way: ODCA, TMF, NIST
- Architecture vs Ecosystem
  - Big Data undergo a number of transformations during their lifecycle
  - Big Data fuel the whole transformation chain
  - Data sources and data consumers, target data usage
- Multi-dimensional relations between
  - Data models and data-driven processes
  - Infrastructure components and data-centric services
- Architecture vs Architecture Framework
  - Separates concerns and factors
  - Control and management functions, orthogonal factors
  - Architecture Framework components are inter-related

Slide 24. Big Data Architecture Framework (BDAF) (1)
(1) Data Models, Structures, Types
  - Data formats, non-relational and relational, file systems, etc.
(2) Big Data Management
  - Big Data Lifecycle (Management) Model
  - Big Data transformation/staging
  - Provenance, curation, archiving
(3) Big Data Analytics and Tools
  - Big Data applications
  - Target use, presentation, visualisation
(4) Big Data Infrastructure (BDI)
  - Storage, compute, (high performance computing,) network
  - Sensor network, target/actionable devices
  - Big Data operational support
(5) Big Data Security
  - Data security at rest and in motion, trusted processing environments

Slide 25. Big Data Architecture Framework (BDAF): Aggregated Relations between Components (2)
[Table: 5 × 5 matrix of relations between the BDAF components Data Models & Structures; Data Management & Lifecycle; Big Data Infrastructure & Operations; Big Data Analytics & Applications; Big Data Security. Columns: "Used By"; rows: "Requires This".]

Slide 26. Big Data Ecosystem: General BD Infrastructure
[Figure: Big Data lifecycle stages and the supporting Big Data Infrastructure.]
- Data transformation and data management chain: Big Data Source/Origin (sensor, experiment, log data, behavioral data) → Data Collection & Registration → Data Filter/Enrich, Classification → Data Analytics, Modeling, Prediction → Data Delivery, Visualisation → Consumer
- Big Data Target/Customer: actionable/usable data; target users, processes, objects, behavior, etc.
- Big Data Analytics/Tools
- Federated Access and Delivery Infrastructure (FADI)
- Data categories: metadata, (un)structured, (non)identifiable
- Storage: general purpose; specialised databases and archives (analytics DB, in-memory, operational)
- Compute: general purpose; high performance computer clusters
- Big Data Infrastructure: heterogeneous multi-provider inter-cloud infrastructure; data management infrastructure; collaborative environment (user/group management); advanced high performance (programmable) network; security infrastructure
- Infrastructure layers: intercloud multi-provider heterogeneous infrastructure; security infrastructure; network infrastructure; internal infrastructure management/monitoring
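The lifecycle stages in this figure form a pipeline in which each stage consumes the output of the previous one. The Python sketch below chains hypothetical stage functions over a toy stream of sensor readings; the Record fields, the rule that negative readings are invalid and the 30.0 classification threshold are illustrative assumptions, not part of the framework.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Record:
    record_id: int
    source: str
    value: float
    tags: dict = field(default_factory=dict)


def collect_and_register(raw):
    """Data Collection & Registration: assign each incoming reading an identifier."""
    return [Record(i, source, value) for i, (source, value) in enumerate(raw)]


def filter_enrich_classify(records, threshold=30.0):
    """Data Filter/Enrich, Classification: drop invalid readings, tag the rest."""
    kept = [r for r in records if r.value >= 0]  # assumed validity rule
    for r in kept:
        r.tags["class"] = "high" if r.value >= threshold else "normal"
    return kept


def analyse(records):
    """Data Analytics: compute a simple summary over the classified records."""
    return {
        "count": len(records),
        "mean": mean(r.value for r in records),
        "high": sum(1 for r in records if r.tags["class"] == "high"),
    }


def deliver(summary):
    """Data Delivery, Visualisation: render the result for a consumer."""
    return f"{summary['count']} readings, mean {summary['mean']:.1f}, {summary['high']} high"


raw_readings = [("sensor-a", 21.0), ("sensor-b", -999.0), ("sensor-a", 35.0), ("sensor-c", 28.0)]
print(deliver(analyse(filter_enrich_classify(collect_and_register(raw_readings)))))
# → 3 readings, mean 28.0, 1 high
```

Keeping each stage as a separate function mirrors the figure's separation of concerns: a stage can be replaced (for example, a different classifier) without touching collection or delivery.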

Slide 27. Big Data Infrastructure and Analytics Tools
- Big Data Infrastructure
  - Heterogeneous multi-provider inter-cloud infrastructure
  - Data management infrastructure
  - Collaborative environment (user/group management)
  - Advanced high performance (programmable) network
  - Security infrastructure
- Big Data Analytics
  - High Performance Computer Clusters (HPCC)
  - Analytics/processing: real-time, interactive, batch, streaming
  - Big Data analytics tools and applications
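The difference between batch and streaming analytics is whether a result is computed over a stored dataset or updated incrementally as each item arrives. As an illustration not taken from the slide, the sketch below uses Welford's online algorithm to maintain a running mean and population variance in constant memory, and compares it with the batch results from Python's statistics module. The class name StreamingStats and the sample data are hypothetical.

```python
from statistics import mean, pvariance


class StreamingStats:
    """Welford's online algorithm: O(1) memory, one update per arriving value."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Streaming: values are processed one at a time and need not be kept.
stream = StreamingStats()
for value in data:
    stream.update(value)

# Batch: the whole dataset is available at once.
print("streaming:", stream.mean, stream.variance)
print("batch:    ", mean(data), pvariance(data))
```

Both paths agree up to floating-point rounding; the streaming version is the one that scales to unbounded input because it never stores the data.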

Slide 28. Data Lifecycle/Transformation Model
[Figure: lifecycle stages Data Source → Data Collection & Registration → Data Filter/Enrich, Classification → Data Analytics, Modeling, Prediction → Data Delivery, Visualisation → Consumer (Data Analytics Application), with Data Storage and data models labelled (1) to (4) attached to different stages, and annotations for data repurposing, analytics re-factoring and secondary processing.]
- Does the data model change along the lifecycle or with data evolution? Common data model?
- Data variety and variability; semantic interoperability
- Identifying and linking data: data (inter)linking? Persistent identifiers (PID/OID, ORCID); identification; data ownership; privacy, opacity; traceability vs opacity; referral integrity
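The tension between traceability and opacity can be illustrated with keyed pseudonymisation: records that refer to the same subject receive the same opaque token, so they can still be linked during processing, while the original identifier is not exposed. This technique is an example chosen here, not one prescribed by the slide; the function names, the example identifiers and the inline key are hypothetical, and a real deployment would need proper key management.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable, opaque linkage token from an identifier using HMAC-SHA256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link(dataset_a, dataset_b, key):
    """Join two datasets on pseudonymised subject IDs without revealing the IDs."""
    tokens_a = {pseudonymise(subject, key): value for subject, value in dataset_a}
    tokens_b = {pseudonymise(subject, key): value for subject, value in dataset_b}
    shared = tokens_a.keys() & tokens_b.keys()
    return {token: (tokens_a[token], tokens_b[token]) for token in shared}


secret_key = b"hypothetical-linkage-key"  # illustration only; never hard-code real keys
clinical = [("patient-001", "diagnosis A"), ("patient-002", "diagnosis B")]
genomic = [("patient-002", "variant X"), ("patient-003", "variant Y")]

linked = link(clinical, genomic, secret_key)
print(list(linked.values()))  # → [('diagnosis B', 'variant X')]
```

Whoever holds the key can re-identify subjects by recomputing tokens for known identifiers, so control over the key becomes the point where traceability and opacity are balanced.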

Slide 29. Evolutionary/Hierarchical Data Model
[Figure: data levels from Raw Data at the bottom, through Classified/Structured Data and Processed Data (for target use), to Usable Data, Archival Data and Actionable Data, Papers/Reports at the top, with identifiers PID/DOI and ORCID.]
- Associated information: metadata, referrals, control information, policy, data patterns
- Topics for discussion, research and standardisation: Common data model? Data interlinking? Does it fit the graph data type?
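The evolutionary model can be read as a graph in which each data object points to the lower-level objects it was derived from. The Python sketch below, with hypothetical class names and PIDs, enforces two rules suggested by the slide's discussion topics: every referral must point to a registered object (referral integrity), and derivation may only go upward through the levels. It also traces an actionable result back to its raw sources. Treating Usable and Archival data as the same level is an assumption.

```python
from dataclasses import dataclass, field

# Level ranks follow the slide from bottom (raw) to top (actionable).
# Assumption: usable and archival data sit at the same level.
LEVEL_RANK = {"raw": 0, "classified": 1, "processed": 2, "usable": 3, "archival": 3, "actionable": 4}


@dataclass
class DataObject:
    pid: str                                          # persistent identifier (hypothetical scheme)
    level: str
    derived_from: list = field(default_factory=list)  # PIDs of lower-level inputs


class DataRegistry:
    def __init__(self):
        self._objects = {}

    def register(self, obj: DataObject) -> None:
        if obj.level not in LEVEL_RANK:
            raise ValueError(f"unknown level {obj.level!r}")
        for parent_pid in obj.derived_from:
            parent = self._objects.get(parent_pid)
            if parent is None:  # referral integrity: no dangling references
                raise KeyError(f"dangling reference to {parent_pid}")
            if LEVEL_RANK[parent.level] >= LEVEL_RANK[obj.level]:
                raise ValueError(f"{obj.pid} must be derived from lower-level data")
        self._objects[obj.pid] = obj

    def raw_sources(self, pid: str) -> list:
        """Walk the derivation graph back to the raw data (provenance trace)."""
        seen, stack, sources = set(), [pid], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            obj = self._objects[current]
            if obj.level == "raw":
                sources.add(current)
            stack.extend(obj.derived_from)
        return sorted(sources)


registry = DataRegistry()
registry.register(DataObject("pid:raw/1", "raw"))
registry.register(DataObject("pid:raw/2", "raw"))
registry.register(DataObject("pid:cls/1", "classified", ["pid:raw/1"]))
registry.register(DataObject("pid:cls/2", "classified", ["pid:raw/2"]))
registry.register(DataObject("pid:proc/1", "processed", ["pid:cls/1", "pid:cls/2"]))
registry.register(DataObject("pid:report/1", "actionable", ["pid:proc/1"]))
print(registry.raw_sources("pid:report/1"))  # → ['pid:raw/1', 'pid:raw/2']
```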

Slide 30. Summary and Topics for Discussion
- Researching, learning and mastering the Big Data domain is a Big Data problem itself
- Cloud Computing as a natural platform for Big Data
  - Acceptance of clouds will grow, and so will demand for specialists
- New, generically data-centric models are required
  - New distributed data processing and analytics computing models are to be developed/re-factored
- The Data Scientist is a new focus for companies' talent search and a task for universities, which need to develop a new curriculum

Slide 31. Additional Topics
- Data Scientist: a new profession and the need for education and training

Slide 32. Data Scientist: New Profession and Opportunities
McKinsey Global Institute on Big Data jobs (2011):
"There will be a shortage of talent necessary for organizations to take advantage of Big Data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions."
Source: US Bureau of Labor Statistics; US Census; Dun & Bradstreet; company interviews; McKinsey analysis
Wsh SGBD, May 2014, Big Data and Education

Slide 33. Strata Survey: Skills and Data Scientist Self-ID
- "Analysing the Analysers", O'Reilly Strata Survey, Harris, Murphy & Vaisman, 2013
- Based on how data scientists think about themselves and their work
- Identified four Data Scientist clusters

Slide 34. Skills and Self-ID Top Factors
[Figure: top skill factors across self-ID groups: Business, ML/Big Data, Math/OR, Programming, Statistics.]
ML: Machine Learning; OR: Operations Research

Slide 35. Slide from the presentation "Demystifying Data Science" (by Natasha Balac, SDSC)

Slide 36. Key to a Great Data Scientist
Technical skills (coding, statistics, math) + commitment + creativity + intuition + presentation skills + business savvy = Great Data Scientist!
How long does it take for a beginner to become a good data scientist? 3 to 5 years, according to a KDnuggets survey (278 votes total)


USING BIG DATA FOR INTELLIGENT BUSINESSES

USING BIG DATA FOR INTELLIGENT BUSINESSES HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 2015 Brasov, 28-30 May 2015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC USING BIG DATA FOR INTELLIGENT

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

The Canadian Realities of Big Data and Business Analytics. Utsav Arora February 12, 2014

The Canadian Realities of Big Data and Business Analytics. Utsav Arora February 12, 2014 The Canadian Realities of Big Data and Business Analytics Utsav Arora February 12, 2014 Things to think about for today How Important is Big Data for me? Why do I need to implement Big Data and Analytics

More information

BIG DATA & DATA SCIENCE

BIG DATA & DATA SCIENCE BIG DATA & DATA SCIENCE ACADEMY PROGRAMS IN-COMPANY TRAINING PORTFOLIO 2 TRAINING PORTFOLIO 2016 Synergic Academy Solutions BIG DATA FOR LEADING BUSINESS Big data promises a significant shift in the way

More information

Big Analytics: A Next Generation Roadmap

Big Analytics: A Next Generation Roadmap Big Analytics: A Next Generation Roadmap Cloud Developers Summit & Expo: October 1, 2014 Neil Fox, CTO: SoftServe, Inc. 2014 SoftServe, Inc. Remember Life Before The Web? 1994 Even Revolutions Take Time

More information

Big Data. Dr.Douglas Harris DECEMBER 12, 2013

Big Data. Dr.Douglas Harris DECEMBER 12, 2013 Dr.Douglas Harris DECEMBER 12, 2013 GOWTHAM REDDY Fall,2013 Table of Contents Computing history:... 2 Why Big Data and Why Now?... 3 Information Life-Cycle Management... 4 Goals... 5 Information Management

More information

Standards for Big Data in the Cloud

Standards for Big Data in the Cloud Standards for Big Data in the Cloud International Cloud Symposium 15/10/2013 Carola Carstens (Project Officer) DG CONNECT, Unit G3 Data Value Chain European Commission Outline 1) Data Value Chain Unit

More information

Deploying Big Data to the Cloud: Roadmap for Success

Deploying Big Data to the Cloud: Roadmap for Success Deploying Big Data to the Cloud: Roadmap for Success James Kobielus Chair, CSCC Big Data in the Cloud Working Group IBM Big Data Evangelist. IBM Data Magazine, Editor-in- Chief. IBM Senior Program Director,

More information

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015 Bringing Strategy to Life Using an Intelligent Platform to Become Ready Informatica Government Summit April 23, 2015 Informatica Solutions Overview Power the -Ready Enterprise Government Imperatives Improve

More information

A Big Picture for Big Data

A Big Picture for Big Data Supported by EU FP7 SCIDIP-ES, EU FP7 EarthServer A Big Picture for Big Data FOSS4G-Europe, Bremen, 2014-07-15 Peter Baumann Jacobs University rasdaman GmbH [email protected] Our Stds Involvement

More information

Defining Generic Architecture for Cloud Infrastructure as a Service Model

Defining Generic Architecture for Cloud Infrastructure as a Service Model Defining Generic Architecture for Cloud Infrastructure as a Service Model Yuri Demchenko 1 University of Amsterdam Science Park 904, Amsterdam, The Netherlands E-mail: [email protected] Cees de Laat University

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

BIG DATA. Value 8/14/2014 WHAT IS BIG DATA? THE 5 V'S OF BIG DATA WHAT IS BIG DATA?

BIG DATA. Value 8/14/2014 WHAT IS BIG DATA? THE 5 V'S OF BIG DATA WHAT IS BIG DATA? WHAT IS BIG DATA? BIG DATA DR. KLARA NELSON THE UNIVERSITY OF TAMPA "Volumes of data that are unusually large, or types of data that are unstructured" Thomas Davenport, Keeping Up with the Quants, 2013,

More information

Cloud computing based big data ecosystem and requirements

Cloud computing based big data ecosystem and requirements Cloud computing based big data ecosystem and requirements Yongshun Cai ( 蔡 永 顺 ) Associate Rapporteur of ITU T SG13 Q17 China Telecom Dong Wang ( 王 东 ) Rapporteur of ITU T SG13 Q18 ZTE Corporation Agenda

More information

Airline Applications of Business Intelligence Systems

Airline Applications of Business Intelligence Systems Airline Applications of Business Intelligence Systems Mihai ANDRONIE* *Corresponding author Spiru Haret University Str. Ion Ghica 13, Bucharest 030045, Romania [email protected] DOI: 10.13111/2066-8201.2015.7.3.14

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

Standardization Requirements Analysis on Big Data in Public Sector based on Potential Business Models

Standardization Requirements Analysis on Big Data in Public Sector based on Potential Business Models , pp. 165-172 http://dx.doi.org/10.14257/ijseia.2014.8.11.15 Standardization Requirements Analysis on Big Data in Public Sector based on Potential Business Models Suwook Ha 1, Seungyun Lee 2 and Kangchan

More information

Ernestina Menasalvas Universidad Politécnica de Madrid

Ernestina Menasalvas Universidad Politécnica de Madrid Ernestina Menasalvas Universidad Politécnica de Madrid EECA Cluster networking event RITA 12th november 2014, Baku Sectors/Domains Big Data Value Source Public administration EUR 150 billion to EUR 300

More information

BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013. Navigating Implementation and Governance

BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013. Navigating Implementation and Governance BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013 Navigating Implementation and Governance Purpose of Today s Talk John Adler - Data Management Group Madina Kassengaliyeva - Think Big Analytics Growing data

More information

NIST Big Data Public Working Group

NIST Big Data Public Working Group NIST Big Data Public Working Group Requirements May 13, 2014 Arnab Roy, Fujitsu On behalf of the NIST BDWG S&P Subgroup S&P Requirements Emerging due to Big Data Characteristics Variety: Traditional encryption

More information

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data: The Datafication Of Everything Thoughts Devices Processes Thoughts Things Processes Run the Business Organize data to do something

More information

The Promise of Industrial Big Data

The Promise of Industrial Big Data The Promise of Industrial Big Data Big Data Real Time Analytics Katherine Butler 1 st Annual Digital Economy Congress San Diego, CA Nov 14 th 15 th, 2013 Individual vs. Ecosystem What Happened When 1B

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

User Needs and Requirements Analysis for Big Data Healthcare Applications

User Needs and Requirements Analysis for Big Data Healthcare Applications User Needs and Requirements Analysis for Big Data Healthcare Applications Sonja Zillner, Siemens AG In collaboration with: Nelia Lasierra, Werner Faix, and Sabrina Neururer MIE 2014 in Istanbul: 01-09-2014

More information

What happens when Big Data and Master Data come together?

What happens when Big Data and Master Data come together? What happens when Big Data and Master Data come together? Jeremy Pritchard Master Data Management fgdd 1 What is Master Data? Master data is data that is shared by multiple computer systems. The Information

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Manjula Ambur NASA Langley Research Center April 2014

Manjula Ambur NASA Langley Research Center April 2014 Manjula Ambur NASA Langley Research Center April 2014 Outline What is Big Data Vision and Roadmap Key Capabilities Impetus for Watson Technologies Content Analytics Use Potential use cases What is Big

More information

How To Use Big Data Effectively

How To Use Big Data Effectively Why is BIG Data Important? March 2012 1 Why is BIG Data Important? A Navint Partners White Paper May 2012 Why is BIG Data Important? March 2012 2 What is Big Data? Big data is a term that refers to data

More information

Big Data Analytics. Chances and Challenges. Volker Markl

Big Data Analytics. Chances and Challenges. Volker Markl Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER Hur hanterar vi utmaningar inom området - Big Data Jan Östling Enterprise Technologies Intel Corporation, NER Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary

More information

Big Data and Complex Networks Analytics. Timos Sellis, CSIT Kathy Horadam, MGS

Big Data and Complex Networks Analytics. Timos Sellis, CSIT Kathy Horadam, MGS Big Data and Complex Networks Analytics Timos Sellis, CSIT Kathy Horadam, MGS Big Data What is it? Most commonly accepted definition, by Gartner (the 3 Vs) Big data is high-volume, high-velocity and high-variety

More information

Training for Big Data

Training for Big Data Training for Big Data Learnings from the CATS Workshop Raghu Ramakrishnan Technical Fellow, Microsoft Head, Big Data Engineering Head, Cloud Information Services Lab Store any kind of data What is Big

More information

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to

More information

Data Science Body of Knowledge (DS-BoK): Approach and Initial

Data Science Body of Knowledge (DS-BoK): Approach and Initial EDISON Discussion Document Data Science Body of Knowledge (DS-BoK): Approach and Initial version Project acronym: EDISON Project full title: Education for Data Intensive Science to Open New science frontiers

More information

Architecting for the Internet of Things & Big Data

Architecting for the Internet of Things & Big Data Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to

More information

Big Data & Security. Aljosa Pasic 12/02/2015

Big Data & Security. Aljosa Pasic 12/02/2015 Big Data & Security Aljosa Pasic 12/02/2015 Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline

More information