Big Data and Data Intensive Science:

Size: px
Start display at page:

Download "Big Data and Data Intensive Science:"

Transcription

1 Big Data and Data Intensive Science: Infrastructure and Other Challenges Yuri Demchenko SNE Group, University of Amsterdam EENET Conference 4 October 2013, Tartu,

2 Outline Big Data and Data Intensive Science as a new technology wave Big Data 5+1 Vs: Volume, Velocity, Variety, Value, Variability, Veracity Big Data in Science, Industry and Business Where do the data come from? What are Big Data drivers? Big Data: Paradigm change and new challenges From Big Data to All-Data Moving to data centric service models Defining Big Data Architecture Framework (BDAF) Big Data Infrastructure (BDI) and Big Data Analytics tools Addressing Big Data Infrastructure challenges by GEANT/NRENs Horizon2020: e-infrastructure challenges Data Scientist: New profession and need for Education&Training Summary and Discussion Big Data and Infrastructure Challenges Slide_2

3 Big Data Research at System and Network Engineering (SNE), University of Amsterdam Long time research and development on Infrastructure services and facilities High speed optical networking and data intensive applications Application and infrastructure security services Collaborative systems, Grid, Clouds and currently Big Data Focus on Infrastructure definition and services Software Defined Infrastructure based on Cloud/Intercloud technologies NIST Big Data Working Group Active contribution on Reference Architecture, Big Data Definition and Taxonomy, Big Data Security Research Data Alliance Interest Group on Education and Skills Development on Data Intensive Science Big Data Interest Group at SNE, UvA Non-formal but active, meets two-weekly for brainstorming sessions Provides input to NIST BD-WG and RDA BoF on Big Data Challenges for NREN s organised at TNC2013 https://tnc2013.terena.org/core/event/15 SNE Technical Report on Architecture Framework and Components for the Big Data Ecosystem. Draft Version 0.2, 12 September Big Data and Infrastructure Challenges 3

4 Visionaries and Drivers: Seminal works and High level reports The Fourth Paradigm: Data-Intensive Scientific Discovery. By Jim Gray, Microsoft, Edited by Tony Hey, et al. Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data. October https://www.rd-alliance.org/ NIST Big Data Working Group (NBD-WG) https://www.rd-alliance.org/ AAA Study: Study on AAA Platforms For Scientific data/information Resources in Europe, TERENA, UvA, LIBER, UinvDeb. Big Data and Infrastructure Challenges 4

5 The Fourth Paradigm of Scientific Research 1. Theory and logical reasoning 2. Observation or Experiment E.g. Newton observed apples falling to design his theory of mechanics But Gallileo Galilei made experiments with falling objects from the Pisa leaning tower 3. Simulation of theory or model Digital simulation can prove theory or model 4. Data-driven Scientific Discovery (aka Data Science) More data beat hypnotized theory Big Data and Infrastructure Challenges 5

6 Big Data and Data Intensive Science - The next technology focus Scientific and Research Data e-science Big Data is/has becoming the next buzz word Not much academic research and papers Read seminal works, Dive into blogs and tweets Based on e-science concept and entire information and artifacts digitising Requires also new information and semantic models for information structuring and presentation Requires new research methods using large data sets and data mining Methods to evolve and results to be improved Changes the way how the modern research is done (in e-science) Secondary research, data re-focusing, linking data and publications Big Data require a new infrastructure to support both distributed data (collection, storage, processing) and metadata/discovery services High performance network and computing, distributed storage and access Cloud Computing as native platform for distributed dynamic virtualised (data supporting) infrastructure Demand for trusted/trustworthy infrastructure Big Data and Infrastructure Challenges 6

7 Gartner Technology Hypercycle Big Data Cloud Computing Source Big Data and Infrastructure Challenges 7

8 5 yr for Cloud Computing 2 yr for Big Data adoption Big Data and Infrastructure Challenges 8

9 Technology Definitions and Timeline - Overview Service Oriented Architecture (SOA): First proposed in 1996 and revived with the Web Services advent in Currently standard for industry, and widely used Provided a conceptual basis for Web Services development Computer Grids: Initially proposed in 1998 and finally shaped in 2003 with the Open Grid Services Architecture (OGSA) by Open Grid Forum (OGF) Currently remains as a collaborative environment Migrates to cloud and inter-cloud platform Cloud Computing: Initially proposed in 2008 Now entering productive phase Defined new features, capabilities, operational/usage models and actually provided a guidance for the new technology development Originated from the Service Computing domain and service management focused Big Data and Data Intensive Science: Yet to be defined Involves more components and processes to be included into the definition Can be better defined as Ecosystem where data are the main driving component Need to define the Big Data properties, expected technology capabilities and provide a guidance/vision for future technology development Big Data and Infrastructure Challenges 9

10 Big Data Definitions Overview IDC definition of Big Data (conservative and strict approach) : "A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis Gartner definition Big data is high-volume, high-velocity and high-variety information assets that demand costeffective, innovative forms of information processing for enhanced insight and decision making. Termed as 3 parts definition, not 3V definition Big Data: a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. From The Big Data Long Tail blog post by Jason Bloomberg (Jan 17, 2013). Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn t fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it. Ed Dumbill, program chair for the O Reilly Strata Conference Termed as the Fourth Paradigm *) The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration. (Jim Gray, computer scientist) *) The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. Microsoft, Big Data and Infrastructure Challenges 10

11 Improved: 5+1 V s of Big Data Variety Structured Unstructured Multi-factor Probabilistic Linked Dynamic Changing data Changing model Linkage Variability Volume Terabytes Records/Arch Tables, Files Distributed 6 Vs of Big Data Trustworthiness Authenticity Origin, Reputation Availability Accountability Velocity Batch Real/near-time Processes Streams Correlations Statistical Events Hypothetical Value Generic Big Data Properties Volume Variety Velocity Acquired Properties (after entering system) Value Veracity Variability Commonly accepted 3V s of Big Data Veracity Big Data and Infrastructure Challenges 11

12 Big Data Definition: From 5+1V to 5 Parts (1) (1) Big Data Properties: 5V Volume, Variety, Velocity, Value, Veracity Additionally: Data Dynamicity (Variability) (2) New Data Models Data Lifecycle and Variability Data linking, provenance and referral integrity (3) New Analytics Real-time/streaming analytics, interactive and machine learning analytics (4) New Infrastructure and Tools High performance Computing, Storage, Network Heterogeneous multi-provider services integration New Data Centric (multi-stakeholder) service models New Data Centric security models for trusted infrastructure and data processing and storage (5) Source and Target High velocity/speed data capture from variety of sensors and data sources Data delivery to different visualisation and actionable systems and consumers Full digitised input and output, (ubiquitous) sensor networks, full digital control Big Data and Infrastructure Challenges 12

13 Big Data Definition: From 5+1V to 5 Parts (1) (1) Big Data Properties: 5V Volume, Variety, Velocity, Value, Veracity Additionally: Data Dynamicity (Variability) (2) New Data Models Data linking, provenance and referral integrity Data Lifecycle and Variability/Evolution (3) New Analytics Real-time/streaming analytics, interactive and machine learning analytics (4) New Infrastructure and Tools High performance Computing, Storage, Network Heterogeneous multi-provider services integration New Data Centric (multi-stakeholder) service models New Data Centric security models for trusted infrastructure and data processing and storage (5) Source and Target High velocity/speed data capture from variety of sensors and data sources Data delivery to different visualisation and actionable systems and consumers Full digitised input and output, (ubiquitous) sensor networks, full digital control Big Data and Infrastructure Challenges 13

14 Big Data Definition: From 5V to 5 Parts (2) Refining Gartner definition Big data is (1) high-volume, high-velocity and high-variety information assets that demand (3) cost-effective, innovative forms of information processing for (5) enhanced insight and decision making Big Data (Data Intensive) Technologies are targeting to process (1) high-volume, high-velocity, high-variety data (sets/assets) to extract intended data value and ensure high-veracity of original data and obtained information that demand (3) cost-effective, innovative forms of data and information processing (analytics) for enhanced insight, decision making, and processes control; all of those demand (should be supported by) (2) new data models (supporting all data states and stages during the whole data lifecycle) and (4) new infrastructure services and tools that allows also obtaining (and processing data) from (5) a variety of sources (including sensor networks) and delivering data in a variety of forms to different data and information consumers and devices. (1) Big Data Properties: 5V (2) New Data Models (3) New Analytics (4) New Infrastructure and Tools (5) Source and Target Big Data and Infrastructure Challenges 14

15 Data Transformation Big Data Nature: Origin and consumers (target) Big Data Origin Science Internet, Web Industry Business Living Environment, Cities Social media and networks Healthcare Telecom/Infrastructure Big Data Target Use Scientific discovery New technologies Manufacturing, processes, transport Personal services, campaigns Living environment support Healthcare support Social Networking Big Data and Infrastructure Challenges 15

16 Volume, Velocity, Variety Examples e-science Volume Terabyte records, transactions, tables, files. LHC 5 PB a month (now is under re-construction) LOFAR, SKA 5 PB every hour, requires processing asap to discard non-informative data Large Synoptic Survey Telescope (LSST) - 10 Petabytes per year Genomic research x10 TB per individual Earth, climate and weather data Velocity batch, near-time, real-time, streams. LHC ATLAS detector generates about 1 Petabyte raw data per second, during the collision time about 1 ms Variety structures, unstructured, semi-structured, and all the above in a mix Biodiversity, Biological and medical, facial research Human, psychology and behavior research History, archeology and artifacts Big Data and Infrastructure Challenges 16

17 Volume, Velocity, Variety Examples Industry Volume Terabyte records, transactions, tables, files. A Boeing Jet engine produce out 10TB of operational data for every 30 minutes they run Hence a 4-engine Jumbo jet can create 640TB on one Atlantic crossing. Multiply that to 25,000 flights flown each day Velocity batch, near-time, real-time, streams. Today s on-line ad serving requires 40ms to respond with a decision Financial services (i.e., stock quotes feed) need near 1ms to calculate customer scoring probabilities Stream data, such as movies, need to travel at high speed for proper rendering Variety structures, unstructured, semi-structured, and all the above in a mix WalMart processes 1M customer transactions per hour and feeds information to a database estimated at 2.5PB (petabytes) There are old and new data sources like RFID, sensors, mobile payments, in-vehicle tracking, etc. Big Data and Infrastructure Challenges 17

18 The Long Tail of Science (aka Dark Data ) Collectively Long Tail science is generating a lot of data Estimated as over 1PB per year and it is growing fast with the new technology proliferation rule: 20% users generate 80% data but not necessarily 80% knowledge Source: Dennis Gannon (Microsoft) NIST Big Data Workshop, 2012 Big Data and Infrastructure Challenges 18

19 Big Data technology drivers (1) Modern e-science in search for new knowledge Scientific experiments and tools are becoming bigger and heavily based on data processing and mining Traditional data intensive industry Genomic research, drugs development, Healthcare High-tech industry, CAD/CAM, weather/climate, etc. Intelligence and security Network/infrastructure management Network monitoring, Intrusion detection, troubleshooting Big Data and Infrastructure Challenges 19

20 Big Data technology drivers Technology Loop Technology loop (known as Jevons Paradox) Increased efficiency to process current demand will create new uses and increase demand even more Elastic Demand for Work: A doubling of fuel efficiency more than doubles work demanded, increasing the amount of fuel used. Jevons paradox occurs. Big Data and Infrastructure Challenges 20

21 Big Data technology drivers (2) Consumer facing companies like Google and Facebook have driven many of the recent advances in Big Data efficiency Facebook has some 900+ million users and is still growing Google handles number of search queries at 3 billion per day Twitter handles some 400 million tweets per day count for 12 terabytes per day Used also for market sentiments prediction Power companies: process up to 350 billion annual meter readings to better predict power consumption Processes/activity data recording and analysis Flight data, log data, intelligence, traffic Business (retail) uses Big Data technologies to search for customers Modern business concept (multi-channel) of delivering directly to customers requires prediction of customer behavior Data volumes What cause(s) and what effect? Big Data gives companies a fighting chance in the battle over the customer Big Data and Infrastructure Challenges 21

22 Big Data technology drivers (3) - Advertising this new course of big data, gleaned from a wealth of unstructured information on the web, has the ability to turn advertising on its head at least enough to make media people rethink algorithms for maximizing performance. HessieJ.com Traditional Ad Model: User profiles More Sophisticated Ad Model: Behavioral targeting - "smart ads" Future Ad Model: Enter Social Data Example: Mary Brown searches for information about a future trip to Mallorca She also goes to travel sites, reads hotel reviews and has excitedly spoken to close friends on Twitter and Facebook about her plans and preparations Now we have not only recent behavioral activity where she s been on the internet, but we also are aware of her conversations that validate her behavior It is safe to assume that Mary will definitely be going to Mallorca What this information does for a travel company? They now have MORE information on that user that will allow them to not only serve an ad, or even respond to that user with relevant offers, but DO so with a certain degree of confidence that Mary will at the very least click on the ad. Big Data and Infrastructure Challenges 22

23 Big Data technology drivers (3a) Service Delivery Consumer products and services delivery Netflix already captures movie genre preferences by the user and makes recommendations based on recent shows/movies watched Announced $2mln prize for effective customer targeting in 2003 Netflix recommender system in use as a reference technology implementation It is already capturing which devices the user is watching recent programs/shows and when Marrying that data with GetGlue (news feed on movies), for example, validates the original information and supplements the usage information Combined and correlating, allows Netflix to optimize the movie offering to keep you a satisfied customer It can also capture the comments and shares from those watching the movie in order to drive messaging to attract new users Big Data and Infrastructure Challenges 23

24 Big Data technology drivers (3b) Managing public campaigns, e.g. election, public relations The rise of public opinion stored in platforms like Twitter, Google, Facebook, etc. provide enough intelligence to influence the campaign development, timing, geography and even the colour of the campaign signs Twitter was a major source of data aggregation for the Republican Race in the US Multimillion-dollar contract for data management and collection services awarded May 1, 2013 to Liberty Work to build advanced list of voters Article In Data we trust by T.Edsall in The New York Times Book: In Data We Trust: How Customer Data is Revolutionising Our Economy (Aug 2012) A strategy for tomorrow's data world Big Data and Infrastructure Challenges 24

25 Big Data technology drivers (4) - Emerging Social media itself share and socialise/collaborate Facebook, Twitter, YouTube, Flickr, etc. Workplace improvement Means more data will be collected and monitored on the personnel Healthcare, health/physiological and medical information Human health monitoring not just for ill or aged people Early diagnostics, proactive care advising Big Data and Infrastructure Challenges 25

26 Big Data and Infrastructure Challenges 26

27 Foreseen Big Data Innovations in Cloud-Based Big Data Solutions Amazon s Elastic Map Reduce (EMR) is a market leader Expected new innovative Big Data and Cloud solutions Real-Time Hadoop Google s Dremel-like solutions that will allow real-time queries on Big Data and be open source Distributed and deep Machine Learning Mahout iterative scalable distributed back propagation machine learning and data mining algorithm New algorithms Jubatus, HogWild Big Data Appliances (also for home) Raspberry Pi and home-made GPU clusters Hardware vendors (Dell, HP, etc.) pack mobile ARM processors into server boxes Adepteva's Parallella will put a 16-core supercomputer into range of $99 Easier Big Data Tools Open Source and easy to use drag-and-drop tools for Big Data Analytics to facilitate the BD adoption Commercial examples: Radoop = RapidMiner + Mahout, Tableau, Datameer, etc. LexisNexis: from ECL (Enterprise Control Language) to KEL (Knowledge Engineering Language) Source: Big Data in 2013 by Mike Guattieri, Forrester Big Data and Infrastructure Challenges 27

28 NIST Big Data Working Group (NBD-WG) Deliverables target September September initial draft documents 30 September Workshop and F2F meeting Activities: Conference calls every day 17-19:00 (CET) by subgroup - Big Data Definition and Taxonomies Requirements (chair: Geoffrey Fox, Indiana Univ) Big Data Security Reference Architecture Technology Roadmap BigdataWG mailing list and useful documents Input documents Big Data Reference Architecture Big Data Architectures Survey Requirements for 51 usecases Big Data and Infrastructure Challenges 28

29 From Big Data to All-Data Paradigm Change Breaking paradigm changing factor Data storage and processing Security Identification and provenance Traditional model BIG Storage and BIG Computer with FAT pipe Move compute to data vs Move data to compute New Paradigm Continuous data production Continuous data processing DataBus as a Data container and Protocol Big Data Move or not to move? Network? Distributed Big Data Storage Data Abstraction Data Bus Infrastructure Abstraction Distributed Compute and Analytics Big Computer Visuali sation Presen tation Action DataBus: (1) Data Container (2) Metadata, State (3) Data Transfer Protocol Big Data and Infrastructure Challenges 29

30 Moving to Data-Centric Models and Technologies Current IT and communication technologies are host based or host centric Any communication or processing are bound to host/computer that runs software Especially in security: all security models are host/client based Big Data requires new data-centric models Data location, search, access Data integrity and identification Data lifecycle and variability Data centric (declarative) programming models Data aware infrastructure to support new data formats and data centric programming models Data centric security and access control Big Data and Infrastructure Challenges 30

31 Defining Big Data Architecture Framework Existing attempts don t converge to something consistent: ODCA, TMF, NIST See Architecture vs Ecosystem Big Data undergo a number of transformations during their lifecycle Big Data fuel the whole transformation chain Data sources and data consumers, target data usage Multi-dimensional relations between Data models and data driven processes Infrastructure components and data centric services Architecture vs Architecture Framework Separates concerns and factors Control and Management functions, orthogonal factors Architecture Framework components are inter-related Big Data and Infrastructure Challenges 31

32 Big Data Architecture Framework (BDAF) (1) (1) Data Models, Structures, Types Data formats, non/relational, file systems, etc. (2) Big Data Management Big Data Lifecycle (Management) Model Big Data transformation/staging Provenance, Curation, Archiving (3) Big Data Analytics and Tools Big Data Applications Target use, presentation, visualisation (4) Big Data Infrastructure (BDI) Storage, Compute, (High Performance Computing,) Network Sensor network, target/actionable devices Big Data Operational support (5) Big Data Security Data security in-rest, in-move, trusted processing environments Big Data and Infrastructure Challenges 32

33 Big Data Architecture Framework (BDAF) Aggregated Relations between components (2) Col: Used By Row: Requires This Data Models & Structures Data Managmnt & Lifecycle BigData Infrastruct & Operations BigData Analytics & Applications BigData Security Data Models Structrs Data Managmnt & Lifecycle BigData Infrastr & Operations BigData Analytics & Applicatn BigData Security Big Data and Infrastructure Challenges 33

34 Big Data Ecosystem: Data, Transformation, Infrastructure Consumer Data Source Data Collection& Registration Data Filter/Enrich, Classification Data Analytics, Modeling, Prediction Data Delivery, Visualisation Big Data Target/Customer: Actionable/Usable Data Target users, processes, objects, behavior, etc. Big Data Source/Origin (sensor, experiment, logdata, behavioral data) Big Data Analytic/Tools Federated Access and Delivery Infrastructure (FADI) Storage General Purpose Compute General Purpose High Performance Computer Clusters Storage Specialised Databases Archives (analytics DB, In memory, operstional) Data Management Data Data categories: Data categories: metadata, categories: metadata, (un)structured, metadata, (un)structured, (non)identifiable (un)structured, (non)identifiable (non)identifiable Intercloud multi-provider heterogeneous Infrastructure Security Infrastructure Network Infrastructure Internal Infrastructure Management/Monitoring Big Data and Infrastructure Challenges 34

35 General BDI services and components Data management infrastructure and tools Registries, search/indexing, ontologies, schemas, namespace Collaborative Environment (user/groups managements) Heterogeneous multi-provider Inter-cloud infrastructure Compute, Storage, Network (provisioned on-demand dynamically scaling) Federated Access and Delivery Infrastructure (FADI) Advanced high performance (programmable) network Security infrastructure (access control, Identity and policy management, confidentiality, privacy, trust) Big Data and Infrastructure Challenges 35

36 Big Data Infrastructure and Analytic Tools Big Data Target/Customer: Actionable/Usable Data Target users, processes, objects, behavior, etc. Big Data Source/Origin (sensor, experiment, logdata, behavioral data) Big Data Analytic/Tools Analytics: Refinery, Linking, Fusion Analytics Applications : Link Analysis Cluster Analysis Entity Resolution Complex Analysis Federated Access and Delivery Infrastructure (FADI) Analytics : Realtime, Interactive, Batch, Streaming Storage General Purpose Compute General Purpose High Performance Computer Clusters Storage Specialised Databases Archives Data Management Data Data categories: Data categories: metadata, categories: metadata, (un)structured, metadata, (un)structured, (non)identifiable (un)structured, (non)identifiable (non)identifiable Intercloud multi-provider heterogeneous Infrastructure Security Infrastructure Network Infrastructure Internal Infrastructure Management/Monitoring Big Data and Infrastructure Challenges 36

37 Big Data Analytics Infrastructure High Performance Computer Clusters (HPCC) Specialised Storage, Distributed/Replicated, Archives, Databases, SQL/NoSQL Big Data Analytics Tools/Applications Analytics/processing: Real-time, Interactive, Batch, Streaming Link Analysis, Graph analysis Cluster Analysis Entity Resolution Complex Analysis Big Data and Infrastructure Challenges 37

38 Consumer Data Analitics Application Data Transformation/Lifecycle Model Data Model (1) Data Model (1) Data (inter)linking? Persistent ID Identification Privacy, Opacity Data Storage Data Model (3) Common Data Model? Data Variety and Variability Semantic Interoperability Data Model (4) Data Source Data Collection& Registration Data Filter/Enrich, Classification Data Analytics, Modeling, Prediction Data Delivery, Visualisation Does Data Model changes along lifecycle or data evolution? Identifying and linking data Persistent identifier Data repurposing, Analitics re-factoring, Secondary processing Traceability vs Opacity Referral integrity Big Data and Infrastructure Challenges 38

39 Evolutional/Hierarchical Data Model Actionable Data Papers/Reports Archival Data Usable Data Processed Data (for target use) Processed Data (for target use) Processed Data (for target use) Classified/Structured Data Classified/Structured Data Classified/Structured Data Raw Data Common Data Model? Data interlinking? Fits to Graph data type? Metadata Referrals Control information Policy Data patterns Big Data and Infrastructure Challenges 39

40 Horizon2020: Challenge (2.1): Development, deployment and operation of e-infrastructures Call on Specific Challenge 2.1: Development, deployment and operation of e- infrastructures (based on discussion draft summer 2013) CHALLENGE 1 High Performance Computing (HPC) CHALLENGE 2 - CONNECTIVITY Topic 4: Research and Education Networking GÉANT CHALLENGE 3 - DATA Topic 5: Community data services Topic 6: Managing, preserving and computing with big research data Topic 7: e-infrastructure for Open Access Topic 8: Towards global data e-infrastructures - RDA CHALLENGE 4 e-infrastructure INTEGRATION Topic 9: e-infrastructures for virtual research environments (VRE) Topic 10: Provisioning of core services across e-infrastructures Topic 11: Skills and new professions for e-infrastructures CHALLENGE 5 POLICY AND INTERNATIONAL Topic 12: Policy development and international cooperation Big Data and Infrastructure Challenges 40

41 GEANT and NRENs: Addressing Big Data Infrastructure Challenges Open Cloud exchange (OCX) initiative by GN3plus JRA1: Network Architectures for Horizon 2020 GEANT Network to support 2Tbps capacity backbone SURFnet PSNC 100 Gbps remote robotics demo at TNC2013 From Software Defined Network (SDN) to Software Defined Infrastructure (SDI) A new thinking beyond current challenges Federated Identity Management and Federated Access and Delivery Infrastructure (FADI) Big Data and Infrastructure Challenges 41

42 General use case for infrastructure provisioning: Workflow => Logical (Cloud) Infrastructure Enterprise/Scientific workflow Input Data Storage Data Instrum. Data Data Filtering Special Proc 1 Special Proc 2 Data Archive Visual Present Campus A CE Visualisation Visualisation Campus B CE User Group A User User User Resource/ Service Provider VR1 VR2 VR3 VR6 VR4 VR5 Cloud 1 IaaS Cloud 2 PaaS VR7 User User User Enterprise/Project based Intercloud Infrastructure User Group B Enterprise/Scientific workflow Is mapped to heterogeneous cloud infrastructure containing IaaS, PaaS components Resource/ Service Provider Cloud IaaS Provider Cloud PaaS Provider Big Data and Infrastructure Challenges 42

43 Multi/inter- cloud infrastructure provisioning: => ICADI and OCX functions Enterprise/Scientific workflow Campus A User Group A CE User User User Visualisation Input Data Storage Data Instrum. Data Data Filtering Special Proc 1 Special Proc 2 Data Archive Visual Present Visualisation User User User Campus B CE User Group B OCX as a part of GEANT infrastructure OCX at NREN Open Cloud exchange (OCX) interconnects CSP and Customer (campus) infrastructures Resource/ Service Provider OCX assumes collocation or collapsed backbone for connecting all OCX members Resource/ Service Provider VR1 VR2 VR3 VR4 VR6 VR5 Cloud 1 IaaS Cloud IaaS Provider Cloud 2 PaaS VR7 Enterprise/Project based Intercloud Infrastructure Cloud PaaS Provider OCX at NREN Big Data and Infrastructure Challenges 43

44 General use case for infrastructure provisioning: Logical Infrastructure => Network Infrastructure (2) Resource and Cloud Provider Domains Campus A Infrastructure VR1 VR3 VR5 VR7 Campus B Infrastructure VR2 VR4 VR6 Campus A CE Visualisation Network Provider 1 Cloud Carrier or Network Provider 2 OCX at Cloud Carrier or Network Provider (NREN) level Visualisation Campus B CE User Group A User User User Resource/ Service Provider VR1 VR2 VR3 VR6 VR4 VR5 Cloud 1 IaaS Cloud 2 PaaS VR7 User User User Enterprise/Project based Intercloud Infrastructure User Group B Provisioning network infrastructure may involve multiple providers Introducing OCX (Open Cloud exchange) Resource/ Service Provider Cloud IaaS Provider Cloud PaaS Provider Big Data and Infrastructure Challenges 44

45 OCX Definition and Operational Principles Direct service/inter-member peering Re-use and leverage Internet exchange Point (IXP) experience Open collocation services No third party (intermediary/broker) services Transparency for cloud based services No involvement into peering or mutual business relations Trusted Third Party (TTP) To support dynamic service agreements and/or federation establishment Trusted Introducer for dynamic trust establishment May include other special services to support smooth services delivery and integration between CSP and Customer E.g., Local policies, service registry and discovery, Application/VM repository Big Data and Infrastructure Challenges 45

46 OCX Trusted Third Party services TTP OCX L0-L2/L3 topology Any-to-any Distributed, collapsed, hierarchical Topology information exchange L0- L2 + L3? QoS control SDN control over OCX switching OCX Pre-established trust relation with OCX as TTP Trust relations established as a part of dynamic federation between OCX members TTP goals and services Enable dynamic federations establishment Trusted Certificates and CA s Repository Similar to TACAR (TERENA Academic CA Repository) Trusted Introducer Service Trusted Introduction Protocol Service Registry and Discovery SLA repository and clearinghouse Big Data and Infrastructure Challenges 46

47 OCX location options: GN3, NREN (+University?) CSP GEANT NREN University DFlow IP/L3 VR3 VR4 VR6 VR5 VR7 OCX Visualisation User User User CE L2 L1 L0 VR3 VR4 VR6 VR5 VR7 OCX Visualisation User User User CE Big Data and Infrastructure Challenges 47

48 OCX Hierarchical Topology Model CSP GEANT NREN University VR4 VR6 VR7 OCX Visualisation CE DFlow VR3 VR5 User User User IP/L3 L2 L1 L0 VR3 VR4 VR6 VR5 VR7 OCX Visualisation User User User CE Big Data and Infrastructure Challenges 48

49 Pilot OCX Concept Implementation by SURFnet NetherLight as an Open Lightpath Exchanges (OLE) operated by SURFnet To Bandwidth on Demand, Virtual Networking 100G transatlantic trials, and Open Cloud exchange (OCX) pilot/prototype Since 2012, SURFnet and NetherLight have been performing pilots with cloud providers that want to offer their services with lightpath characteristics to SURFnet connected institutions GreenQloud (Iceland) offers service delivery over lightpaths via NetherLight/SURFnet with advantages above the regular IP services Guaranteed bandwidth and latency Protected from the external Internet Possibility of domain extension (VM s appear as if inside campus network) Cost reduction due to offloading of traffic Pilot since 2012 between OneXS and Windesheim University of Applied Sciences to deliver unified services of fixed and mobile telephony Big Data and Infrastructure Challenges 49

50 General SURFnet Lightpath interconnection Universities GeenQloud: cloud services OneXS: unified services of fixed and mobile telephony Big Data and Infrastructure Challenges 50

51 Data Scientist: New Profession and Opportunities McKinsey Institute on Big Data Jobs There will be a shortage of talent necessary for organizations to take advantage of Big Data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions SOURCE:US Bureau of Labor Statistics; US Census; Dun & Bradstreet; company interviews; McKinsey analysis Big Data and Infrastructure Challenges 51

52 H2020 Topic 11 - Skills and new professions for e-infrastructure research data Scope of work of the emerging professions of e-infrastructure developers, integrators, operators, research technologists, data scientists and data librarians To support researchers, professors and students in computing and networking, as well as in handling, analysing, and storing large amounts of data and content Support the establishment of these professions as distinct professions from that of a researcher and a traditional computer specialist/engineer Create a reference model and a Common Body of Knowledge (CBK) which defines their competencies Defining or updating university curricula for the e-infrastructure competences mentioned above, and promoting their adoption Development of appropriate curricula and training materials Big Data and Infrastructure Challenges 52

53 Slide from the presentation Demystifying Data Science (by Natasha Balac, SDSC) Big Data and Infrastructure Challenges 53

54 Strata Survey Skills and Data Scientist Self-ID Analysing the Analysers. O Reilly Strata Survey Harris, Murphy&Vaisman, 2013 Based on how data scientists think about themselves and their work Identified four Data Scientist clusters Big Data and Infrastructure Challenges 54

55 Skills and Self-ID Top Factors Business ML/BigData Math/OR Programming Statistics ML Machine Learning OR Operations Research Big Data and Infrastructure Challenges 55

Big Data Architecture Research at UvA

Big Data Architecture Research at UvA Big Architecture Research at UvA Yuri Demchenko System and Network Engineering Group, University of Amsterdam ISO/IEC SGBD Big Technologies Workshop Part of ISO/IEC Big Study Group meeting 13-16 May Outline

More information

Defining Architecture Components of the Big Data Ecosystem

Defining Architecture Components of the Big Data Ecosystem Defining Architecture Components of the Big Data Ecosystem Yuri Demchenko SNE Group, University of Amsterdam 2 nd BDDAC2014 Symposium, CTS2014 Conference 19-23 May 2014, Minneapolis, USA Outline Big Data

More information

Overview NIST Big Data Working Group Activities

Overview NIST Big Data Working Group Activities Overview NIST Big Working Group Activities and Big Architecture Framework (BDAF) by UvA Yuri Demchenko SNE Group, University of Amsterdam Big Analytics Interest Group 17 September 2013, 2nd RDA Plenary

More information

Cloud and Big Data Standardisation

Cloud and Big Data Standardisation Cloud and Big Data Standardisation EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam

More information

Big Data Standardisation in Industry and Research

Big Data Standardisation in Industry and Research Big Data Standardisation in Industry and Research EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University

More information

Open Cloud exchange (OCX)

Open Cloud exchange (OCX) Open Cloud exchange (OCX) Draft Proposal and Progress GN3plus JRA1 Task 2 - Network Architectures for Cloud Services Yuri Demchenko SNE Group, University of Amsterdam 10 October 2013, GN3plus Symposium,

More information

Big Security for Big Data: Addressing Security Challenges for the Big Data Infrastructure

Big Security for Big Data: Addressing Security Challenges for the Big Data Infrastructure Big Security for Big : Addressing Security Challenges for the Big Infrastructure Yuri Demchenko SNE Group, University of Amsterdam On behalf of Yuri Demchenko, Canh Ngo, Cees de Laat, Peter Membrey, Daniil

More information

Big Data and Data Intensive (Science) Technologies

Big Data and Data Intensive (Science) Technologies Big Data and Data Intensive (Science) Technologies Yuri Demchenko System and Network Engineering, UvA CN CN CN CN CN CN CN CN CN CN CN 2014, UvA Big Data and Data Science 1 Outline Big Data and Data Intensive

More information

Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University of Liverpool)

Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University of Liverpool) Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University of Liverpool) Yuri Demchenko (University of Amsterdam) Emanuel Gruengard (Laureate Online Education)

More information

EDISON Education for Data Intensive Science to Open New science frontiers

EDISON Education for Data Intensive Science to Open New science frontiers H2020 INFRASUPP-4 CSA Project EDISON Education for Data Intensive Science to Open New science frontiers Yuri Demchenko University of Amsterdam Outline Consortium members EDISON Project Concept and Objectives

More information

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Instructional Model for Building effective Big Data Curricula for Online and Campus Education Instructional Model for Building effective Big Data Curricula for Online and Campus Education Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University

More information

Defining the Big Data Architecture Framework (BDAF)

Defining the Big Data Architecture Framework (BDAF) Defining the Big Architecture Framework (BDAF) Outcome of the Brainstorming Session at the University of Amsterdam Yuri Demchenko (facilitator, reporter), SNE Group, University of Amsterdam 17 July 2013,

More information

Addressing Big Data Issues in Scientific Data Infrastructure

Addressing Big Data Issues in Scientific Data Infrastructure Addressing Big Issues in Scientific Infrastructure Yuri Demchenko, SNE Group, University of Amsterdam 21 May 2013, BDDAC2013 Workshop CTS2013 San Diego Outline Background to Big research at SNE/UvA Big

More information

Defining Architecture Components of the Big Data Ecosystem

Defining Architecture Components of the Big Data Ecosystem Defining Architecture Components of the Big Data Ecosystem Yuri Demchenko, Cees de Laat System and Network Engineering Group University of Amsterdam Amsterdam, The Netherlands e-mail: {y.demchenko, C.T.A.M.deLaat}@uva.nl

More information

Big Data Challenges for e-science Infrastructure

Big Data Challenges for e-science Infrastructure Big Challenges for e-science Infrastructure Yuri Demchenko, SNE Group, University of Amsterdam AAA-Study Project COINFO2012 Conference 24-25 November 2012, Nanjing, China 23-25 November 2012, Nanjing Big

More information

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Instructional Model for Building effective Big Data Curricula for Online and Campus Education Instructional Model for Building effective Big Data Curricula for Online and Campus Education 1 Yuri Demchenko, 2 Emanuel Gruengard, 1,3 Sander Klous, 1 University of Amsterdam System and Network Engineering

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

The future of Big Data A United Hitachi View

The future of Big Data A United Hitachi View The future of Big Data A United Hitachi View Alex van Die Pre-Sales Consultant 1 Oktober 2014 1 Agenda Evolutie van Data en Analytics Internet of Things Hitachi Social Innovation Vision and Solutions 2

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Workprogramme 2014-15

Workprogramme 2014-15 Workprogramme 2014-15 e-infrastructures DCH-RP final conference 22 September 2014 Wim Jansen einfrastructure DG CONNECT European Commission DEVELOPMENT AND DEPLOYMENT OF E-INFRASTRUCTURES AND SERVICES

More information

Big Data and Education: Education and Skills Development in Big Data and Data Intensive Science

Big Data and Education: Education and Skills Development in Big Data and Data Intensive Science Big Data and Education: Education and Skills Development in Big Data and Data Intensive Science Yuri Demchenko SNE Group, University of Amsterdam ISO/IEC SGBD Big Data Technologies Workshop Part of ISO/IEC

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Architecture Framework and Components for the Big Data Ecosystem

Architecture Framework and Components for the Big Data Ecosystem System and Network Engineering Group, UvA UNIVERSITEIT VAN AMSTERDAM System and Network Engineering Architecture Framework and Components for the Big Data Ecosystem Draft Version 0.2 Yuri Demchenko, Canh

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India.

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. Impact of Big Data in Oil & Gas Industry Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. New Age Information 2.92 billions Internet Users in 2014 Twitter processes 7 terabytes

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Addressing Big Data Issues in Scientific Data Infrastructure

Addressing Big Data Issues in Scientific Data Infrastructure Addressing Big Data Issues in Scientific Data Infrastructure Yuri Demchenko, Paola Grosso, Cees de Laat System and Network Engineering Group University of Amsterdam Amsterdam, The Netherlands e-mail: {y.demchenko,

More information

What happens when Big Data and Master Data come together?

What happens when Big Data and Master Data come together? What happens when Big Data and Master Data come together? Jeremy Pritchard Master Data Management fgdd 1 What is Master Data? Master data is data that is shared by multiple computer systems. The Information

More information

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data

More information

D. Regvart (CARNET), Y. Demchenko (UvA), S. Filiposka (UKiM), M. de Vos (SURFnet), T. Karaliotas (GRNET), K. Baumann (SWITCH), D.

D. Regvart (CARNET), Y. Demchenko (UvA), S. Filiposka (UKiM), M. de Vos (SURFnet), T. Karaliotas (GRNET), K. Baumann (SWITCH), D. 31-03-2014 Network Architectures for Cloud Services Milestone MS101 (MJ1.2.1) Contractual Date: 31-03-2014 Actual Date: 31-03-2014 Grant Agreement No.: 605243 Work Package/Activity: 12/JRA1 Task Item:

More information

Exploiting Data at Rest and Data in Motion with a Big Data Platform

Exploiting Data at Rest and Data in Motion with a Big Data Platform Exploiting Data at Rest and Data in Motion with a Big Data Platform Sarah Brader, sarah_brader@uk.ibm.com What is Big Data? Where does it come from? 12+ TBs of tweet data every day 30 billion RFID tags

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

Getting Started Practical Input For Your Roadmap

Getting Started Practical Input For Your Roadmap Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson

More information

Deploying Big Data to the Cloud: Roadmap for Success

Deploying Big Data to the Cloud: Roadmap for Success Deploying Big Data to the Cloud: Roadmap for Success James Kobielus Chair, CSCC Big Data in the Cloud Working Group IBM Big Data Evangelist. IBM Data Magazine, Editor-in- Chief. IBM Senior Program Director,

More information

How to Leverage Big Data in the Cloud to Gain Competitive Advantage

How to Leverage Big Data in the Cloud to Gain Competitive Advantage How to Leverage Big Data in the Cloud to Gain Competitive Advantage James Kobielus, IBM Big Data Evangelist Editor-in-Chief, IBM Data Magazine Senior Program Director, Product Marketing, Big Data Analytics

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Science Overview Why, What, How, Who Outline Why Data Science?

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

Big Data Analytics Empowering SME s to think and act

Big Data Analytics Empowering SME s to think and act Big Data Analytics Empowering SME s to think and act Author: Haricharan Mylaraiah Chief Operating Officer 1320 Greenway Drive, Irving, TX 75038 Contents Executive Summary... 3 Introduction... 4 What Big

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

A New Era Of Analytic

A New Era Of Analytic Penang egovernment Seminar 2014 A New Era Of Analytic Megat Anuar Idris Head, Project Delivery, Business Analytics & Big Data Agenda Overview of Big Data Case Studies on Big Data Big Data Technology Readiness

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Demystifying The Data Scientist

Demystifying The Data Scientist Demystifying The Data Scientist Natasha Balac, Ph.D. Predictive Analytics Center of Excellence, Director San Diego Supercomputer Center University of California, San Diego Brief History of SDSC 1985-1997:

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V WHITE PAPER Create the Data Center of the Future Accelerate

More information

Big Analytics: A Next Generation Roadmap

Big Analytics: A Next Generation Roadmap Big Analytics: A Next Generation Roadmap Cloud Developers Summit & Expo: October 1, 2014 Neil Fox, CTO: SoftServe, Inc. 2014 SoftServe, Inc. Remember Life Before The Web? 1994 Even Revolutions Take Time

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Defining Generic Architecture for Cloud Infrastructure as a Service Model

Defining Generic Architecture for Cloud Infrastructure as a Service Model Defining Generic Architecture for Cloud Infrastructure as a Service Model Yuri Demchenko 1 University of Amsterdam Science Park 904, Amsterdam, The Netherlands E-mail: y.demchenko@uva.nl Cees de Laat University

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Big Data Analytics. Chances and Challenges. Volker Markl

Big Data Analytics. Chances and Challenges. Volker Markl Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

Innovation Session BIG DATA. HP EMEA Software Performance Tour 2014

Innovation Session BIG DATA. HP EMEA Software Performance Tour 2014 HP EMEA Software Performance Tour 2014 Innovation Session BIG DATA Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Unlocking

More information

Enabling the SmartGrid through Cloud Computing

Enabling the SmartGrid through Cloud Computing Enabling the SmartGrid through Cloud Computing April 2012 Creating Value, Delivering Results 2012 eglobaltech Incorporated. Tech, Inc. All rights reserved. 1 Overall Objective To deliver electricity from

More information

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Get More Scalability and Flexibility for Big Data

Get More Scalability and Flexibility for Big Data Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6 Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Big Data Analytics DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Tom Haughey InfoModel, LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755 3350 tom.haughey@infomodelusa.com

More information

Why is BIG Data Important?

Why is BIG Data Important? Why is BIG Data Important? March 2012 1 Why is BIG Data Important? A Navint Partners White Paper May 2012 Why is BIG Data Important? March 2012 2 What is Big Data? Big data is a term that refers to data

More information

Big Data Introduction, Importance and Current Perspective of Challenges

Big Data Introduction, Importance and Current Perspective of Challenges International Journal of Advances in Engineering Science and Technology 221 Available online at www.ijaestonline.com ISSN: 2319-1120 Big Data Introduction, Importance and Current Perspective of Challenges

More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

On Establishing Big Data Breakwaters

On Establishing Big Data Breakwaters On Establishing Big Data Breakwaters with Analytics Dr. - Ing. Morris Riedel Head of Research Group High Productivity Data Processing, Juelich Supercomputing Centre, Germany Adjunct Associated Professor,

More information

Splunk Company Overview

Splunk Company Overview Copyright 2015 Splunk Inc. Splunk Company Overview Name Title Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected

More information

BIG DATA & SOCIAL INNOVATION KENNETH THOMAS, CLIENT MANAGER

BIG DATA & SOCIAL INNOVATION KENNETH THOMAS, CLIENT MANAGER BIG DATA & SOCIAL INNOVATION KENNETH THOMAS, CLIENT MANAGER 1 MAKING THE RIGHT DECISSION AT THE RIGHT PLACE AT THE RIGHT TIME 2 THE DATA MULTIPLIER EFFECT AT WORK BUSINESS DRIVEN HUMAN DRIVEN MACHINE DRIVEN

More information

Big Data Analytics- Innovations at the Edge

Big Data Analytics- Innovations at the Edge Big Data Analytics- Innovations at the Edge Brian Reed Chief Technologist Healthcare Four Dimensions of Big Data 2 The changing Big Data landscape Annual Growth ~100% Machine Data 90% of Information Human

More information

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS.

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Big Data at Cloud Scale

Big Data at Cloud Scale Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

USING BIG DATA FOR INTELLIGENT BUSINESSES

USING BIG DATA FOR INTELLIGENT BUSINESSES HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 2015 Brasov, 28-30 May 2015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC USING BIG DATA FOR INTELLIGENT

More information

Big Data and Trusted Information

Big Data and Trusted Information Dr. Oliver Adamczak Big Data and Trusted Information CAS Single Point of Truth 7. Mai 2012 The Hype Big Data: The next frontier for innovation, competition and productivity McKinsey Global Institute 2012

More information

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence

More information

Machina Research. Where is the value in IoT? IoT data and analytics may have an answer. Emil Berthelsen, Principal Analyst April 28, 2016

Machina Research. Where is the value in IoT? IoT data and analytics may have an answer. Emil Berthelsen, Principal Analyst April 28, 2016 Machina Research Where is the value in IoT? IoT data and analytics may have an answer Emil Berthelsen, Principal Analyst April 28, 2016 About Machina Research Machina Research is the world s leading provider

More information

The Promise of Industrial Big Data

The Promise of Industrial Big Data The Promise of Industrial Big Data Big Data Real Time Analytics Katherine Butler 1 st Annual Digital Economy Congress San Diego, CA Nov 14 th 15 th, 2013 Individual vs. Ecosystem What Happened When 1B

More information

Big Data lisää älyä tiedosta

Big Data lisää älyä tiedosta 2011 Tieto Corporation Big Data lisää älyä tiedosta ebusiness Forum 21.5.2013 Ilkka Korkiakoski VP Financial Services Agenda Megatrends and needs for Big Data What is the value of Big Data? Use scenarios

More information

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction

More information

Is Big Data a Big Deal? What Big Data Does to Science

Is Big Data a Big Deal? What Big Data Does to Science Is Big Data a Big Deal? What Big Data Does to Science Netherlands escience Center Wilco Hazeleger Wilco Hazeleger Student @ Wageningen University and Reading University Meteorology PhD @ Utrecht University,

More information

Data Centric Computing Revisited

Data Centric Computing Revisited Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Busin i ess I n I t n e t ll l i l g i e g nce c T r T e r nds For 2013

Busin i ess I n I t n e t ll l i l g i e g nce c T r T e r nds For 2013 Business Intelligence Trends For 2013 10 Trends The last few years the change in Business Intelligence seems to accelerate under the pressure of increased business demand and technology innovations. Here

More information

VIEWPOINT. High Performance Analytics. Industry Context and Trends

VIEWPOINT. High Performance Analytics. Industry Context and Trends VIEWPOINT High Performance Analytics Industry Context and Trends In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information