Big Picture of Big Data Software Engineering With example research challenges

Similar documents
Big Data Driven Knowledge Discovery for Autonomic Future Internet

CS 389 Software Engineering. Lecture 2 Chapter 2 Software Processes. Adapted from: Chap 1. Sommerville 9 th ed. Chap 1. Pressman 6 th ed.

Kimmo Rossi. European Commission DG CONNECT

Rapid software development. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 17 Slide 1

Unit 1 Learning Objectives

Digital Industries Trailblazer Apprenticeship. Software Developer - Occupational Brief

How To Handle Big Data With A Data Scientist

DATAOPT SOLUTIONS. What Is Big Data?

Navigating Big Data business analytics

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Sources: Summary Data is exploding in volume, variety and velocity timely

Database Marketing, Business Intelligence and Knowledge Discovery

Concept and Project Objectives

Agile So)ware Development

IT Services Management Service Brief

Key Evolutions of ERP

Trustworthiness of Big Data

The profile of your work on an Agile project will be very different. Agile projects have several things in common:

IoT Analytics Today and in 2020

Associate Prof. Dr. Victor Onomza Waziri

Big Data - Infrastructure Considerations

Ironfan Your Foundation for Flexible Big Data Infrastructure

Software Engineering Reference Framework

In-Database Analytics

A Business Analysis Perspective on Business Process Management

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Luncheon Webinar Series May 13, 2013

Testing Big data is one of the biggest

Dell* In-Memory Appliance for Cloudera* Enterprise

Big Data Support Services. Service Definition

Rapid software development. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 17 Slide 1

Talousjohto muutosagenttina ja informaatiotulvan tulkkina

Big Data Explained. An introduction to Big Data Science.

The Decision Management Manifesto

BigData Flipkart. Raju Shetty Dir. of Engg, Data Platform

Plan-Driven Methodologies

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Are You Big Data Ready?

Rapid Software Development

Agile Data Warehousing

GETRONICS: A BALANCED CLOUD POSITION

High-Performance Analytics

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by:

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Social Data Science for Intelligent Cities

1.1 The Nature of Software... Object-Oriented Software Engineering Practical Software Development using UML and Java. The Nature of Software...

Diploma Of Computing

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

HOW TO DO A SMART DATA PROJECT

Agile Business Intelligence Data Lake Architecture

BIG DATA THE NEW OPPORTUNITY

How Big Data is Different

Building Your Big Data Team

ANALYTICS CENTER LEARNING PROGRAM

Master Data Management Architecture

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

can you effectively plan for the migration and management of systems and applications on Vblock Platforms?

(Main Head) The future of data mining Part 3

Top-down or Bottom-up?

FROM A RIGID ECOSYSTEM TO A LOGICAL AND FLEXIBLE ENTITY: THE SOFTWARE- DEFINED DATA CENTRE

Big Data Tips the Power Balance Between IT and Business Users

Ali Eghlima Ph.D Director of Bioinformatics. A Bioinformatics Research & Consulting Group

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

A Process Model for Software Architecture

INTERSEC BENCHMARK. High Performance for Fast Data & Real-Time Analytics Part I: Vs Hadoop

Software Life Cycle. Main issues: Discussion of different life cycle models Maintenance or evolution

Navigating the Big Data infrastructure layer Helena Schwenk

COMP 354 Introduction to Software Engineering

BIG DATA & DATA SCIENCE

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May ISSN BIG DATA: A New Technology

How To Turn Big Data Into An Insight

Hadoop for Enterprises:

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Tailoring software engineering education

IoT is a King, Big data is a Queen and Cloud is a Palace

IBM System x reference architecture solutions for big data

Chapter 4 Software Lifecycle and Performance Analysis

locuz.com Big Data Services

DOBUS And SBL Cloud Services Brochure

Structure of Presentation. The Role of Programming in Informatics Curricula. Concepts of Informatics 2. Concepts of Informatics 1

To introduce software process models To describe three generic process models and when they may be used

Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment

Senior Business Intelligence/Engineering Analyst

Transcription:

Big Picture of Big Data Software Engineering With example research challenges Nazim H. Madhavji, UWO, Canada Andriy Miranskyy, Ryerson U., Canada Kostas Kontogiannis, NTUA, Greece madhavji@gmail.com avm@ryerson.ca kkontog@softlab.ece.ntua.gr 22 May, 2015 BIGDSE 2015, Florence, Italy 1

Introduction The Big Data technology and services market: compound annual growth rate of 27% $9.8 billion in 2012 to projected $32.4 billion in 2017 anticipated to generate data exponentially. Big Data is on the rise... 22 May, 2015 BIGDSE 2015, Florence, Italy 2

Observation -- 80:20 Bias Technical and business communities are focusing: More on data analytics and Big Data infrastructures (76%) Less on new application areas (22%) involving Big Data (the so-called Big Data Software Engineering (BDSE) ). 22 May, 2015 BIGDSE 2015, Florence, Italy 3

Observation -- 80:20 Bias No inherent reason for this bias. Perhaps, more a function of: availability of data, infrastructure, mining, and analysis tools Relatively quick payback 22 May, 2015 BIGDSE 2015, Florence, Italy 4

If we know our History... A predominant model of software development in the 60s was Code & Fix. There was the so-called, software crisis. We have learnt over the years: Waterfall, iterative, evolutionary, CBSE, SoS, agile, etc., models of development. 22 May, 2015 BIGDSE 2015, Florence, Italy 5

Question Are we repeating this mentality all over again? The current paradigm is that of Mine and Analyse. Confusion abounds in industry. Are we missing a Big Picture? 22 May, 2015 BIGDSE 2015, Florence, Italy 6

Advert 1/2 Big Data Engineer We are looking for someone who has A LOT of experience (10k hours) in coding and building software. Experience with Big Data stacks like Elasticsearch, HBase, Hadoop and Cassandra (or large databases in general). http://www.indeed.com/viewjob?jk=6ca00b23587bc40c&from=api&q=title%3a+%22bi g+data%22&atk=19cpkk2050nki3ag&sclk=1&sjdu=bhisijc3owpxoxyflbyjaggghl1ggq PT4rYranXcPF8ktqmPQB5SbCesna_YdpNYpg8Rvg9Sy6uCI4M2jH2jH3MJzfB7c2xnydF8CD wehp7jlfxcogv4mbps67pvzriz3ryntwirqjnlmjjsmgbsuhmqha7reqhojva4jnrnr6v1 ZVuvjpFoqGuyJgqEaaOyedbz0p5qzcdJEmb0NIeR9uiLF7iNn0vCkkuXsX5rXbKBaur7uSD8BUCqqnz14s2c8ru4nevKZREByoIDCWDOyllotLF4Xmbi8afcr3kojgq8DtYdE9jceLhUBqJXoVqvllX9J6i7GBIYZzZpDaTGCKg8WwzM6e5ZsrX Gi09xo&pub=4009563241607923 22 May, 2015 BIGDSE 2015, Florence, Italy 7

Advert 2/2 Big Data Engineer Research, design, and implementation of new technologies which will provide a solid ground for Data Processing and Machine Learning tools. http://www.indeed.com/viewjob?jk=c1eeb42674926be3&from=api&q=title%3a+%22big+da ta%22&atk=19cpkk2050nki3ag&sclk=1&sjdu=bhisijc3owpxoxyflbyjaggghl1ggqpt4ryranx cpf_baxbgyzgnbbqj5rol-se_cbjg4e6deepvps-kyz1-nkqcoam3hebl-kphggqdvvlcnrhv4- ZZPb3eJY1alR6DxaZYW71PQZS1jJf5zPs7WircXR0Z- EUfLyGZpWOL20pEz8c40C8i6w3z1_TOWlPRdfa0wob- 2aBDIf3ABy9d12YYZcHl9qM_bV6CkLuZ0m0yTm3bKcWrWxLHwYzeb_KYVKUGy2RZX7qrzbFo8 LLzL1_JJsH2kAEc4NZgztkCCtdNgEPtjHRjz9oKg5y_sH6C7lSpNjlxdaTSymFk3zNfA&pub=4009563241607923 Advert 1/2 and 2/2 are both Big Data Engineer yet have completely different tasks. 22 May, 2015 BIGDSE 2015, Florence, Italy 8

Advert 1/2 Big Data Architect Responsibilities -- Collaborate with senior management at the largest financial institutions in the world, and the Clutch senior stakeholders advising them, to translate business challenges into technical requirements http://www.indeed.com/viewjob?jk=d186ffc9abee2ed0&from=api&q=title%3a+%22big+dat a%22&atk=19cpkk2050nki3ag&sclk=1&sjdu=bhisijc3owpxoxyflbyjaggghl1ggqpt4ryranxc PF9nUaxYmrmEG0GRgHtiyduBuYsH8OgnPcc3TEk53cUJUvl99Pzwdt7FjJ18XVa9LDqtoc37jflOt9_fZU2i - fubwp6ozeueqnb4ds8wy_vvvxxkq1jc4tsrak360vydyxn1yzwaurs7loviqp6qot5xnxllrm_ cslonmdw22jcsivv4amz5tta6eijtdlnkfm29e6qf1eqq4hi10l19ko- waaox8hs9kbauvvvcrgq-81ia9koedmb1cpxjgbdwxkdnig7s0z- KsUf4FWgzn2LLTJfpVhyuCnCdpXVPUNkwg&pub=4009563241607923 22 May, 2015 BIGDSE 2015, Florence, Italy 9

Advert 2/2 Big Data Architect Position Purpose -- Engineer, deploy, and support Hadoop clusters in production and lower environments. Provide hands-on administration of running Hadoop jobs and optimizing daily tasks. http://www.indeed.com/viewjob?jk=ddcd956b9e8ae114&from=api&q=title%3a+%22big+da ta%22&atk=19cpmp0ee0nr17te&sclk=1&sjdu=bhisijc3owpxoxyflbyjaggghl1ggqpt4ryran XcPF9nU-axYmrmEG0GRgHtiyduj_6r7WPyR-HwKeEmfutq6ECaCOa- XVrXAQ7ijoynCOxnAMwsHBWdxtw6gbG6mt6WJd7LSR5jDOnafN0Qs4-8Fmx6lef8GCfkZYVSQ50u9NchkLTvKA-EX3VlfjtVu2T-wLZp4t74-3QSh_CyOaFarQJzuBSr_vBZnkAjTnU4FcWg21WX0QHTx0aHowDb4btV_qpBfEtJDGZLTUVMp9j stwtrepdwaz_1ulxvp717kvtkaggphmjagkfly10vnubrfgdqvgny4iplvbfheod7_q&pub=40 09563241607923 Advert 1/2 and 2/2 are both Big Data Architect yet have completely different tasks. 22 May, 2015 BIGDSE 2015, Florence, Italy 10

We envisage a Big Picture of Big Data Software Engineering It is anticipated to help in at least two ways: (a) in organisational structuring agent roles, processes, relationships, etc., among the interacting elements (b) in precipitating research in targeted areas of the Big Data environment. 22 May, 2015 BIGDSE 2015, Florence, Italy 11

Corporate Decision-making Streamed/Historical data, Technologies, Infrastructures, etc. Business and Client Scenarios STRATEGIC THINKING (Practice) ACTION ON BIG DATA Big Data SE practice STRATEGY IMPLEMENTATION Marketeers Users Corporate User Support Management Exogenous Application Change Concept (Practice) Views. Operational (Predictive) Project & Program Evolving Process Program Understanding Managers Theories, and Structure Computational Models Procedures Procedures, Laws and of Application and Program Requirements Algorithms System Domains Definition Analysis A (Practice) Big Data systems & services, and Analytics (e.g., Financial sector) CS research on Big Data RESEARCH (Theory) Used in SE research on Big Data Environments 22 May, 2015 BIGDSE 2015, Florence, Italy 12

Corporate Decision-making Streamed/Historical data, Technologies, Infrastructures, etc. Business and Client Scenarios (Practice) Big Data SE practice Marketeers Users Corporate User Support Management Exogenous Application Change Concept (Practice) Views. Operational (Predictive) Project & Program Evolving Process Program Understanding Managers Theories, and Structure Computational Models Procedures Procedures, Laws and of Application and Program Requirements Algorithms System Domains Definition Analysis A (Practice) Big Data systems & services, and Analytics (e.g., Financial sector) CS research on Big Data (Theory) Used in SE research on Big Data Environments 22 May, 2015 BIGDSE 2015, Florence, Italy 13

Illustrative Research Challenges Requirements Engineering (RE) Architectures Testing and Maintenance 22 May, 2015 BIGDSE 2015, Florence, Italy 14

Requirements Engineering (RE) RE process involves, amongst others: modelling the application domain, eliciting scenarios and requirements from stakeholders and from other sources, developing functional and behavioural models, analysis, prioritisation, and validation of requirements. Real-world characteristics of the elements (e.g., dimensions, weight, cost, transfer rates, etc.) need to be represented in requirements notation along with operations on them. 22 May, 2015 BIGDSE 2015, Florence, Italy 15

Requirements Engineering (RE) Thus, when eliciting scenarios of Big Data system responses, obviously, we need to capture properties (such as volume, velocity, variety, varacity, etc.) in RE models. Currently, RE for Big Data applications is an emerging area. Need to separate requirements for: infrastructures, analytic tools, and end-user or business applications. 22 May, 2015 BIGDSE 2015, Florence, Italy 16

Requirements Engineering (RE) Example business scenario: customer arrives in a department store, possible previous experiences in the store, real-time processing, personalised and time-sensitive offers to this customer, at specific points of displays where the customer would likely be shopping. How should a requirements analyst exploit video footage, images, historical data, and info.? Needs to address all the Vs of Big Data. 22 May, 2015 BIGDSE 2015, Florence, Italy 17

Example RE challenges 1. Envisaging innovative application scenarios in the context of Big Data. Focus is on the application domain, stakeholdervalue, and creative ways to utilise the Vs of Big Data including combinations thereof. 22 May, 2015 BIGDSE 2015, Florence, Italy 18

Example RE challenges 2. Creating underlying technologies, libraries and frameworks that provide primitive operations on Big Data characteristics. Analogous to creating mathematical theories. Example: Fast comparisons of videos Encryption of data of various types (e.g., videos, images, voice, etc.) for secure transmission of information. Etc. 22 May, 2015 BIGDSE 2015, Florence, Italy 19

Example RE challenges 3. Envisaging system solutions (at RE-time) (taking into account primitive operations): In order to specify, analyse, validate, prioritise requirements for Big Data applications. 22 May, 2015 BIGDSE 2015, Florence, Italy 20

Illustrative Research Challenges Requirements Engineering (RE) Architectures Testing and Maintenance 22 May, 2015 BIGDSE 2015, Florence, Italy 21

Reference Architectures for Big Data Applications Limited work on reference architectures and patterns for Big Data analytics applications. There is also a need to investigate: How these reference architectures can be mapped onto existing technologies, frameworks, and tools to yield relevant and optimal application deployment. What is, and how to assess, the impact of different architectural design decisions on functional and nonfunctional requirements in Big Data analytics applications Note this middle-out development process; not always top-down. 22 May, 2015 BIGDSE 2015, Florence, Italy 22

Dynamic Resource Deployment and Management Big data analytics applications encompass the use of multiple data sources and processes implementing complex algorithms. Use of virtualisation technology, and data distribution. These technologies pose new issues related to dynamic resource allocation and resource management. Some interesting points to investigate include: Novel techniques and frameworks that allow for moving computation logic closer to where different data sources reside, rather than moving massive data to centralised data centers Novel techniques to assess and verify whether key policies (e.g. privacy, security, accuracy) hold in a given deployment and ensure compliance within an organisation s governance framework 22 May, 2015 BIGDSE 2015, Florence, Italy 23

Illustrative Research Challenges Requirements Engineering (RE) Architectures Testing and Maintenance How to scale down the resources to produce realistic BDS test system? How to scale/synthesize data representatively? How to capture workload on a production BDS and replay it on a test BDS? 22 May, 2015 BIGDSE 2015, Florence, Italy 24

Summary Current Big Data paradigm is biased in favour of data analytics. We propose a Big Picture of Big Data that extends the paradigm into application engineering. We give illustrative research challenges in the areas of: Requirements engineering System architectures Creating representative test system for BDS 22 May, 2015 BIGDSE 2015, Florence, Italy 28

END 22 May, 2015 BIGDSE 2015, Florence, Italy 29