Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow's Computers, illustrated with Today's Examples

15 April 2015, COST ACROSS Workshop, Würzburg
Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow's Computers, illustrated with Today's Examples
Maris van Sprang, IBM, The Netherlands, 2015

Agenda
- Preface
- Grand Challenges
- Autonomous computer
- Roadmap
- Conclusions

Preface / Disclaimer
This presentation is my own view. It is based on experience with my clients, combined with more general work inside and outside IBM. It is not an official IBM view (but I am working on that).

Grand Challenges from IBM
An IBM Grand Challenge has these characteristics:
- The goal is attractive but obviously difficult: just-not-impossible
- A multi-year effort
- Cross-organization: IBM Research, Software, Hardware and Services
- A SMART goal, to register progress and to determine success
- Not directly for a client, so pure investment
- Even if it is not successful, the frontiers of the possible will have been pushed outward

Chess program Deep Blue defeats world champion Kasparov in 1997
For decades, the chess world was divided over whether a computer would ever be able to beat the human world champion: hardware was not fast enough for brute-force algorithms, and software was not good enough to mimic the way humans play chess. Deep Blue was built by IBM as a Grand Challenge and defeated world champion Garry Kasparov in 1997. Since then, the hardware has shrunk from a full rack to a mobile phone.

Watson defeats top human Jeopardy! players in 2011
The Grand Challenge, set in 2005, was to build a computer that understands natural language (NLP) well enough to answer spoken questions and to beat top human Jeopardy! players. Watson is a cognitive computer, able to deal with incomplete, conflicting or partially incorrect input data, and it had to learn about many different knowledge fields: sports, geography, history, arts, etc. After the internal launch in 2005 progress was slow, but in 2011 Watson was strong enough to beat the human champion players. Its original size of 10 racks of POWER 750 servers with 2,880 cores has shrunk to only 3 pizza boxes.

What might a next Grand Challenge from IBM be?
Many possibilities are conceivable:
- A computer that you can have discussions with
- An intelligent computer that understands, to some point, what you want
- A sentient computer that is aware of its surroundings
- A self-conscious computer
- Human-computer implants
How about a computer that keeps running, without too much human attention, when it hasn't yet finished its tasks; one that is a little more robust than today's computers? Autonomous.

In this presentation the focus is on the non-functional aspects of autonomous computers.
Functionality: autonomy with respect to functionality has to deal with many more aspects, such as incorrect or incomplete input, or application coding errors. This would require meta-information about the purpose of a service.
Non-functional aspects: non-functional requirements cover many different aspects and are generally perceived as Quality. Autonomy of Quality implies that computers self-manage Quality to the intended / designed level.
Non-functional requirements include availability, reliability, stability, scalability, performance, security, maintainability, and others.
Note: the term computer (or system) in this presentation is meant to include the whole stack, from hardware up to applications.

Q: Why should we want autonomous computers?
A software problem that led to two days of downtime at the largest bank in Europe tarnished its image as the most reliable banking website. A leading freight company lost $120 million in revenue because IT was unaware that critical warning messages were associated with their key freight-delivery application; they were unable to deliver packages for an entire day due to downtime.

Downtime is more costly than ever before: roughly $5-7M per hour for brokerage, $2-3M per hour for credit cards, about $660k for a mobile CSP, and about $90k per hour for airline reservations. The bottom line: in today's world, the app can never go down.

A: Because we have to make computers much more reliable.
- Society becomes more and more dependent on computer systems, and the impact of failures has grown.
- At the same time, these systems have become more complicated and have to operate in an environment that becomes more and more dynamic.
- Pressure to build and release new functionality reduces attention for Quality.
- Testing cannot cover all conceivable failure modes in the allotted time.
- Every human action has some probability of introducing a new error, e.g. through a typing error, or it may come too late.
A step increase is needed in building more robust and reliable computer systems. One possibility is to use computers to manage computers: use self-managing computers and take human activities out of the loop as much as possible.

Q: How can we make computers run more reliably?
A: Let's have a look at some of the root causes of failures:
- Disks become 100% utilized
- Log files of databases cannot be written anymore
- The maximum number of connections is reached
- Buffer space becomes too small
- CPU utilization becomes too high
- Application changes were introduced but not completely tested (but then, one only tests for known possible errors and cannot test all potential errors)
- An incorrect state was reached but no alert was raised, because it was not a known failure mode
- Alerts were raised but not reacted upon (or too late, or incorrectly)
Two examples: a recent change in payments in the Netherlands switched to reading data from a chip instead of a magnetic carrier, which led to unanticipated growth of transaction data, causing full file systems. And for a full day, a bank showed incorrect account and transaction data to clients because batch processing was delayed too much; the reason was a slowed-down job. Why have alerts at all?
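Several of the root causes above are simple resource exhaustion that a machine can watch for continuously. As a minimal sketch, the first cause (disks reaching 100% utilization) can be detected with a threshold check; the 90% threshold and the checked path are illustrative assumptions, not values from the presentation.

```python
import shutil

def disk_alert(path: str, threshold: float = 0.90) -> bool:
    """Return True when utilization of the filesystem at `path` exceeds threshold."""
    usage = shutil.disk_usage(path)          # (total, used, free) in bytes
    utilization = usage.used / usage.total
    return utilization > threshold
```

Of course, a fixed threshold like this is exactly the kind of static rule the later slides argue should be replaced by learned behavior; it only covers failure modes someone thought of in advance.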

How to build a self-managing computer system?
A simple question: can it be known up front whether a computer will finish in time, given some input, a program and an SLA goal? No! In fact, in general we cannot even know whether it will ever stop. But what we can do is make the system behave like a car navigation system:
- consider with high frequency what the best route is
- keep the end goal in mind
- stay alert for road conditions and traffic jams

An autonomous computer requires a feedback control loop, and recent trends map onto its three steps:
1. Sense (Big Data): data is sampled from the environment, to detect threats to quality and to verify the effect of actions.
2. Interpret (Analytics): a model interprets the data, compares it with target states and translates the deltas into corrective actions.
3. Act (Software Defined Environments): actions are automatically executed that make changes to the computer system.
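The Sense / Interpret / Act loop above can be sketched in a few lines. Everything here is hypothetical for illustration: the metric names, the targets and the corrective actions stand in for live monitoring data and a real SDE API.

```python
def sense(system: dict) -> dict:
    """1. Sense: sample metrics from the environment."""
    return {"cpu": system["cpu"], "free_disk": system["free_disk"]}

def interpret(metrics: dict, targets: dict) -> list:
    """2. Interpret: compare observations with target states; translate deltas into actions."""
    actions = []
    if metrics["cpu"] > targets["max_cpu"]:
        actions.append("add_capacity")
    if metrics["free_disk"] < targets["min_free_disk"]:
        actions.append("extend_filesystem")
    return actions

def act(system: dict, actions: list) -> None:
    """3. Act: execute corrective changes on the (software-defined) environment."""
    if "add_capacity" in actions:
        system["cpu"] -= 20          # e.g. scale out, lowering per-node load
    if "extend_filesystem" in actions:
        system["free_disk"] += 50    # e.g. grow the volume

def control_loop(system: dict, targets: dict, iterations: int = 10) -> dict:
    """Run the loop 'with high frequency', keeping the end goal (the targets) in mind."""
    for _ in range(iterations):
        actions = interpret(sense(system), targets)
        if not actions:              # all targets met: nothing to do
            break
        act(system, actions)
    return system
```

For example, `control_loop({"cpu": 95, "free_disk": 10}, {"max_cpu": 80, "min_free_disk": 20})` converges in one iteration. The interesting design point is that the loop never needs to know in advance whether the system will finish in time; it only has to keep steering, like the car navigation system.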

1. Developments in Big Data mean that data size and speed are no longer limiting factors.
Lemmas:
1. Without data, one essentially operates blindfolded.
2. Finding a needle in a haystack is not an excuse for not using the data.
3. If one cannot find signals of a particular event in the available data, then one didn't look in the right data and needs to capture more data.
4. More is better.
Sources include logs and monitor output, performance metrics, documentation, transactions, and alerts, alarms and events. IBM already sells technology to support 10^5 metrics per blade.

2. Analytics is the foundation for interpreting data, determining whether actions are needed, and assessing the effects of actions already taken.
- Fixed rules, such as programmed thresholds, need to be avoided. More suitable is cognitive computing leveraging machine-learning techniques, such as neural nets.
- The system can be in a number of states, and the neural net determines the actual state, using all metrics as input. One of the states is normal behavior, which is dynamically determined by behavior-learning methods.
- The success of states is determined by SLAs related to throughput, response times and maximum finish times, but also by total energy usage. So this is an optimization problem.
- A second machine-learning engine determines actions to bring the system from unfavorable states towards favorable states, and learns from actions already taken.
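The core of behavior learning is that "normal" is not a fixed rule but is estimated from observed history. As a minimal stand-in for the neural nets the slide mentions, the sketch below learns a baseline (mean and standard deviation) from a metric's history and classifies new samples by their distance from it; the z-score cutoff of 3 is an illustrative assumption.

```python
import statistics

def learn_baseline(history: list) -> tuple:
    """Dynamically determine 'normal behavior' from past samples of one metric."""
    return statistics.mean(history), statistics.stdev(history)

def classify(value: float, baseline: tuple, z_max: float = 3.0) -> str:
    """Compare a new sample against the learned baseline; name the resulting state."""
    mean, stdev = baseline
    z = abs(value - mean) / stdev if stdev else 0.0
    return "Anomalous" if z > z_max else "Normal"
```

For example, with response times that historically hover around 100 ms, `classify(500, learn_baseline([100, 102, 98, 101, 99]))` returns `"Anomalous"`. Note the contrast with a static threshold: nobody had to decide in advance that 500 ms is bad; the data did.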

3. Software Defined Environments (SDE) could be made fully configurable by automated processes.
SDE characteristics:
- Software-defined abstraction of compute, storage and network resources
- Aggregation of heterogeneous infrastructure resources, shared among different workloads and tenants
- Rapid and automatic reconfiguration of the infrastructure to meet functional and non-functional requirements
- Automatic, policy-based assignment of resources to workloads
- Integration, automation and optimization of IT resources
- Autonomic and proactive management of workloads
Quality aspects:
- Agile & flexible: workloads and the abstracted infrastructure have configurable capacity and performance characteristics (without reboots)
- Optimized: the total capacity of the environment is optimally used (a maximum number of workloads is served) and individual workloads are optimized (SLAs are met)
- Adaptive & autonomous: built-in continuous configuration optimization to meet workload SLAs

Role of Big Data, Analytics and SDE in maturity levels towards Autonomic Computing. The aspects are incremental: each maturity level adds capabilities on top of the lower levels.

Level 1, Foundation
- Goals: understand at workload level what happened before or during an incident
- Rules & control: standards; fire and forget
- Data: workload characteristics and their execution
- Way of work: ad hoc, when needed, manual

Level 2, Extended Virtualization
- Goals: understand whether workloads meet their SLAs; determine the mapping between workload and environment
- Rules & control: policies and guidelines; multiple independent goals
- Data: workload SLAs; environment capabilities
- Way of work: data collection, ETL and integration are automated; results are used during a manual activity

Level 3, Extended Automation
- Goals: understand how the SDE environment operates; improve the SDE environment
- Rules & control: static thresholds; partly dependent goals; track and adjust
- Data: environment goals; environment behavior; overall system metrics
- Way of work: results can be used as part of other automation processes

Level 4, Integrating Capabilities
- Goals: predict when goals will not be met, to enable proactive measures; generate improvement suggestions
- Rules & control: non-linear forecasting, using observed normal behavior
- Way of work: results are used in manual improvement actions

Level 5, Autonomic Computing
- Goals: determine the configuration that optimizes the overall system, thus balancing SDE and environment goals
- Rules & control: multiple conflicting goals; behavior-learning system; adaptive workload plans
- Way of work: results can be used in an automated way
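The SDE characteristic of automatic, policy-based assignment of resources to workloads can be illustrated with a toy placement function. The first-fit policy, pool names and capacity units below are all illustrative assumptions; a real SDE would apply richer, declared policies against a live resource inventory.

```python
def assign(workloads: list, pools: dict) -> dict:
    """Policy-based placement: put each workload on the first pool
    with enough free capacity (a simple first-fit policy)."""
    placement = {}
    free = dict(pools)                      # remaining capacity per pool
    for name, demand in workloads:
        for pool, capacity in free.items():
            if capacity >= demand:
                placement[name] = pool
                free[pool] = capacity - demand
                break
    return placement
```

For example, `assign([("web", 4), ("db", 8), ("batch", 6)], {"pool-a": 10, "pool-b": 8})` packs all three workloads onto two pools. The point of SDE is that such a policy runs continuously and reconfigures the mapping without human intervention, which is exactly the Act step of the feedback loop.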

Conclusion
- There is a need to make computers run much more reliably.
- This can be realized by making computers run autonomously with respect to Quality.
- Recent developments in Big Data, Analytics and Software Defined Environments enable running computers more autonomically.
- Realizing this goal requires a combined approach in many fields; it is not just a new version of software.

"We can't solve problems by using the same kind of thinking we used when we created them." (Albert Einstein)
