Big Data: Strategies and Synergies. Melinda H. Connor D.D., Ph.D., AMP, FAM

Melinda H. Connor, D.D., Ph.D., AMP, FAM
Adjunct Professor, Akamai University, Hilo, Hawaii
Science Advisor, Spirituals for the 21st Century, Georgia and Nolan Payton Archive of Sacred Music, California State University Dominguez Hills
CEO, National Foundation for Energy Healing
Melinda_Connor@mindspring.com

What are the Big Issues around Big Data?

Challenges: Quality of programming skills of the computer programmers. Level of problem definition. Level of actual problem understanding in the specific area. Correct hardware to solve the issue. Correct software to solve the issue.

Challenges, cont.: Intersection and compatibility of the hardware and software. Intersection and compatibility of the software on multiple platforms. Understanding of the end user needs. Production of the reports in a format that the end user can understand.

Client Quote: "I don't care how your software works. I don't want to spend time with your software. I just want the data I need to run my business!"

Flip Side: Poorly trained user community wanting turnkey solutions. The wrong people making the purchasing decisions. Poorly defined understanding of the real problem they are trying to solve. Poor-quality problem reports.

Where to start...

How can you utilize the terabytes per hour that you are receiving? Define the needs as closely as possible to match the needs of the business or situation. Do data mining! There will be more that you can use. Select the correct platform to do the processing at speed. Understand all of the tools that are available. Do not limit yourself to one company's tools, but do write in clauses requiring that the software work together or no one gets paid.

What is the most effective management of this big data? Play both ends against the middle! One end is the problem you are trying to solve. The other end is the report the end user needs. Build fast platforms that are correctly sized for the load. Limit the bottlenecks in the hardware. Have the correct people do the purchasing and use industry specialists.

SPEED, CORRECT PLATFORM, CORRECT FORM OF DATABASE, CORRECT TOOLS for ANALYSIS and the CORRECT FORM OF THE REPORT

What are the most effective ways of understanding the ecological landscape of the data you are receiving? Start by understanding the types of data you are collecting. Then understand the tools available. For example, object-oriented vs. relational databases: which do you use, and when do you use one or the other?

How do you determine new corporate strategic direction based on the data when the shape of the data itself is not clear? By defining the problem that you are trying to solve very tightly. Then you get the data which answers the questions.

How long do you keep the raw data? How much storage space do you have available, and how fast are you getting the data? What are your storage processing speeds, and how fast can you process the data that is available? Know where the bottlenecks are in the physical limitations of your hardware: for example, do you have a slow I/O handler? Know the limitations in the way your database is designed: file vs. table vs. row/column locking! What about threading? When is the OS software going to start thrashing? What about speed of allocation of memory space? What are the legal requirements?
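The storage questions above reduce to simple arithmetic once capacity and ingest rate are known. The following is a minimal sketch, not a sizing method from the talk: the function names, the 20% free-space reserve, and the sample figures are all assumptions chosen for illustration.

```python
def retention_days(capacity_tb: float, ingest_tb_per_hour: float,
                   reserve_fraction: float = 0.2) -> float:
    """Estimate how many days of raw data fit in storage.

    reserve_fraction is the share of capacity kept free
    (hypothetical default of 20%, not a figure from the slides).
    """
    if ingest_tb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_tb = capacity_tb * (1.0 - reserve_fraction)
    return usable_tb / (ingest_tb_per_hour * 24.0)


def backlog_growth_tb_per_hour(ingest_tb_per_hour: float,
                               processing_tb_per_hour: float) -> float:
    """Unprocessed data accumulating per hour (zero if processing keeps up)."""
    return max(0.0, ingest_tb_per_hour - processing_tb_per_hour)


if __name__ == "__main__":
    # Hypothetical figures: 500 TB of storage, 2 TB/hour arriving,
    # 1.5 TB/hour processed.
    print(f"retention: {retention_days(500, 2):.1f} days")
    print(f"backlog growth: {backlog_growth_tb_per_hour(2, 1.5)} TB/hour")
```

If the backlog growth is above zero, processing speed rather than storage is the bottleneck, and the retention window will shrink over time. Legal retention requirements are a separate constraint that this arithmetic does not capture.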

Real World Example: Internet broadcast of a science experiment: 8k users logged on to a system designed for 2400 users with different businesses. RESULT: Crashed every server in the system.
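A check like the one below could flag this kind of overload before an event. It is only a sketch: it reads "8k" as 8000 users, and the 25% headroom margin is a hypothetical policy, not something stated in the example.

```python
def load_ratio(expected_users: int, design_capacity: int) -> float:
    """Expected concurrent users divided by the capacity the system was designed for."""
    if design_capacity <= 0:
        raise ValueError("design capacity must be positive")
    return expected_users / design_capacity


def within_capacity(expected_users: int, design_capacity: int,
                    headroom: float = 0.25) -> bool:
    """True if expected load leaves the requested headroom below design capacity.

    headroom=0.25 (keep 25% spare) is an assumed margin for illustration.
    """
    return expected_users <= design_capacity * (1.0 - headroom)


if __name__ == "__main__":
    # Figures from the example, with "8k" taken as 8000.
    print(f"load ratio: {load_ratio(8000, 2400):.2f}x design capacity")
    print("safe to proceed:", within_capacity(8000, 2400))
```

With the figures from the example the load is more than three times the design capacity, so the check fails well before any headroom margin matters.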

And what data will you dump? Everything you can! You will be getting more! Life/data runs in cycles. You will not hear or see the information only once. There are ways to back up the raw data and keep it for a number of years but do you REALLY need that data?

What about the limitations of the hardware of the various platforms and the network structure itself? The issue is the problem definition skills of decision makers. They do not define the needs of the business closely enough because they are not using the actual data. They do not understand how to size the volume of data properly so that the correct processing platform is selected. They do not understand what shape the final product needs to be in to be useful to the team.

Real World Example: Hospital System (50 hospitals): Wanted to have end users on PCs, so selected a PC-based system which could not handle the processing load. Decided on centralized servers without tiered support. Did not purchase enough servers. Did not distribute network load effectively. Did not provide enough training on the software to medical personnel.

Programmer Training: Issues with the training of the programmers: Many do not understand how to write the software to use the hardware most effectively. AND they do not understand the stacking. AND they do not understand how to optimize the code to make the best use of the compilers.

Use an industry specialist!

What are the most effective ways of data mining? Specialized software for the platform. Build the algorithms to determine if there are any random correspondences. Know what data you want to review. Build meta-data platforms whenever possible. Have the people doing the design and builds understand the shape of the data before they start!

Real World Example: Soft Drink Company in 122 countries: Needed to understand peak load days for manufacture and distribution. The problem they were trying to address was concurrence: when one country would have to support the overload of another. Meta-data was critical to understanding and defining the shape of the data.
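One way to look for the concurrence problem described above is to find each country's over-capacity days and then see which days are shared. This is a minimal sketch under stated assumptions: the data layout (a day-to-load mapping per country), the function names, and the sample numbers are hypothetical and do not come from the company's actual system.

```python
from collections import defaultdict


def peak_days(daily_load: dict, capacity: float) -> set:
    """Return the days on which load exceeds the given capacity."""
    return {day for day, load in daily_load.items() if load > capacity}


def concurrent_peaks(loads_by_country: dict, capacity_by_country: dict) -> dict:
    """Map each day to the countries over capacity on it,
    keeping only days where two or more countries peak together."""
    over = defaultdict(list)
    for country, daily_load in loads_by_country.items():
        for day in peak_days(daily_load, capacity_by_country[country]):
            over[day].append(country)
    return {day: sorted(names) for day, names in over.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: units produced per day, per country.
    loads = {
        "A": {"2024-07-01": 120, "2024-07-02": 90, "2024-07-03": 150},
        "B": {"2024-07-01": 80, "2024-07-02": 70, "2024-07-03": 110},
        "C": {"2024-07-01": 60, "2024-07-02": 40, "2024-07-03": 50},
    }
    capacity = {"A": 100, "B": 100, "C": 100}
    print(concurrent_peaks(loads, capacity))
```

Days that appear in the result are the ones where a country in overload cannot count on the other listed countries for support, which is where the meta-data about each country's calendar becomes important.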

What about cross-platform portability of the final product? Wolf Geiger (1992): Data is only as good as the format in which it is presented to the person who has to use it. If it is not in a format that they can use, there is no point in spending the time to do any of the processing.

Real World Example: Asked the end user to write down exactly what they wanted in the report. Asked the manager to write down exactly what they wanted in the report. Asked the computer programmer to write down exactly what the clients wanted in the report. Two of three matched. Which one did not?

Cell Phone Data: How should it be parsed? It has to start on supercomputers because of the volume of the data, but it has to end in PC formats! Object-oriented database with full variable-length fields. Needs multi-dimensional processing: Computational linguistics. Analysis of word stressors. Analysis of grammatical syntax. Cognitive focus (topic basis). Recognized vocal stress vs. topic. Risk factor assignment. Background noise assessment. Probability analysis of each of the factors to determine further review. Data presentation tools have to be in a format that is currently in use, so that everyone understands where to look to find the important information. Cross-platform portability!!!!
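The "probability analysis of each of the factors to determine further review" step could take many forms. The sketch below assumes one simple form, a weighted average of per-factor probabilities compared against a threshold. The factor names are adapted from the list above, but the weights, the threshold, and the scoring rule itself are hypothetical and are not presented as the method used in practice.

```python
def review_score(factor_probs: dict, weights: dict) -> float:
    """Weighted average of per-factor probabilities, each in [0, 1]."""
    total_weight = sum(weights[name] for name in factor_probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(prob * weights[name] for name, prob in factor_probs.items())
    return weighted / total_weight


def needs_review(factor_probs: dict, weights: dict,
                 threshold: float = 0.5) -> bool:
    """Flag a record for further review when its score reaches the threshold."""
    return review_score(factor_probs, weights) >= threshold


if __name__ == "__main__":
    # Hypothetical per-factor probabilities for one call record.
    probs = {"word_stressors": 0.7, "vocal_stress": 0.8, "background_noise": 0.2}
    # Hypothetical weights: vocal stress counts double.
    weights = {"word_stressors": 1.0, "vocal_stress": 2.0, "background_noise": 1.0}
    print(f"score: {review_score(probs, weights):.3f}")
    print("further review:", needs_review(probs, weights))
```

Whatever scoring rule is used, the output that reaches the end user should still be a short flag or ranked list in a familiar format, in line with the presentation point above.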

Questions?

Thank you!