Overview. COSC 6397 Big Data Analytics. Fundamentals. Edgar Gabriel Spring 2015. Data Characteristics. Performance Characteristics


Overview
- Data Characteristics
- Performance Characteristics
- Platform Considerations

What makes large scale Data Analysis hard?
- Often summarized as VVVV
- Volume:
  - 5 exabytes of data created until 2003
  - The same amount of data created in 2011 in two days
  - Estimate for 2013: 10 minutes for creating the same amount of data
  - Example: a communication service provider with 100 million customers generates ~5 petabytes of location data per day

From WWW to VVVV
- Velocity:
  - Throughput: amount of data moved through the pipes
    - Mobile data volumes growing at 78% per year
    - Expected to reach 10.8 exabytes per month in 2016
  - Latency:
    - Analytics used to be "store and report"
    - Data shown was from yesterday
    - Real-time analytics gaining popularity
    - Some services available which guarantee analysis in 10s
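To make the volume figures above more tangible, the following is a minimal sketch that converts them into average data rates. It assumes decimal (SI) units, i.e. 1 exabyte = 10^18 bytes and 1 petabyte = 10^15 bytes; the function and variable names are illustrative only.

```python
# Back-of-the-envelope rates derived from the volume figures quoted above.
# Assumption: decimal (SI) units are meant.
EXABYTE = 10**18   # bytes
PETABYTE = 10**15  # bytes


def rate_bytes_per_second(volume_bytes: float, seconds: float) -> float:
    """Average data rate for a volume produced over a given time span."""
    return volume_bytes / seconds


# 5 exabytes created in two days (2011) versus in ten minutes (2013 estimate).
rate_2011 = rate_bytes_per_second(5 * EXABYTE, 2 * 24 * 3600)
rate_2013 = rate_bytes_per_second(5 * EXABYTE, 10 * 60)

# ~5 petabytes of location data per day spread over 100 million customers.
per_customer_per_day = 5 * PETABYTE / 100_000_000

print(f"2011: {rate_2011 / 1e12:.1f} TB/s")
print(f"2013: {rate_2013 / 1e12:.1f} TB/s")
print(f"growth factor: {rate_2013 / rate_2011:.0f}x")
print(f"per customer: {per_customer_per_day / 1e6:.0f} MB/day")
```

Under these assumptions the creation rate grows by a factor of 288 between the two estimates, and the location-data example works out to about 50 MB per customer per day.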

From WWW to VVVV
- Variety:
  - Data comes from a variety of sources in different formats
  - Example: call center which needs to integrate information from
    - Email
    - Trouble tickets
    - Conversation
    - Social media
    - Blogs

From WWW to VVVV
- Veracity:
  - Data suffers from significant correctness and accuracy problems
  - Credibility: e.g. social media response to a campaign should not be based on third party likes
    - Likes can be purchased
    - Response by disgruntled employees
  - Audience suitability
    - Customer service identifying a problem in a product has to share the information selectively

Analyzing large data volumes
- Large:
  - More data than can be processed on a single PC
  - Takes too long to be processed on a single PC
- The questions:
  - How to utilize multiple processors
  - How to evaluate whether we did a good job in using multiple processors
  - Administrative options for using multiple processors for large scale analysis

Performance Metrics (I)
- Speedup: how much faster does a problem run on p processors compared to 1 processor?

  $$S(p) = \frac{T_{total}(1)}{T_{total}(p)}$$

  - Optimal: S(p) = p (linear speedup)
- Parallel Efficiency: speedup normalized by the number of processors

  $$E(p) = \frac{S(p)}{p}$$

  - Optimal: E(p) = 1.0
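The two metrics above can be sketched in a few lines of Python. The timings in the dictionary are hypothetical values chosen for illustration, not measurements from the course.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S(p) = T_total(1) / T_total(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical measurements: number of processors -> runtime in minutes.
timings = {1: 60.0, 2: 32.0, 4: 18.0, 8: 12.0}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p}: S(p)={s:.2f}, E(p)={e:.2f}")
```

An efficiency that drops as p grows, as in this made-up data set, indicates that each added processor contributes less than the previous one.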

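Performance Metrics (II)
- Example: Application A takes 35 min. on a single processor, 27 on two processors and 18 on 4 processors.

  $$S(2) = \frac{35}{27} = 1.29 \qquad E(2) = \frac{1.29}{2} = 0.645$$

  $$S(4) = \frac{35}{18} = 1.94 \qquad E(4) = \frac{1.94}{4} = 0.485$$

Amdahl's Law (I)
- Basic idea: most applications have a (small) sequential fraction, which limits the speedup

  $$T_{total} = T_{sequential} + T_{parallel} = f\,T_{total} + (1 - f)\,T_{total}$$

  - f: fraction of the code which can only be executed sequentially

  $$S(p) = \frac{T_{total}(1)}{\left(f + \frac{1 - f}{p}\right) T_{total}(1)} = \frac{1}{f + \frac{1 - f}{p}}$$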
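As a rough illustration of the formula above, the sketch below evaluates Amdahl's Law for a few sequential fractions. The function names and the chosen values of f and p are assumptions made for this example; the limit 1/f follows from letting p grow without bound in the formula.

```python
import math


def amdahl_speedup(f: float, p: int) -> float:
    """Speedup predicted by Amdahl's Law: S(p) = 1 / (f + (1 - f) / p)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as p grows without bound: 1 / f."""
    return math.inf if f == 0 else 1.0 / f


processor_counts = (1, 2, 4, 8, 16, 32, 64)
for f in (0.0, 0.05, 0.1, 0.2):
    row = " ".join(f"{amdahl_speedup(f, p):6.2f}" for p in processor_counts)
    print(f"f={f:<4} {row}   limit={amdahl_limit(f):.1f}")
```

Even a sequential fraction of 5% caps the achievable speedup at 20, regardless of how many processors are used.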

Example for Amdahl's Law

[Figure: curves for f=0, f=0.05, f=0.1 and f=0.2; horizontal axis ticks from 1 to 64, vertical axis ticks from 0.00 to 60.00]

Amdahl's Law (II)
- Amdahl's Law assumes that the problem size is constant
- In most applications, the sequential part is independent of the problem size, while the part which can be executed in parallel is not.

Performance Metrics (III)
- Scaleup: ratio of the execution time of a problem of size n on 1 processor to the execution time of the same problem of size n*p on p processors

  $$S_c(p) = \frac{T_{total}(1, n)}{T_{total}(p, n \cdot p)}$$

  - Optimally, execution time remains constant, e.g.

  $$T_{total}(p, n) = T_{total}(2p, 2n)$$

Cluster Computing
- Cluster: collection of individual PCs (compute nodes) connected by a (high performance) network interconnect
- Each compute node is an independent entity with its own
  - Processor
  - Main memory
  - One or multiple networking cards
- All compute nodes typically have access to a shared file system (e.g. Network File System (NFS))
  - Removes the necessity to replicate programs and data on all compute nodes
  - All accesses to files require communication over the network
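Returning to the scaleup metric from Performance Metrics (III), a minimal sketch of the calculation might look as follows. The runtimes are hypothetical and only serve to show how a value below 1.0 arises when the execution time does not stay constant.

```python
def scaleup(t_one_proc_size_n: float, t_p_procs_size_np: float) -> float:
    """S_c(p) = T_total(1, n) / T_total(p, n*p); 1.0 is the optimum."""
    return t_one_proc_size_n / t_p_procs_size_np


# Hypothetical runtimes in seconds: the problem size grows with p.
t_1_n = 10.0      # size n on 1 processor
t_4_4n = 12.5     # size 4n on 4 processors

print(f"scaleup on 4 processors: {scaleup(t_1_n, t_4_4n):.2f}")
```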

Conceptual View

[Figure: compute nodes, each with a network card and a hard drive, connected through a network interconnect]

Cluster Components (I)
- Compute nodes mostly based on regular PC technology
  - Intel or AMD processors
  - 1-4 GB of main memory per core
- Operating systems: typically Linux/UNIX
- Management of resources: cluster scheduler manages allocation of compute nodes to users

Cluster Components (II)
- Networking metrics:
  - Latency: minimal time to send a very short message from one communication endpoint to another endpoint
    - Unit: ms, μs
  - Bandwidth: amount of data which can be transferred from one processor to another in a certain time frame
    - Unit: Bytes/sec, ..., GB/s; Bits/sec, ..., Gb/s
- Off-the-shelf technology vs. high-end technology
  - Gigabit Ethernet vs. InfiniBand, 10GE, Cray Gemini
  - Most clusters contain both a high-end and a low-end network interconnect
- Network topology is of importance for large clusters
  - If more than one switch is required: how are the nodes connected?
  - Metric: bisection bandwidth

Parallel Databases
- A parallel database system seeks to improve performance through parallelization of various operations
  - Data is stored in a distributed fashion
  - Distribution is governed by performance considerations
- Improves processing and input/output speeds by using multiple CPUs and disks in parallel
- Parallel databases often use a multiprocessor architecture
  - Shared memory architecture: multiple processors share the main memory space
  - Shared disk architecture: each node has its own main memory, all nodes share mass storage
  - Shared nothing architecture: each node has its own mass storage as well as main memory
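The latency and bandwidth metrics from Cluster Components (II) are commonly combined into a simple linear cost model, time = latency + size / bandwidth. That model is not part of the slides; it is introduced here only as a sketch. The high-end interconnect parameters are hypothetical, and the low-end bandwidth is just the nominal 1 Gb/s line rate expressed in bytes per second.

```python
def transfer_time(message_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Linear cost model (an assumption): latency plus size over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Illustrative parameters, not measurements of any real network.
interconnects = {
    "low-end (hypothetical)": {"latency_s": 50e-6, "bandwidth": 125e6},
    "high-end (hypothetical)": {"latency_s": 2e-6, "bandwidth": 5e9},
}

for size in (8, 1_000_000):  # a very short and a large message, in bytes
    for name, net in interconnects.items():
        t = transfer_time(size, net["latency_s"], net["bandwidth"])
        print(f"{size:>9} bytes over {name}: {t * 1e6:.1f} us")
```

In this model short messages are dominated by latency and long messages by bandwidth, which is why both metrics are reported separately.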

Advantages of Parallel Database Systems
- System uses an optimizer to translate SQL commands into a query plan whose execution is divided among compute nodes
- High level programming (SQL) does not require any knowledge of the underlying hardware
- A lot of data is already stored in database systems
- 20+ years of experience in parallel database systems

Disadvantages of Parallel Database Systems
- Database systems have certain requirements on the data format
  - Difficult to handle irregular, unstructured or incomplete data sets
- Database systems are not efficient in adding large data volumes
- Price of large scale parallel database systems
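The idea of a query plan whose execution is divided among compute nodes can be sketched with a grouped sum, roughly what `SELECT key, SUM(value) ... GROUP BY key` asks for. This is not how any particular database implements it; the hash partitioning, the function names and the use of threads as stand-ins for nodes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Hash-partition (key, value) rows so each 'node' owns a disjoint share."""
    parts = [[] for _ in range(n_nodes)]
    for key, value in rows:
        parts[hash(key) % n_nodes].append((key, value))
    return parts


def local_sum(rows):
    """Work done independently on one node: sum the values per key."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def parallel_group_sum(rows, n_nodes=4):
    """Run the local step on every partition, then merge the partial results."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the per-key sums
    return dict(merged)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(parallel_group_sum(rows, n_nodes=3))
```

Threads in CPython will not speed up this computation; they only mimic the structure of the plan (partition, local work, merge) that an optimizer would generate from the SQL statement.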

Cloud Computing
- Cloud Computing: general term used to describe a class of network based computing
  - A collection/group of integrated and networked hardware, software and Internet infrastructure (called a platform)
  - Using the Internet for communication and transport, provides hardware, software and networking services to clients
- Hides the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API

Cloud Computing (II)
- The platform provides on demand services that are always on, anywhere, anytime and any place
- Pay for use and as needed
  - Scale up and down in capacity and functionality
- The hardware and software services are available to the general public, enterprises, corporations and business markets
- Services or data are hosted on remote infrastructure

Cloud Service Models
- Software as a Service (SaaS): execute a specific application required for business / research
- Platform as a Service (PaaS): deploy customer created applications
- Infrastructure as a Service (IaaS): rent processing and compute capacity, storage, etc.

Cloud Computing Summary
- Positive
  - No need for local IT infrastructure
  - Scalability
  - Reliability not a major concern
  - Implicit software updates
- Negative
  - No performance guarantees: utilization of shared resources
  - Privacy, security, compliance, trust
  - Need to evaluate utilization/costs benefits

Comparison of the platforms

                                    Cluster computing   Parallel Database   Cloud Computing
Initial hardware investment costs   High                High                Zero
Initial software investment costs   Low                 High                Zero
Maintenance costs                   Low                 Low-Medium          Zero
Software development efforts        High                Low                 Medium
Software flexibility                High                Low                 Medium
Efficiency                          High                High                Low
Costs per job                       Low                 Low                 High