
Release 2.0
Issue 11, February 2009

"Ultimately, big data is more about attitude than tools; data-driven organizations look at big data as a solution, not a problem."
Roger Magoulas and Ben Lorica, from "Big Data: Technologies and Techniques for Large Scale Data," page 32

Release 2.0 Issue 11, February 2009 ISSN

Published six times a year by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA. This newsletter covers the world of information technology and the Internet and the business and societal issues they raise.

Executive editor: Tim O'Reilly
Editor and publisher: Sara Winge
Art director: Mark Paglietti
Copy editor: Sarah Schneider
Contributing writers: Brady Forrest, Jerry Michalski, Sarah Milstein, Peter Morville, Nathan Torkington, David Weinberger

© 2009, O'Reilly Media, Inc. All rights reserved. No material in this publication may be reproduced without prior written permission; however, we gladly arrange for reprints, bulk orders, or site licenses. Individual subscriptions cost $495 per year.

Contents

Big Data: Technologies and Techniques for Large-Scale Data, by Roger Magoulas and Ben Lorica
- Preface: Stories from the Field
- Introduction to Big Data
- Why Big Data Matters
- Big Data Technologies
- Massively Parallel Processing (MPP)
- Column-Oriented Databases
- How Column Stores Work
- MapReduce
- Key Technology Dimensions
- Single Server and Distributed Data/Parallel Processing Clusters
- A Data Architecture for Fast Platforms
- Data Partitioning
- MapReduce and SQL
- Relational and Key/Value Pairs
- Reliability and Resilience
- Hardware Options
- Big Data Tool Feature Grid
- Big Data Roadmaps
- How Science Handles Big Data
- Disclosure
Boldly Going Where No Data Has Gone Before
Acknowledgments
Calendar

Subscription information: Release 2.0, PO Box, North Hollywood, CA
Customer service

Roger Magoulas is the Director of Research at O'Reilly. Ben Lorica is a Senior Analyst in O'Reilly's Research group.

Big Data: Technologies and Techniques for Large-Scale Data

Preface: Stories from the Field

You take a leave of absence from an organization known for handling big data to work on the data analysis systems for the Obama campaign. You're faced with one big server and five terabytes of messy voter registration data from multiple sources in multiple formats. You're tasked with optimizing "get out the vote" efforts by finding out who has already voted and removing those names from the call-bank lists used by canvassers, in real time, on election day. With only a few weeks to build the system, you assemble a small team of people comfortable with several aspects of big data management, i.e., the size and state of the data, analytics, and serving the data to many users and many devices. In the end, while there are problems on election day, you are able to clean 1.6 million voters from the call lists the campaign distributes to canvassers that afternoon, making those lists 25% shorter, on average. Your knowledge and experience with big data management make a complex task manageable in a tight timeframe with a small team. And, you spare 1.6 million supporters an unnecessary phone call.

Alternatively, you work for a large social networking website and you're tasked with creating an analysis infrastructure that serves a unique group of users. The data is already large and growing fast. There's no real business plan and the data is interesting beyond just its business insights: there's fodder for real sociology research in your social graph. There's no time and no guidance, but everyone thinks the data is important. You determine that the Hadoop implementation of MapReduce provides the scaling, performance, and flexibility you need. There's no requirement to predefine a schema, so you can just throw data into the Hadoop platform. Your developers and analysts can build and use the various access points, with the help of a few tutorials.
Over time, the most effective and repeatable analysis patterns become clear and you start to develop access tools that support different classes of users, e.g., programming language APIs for developers and SQL-like access for analysts. Your now data-driven organization has the infrastructure to capture all the data generated by your website, run experiments and one-off analyses, and identify opportunities to build more formal, repeatable data analysis interfaces when needed.

We heard these stories from folks doing leading-edge big data implementations, and their experiences aren't isolated or unusual. More and more organizations are facing the challenges and opportunities of working with big data, and they're transforming themselves in the process. We've seen that the organizations that best handle their big data challenges can gain competitive advantage and improve their product and service offerings.

For those organizations facing new challenges and new opportunities regarding big data, we present a roadmap of choices and trade-offs for large-scale data management. There's a lot to make sense of and many competing perspectives. The high-level descriptions and guidance regarding what to consider can inform a deeper dive into making decisions about your big data environment.

Introduction to Big Data

Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.

We're at the front edge of a data deluge, brought on by new, pervasive data sources. A few years ago, a retailer analyzing thousands of T-shirt sales to discern customer behavior thought it was dealing with big data.
Today, social networking companies examine hundreds of millions of personal interactions to identify social trends and relationships, and energy utilities plow through petabytes of sensor data to understand use trends and demand projections. Given the scale of today's data sets, traditional approaches to data acquisition, management, and analysis don't always measure up.

Over the past year, we've noticed a number of faint signals, from the people we talk to and the data we research, that making sense of large-scale data stores is increasingly interesting and important. What we find most notable is the broad array of organizations, from large enterprises and government agencies down to startups, that are tackling big data; the size of the organization no longer directly correlates with the size of the data challenges.

We're also seeing the role of data becoming more central to business strategy. Companies like Google (where analytics is at the heart of how they manage ad revenue), Facebook (which is attempting to harness the power of data on its social graph of users to develop its business plan), or Twitter (focusing on analyzing its micro-messaging data as the basis for a business model) are examples of companies organized around data insight. Vendors and various open source communities are responding with a new set of tools and techniques to handle this emerging focus on data. We see new big data challenges, growing interest in the topic, and an increasingly diverse set of tools available to address these challenges.

Big data is a big topic. To help make sense of all that big data entails, we divide the topic into three broad activities:

- Data acquisition
- Data management
- Analysis and insight

Data acquisition, whether from data-collecting sensors, increasingly computerized systems, web content, telematics, social networks, or ubiquitous computing, leads to the need to store and manage more data, data that can become valuable with access and iterative, repeatable analysis. This puts data management at the center of big data: scaling to acquire more data and providing fast, convenient access and sophisticated analysis to all that data.

In this report we focus on data management as a critical link in the big data story. We'll investigate different approaches to handling large-scale data, describe the technology, identify key trade-offs, and address resource requirements.

Why Big Data Matters

We believe that organizations need to embrace and understand data to make better sense of the world.
(We believe it so much that O'Reilly co-sponsored an unconference covering Collective Intelligence topics, e.g., data mining and analytics around human behavior.) Big data matters because:

- The world is increasingly awash in sensors that create more data: both explicit sensors like point-of-sales scanners and RFID tags, and implicit sensors like cell phones with GPSs and search activity.

Key Takeaway: Building and making sense of massive databases is the core competency of the information age. Being better at data is why Google beat Yahoo! and Microsoft in search and one reason why Barack Obama beat John McCain. A bad economy accelerates the importance of big data: companies without big data competencies will be left behind.

- Harnessing both explicit and implicit human contribution leads to far more profound and powerful insights than traditional data analysis alone, e.g.:
  - Google can detect regional flu outbreaks seven to ten days faster than the Centers for Disease Control and Prevention by monitoring increased search term activity for phrases associated with flu symptoms. [M. Helft, "Google Uses Searches to Track Flu's Spread," New York Times, 11/11/2008]
  - MIT researchers were able to predict location and social interactions by analyzing patterns in geo/spatial/proximity data collected from students using GPS-enabled cell phones for a semester. [N. Eagle and A. Pentland, "Reality mining: sensing complex social systems," Personal and Ubiquitous Computing, Vol 10, #4]
  - IMMI captures media rating data by giving participants special cell phones that monitor ambient noise and identify where and what media (e.g., TV, radio, music, video games) a person is watching, listening to, or playing. [J. Pontin, "Are Those Commercials Working? Just Listen." New York Times, 9/9/2007]
- Competitive advantage comes from capturing data more quickly, and building systems to respond automatically to that data.
  - The practice of sensing, processing, and responding (based on pre-built models of what matters, "the database of expectations," so to speak) is arguably the hallmark of living things. We're now starting to build computers that work the same way. And we're building enterprises around this new kind of sense-and-respond computing infrastructure.
  - As our aggregate behavior is measured and monitored, it becomes feedback that improves the overall intelligence of the system, a phenomenon Tim O'Reilly refers to as harnessing collective intelligence.
- With more data becoming publicly available, from the Web, from public data sharing sites like Infochimps, Swivel, and IBM's Many Eyes, from increasingly transparent government sources, from science organizations, from data analysis contests (e.g., Netflix), and so on, there are more opportunities for mashing data together and open sourcing analysis.
  - Bringing disparate data sources together can provide context and deeper insights than what's available from the data in any one organization.
- Experimentation and models drive the analysis culture.
  - At Google, the search quality team has the authority and mandate to fine-tune search rankings and results. To boost search quality and relevancy, they focus on tweaking the algorithms, not analyzing the data.

  - Models improve as more data becomes available, e.g., Google's automatic language translation tools keep getting better over time as they absorb more data*.
  - Models and algorithms become the focus, not data management.

Big data repositories provide the opportunity, via analysis, for insights that can help you understand and guide your organization's activities and behaviors. You can improve results by combining more data from more sources with more sophisticated analysis and models.

The power of big data needs to be tempered with the responsibility of protecting privacy and civil liberties, preventing sensitive data from getting hacked or inappropriately shared, and treating people generating the data fairly. The insights gained from big data can be used to improve products and customer service, but they can also be used in ways that creep out customers and make them feel uncomfortable or watched. The industry doesn't have all the answers, e.g., academic research shows it's difficult to create truly anonymous data. There are techniques, such as aggregating data beyond the level of an individual, that can protect privacy while still allowing insightful analysis. Although it's beyond the scope of this report to address privacy issues in detail, you'll need to consider them as you work with big data.

* FOOTNOTE: "How Google translates without understanding; Most of the right words, in mostly the right order," by Bill Softky, The Register, May 15, google_translation/

Big Data Technologies

Surveying the technology around big data, there are three fundamental strategies for storing and providing fast access to large data sets:

- Improved hardware performance and capacity
  - Faster CPUs
  - More CPU cores (requires parallel/threaded operations to take advantage of multicore CPUs)
  - Increased disk capacity and data transfer throughput
  - Increased network throughput
- Reducing the size of data accessed
  - Data compression
  - Data structures that, by design, limit the amount of data required for queries (e.g., bitmaps, column-oriented databases)
- Distributing data and parallel processing
  - Putting data on more disks to parallelize disk I/O
  - Various schemes to put slices of data on separate compute nodes that can work on these smaller slices in parallel; includes custom hardware and commodity server architectures
  - Massively distributed architectures lead to an emphasis on fault tolerance and performance monitoring: as the number of nodes in a cluster increases, failures or slowdowns are inevitable and the system needs resiliency to recover from faults in a reliable manner
  - Higher-throughput networks to improve data transfer between nodes

Key Takeaway: Big data is driving new approaches: MPP, MapReduce, column-oriented data are all becoming essential parts of the database toolkit. The relational model is no longer the only database that matters. For many problems, MapReduce-style processing is superior. What's more, it's easier for many programmers to understand and implement than SQL.

Parallelism is the answer to big data challenges: it lets you divide and conquer, and it's built to scale.

These three technology strategies undergird the discussion that follows. Because they are not discrete, mutually exclusive approaches, we don't offer simple apples-to-apples comparisons; each separately and in combination can provide effective platforms for handling big data challenges.

Massively Parallel Processing (MPP)

Key Takeaway: Massively Parallel Processing (MPP) can dramatically improve query and load performance for all data types. It works with existing relational tools and infrastructure, and you can throw hardware at a cluster to improve performance.

The MPP relational/SQL database architecture spreads data over a number of independent servers, or nodes, in a manner transparent to those using the database. We focus on analytic MPP systems usually called shared-nothing databases, as the nodes that make up the cluster operate independently, communicating via a network but not sharing disk or memory resources (see sidebar). With modern multi-core CPUs, MPP databases can be configured to treat each core as a node and run tasks in parallel on a single server. By distributing data across nodes and running database operations across those nodes in parallel, MPP databases are able to provide fast performance even when handling very large data stores.
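The shared-nothing idea described above can be illustrated with a minimal sketch. It is not any vendor's implementation: the node count, the `order_id` distribution key, the sample rows, and the function names are all hypothetical, and threads stand in for separate servers. Each row is placed on exactly one node by hashing its key, each node computes a partial aggregate on its own slice, and the partial results are merged at the end.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick a node from a deterministic hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Place each row on exactly one node; nodes share no storage."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(str(row[key_field]), num_nodes)].append(row)
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: partial SUM(amount) per region."""
    partial = defaultdict(int)
    for row in node_rows:
        partial[row["region"]] += row["amount"]
    return dict(partial)


def parallel_sum(nodes):
    """Run the local step on every node concurrently, then merge partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


# Invented sample data: twelve orders spread over three regions.
rows = [
    {"order_id": i, "region": ("east", "west", "north")[i % 3], "amount": i}
    for i in range(1, 13)
]
nodes = distribute(rows, "order_id")
totals = parallel_sum(nodes)
print(totals)
```

The merge step is the only place where results from different nodes meet, which is roughly why adding nodes can speed up the local step without changing the query.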
The massively parallel, or shared-nothing, architecture allows MPP databases to scale performance in a near-linear fashion as nodes are added, i.e., a cluster with eight nodes will run twice as fast as a cluster with four nodes for the same data.

The collection of servers that make up an MPP system is known as a cluster. Within an MPP cluster there are two topologies:

- Master node as a single point for all cluster connections, for aggregating results and orchestrating activities on the rest of the nodes in the cluster.

- Peer architecture where all nodes in the cluster can be used to connect to data, and can aggregate results and coordinate activity across the cluster.

The topologies represent trade-offs involving the number of connections, ease of adding and removing compute or data nodes, and availability. Peer architectures offer more flexibility with more coordination overhead than master node architectures.

A key to MPP performance is distributing the data evenly across all the nodes in the cluster. This requires identifying a key whose value is random enough that, even over time, the data does not concentrate in one or a subset of nodes. The MPP databases have algorithms that help keep the data distributed; in practice, using fields like dates or states for distribution may lead to a skewed data distribution and poor performance.

MPP databases are available in three flavors: tightly coupled with proprietary hardware as a data appliance, loosely coupled with a specially configured server platform as a data appliance, and independent as software. Data appliances help reduce installation and configuration complexity for clients and help vendors optimize performance; they are a popular option for MPP databases (see Hardware Options section for more detail).

The focus of modern shared-nothing MPP system vendors is on easy implementation and operations coupled with SQL compliance. MPP systems can create more work for system administrators and designers:

- The need to administer all server nodes in a cluster and a dedicated network: more parts to break and take care of
- More complex database performance monitoring and problem resolution
- Identifying keys that evenly distribute the data across the nodes in the clusters, especially over time and to support joins
- Rebalancing/redistributing data if the number of nodes in the cluster is changed (some MPP systems offer options to help redistribute data to new nodes)

MPP Architectures

There are shared-everything architectures for MPP from Oracle (RAC) and IBM.
Shared-everything is optimized to support OLTP (transactional) operations and requires synchronization overhead between nodes to ensure data integrity (i.e., transactions are atomic and complete with no collisions). Shared-nothing MPP are optimized for read-intensive tasks associated with analysis.

Shared-nothing MPP databases are not new. There were versions available from a number of vendors in the mid-'80s. In the '90s, NCR's Teradata became the most prominent MPP database vendor, selling an MPP appliance to mostly enterprise customers. Using custom hardware, Teradata's appliance became a popular choice for handling big data needs that extended beyond what one machine could handle.

In recent years, a number of vendors have emerged with shared-nothing MPP architectures, including Netezza, Greenplum, Kognitio WX2, and Aster nCluster. These new entrants compete by offering better value, either via running on commodity servers or clever hardware configurations or by removing restrictions on data warehouse design. Shared-nothing MPP databases are not designed for OLTP workloads and they don't have the connections, robust transaction support, and other features associated with transactional systems.

MPP database advantages:

- Fast query and load performance; scalable by throwing more hardware at the cluster
- Standard SQL
- Easy integration with ETL (extract/transform/load), visualization, and display tools
- No new skills required for SQL or SQL abstraction layer-savvy developers

- Generally fast to install, configure, and run
- Parallelization available via standard SQL; no special coding required

Other considerations:

- Performance is affected by the choice of distribution keys; skewed data distributions slow queries
- Hardware costs and energy costs, even with low-cost commodity servers
- Limits to the number of connections (hundreds of users at the upper limit)
- Managing simultaneous operations can be difficult, as orchestrating parallel operations is complex and analysis environments tend to run many table scans
- Queries with complex, multi-table joins can run slowly when too much data needs transferring between nodes
  - Attention to distribution keys can improve join performance by co-locating commonly joined data
  - MPP vendors concentrate their engineering effort to reduce and speed up interconnect traffic, and to co-locate related data on the same nodes
- Data needs to be redistributed and rebalanced when new nodes are added
- MPP databases benefit from multi-core CPUs (parallelism is available for each core), large amounts of RAM, and direct-attached disks
- Clusters of heterogeneous nodes are limited by the performance of the slowest node with evenly distributed data
- Check with vendors on recommended network options; some MPP database vendors suggest high-speed networks for optimal performance

MPP databases are generally sold in two flavors:

- As an appliance with hardware and software bundled together
- As software that runs on commodity hardware

Software MPP databases can be packaged with commodity hardware and sold as an appliance.

The feature set for MPP databases is evolving to include MapReduce and column-oriented storage, putting the MPP database at the center of a big data architecture that supports multiple operating modes. Here's a summary of recently introduced or planned features for MPP databases:

- Integrating MapReduce with the database
- Adding column-oriented storage options
- Increased monitoring and adaptive operations

- Support for increasing the number of simultaneous connections, via improved queueing algorithms
- Fast data loading via direct writes to files
- Options for rebalancing data
  - Making new compute nodes available before the data is rebalanced
  - Background rebalancing (to avoid a complete dump and reload of data; not a simple task when dealing with terabytes or more of data)
- Options to keep data in-memory to increase performance
- More compression options, geared towards reducing the amount of storage needed while minimizing the impact on load and query performance: a tricky balancing act that depends on data and system load

MPP databases are a relatively easy transition for an organization already steeped in relational database technologies and resources.

Column-Oriented Databases

Relational Database Management Systems (RDBMSs) typically store table data as rows, i.e., all the columns associated with a row are stored and retrieved together regardless of the number of columns in the row used. In a column-oriented database, the data is stored by columns, and, when possible, turned into bitmaps or compressed in other ways to reduce the amount of data stored (see sidebar). Compressing columns reduces how much data needs storing; the combination of compressed data and retrieving only the columns requested speeds query performance by reducing the amount of I/O required and increasing the amount of query data that can be stored in fast memory.

The techniques for reducing the data footprint of column data work best on integer columns with few distinct values. More complex data types and more complex relationships between columns reduce compression opportunities, increasing data sizes and slowing query performance.

Column-oriented databases are relational, using SQL as the language for accessing and manipulating data, and the same set theory concepts that undergird conventional RDBMSs, i.e., tables can be joined, filtered, grouped, and ordered. The column orientation exists under the covers for most users of the database.
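To make the row-versus-column distinction above concrete, here is a small sketch under stated assumptions: the table, its field names, and the "values touched" counter are invented for illustration and are only a crude stand-in for I/O. It stores the same table both ways and counts how many stored values a single-column query has to read in each layout.

```python
# Invented sample table: four sensor readings with four fields each.
rows = [
    {"ts": 1, "sensor": 7, "reading": 20, "status": 0},
    {"ts": 2, "sensor": 7, "reading": 22, "status": 0},
    {"ts": 3, "sensor": 9, "reading": 19, "status": 1},
    {"ts": 4, "sensor": 9, "reading": 23, "status": 0},
]


def to_columns(table_rows):
    """Re-arrange a list of row dicts into one list per column."""
    columns = {}
    for row in table_rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def row_store_sum(table_rows, field):
    """Sum one field in a row layout; every field of every row is read."""
    total = 0
    values_touched = 0
    for row in table_rows:
        values_touched += len(row)  # the whole row is retrieved together
        total += row[field]
    return total, values_touched


def column_store_sum(columns, field):
    """Sum one field in a column layout; only that column is read."""
    column = columns[field]
    return sum(column), len(column)


columns = to_columns(rows)
print(row_store_sum(rows, "reading"))        # total and values read, row layout
print(column_store_sum(columns, "reading"))  # total and values read, column layout
```

Both layouts return the same total; the column layout simply reads a quarter of the values for this query, before any compression is applied.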
Holding the columns together into table entities requires join indices, an extra layer of storage and abstraction compared to traditional row-oriented relational databases. This difference is felt most by designers and administrators, as they need to map query patterns and requirements to column index and compression strategies.

Key Takeaway: Column stores are impressively fast when used with the right type of data, e.g., time series and trial data. They've gained adherents in the past two years, and are overcoming their undeserved reputation for requiring extra engineering effort.

Because the different column index strategies have different performance characteristics, designers create multiple indexes on the same column and let the optimizer pick the best choice. Using multiple indexes increases how much data needs to be stored, which limits the overall reduction in data size from compressing the columns.

Column-oriented databases provide fast query performance for analysis environments with mostly integer data, e.g., time series and bioinformatics, and when most queries focus on a single or small subset of columns. The advantages of column-oriented DBMSs diminish for queries that require many columns and complex table joins due to the extra overhead of bringing all the columns together. The column-oriented database folks we interviewed designed around complex table joins to avoid the impact on performance.

Column-oriented databases speed performance by limiting the amount of data needed to process queries. With less data to move around, disk throughput and latency become less critical and network storage devices* become an option. Network storage devices provide the following scaling options: 1) increase storage capacity by adding disks to the storage unit; 2) improve disk I/O by adding more disk spindles to increase parallel reads; 3) increase data throughput between the server and storage unit with a high-speed, dedicated network (e.g., fiber channel); 4) add compute capacity with reader servers attached to the storage unit (additional servers connect to a single storage device).

Column-oriented databases have been around for more than a decade, first popularized by Sybase IQ and joined by a number of commercial and open source offerings in the last few years.

Column-oriented databases are designed to be easy to install.
Resource impacts include:

- DBAs and designers need to determine the index strategy for each column; vendors provide index analysis tools that recommend index strategies based on column data
- Developers and analysts should be aware that query performance tuning includes limiting the number of requested columns
- Using network storage devices requires more attention to network and storage unit administration

* FOOTNOTE: We are using network storage devices as a generic term for two types of shared, network-attached storage devices: Storage Area Networks (SAN) and Network-Attached Storage (NAS).

Column-oriented database advantages:

- Fast read query performance: data size, memory requirements, and I/O are reduced by data compression and by only accessing requested columns
- Standard SQL integrates with relational database tools and interfaces for database design, data access, query, and analysis
- Compression and fast query performance can allow a smaller hardware platform
- Works best with integer data and queries that access one or a few columns, e.g., time series, finance data, sensor data, or bioinformatics
- Compression reduces the amount of disk storage required
  - Best compression with low cardinality (data with few distinct values), integer data
  - Compression gains can be partially offset by the need for extra indexes to support different query requirements and high cardinality data
- Architectures are available for handling many connections

Other considerations:

- Compressing data and index building can slow write performance
  - Vendors all have fast data-loading options and focus engineering resources on improving write performance
- Performance is impacted by large unstructured text, complex join logic, and high cardinality data (i.e., columns with many distinct values)
- Scaling for performance and data size can be complex, requires planning, depends on hardware topology and DBMS options, and can be expensive
  - Column-oriented databases perform best on systems with lots of RAM and multi-core CPUs
  - For systems with a network storage device, adding disk spindles, RAM caching, and high-throughput, dedicated connections between the storage unit and the server can help improve performance
  - Scaling MPP column-oriented databases is similar to scaling row-oriented MPP databases; for MPP architecture, direct-attached disks are recommended

Future features of column-oriented DBMSs:

- MPP configurations
- Faster data-loading algorithms
- Increased support for text and unstructured data

Column-oriented DBMSs create performance advantages on a given hardware platform by reducing the amount of data that needs to be processed. With the right data environment, column-oriented DBMSs can be an option to meet performance requirements while maintaining a fit with existing relational database tools and infrastructure, especially for organizations trying to minimize the number of servers required to handle a given data volume.

How Column Stores Work

Understanding how columns can be compressed or turned into bitmaps helps explain the technology underlying column-oriented databases or adding column-oriented support to other database schemes. These examples show how column databases manage column data for compression and fast reads (see "C-Store: A column-oriented DBMS," Proceedings of the 31st VLDB Conference, 2005, by Stonebraker, Abadi, et al.). Commercial column-oriented databases also have more complex encoding and compression schemes than those outlined below.

For data that can be ordered and has few distinct values, the column can be encoded by storing the value, when the value first appears, and how many times the value appears, reducing the number of rows required to represent a column of many rows to one row for each value. The figure below shows how the original values map to the new (v, s, n) encoding (v: value, s: start, n: number of times v appears). The value 303 is stored as (303, 7, 5), i.e., the first instance of the value 303 is row seven and the value appears five times.
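The (v, s, n) scheme described above can be sketched in a few lines. This is a simplified illustration, not the C-Store implementation: the sample column is invented (only the 303 triple comes from the text), rows are numbered from 1 to match the example, and the function names are hypothetical.

```python
def rle_encode(column):
    """Encode a sorted column as (value, start_row, count) triples.

    Rows are numbered from 1, matching the (303, 7, 5) example in the text.
    """
    runs = []
    for position, value in enumerate(column, start=1):
        if runs and runs[-1][0] == value:
            v, s, n = runs[-1]
            runs[-1] = (v, s, n + 1)   # extend the current run
        else:
            runs.append((value, position, 1))  # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original column from its (value, start, count) triples."""
    column = []
    for value, _start, count in runs:
        column.extend([value] * count)
    return column


# Invented sorted column, arranged so that 303 starts at row 7 and appears 5 times.
column = [101, 101, 202, 202, 202, 202, 303, 303, 303, 303, 303, 404]
runs = rle_encode(column)
print(runs)
```

Twelve stored values shrink to four triples here; the saving grows with the length of each run, which is why the approach favors sorted columns with few distinct values.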

When the column sort depends on a foreign column, the column can be encoded with the value and a bitmap of the relative position of where the value is stored. The bitmaps are typically sparse and can be more efficiently stored and indexed. The following diagram shows the original, unsorted column data and the map to the value bitmap. In this example, the bitmap for the value 303 has bits set in the 3rd and the 9th positions, i.e., the value 303 occurs in the 3rd and the 9th position of the column.

These compression techniques lend themselves best to integer data and to columns with low cardinality (i.e., columns with few distinct values). Internal dictionaries and other lookup strategies can be used that process text fields more like integers. Column store strategies don't work as well on columns with many distinct values or on unstructured data.

MapReduce

Key Takeaway: MapReduce is the next, new thing in big data, scaling to meet the biggest data processing environments, e.g., petabytes at Google, Yahoo!, Facebook.

In the last few years, we've been hearing about and have been intrigued by the buzz surrounding MapReduce from startups and large technology companies. Many of the data-intense startups we meet with are using or plan to use Hadoop for some or all of their data management. We see the enthusiasm about MapReduce coming from:

- The massive scale of data Google is processing with their MapReduce infrastructure as proof the technology works

- The success of Hadoop at Facebook, Yahoo!, The New York Times and others
- The availability of Amazon Web Services (AWS) as a convenient and cheap platform for trying MapReduce
- Small organizations looking for an affordable, scalable way to manage and analyze big data
- The expense of commercial big data products

MapReduce refers both to a style of programming and to a parallel data processing engine for managing large-scale, distributed data.

As a style of programming, the concept is based on combining functions (map and reduce) common to many programming languages. The map function performs filtering or transformations before creating output records as a key/value pair. A split function, most typically a hash (but any deterministic function that guarantees the data always lands on the same server will work), distributes these records for storage to the servers that make up the system. The reduce function performs some type of aggregate function on all the records in the bucket associated with a key. The map functions can be distributed to run on different nodes in a cluster, with each map given a portion of the input data to process. By analogy to SQL, the map is like group by and the reduce is like an aggregate function (e.g., sum or count) for an aggregate query. MapReduce programming can be applied in other contexts, for example, against a distributed relational database (both Aster Data and Greenplum have released versions of their MPP databases with MapReduce functionality) or key/value pair data stores like CouchDB.

As a parallel data processing engine, MapReduce is most closely associated with Google's implementation and with Hadoop, the open source clone of MapReduce supported by Yahoo!, Facebook, Cloudera, and others. Google implemented MapReduce as a stack that was then used as the inspiration for Hadoop.
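The programming style described above (map, split, reduce) can be sketched in a single process. This is a toy illustration and not how Google's MapReduce or Hadoop is implemented: the record layout, the three-server count, and the function names are assumptions made for the example, and plain Python lists stand in for servers.

```python
from collections import defaultdict
import zlib


def map_fn(record):
    """Filter and transform one input record into (key, value) pairs."""
    user, action, amount = record
    if action == "purchase":   # filtering step
        yield user, amount     # output record as a key/value pair


def split_fn(key, num_servers):
    """Deterministic hash: the same key always lands on the same server."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def reduce_fn(key, values):
    """Aggregate all the values in the bucket associated with one key."""
    return key, sum(values)


def run_mapreduce(records, num_servers=3):
    # Map phase: emit key/value pairs and route each to a server bucket.
    servers = [defaultdict(list) for _ in range(num_servers)]
    for record in records:
        for key, value in map_fn(record):
            servers[split_fn(key, num_servers)][key].append(value)
    # Reduce phase: each server aggregates the keys it holds.
    results = {}
    for buckets in servers:
        for key, values in buckets.items():
            reduced_key, total = reduce_fn(key, values)
            results[reduced_key] = total
    return results


# Invented sample input: (user, action, amount).
records = [
    ("ann", "purchase", 10),
    ("bob", "view", 0),
    ("ann", "purchase", 20),
    ("bob", "purchase", 5),
]
print(run_mapreduce(records))
```

In SQL terms, this run corresponds roughly to `SELECT user, SUM(amount) ... WHERE action = 'purchase' GROUP BY user`, with the map supplying the filter and grouping key and the reduce supplying the aggregate.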
Since much of the Hadoop documentation references the Google MapReduce stack, we describe the stack in more detail:

- Google File System (GFS): GFS triplicates all data across a cluster. If a node in the cluster becomes unavailable, the data it held is automatically re-replicated from the remaining copies.
- BigTable: a distributed data storage system with a multidimensional data structure. BigTable uses the terms rows and columns differently than they are used in relational databases; the rows and columns are really elements of a map (hash in Perl or Ruby, dictionary in Python, associative array in PHP, object in JavaScript), and BigTable is described as a multidimensional sparse map. A spreadsheet analogy may help: values are accessed by using row and column as an index. All columns are timestamped to allow data versioning, with automatic retrieval of the most recent items. Users often serialize fields into a single column, creating a simpler key/value pair data structure that they can deserialize on retrieval. A complete description of the BigTable data structure is beyond the scope of this report. For more in-depth information, see Bigtable: A Distributed Storage System for Structured Data by Chang, Dean, et al. (http://labs.google.com/papers/bigtable.html) for the official explanation, or, for a simpler explanation, see Jim Wilson's Understanding HBase and BigTable, at Hbase_and_BigTable.
- MapReduce: a client for performing parallel MapReduce on data stored in BigTable tables on GFS
- Sawzall: a higher-level query and analysis language that runs on top of MapReduce, simplifying filtering, aggregation, and statistical analysis; analogous to SQL
- Workqueue: schedules tasks and restarts jobs that fail
- Chubby: system for coordinating distributed applications, including configuration and synchronization

The MapReduce platform includes node monitoring, fault detection, and queuing processes that help manage MapReduce jobs. From the user perspective, MapReduce provides a platform for operating on many data items in parallel while isolating the user from the details of running a distributed program, i.e., data distribution, replication, fault tolerance, and scheduling. Google also uses the proprietary compression algorithms BMDiff and Zippy to shrink the size of data stored.

Hadoop, an open source Java framework for running applications in parallel across large clusters of commodity hardware, was created by Doug Cutting, the developer of Lucene (search tool) and Nutch (distributed web crawler). Cutting was influenced by what he learned about GFS and MapReduce, and the Hadoop project grew out of his work on Nutch.
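Returning briefly to the BigTable data model in the stack list above: the "multidimensional sparse map" with timestamped columns can be pictured as nested dictionaries. The sketch below is a toy model only and assumes nothing about BigTable's or HBase's real API; the class `SparseTable`, its `put` and `get` methods, and the sample row and column names are all hypothetical.

```python
import json


class SparseTable:
    """Toy sparse map: {row key: {column: [(timestamp, value), ...]}}.

    Only cells that were actually written take up space, which is what
    makes the map sparse.
    """

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value, timestamp):
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        # Keep the newest version first so reads return it by default.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the most recent value, or the newest one at or before `timestamp`."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None  # cell was never written


table = SparseTable()
table.put("com.example/index", "contents", "<html>v1</html>", timestamp=1)
table.put("com.example/index", "contents", "<html>v2</html>", timestamp=2)

# Several fields serialized into a single column, then deserialized on retrieval.
table.put("user:42", "profile",
          json.dumps({"name": "Ada", "langs": ["en", "fr"]}), timestamp=5)

latest = table.get("com.example/index", "contents")
older = table.get("com.example/index", "contents", timestamp=1)
profile = json.loads(table.get("user:42", "profile"))
missing = table.get("user:42", "contents")
```

Row and column act as the index, as in the spreadsheet analogy, while the timestamp adds a third dimension for versioning.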
In 2006 Cutting was hired by Yahoo!, got a team of engineers, and started the open source Apache Foundation Hadoop project to give Yahoo! the same type of distributed processing that Google was enjoying with their MapReduce platform. By early 2008 Hadoop was able to hit web-scale distribution (http://research.yahoo.com/files/cutting.pdf).

Hadoop has become the primary MapReduce platform used outside of Google. A Hadoop Summit in March 2008 drew more than 400 people. Over half were running Hadoop, with at least 15% running Hadoop on a minimum of 100 nodes.

Hadoop has one or more counterparts for each component of the Google MapReduce stack, as shown in the following table:

| Google MapReduce | Hadoop | Notes |
|---|---|---|
| GFS | Hadoop Distributed File System (HDFS) | Keeps triplicates of all data distributed across cluster nodes |
| BigTable | HBase | Alternatives include HyperTable |
| MapReduce | Hadoop MapReduce | Hadoop Core bundles MapReduce and HDFS |
| | Hadoop Streaming | Provides Hadoop access to any stdin/stdout binary, including interpreted languages like Shell Script, Python, Ruby, and Perl |
| Sawzall | Pig (Pig Latin) | Data flow and execution language |
| | Hive | Facebook open source query and analysis framework built on top of Hadoop |
| Workqueue | JobTracker | Schedules and manages jobs |
| Chubby | ZooKeeper | Coordination system for distributed applications, including configuration and synchronization |
| Compression: BMDiff, Zippy | zlib/gzip, LZO, bzip2 | Google compression optimized for processing speed, not compression ratio |

Hadoop has a vibrant and engaged user and developer community. We expect Hadoop and related offerings to continue to improve, add functionality, and generate new businesses that support the Hadoop community.

Implementing MapReduce as a methodology and as a data processing engine is best served by considering your staff's collective skills and experience:

- Hadoop requires learning new system administration skills for installing, configuring, monitoring, tuning, and maintenance
  - Administering multiple servers; adding new servers to scale
- Simpler, more flexible data structures
  - MapReduce table structures should be familiar to developers who work with programming language data structures
  - Developers and analysts most familiar with relational databases will need to learn the different MapReduce data structures
- Getting buy-in from technical resources and/or finding experienced MapReduce resources can help with adoption
- Training and/or pilot projects can help staff learn MapReduce and new approaches to data management
- Programming resources may be required to build high-level user interfaces to replace RDBMS-oriented tools

MapReduce advantages:

- Fast performance enabled by parallel processing on distributed data
- Transparent, fault-tolerant execution of parallel data-handling processes
- Built-in resiliency/fault tolerance consistent with scaling to large clusters
- Scaling to large-scale data volumes
  - Potential for thousands of nodes
  - Largest Hadoop clusters have 2,000 nodes; Google's MapReduce is rumored to use more than 10K servers (perhaps many more) for MapReduce jobs
- Scaling and performance on commodity hardware
  - Can increase and decrease size of cluster via reconfiguration
- Proven cloud computing options
- Simpler, more flexible data structure
  - No requirement to predefine data structure design
- Parallelism and resiliency come free to developers and analysts, i.e., no explicit coding for parallelism is required

Other considerations:

- Hardware and energy costs rise as the number of servers increases
- Administration, design, and analysis infrastructure tools are available, but still maturing
- RDBMS-oriented DBA, design, and query tools don't work with MapReduce
- Less compute efficiency compared to more heavily indexed alternatives in some circumstances
  - Depends on data structure design and query complexity
  - Can lead to needing more servers or servers that consume more energy per process
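To make the Hadoop Streaming entry in the table above more concrete, and to show why "no explicit coding for parallelism is required", here is a minimal sketch of a streaming-style word count. It assumes the usual streaming convention of one record per line with a tab between key and value, and of reducer input arriving sorted by key. The function names `mapper`, `reducer`, and `simulate` are hypothetical, and the shuffle is imitated locally with `sorted()` rather than by a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word; knows nothing about the cluster."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; relies on input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def simulate(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


result = simulate(["big data big clusters", "Big Data"])
print("\n".join(result))
```

In an actual streaming job, each function would typically live in its own script that reads `sys.stdin` and prints to standard output; the framework, not the script, decides how many copies run and on which nodes.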

Future features of MapReduce:

- More SQL-like and easier-to-use query and analysis tools (see Hive)
- Increased scaling
  - Hadoop has a design target of 10K node clusters in 2009
- Tools to simplify administration, configuration, installation, and monitoring processes

The organizations we spoke with who use MapReduce had a consistently high opinion of their experience. They liked the scaling and flexible data structures. Their developers and analysts were quickly trained and brought up to speed, stayed engaged with the data, and remain enthusiastic about using MapReduce.

MapReduce complements an experimental approach towards data, i.e., loading raw data into simple data structures and running ad hoc and one-off analysis until query patterns emerge that can be turned into more formal and easy-to-analyze data structures. This experimental, discovery-oriented approach helps make MapReduce a good fit for organizations trying to make data central to business strategy and decision making. Drawbacks noted include trial-and-error fiddling to get the configuration optimized and maturity issues around documentation and features, all considered relatively insignificant and not a hindrance to adoption.

Key Takeaway: For most databases, a single server properly configured is enough. It's a big leap to move from one to two servers, but once you do, growing a cluster is relatively easy. Commodity hardware works, rides the mass market innovation curve, and avoids vendor lock-in.

Key Technology Dimensions

There's a lot to keep in mind when investigating big data technology. Along with careful attention to data, staff skills, and usage, we recommend considering the following technology dimensions to make the best decision for your organization.

Single Server and Distributed Data/Parallel Processing Clusters

Single Server

A single box with one or more single or multi-core CPUs and direct-attached or network disk storage.

Assumptions: Server performance and data capacity continue to improve, i.e., CPUs gain cores and capacity (per Moore's law), hard disk density increases, memory density increases, and other factors make high-end servers powerful enough to handle ever-increasing big data loads


More information

IntelliSOURCE Comverge s enterprise software platform provides the foundation for deploying integrated demand management programs.

IntelliSOURCE Comverge s enterprise software platform provides the foundation for deploying integrated demand management programs. ItelliSOURCE Comverge s eterprise software platform provides the foudatio for deployig itegrated demad maagemet programs. ItelliSOURCE Demad maagemet programs such as demad respose, eergy efficiecy, ad

More information

The ERP Card-Solution. The power, control and efficiency of ERP combined with the ease-of-use and financial benefits of a P-Card.

The ERP Card-Solution. The power, control and efficiency of ERP combined with the ease-of-use and financial benefits of a P-Card. The ERP Card-Solutio Xpoetial - It's about Itegratio The power, cotrol ad efficiecy of ERP combied with the ease-of-use ad fiacial beefits of a P-Card. TM poetial The ERP-Card Solutio P-Cards ad ERP For

More information

The Forgotten Middle. research readiness results. Executive Summary

The Forgotten Middle. research readiness results. Executive Summary The Forgotte Middle Esurig that All Studets Are o Target for College ad Career Readiess before High School Executive Summary Today, college readiess also meas career readiess. While ot every high school

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

leasing Solutions We make your Business our Business

leasing Solutions We make your Business our Business if you d like to discover how Bp paribas leasig Solutios Ca help you to achieve your goals please get i touch leasig Solutios We make your Busiess our Busiess We look forward to hearig from you you ca

More information

Engineering Data Management

Engineering Data Management BaaERP 5.0c Maufacturig Egieerig Data Maagemet Module Procedure UP128A US Documetiformatio Documet Documet code : UP128A US Documet group : User Documetatio Documet title : Egieerig Data Maagemet Applicatio/Package

More information

ContactPro Desktop for Multi-Media Contact Center

ContactPro Desktop for Multi-Media Contact Center CotactPro Desktop for Multi-Media Cotact Ceter CCT CotactPro (CP) is the perfect solutio for the aget desktop i a Avaya multimedia call ceter eviromet. CotactPro empowers agets to efficietly serve customers

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

The Canadian Council of Professional Engineers

The Canadian Council of Professional Engineers The Caadia Coucil of Professioal Egieers Providig leadership which advaces the quality of life through the creative, resposible ad progressive applicatio of egieerig priciples i a global cotext Egieerig

More information

Data Center Ethernet Facilitation of Enterprise Clustering. David Flynn, Linux Networx Orlando, Florida March 16, 2004

Data Center Ethernet Facilitation of Enterprise Clustering. David Flynn, Linux Networx Orlando, Florida March 16, 2004 Data Ceter Etheret Facilitatio of Eterprise Clusterig David Fly, Liux Networx Orlado, Florida March 16, 2004 1 2 Liux Networx builds COTS based clusters 3 Clusters Offer Improved Performace Scalability

More information

Message Exchange in the Utility Market Using SAP for Utilities. Point of View by Marc Metz and Maarten Vriesema

Message Exchange in the Utility Market Using SAP for Utilities. Point of View by Marc Metz and Maarten Vriesema Eergy, Utilities ad Chemicals the way we see it Message Exchage i the Utility Market Usig SAP for Utilities Poit of View by Marc Metz ad Maarte Vriesema Itroductio Liberalisatio of utility markets has

More information

Diploma in Secretarial Administration

Diploma in Secretarial Administration Istitute of Fiace Diploma i Secretarial Admiistratio Awarded by the Lodo Chamber of Commerce ad Idustry (LCCI) Startig October 2007 ope for erollmet from July 2007 Be smart start right eroll ow! Eglish

More information

Frequently Asked Questions

Frequently Asked Questions Logview Tax Frequetly Asked Questios Logview Tax FAQ Logview Tax FAQ Page 2 What is Logview Tax? Logview Tax is a suite of tax techology software products for Corporate Tax. Logview uderstads the types

More information

optimise your investment in Microsoft technology. Microsoft Consulting Services from CIBER

optimise your investment in Microsoft technology. Microsoft Consulting Services from CIBER optimise your ivestmet i Microsoft techology. Microsoft Cosultig Services from Microsoft Cosultig Services from MICROSOFT CONSULTING SERVICES ca help with ay stage i the lifecycle of adoptig Microsoft

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Silver Lining of Cloud Computing

Silver Lining of Cloud Computing White Paper Silver Liig of Cloud Computig - Key Priciples ad Best Practices CXOs eed to evaluate differet deploymet models, service models ad key characteristics of the cloud to implemet the precise spectrum

More information

An Approach to Fusion CRM Adoption

An Approach to Fusion CRM Adoption White Paper A Approach to Fusio CRM Adoptio May eterprise customers are ivestig time ad effort to evaluate how ext-geeratio Oracle eterprise applicatios will chage iteractios with iteral ad exteral stakeholders.

More information

Comfort for Life CAPT CAPF CHPF CAUF CSCF

Comfort for Life CAPT CAPF CHPF CAUF CSCF Comfort for Life CAPT CAPF CHPF CAUF CSCF Evaporator Coils www.daikicomfort.com "I'm a Daiki Comfort Pro" Those five simple words ca offer you comfort like you ve ever experieced. The words mea that the

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

June 3, 1999. Voice over IP

June 3, 1999. Voice over IP Jue 3, 1999 Voice over IP This applicatio ote discusses the Hypercom solutio for providig ed-to-ed Iteret protocol (IP) coectivity i a ew or existig Hypercom Hybrid Trasport Mechaism (HTM) etwork, reducig

More information

Total Program Management for High-Tech

Total Program Management for High-Tech Total Program Maagemet for High-Tech ORGANIZE Makig Order Out of Chaos Sortig the requiremets, fidig the right resources, aligig the capabilities, ad creatig a cohesive Team Maagemet Effort are dautig

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

Business Process Services. White Paper. Smart Ways to Implement Smart Meters: Using Analytics for Actionable Insights and Optimal Rollout

Business Process Services. White Paper. Smart Ways to Implement Smart Meters: Using Analytics for Actionable Insights and Optimal Rollout Busiess Process Services White Paper Smart Ways to Implemet Smart Meters: Usig Aalytics for Actioable Isights ad Optimal Rollout About the Authors Sumit Joshi Sumit is part of the Aalytics ad Isights team

More information