Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

Lauro Lins, James T. Klosowski, and Carlos Scheidegger

Lauro Lins is with AT&T Research. Email: llins@research.att.com. Jim Klosowski is with AT&T Research. Email: jklosow@research.att.com. Carlos Scheidegger is with AT&T Research. Email: cscheid@research.att.com. Manuscript received 1 March 2013; accepted 1 August 2013; posted online 1 October 2013; mailed on 4 October 2013. For information on obtaining reprints of this article, please send email to: tvcg@computer.org.

Fig. 1. Example visualizations of 210 million public geolocated Twitter posts over the course of a year. The data structure we propose enables real-time visual exploration of large, spatiotemporal, multidimensional datasets (the images above were rendered faster than the typical screen refresh rate). The visual encodings built using nanocubes are within a controllable difference to ones rendered by a traditional linear scan over the dataset. They naturally support linked navigation and brushing, and include choropleth maps, time series over arbitrary regions and scales of space and time, parallel sets, histograms, and binned scatterplots. The color scale of the choropleth map is a diverging scale in which blue corresponds to iPhone being relatively more popular, and red corresponds to higher relative popularity of Android devices.

Abstract: Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.

Index Terms: Data cube, Data structures, Interactive exploration

1 INTRODUCTION

As datasets get larger, exploratory data visualization becomes more difficult. Consider a dataset with a billion entries. We can compute a small summary of the dataset and visualize the summary instead of the dataset, but as Anscombe's famous quartet shows [], summaries themselves cannot ascertain their own validity. Summaries might help, but in order to understand if that is the case, we will inevitably find ourselves having to visualize one billion residuals. As far as scale goes, we are back to square one. In other words, data summarization alone will never solve the problem of scale in exploratory visualization.

As visualization practitioners, what then can we do? Even drawing the simplest scatterplot is not straightforward. If we decide to produce the visualization by scanning the rows of a table, we will either need nontrivial parallel rendering algorithms or significant time to produce a drawing. Neither of these solutions is attractive or scales well with dataset size.

Data cubes are structures that perform aggregations across every possible set of dimensions of a table in a database, to support quick exploration [15, 1]. Many visualization systems are built on top of data cubes, concretely or conceptually. Still, only recently have researchers started to examine data cube creation algorithms in the context of information visualization [, 18, 21]. Data cubes are often problematic in that they can take prohibitively large amounts of memory as the number of dimensions increases.
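To make "every possible set of dimensions" concrete, the following minimal Python sketch counts the records of a tiny table for every subset of its columns. The rows, the column names, the `ALL` placeholder and the helper `cube_counts` are illustrative assumptions and not part of the nanocube implementation. The only point is that n columns give 2^n groupings, which is why a naively stored data cube grows quickly with the number of dimensions.

```python
from collections import Counter
from itertools import combinations

ALL = "All"  # placeholder meaning "aggregated over this column"


def cube_counts(rows, columns):
    """Count rows for every subset of `columns` (2**n groupings in total)."""
    result = Counter()
    for size in range(len(columns) + 1):
        for kept in combinations(columns, size):
            # One grouping: keep the values of `kept`, collapse the rest.
            for row in rows:
                key = tuple(row[c] if c in kept else ALL for c in columns)
                result[key] += 1
    return result


# Made-up example rows (assumption of this sketch).
rows = [
    {"device": "Android", "language": "en"},
    {"device": "iPhone", "language": "en"},
    {"device": "iPhone", "language": "ru"},
    {"device": "Android", "language": "en"},
    {"device": "iPhone", "language": "en"},
]
cube = cube_counts(rows, ["device", "language"])
```

With two columns the sketch performs four groupings; `cube[(ALL, ALL)]` holds the total number of rows and `cube[("iPhone", ALL)]` the number of rows for one device regardless of language.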

Fig. 2. An illustration of how to build a nanocube for five points $[o_1, \ldots, o_5]$ (five tweets, each with a location and a device) under schema $S = [[\ell_{spatial1}, \ell_{spatial2}], [\ell_{device}]]$. The complete process is described in Section 4.

In Section 4, we show how to construct a data cube that fits in the main memory of a modern laptop computer or workstation, extending the work of Sismanis et al. [1]. In addition, the query times to build the visual encodings in which we are interested will be at most proportional to the size of the output, which is bounded by the number of screen pixels (within a small factor). This is an important observation: the time complexity of a visualization algorithm should ideally be bounded by the number of pixels it touches on the screen. Our technique enables real-time exploratory visualization on datasets that are large, spatiotemporal, and multidimensional. Because the speed of our data cube structure hinges partly on it being small enough to fit in main memory, we call it a nanocube.

By real-time, we mean query times on average under a millisecond for a single thread running on computers ranging from laptops, to workstations, to server-class computing nodes (Section 6). By large, we mean that the datasets we support have millions to billions of entries. By spatiotemporal, we mean that nanocubes support queries typical of spatial databases, such as counting events in a spatial region that can be either a rectangle covering most of the world, or a heatmap of activity in downtown San Francisco (Section 41). By the same token, nanocubes support temporal queries at multiple scales, such as event counts by hour, day, week, or month over a period of years (Section 4). Data cubes in general enable the Visual Information Seeking Mantra [29] of "Overview first, zoom and filter, then details-on-demand" by providing summaries and letting users drill down by expanding along the wanted dimensions. Nanocubes also provide overview, filter, zooming, and details-on-demand inside the spatiotemporal dimensions themselves. By multidimensional, we mean that besides latitude, longitude, and time, each entry can have additional attributes (see Section 6) that can be used in query selections and rollups.

As we will show, nanocubes lend themselves very well to building visual encodings which are fundamental building blocks of interactive visualization systems, such as scatterplots, histograms, parallel coordinate plots, and choropleth maps. In summary, we contribute:
- a novel data structure that improves on the current state of the art data cube technology to enable real-time exploratory visualization of multidimensional, spatiotemporal datasets;
- algorithms to query the nanocube and build linked and brushable visual encodings commonly found in visualization systems; and
- case studies highlighting the strengths and weaknesses of our technique, together with experiments to measure its utilization of space, time, and network bandwidth.

2 RELATED WORK

Relational databases are so widespread and fundamental to the practice of computing that they were a natural target for information visualization almost since the field's inception [20]. Mackinlay's Automatic Presentation Tool is the breakthrough result that critically connected the relational structure of the data with the graphical primitives available for display [2] and ultimately led to data cube visualization tools like Polaris [4, 5] and Show Me [24]. Nanocubes are specifically designed to speed up queries for spatiotemporal data cubes, and could eventually be used as a backend for these types of applications.

In contrast, some of the work in large data visualization involves shipping the computation and data to a cluster of processing nodes. While parallelism is an attractive option for increasing throughput, it does not necessarily help achieve low latency, which is essential for fluid interaction with a visualization tool. As a result, sophisticated techniques such as query prediction become necessary [6]. Leveraging the enormous power of graphics processing units has also become popular [25, 21], but without algorithmic changes, linear scans through the dataset will still be too slow for fluid interaction, even with GPUs.

Another popular way to cope with large datasets is through sampling. Statistical sampling can be performed on the database backend [26, 1, 10, 14], or on the front-end [11]. Still, the techniques we introduce with nanocubes can produce results quickly and exactly (to within screen precision) without requiring approximation, which we believe is preferable. In addition, as Liu et al. argue, sampling by itself is not sufficient to prevent overplotting, and might actually mask important data outliers [21].

Fekete and Plaisant have proposed modifications of traditional visual encodings which use the computer screen more efficiently [1]. These scale better with dataset size, but nevertheless require a traversal of all input data points, which renders the proposal less attractive for larger datasets. Carr et al. were among the first to propose techniques replacing a scatterplot with an equivalent density plot [5]; nanocubes enable these visualizations at a variety of dataset sizes and scales. Careful data aggregation [17], then, appears to be one of the few scalable solutions for low-latency large data graphics. While Elmqvist and Fekete propose variations of visualization techniques that include aggregation as a first-class citizen [12], in this paper we show how to issue queries such that, at the screen resolution in which the application is operating, the result is indistinguishable from (or close to) a complete scan through the dataset.

    function NANOCUBE([o_1, o_2, ..., o_n], S, l_time)        ▷ n > 0
        nano_cube ← NODE()                                    ▷ New empty node
        for i = 1 to n do
            updated_nodes ← ∅
            ADD(nano_cube, o_i, 1, S, l_time, updated_nodes)
        end for
        return nano_cube
    end function

    function TRAILPROPERPATH(root, [v_1, ..., v_k])
        stack ← STACK()                                       ▷ New empty stack
        PUSH(stack, root)
        node ← root
        for i = 1 to k do
            child ← CHILD(node, v_i)
            if child = null then
                child ← NEWPROPERCHILD(node, v_i, NODE())
            else if ISSHAREDCHILD(node, child) then
                child ← REPLACECHILD(node, child, SHALLOWCOPY(child))
            end if
            PUSH(stack, child)
            node ← child
        end for
        return stack
    end function

    function SHALLOWCOPY(node)
        node_c ← NODE()
        SETSHAREDCONTENT(node_c, CONTENT(node))
        for v in CHILDRENLABELS(node) do
            NEWSHAREDCHILD(node_c, v, CHILD(node, v))
        end for
        return node_c
    end function

    procedure ADD(root, o, d, S, l_time, updated_nodes)
        [l_1, ..., l_k] ← CHAIN(S, d)
        stack ← TRAILPROPERPATH(root, [l_1(o), ..., l_k(o)])
        child ← null
        ...
            node ← POP(stack)
            update ← false
            ... SUMMEDTABLETIMESERIES() : NODE())
            update ← true
            ...
            update ← true
            ...
            update ← true
            ...
            INSERT(CONTENT(node), l_time(o))
            ...
            child ← node
        ...

Fig. 3. Pseudocode of an algorithm to build nanocubes.

We note that overaggressive aggregation itself introduces potential discrepancies between the visualization and the dataset, and there are proposals to understand this [9]. We are interested in bounding the difference between our visual encodings and a visual encoding that would traverse the entirety of the data by the size of the screen, i.e. the number of pixels in it. Related to this, pixel-oriented techniques [19] have been investigated. However, these tend to focus on the development of new visual encodings, while in this paper we show how to create the already well-known and established encodings with low error, high performance and interactivity.

Our technique is most closely related to the work of Sismanis et al. [0, 1, 2]. Nanocubes improve upon their work in two fundamental directions. First, we develop a model for spatiotemporal data cubes that exploits unique characteristics of space and time to get a good compromise between space usage and efficiency of queries (Sections 421 and 6). Second, we show how these structures enable the visualizations which are common in interactive tools (Section 4).

There have been recent efforts to build data cube structures specifically suited for visualization. Crossfilter [] is based on the clever observation that many queries in interactive visualization are incremental: assuming that previous results are available, the results needed for the next query can be quickly computed. Unfortunately, we do not see how this would work for the multiscale queries necessary in a spatiotemporal setting. Just as recently, Kandel et al. proposed Datavore, a column-oriented database that supports fast data cube queries [18], and Liu et al. leverage graphics hardware in imMens, achieving extremely fast queries over large data [21]. We provide a detailed, direct comparison of nanocubes to Datavore and imMens in Section 7.

3 [...] DATA CUBES

Following common practice, we will call the table A in Figure 4 a relation, its columns attributes, its lines records, and its entries values. An aggregation represents the idea of selecting a certain group of records from a relation and summarizing this group using an aggregation function (e.g. count, sum, max, min). For example, a possible aggregation for the relation A could be to select all its records and summarize those using count, yielding five as the aggregation result. If we allow a special value All to be a valid attribute value, we could represent this aggregation as relation B in Figure 4. A record that contains the special value All is an aggregation record. Using this notation, it is easy to understand some conventional ways of describing aggregations for a given relation: GROUP BY, CUBE, and ROLL UP.

A GROUP BY operation is one in which a relation is derived from a base relation given a list of attributes and an aggregation function. For example, GROUP BY on attributes Device and Language with the count aggregation function results in the relation C in Figure 4. Note that for every different combination of values present in the attributes of a base relation, an aggregation record is added to the resulting relation. In our running example, these combinations are (Android, en), (iPhone, en), and (iPhone, ru).

The CUBE operation is the result of collecting all possible GROUP BY aggregations into a single relation for a given list of attributes (i.e. $2^n$ GROUP BYs for $n$ input attributes). In our running example, the CUBE for count on Device and Language is the union of four GROUP BYs: on (1) no attributes; on (2) Device only; on (3) Language only; and (4) on Device and Language, shown in relation D in Figure 4.

Finally, a ROLL UP is a constrained version of the CUBE operation where the order of the input attributes is important. ROLL UP on Device and Language (in this order) means the union of GROUP BYs on: (1) no attributes; (2) Device; and (3) Device and Language, but does not include the GROUP BY on Language only. As the results of GROUP BY, CUBE and ROLL UP can be seen as relations, we can naturally compose such operators (e.g. a ROLL UP CUBE).

    Natural language query                     | s         | c          | t      | URL
    count of all Delta flights                 | R U       | R {Delta}  | R U    | /where/carrier=delta
    count of all Delta flights in the Midwest  | R Midwest | R {Delta}  | R U    | /region/midwest/where/carrier=delta
    count of all flights in 2010               | R U       | D          | R 2010 | /field/carrier/when/2010
    timeseries of all United flights in 2009   | R U       | R {United} | D 2009 | /series/when/2009/where/carrier=united
    heatmap of Delta flights in 2010           | D tile0   | R {Delta}  | R 2010 | /tile/tile0/when/2010/where/carrier=delta

Fig. 5. A simplified set of queries supported by nanocubes. The columns represent s, space; t, time; c, category. R means rollup, D means drilldown. The value next to R or D contains the subset of that dimension's domain being selected. U represents the entire domain ("universe").

4 NANOCUBES: COMPACT, SPATIOTEMPORAL DATA CUBES

Data visualizations in a computer are necessarily bounded by display size, and so we would like to be able to quickly collect subsets of the dataset that would end up in the same pixel on the screen. However, spatiotemporal navigation is inherently multiscale. The same data structure should support quick indexing for a visualization over multiple years of time series and for drilling down into one particular hour or day. Similarly, the data cube should support aggregation queries over vast spatial regions covering entire continents, as well as very narrow queries covering only a few city blocks.

The database notion of ROLL UP, in a sense, aligns nicely with the notion of Level of Detail. For example, if the records of a table (relation) contain a location attribute, one can design a ROLL UP query whose resulting relation encodes the same information as the one encoded in a level of detail data structure. More concretely, suppose $\ell_1, \ldots, \ell_k$ are attributes computed from the original location attribute and yield quadtree addresses of increasingly higher levels of detail (from 1 to $k$). A ROLL UP query on these (computed) attributes results in, essentially, the same information as the one contained in a quadtree (given that we are keeping the same summary in both, e.g. count).
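The correspondence between a ROLL UP on quadtree addresses and a quadtree of counts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the nanocube index itself: locations are assumed to lie in the unit square, the quadrant digit encoding is arbitrary, and the names `quadtree_address` and `rollup_counts` are hypothetical.

```python
from collections import Counter


def quadtree_address(x, y, level):
    """Return the quadtree path of point (x, y) in the unit square [0, 1) x [0, 1).

    Each step halves the current cell and records one quadrant digit:
    0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right.
    (This digit encoding is an assumption of the sketch.)
    """
    path = []
    for _ in range(level):
        x, y = 2 * x, 2 * y
        qx, qy = int(x), int(y)      # 0 or 1 along each axis
        path.append(qx + 2 * qy)
        x, y = x - qx, y - qy        # continue inside the chosen quadrant
    return tuple(path)


def rollup_counts(points, max_level):
    """ROLL UP with count on the address prefixes of lengths 0..max_level."""
    counts = Counter()
    for x, y in points:
        address = quadtree_address(x, y, max_level)
        for level in range(max_level + 1):
            counts[address[:level]] += 1   # the empty prefix is the root cell
    return counts


# Made-up sample locations (assumption of this sketch).
points = [(0.125, 0.125), (0.875, 0.125), (0.625, 0.625),
          (0.875, 0.875), (0.625, 0.875)]
counts = rollup_counts(points, max_level=2)
```

Each key of `counts` is an address prefix, that is, one quadtree cell: `counts[()]` is the count at the root, and `counts[(3,)]` is the number of points in the upper-right quadrant at level 1. Storing one count per non-empty prefix is the same information a quadtree with per-node counts would hold.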
The second important notion in the design of nanocubes is the idea that we want to combine aggregations of independent dimensions at independent levels of detail. For example, we might want to know, for a whole country, what the spatial distribution of tweets generated by an iPhone is: coarse on the spatial dimension, but specific on the device dimension. Conversely, we might want to know the distribution of tweets (coarse on device) in a small city block (fine in space). In relational database terminology, this model has a name: it is a CUBE of ROLL UPs, or a ROLL UP CUBE. With the terminology set, we can state: a nanocube is a data structure to efficiently store and query spatiotemporal ROLL UP CUBES. Besides implementation tricks, the main difference between nanocubes and previously published sparse coalesced data cubes such as Dwarf cubes [0] is in the design of aggregations across spatiotemporal dimensions (see Sections 41 and 4). Next, we present a formal description of the components that make up our nanocube index, pseudocode for building nanocubes, an illustrated example, and how queries are made against our index.

Fig. 4. A sample relation and its associated aggregation operators: relation (A); aggregation (B); GROUP BY on Device, Language (C); CUBE on Device, Language, equivalent to GROUP BY on all possible subsets of {Device, Language} (D).

4.1 Definitions

Let $O$ be a set of objects. A labeling function $\ell : O \to L$ associates a label value to the objects of $O$. We can think of $\ell$ as an attribute in a relational database. In connection with the level of detail discussion above, if $\ell_1$ and $\ell_2$ are two labeling functions for $O$, we say $\ell_1$ is coarser than $\ell_2$, or that $\ell_2$ is finer than $\ell_1$, if for any two objects $o, o' \in O$ the implication $\ell_2(o) = \ell_2(o') \Rightarrow \ell_1(o) = \ell_1(o')$ holds. We denote this fact by $\ell_1 < \ell_2$. A sequence of labeling functions $c = [\ell_1, \ell_2, \ldots, \ell_k]$ for objects $O$ is a chain for $O$ if every labeling function is coarser than the next.
ROLLUP UE Daa viualizaion in a compuer are necearily bounded by diplay ize, and o we would like o be able o quickly collec ubpace of he daae ha would end up in he ame pixel on he creen However, paioemporal navigaion i inherenly mulicale The ame daa rucure hould uppor quick indexing for a viualizaion over muliple year of ime erie and for drilling down ino one paricular hour or day Similarly, he daa cube hould uppor aggregaion independen level of deail For example we migh wan o know for a whole counry, wha i he paial diribuion of wee gereraed by an : coare on he paial dimenion, bu pecific on he device dimenion; converely we migh wan o know he diribuion of wee (coare on device) in a mall block of a ciy (fine in pace) In relaional daabae erminology, hi model ha a name: i i a cube of rollup, or a rollup cube Now wih he language e up, we can ae: nanocube i a daa rucure o efficienly ore and query paioemporal rollup cube eide implmenaion rick (eg agged poiner, carefully deign of he bi layou of he rucure, pecifically deigned o live in main memory), here i, o he be of our knowledge, a qualiaive difference in nanocube o oher daa rucure like [29] The difference i in wha nanocube ore for each aggregaion which i deeply relaed o paioemporal daae: i ore ime erie in a pare ummed able forma Thi elemen of nanocube i explained in Secion 4 and, canno be canno be efficienly imulaed (memorywie) by previou daarucure In he remainder of hi ecion, we preen a formal decripion of he componen ha make up our nanocube index, peudocode for building nanocube ogeher wih an illuraed example, and how querie are made again our index 41 Definiion Le O be a e of objec labeling funcion ` : O! 
L aociae a label value o he objec of O We can hink of ` a an aribue in a relaional daabae In connecion wih he level of deail dicuion above, if `1 and `2 are wo labeling funcion for O, we ay `1 i coarer han `2 or ha `2 i finer han `1 if for any wo objec o,o 0 2 O he implicaion `2(o)=`2(o 0 ) ) `1(o)=`1(o 0 ) hold We denoe hi fac by `1 < `2 equence of labeling funcion c =[`1,`,,`k] for objec O i a chain for O if every labeling funcion i coarer han he nex labeling funcion in he equence: `i < `i+1 Noe how chain are relaed o roll up, we avoid he ame name o no overload more he erm roll up The number of level of a chain i defined by level(c)= c + 1 n indexing chema for objec O coni of a equence of Relaion ggregaion Group y on Device, Language ube on Device, Language Equivalen o Group y on all poible ube of {Device, Language} D Fig 5 ample relaion and i aociaed aggregaion operaor nline Submiion ID: 276 n > 0 py node ) py Sack DE( )) ), v)) good comcion 421 e viualizaure pecifre package ive viuale available, ed Unformulicale ly, Kandel uppor fa ware in im While we nanocube be achieve vore daa (where imre 5 a relavalue n of record aion funcle aggregaummarize If we ald repreen ion record Uing hi noaion, i i eay o underand ome convenional way of decribing aggregaion for a given relaion: group by, cube, and rollup group by operaion i one in which a relaion i derived from a bae relaion given a li of aribue and an aggregae funcion For example, group by on aribue Device and Language wih he coun aggregae funcion reul in he relaion in Figure 5 Noe ha for every differen combinaion of value preen in he aribue of a bae relaion an aggregaion record i added o he reuling relaion In our running example (Figure 5) hee combinaion are (ndroid, en), (, en), and (, ru) The cube operaion iion ID: 276 funcion (egcoun, um, max, min) For example, a poible ummarize hoe uing coun In hi cae, five would be he final aggregaion reul If we allow enrie in he record o have a pecial value ll, we could 
repreen hi aggreaion a he following relaion: record Uing hi noaion, i i eay o underand he convenional way of decribing aggregaion for a given relaion: group by, cube, and rollup group by operaion i one in which a relaion i funcion For example, group by on aribue Device and Language Noe ha for every differen combinaion of value preen in he aribue of he bae relaion an aggregaion record i added o he rebmiion ID: 276 0 e k e of ey ] i a 1 a e k i ild ily, be e e ve a i funcion (egcoun, um, max, min) For example, a poible ummarize hoe uing coun In hi cae, five would be he final aggregaion reul If we allow enrie in he record o have a pecial value ll, we could repreen hi aggreaion a he following relaion: record Uing hi noaion, i i eay o underand he convenional way of decribing aggregaion for a given relaion: group by, cube, and rollup group by operaion i one in which a relaion i funcion For example, group by on aribue Device and Language Noe ha for every differen combinaion of value preen in he aribue of he bae relaion an aggregaion record i added o he re miion ID: 276 e e of ey ] i a 1 a e k i ild iy, e e e ve a funcion (egcoun, um, max, min) For example, a poible ummarize hoe uing coun In hi cae, five would be he final aggregaion reul If we allow enrie in he record o have a pecial value ll, we could repreen hi aggreaion a he following relaion: record Uing hi noaion, i i eay o underand he convenional way of decribing aggregaion for a given relaion: group by, cube, and rollup group by operaion i one in which a relaion i funcion For example, group by on aribue Device and Language Noe ha for every differen combinaion of value preen in he aribue of he bae relaion an aggregaion record i added o he re Naural language query c URL coun of all Dela fligh R U R { Dela } R U /where/carrier=dela coun of all Dela fligh in he Midwe R (Midwe) R { Dela } R U /region/(midwe)/where/carrier=dela coun of early fligh in 2010 R U D R (2010) /field/carrier/when/(2010) imeerie of all Unied 
fligh in 2009 R U R U D (2009) /erie/when/2009/where/carrier=unied heamap of Dela fligh in 2010 D (ile) R { Dela } R (2010) /ile/(ile)/where/carrier=dela Fig 4 implified e of querie uppored by he nanocube daa rucure The column repreen pace;, ime; c, caegory R mean rollup, D mean drilldown The value nex o R or D conain he ube of ha dimenion domain being eleced We ue U o repreen he enire domain ( univere ) Omied here, bu uppored by our rucure, are: he exra parameer for number of ep hroughou he ime region in a imebaed drilldown; muliple caegorie wih eparae rollup and drilldown; ile of variable reoluion uling relaion In our running example hee combinaion are (ndroid, en), (, en), and (, ru) The cube operaion i he reul of collecing all poible group by aggregaion ino a ingle relaion for a given li of aribue In our running example, he cube for coun on Device and Language would be he ame a he union of four group by : on (1) no aribue; on (2) Device only; on () Language only; and (4) on Device and Language (2 n group by where n i he number of inpu aribue): ll ndroid ll 2 ll ll ll ll eu 4 ll ll ru 1 Finally, a roll up i a conrained verion of he cube operaion where he order of he inpu aribue i imporan So a roll up on Device and Language (in hi order) mean he union of group by on: (1) no aribue; (2) Device; and () Device and Language Noe ha he group by on Language only i no par of he roll up he reul of group by, cube and roll up can be een a relaion, we can naurally compoe uch operaion we will decribe nanocube i a pecialized daa rucure o ore and query cube of roll up 4 NNOUES: OMPT, SPTIOTEMPORL ROLLUP UE Daa viualizaion in a compuer are necearily bounded by diplay ize, and o we would like o be able o quickly collec ubpace of he daae ha would end up in he ame pixel on he creen However, paioemporal navigaion i inherenly mulicale The ame daa rucure hould uppor quick indexing for a viualizaion over muliple year of ime erie and for drilling down ino one paricular hour or day 
Similarly, he daa cube hould uppor aggregaion querie over va paial region covering enire coninen, a well a very narrow querie covering only a few ciy block independen level of deail For example we migh wan o know for a whole counry, wha i he paial diribuion of wee gereraed by an : coare on he paial dimenion, bu pecific on he device dimenion; converely we migh wan o know he diribuion of wee (coare on device) in a mall block of a ciy (fine in pace) In relaional daabae erminology, hi model ha a name: i i a cube of rollup, or a rollup cube Now wih he language e up, we can ae: nanocube i a daa rucure o efficienly ore and query paioemporal rollup cube eide implmenaion rick (eg agged poiner, carefully deign of he bi layou of he rucure, pecifically deigned o live in main memory), here i, o he be of our knowledge, a qualiaive difference in nanocube o oher daa rucure like [29] The difference i in wha nanocube ore for each aggregaion which i deeply relaed o paioemporal daae: i ore ime erie in a pare ummed able forma Thi elemen of nanocube i explained in Secion 4 and, canno be canno be efficienly imulaed (memorywie) by previou daarucure In he remainder of hi ecion, we preen a formal decripion of he componen ha make up our nanocube index, peudocode for building nanocube ogeher wih an illuraed example, and how querie are made again our index 41 Definiion Le O be a e of objec labeling funcion ` : O! 
L aociae a label value o he objec of O We can hink of ` a an aribue in a relaional daabae In connecion wih he level of deail dicuion above, if `1 and `2 are wo labeling funcion for O, we ay `1 i coarer han `2 or ha `2 i finer han `1 if for any wo objec o,o 0 2 O he implicaion `2(o)=`2(o 0 ) ) `1(o)=`1(o 0 ) hold We denoe hi fac by `1 < `2 equence of labeling funcion c =[`1,`,,`k] for objec O i a chain for O if every labeling funcion i coarer han he nex labeling funcion in he equence: `i < `i+1 Noe how chain are relaed o roll up, we avoid he ame name o no overload more he erm roll up The number of level of a chain i defined by level(c)= c + 1 n indexing chema for objec O coni of a equence of chain S =[c 1,c 2,,c n ] The dimenion of an indexing chema S i he lengh of i equence of chain and i denoed by dim(s) The Relaion ggregaion Group y on Device, Language ube on Device, Language Equivalen o Group y on all poible ube of {Device, Language} D Fig 5 ample relaion and i aociaed aggregaion operaor Fig 4 ample relaion and i aociaed aggregaion operaor 41 Definiion Le O be a e of objec labeling funcion l : O L aociae a label value o he objec of O We can hink of l a an aribue in a relaional daabae In connecion wih he level of deail dicuion above, if l 1 and l 2 are wo labeling funcion for O, we ay l 1 i coarer han l 2 or ha l 2 i finer han l 1 if for any wo objec o,o O he implicaion l 2 (o) = l 2 (o ) l 1 (o) = l 1 (o ) hold We denoe hi fac by l 1 l 2 equence of labeling funcion c = [l 1,l,,l k ] for objec O i a chain for O if every labeling funcion i coarer han he nex labeling funcion in he equence: l i l i+1 The number of level of a chain i defined by level(c) = c + 1 n indexing chema for objec O coni of a equence of chain S = [c 1,c 2,,c n ] The dimenion of an indexing chema S i he lengh of i equence of chain and i denoed by dim(s) The mulipliciy of a chema S i he produc of i chain number of level: µ(s) = n i=1 level(c i) full aignmen for a equence of labeling funcion [l 
1,l 2,,l k ] i a equence of label value [v 1,v 2,,v k ] where v i i a label value under l i ny prefix of a full aignmen for a equence of labeling funcion, including he empy one, i referred o a a parial aignmen Noe ha a full aignmen i alo a parial aignmen ince a equence i alo a prefix of ielf n addre on a chema i a equence of parial aignmen for i chain, more formally, if S = [c 1,c 2,,c n ] i an indexing chema, hen a = [p 1, p 2,, p n ] i an addre of S if p i i a parial aignmen for chain c i The e of poible addree of S i denoed by addr(s) The addree of an objec o under indexing chema S, denoed by addr(o,s) are all he addree in addr(s) whoe parial aignmen are conien wih he label value aociaed o o and i i eay o ee ha he ize of addr(o,s) i alway µ(s) eide a chema S, he definiion of a nanocube require a eparae labeling funcion, l ime : O T, which we refer o a he ime labeling funcion ince we ue i o encode he emporal apec of our daae Thu, a nanocube for objec o 1,,o n i denoed by: NNOUE([o 1,,o n ],S,l ime ) key in a nanocube i any pair (a,) where a addr(s) and correpond o a full aignmen (ee definiion above) and T i a poible ime label If we remove he requiremen of a being a full aignmen, we ay ha pair (a,) i an aggregae key Noe ha every key i alo an aggregae key The e of all poible key and he e of all poible aggregae key of a nanocube are repecively referred o a i key pace, or K, and i aggregae key pace, or K a The ize of he key pace, K, i referred o a i cardinaliy 42 uilding he Index To eae he remaining expoiion, we aume ha a nanocube map an aggregae key o a coun Neverhele, nanocube uppor any kind of ummary ha i an algebra wih weighed um and ubracion Noably, hi include linear combinaion of momen aiic, wih which we can compue mean, variance and covariance The peudocode for building a nanocube i preened in Figure The main idea of he algorihm i for every objec o i o fir find he fine addre of he chema S hi by hi objec, updae he ime erie aociaed wih hi addre and from here on 
updae in a deepe fir fahion, all coarer addree alo hi by o i Noe ha he conen of he la dimenion of chema S i alway a ime erie and ha i why, in line 21 of DD, we iner he ime label of he curren objec The imporan rick ued i o, when poible, allow for hared link
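The definitions of Section 4.1 and the update rule of Section 4.2 can be made concrete with a deliberately naive sketch. It is not the algorithm of Figure 3: it enumerates all $\mu(S)$ addresses of every object explicitly and keeps one counter per aggregate key, with none of the link sharing that makes nanocubes compact. All names (`prefixes`, `addresses`, `build_naive_index`, `count`) and the tweet labels are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product


def prefixes(full_assignment):
    """Every partial assignment of a full assignment, including the empty one."""
    return [tuple(full_assignment[:i]) for i in range(len(full_assignment) + 1)]


def addresses(obj, schema):
    """addr(o, S): one partial assignment per chain, in every combination."""
    per_chain = [prefixes([label(obj) for label in chain]) for chain in schema]
    return list(product(*per_chain))


def multiplicity(schema):
    """mu(S): the product of (chain length + 1) over all chains."""
    mu = 1
    for chain in schema:
        mu *= len(chain) + 1
    return mu


def build_naive_index(objects, schema, time_label):
    """Count objects per aggregate key (address, time label), with no sharing."""
    index = Counter()
    for obj in objects:
        for address in addresses(obj, schema):
            index[(address, time_label(obj))] += 1
    return index


def count(index, address):
    """Simple query with no time constraint: sum the counts of one address."""
    return sum(n for (a, _t), n in index.items() if a == address)


# Hypothetical labels for five tweets; the quadrant and cell names are made up.
tweets = [
    {"quad": "NW", "cell": "NW/SE", "device": "Android", "t": 1},
    {"quad": "NW", "cell": "NW/SE", "device": "iPhone", "t": 2},
    {"quad": "SE", "cell": "SE/NW", "device": "iPhone", "t": 3},
    {"quad": "NE", "cell": "NE/SW", "device": "Android", "t": 4},
    {"quad": "SE", "cell": "SE/SE", "device": "iPhone", "t": 5},
]
spatial_chain = [lambda o: o["quad"], lambda o: o["cell"]]  # coarser label first
device_chain = [lambda o: o["device"]]
schema = [spatial_chain, device_chain]

index = build_naive_index(tweets, schema, time_label=lambda o: o["t"])
northwest_any_device = count(index, (("NW",), ()))
```

Each object touches exactly `multiplicity(schema)` addresses (six here: three spatial prefixes times two device prefixes), which is the cost that shared links are meant to avoid paying in memory.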

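The sparse summed table that each node stores for its time series can be sketched as two parallel arrays: the non-empty time bins and the running total of events up to each bin. The class below is a minimal illustration under the assumption that objects arrive in time order; the name `SparseSummedTable` and its methods are hypothetical, and the sample values are the (bin, accum) pairs of Figure 6 as far as they can be read from the figure.

```python
from bisect import bisect_left


class SparseSummedTable:
    """Cumulative event counts, stored only for time bins that have events."""

    def __init__(self):
        self.bins = []   # strictly increasing time bins
        self.accum = []  # accum[i] = number of events in bins[0..i]

    def add(self, time_bin, n=1):
        """Record n events; this sketch assumes non-decreasing time bins."""
        if self.bins and time_bin < self.bins[-1]:
            raise ValueError("out-of-order insertion is not handled here")
        total = (self.accum[-1] if self.accum else 0) + n
        if self.bins and self.bins[-1] == time_bin:
            self.accum[-1] = total
        else:
            self.bins.append(time_bin)
            self.accum.append(total)

    def _before(self, time_bin):
        """Events in bins strictly smaller than time_bin (one binary search)."""
        i = bisect_left(self.bins, time_bin)
        return self.accum[i - 1] if i > 0 else 0

    def rollup(self, start, end):
        """Events in the half-open bin range [start, end): two searches."""
        return self._before(end) - self._before(start)

    def drilldown(self, start, width, buckets):
        """One count per bucket: buckets + 1 searches, then differences."""
        cuts = [self._before(start + k * width) for k in range(buckets + 1)]
        return [hi - lo for lo, hi in zip(cuts, cuts[1:])]


table = SparseSummedTable()
for time_bin, events in [(1, 2), (3, 1), (4, 1), (6, 3), (10, 2)]:
    table.add(time_bin, events)

# Start at bin 1, use buckets of 3 bins each, and collect 4 buckets.
series = table.drilldown(start=1, width=3, buckets=4)  # [3, 4, 0, 2]
```

Storing only non-empty bins keeps sparse series small, while the cumulative totals let any contiguous range be answered by subtraction rather than by scanning.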
Fig. 6. An illustration of the summed-area table variant we use for our time series indexing scheme. Every node in Figure 2 stores an array of timestamped counts like the one in this figure. The figure shows events on a time axis with bins 0 to 13; the sparse summed-table representation for counts as (bin, accum) pairs (1, 2), (3, 3), (4, 4), (6, 7), (10, 9); and the query /tseries/1/3/4 (start at bin 1, use buckets of 3 bins each, and collect 4 of these buckets), whose result is 3, 4, 0, 2.

Links can be shared across dimensions (dashed blue lines in Figure 2) and within the same dimension (dashed black connections). In real use cases this sharing is responsible for significant memory savings and enables exploring even larger datasets on small laptops.

4.2.1 Nanocube Example

Consider the scenario where an analyst is interested in understanding the spatiotemporal distribution of Twitter data (i.e., tweets), including which devices (e.g., Android) people are using. Natural questions to ask include: Which device is more popular for tweeting? Is one device more popular in certain areas than in others? How has this popularity changed over time?

We illustrate the construction of a nanocube built using Twitter data in Figure 2. For clarity, this example contains only five tweets $o_1, \ldots, o_5$, all ordered in time. As shown on the top-left map of Figure 2, the first two tweets ($o_1$ and $o_2$) were sent from the east coast of the United States; the third tweet ($o_3$), from South Africa; the fourth tweet ($o_4$) was sent from Asia, and the fifth tweet ($o_5$) from Australia. Tweets $o_1$ and $o_4$ were sent from an Android device, while $o_2$, $o_3$, and $o_5$ were sent from an iPhone device. The labeling functions $\ell_{device}$, $\ell_{spatial1}$, and $\ell_{spatial2}$, as well as the schema of this nanocube, $S$, are all defined on the left part of this figure. The labeling $\ell_{device}$ assigns a device to each tweet, and $\ell_{spatial1}$ and $\ell_{spatial2}$ assign a spatial label to each tweet. The tweet labels given by $\ell_{spatial1}$ and $\ell_{spatial2}$ are essentially addresses in a quadtree partition of a square. Note that $\ell_{spatial1}$ is coarser than $\ell_{spatial2}$. The right part of Figure 2 presents intermediate nanocubes generated by NANOCUBE (Figure 3) after each tweet is inserted.

4.3 Querying the Cube

Nanocubes support three distinct dimension types, which are always traversed in a fixed order: spatial, categorical, and
finally temporal. Before describing queries for each of these specific dimension types, we first illustrate how simple queries are conducted on nanocubes using an example. Recall that the end result of the query will be to return precomputed aggregates across one or more dimensions. In Figure 2(5), assume we are interested in the count of all tweets that occurred in the northwest quadrant of the world, regardless of the device type and time. The aggregate key $k_a = ((p_1, p_2), t)$ for this query consists of: (1) the partial assignment for the northwest quadrant in the spatial dimension, $p_1 = [0,1]$; (2) the empty partial path for the device dimension, $p_2 = []$, indicating any device; and (3) a time label $t$ indicating any time.

Finding the precomputed aggregate for a given aggregate key is called a simple query. In this example, we start at the topmost node and traverse all black parent-child links described in the partial assignment $p_1$: in this case only the black [0,1] link. We next cross the dimension boundary line by traversing the (blue) content link of the current node. The traversal process is repeated for the device dimension using the partial assignment $p_2$. In this specific case, no restrictions are made on the device, and we can jump to the next dimension by traversing the content link. At this point, we reach a leaf node containing $\{o_1, o_2\}$. Since no time constraint is imposed, the count of elements inside the leaf (2) is the answer for the query. Note that, for each dimension, a simple query only traverses a single path of its tree before jumping to the root node of a tree in the next dimension (or to a leaf node, which encodes time and is treated differently).

Fig. 7. Which device is more popular for tweeting: iPhone (blue) or Android (orange)? This choropleth map highlights areas in which devices are more popular, based on a sample of 210M tweets. When we zoom in to Chicago we can observe something not seen from the overview display: south and west of the city, Android is more popular than iPhone.

In general, higher-level queries might traverse multiple paths of a single tree, and may also report single aggregates, multiple aggregates, or even combine aggregates from multiple branches. To abstract and classify how a general nanocube query processes a dimension, we use the terminology of rollup and drilldown (the ROLL UP relational operation is related but has a different meaning than the one we intend here). The dimension that is the basis of a rollup should report a single aggregate value as a result. This aggregate might be a single existing aggregate in the nanocube or a combination of multiple aggregates from different branches of that dimension. A drilldown reports aggregate values for multiple branches in that dimension. In a single nanocube query, each dimension is independently set to be used as the basis for either a rollup or a drilldown. In Figure 5, we provide a set of example queries and their mapping to the server query URLs (see Section 5). It is worth noting that the order of the $d$ dimensions does not impact the worst-case query runtime. For example, a marginal barchart of a categorical dimension (with $k$ bars) requires $O(kd)$ time, regardless of the category chosen or the ordering of the dimensions.

4.3.1 Spatial Queries

In our current implementation, the first dimension to be traversed in a nanocube is always the spatial dimension. It is helpful to think of this dimension as being represented by a traditional quadtree [28], where each quadtree node is enriched by an extra pointer (content pointer) that jumps to the next dimension of the nanocube. If a query matches exactly the region represented by a quadtree node, then the
content pointer of that node is the gateway for all aggregates that refer precisely to that region. If the query includes categorical restrictions (or drilldowns), then these can be found by traversing down the following categorical dimensions, as described below. However, spatial regions will very rarely match exactly one node in the quadtree; therefore, we use the traditional region quadtree intersecting algorithm to compute the minimal disjoint set of quadtree nodes that exactly covers the query region [28], and sum the resulting rollups across the nodes. Arbitrarily shaped regions are not currently supported for spatial queries because of the additional complexity that is introduced, but there is no intrinsic barrier in the framework which prevents them from working. For spatial rollups, we support arbitrary rectangular regions. For drilldowns, we currently support regions defined by the tiling scheme of most mapping services on the WWW. For example, the widest tile in the world in OpenStreetMap [16] has coordinates
(0, 0, 0), while a tile for block-level maps of downtown Los Angeles might have coordinates (22485, 52342, 17). The first two coordinates are integer addresses, and the third coordinate corresponds to the zoom level: going down a zoom level doubles the resolution in both x and y. Our spatial drilldowns are then specified by a tile (x, y, z) address and an additional integer resolution, which denotes how many levels to break down space inside the tile. Traditionally, tiles from mapping services are squares with 256 pixels on the side, which corresponds in our case to a resolution of 8. Since our spatial drilldowns return an array of counts broken down by latitude and longitude, they are the basis for spatial density plots and choropleth maps.

Fig. 8. A history of American Airlines and Delta. The time series shows the weekly percentage of the number of commercial flights in the United States. After 9/11, Delta (orange) saw a positive spike where American (blue) saw a negative one. The big bump on American was the merger with TWA. The heatmap shows the spatial hotspots of the two companies, counting all flights after 9/11 (the time bar can be dragged and resized to change the considered time window for the heatmap).

4.3.2 Categorical Queries

Categorical dimensions in a nanocube are represented by flat trees, which always contain a root node with potentially as many children as there are different values in that category. To restrict the domain to a certain value of the category, the query engine simply follows the path down the child of the corresponding value. Categorical rollups are performed by simply returning the count corresponding to either the top-level node (in case of no restriction) or the child node (in case of a restriction). Categorical drilldowns are also similarly simple: they are a sparse array of all children with nonzero counts. We note that since categorical dimensions appear under spatial dimensions, answering spatial region rollups with either categorical restrictions or drilldowns requires combining the categorical rollups across all quadtree nodes that are reached by the region. An analogous phenomenon happens for nested drilldowns across multiple categories. For example, the binned scatterplot in Figure 11 can be built directly from the
results of drilling down in both day of week and hour of day. The recombinable parallel sets visualization of Figure 1 requires a triple breakdown of language, device and application. Single category drilldowns also trivially enable histogram plots.

4.3.3 Temporal Queries

To represent the temporal dimension, we use a sparse variant of summed-area tables [8] (Figure 6). Each time series in a node is stored as a dense, sorted array of cumulative counts, tagged by timestamp. With this data structure, we can compute a temporal rollup of event counts along any contiguous period using only two binary searches: one to find the array element with the least upper bound of the period's beginning, and another to find the greatest lower bound of the period's end. The difference between these numbers is the total number of events in the period. A temporal drilldown happens similarly, and we can compute a time series with $s$ entries by performing $s + 1$ binary searches. Each determines the breaking points in the cumulative array, and the final values are computed by stepwise differences. This scheme for storing time entries is attractive for several reasons. First, it ensures that we can store time series of any granularity without requiring a nested tree structure like our spatial indexing scheme. Second, the running time is essentially optimal (up to a $\log n$ factor), and the algorithm is extremely fast in practice.

Fig. 9. Two kinds of Customer Tickets: Type 1 (Red) and Type 2 (Blue). The heatmap on the left map corresponds to time bar A, and the one on the right to time bar B: both encode the difference between the number of reports of Type 2 and Type 1 at each point of the map. Reports of Type 1 exceed reports of Type 2, but not everywhere: notice that the region of Denver is still blue. Zooming into Denver we see that the number of Type 1 reports has increased over time, but Type 2 still dominates.

5 IMPLEMENTATION

We use a client-server architecture for the current implementation of nanocubes. The server reads the multidimensional data, builds a nanocube, and then processes queries on the nanocube from client applications. The server is a C++11 template-based implementation, which makes it easy to plug in different data structures for each dimension of the nanocube. For example, for the
Twitter data, we use a 2D quadtree for the spatial dimension (latitude and longitude), flat trees for each categorical dimension (e.g., language, device, application), and our summed-area table variant for the time dimension. The nanocube construction algorithm has not been optimized for speed (results are included in Section 6), but there are several possible improvements that we could make: using multiple threads, or using memory pools to avoid the overhead of repeated memory allocation and deallocation. Due to the scale of the input data, most of our effort has been spent on optimizing memory usage, including optimized libraries for memory allocation (libtcmalloc) and tagged pointers, which allow us to use the 16 most significant bits in a 64-bit pointer to quickly identify different types of nodes in our data structure.

The nanocube server exposes its API for queries via HTTP. More specifically, it provides a web service through which queries can be issued [27]. After the data cube is built, the data structures are no longer mutated, and so the server is easily parallelizable (it also means that nanocubes are add-only: they cannot be updated if a record is removed from the base relation). Our implementation uses the Mongoose library for handling multiple HTTP requests in separate threads concurrently [22]. We have built two front-end visualization clients to query the nanocube server. One client is written in C++ and uses OpenGL for efficient rendering. The other client is browser-based and is written in JavaScript, HTML5, SVG, WebGL, and D3 [4].

6 EXPERIMENTS

To study the behavior of nanocubes, we collected six datasets that ranged in size from four million records up to over one billion records. Each dataset includes geospatial, temporal, and domain-specific categorical dimensions with up to 30 distinct values. For all but the
synthetic dataset experiments, we included the geospatial time-series dimensions, and varied the other dimensions based on the dataset. In the following sections, we provide a brief overview of each of the datasets, followed by an overall summary of our experimental results in Section 6.8. For each of the experiments, we paid particular attention to how much memory was required to build and store the nanocube index, as well as the overall complexity of the dataset itself, which varied greatly from one to the next. Once the nanocubes were constructed, we queried them using one or both of our front-end clients to highlight the ease with which analysts could explore the data.

Fig. 10. Highlights of a visual analysis session of the CDR dataset, with more than one billion records. We noticed the different patterns in call volume by interacting with the dataset and trying different region and category selections. Notice the patterns occur at different spatial and temporal scales.

The query times and bandwidth usage across all experiments are consistent, so we report them in aggregate here. The mean query time was 800 µs (less than 1 millisecond), with a maximum of 12 milliseconds. The output size per query averaged 5KB, with a maximum size of 50KB (geographical tiles dominated bandwidth usage). Our server currently uses no compression, although we plan to support transparent gzip stream encoding. The mean number of queries for the C++ client was 100 requests per second. The HTML5 client is much quieter, at around 1 query per second, since linked views are only updated when a brush is released. The C++ client was designed for LAN, and its bandwidth usage is around 5Mbps, well within current capacities.

6.1 Twitter

Between November 2011 and June 2012, we collected about 210 million tweets that originated in the United States using Twitter's public feed, which provides a representative sampling of all tweets. The rate of tweets obtained averaged about one million per day. The data was streamed in the form of JSON objects, from which we extracted the following attributes: latitude and longitude of the device, the time the tweet occurred, the client application used, the type of device, and the language of the tweet. The categorical dimensions in our data (application, device, language) had
respectively 4, 5, and 15 distinct values. With a nanocube built using this data, we could quickly explore the data to better understand the areas in which one device is more popular than another, where each of the languages is most prevalent, and how that information changes over time (see Figure 7).

6.2 Airline Commercial Flights History

This publicly available dataset contains data for every commercial flight in the United States over a 20 year period (1987-2008) [2, 6]. For over 120 million flights, the records include the scheduled departure and arrival times, the actual departure and arrival times, the origin and destination airports, the airline, and other fields. For this experiment, we built our index using the origin airport (for latitude and longitude), scheduled departure time, the departure delay, and the airline. This allows us to answer queries related to overall departure delay for any airport, airline, time of day, or combination thereof. In Figure 8 we present an overview of the weekly percentage of total commercial flights in the US over a 20 year period for Delta and American Airlines.
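Delay queries of this kind need more than plain counts. Section 4.2 notes that a nanocube can hold any summary that forms an algebra with weighted sums and subtractions, such as moment statistics. The sketch below shows one way such a summary could look; the class name `Moments`, its fields, and the delay values are hypothetical and are not taken from the nanocube implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Weighted count, sum, and sum of squares of one measure."""
    n: float = 0.0
    s1: float = 0.0
    s2: float = 0.0

    @staticmethod
    def of(value, weight=1.0):
        """Summary of a single observation."""
        return Moments(weight, weight * value, weight * value * value)

    def __add__(self, other):
        return Moments(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    def __sub__(self, other):
        return Moments(self.n - other.n, self.s1 - other.s1, self.s2 - other.s2)

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        """Population variance, computed as E[x^2] - E[x]^2."""
        return self.s2 / self.n - self.mean() ** 2


# Hypothetical departure delays, in minutes, for three flights in one bin.
delays = [5.0, 10.0, 15.0]
total = sum((Moments.of(d) for d in delays), Moments())
without_last = total - Moments.of(15.0)
```

Because the summary supports subtraction, it can be stored cumulatively in the time dimension exactly as counts are, and the statistics of any time range follow from the difference of two stored values.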

Fig. 11. Selecting different geographical regions highlights how different populations interacted with the Brightkite social network. While in the US and UK there is no substantial difference between weekday and weekend traffic, in Japan weekday usage is markedly lower.

6.3 Call Detail Records

For each cellular phone call, telecommunication companies collect information about the call, including time, duration, and the sequence of cell towers that carried the call. This information is organized into what are known as Call Detail Records (CDR). A large US service provider (privately) shared with us over one billion CDRs generated from a one month period in July 2010. Due to the sensitivity of CDR data, our data has been completely anonymized. No personally identifiable information was gathered or used in conducting this study. To the extent that any data was used, it was anonymous and aggregated data. The nanocube was built using the geospatial temporal data (of the first cell tower), as well as the duration (transformed to a categorical dimension) of each call (see Figure 10).

6.4 Location-Based Social Network

The next dataset is also publicly available, and consists of location-based check-ins in the Brightkite social network collected by Cho et al. [7]. The dataset comprises all check-ins from the (now defunct) website between April 2008 and October 2010. In addition to latitude, longitude, and the time of each check-in, we redundantly encoded hour of day and day of week as extra categorical dimensions, since we expected there to be interesting periodic day-to-day and weekday vs. weekend patterns (see Figures 11 and 12).

6.5 Customer Tickets

This dataset contains about 8 million records of customer interactions of a large US service provider over a period of 25 years. The dataset contains latitude, longitude, time and report type (one of eight categories). The same measures taken to anonymize CDR data in Section 6.3 were used here. In Figure 9, we highlight the use of nanocubes to detect relative changes in categories in the time series plot, and how choropleth maps restricted to different time regions show the change in geographical distribution of the report types.

6.6 SPLOM

This is a collection of synthetic datasets (each with five dimensions) designed by
Kandel et al. [18] to exercise data cube technology (SPLOM stands for ScatterPLOt Matrix, the visual encoding used to explore the dataset in that work), also used by Liu et al. [21]. To compare resource usage to that of these other proposals, we built nanocubes using five different bin sizes per dimension, from 10 to 50.

6.7 Memory Usage

To understand the memory requirements to build a nanocube, it is important to remember that objects are not inserted directly into the nanocube, but rather through their corresponding keys (see Figure 3). An object's key identifies the most specific bin in the nanocube that contains that object. Thus, depending upon the resolution defined for the dimensions of a nanocube, two different objects may or may not be distinguishable. For example, if the time resolution of a nanocube is one hour, two objects with timestamps at 20h10m and 20h50m will both have keys with the same time label, rounded to 20h. As a result, new occurrences of keys that were already inserted into a nanocube do not require additional storage space.

Fig. 12. By supporting multiscale time series queries, we can explore the Brightkite check-in frequency to investigate global trends as well as short-lived events. The iOS client for Brightkite was released exactly when the upward spike happened. The downward spike was caused by a global outage that lasted a few days.

Figure 14(A) shows the memory usage growth for the SPLOM datasets as we insert from zero to one billion objects into the five nanocubes of increasing bin sizes. In all cases there is an initial rapid growth that quickly flattens out. In the case of SPLOM 50, the index grew from 0 to 300MB with the first 200 million object insertions, but grew less than 100MB larger as a result of the next 800 million object insertions. The explanation for this behavior is that, by a characteristic of the synthetic object generator (samples from a normal distribution for each dimension), a key set of high probability was quickly generated, making it harder and harder for a new object with an unseen key to be generated. Thus, later in the process, most inserted objects will not require more memory since their keys were already inserted into the nanocube. We refer to this phenomenon as key saturation. In Figure 14(B), we present curves for memory usage
and number of keys for the CDR dataset, both relative to the final nanocube numbers. To test for a key saturation effect, we excluded the time dimension present in the original data. Once again, we observe an initial rapid growth in memory usage, explained by the large number of combinations of cell locations and call durations not yet inserted. Once the bulk of the keys corresponding to these combinations are inserted, new keys continue to be inserted at a relatively small but steady rate, reflecting a small but steady growth in the cell tower infrastructure. Similarly defined curves for the Flights dataset are shown in Figure 14(C). The first part of this experiment follows the same trend as before: rapid initial growth, followed by a saturation of keys and a steady but much lower growth reflecting the small rate of new airport locations and carriers. At about 80M inserted flights (circa 1995), we again observe a regime of rapid growth, which corresponds to a burst of new carriers.

6.8 Performance Summary

In Figure 13, we summarize the relevant information for building our nanocubes on the previously described datasets. The number of input objects N, the memory requirements, and the build times are reported in the first three columns, while the exact schema used for each dataset