Expanding the Horizons of Cloud Computing Beyond the Data Center Jon Weissman Department of CS&E University of Minnesota Open Cirrus October 2011
Key Trends Client technology devices: smart phones, tablets, sensors Big data distributed and dynamic Privacy/trust small circles Multiple DCs global services
Minnesota Cloud Research Eye towards cloud evolution Projects Nebula Mobile cloud DMapReduce Proxy cloud Active cloud storage Virtualization
Big Data Trend Big data is distributed earth science: weather data, seismic data life science: GenBank, PubMed health science: GoogleEarth + CDC pandemic data web 2.0: user multimedia blogs everyone is a sensor Cost in moving data to the cloud
Big Data Trend: Nebulas Process data close by fully and/or on-route to the central cloud cost: save time and money privacy (think: patient records, local google doc) Close by network distance trusted peers
Example: Dispersed-Data-Intensive Services Data is geographically distributed Costly, inefficient to move to central location
Example Instance: Blog Analysis blog1 blog2 blog3
Nebula Decentralized, less-managed cloud dispersed storage/compute resources low user cost
Nebula: A New Cloud Model Make the cloud more distributed exploit the rich collection of edge computers volunteers (P2P, @home), commercial (CDNs) enormous computing potential, network diversity lower latency: on demand, native client sandboxing Nebula Central
Example: Blog Analysis blog1 blog2 blog3
Blog Results 140000 120000 100000 Amazon emulator Nebula testbed Time taken (sec) 80000 60000 40000 20000 0 40 80 120 160 240 320 # Blogs
Current Status Prototype running on PlanetLab Chrome browser clients + native client Distributed data-store service Network dashboard service
Nebula Going Forward Organize Nebulas around trusted peers social groups communities of interest local resources Nebula + commercial cloud use the edge opportunistically
Nebula Going Forward Hybrid paradigms early discard fan-in
Mobility Trend: Mobile Cloud
Mobility Trend: Mobile Cloud Mobile users/applications: s-phones, tablets resource limited: power, CPU, memory applications are becoming sophisticated Improve mobile user experience performance, reliability, fidelity tap into the cloud dynamically based on current resource state, preferences, interests, privacy
Cloud Mobile Opportunity Dynamic outsourcing move computation, data to the cloud dynamically User context exploit user behavior to pre-fetch, pre-compute, cache Multi-user sharing discover implicit cloud sharing based on interests, social ties
Outsourcing Partitioning across (Mobile <-> Cloud) local data capture + cloud processing images/video, speech, digital design, aug. reality Server Server Server Server. Server Dalvik cloud end (EC-2) Code repository Proxy Outsourcing Client mobile end (Android). Application Profiler Outsourcing Controller
Experimental Results -Image Response time both WIFI & 3G up to 27 speedup Sharpening Avg. Time Power consumption save up to 9 times 19 Avg. Power
Current Work Optimizations User patterns => speculation, aggregation, data reduction Cloud-side Cross-user patterns => sharing: VM provisioning, data placement, data re-use
Big Data Trend + Multi-DCs: DMapReduce Data Push Map Reduce Input Data Output Data 21
Wide-Area MapReduce Big data is distributed earth science: weather data/seismic data life science: GenBank/PubMed health science: GoogleEarth /pandemic data web 2.0: user multimedia blogs Data in different data-centers Run MapReduce across them Data-flow spanning wide-area networks
Option 1: Local MapReduce MapReduce Job Final Result Data Push (Fast) 23 Data Source (US) Data Center (US) Data Center (EU) Data Push (Slow) Data Source (EU)
Option 2: Global MapReduce MapReduce Job Final Result Data Push (Fast) Data Push (Fast) 24 Data Source (US) Data Center (US) Data Push (Slow) Data Center (EU) Data Push (Slow) Data Source (EU)
Option 3: Distributed MapReduce Combine Results Final Result Data Push (Fast) MapReduce Job MapReduce Job Data Push (Fast) Data Source (US) Data Center (US) Data Center (EU) Data Source (EU) 25
PlanetLab: WordCount (text data) Time in seconds 1800 1600 1400 1200 1000 800 600 400 Local MR Global MR Distributed MR 200 0 Push US Push EU Map Reduce Result Combine Total Distributed MR works best in presence of data aggregation 26
PlanetLab: WordCount (random data) 900 Time in seconds 800 700 600 500 400 300 200 Local MR Global MR Distributed MR 100 0 Push US Push EU Map Reduce Result Combine Total Local MR works best in presence of data ballooning 27
EC-2 Results Time in seconds 1000 900 800 700 600 500 400 300 200 100 0 Local MR WordCount (Text) Distributed MR Push US Push EU Map Reduce Result Combine 300 Total Time in Seconds 900 800 700 600 500 400 300 200 100 0 Sort WordCount (Random) Local MR Distributed MR Push US Push EU Map Reduce Result Combine Total Time in Seconds 250 200 150 100 50 Local MR Distributed MR 28 0 Push US Push EU Map Reduce Result Combine Total
Intelligent Data Placement for Global Services HDFS push local node, same rack, random rack Resource Topology Application Characteristics /DC i /rack A /node X Data placement Scheduling Data expansion factors input->intermediate, α Intermediate->output, β
Experimental Results (Word count) Smart HDFS HDFS
Cloud Evolution Summary mobile users, big data, privacy/trust, global services Our Vision of the Cloud locality of users, data, other clouds/data centers, usercentric behavior