Best Practices for Building and Deploying Predictive Models Over Big Data
Module 12: Case Study: Matsu
Robert Grossman, Open Data Group & Univ. of Chicago
Collin Bennett, Open Data Group
October 23, 2012
Zoom Levels
Zoom Level 1: 4 images
Zoom Level 2: 16 images
Zoom Level 3: 64 images
Zoom Level 4: 256 images
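Each zoom level quadruples the image count. This matches every tile being split into a 2 x 2 grid of finer tiles. A minimal sketch of that arithmetic follows; the function names are hypothetical.

```python
def tiles_at_zoom(level: int) -> int:
    """Tiles needed at one zoom level, assuming each level splits every tile into 2 x 2."""
    if level < 1:
        raise ValueError("the slide numbers zoom levels from 1")
    return 4 ** level


def tiles_through_zoom(max_level: int) -> int:
    """Total tiles cached if every level from 1 up to max_level is stored."""
    return sum(tiles_at_zoom(z) for z in range(1, max_level + 1))


for z in range(1, 5):
    print(f"Zoom Level {z}: {tiles_at_zoom(z)} images")
```

Caching levels 1 through 4 therefore stores 4 + 16 + 64 + 256 = 340 images.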
Build Tile Cache - Mapper
Step 1: Input to Mapper. Input key: bounding box (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5). Input value: the image.
Step 2: Processing in Mapper. The mapper resizes and/or cuts up the original image into pieces, one per output bounding box.
Step 3: Mapper Output. Each output key is a bounding box, and each output value is the image piece for that box.
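The following is a minimal sketch of the mapper's cutting step. It assumes the image arrives as a 2-D grid of pixel values with row 0 at the northern edge. `BBox` and `map_image` are hypothetical names. Hadoop plumbing and resizing are left out; only the cut into an n x n grid of bounding boxes is shown.

```python
from typing import Iterator, List, NamedTuple, Tuple


class BBox(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


Image = List[List[int]]  # rows of pixel values; row 0 is the northern edge (assumed)


def map_image(bbox: BBox, image: Image, n: int) -> Iterator[Tuple[BBox, Image]]:
    """Cut one image into an n x n grid and emit (sub-bounding-box, piece) pairs."""
    rows, cols = len(image), len(image[0])
    if rows % n or cols % n:
        raise ValueError("this sketch needs image dimensions divisible by n")
    ph, pw = rows // n, cols // n            # piece height/width in pixels
    dx = (bbox.maxx - bbox.minx) / n         # piece width in degrees
    dy = (bbox.maxy - bbox.miny) / n         # piece height in degrees
    for i in range(n):                       # i counts piece rows from the north
        for j in range(n):                   # j counts piece columns from the west
            key = BBox(bbox.minx + j * dx, bbox.maxy - (i + 1) * dy,
                       bbox.minx + (j + 1) * dx, bbox.maxy - i * dy)
            piece = [row[j * pw:(j + 1) * pw] for row in image[i * ph:(i + 1) * ph]]
            yield key, piece


# The slide's input bounding box, with a toy 4 x 4 image cut into 2 x 2 pieces.
pairs = list(map_image(BBox(-135.0, 45.0, -112.5, 67.5),
                       [[r * 4 + c for c in range(4)] for r in range(4)], 2))
print(pairs[0])
```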
Build Tile Cache - Reducer
Step 1: Input to Reducer. Input key: bounding box (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375). Input values: the image pieces emitted for that bounding box.
Step 2: Reducer Output. The reducer assembles the images based on the bounding box and outputs them to Accumulo, building up WMS layers for the various datasets.
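Hadoop groups mapper output by key, so every piece that shares a bounding box reaches the same reduce call. The sketch below simulates that grouping and overlays the pieces into one tile. Two details are assumptions: `None` marks "no data", and later pieces win wherever they have data. Writing to Accumulo is not shown, and `shuffle` and `reduce_tile` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

BBoxKey = Tuple[float, float, float, float]   # (minx, miny, maxx, maxy)
Tile = List[List[Optional[int]]]               # None marks a pixel with no data


def shuffle(pairs: Iterable[Tuple[BBoxKey, Tile]]) -> Dict[BBoxKey, List[Tile]]:
    """Group mapper output by bounding box, as Hadoop's shuffle would."""
    groups: Dict[BBoxKey, List[Tile]] = defaultdict(list)
    for key, piece in pairs:
        groups[key].append(piece)
    return dict(groups)


def reduce_tile(key: BBoxKey, pieces: List[Tile]) -> Tuple[BBoxKey, Tile]:
    """Overlay all pieces for one bounding box; later pieces win where they have data."""
    height, width = len(pieces[0]), len(pieces[0][0])
    tile: Tile = [[None] * width for _ in range(height)]
    for piece in pieces:
        for r in range(height):
            for c in range(width):
                if piece[r][c] is not None:
                    tile[r][c] = piece[r][c]
    return key, tile


key = (-45.0, -2.8125, -43.59375, -2.109375)
groups = shuffle([(key, [[1, None], [None, None]]), (key, [[None, 2], [3, None]])])
print(reduce_tile(key, groups[key]))
```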
Tiling procedure in detail
Preprocess Satellite Imagery
EO-1 images are provided by NASA as Level 1 images, with each band and the metadata in individual files. For distributed processing in Hadoop, all of an image's bands must be read in the same map instance, so we serialize the band files into a single file.
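Before serialization, each band file and the metadata file have to be grouped by the image they belong to. The sketch works on in-memory file contents. It assumes a hypothetical naming convention, `<scene>_B<band>.TIF` for bands and `<scene>_MTL.txt` for metadata, chosen only for illustration. `bundle_scenes` is likewise a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical naming convention, for illustration only.
BAND_RE = re.compile(r"^(?P<scene>.+)_B(?P<band>\d+)\.TIF$")
META_RE = re.compile(r"^(?P<scene>.+)_MTL\.txt$")


def bundle_scenes(files: Dict[str, bytes]) -> Dict[str, dict]:
    """Group per-band files and the metadata file by scene, one record per image."""
    scenes: Dict[str, dict] = defaultdict(lambda: {"bands": {}, "metadata": None})
    for name, payload in files.items():
        band = BAND_RE.match(name)
        meta = META_RE.match(name)
        if band:
            scenes[band["scene"]]["bands"][int(band["band"])] = payload
        elif meta:
            scenes[meta["scene"]]["metadata"] = payload
        # files matching neither pattern are ignored
    return dict(scenes)


files = {"S1_B001.TIF": b"a", "S1_B002.TIF": b"b", "S1_MTL.txt": b"meta", "S2_B001.TIF": b"c"}
records = bundle_scenes(files)
print(sorted(records))  # one record per scene
```

Each resulting record is then what a single mapper would receive.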
Image Serialization
Project Matsu supports two solutions:
- Regular file, Base64-encoded: the entire file is a single line.
- Hadoop SequenceFile, Base64-encoded: each band is a single line.
Each approach uses Avro (an open-source Apache Software Foundation project) for serialization.
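Here is a sketch of the two line layouts. JSON stands in for Avro so the example needs only the standard library. The tab-separated `band<TAB>payload` format is a hypothetical stand-in for SequenceFile records. What carries over is the shape of each layout: layout 1 puts the whole image on one Base64 line, and layout 2 gives each band its own line. The function names are hypothetical.

```python
import base64
import json
from typing import Dict, List, Tuple


def encode_whole_image(bands: Dict[int, bytes], metadata: bytes) -> str:
    """Layout 1: serialize the whole image (JSON standing in for Avro), then Base64 it as one line."""
    record = {
        "metadata": base64.b64encode(metadata).decode("ascii"),
        "bands": {str(b): base64.b64encode(data).decode("ascii") for b, data in bands.items()},
    }
    return base64.b64encode(json.dumps(record).encode("utf-8")).decode("ascii")


def decode_whole_image(line: str) -> Tuple[Dict[int, bytes], bytes]:
    """Inverse of encode_whole_image: every band is decoded, needed or not."""
    record = json.loads(base64.b64decode(line))
    bands = {int(b): base64.b64decode(s) for b, s in record["bands"].items()}
    return bands, base64.b64decode(record["metadata"])


def encode_per_band(bands: Dict[int, bytes]) -> List[str]:
    """Layout 2: one line per band, 'band<TAB>Base64 payload' (a stand-in for SequenceFile records)."""
    return [f"{b}\t{base64.b64encode(data).decode('ascii')}" for b, data in sorted(bands.items())]


bands = {1: b"\x00\x01", 2: b"\xff"}
print(encode_per_band(bands))
```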
Serialization Approaches
- Regular file: the mapper reads every band and specifies which ones are kept. Less efficient, but more portable, since it does not rely on Hadoop SequenceFile support.
- SequenceFile: the mapper specifies which bands to read. More efficient, since only the bands needed for the analytic are read.
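The cost difference can be sketched with stand-in encodings: JSON in place of Avro, and `band<TAB>payload` lines in place of SequenceFile records. Both stand-ins are hypothetical. The first function must decode the whole record before discarding bands. The second checks each band's key and decodes only the requested payloads. In Hadoop the saving would also include I/O, which this in-memory sketch does not model. The function names are hypothetical.

```python
import base64
import json
from typing import Dict, Iterable, Set


def read_all_then_keep(line: str, keep: Set[int]) -> Dict[int, bytes]:
    """Regular-file style: decode every band in the one-line record, then drop the unneeded ones."""
    record = json.loads(base64.b64decode(line))          # JSON stands in for Avro here
    every_band = {int(b): base64.b64decode(s) for b, s in record["bands"].items()}
    return {b: data for b, data in every_band.items() if b in keep}


def read_selected(lines: Iterable[str], keep: Set[int]) -> Dict[int, bytes]:
    """SequenceFile style: inspect each band's key and decode only the requested payloads."""
    out: Dict[int, bytes] = {}
    for line in lines:
        key, payload = line.split("\t", 1)
        if int(key) in keep:
            out[int(key)] = base64.b64decode(payload)
    return out
```

Both functions return the same bands for the same request. Only the amount of work done to get there differs.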
Map
An image is read by a single mapper. Actual bands are selected and/or virtual bands are created, and the results are sent to the reducers keyed by geographical tile.
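The sketch below shows a map step in that spirit, under stated assumptions. Tiles use the Tdepth-lngindex-latindex naming from the deck. Tiles are laid out as a 2**depth x 2**depth equal-angle grid over the globe; this layout is hypothetical. A virtual band is any formula over the actual bands. The band ratio and the function names are illustrative only, and working pixel by pixel is a simplification.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pixel = Tuple[float, float, Dict]   # (lng, lat, {band: value})


def tile_key(lng: float, lat: float, depth: int) -> str:
    """Tile containing (lng, lat), named Tdepth-lngindex-latindex.
    Assumes a 2**depth x 2**depth equal-angle grid over the globe (hypothetical layout)."""
    n = 2 ** depth
    # int() floors here because the shifted coordinates are non-negative;
    # min() keeps lng = 180 and lat = 90 inside the last column/row.
    lng_index = min(int((lng + 180.0) / 360.0 * n), n - 1)
    lat_index = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return f"T{depth}-{lng_index}-{lat_index}"


def map_pixels(pixels: Iterable[Pixel], depth: int, select: List[int],
               virtual: Dict[str, Callable[[Dict], float]]) -> Iterator[Tuple[str, Pixel]]:
    """Keep the selected actual bands, add virtual bands, and key each pixel by its tile."""
    for lng, lat, bands in pixels:
        out: Dict = {b: bands[b] for b in select}
        for name, formula in virtual.items():
            out[name] = formula(bands)
        yield tile_key(lng, lat, depth), (lng, lat, out)


pixels = [(-135.0, 45.0, {1: 10.0, 2: 4.0, 3: 99.0})]
print(list(map_pixels(pixels, 3, [1, 2], {"ratio": lambda b: b[1] / b[2]})))
```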
Reduce
Reducers produce web tiles for each zoom level and store them in Accumulo:
- Index: graphic tile, timestamp
- Value: image metadata
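Below is a minimal in-memory stand-in for that table; it does not use the Accumulo client API. It mimics one property of Accumulo's key ordering: for otherwise identical keys, entries with newer timestamps sort first. Here the tile id plays the role of the row. The `TileStore` class and its methods are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TileStore:
    """Index: (tile id, timestamp); value: image metadata.
    Keys are kept sorted by tile id ascending, then timestamp descending,
    so the newest version of a tile is the first entry for that tile."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []           # (tile_id, -timestamp), sorted
        self._values: Dict[Tuple[str, int], dict] = {}

    def put(self, tile_id: str, timestamp: int, metadata: dict) -> None:
        key = (tile_id, -timestamp)                       # negate so newer sorts first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = metadata

    def latest(self, tile_id: str) -> Optional[Tuple[int, dict]]:
        """Return (timestamp, metadata) for the newest version of a tile, or None."""
        i = bisect.bisect_left(self._keys, (tile_id, -float("inf")))
        if i < len(self._keys) and self._keys[i][0] == tile_id:
            key = self._keys[i]
            return -key[1], self._values[key]
        return None
```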
Building Zoomed-Out Images
The reduce step overlays tiles and builds zoomed-out images. Four neighboring tiles are combined and shrunk to move up one zoom level. The process continues until one image covers the entire region that the reducer is responsible for (e.g., 1/2^N of the world).
Tiles are named Tdepth-lngindex-latindex. The parent of a tile is at depth - 1, lngindex/2, latindex/2 (integer division).
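The sketch below covers the parent rule and one combine-and-shrink step. Two parts are assumptions: shrinking by averaging each 2 x 2 block of pixels, and placing north at the top of each tile. `parent` and `combine` are hypothetical names.

```python
from typing import List, Tuple

Tile = List[List[float]]


def parent(depth: int, lng_index: int, lat_index: int) -> Tuple[int, int, int]:
    """Parent of tile Tdepth-lngindex-latindex: one level up, both indices halved."""
    return depth - 1, lng_index // 2, lat_index // 2


def combine(nw: Tile, ne: Tile, sw: Tile, se: Tile) -> Tile:
    """Stitch four neighboring size x size tiles into a 2size x 2size mosaic,
    then shrink it back to size x size by averaging each 2 x 2 block of pixels."""
    mosaic = [a + b for a, b in zip(nw, ne)] + [a + b for a, b in zip(sw, se)]
    size = len(nw)
    return [[(mosaic[2 * r][2 * c] + mosaic[2 * r][2 * c + 1]
              + mosaic[2 * r + 1][2 * c] + mosaic[2 * r + 1][2 * c + 1]) / 4
             for c in range(size)]
            for r in range(size)]
```

Repeating `combine` on each group of four siblings, one level at a time, continues until a single tile covers the reducer's whole region.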
Analytic Modules
If an analytic produces a web tile, it can piggyback on the web tiling workflow. The only data it generates are the additional bands to be displayed.
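The following sketches a module interface under a hypothetical assumption: each analytic module is a function that takes a pixel's bands and returns only the bands it adds. The tiling workflow then keeps just those additions. `run_modules` and the two example modules are hypothetical, not Matsu's actual module API.

```python
from typing import Callable, Dict, List

Bands = Dict[str, float]
Module = Callable[[Bands], Bands]   # hypothetical interface: pixel bands in, added bands out


def run_modules(bands: Bands, modules: List[Module]) -> Bands:
    """Run every module on the same pixel and return only the bands they add."""
    extra: Bands = {}
    for module in modules:
        added = module(bands)
        clash = set(added) & (set(bands) | set(extra))
        if clash:
            raise ValueError(f"module would overwrite bands: {sorted(clash)}")
        extra.update(added)
    return extra


def ratio_module(b: Bands) -> Bands:       # hypothetical example module
    return {"ratio": b["B2"] / b["B1"]}


def difference_module(b: Bands) -> Bands:  # hypothetical example module
    return {"difference": b["B2"] - b["B1"]}


print(run_modules({"B1": 2.0, "B2": 6.0}, [ratio_module, difference_module]))
```

Refusing name clashes keeps one module from silently overwriting an original band or another module's output.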
Embedding Multiple Modules
Example: Algebraic combination of spectral bands
Some CO2 activity follows visible cloud formations; some doesn't.
Icelandic volcano in April 2010 (Eyjafjallajökull): the visible frame is full of ash clouds, while the CO2 distribution is non-uniform.
Module code to create an additional band (B183, B184, ... hold the values of spectral bands 183, 184, ...):

    # Least-squares baseline (constant + linear * band number) through bands 183, 184, 188, 189
    sum1 = 4.
    sumx = 183. + 184. + 188. + 189.
    sumxx = 183.**2 + 184.**2 + 188.**2 + 189.**2
    sumy = B183 + B184 + B188 + B189
    sumxy = 183.*B183 + 184.*B184 + 188.*B188 + 189.*B189
    delta = sum1*sumxx - sumx**2
    constant = (sumxx*sumy - sumx*sumxy) / delta
    linear = (sum1*sumxy - sumx*sumy) / delta
    # Mean departure of bands 185 and 186 from that baseline
    subtracted = (B185 - (constant + 185.*linear))/2. + (B186 - (constant + 186.*linear))/2.
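The module code above fits an ordinary least-squares line over four reference bands, then averages how far bands 185 and 186 fall from that line. The self-contained function below restates the same arithmetic with the band lists as parameters, so it can be checked on synthetic data. The name `baseline_residual` and the mapping-based input are hypothetical. Because it uses only arithmetic and `sum`, the function should also work when the band values are NumPy arrays (one value per pixel).

```python
from typing import Mapping, Sequence


def baseline_residual(b: Mapping[int, float],
                      reference: Sequence[int] = (183, 184, 188, 189),
                      target: Sequence[int] = (185, 186)) -> float:
    """Fit value = constant + linear * band_number by least squares over the reference bands,
    then return the mean residual of the target bands (the slide's 'subtracted' band)."""
    xs = [float(n) for n in reference]
    ys = [b[n] for n in reference]
    sum1 = float(len(xs))
    sumx = sum(xs)
    sumxx = sum(x * x for x in xs)
    sumy = sum(ys)
    sumxy = sum(x * y for x, y in zip(xs, ys))
    delta = sum1 * sumxx - sumx ** 2
    constant = (sumxx * sumy - sumx * sumxy) / delta
    linear = (sum1 * sumxy - sumx * sumy) / delta
    return sum(b[n] - (constant + n * linear) for n in target) / len(target)
```

On a spectrum that lies exactly on a straight line, the result is zero. If bands 185 and 186 both sit 10 units below the line, the result is -10.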
Questions? For the most current version of these slides, please see tutorials.opendatagroup.com
About Open Data
Open Data began operations in 2001 and has built predictive models for companies for over ten years. Open Data provides management consulting, outsourced analytic services, & analytic staffing.
For more information: www.opendatagroup.com, info@opendatagroup.com