Music Videos and Gastronomification for Big Data Analysis. Thomas Levine & Brian Abelson, csv soundsystem. http://strataconf.com/strata2014/public/schedule/detail/31767 https://github.com/tlevine/gastronomification big data talk
csv soundsystem: Big data consulting firm based in New York. Specialties: keep things simple; making things that people understand. csv stands for comma-separated values.
Big data
Voluminous data
Complex data
Voluminous data Complex data
Voluminous data Big data Complex data
Voluminous spreadsheet
rowid: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
price: 750, 1398, 905, 815, 724, 1180, 50, 400, 3400, 400, 1200, 3995, 550, 150, 700, 650, 3700, 1290, 650, 950, ...
updated: 1391226689, 1391220478, 1391230147, 1391229252, 1391229855, 1391230209, ...
Complex spreadsheet
id  price  updated     long        lat        zip
1   750    1391226689  -97.772741  30.437796  78729
2   1398   1391220478  -122.68216  45.52551   97209
3   905    1391230147  -76.467017  42.47831   14850
4   815    ...         -105.27003  40.003697  80302
5   724    ...         -104.99784  39.745692  80204
...
Big spreadsheet
Aggregation: high-volume spreadsheet to low-volume spreadsheet
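As a rough illustration of aggregation, the sketch below collapses many rows into one summary value per group. The rows, the choice of zip code as the grouping key, and the function name `aggregate_mean` are assumptions made for this example; they are not taken from the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (zip, price) rows; illustrative values, not the talk's dataset.
rows = [
    ("78729", 750),
    ("78729", 1398),
    ("97209", 905),
    ("97209", 815),
    ("14850", 724),
]


def aggregate_mean(rows):
    """Collapse many (key, value) rows into one mean value per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Five input rows become three summary rows, one per zip code.
summary = aggregate_mean(rows)
for zip_code, avg_price in summary.items():
    print(zip_code, avg_price)
```

The result has one row per group, so volume drops while the number of columns stays the same.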
Dimensionality reduction: complex spreadsheet to simple spreadsheet
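One common way to reduce dimensionality is to project each row onto its first principal component. The sketch below does this with power iteration in plain Python. The technique is a stand-in chosen for illustration (the slides do not name a method), the data are made up, and real columns with different units would normally be standardized first.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length numeric lists.
    Uses power iteration on the sample covariance matrix; a teaching
    sketch, not a numerically robust PCA.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left to find a direction in
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical three-column data in which every column is a multiple of the first.
data = [[1, 2, -1], [2, 4, -2], [3, 6, -3], [4, 8, -4]]
direction, scores = first_principal_component(data)
print(direction)  # one weight per original column
print(scores)     # one number per row: the "simple spreadsheet"
```

Each row of three columns is replaced by a single score, so complexity drops while the number of rows stays the same.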
Visualizing big data
Graphs on Hadoop
Talk to these people
What about complex data?
Data Visualization
Data Visualization
Data Dimensionality reduction Data Visualization
Data Visualization
Data Music videos
Data Gastronomification
Tradeoffs: Scales of measurement. Ratio data, interval data, ordinal data, nominal data. Visuals, music, and food can all do each level. Ratio and interval are more difficult with food.
Other benefits of music videos and gastronomification
Application: Complex data for non-technical audiences. Situation: People understand bar graphs, but they are univariate. Graphs can easily represent six variables, but people don't always understand. Problem: Graphs are too abstract. Solution: Music and food.
Engaging young audiences http://www.youtube.com/watch?v=jwuenyv1cb0
Data-driven culture http://fms.csvsoundsystem.com
Data guacamole: New York City math test scores. 32 districts, 6 grades (3rd through 8th), 7 years (2006 to 2012). A bowl for each year. Levels of ingredients based on relative test scores for different schools in different grades.
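The slide does not say how scores were turned into ingredient levels. One plausible reading of "relative test scores" is a min-max scaling within a bowl, sketched below. The ingredient names, the scores, and the function `ingredient_levels` are all hypothetical.

```python
def ingredient_levels(scores, max_amount):
    """Min-max scale scores so the lowest maps to 0 and the highest to max_amount.

    scores: dict mapping an ingredient name to the score it represents.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        # All scores equal: use the midpoint for every ingredient.
        return {name: max_amount / 2 for name in scores}
    return {name: (score - lo) / span * max_amount for name, score in scores.items()}


# Hypothetical scores for one year's bowl; each ingredient stands in for a grade.
scores_for_one_year = {"avocado": 680.0, "lime": 660.0, "onion": 670.0}
levels = ingredient_levels(scores_for_one_year, max_amount=100.0)
for name, grams in levels.items():
    print(name, grams)
```

Scaling within a bowl shows differences between grades in one year; scaling across all bowls at once would instead show change over time.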
Data guacamole
Census spices http://www.backspac.es/r/si56i91do6/census spices on itp
Modeling mobile ad clicks. Decisive: mobile advertisement targeting (http://decisive.is/). Collecting data on 10% of all mobile ad traffic. Bidding algorithm uses predictive modeling with machine learning. Representing data as tacos; tuning their bidding algorithms.
Decisive: jalapeno = wifi (versus 3g/4g); cheese = clicked ad; taco = ad impression; meat = type of phone; onions = shown ad
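The encoding above can be sketched as a function that turns one ad impression into one taco. The record field names (`connection`, `shown`, `clicked`, `phone`) and the pairing of particular meats with particular phones are assumptions for illustration; the slide only gives the ingredient-to-variable mapping.

```python
# Hypothetical pairing of phone type to meat; the slide only says meat = type of phone.
MEAT_BY_PHONE = {"android": "chicken", "iphone": "beef"}


def taco_for(impression):
    """Return the ingredient list for one ad impression (one taco per impression)."""
    ingredients = []
    if impression.get("connection") == "wifi":  # jalapeno = wifi (versus 3g/4g)
        ingredients.append("jalapeno")
    if impression.get("shown"):                 # onions = shown ad
        ingredients.append("onions")
    if impression.get("clicked"):               # cheese = clicked ad
        ingredients.append("cheese")
    meat = MEAT_BY_PHONE.get(impression.get("phone"))
    if meat is not None:                        # meat = type of phone
        ingredients.append(meat)
    return ingredients


# Hypothetical impression record.
example = {"connection": "wifi", "shown": True, "clicked": False, "phone": "iphone"}
print(taco_for(example))
```

Every variable here is nominal, which fits the earlier point that ratio and interval data are harder to represent with food.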
Classifying healthcare eligibility. Booz Allen Hamilton: system to determine healthcare eligibility of people. Dataset: Each record is a person. Most features are dates. Using sheetmusic (our spreadsheet-based small data solution) to detect incorrect classifications.
These are not Booz Allen's data; we can't show you those.
What about real-time big data gastronomification?
Three V's of Big Data
Voluminous data Big data Complex data
Variety Volume
Variety Volume Velocity
HotKarot (www.hotkarot.cz): OpenSauce technology. Connects to various data sources. Real-time.
Our real-time solution: Demo
Our open source libraries
ddr, ddpy, & sheetmusic
Music software Data software
Music software Data software
Merging data software and music software. ddr: music sequencer backed by R vectors and data.frames. ddpy: MIDI generator backed by pandas DataFrames. sheetmusic: web-based music composition backed by Google Spreadsheets.
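To give a sense of what a data-backed MIDI generator has to do, the sketch below linearly rescales a numeric column onto MIDI note numbers. It is not the ddpy or ddr API; the function name `to_midi_notes` and the default note range of 48 to 84 (three octaves containing middle C, note 60) are assumptions for this example.

```python
def to_midi_notes(values, low=48, high=84):
    """Linearly rescale numbers onto integer MIDI note numbers in [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column maps to a single note in the middle of the range.
        return [(low + high) // 2 for _ in values]
    return [low + round((v - lo) / (hi - lo) * (high - low)) for v in values]


# The first five prices from the example spreadsheet earlier in the deck.
prices = [750, 1398, 905, 815, 724]
notes = to_midi_notes(prices)
print(notes)  # the lowest price gets the lowest note, the highest price the highest
```

A linear mapping preserves order and relative spacing up to semitone rounding. A real sequencer would also need durations, velocities, and probably snapping to a musical scale.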
geom_taco
Why we wrote geom_taco. Dependence on human experts limits gastronomification: OpenSauce (http://www.hotkarot.cz), Census Spices. geom_taco uses commodity infrastructure: robust, scalable, inexpensive.
geom_taco: A geom for ggplot. Non-visual aesthetics: Fill, Salsa, Guacamole, ...
End
Jobs: Salespersons, Java Engineers, Test Engineers. jobs@datagastronomification.com
Thomas Levine tlevine@datagastronomification.com; Brian Abelson babelson@datagastronomification.com