Finding Value and Being Valuable in the Trough of Disillusionment Jordan McIver uk.linkedin.com/in/jmcdatascience February 2016
Agenda Why you should stay awake What will it take Give me a break
Agenda Why you should stay awake
Gartner Hype Cycle - 2014
Gartner Hype Cycle - 2015
Mo Data Mo Problems Incremental Change Brain.Contents %<% rm BS %<% insert Terminology.Consensus Fish Sticks instead of Fish Poles
Opportunity for Them
Opportunity for You 55 per cent of data scientists have fewer than three years of experience in the discipline. 84% of CIOs believe that their organization can analyze data in real time; only 42% of developers agree with that statement. http://www.sas.com/content/dam/sas/en_gb/image/other1/events/wmagds/datascientist-survey-report-web%20final.pdf https://voltdb.com/sites/default/files/real-time-data-report.pdf
Opportunity for You Wings Shell Gills https://www.dezyre.com/article/type-a-data-scientist-vs-type-b-data-scientist/194
Job Role: Data Scientist The following experience: Working technology business developing new products, at least 4 years experience in developing and delivering analytical solutions Familiar with at least two industries: Banking & Capital Markets Retail & Consumer Goods Utilities Telecom Healthcare High Tech Manufacturing Experience in at least two analytics applications: Machine Data utilization Internet of Things Process analytics Supply Chain Analytics Proficiency in R and Python and at least two years working experience with the following tools: Map Reduce (Java or other language) Mahout Hive or Pig Graph Databases A solid understanding of how software components can be integrated to form a solution architecture, pros and cons of different technologies etc. Key Technical Requirements Strong experience relating to predictive modeling, data mining, data exploration etc. Design and development of modeling data marts (feature engineering) Development of precise requirements SQL Documentation & communication skills The candidate must have an outstanding academic background, least a 2:1 degree and a Master degree of equivalent in either Maths, Statistics, Economics, Finance The successful candidate will have at least 4 years experience in a cutting edge technology business Previous experience in building and explaining sophisticated models to senior management and incorporating feedback in model development
Agenda What will it take
"If you just do analytics, if you just do scripting, if you just make a model and this model does not go into production then at the end of the day you just did research, but your company is not going to profit" http://www.computing.co.uk/ctg/news/2433095/a-lot-of-companies-will-stop-hiring-data-scientists-when-they-realise-that-the-majority-bring-no-value-says-data-scientist
Hypothesis-led experimentation over predetermined solutions. Actionable response to events over data reporting. Building production-ready prototypes over comprehensive IT strategies. Stream processing over relational databases.
What do you get. Quicker, cheaper, better.. Manage uncertainty.. Business change!!!!!!!! ANALOGIES
Agile Data.. not Data Agility Approaches, tools, ethos, science, engineering, analysis
Agile Data Science http://www.datasciencemanifesto.org/ Solving problems, not models or algorithms. All validation of data, hypotheses and performance should be tracked, reviewed and automated. Prior to building a model, construct an evaluation framework with end-to-end business-focused acceptance criteria. A product needs a pool of measures to evaluate its quality. A single number cannot capture the complexity of reality. Even research can be broken down into clearly defined tasks; the smallest of iterations should be preferred in acquiring, integrating and correcting knowledge.
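The "pool of measures" and "acceptance criteria" points can be sketched in a few lines of Python. This is only an illustration, not something taken from the manifesto: the `Criterion` class, the `evaluate` helper, the thresholds and the toy labels are all hypothetical, and real criteria would be agreed with the business before any model is built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted positives (label 1) that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(t == 1 for t in hits) / len(hits) if hits else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives (label 1) that were predicted positive."""
    found = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 1 for p in found) / len(found) if found else 0.0


@dataclass
class Criterion:
    """One acceptance criterion: a named measure and its minimum value."""
    name: str
    metric: Callable[[Sequence[int], Sequence[int]], float]
    minimum: float  # hypothetical threshold, assumed for illustration


def evaluate(criteria: List[Criterion],
             y_true: Sequence[int],
             y_pred: Sequence[int]) -> Dict[str, dict]:
    """Score every criterion and record whether it passed."""
    report = {}
    for c in criteria:
        value = c.metric(y_true, y_pred)
        report[c.name] = {"value": value, "passed": value >= c.minimum}
    return report


# Criteria are written down before any model exists (values are made up).
CRITERIA = [
    Criterion("accuracy", accuracy, 0.70),
    Criterion("precision", precision, 0.60),
    Criterion("recall", recall, 0.80),
]

# Toy labels and predictions, invented for this sketch.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

report = evaluate(CRITERIA, y_true, y_pred)
for name, result in report.items():
    print(name, result)
```

On this toy input accuracy and precision clear their thresholds while recall does not, which is the kind of disagreement a single headline number would hide.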
Agile Data Architecture Russell Jurney
https://www.linkedin.com/pulse/agile-data-scientists-do-scale-sam-savage http://columbia-applied-data-science.github.io/pages/lowclass-python-style-guide.html Agile Data Development Cloud EDA vs./into production Notebooks (IPython) vs. code editors (PyCharm, IntelliJ, Eclipse) R testthat, RUnit, quickcheck, svUnit Continuous Integration - Jenkins / Hudson Sam Savage Refactoring TDD/BDD
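As a rough illustration of the TDD point, here is a minimal test-first sketch. It uses Python's standard `unittest` module rather than the R packages listed on the slide, and the `min_max_scale` feature helper is a hypothetical example, not code from the talk. The tests state the expected behaviour, including the awkward edge cases, and the function is then written to satisfy them.

```python
import unittest
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values to the range [0, 1] (hypothetical feature helper)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zeros(self):
        self.assertEqual(min_max_scale([3, 3]), [0.0, 0.0])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(min_max_scale([]), [])


def run_tests() -> unittest.TestResult:
    """Run the suite without exiting the interpreter."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

A continuous integration server such as Jenkins could run a suite like this on every commit, which is one way to get exploratory notebook code safely into production.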
EXAMPLE!!!!
Proposed Timeline (Using a Test & Learn Process) [Figure: timeline with marks at 0, 2, 4, 6 and 8; labels "Showcase" and "2 Weeks"]
Agenda Give me a break
The first step in extracting features is to look at the data. Initial Insights: Class 1 and Class 2 are the most difficult to separate (confirmed by classification performance results). Elevation could be important to create new features, as structure is evident in some plots. Some values are zero / missing. Many variable combinations do not indicate any interesting structures.
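Two of these checks can be sketched with the Python standard library alone: counting zero or missing values per feature, and comparing a feature's mean across classes as a first hint of separability. The records, feature names and helper functions below are hypothetical toy values, not the data behind the slide.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical toy records, invented for illustration only.
rows: List[Dict[str, Optional[float]]] = [
    {"elevation": 2596, "slope": 3, "label": 1},
    {"elevation": 2590, "slope": 0, "label": 2},
    {"elevation": 2804, "slope": None, "label": 1},
    {"elevation": 2785, "slope": 18, "label": 2},
    {"elevation": 3300, "slope": 9, "label": 3},
    {"elevation": 3350, "slope": 0, "label": 3},
]


def zero_or_missing_counts(rows, features):
    """Count zero and missing (None) entries for each feature."""
    counts = {f: {"zero": 0, "missing": 0} for f in features}
    for row in rows:
        for f in features:
            value = row.get(f)
            if value is None:
                counts[f]["missing"] += 1
            elif value == 0:
                counts[f]["zero"] += 1
    return counts


def class_means(rows, feature, label_key="label"):
    """Mean of one feature per class, skipping missing values."""
    grouped = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is not None:
            grouped[row[label_key]].append(value)
    return {label: mean(values) for label, values in grouped.items()}


quality = zero_or_missing_counts(rows, ["elevation", "slope"])
elevation_by_class = class_means(rows, "elevation")

print(quality)
print(elevation_by_class)
```

In the toy data the elevation means of classes 1 and 2 sit close together while class 3 stands apart, so a per-class summary like this is a quick complement to plotting when looking for classes that will be hard to separate.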
OUT OF DATE..