The Zen of Data Science Eugene Dubossarsky Chief Data Scientist Principal Founder eugene@contexti.com a1@analystfirst.com @cargomoose
Presentation Summary - Promised
- Key concepts, dos and don'ts of Data Science
- Science and engineering: very different!
- What are Data Scientists for?
- Where should Data Science sit in the business?
- How should data science be measured, managed, planned?
- Starting, nourishing and growing a successful Data Science function in your business
- Becoming an effective data scientist: skills and experience
CHANGE OF PLAN!
Presentation Summary But Actually More Like... Shameless self promotion Parables Metaphors Abstract Philosophical Stuff Surprises Challenges and Reframes You saying "This is relevant to my life how?"
Self Promotion. Shameless. Ask me about public and in-house training in: R Data Science Fraud Detection Soft Skills and Communication Skills for Data Analysts Managing Data Analysts
Self Promotion. Shameless. Ask me about: Analyst First (analystfirst.com, #analystfirst) R User Groups (Sydney or Melbourne) Mentoring, Advice, Strategy and Delivery in: Data Science, Analytics, Big Data, Business Intelligence, Predictive Modelling, Machine Learning etc. Working for upside: pay for performance.
Presentation Summary Tools vs Ideas Science vs Technology Finding vs Building Science and Engineering Engagement Exploration: a legitimate, vital and strategic business activity Intelligence: a business function Mastery Apprenticeship
The Zen bit The bare essence The kernel of truth The thing that isn't illusion The way (Tao) to enlightenment (Satori) Clarity and simplicity derived from meditation, possibly quite different to everyday experience
Parable 1: Getting Airports Wrong Everybody thinks that this is an airplane:
Parable 1: Getting Airports Wrong Imagine your job is to build an airport. You need to take the design of airplanes into account. The only problem is:
Parable 1: Getting Airports Wrong This is what is called a fundamental category error. Anything done with this misconception in place will be a waste of time, money and resources. Working around it, and being realistic about the client's expectations, is a bit beside the point.
Parable 1: Getting Airports Wrong Most people probably want to focus on the aerodynamics of the airplane as currently conceived and on the buzz around the technology that supports such airplanes. They may see this as being business-focused, while more fundamental discussions are seen as negative, academic or too challenging.
Parable 1: Getting Airports Wrong Nevertheless, getting the fundamental issue sorted out would seem to be the first order of business, no matter how abstract, controversial, politically inconvenient or offensive to some quarters, or how many people have built careers managing, selling and practicing in this paradigm.
Parable 1: Getting Airports Wrong Because... Uh.. Donkey?
Why The Parable? There are several fundamental category errors affecting the field of data analytics. We will explore a number of them in this presentation.
So What the Heck is a Data Scientist, anyway? Data Scientist = Hadoop Guy? (or so job ads would have you believe) Guy Who Does Stuff with Data? Guy Who Does Stuff with Lots of Data? Guy Who Does Stuff with Big Data? Guy Who Does Stuff With Big Data That Sounds Cool or Businessy? (And what makes Data Big anyway?)
A Key Distinction : Science and Engineering Is there a difference? What is it? Does it matter and why?
Science and Engineering Are in fact direct opposites (complementary, not antagonistic) Skills, work style, personality types, appropriate management frameworks and place in the business are quite different. The confusion needs sorting out.
Science and Engineering (Source: Shane Parrish, farnamstreetblog.com)
Now I've Lost You... That's not realistic - most data scientists are actually engineers by this framework! That sounds too technical, academic or not relevant to business
Now I've Lost You... That's not realistic - most data scientists are actually engineers Yep. That sounds too technical, academic or not relevant to business Maybe, Too Bad and No
Engineering Start with an identified idea, end with a design Build or maintain something to pre-defined parameters Uncertainty is the enemy (time, budget, resources, performance)
Engineering Plans, Timeframes and Specifications, vs ongoing (loosely focused) discussion Delivers Products and pre-determined KPIs. The Unexpected is a (usually unwelcome) exception Works to milestones and a specification Engaged with operational and technical management
Engineers Outcomes are Things (software, products, reports, processes, even businesses) An Engineer may do more or less the same thing many times An Engineer performs projects and manages processes An engineer is managed according to tight requirements
Engineers easier to identify easier to manage easier to understand less stressful to deal with Easier to train more plentiful easier to recruit
Engineers And Data Data is a resource to move and manipulate Focus is on building and maintaining processes that do that Data is a commodity that flows through the system. The focus is on the system.
Science and Scientists Start with reality, derive new insights. Uncertainty IS the job: outputs and their consequences are unknown ahead of time. Projects and processes are anathema, and people who manage them don't help. Explore and interrogate data for insights. No two jobs are the same. No job can be tightly specified. Findings are inherently uncertain.
Scientists and Data Focused on The Data. Tools help but don't feature. Data is complex, an undiscovered country to explore. Data is not a commodity: it is complex, ever-changing and information-rich.
Scientists and Leaders Data is The Last Frontier, where dangers lurk and opportunities abound. The scientist is the guide. Objective is to Tell the Story of the Data, to someone who cares and matters (ideally CEO), preferably as part of an ongoing conversation A buffer between the two does not help
Science and Engineering Scientists help you identify new risks and opportunities, they provide transformational insights. Engineers make transformations tangible Scientists explore Engineers deliver and maintain The personality types are actually quite different
Science and Engineering There is a lot of crossover It is good to be skilled in both Many of the tools used are the same The distinction is not obvious to most outsiders The distinction is crucial
Why the Confusion? It's all technical, apparently. It has the word data in it. Process and predictability are cognitively less onerous than exploration. Also emotionally less onerous. Some vendors like it that way. Much of management likes it that way. Much of management is out of its depth. And almost all of HR and recruiting.
Science and Engineering Real Business Needs Both Pretend Business only needs Engineering (and maybe not even that) Science is crucial for real competition and risk Science is irrelevant otherwise Engineering is Delivery Science is Intelligence
The Intelligence Function Where Data Science Should Sit in the Business? Absent in most enterprises Present informally in most real businesses A strategic, secret asset not to be bragged about or shared Data is not just structured, electronic, concrete or even conscious
The Intelligence Function Strategic, secret role Trusted, discreet, low-key advisor, mentor, guide (Machiavelli had a bit to say on this) A mix of Mr Spock, James Bond and Steve Jobs Many guises, many names Well understood by militaries at war, and organisations with real challenges, risks and uncertainty Often next in line for CEO
The Intelligence Function Where Data Science Should Sit in the Business Not IT Not Operations Right near the CEO Reporting directly, discreetly, interactively Not managed by Prince2, waterfall or any other project management or Business Analysis methods Lean Startup, real Agile (see Manifesto) and OODA loop much more like it
Data Science and Analytics Today Insights or Process? Tools or Outcomes? Transformation or BAU? Value or Compliance? Asset or Vanity? Engaged or Disengaged? Measured?
Insights vs Process Insights CANNOT be the same each time. But Much of Analytics can Deriving value from predictive targeting is a repeatable, mechanical process. Deriving value from insights obtained from that same model is not.
Insights vs Process Only one requires a scientist. Only one is valued by businesses that don't have real competitive, environmental and other change pressures.
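To make the "repeatable, mechanical" half of the contrast concrete, here is a minimal sketch of predictive targeting: rank customers by model score, contact the top fraction, and report lift. The function names, scores and responses are all hypothetical and purely illustrative; nothing here comes from a real campaign.

```python
def target_top_fraction(scores, fraction):
    """Return indices of the highest-scoring customers, up to the given fraction."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def lift(responded, targeted):
    """Response rate among targeted customers divided by the overall response rate."""
    overall = sum(responded) / len(responded)
    hit_rate = sum(responded[i] for i in targeted) / len(targeted)
    return hit_rate / overall


# Hypothetical model scores and observed responses (1 = responded) for ten customers.
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.4, 0.05, 0.6, 0.15]
responded = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

targeted = target_top_fraction(scores, 0.2)
campaign_lift = lift(responded, targeted)
print(targeted, round(campaign_lift, 2))
```

The same few lines run unchanged every campaign, which is the point: this part is process. Working out why the model ranks customers the way it does, and what that says about the business, is the part that cannot be scripted.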
Data Science and Analytics Today Insights or Process? Tools? Transformation or BAU? Value or Compliance? Asset or Vanity? Engaged or Disengaged? Measured?
Tools and Trinkets Is Hadoop really the most important thing on a data scientist's resume? Why or why not? What is missing?
Data Science and Analytics Today Insights or Process? Tools or Science? Transformation or BAU? Value or Compliance? Asset or Vanity? Engaged or Disengaged? Measured?
Data Science and Analytics Today Insights or Process? Tools or Science? Transformation or BAU? Value or Compliance? Vital Asset or Vanity? Engaged or Disengaged? Measured?
Value, Compliance or Vanity? What would happen to the business if the analytics/data science/data mining function disappeared overnight? Who would care? Why? Why does the function exist in the business in the first place? Science does not serve vanity well, and is not necessary for compliance.
Data Science and Analytics Today Insights or Process? Tools or Science? Transformation or BAU? Value or Compliance? Vital Asset or Vanity? Leadership Engaged or Disengaged? Measured?
Engagement in Parables Is investing in data analytics like investing in stocks, or like investing in an education (or gym membership)? If analytics were a taxi, would the CEO see the analytics function as car mechanics, drivers or tour guides? Does he know? Does he care?
Engagement in Extremes Analytics in a hedge fund Analytics in a compliance function in a bank What are the KPIs? Does the CEO personally care about the insights produced? Can the organisation do without the analytics function? Can the organisation afford the CEO ignoring the analytics function?
Measurement How many predictive analytics functions in banking, telco, insurance etc are measured explicitly on improvement in predictive accuracy, with the CEO keeping an eye on this (retention, acquisition, risk, pricing models)? How many know/care about the predictive accuracy of their competitors?
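As one possible reading of "measured explicitly on improvement in predictive accuracy", the sketch below compares a baseline model and a challenger model on the same holdout set using AUC. AUC is an assumed choice of metric (the slides do not name one), and the model names and all figures are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_improvement(labels, baseline_scores, challenger_scores):
    """AUC of the challenger minus AUC of the baseline on the same holdout set."""
    return auc(labels, challenger_scores) - auc(labels, baseline_scores)


# Hypothetical holdout: 1 = customer churned, scored by two versions of a retention model.
churned = [1, 0, 1, 0, 0, 1]
last_quarter_model = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]
this_quarter_model = [0.7, 0.4, 0.6, 0.3, 0.5, 0.9]

improvement = accuracy_improvement(churned, last_quarter_model, this_quarter_model)
print(f"AUC improvement: {improvement:.3f}")
```

A single number like this, tracked quarter on quarter, is the kind of explicit measure the slide asks about. Whether AUC is the right metric for a given retention, acquisition, risk or pricing model is a separate judgment.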
Finding Training and Managing Data Scientists Not Easy
Finding Data Scientists Data Scientists are part engineer, part entrepreneur and part hunter/gatherer: outcome-focused explorers! ADHD is an asset; the personality profile is not typical corporate. Communication skills and lateral thinking are as important as technical skill. Technical skills are DEEEEP, eclectic.
Finding Data Scientists Most recruiters are severely out of their depth. Ditto most HR. The best people are un-/under-/misemployed! It takes one to know one.
Training Data Scientists Eclectic skill set Hard Skills Stats/Machine Learning/Computing/Psychology Domain expertise Many soft skills Conceptual Communication Science! Agile/Lean Startup/Cynefin/OODA
Training Data Scientists Experience is crucial Mistakes are valuable Apprenticeship is Key! Courses help, but not a substitute. Won't teach the soft skills and conceptual outlook
Managing Data Scientists Yes: Real Agile, Lean Startup, Cynefin, OODA loop No: PRINCE2, Project Management, Business Analysis, Operational Management, the IT function. Yes: someone who is engaged, empowered, interested. No: Just about everyone actually doing this out there...
So Who Needs Data Scientists? Businesses facing real competition, real threats, real uncertainty and real change.
Who Doesn't Really Need Data Scientists? Everyone Else.