Towards Better Pipeline Data Governance
J. Tracy Thorleifson
Eagle Information Mapping, Inc.
Outline
- The limitations of data
  "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."
  Donald Rumsfeld
- Lessons from manufacturing process management
  "If you can't describe what you are doing as a process, you don't know what you're doing."
  W. Edwards Deming
- Pipelines and Black Swans
  "It's tough to make predictions, especially about the future."
  Yogi Berra
The $64 Question
"Unfortunately in the San Bruno accident, we found that the company's underlying records were not accurate... My question is that if your many efforts to improve safety are predicated on identifying risk, and if your baseline understanding of your infrastructure is not accurate, how confident are you that your risks are being assessed appropriately?"
Deborah Hersman, NTSB Chairman, at the National Pipeline Safety Forum, April 18, 2011
The $Billion Answer
- Is this a pipe?
- Is this a pipeline?
"The map is not the territory."
Alfred Korzybski, 1931
The Process of Data Abstraction
- Your pipeline database isn't the real pipeline
- The pipeline database is a representation of the pipeline
Common Pitfalls in Digital Pipeline Data Abstraction
- Source documents that summarize information
  - Alignment sheets summarize pipe data
  - Individual joints of pipe are typically not represented
- Insufficient detail in source documents
  - Records for older pipelines may simply not contain information we now require
- Source documents that do not accurately reflect change over time
  - Missing repair records
  - Missing assessment records
- Insufficient documentation of data provenance
  - Lack of metadata regarding the source of the data
The Goal
"As PHMSA and NTSB recommended, operators relying on the review of design, construction, inspection, testing and other related data to calculate MAOP or MOP must assure that the records used are reliable. An operator must diligently search, review and scrutinize documents and records, including but not limited to, all as-built drawings, alignment sheets, and specifications, and all design, construction, inspection, testing, maintenance, manufacturer, and other related records. These records shall be traceable, verifiable, and complete."
PHMSA Advisory Bulletin ADB-11-01
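One way to make "traceable, verifiable, and complete" checkable by a computer is to test each attribute a calculation depends on. The sketch below is illustrative only: the attribute names, the AttributeValue structure, and the reading of "verifiable" as "an independent check has been recorded" are all hypothetical assumptions, not requirements taken from the bulletin or from any data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of attributes an MAOP calculation might depend on.
REQUIRED_ATTRIBUTES = (
    "outside_diameter_in",
    "wall_thickness_in",
    "smys_psi",
    "seam_type",
    "test_pressure_psi",
)


@dataclass
class AttributeValue:
    value: Optional[object]
    source_document: Optional[str] = None  # traceable: which record the value came from
    verified_by: Optional[str] = None      # verifiable (assumed meaning): who confirmed it


def tvc_findings(segment: dict[str, AttributeValue]) -> list[str]:
    """List every way a segment's records fall short of traceable, verifiable, complete."""
    findings = []
    for name in REQUIRED_ATTRIBUTES:
        attr = segment.get(name)
        if attr is None or attr.value is None:
            findings.append(f"{name}: not complete (no value)")
            continue
        if not attr.source_document:
            findings.append(f"{name}: not traceable (no source document)")
        if not attr.verified_by:
            findings.append(f"{name}: not verifiable (no independent check recorded)")
    return findings


# Hypothetical segment with one fully documented value, one unverified, one missing.
example = {
    "outside_diameter_in": AttributeValue(30.0, "AS-BUILT-0001", "jdoe"),
    "wall_thickness_in": AttributeValue(0.375, "AS-BUILT-0001"),
    "smys_psi": AttributeValue(None),
}
for finding in tvc_findings(example):
    print(finding)
```

The check reports gaps rather than guessing at fixes, which keeps the human decision (and its evidence) in the loop.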
Information Manufacture
- The process of converting raw data to refined information is fundamentally a manufacturing process
- Too often, we approach information creation like skilled artisans
  - Information is crafted, not manufactured
  - Process uniformity is lacking
  - Data validation, verification and clean-up are performed as custom, one-off events
  - Reproducibility depends on the skill of the practitioner (i.e., the Subject Matter Expert)
- While results may be acceptable, it's a grossly inefficient way to run a business
Tools for Success Borrowed from Manufacturing Process Management
- Six Sigma
  - Process improvement through defect reduction and process uniformity
- Lean Manufacturing
  - Process improvement through elimination of waste
- Theory of Constraints (TOC)
  - Process improvement through maximization of throughput
- All concentrate on DEFECT PREVENTION
Lessons from Six Sigma
- If there are six standard deviations between the process mean and the nearest specification limit, the process yield is 99.99966%, or 3.4 defects per million opportunities (this figure assumes the conventional 1.5-standard-deviation long-term drift of the process mean)
- Define and document your processes
- Establish process metrics
  - Data cycle time
  - Data defect incidence
- Analyze results; improve the process
- Institute process controls to prevent defects
  - Fail-safe data checks to prevent bad data from entering the system
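Data defect incidence can be expressed in the usual Six Sigma unit, defects per million opportunities (DPMO). The sketch below assumes each checked attribute on each record counts as one "opportunity" (a hypothetical convention) and converts DPMO to a sigma level using the conventional 1.5-sigma shift, which is why 3.4 DPMO (a 4.5-sigma one-sided tail) is called "six sigma".

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level, adding back the conventional 1.5-sigma long-term shift."""
    return NormalDist().inv_cdf(1.0 - dpmo_value / 1_000_000) + shift


# 3.4 DPMO is a 4.5-sigma one-sided tail; adding the 1.5-sigma shift gives "six sigma".
print(round(sigma_level(3.4), 2))

# Hypothetical data batch: 1,000 joint records x 12 checked attributes, 37 bad values.
batch_dpmo = dpmo(37, 1_000, 12)
print(f"{batch_dpmo:.0f} DPMO, about {sigma_level(batch_dpmo):.1f} sigma")
```

Tracking this number per batch, alongside data cycle time, gives the process metrics the slide calls for.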
Lessons from Lean Manufacturing
- Identify and relentlessly eliminate wastes
  - Long data cycle times
  - Bad data
- Incorporate autonomation (smart automation) in your fail-safe checks
  - Computers are lousy at correcting problems, but great at identifying them
- Utilize the power of GIS
  - Incorporate spatial context into your automated fail-safes
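A spatially aware fail-safe can be as simple as refusing any point feature (a valve, a repair, an anomaly) that plots too far from the pipeline centerline. This is a minimal sketch assuming projected coordinates in feet and a hypothetical 25 ft tolerance; in practice a GIS or a geometry library such as Shapely would supply the distance calculation.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_centerline(p: Point, centerline: list[Point]) -> float:
    return min(point_segment_distance(p, centerline[i], centerline[i + 1])
               for i in range(len(centerline) - 1))


def accept_feature(p: Point, centerline: list[Point], tolerance_ft: float = 25.0) -> bool:
    """Fail-safe gate: flag, rather than load, features that plot too far from the pipe."""
    return distance_to_centerline(p, centerline) <= tolerance_ft


centerline = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
print(accept_feature((500.0, 10.0), centerline))   # 10 ft off the pipe
print(accept_feature((500.0, 300.0), centerline))  # 300 ft off the pipe
```

The computer only identifies the suspect record; correcting it is left to a person with the source documents, in line with the slide's point about autonomation.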
Lessons from Theory of Constraints
- Identify process constraints; address them in priority order
- Complicated processes are like rate-limited chemical reactions
  - The overall reaction rate is constrained by the slowest reaction step
  - Speed up the slowest reaction step, and the overall reaction rate increases
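For a serial data process, where every record passes through every step, throughput is set by the slowest step, just as in a rate-limited reaction. The step names and records-per-day rates below are hypothetical; the sketch only shows why effort spent anywhere but the constraint is wasted.

```python
def find_constraint(step_rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest step and the throughput it imposes on the whole process."""
    slowest = min(step_rates, key=step_rates.get)
    return slowest, step_rates[slowest]


# Hypothetical records-per-day capacities for a serial data conversion process.
rates = {"scan": 400, "digitize": 120, "qa_review": 60, "load": 500}
print(find_constraint(rates))                          # ('qa_review', 60)

# Speeding up a non-constraint step leaves overall throughput unchanged...
print(find_constraint({**rates, "scan": 800})[1])      # 60
# ...while relieving the constraint raises it, until another step becomes the limit.
print(find_constraint({**rates, "qa_review": 100}))    # ('qa_review', 100)
print(find_constraint({**rates, "qa_review": 200}))    # ('digitize', 120)
```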
Document Your Data!
- The most accurate information is worthless if you don't know where it comes from
- Make data provenance a priority
  - Treat data like courtroom evidence
  - Document the chain of custody
- Popular pipeline data models like PODS and the APDM facilitate only record-level history tracking
  - This is necessary, but insufficient
  - Data edits should be tracked at the attribute level
  - The outcome of every decision branch in the data manufacturing process should be recorded
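Attribute-level history means logging each decision about each value, with the evidence behind it, rather than versioning whole records. The sketch below is a hypothetical illustration, not the PODS or APDM schema: the class names, fields, and document IDs are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AttributeEdit:
    """One immutable entry in an attribute-level chain of custody."""
    record_id: str
    attribute: str
    old_value: Any
    new_value: Any
    editor: str
    source_document: str  # the evidence that justified the decision
    reason: str           # outcome of the decision branch that produced this entry
    timestamp: datetime


class AuditedRecord:
    """A record whose every attribute decision is logged, not just whole-record versions."""

    def __init__(self, record_id: str, **attributes: Any) -> None:
        self.record_id = record_id
        self._values = dict(attributes)
        self.history: list[AttributeEdit] = []

    def get(self, attribute: str) -> Any:
        return self._values.get(attribute)

    def set(self, attribute: str, value: Any, *, editor: str,
            source_document: str, reason: str) -> None:
        # Confirmations (old == new) are logged too: "value confirmed" is itself
        # a decision outcome worth keeping.
        self.history.append(AttributeEdit(
            self.record_id, attribute, self._values.get(attribute), value,
            editor, source_document, reason, datetime.now(timezone.utc)))
        self._values[attribute] = value

    def lineage(self, attribute: str) -> list[AttributeEdit]:
        """Every logged decision for one attribute, oldest first."""
        return [e for e in self.history if e.attribute == attribute]


joint = AuditedRecord("JOINT-0001", wall_thickness_in=0.375, seam_type="unknown")
joint.set("seam_type", "DSAW", editor="jdoe",
          source_document="MTR-1234", reason="mill test report located")
joint.set("wall_thickness_in", 0.375, editor="asmith",
          source_document="UT-REPORT-77", reason="confirmed by field measurement")
for edit in joint.lineage("seam_type"):
    print(edit.old_value, "->", edit.new_value, "per", edit.source_document)
```

Making the log entries immutable is a deliberate choice: like courtroom evidence, the chain of custody should only ever be appended to.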
The Problem of Induction, Black Swans, and Thermodynamics
- The problem of induction (as explored by Scottish philosopher David Hume)
  - During much of the 17th century, an Englishman could seemingly state with confidence, "All swans we have seen are white; therefore all swans are white."
  - Black swans were discovered in Australia in 1697
- A Black Swan is:
  - Any event, positive or negative, that is highly improbable and results in nonlinear consequences
  - Black Swans do not conform to Gaussian distributions, but rather obey Pareto (power-law) distributions
  - An outlier event; nothing in our past experience convincingly points to its possibility
"It's the Second Law of Thermodynamics: Sooner or later everything turns to $#!+."
Woody Allen
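The difference between Gaussian and power-law behavior shows up in how fast the tail probability falls as the threshold grows. The Pareto parameters below (scale 1, shape alpha = 2) are hypothetical; the point is the shape of the decay, not the specific numbers.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable (erfc avoids 1 - cdf round-off)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def pareto_tail(x: float, x_m: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto distribution with scale x_m and shape alpha."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


for k in (2, 4, 8):
    print(f"k={k}: Gaussian {gaussian_tail(k):.1e}, Pareto {pareto_tail(k):.1e}")
```

Doubling the threshold always cuts the Pareto tail by the same factor (2 to the power minus alpha, here one quarter), whereas the Gaussian tail collapses by many orders of magnitude. A model fitted to Gaussian assumptions therefore treats extreme events as practically impossible when a power law says they are merely rare.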
Black Swans and Narrative Fallacy
- Human beings are incredibly adept at explaining things
  - This leads to an unwarranted confidence in our ability to predict outcomes resulting from complexly interacting phenomena
  - Explanation ≠ Prediction
"Things always become obvious after the fact."
Nassim Nicholas Taleb
Question: How good are our risk models, really?
Black Swans and Diagnostic Testing
- Nuclear Cardiac Stress Testing is used to diagnose Coronary Artery Disease (CAD)
  - Sensitivity = 91%
    - Failure to detect disease = 9%
    - In other words, it's about the same as playing Russian Roulette with a revolver that has ten cartridge chambers
  - Specificity = 72%
    - False positives = 28%
- Utility as a predictor of Acute Coronary Syndrome (ACS)
  - "The current myocardial perfusion imaging toolset has limited sensitivity for screening patients who are at risk for ACS."
Question: Is hydrostatic testing a panacea for incomplete pipeline records?
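Sensitivity and specificity alone do not say how much a single result should be trusted; that also depends on how common the condition is. The sketch below applies Bayes' theorem using the slide's 91% sensitivity and 72% specificity; the prevalence values are hypothetical. The same arithmetic applies to any pass/fail test, though no hydrostatic test figures are assumed here.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a pass/fail test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(condition | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | negative result)
    return ppv, npv


# Sensitivity and specificity from the slide; prevalences are hypothetical.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.91, 0.72, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At a hypothetical 10% prevalence, roughly three of every four positive results are false alarms, even though the test "catches" 91% of true cases.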
Mitigation vs. Common Sense
- State-of-the-art shark bite risk mitigation: the Neptunic shark suit
  - Designed to mitigate the effects of unsolicited social interactions with hungry sharks of the bitey variety
  - Chainmail-style protection provides the diver with full body coverage
- Common sense: Avoid risk
  - DON'T SWIM WITH SHARKS!!!
Conclusion
- Data can never represent the physical world with complete fidelity
  - We don't really know much of what we think we know
- Information creation is a manufacturing process
  "1. We don't know what we don't know.
  2. If we can't express what we do know numerically, we don't really know much about it.
  3. If we don't know much about it, we can't control it.
  4. If we can't control it, we are at the mercy of chance."
  Dr. Mikel J. Harry
- Black Swans are unpredictable and unavoidable
  - The best you can accomplish is Black Swan robustness
  - Hubris is fatal