Does Big Data Offer Better Solutions for Microbial Food Safety and Quality? Martin Wiedmann, Department of Food Science, Cornell University, Ithaca, NY. E-mail: mw16@cornell.edu
Acknowledgments. Helpful discussions (and some slides): Frank Yiannas, Vice President, Food Safety, Walmart; Laura Strawn, Asst. Prof., Virginia Tech; Dan Kephart, R&D Leader, Food Safety Testing, ThermoFisher. Inspirations and ideas from the 2014 IAFP symposium "Big Data: Food Safety's Holy Grail or Pandora's Box" (L. K. Strawn, M. Wiedmann, and F. Yiannas).
Outline: Big data: the big picture; Big data in food safety; Examples of big data applications in food: DNA fingerprinting and whole genome sequencing, geographic information systems (GIS), and temperature monitoring.
Big data: an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.
http://dashburst.com/infographic/big-data-volume-variety-velocity/
Forbes 2011
The Digital Universe in 2020, IDC (2012)
The supply chain is global and highly complex, which increases the need for testing product content, safety, authenticity, and adulteration. 66% of fruits & vegetables; 80% of seafood.
Big Data: A Change in Thinking
Big data & food production: Holy Grail or Pandora's Box?
Microbial foodborne diseases. A 2011 CDC study estimates that each year in the United States there are 47.8 million cases of gastrointestinal foodborne illness, 127,000 serious illnesses resulting in hospitalization, and 3,037 deaths (range: 1,492 to 4,983). WHO: food- and water-borne diarrhoeal illnesses present a growing public health problem, claiming 2.2 million lives annually, 1.9 million of them children.
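The point estimates on this slide can be turned into simple ratios for comparison. The short Python sketch below does that arithmetic using only the numbers stated above. It ignores the uncertainty ranges. The deaths-per-hospitalization value is a ratio of two separate estimates, not the fraction of hospitalized patients who died.

```python
# Point estimates as stated on the slide (CDC 2011, United States).
CASES = 47_800_000
HOSPITALIZATIONS = 127_000
DEATHS = 3_037

# WHO figures as stated on the slide (worldwide, per year).
WHO_DEATHS = 2_200_000
WHO_CHILD_DEATHS = 1_900_000


def percent(part: float, whole: float) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


hosp_per_case = percent(HOSPITALIZATIONS, CASES)
death_per_case = percent(DEATHS, CASES)
death_per_hosp = percent(DEATHS, HOSPITALIZATIONS)  # ratio of two estimates
child_share = percent(WHO_CHILD_DEATHS, WHO_DEATHS)

print(f"Hospitalizations per 100 cases: {hosp_per_case:.2f}")
print(f"Deaths per 100 cases: {death_per_case:.4f}")
print(f"Deaths per 100 hospitalizations: {death_per_hosp:.1f}")
print(f"Share of WHO-estimated deaths in children: {child_share:.0f}%")
```

Run as-is, this prints about 0.27, 0.0064, 2.4, and 86%. Hospitalization and death are rare per case, but the absolute numbers are large because the case count is so high.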
Big Data & Food Safety. As in other areas, the amount of food-safety-related data generated by government, industry, and academia is increasing rapidly. Sources include track-and-trace monitoring; RFID chip technologies; sensor-based technologies (humidity, temperature); more testing (targeted and untargeted); social media; outbreak detection and genome sequencing; and GIS-based data.
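Sensor-based temperature data are one of the sources listed above. The code below is a minimal sketch, assuming a data logger that reports timestamped readings, a hypothetical 5 °C limit, and a hypothetical 30-minute tolerance. It flags periods where a product stayed above the limit for too long. The `Reading` class, the `find_excursions` function, and all values are illustrative and are not taken from any specific monitoring system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical settings for illustration only; real limits depend on the
# product and on the applicable regulations or company specifications.
LIMIT_C = 5.0
MAX_ABOVE = timedelta(minutes=30)


@dataclass
class Reading:
    time: datetime
    temp_c: float


def find_excursions(readings, limit_c=LIMIT_C, max_above=MAX_ABOVE):
    """Return (start, end) timestamps of runs of consecutive readings above
    `limit_c` whose span (first to last reading above the limit) exceeds
    `max_above`. `readings` must be sorted by time."""
    excursions = []
    start = last = None

    def close_run():
        # Record the current run only if it lasted longer than the tolerance.
        if start is not None and last - start > max_above:
            excursions.append((start, last))

    for r in readings:
        if r.temp_c > limit_c:
            if start is None:
                start = r.time
            last = r.time
        else:
            close_run()
            start = last = None
    close_run()  # a run may still be open at the end of the log
    return excursions


# Hypothetical log: one reading every 10 minutes.
t0 = datetime(2024, 1, 1, 8, 0)
temps = [3.8, 4.1, 6.2, 7.0, 7.5, 6.8, 6.1, 4.0]
log = [Reading(t0 + timedelta(minutes=10 * i), t) for i, t in enumerate(temps)]

for start, end in find_excursions(log):
    print(f"Excursion above {LIMIT_C} °C from {start:%H:%M} to {end:%H:%M}")
```

With this log, the sketch reports one excursion from 08:20 to 09:00. Measuring duration between the first and last above-limit readings is a simplification; a production system might interpolate between readings instead.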
Case study: human listeriosis outbreak
Figure: Human listeriosis cases, NYS, 1/97 to 10/98 (number of cases per month)
Ribotyping results - Nov 8, 9 pm
Ribotyping results - Nov 8, 12 pm
Figure: Epidemic curve for 1/97 to 2/99 in NYS (number of cases per month), ribotype 1044A vs. other ribotypes
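Stratifying the epidemic curve by subtype, as this slide does for ribotype 1044A, separates a cluster signal from background cases. The sketch below assumes case records reduced to (onset month, ribotype) pairs. The records are invented for illustration and are not the NYS data, and `epi_curve` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical case records: (onset month "YYYY-MM", ribotype).
# These are made-up values for illustration, not surveillance data.
cases = [
    ("1998-08", "1044A"), ("1998-09", "other"), ("1998-10", "1044A"),
    ("1998-10", "1044A"), ("1998-11", "1044A"), ("1998-11", "other"),
    ("1998-11", "1044A"),
]


def epi_curve(records, subtype):
    """Count cases per month, split into `subtype` vs. all other subtypes.
    Returns a list of (month, n_subtype, n_other) sorted by month."""
    target = Counter(month for month, rt in records if rt == subtype)
    others = Counter(month for month, rt in records if rt != subtype)
    months = sorted(set(target) | set(others))
    return [(m, target[m], others[m]) for m in months]


# Crude text rendering of the stratified curve.
for month, n_target, n_other in epi_curve(cases, "1044A"):
    print(f"{month}  1044A: {'#' * n_target:<5} other: {'#' * n_other}")
```

Summing both columns would give the unstratified curve. Plotting the subtype column on its own is what makes a subtype-specific increase visible against sporadic cases.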
Similarity Search Results
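A similarity search ranks database fingerprints by how closely they match a query pattern. One widely used score for band-based DNA fingerprints, such as PFGE in PulseNet, is the Dice coefficient. The slide does not state which method produced these results. The sketch below assumes fragment sizes as input and applies a simplified tolerance rule: two bands match if their sizes differ by at most 1.5% of the larger size. Band sizes, pattern names, and the tolerance are hypothetical.

```python
def dice_similarity(bands_a, bands_b, tol=0.015):
    """Band-based Dice coefficient: 2 * matches / (len(a) + len(b)).
    Two bands match if their sizes differ by at most `tol` (a fraction)
    of the larger size; each band is matched at most once, greedily,
    walking both sorted lists from smallest to largest."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    total = len(a) + len(b)
    return 2 * matches / total if total else 0.0


# Hypothetical fragment sizes (kb) for a query isolate and two database entries.
query = [20.5, 48.0, 97.0, 145.5, 210.0, 310.0]
database = {
    "pattern_A": [20.6, 48.2, 96.5, 145.0, 211.0, 309.0],
    "pattern_B": [20.5, 55.0, 97.0, 160.0, 210.0, 400.0],
}

# Rank database entries by similarity to the query, best match first.
ranked = sorted(database.items(), key=lambda kv: dice_similarity(query, kv[1]), reverse=True)
for name, bands in ranked:
    print(f"{name}: Dice = {dice_similarity(query, bands):.2f}")
```

Here `pattern_A` matches all six bands (Dice 1.00) and `pattern_B` matches three of six (Dice 0.50). Software such as BioNumerics defines tolerance differently, for example relative to gel position, so treat this only as an illustration of the scoring idea.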
Conclusions: 101 human cases and 21 deaths in 22 US states were linked to infection with the same subtype of Listeria monocytogenes. The outbreak was traced back to a single plant in Michigan. The facility had a HACCP plan and was under USDA inspection.
Figure: food isolate; human case; human case
Public Health Impact of Molecular Epidemiology. Figure: number of cases by day of outbreak for two E. coli O157 outbreaks. 1993 Western States outbreak: 726 cases, 4 deaths; outbreak detected after 39 d. 2002 Colorado outbreak: outbreak detected after 18 d. Annotation: meat recall. "If only 5 cases of E. coli O157:H7 infections were averted by the recall of ground beef in the Colorado outbreak, the PulseNet system would have recovered all costs for start up and operation for 5 years." (Elbasha et al. Emerg. Infect. Dis. 6:293-297, 2000)
Source? Figure: PFGE (XbaI) patterns, with columns FSL No., Serotype, and Source, for isolates FSL S9-121, FSL S9-122, and FSL S9-254 through FSL S9-270; serotype 4,[5],12:i:- and source "poultry" appear for some isolates.
PFGE patterns (XbaI and SpeI): isolates represent Salmonella from pistachios (West Coast), pepper, and sausages (East Coast facility), as well as from human cases (Den Bakker et al. 2011, AEM).
Tip-dated maximum clade credibility tree based on SNP data for 47 Salmonella Montevideo isolates
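A tip-dated Bayesian tree like the one on this slide is beyond a short sketch. A simpler, commonly used first step with whole genome SNP data is a pairwise SNP distance matrix. The sketch below assumes SNP calls already aligned to the same sites, with "N" marking missing calls. The four isolates, their sequences, and the 2-SNP cut-off are hypothetical and are not the 47 Montevideo isolates.

```python
from itertools import combinations

# Hypothetical aligned SNP sites for four isolates (illustrative only).
snp_alignment = {
    "iso1": "ACGTACGTAA",
    "iso2": "ACGTACGTAG",
    "iso3": "ACGAACGTAG",
    "iso4": "TCGAATGCAG",
}


def snp_distance(seq1: str, seq2: str) -> int:
    """Number of aligned sites at which two sequences differ,
    skipping sites where either call is 'N' (missing)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b and "N" not in (a, b))


def distance_matrix(alignment):
    """Return {(id1, id2): distance} for every unordered pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


THRESHOLD = 2  # hypothetical cut-off for "closely related"
for (x, y), dist in distance_matrix(snp_alignment).items():
    note = "closely related" if dist <= THRESHOLD else ""
    print(f"{x} vs {y}: {dist} SNPs {note}")
```

Isolates separated by only a few SNPs are candidates for a common source, even when their PFGE patterns are indistinguishable from unrelated isolates. Any epidemiological cut-off would need to be justified for the organism and the dataset.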
Big data example: GIS. Environmental sampling effort; remotely sensed data; statistical modeling.
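The slide pairs environmental sampling results with remotely sensed covariates through statistical modeling, but it does not name the model. As a stand-in, the sketch below fits a logistic regression by batch gradient descent, relating pathogen presence (1) or absence (0) in a sample to two covariates scaled to 0 to 1. The covariate names ("water proximity", "soil moisture"), the six samples, and the training settings are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.
    X: list of feature lists; y: list of 0/1 labels. Returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of pathogen presence for covariates x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical samples: [water_proximity, soil_moisture], both scaled 0-1.
X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
print("weights:", [round(wj, 2) for wj in w], "bias:", round(b, 2))
print("P(positive) at [0.85, 0.85]:", round(predict_proba(w, b, [0.85, 0.85]), 3))
```

A real analysis would use many more samples, a tested modeling library, and a check of predictive performance on held-out data. The sketch only shows how sampling results and covariates are linked.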
Predictive Risk Maps
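A predictive risk map applies a fitted model to every cell of a spatial grid and displays the result. The sketch below assumes a logistic model with made-up coefficients, two hypothetical covariates per cell, and hypothetical cut-offs of 0.2 and 0.5 for low (L), medium (M), and high (H) risk. None of these values come from the study.

```python
import math

# Hypothetical fitted model: logit(p) = intercept + sum(coef * covariate).
COEFFS = {"intercept": -4.0, "water_proximity": 3.0, "soil_moisture": 2.5}
COVARIATES = ("water_proximity", "soil_moisture")


def risk_probability(cell: dict) -> float:
    """Predicted probability of pathogen presence in one grid cell."""
    z = COEFFS["intercept"] + sum(COEFFS[k] * cell[k] for k in COVARIATES)
    return 1.0 / (1.0 + math.exp(-z))


def risk_class(p: float, low: float = 0.2, high: float = 0.5) -> str:
    """Bin a probability into L / M / H using hypothetical cut-offs."""
    if p >= high:
        return "H"
    if p >= low:
        return "M"
    return "L"


# grid[row][col]: hypothetical covariate values for each map cell.
grid = [
    [{"water_proximity": 0.1, "soil_moisture": 0.2}, {"water_proximity": 0.5, "soil_moisture": 0.6}],
    [{"water_proximity": 0.8, "soil_moisture": 0.8}, {"water_proximity": 0.9, "soil_moisture": 0.9}],
]

risk_map = [[risk_class(risk_probability(cell)) for cell in row] for row in grid]
for row in risk_map:
    print(" ".join(row))
```

This prints "L M" and "H H". In a GIS workflow, the same per-cell calculation would run over raster layers, and the classes would be rendered as colors.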
Tracking Listeria monocytogenes from planting to harvest: pathogen hotspots in a field over time. Yellow-blue shading indicates L. monocytogenes density; hotspots appear in yellow-green. Evidence of spatio-temporal clustering in this field (Ripley's K function).
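Ripley's K function compares how many points lie within distance r of each other against what complete spatial randomness (CSR) would produce. Under CSR, K(r) is about πr², so values well above πr² suggest clustering. The sketch below implements the naive estimator without edge correction. The positive-sample coordinates are simulated around two hypothetical hotspots in an assumed 100 m x 100 m field and are not data from the study.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate without edge correction:
    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with distance <= r}."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    count = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return area * count / (n * (n - 1))


# Hypothetical positive-sample locations simulated around two hotspots.
random.seed(42)
FIELD_AREA = 100.0 * 100.0
hotspots = [(25.0, 25.0), (70.0, 60.0)]
clustered = [
    (random.gauss(cx, 2.0), random.gauss(cy, 2.0))
    for cx, cy in hotspots
    for _ in range(50)
]

for r in (2.0, 5.0, 10.0):
    k = ripley_k(clustered, r, FIELD_AREA)
    # Under complete spatial randomness, K(r) is expected to be about pi * r^2.
    print(f"r = {r:4.1f} m  K(r) = {k:8.1f}  pi*r^2 = {math.pi * r * r:6.1f}")
```

Published analyses usually add an edge correction and compare the observed K(r) against envelopes from simulated random point patterns. Extending the idea to space and time requires a spatio-temporal variant of the statistic.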
Call to action & the future: eliminate handwritten data (digitize); invest in I.T. systems & solutions, including data analysts; demand better predictive analyses; move from retrospective troubleshooting to prospective problem prevention; use structured & unstructured sources of data; ask questions & question assumptions; train data scientists who can address food-related issues; enter public-private partnerships that facilitate the use of big data in food safety as well as in food production, processing, and distribution.