FOOD FOR THOUGHT
Topical Insights from our Subject Matter Experts
DEGREE OF DIFFERENCE TESTING: AN ALTERNATIVE TO TRADITIONAL APPROACHES
The NFL White Paper Series Volume 14, June 2014
Overview

Difference testing can appear to be the simplest and most straightforward type of sensory testing, but very often this is far from reality. In many situations, companies simply want to know if there is a difference between two samples. This sounds easy enough, and traditional difference testing methods, such as triangle or tetrad tests, are appropriate in many cases. They have been well documented and much debated.

Difficulties start to arise when testing indicates that there is a difference between the samples but there really isn't, or when there is a belief that the test failed to find a difference that actually exists (statistical Type I and Type II errors, respectively). Missing a difference in a test or finding a nonexistent difference by statistical chance is the bane of many researchers. Beyond the statistical concern of "Is this a real difference or a false positive?", questions also come up such as: Is this test or is the panel too sensitive? Not sensitive enough? If there was some inherent variability in the sample, did that cause a difference? Even when all goes well and the existence of a difference is established, the question immediately following is often "How big is the difference?" or "What is the nature of the difference?"

While not the answer to all difference testing challenges, Degree of Difference (DOD) testing can be an extremely useful alternative to traditional approaches. DOD testing can determine if there is a difference, assess how big the difference is, and, when used in conjunction with descriptive analysis, determine the nature of those differences. It also provides a way to accommodate product variability and to compare multiple products at once.
Applications

DOD testing can be used in most situations where difference testing is desired, but it may be particularly applicable in the following situations:
- There is a need to compare multiple test samples to a single reference product
- There is a need to understand the nature of the differences
- There is a desire to avoid false positives
- There is batch-to-batch variability in the reference product
- Samples are non-homogeneous
- Finding the smallest difference possible is not the primary objective

A descriptive panel is needed to conduct DOD studies. These sensory panelists should be trained to use scales to rate both the degree of difference between two samples in a pair and the intensities of sensory attributes in a single sample. Typically, ten trained panelists evaluate each pair of samples twice and the resulting data set is statistically analyzed.

Example I: DOD Testing

Scenario

A production plant would like to reduce the cost of its product. Its team has developed two potential alternative cost reduction processes (Prototypes 1 and 2). It is also known that there is some batch-to-batch variability in the current process that the company considers acceptable. The production plant would like to determine if either of the two cost reduction prototypes is close enough to the current product to warrant the change.
Method

Panelists are served samples in pairs containing a reference product (marked "R") and a test product (labeled with a 3-digit random number). Pairs are presented to each panelist in a randomized order. Within each product pair, each panelist rates how different the test sample is from the reference.

In this example, one batch of the current production is used as the reference. Panelists compare this reference to both cost reduction prototype samples to see if either is close enough to the current production to make a switch. Batch B of the current production is also compared to the reference to see if there is a difference between the two production runs, and the reference is compared to itself, served blind, to get a baseline DOD score (Figure 1).

[Figure 1: Sample pairs as presented. Panelists see "R" vs. a 3-digit code; the sample identity is not seen. Comparisons: Prototype 1 (R vs. Prototype 1); Prototype 2 (R vs. Prototype 2); Batch B (current batch-to-batch variability); blind reference (baseline, R vs. R).]

For most studies, ten panelists evaluate each product pair two times, resulting in 20 DOD ratings per sample. These data are statistically analyzed to show how different each sample is from the reference. A typical output is shown in Figure 2.

Results

As expected, the blind reference sample (the reference batch of current production) shows the lowest degree of difference; it is being compared to itself, and its DOD score provides a baseline against which to compare the rest of the samples. All of the other samples tested show significantly higher DOD scores than the reference vs. itself. Prototype 1 is the most different from the reference sample. Batch B and Prototype 2 show similar DOD scores, demonstrating that Prototype 2 is no more different from the reference than the batch-to-batch variability that already exists in the product. Based on these results, Prototype 2 would be a reasonable substitution for the current product.

[Figure 2: Overall DOD vs. the reference (scale 0.0 to 5.0) for the blind reference, Batch B, Prototype 1, and Prototype 2. Products that share a letter are not significantly different at the 95% confidence level.]
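The white paper does not say which statistical procedure is applied to the 20 DOD ratings per sample. As a minimal sketch only, the comparison of a test pair against the blind baseline could be approximated with a paired t-test. The function names, the choice of a paired t-test, the treatment of the 20 ratings as independent pairs, and the ratings themselves are all assumptions made for illustration, not the authors' method or data.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t for 19 degrees of freedom
# (20 paired ratings minus 1).
T_CRIT_DF19 = 2.093


def paired_t(test_ratings, baseline_ratings):
    """Return (mean difference, t statistic) for paired DOD ratings."""
    if len(test_ratings) != len(baseline_ratings):
        raise ValueError("ratings must be paired")
    diffs = [t - b for t, b in zip(test_ratings, baseline_ratings)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs), mean(diffs) / (sd / sqrt(len(diffs)))


def differs_from_baseline(test_ratings, baseline_ratings, t_crit=T_CRIT_DF19):
    """True when the DOD of the test pair differs significantly from baseline."""
    _, t_stat = paired_t(test_ratings, baseline_ratings)
    return abs(t_stat) > t_crit


# Hypothetical ratings: 10 panelists x 2 replicates = 20 ratings per pair.
blind_reference = [1, 0, 1, 1, 0, 1, 2, 1, 0, 1] * 2   # R vs. R (baseline)
prototype = [3, 2, 4, 3, 3, 4, 4, 3, 2, 3] * 2          # R vs. a prototype

mean_diff, t_stat = paired_t(prototype, blind_reference)
print(f"mean DOD above baseline: {mean_diff:.2f}, t = {t_stat:.2f}")
print("different from baseline:", differs_from_baseline(prototype, blind_reference))
```

A full analysis would more likely use an analysis of variance with a multiple-comparison procedure across all pairs, which is what the letter groupings in Figure 2 suggest; the pairwise sketch above only illustrates the idea of judging each test sample against the blind reference rather than against zero.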
If this study had been run as a triangle test, both of the prototype samples would have been found different from the reference and the process change would not have been implemented.

Example II: DOD Testing with Attribute Intensity Assessment

Scenario

A yogurt company wants to evaluate three alternative suppliers of the strawberry component it adds to its yogurt to see which is closest to its current supplier.
Method

Trained sensory panelists participate in an orientation session to taste the samples, develop a ballot of key sensory attributes, and anchor the scales. An Overall Degree of Difference scale is included on the ballot in addition to the sensory attribute intensity scales (Figure 3).

[Figure 3: Ballot including the Overall Degree of Difference scale and the sensory attribute intensity scales.]

Testing proceeds as described in Example I: samples are served in pairs that include the reference sample (labeled "R") and one test product (labeled with a 3-digit random number). Panelists first rate the Overall Degree of Difference of the test sample vs. the reference, and then rate the intensity of each of the sensory attributes for the test sample labeled with the 3-digit number.

Alternatively, in a traditional triangle test, a sample from each of the three new suppliers would be tested versus the current supplier's sample to determine if each is different from the current product.

Results

Degree of Difference results are shown in Figure 4. Supplier 2 is most similar to the current supplier reference product; it is as close to the reference as the blind reference is to itself. Suppliers 1 and 3 are both different from the current supplier, but Supplier 3 is most different.

[Figure 4: Overall DOD vs. the reference (current supplier), scale 0.0 to 5.0, for the blind reference and Suppliers 1, 2, and 3. Products that share a letter are not significantly different at the 95% confidence level.]

In addition to the Degree of Difference results, descriptive analysis on key sensory attributes provides additional information on the nature of the differences among the products (Figure 5). Supplier 2 is similar to the current supplier in all attributes tested. Suppliers 1 and 3 were both higher than the reference in Total Flavor and Strawberry Flavor, but Supplier 1 was higher in Candy Strawberry and Sweetness, while Supplier 3 was higher in Jammy Strawberry and Sourness.

[Figure 5: Sensory attribute intensities for the current supplier and Suppliers 1, 2, and 3.]

Based on these results, Supplier 2's product could be used as a substitute for the current supplier.
Additionally, sensory feedback can be provided to the other suppliers to aid in the reformulation process. If a series of triangle tests had been conducted, Supplier 2 would have passed and Suppliers 1 and 3 would have failed versus the current supplier. No information on the nature of the differences would have been available.
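For comparison, the pass/fail outcome of a triangle test can be sketched as a one-sided exact binomial test. In a triangle test each panelist receives three samples, two alike and one different, so a panelist who perceives no difference picks the odd sample with probability 1/3. The function names and the 5% significance level below are illustrative assumptions; the white paper does not describe how its triangle tests would be analyzed.

```python
from math import comb


def triangle_p_value(n_panelists, n_correct, p_guess=1 / 3):
    """Probability of n_correct or more correct picks by guessing alone."""
    return sum(
        comb(n_panelists, k) * p_guess**k * (1 - p_guess) ** (n_panelists - k)
        for k in range(n_correct, n_panelists + 1)
    )


def min_correct_for_significance(n_panelists, alpha=0.05):
    """Smallest number of correct picks whose p-value is at or below alpha.

    Returns None when even a perfect score is not significant.
    """
    for k in range(n_panelists + 1):
        if triangle_p_value(n_panelists, k) <= alpha:
            return k
    return None


# Hypothetical panel size, chosen only to exercise the functions.
n = 30
k = min_correct_for_significance(n)
print(f"{n} panelists: at least {k} correct picks needed at the 5% level")
```

The result is a single yes/no answer per test sample, which is why a triangle test cannot report how large a difference is or which attributes drive it.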
Choosing a Difference Test

There are many reasonable approaches to sensory difference testing, and if maximum discrimination sensitivity is not your primary objective, any type of difference test may be an acceptable choice. When trying to find very small differences, or when there is a business need to be quite certain that a difference cannot be found, triangle or tetrad tests are statistically powerful. However, these methods only show if a difference exists between the reference and each of the test samples; they do not provide a measure of the magnitude of the difference or the nature of the difference.

In many cases, DOD testing provides benefits over traditional difference tests. In addition to allowing a small amount of sample variability to be accepted as part of normal variation, DOD testing makes it possible to test multiple variants at once and thus assess whether some samples are further from the reference sample than others. DOD testing can also be combined efficiently with sensory attribute testing to determine the magnitude of specific sensory differences, providing an understanding of the nature of the differences found.

About The NFL

The National Food Laboratory is a food and beverage consulting and testing firm providing creative, practical and science-based solutions for the following areas: Food Safety and Quality; Product and Process Development; and Sensory and Consumer Research. We create value for our clients by enabling them to develop commercially safe, high quality and great tasting foods and beverages. For more information about The National Food Laboratory, please visit us at www.thenfl.com.

For more information please contact: Dawn Chapman at ChapmanD@TheNFL.com, 925.551.4243