Pennies and Blood

Mike Bomar

In partial fulfillment of the requirements for the Master of Arts in Teaching with a Specialization in the Teaching of Middle Level Mathematics in the Department of Mathematics.

Dr. Gordon Woodward, Advisor

July 2009
Suppose you had three coins, one of which is counterfeit, and the counterfeit coin was a bit heavier than the other two. One way to find the counterfeit coin would be to weigh two of the coins on a balance scale. If the scale tipped either way, then the heavier coin on the scale would be the counterfeit coin. If the scale were perfectly balanced, then the coin we did not weigh would be the counterfeit coin. So if we had three coins, of which one was counterfeit, it would take only one weighing on the balance scale to find the counterfeit coin.

What if we have 9 coins and we know that one coin is counterfeit, and so is a bit heavier than the other coins? To find the counterfeit coin, we could use the same process. In our first scenario, we divided our one group of three coins into three equal groups (of one coin each) and weighed two of those groups. We can do the same with one group of nine coins: divide it into three equal groups of three. We would weigh two of those groups on the scale, and if the scale tipped one way or the other, then we would know that the counterfeit coin was in the heavier group. If the scale did not tip either way, then we would know that the counterfeit coin was in the group that we did not weigh. Through this process, we would find which subgroup of three coins contains the counterfeit coin by using just one weighing.

However, this does not yet find us the counterfeit coin. At this point we have narrowed it down to three possible coins. We can now take the three coins and divide them into three smaller groups of one coin each. We can weigh two of those coins, and if there is a heavier coin on the scale, then that coin is the counterfeit coin. If both coins on the scale are equal in weight, then the counterfeit coin is the coin we did not weigh.
So, if we had 9 coins, we could find the counterfeit coin in exactly two weighings. In summary, if we had three coins, it would take one weighing to find the counterfeit coin, whereas if we had nine coins, it would take two weighings to find the counterfeit coin. In order for this problem to work out perfectly we would need the ability to divide the number of coins that we have into three equal groups. This would allow us to weigh two of the groups and keep one group off to the side. The groups must have the same number of coins in them so that the only determining factor in one side of the scale
being heavier than the other side is that one of the coins is heavier. However, it is not enough to be able to divide the original group of coins into three equal groups; we must also be able to divide any subgroup of more than one coin into three equal groups. This would allow us to repeat the process until we were down to one coin in each group. At that point, we would be able to find the counterfeit coin with just one more weighing.

In order for the original number of coins to be divisible by three and each subsequent subgroup to be divisible by three, the original number of coins must be a power of three. This means we must be able to write the original number of coins in the form $3^m$, where $m$ is any positive integer. When we had three coins, we could write 3 as $3^1$. When we had nine coins, we could write 9 as $3^2$. When we divide our original group of coins into three equal groups, each group contains

$\frac{3^m}{3} = 3^{m-1}$

coins. Since $3^{m-1}$ is a power of 3, we know it is divisible by three (as long as it is greater than 1). So once we find which of the three subgroups of $3^{m-1}$ coins includes the counterfeit coin, that subgroup can be divided into three more equal subgroups, each of size

$\frac{3^{m-1}}{3} = 3^{m-1-1} = 3^{m-2}$.

Again, since this is a power of three, this subgroup can be divided into three more equal subgroups. Also, to get to a subgroup of $3^{m-2}$ coins, it would take two weighings.

So when do we know that we are down to just one more weighing? We are down to just one more weighing when each of the three subgroups consists of just one coin, that is, $3^0$ coins. So in order to get $3^m$ down to $3^0$, we would have to get to $3^{m-m}$. Since reaching $3^{m-2}$ requires 2 weighings, reaching $3^{m-m}$ requires $m$ weighings. So when we start with $3^m$ coins, of which one is counterfeit, it takes $m$ weighings to find the counterfeit coin. This held true with 3 and 9 coins. When we had 3, or $3^1$, coins, it took 1 weighing. When we had 9, or $3^2$, coins, it took 2 weighings.
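The weighing procedure for $3^m$ coins can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_heavy_coin` and the choice of integer weights (10 for a genuine coin, 11 for the counterfeit) are hypothetical, and the balance scale is modeled simply by comparing the total weight of two groups.

```python
def find_heavy_coin(weights):
    """Locate the single heavier coin among 3**m coins.

    Returns (index_of_heavy_coin, number_of_weighings).
    Assumes exactly one coin is heavier and all others weigh the same.
    """
    n = len(weights)
    k = n
    while k > 1 and k % 3 == 0:
        k //= 3
    if k != 1:
        raise ValueError("number of coins must be a power of three")

    candidates = list(range(n))
    weighings = 0
    while len(candidates) > 1:
        third = len(candidates) // 3
        left = candidates[:third]
        right = candidates[third:2 * third]
        aside = candidates[2 * third:]

        # One use of the balance scale: compare two of the three groups.
        weighings += 1
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)

        if left_weight > right_weight:
            candidates = left
        elif right_weight > left_weight:
            candidates = right
        else:
            # Balanced, so the heavy coin is in the group set aside.
            candidates = aside
    return candidates[0], weighings


coins = [10] * 9
coins[5] = 11  # hypothetical counterfeit in position 5
print(find_heavy_coin(coins))  # (5, 2)
```

With 9 coins the sketch uses two weighings, and with 27 coins it would use three, in line with the count of $m$ weighings for $3^m$ coins.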
The coin problem worked in powers of 3 because weight was the determining factor. We could weigh two groups at one time and if those two were balanced, the other group held the defect, while if one side of the scale was heavier, the heavier side held the defect. Any characteristic that can be compared, such as an electrical charge, can be compared in this manner. However, in many situations no such comparison works.
Suppose a medical biologist has developed a blood test for detecting a certain abnormality in infants. There are only two possible cases: the infant's blood is clear or infected. We could pool the blood into smaller subgroups and test each pool of blood. If a pool tests as infected, then we know that at least one of the samples in that pool is infected. If we test a pool and the result is clear, then we know that every sample in that pool is clear. This is different from our coin problem, because we need to test a pool in order to see if it contains the defect; there is no way to compare two pools against each other.

Now suppose that we have 100 samples, and we know that exactly one sample is infected and the rest are clear. We could realistically test all 100 samples to find the infected one. However, that process doesn't seem very cost effective. We would like to find the infected sample with the fewest tests, so we pool the blood. Realistically, we could pool the blood in any way that we wanted. We could make two pools, one of 99 samples and another of 1 sample, and we could get lucky and have the infected sample contained in the pool of 1. However, there is only a 1% chance that the infected sample would be in the pool of 1. In order to make our testing more efficient, we would like each pool to have the same chance of containing the infected sample. This means that we divide the total number of samples into two equal groups of 50. We no longer need to test any of the samples in the pool that tests clear, and we will need to divide the pool that tests infected again in order to keep testing.

Here is a diagram that shows a couple of different possibilities for the total number of tests that it would take to determine which sample out of 100 is the infected sample. Note that some subgroups could not be divided evenly, and this is the cause of the different outcomes.

[Diagram: trees of successive pool tests for 100 samples with one infected sample]
By testing in this manner, we can guarantee that at most seven tests need to be done. It is possible to find the infected sample in fewer tests; in a couple of the examples we found it using only six tests. However, guaranteeing that the infected sample is found in fewer tests would require a different method.

This method works in the following way. First, I divide the total number of samples in half and make two pools of all of the samples. It is important that I know that only one sample is infected, so that if I test the first pool and find that it is infected, then I do not need to test the second pool (and if the first pool is clear, the infected sample must be in the second pool). I then continue to divide the infected pools in half and test one of those pools each time until I find a pool of just one sample that is infected.

When we generalized the counterfeit coin problem, we said that the number of coins had to be of the form $3^m$, for some positive integer $m$, because we were dividing our group of coins into three separate piles for weighing. In determining the infected blood sample using our method, we are dividing the samples into two separate pools, which means that the total number of samples should be of the form $2^m$, where $m$ is a positive integer. Looking at the diagram, when we had 100 samples, it took at most 7 tests to find the infected sample. However, it is not possible to get 100 from $2^m$ if $m$ is a whole number, since $2^6 = 64$ and $2^7 = 128$, and because 100 has a prime factor of 5. So we know $2^6 < 100 < 2^7$. Actually, 100 is about $2^{6.64}$. It appears that using this method we can round $m$ up to the next whole number, and this will tell us the most tests we would need to conduct in order to find the infected blood sample. Also, the number of tests cannot be any larger than $m$, since we would need to add good clear blood samples to get
$2^m$ samples, in which case it would take us a maximum of $m$ tests to find the infected sample.

In general, if we have $s$ samples such that $2^{m-1} < s \le 2^m$, the most tests that we would need to conduct is $m$. Just like in the counterfeit coin problem, when we have $2^m$ samples and we divide them in half, each pool consists of $2^{m-1}$ samples. We then divide the pool of $2^{m-1}$ samples in half, and so each of the new pools consists of $2^{m-2}$ samples. We can only do this $m$ times, because at that point each of the new pools consists of $2^{m-m}$ samples, which is $2^0$, or 1, sample. At that point we are able to tell which sample is infected. Now, since $2^{m-1} < s \le 2^m$, it may take fewer tests, because we won't always be able to divide our infected pools perfectly in half. However, the greatest number of tests that will be required is $m$, because you could add $2^m - s$ good samples and complete the testing in $m$ tests.

The unrealistic part of the example with the 100 samples is that we knew that exactly one sample was infected. In real life it is not likely that only one sample will be infected. This will affect our method, for it is no longer enough to test just one pool for infected samples; we will have to test both pools. If one of the pools tests clear, we know that all of the samples in that pool are clear.

So we will look at the scenario in which we have $n$ samples, with $n = 2^m$, and $x$ infected samples, where $x \le 2^m$. Here is an example where $m = 3$, so $n = 8$, and $x = 1$. I will mark where the infected sample is in the diagram with an I. The shaded regions represent pools that came up as infected after testing.

[Diagram, x = 1 and m = 3; each row halves the pools of the row above]
  Pools of size 8: test 1
  Pools of size 4: tests 2 and 3
  Pools of size 2: tests 4 and 5
  Pools of size 1: tests 6 and 7
This shows that we would need to perform 7 tests if there is only 1 infected sample. Let's look at a few examples of how the positioning of the infected samples could affect the number of tests we would need if $x = 2$ and $m = 3$.

[Diagrams, x = 2 and m = 3; rows are pools of size 8, 4, 2, and 1]
  Diagram 1 (7 tests): test 1; tests 2 and 3; tests 4 and 5; tests 6 and 7
  Diagram 2 (9 tests): test 1; tests 2 and 3; tests 4 and 5; tests 6, 7, 8, and 9
  Diagram 3 (11 tests): test 1; tests 2 and 3; tests 4, 5, 6, and 7; tests 8, 9, 10, and 11

So, with $x = 2$ and $m = 3$, we can see that, depending on the positioning of the infected samples, it took us anywhere from 7 to 11 tests to find the infected samples. Now I will look at what happens with $x = 3$ and $m = 3$.
[Diagrams, x = 3 and m = 3; rows are pools of size 8, 4, 2, and 1]
  Diagram 1 (9 tests): test 1; tests 2 and 3; tests 4 and 5; tests 6, 7, 8, and 9
  Diagram 2 (13 tests): test 1; tests 2 and 3; tests 4, 5, 6, and 7; tests 8, 9, 10, 11, 12, and 13

As we can see, for these examples, with $x = 3$ and $m = 3$, it would take from 9 to 13 tests. Let's try an example with $x = 4$ and $m = 3$.

[Diagrams, x = 4 and m = 3; rows are pools of size 8, 4, 2, and 1]
  Diagram 1 (9 tests): test 1; tests 2 and 3; tests 4 and 5; tests 6, 7, 8, and 9
  Diagram 2 (11 tests): test 1; tests 2 and 3; tests 4, 5, 6, and 7; tests 8, 9, 10, and 11
  Diagram 3 (13 tests): test 1; tests 2 and 3; tests 4, 5, 6, and 7; tests 8, 9, 10, 11, 12, and 13
  Diagram 4 (15 tests): test 1; tests 2 and 3; tests 4, 5, 6, and 7; tests 8, 9, 10, 11, 12, 13, 14, and 15
This shows that with $x = 4$ and $m = 3$, we might have to complete anywhere from 9 to 15 tests. Furthermore, 15 tests is the maximum when $m = 3$ using this method, because in this case we will have tested every possible pool, as shown in the last diagram. So, with $m = 3$, it appears that the fewest tests we could use with this method is 7 and the most is 15. Now, there were only two cases in which it took 7 tests using this method, whereas the remainder took more than 8. With $m = 3$, or $n = 8$, this method might not be the most cost efficient, because we could test each sample individually, which would take only 8 tests, and we would still find all of the infected samples.

Sometimes it might be more cost efficient to simply test all of the samples. It would be nice in that case to know that there is a number of tests that we couldn't possibly surpass, given the values of $x$ and $m$. I think it would be best to start from the bottom of our diagram. If we know that there are $x$ infected samples, we know that the most boxes, or samples, to test in the last row would be $2x$. This is true because the tests before the last row were on pools of size 2, so in order for us to test both of the samples in a pool individually, the pool of those two samples had to have tested infected. Here is a diagram that shows the last few rows of a test.

[Diagram: the last few rows of a test tree]

Now, since with each new set of tests we are dividing the pools by two and we start with $2^m$ samples, we know that we can only divide the pools in half $m$ times, as shown by the counterfeit coin example and our example with 100 samples and one infected sample. We know that at each stage of testing there will be at most $x$ infected pools, since there are only $x$ infected samples. So the next stage will have no more than $2x$ pools to test. Since the most stages we can have is $m$ and we need to test the entire pool to start
the process, we know that we couldn't possibly test more than $1 + 2mx$ times. However, this is an overestimate, since according to this expression each row would take the same number of tests as the last row. So this does give us a number that is at least as large as the number of tests we could possibly do (i.e., it gives us an upper bound).

Furthermore, when $x = 1$, the expression $1 + 2mx$ gives us the exact number of tests needed using this method. Recall that $m$ is the number of rows we would have after the initial row, and each row would have two tests, one on the infected pool and the other on the clear pool. This means that after the initial row there would be $2m$ tests done; when $x = 1$, $2mx$ is just $2m$. Then we add one for the initial test of all of the samples. So $1 + 2mx$ is the exact number of tests required using this method when $x = 1$.

In our last example, $x$ represented the number of infected samples. In real life, we won't know that number. However, we might have a good idea. We might expect 1% of a population to be infected, so if we had $n$ people in a population, we would expect $(0.01)n$ people to be infected. So we could call the expected proportion of infected people $p$. Thus, if $n$ represents the number of people in a population, we expect $np$ to be the number of infected people. This means that if we wanted to know an upper bound for the number of tests performed, we could use the formula $1 + 2mx$ and replace $x$ with $np$. So our new formula would be $1 + 2mnp$. Also, since $n = 2^m$, we know that $m = \log_2(n)$. So an upper bound on the number of tests necessary is given by the expression $1 + 2[\log_2(n)]np$.
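The halving method and the bound $1 + 2mx$ can be checked with a short simulation. The following is a minimal sketch, assuming the samples are given as a list of booleans of length $2^m$ (True meaning infected) and that a pool tests infected exactly when it contains at least one infected sample; the names `count_tests` and `upper_bound` are illustrative and not part of the original paper.

```python
from itertools import product


def count_tests(samples):
    """Count the pool tests used by the halving method.

    samples: list of booleans (True = infected), length a power of two.
    Every pool is tested once; an infected pool of more than one
    sample is split in half and both halves are tested in turn.
    """
    tests = 1  # test this pool
    if any(samples) and len(samples) > 1:
        half = len(samples) // 2
        tests += count_tests(samples[:half]) + count_tests(samples[half:])
    return tests


def upper_bound(m, x):
    """The bound 1 + 2mx for 2**m samples with x infected samples."""
    return 1 + 2 * m * x


# Try every placement of infected samples among n = 8 samples (m = 3).
m = 3
ranges = {}
for placement in product([False, True], repeat=2 ** m):
    x = sum(placement)
    if x == 0:
        continue
    t = count_tests(list(placement))
    low, high = ranges.get(x, (t, t))
    ranges[x] = (min(low, t), max(high, t))

for x in sorted(ranges):
    print(x, ranges[x], upper_bound(m, x))
```

For $m = 3$ the simulation gives exactly 7 tests when $x = 1$, between 7 and 11 tests when $x = 2$, between 9 and 13 when $x = 3$, and between 9 and 15 when $x = 4$, matching the diagrams, and in every case the count stays at or below $1 + 2mx$.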
Suppose we wanted to know which of the following methods of testing blood samples would be the most cost effective: our method or testing each sample individually. We know that the estimated maximum number of tests $t$ needed using our method can be represented by $t = 1 + 2[\log_2(n)]np$, and the number of tests $t$ needed using the method of testing each sample individually would be represented by $t = n$. We can graph these two equations on the same plane and see for which values of $n$ each method would take fewer tests to find the infected samples.

[Figure: Testing Each Sample vs. Our Method (p = 0.01). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples, up to 1E+6.]
[Figure: Testing Each Sample vs. Our Method (p = 0.05). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples.]

[Figure: Testing Each Sample vs. Our Method (p = 0.10). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples.]

[Figure: Testing Each Sample vs. Our Method (p = 0.15). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples.]

[Figure: Testing Each Sample vs. Our Method (p = 0.20). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples.]
[Figure: Testing Each Sample vs. Our Method (p = 0.25). Series: Testing Each Sample, Our Method. Horizontal axis: Number of Samples.]

As we can see from the graphs, as the percentage of infected samples increases, fewer samples are required before testing each sample becomes the more efficient way of testing. When $p = 0.01$, our method was the most efficient method of testing up to about $10^{15}$ samples. When $p = 0.05$, we needed to have around 1000 samples in order for testing each sample to be the most efficient way of testing. When $p = 0.10$, we needed around 30 samples. When $p = 0.15$, we needed around 7 samples. When $p = 0.20$, we needed around 3 samples. When $p = 0.25$, the most efficient way of testing will always be to test each sample, since when $n = 1$ and $n = 2$ the two equations were equal, and for $n > 2$ the most efficient method was to test each sample. So it would be important to know about how many infected samples we have, along with the total number of samples, because we could save ourselves considerable money and time by using the optimal method.
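The crossover points read from the graphs can also be estimated numerically. The sketch below is an illustration only: the function names are hypothetical, and it assumes that once $1 + 2[\log_2(n)]np$ exceeds $n$ for some $n \ge 2$ it stays above $n$ for all larger $n$, so a doubling search followed by bisection finds the first whole number $n$ at which testing each sample individually takes fewer tests than the bound for our method. Very small values of $p$ push the crossover to enormous $n$, so the sketch is meant for values of $p$ like those in the graphs.

```python
import math


def pooled_bound(n, p):
    """Upper bound 1 + 2*log2(n)*n*p on the tests used by the halving method."""
    return 1 + 2 * math.log2(n) * n * p


def first_n_where_individual_wins(p):
    """Smallest whole number n >= 2 with pooled_bound(n, p) > n."""

    def individual_wins(n):
        return pooled_bound(n, p) > n

    if individual_wins(2):
        return 2

    # Double n until individual testing wins, then bisect between
    # the last losing value (lo) and the first winning value (hi).
    lo, hi = 2, 4
    while not individual_wins(hi):
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if individual_wins(mid):
            hi = mid
        else:
            lo = mid
    return hi


for p in (0.01, 0.05, 0.10, 0.15, 0.20, 0.25):
    print(p, first_n_where_individual_wins(p))
```

Ignoring the leading 1, the two expressions are equal when $2p\log_2(n) = 1$, that is, when $n = 2^{1/(2p)}$. For $p = 0.01$ this gives $2^{50}$, which is roughly $1.1 \times 10^{15}$, and for $p = 0.05$ it gives $2^{10} = 1024$, consistent with the values read from the graphs.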
Let's now suppose that we are working in a laboratory that tests many thousands of samples each day. We are now looking for a specific infection in these samples. We expect that about 20% of the samples will carry the infection. We would like to know which method of testing will be more efficient: testing every sample once, or testing the samples in pools where we divide each infected pool by two until we are down to pools of size one. If we look at the chart for $p = 0.20$, it shows that at right around 3 samples it becomes more efficient to test each sample individually. If we wanted to find out exactly at what point it becomes more efficient, we can substitute different values of $n$ into our equation $t = 1 + 2[\log_2(n)]np$; when $t > n$, it becomes more efficient to test each sample individually. Also, since you can't have a partial sample, I will only use whole numbers for $n$. So:

  n    1 + 2[log_2(n)]n(0.20)          t
  1    1 + 2[log_2(1)](1)(0.20)        1.00
  2    1 + 2[log_2(2)](2)(0.20)        1.80
  3    1 + 2[log_2(3)](3)(0.20)        2.90
  4    1 + 2[log_2(4)](4)(0.20)        4.20
  5    1 + 2[log_2(5)](5)(0.20)        5.64

So we can see that if we had a sample size of 1, each method requires only one test. With a sample size of 2 or 3, testing each sample individually is not the most efficient method. However, when $n$ is greater than or equal to 4, the most efficient method of testing is to test each sample individually. Furthermore, as soon as $2[\log_2(n)](0.20) > 1$, our method of testing would require more tests than the method of testing each sample individually. Since in our example we were told that we had thousands of samples each day and that about 20% are infected, we can assume that the most efficient method of testing is to test each sample individually.

Another use of finding the minimum number of weighings for testing purposes would be in the case of the Rh-antigen. If we are trying to determine whether a person is Rh-positive, one way to detect this is through weight. If the Rh-antigen is present in a person's blood, the sample weighs slightly more.
If we had several samples and knew
that exactly one of the samples was Rh-negative and wanted to find that sample, we could use the same method as in the counterfeit coin example. We could consider the total number of samples as $3^m$. Then we would divide the entire pool of $3^m$ samples into three pools of $3^{m-1}$ samples each. We would weigh two of the pools and leave one off to the side. If the scale tips one way or the other, we know that the lighter side of the scale holds the pool that contains the Rh-negative sample. If the balance is even, then the pool that we did not weigh is the pool that contains the Rh-negative sample. We would then divide the pool of size $3^{m-1}$ that we had determined to contain the Rh-negative sample into 3 smaller pools, all of size $3^{m-2}$, and repeat the weighing process. We would then continue dividing the pools that we had determined to contain the Rh-negative sample into 3 equal pools until we were down to pools of size $3^{m-m}$, or $3^0$. At that point we will be able to find the Rh-negative sample with one last weighing.

This works out perfectly if our total number of samples is of the form $3^m$ and $m$ is a whole number. However, what if the number of samples is not of the form $3^m$? In this case, we would make three pools, two of size $3^n$, where $n$ is a whole number and where $2(3^n)$ is greater than the number of remaining samples. We can then weigh the two pools of size $3^n$. If one of the sides of the scale is lighter, then that pool contains the sample that is Rh-negative. Since the size of that pool is a power of 3, we can continue with the process described in the previous paragraph. If the scale is balanced, then the pool that was left aside is the pool with the Rh-negative sample. This pool would be of size $3^x$, where $x$ is a real number. We can then continue with the process described in this paragraph. Eventually, we will find the Rh-negative sample in a pool whose size is a power of 3, or we will be left with just one or two samples.
At that point, if we were left with just two samples, we can do one more weighing. If we were left with just one sample, that is the Rh-negative sample.

However, the way that the Rh-antigen is usually detected is to mix the blood with a chemical. If the blood is Rh-positive, then an observable agglutination occurs. Since the agglutination occurs when the Rh-antigen is present, it would only make sense to test each sample individually. If we consider the entire pool as being of size $2^m$, then divide it into 2 pools of size $2^{m-1}$, and then test each pool, both pools would show the agglutination, because each pool would contain at least one sample of Rh-positive blood. Since there is only one sample in the entire pool that is Rh-negative, every pool we test
would show the agglutination unless every sample in the pool was Rh-negative. The only pool in which the agglutination will not occur is the pool of size one that holds the Rh-negative sample. For that same reason, our process of dividing $3^m$ samples into groups of three would not work either. Since there is only one sample in the pool that is Rh-negative, the most efficient way to find that sample would be to test each sample individually.

In conclusion, we can see how a simple math problem involving counterfeit coins, which some might see as nothing more than a puzzle to bring out at parties, blossomed into a very practical discussion about efficient blood testing strategies. A laboratory might be able to save a considerable amount of money by employing such a strategy, but as we saw, there are many factors to consider.