Efficient Recovery of Secrets

Marcel Fernandez and Miguel Soriano, IEEE Senior Member
Department of Telematics Engineering, Universitat Politècnica de Catalunya. C/ Jordi Girona 1 i 3, Campus Nord, Mod C3, UPC. 08034 Barcelona, Spain.
email: {marcelf,soriano}@entel.upc.es

Abstract

In the guessing secrets game defined by Chung, Graham and Leighton [3], player $B$ has to unveil a set of $k > 1$ secrets that player $A$ has chosen from a pool of $N$ values. To discover the secrets, player $B$ is allowed to ask a series of boolean questions. For each question asked, $A$ can adversarially choose one of the secrets, but once he has made his choice he must answer truthfully. In this paper we present a solution to the guessing secrets game consisting of an error-correcting code equipped with a tracing algorithm that, using the Viterbi algorithm as its underlying routine, efficiently recovers the secrets.

1. Introduction

In the original "I've got a secret" TV game show [1], a contestant with a secret was questioned by four panelists. The questions were directed towards guessing the secret. Prize money was given to the contestant if the secret could not be guessed by the panel.

In this paper we consider a variant of the game, as defined by Chung, Graham and Leighton [3]. In this variant, called guessing secrets, there are two players, $A$ and $B$. Player $A$ draws a subset of $k$ secrets from a set of $N$ values. Player $B$ asks a series of questions in order to discover the secrets.

Using the same approach as Alon, Guruswami, Kaufman and Sudan in [2], we present a solution to the guessing secrets problem consisting of a (2,2)-separating code. We also design a tracing algorithm that, from the trellis representation of a block code, recovers the secrets using the Viterbi algorithm [6] as its underlying routine. The algorithm discussed is a parallel list decoding Viterbi algorithm [8] that corrects (in terms of list decoding) $\lfloor (d-1)/2 \rfloor + 1$ errors, which is one more error than the error-correcting bound of the code.
This work has been supported in part by the Spanish Research Council (CICYT) Project TIC2002-00818 (DISQET).

The problem of guessing secrets is related to several topics in computer science, such as separating systems [7], efficient delivery of Internet content [3] and the construction of schemes for the copyright protection of digital data [2]. As a matter of fact, our results can be used as a tracing algorithm for the fingerprinting code in [5].

The paper is organized as follows. Section 2 gives a formal description of the game of guessing secrets for the case of $k = 2$ secrets. Section 3 gives an overview of the coding theory concepts used throughout the paper and defines (2,2)-separating codes. A first approach to solving the guessing secrets problem using a (2,2)-separating parity check code is given in Section 4. Section 5 discusses a tracing algorithm that recovers the secrets using the Viterbi algorithm as its underlying routine. Finally, our conclusions are given in Section 6.

2. Guessing two secrets with binary answers

In this section we present a formal description of the game of guessing secrets for the case of $k = 2$ secrets. In the first part of the game, player $A$ draws exactly two secrets $S = \{s_1, s_2\}$ from a set of $N$ values. Then player $B$ asks a series of boolean questions in order to discover the secrets. For each question asked, $A$ can adversarially choose one of the two secrets, but once the choice is made he must answer truthfully.

We first note that there is no way to guarantee that player $B$ learns both secrets: if all replies apply to just one of the two secrets, then $B$ cannot learn anything about the other. Note also that $B$ can never assert that a certain secret is one of $A$'s secrets, since $A$ can always take three secrets $s_1, s_2, s_3$ and answer using a majority strategy. In this case, the answers that $A$ provides will be feasible for the three sets of secrets $\{s_1, s_2\}$, $\{s_1, s_3\}$ and $\{s_2, s_3\}$.
Using the above reasoning, we see that for a given sequence of answers we have the following possible configurations for the sets of secrets: a star configuration, when all pairs of secrets share a common element; a degenerated star configuration, when there is a single pair of secrets; and a triangle configuration, when there are three possible pairs of secrets drawn from a common set of three elements. The solution to the guessing secrets problem then consists in finding the appropriate star or triangle configuration for a given sequence of answers.

2.1. Explicit construction of the strategy

Following the discussion in [2], we denote the questions in a given strategy as a sequence $F$ of $n$ boolean functions $f_i : \{1, \ldots, N\} \to \{0,1\}$. For a given secret $x$, the sequence of answers to the questions $f_i$ will then be $C(x) = (f_1(x), f_2(x), \ldots, f_n(x))$. Without loss of generality we suppose that $\log N$ is an integer. In this case, using the binary representation for $\{1, \ldots, N\}$, we can redefine $F$ as the mapping $C : \{0,1\}^{\log N} \to \{0,1\}^n$. From this point of view $F$ can be seen as an error-correcting code. From now on we will refer to a given strategy $F$ using its associated code $C$, and to the sequence of answers to a given secret using its associated codeword.

The question now is: which properties must an error-correcting code possess in order to solve the guessing secrets problem? Depending on the sequence of answers, player $B$ needs to recover a triangle or a star configuration. In either case, he can use the following strategy. Use the $N$ secrets as vertices to construct a complete graph $K_N$. The pair of secrets $(s_1, s_2)$ can then be seen as an edge of $K_N$. Since we are considering each question as a function $f_i : \{1, \ldots, N\} \to \{0,1\}$, the answer induces a partition $f_i^{-1}(0) \cup f_i^{-1}(1)$ of the vertices. If the answer of player $A$ to question $f_i$ is $a_i \in \{0,1\}$ and the pair of secrets chosen by $A$ is $(s_1, s_2)$, we have that $\{s_1, s_2\} \cap f_i^{-1}(a_i) \neq \emptyset$. Now player $B$ can remove all edges within the subgraph of $K_N$ spanned by $f_i^{-1}(1 - a_i)$. It follows that, from the questions $(f_1, \ldots, f_n)$ that $B$ asks, he must be able to remove all edges until he is left with a subgraph that contains no pair of disjoint edges [2].

We now show how the strategy described in the previous paragraph can be accomplished using a certain code $C$. Let $C(s_1)$, $C(s_2)$, $C(s_3)$ and $C(s_4)$ be the sequences of answers associated with four distinct secrets $s_1$, $s_2$, $s_3$ and $s_4$.
Note that each sequence corresponds to a codeword of $C$. The questions that $B$ asks should have the following property: for every two disjoint pairs of secrets, there is a question $f_i$ that allows $B$ to rule out at least one of the pairs. This implies that there should exist at least one index $i$, $1 \le i \le n$, called the discriminating index, for which $f_i(s_1) = f_i(s_2) \neq f_i(s_3) = f_i(s_4)$. A code with a discriminating index for every two disjoint pairs of codewords is called a (2,2)-separating code [7], and will be defined more precisely in Section 3. Moreover, such a code gives a strategy that solves the guessing secrets game.

3. Background on coding theory

3.1. Binary (2,2)-separating codes

In this section we give a description of binary (2,2)-separating codes. Let $\mathbb{F}_2^n$ be the vector space over $\mathbb{F}_2$; then $C \subseteq \mathbb{F}_2^n$ is called a code. The field $\mathbb{F}_2$ is called the code alphabet. A code $C$ is called a linear code if it forms a subspace of $\mathbb{F}_2^n$. The number of nonzero coordinates in $x$ is called the weight of $x$ and is commonly denoted by $w(x)$. The Hamming distance $d(a, b)$ between two words $a, b \in \mathbb{F}_q^n$ (here $q = 2$) is the number of positions where $a$ and $b$ differ. The minimum distance $d$ of $C$ is defined as the smallest distance between two different codewords. If the dimension of the subspace is $k$ and its minimum Hamming distance is $d$, then we call $C$ an $[n,k,d]$-code.

An $(n-k) \times n$ matrix $H$ is a parity check matrix for the code $C$ if $C$ is the set of codewords $c$ for which $Hc = 0$, where $0$ is the all-zero $(n-k)$-tuple. Each row of the matrix is called a parity check equation. A code whose codewords satisfy all the parity check equations of a parity check matrix is called a parity check code.

For any two words $a$, $b$ in $\mathbb{F}_q^n$ we define the set of descendants $D(a, b)$ as

$$D(a, b) := \{ x \in \mathbb{F}_q^n : x_i \in \{a_i, b_i\},\ 1 \le i \le n \}.$$

For a code $C$, the descendant code $C^*$ is defined as

$$C^* := \bigcup_{a \in C,\, b \in C} D(a, b).$$

If $c$ is a descendant of $a$ and $b$, then we call $a$ and $b$ parents of $c$.

A code $C$ is (2,2)-separating [7] if, for any two disjoint subsets of codewords of size two, $\{a, b\}$ and $\{c, d\}$, where $\{a, b\} \cap \{c, d\} = \emptyset$, their respective sets of descendants are also disjoint, $D(a, b) \cap D(c, d) = \emptyset$.
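The definitions of descendant set and (2,2)-separation can be tested by brute force on small codes. The sketch below is only an illustration: the function names and the two example codes (the $[3,2,2]$ even-weight code, which is linear and equidistant, and the full space $\{0,1\}^2$) are assumptions, not taken from the paper.

```python
from itertools import combinations, product

def descendants(a, b):
    """D(a, b): all words x with x_i in {a_i, b_i} for every position i."""
    return set(product(*[{ai, bi} for ai, bi in zip(a, b)]))

def is_22_separating(code):
    """Check D(a,b) and D(c,d) are disjoint for every two disjoint pairs."""
    for a, b in combinations(code, 2):
        for c, d in combinations(code, 2):
            if {a, b} & {c, d}:
                continue  # the two pairs must not share a codeword
            if descendants(a, b) & descendants(c, d):
                return False
    return True

# Illustrative codes (assumptions): words are tuples of bits.
even_weight = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
full_space = list(product((0, 1), repeat=2))

print(is_22_separating(even_weight))  # linear and equidistant
print(is_22_separating(full_space))   # {00,11} and {01,10} share descendants
```

The exhaustive check is quadratic in the number of pairs and exponential in the number of positions where two parents differ, so it is only meant for toy codes.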
The next corollary, from [4], gives a sufficient condition for a linear code to be (2,2)-separating.

Corollary 1 ([4]) All linear, equidistant codes are (2,2)-separating.

The next proposition, given without proof, shows that the minimum distance of a linear binary equidistant code is an even number.

Proposition 1 Let $C$ be an equidistant binary linear parity check code. The minimum distance $d$ of $C$ is an even number.

3.2. Trellis representation of block codes

The contents of this section are based on [9]. For a binary linear block code, a trellis is defined as a graph in which the nodes represent states and the edges represent transitions between these states. The nodes are

grouped into sets $S_t$, indexed by a time parameter $t$, $0 \le t \le n$. The parameter $t$ indicates the depth of the node. The edges are unidirectional, with the direction of the edge going from the node at depth $t$ to the node at depth $t+1$. Each edge is labeled using an element of $\mathbb{F}_2$.

At any depth $t$, the number of states in the set $S_t$ is at most $2^{(n-k)}$. The states at depth $t$ are denoted by $s_t^i$, for certain values of $i$, $i \in \{0, 1, \ldots, 2^{(n-k)} - 1\}$. The states will be identified by binary $(n-k)$-tuples. In other words, if we order all the binary $(n-k)$-tuples from $0$ to $2^{(n-k)} - 1$, then $s_t^i$ corresponds to the $i$th tuple in the list. Using this order, for each set of nodes $S_t$ we can associate the set $I_t$ that consists of all the integers $i$ such that $s_t^i \in S_t$. The set of edges incident to node $s_t^i$ is denoted by $I(s_t^i)$.

In the trellis representation of a code $C$, each distinct path corresponds to a different codeword, in which the labels of the edges in the path are precisely the codeword symbols. The correspondence between paths and codewords is one to one, and it is readily seen from the construction process of the trellis, which we now present.

The construction algorithm of the trellis of a linear block code uses the fact that every codeword of $C$ must satisfy all the parity check equations imposed by the parity check matrix $H$. In this case, the codewords are precisely the coefficients $c_1, \ldots, c_n$ of the linear combinations of the columns $h_i$ of $H$ that satisfy

$$c_1 h_1 + c_2 h_2 + \cdots + c_n h_n = 0 \qquad (1)$$

where $0$ is the all-zero $(n-k)$-tuple. Intuitively, the algorithm first constructs a graph in which all linear combinations of the columns of $H$ are represented by a distinct path. It then removes all paths corresponding to the linear combinations that do not satisfy (1).

1. Initialization (depth $t = 0$): $S_0 = \{s_0^0\}$, where $s_0^0 = (0, \ldots, 0)$.
2. Iterate for each depth $t = 0, 1, \ldots, (n-1)$.
   (a) Construct $S_{t+1} = \{s_{t+1}^0, \ldots, s_{t+1}^{|I_{t+1}|}\}$, using $s_{t+1}^j = s_t^i + c_l h_{t+1}$, $\forall i \in I_t$ and $l = 0, 1$.
   (b) For every $i \in I_t$: draw a connecting edge between the node $s_t^i$ and the nodes it generates at depth $(t+1)$, according to 2a. Label each edge $\theta_t^{i,j}$ with the value of $c_l \in \mathbb{F}_2$ that generated $s_{t+1}^j$ from $s_t^i$.
3. Remove all nodes that do not have a path to the all-zero state at depth $n$, and also remove all edges incident to these nodes.

According to the convention in 2b, for every edge $\theta_t^{i,j}$ we can define the function $\mathrm{label\_of}(\theta_t^{i,j})$ that, given a codeword $c = (c_1, \ldots, c_n)$, returns the $c_l$ that generated $s_{t+1}^j$ from $s_t^i$.

There are $2^k$ different paths in the trellis starting at depth $0$ and ending at depth $n$, each path corresponding to a codeword. Since the nodes (states) are generated by adding linear combinations of $(n-k)$-tuples of elements of $\mathbb{F}_2$, the number of nodes (states) at each depth is at most $2^{(n-k)}$.

3.3. The Viterbi Algorithm

This section provides a brief overview of the Viterbi algorithm. The Viterbi algorithm (VA) is a recursive optimal solution to the problem of estimating the state sequence of a discrete-time finite-state Markov process observed in memoryless noise [6]. In this scenario, given a sequence of observations, each path of the trellis has an associated length. The VA identifies the state sequence corresponding to the minimum length path from time $0$ to time $n$. The incremental length metric associated with moving from state $s_t^i$ to state $s_{t+1}^j$ is given by $l[\theta_t^{i,j}]$, where $\theta_t^{i,j}$ denotes the edge that goes from $s_t^i$ to $s_{t+1}^j$.

We consider time to be discrete. Using the notation of Section 3.2, the state $s_t$ at time $t$ is one of a finite number $|I_t|$ of states, since $s_t \in S_t$. In the trellises we deal with in this paper, there is only a single initial state $s_0^0$ and a single final state $s_n^0$. Since the process runs from time $0$ to time $n$, the state sequence can be represented by a vector $s = (s_0^0, \ldots, s_n^0)$.

Among all paths starting at node $s_0^0$ and terminating at the node $s_t^j$, we denote by $\psi_t^j$ the path segment with the shortest length. For a given node $s_t^j$, the path $\psi_t^j$ is called the survivor path, and its length is denoted by $L[\psi_t^j]$. Note that

$$L[\psi_t^j] = \min_i \left( L[\psi_{t-1}^i] + l[\theta_{t-1}^{i,j}] \right).$$
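The trellis construction of Section 3.2 and the survivor recursion just stated can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: states are stored as syndrome tuples rather than integer indices, the edge length is taken to be the Hamming distance between the edge label and the observed bit, and the matrix `H` (a parity check matrix of the $[7,3,4]$ simplex code) and the function names are chosen here for the example.

```python
def build_trellis(H):
    """Trellis of the binary code {c : H c = 0}; edges[t] holds
    (state at depth t, edge label, state at depth t+1) triples."""
    n = len(H[0])
    cols = [tuple(row[t] for row in H) for t in range(n)]
    zero = tuple(0 for _ in H)
    layers, edges = [{zero}], []
    for t in range(n):                      # steps 1 and 2: grow all paths
        step = []
        for s in layers[t]:
            for bit in (0, 1):
                nxt = tuple((si + bit * hi) % 2 for si, hi in zip(s, cols[t]))
                step.append((s, bit, nxt))
        edges.append(step)
        layers.append({e[2] for e in step})
    alive = {zero}                          # step 3: prune dead ends
    for t in range(n - 1, -1, -1):
        edges[t] = [e for e in edges[t] if e[2] in alive]
        alive = {e[0] for e in edges[t]}
    return edges

def viterbi(edges, z):
    """Keep one survivor per state: L[new] = min(L[old] + l[edge])."""
    zero = edges[0][0][0]
    surv = {zero: (0, ())}                  # state -> (length, edge labels)
    for t, step in enumerate(edges):
        new = {}
        for s, bit, nxt in step:
            length = surv[s][0] + (bit != z[t])   # Hamming edge length
            if nxt not in new or length < new[nxt][0]:
                new[nxt] = (length, surv[s][1] + (bit,))
        surv = new
    return surv[zero]                       # single survivor at depth n

# Assumed example: a parity check matrix of the [7,3,4] simplex code.
H = [[1, 1, 1, 0, 0, 0, 0],
     [1, 0, 0, 1, 1, 0, 0],
     [0, 1, 0, 1, 0, 1, 0],
     [1, 0, 0, 0, 0, 1, 1]]
trellis = build_trellis(H)
print(viterbi(trellis, (1, 0, 0, 1, 1, 1, 1)))   # one bit flipped in 0001111
```

Ties between equally short extensions are broken in favour of the first one encountered, which is why this unique-decoding version cannot report several codewords at the same distance.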
Due to the structure of the trellis, at any time $t = t_1$ there are at most $|S_{t_1}|$ survivors, one for each $s_{t_1}^i$. The key observation is the following [6]: the shortest complete path $\psi_n^0$ must begin with one of these survivors. If it did not, but passed through state $s_{t_1}^l$ at time $t_1$, then we could replace its initial segment by $\psi_{t_1}^l$ to get a shorter path, which is a contradiction.

With the previous observation in mind, we see that for any time $(t-1)$ we only need to maintain $m$ survivors $\psi_{t-1}^m$ ($1 \le m \le |I_{t-1}|$, one survivor for each node) and their lengths $L[\psi_{t-1}^m]$. In order to move from time $t-1$ to time $t$:

- we extend the time $(t-1)$ survivors one time unit along their edges in the trellis; this is denoted by $\psi_t^j = (\psi_{t-1}^i \,\|\, \theta_{t-1}^{i,j})$,
- we compute the new length $L[\psi_t^j, \theta_{t-1}^{i,j}]$ of the new extended paths, and
- for each node (state) we select as the time $t$ survivor the extended path with the shortest length.

The algorithm proceeds by extending paths and selecting survivors until time $n$ is reached, where there is only one survivor left.

Viterbi Algorithm.

Variables:
  $t$ = time index.
  $\psi_t^j$, $\forall j \in I_t$: survivor terminating at $s_t^j$.
  $L[\psi_t^j]$, $\forall j \in I_t$: survivor length.
  $L[\psi_t^j, \theta_{t-1}^{i,j}]$: length of the path $(\psi_{t-1}^i \,\|\, \theta_{t-1}^{i,j})$.

Initialization:
  $t = 0$; $\psi_0^0 = s_0^0$; $\psi_t^j$ = arbitrary, $t \neq 0$, $\forall j \in I_t$;
  $L[\psi_0^0] = 0$; $L[\psi_t^j] = \infty$, $t \neq 0$, $\forall j \in I_t$.

Recursion: ($1 \le t \le n$)
  for every $s_t^j \in S_t$ do
    for every $s_{t-1}^i$ such that $\theta_{t-1}^{i,j}$ is defined, do
      Compute $L[\psi_t^j, \theta_{t-1}^{i,j}] = L[\psi_{t-1}^i] + l[\theta_{t-1}^{i,j}]$
    Find $L[\psi_t^j] = \min_i L[\psi_t^j, \theta_{t-1}^{i,j}]$
    Store the tuple $(\psi_t^j, L[\psi_t^j])$

Termination:
  At time $t = n$ the shortest complete path is stored as the survivor $\psi_n^0$.

4. A strategy using (2,2)-separating codes

In this section we give an explicit strategy to solve the guessing secrets game. In Section 2.1 it was shown that the problem of guessing secrets reduces to constructing (2,2)-separating codes. This is stated formally in the following lemma.

Lemma 1 ([2]) There exists a (2,2)-separating code $C : \{0,1\}^{\log N} \to \{0,1\}^n$ if and only if there exists a strategy for $B$ to solve the 2-secrets guessing problem for a universe size of $N$ that uses $n$ questions.

From Corollary 1, it follows that to construct a (2,2)-separating code it suffices to construct a linear equidistant code. Nevertheless, we want not only a strategy that solves the problem, but one that is invertible. An invertible strategy allows for an efficient algorithm to recover the secrets. The problem of constructing a code with an efficient decoding algorithm is usually solved by giving some (algebraic) structure to the code. Therefore, we impose that our code, besides being equidistant, also satisfies all the parity check equations of a parity check matrix. As will be shown in Section 5, this allows the secrets to be recovered with a simple algorithm that uses a modified version of the Viterbi algorithm.

5. Efficient recovery of the secrets

We now tackle the problem of how to efficiently recover the secrets when the strategy used is an equidistant parity check $[n,k,d]$ code $C$. To recover the secrets we first need a way to relate the word associated with a sequence of answers given by $A$ to the codewords corresponding to these secrets. This is done in the following lemma.

Lemma 2 Suppose an equidistant parity check $[n,k,d]$ code $C$ is used as the strategy to solve the guessing secrets problem. Let $s_1$ and $s_2$ be a pair of secrets and let $x_1$ and $x_2$ be their associated codewords. The set of possible sequences of answers of $A$ according to the secrets $s_1$ and $s_2$ is precisely $D(x_1, x_2)$, the descendant set of $x_1$ and $x_2$.

Using the previous lemma, if we denote by $z$ the word corresponding to the sequence of answers given by player $A$, then according to Section 2 we have that:

1. In a star configuration, for the common secret, say $u$, we have that $d(u, z) \le (d/2) - 1$.
2. In a degenerated star configuration, for the single pair of secrets, say $\{u, v\}$, we have that $d(u, z) = d(v, z) = d/2$.
3. In a triangle configuration, for the three possible pairs of secrets, say $\{u, v\}$, $\{u, w\}$ and $\{v, w\}$, we have that $d(u, z) = d(v, z) = d(w, z) = d/2$.

Note that, from Proposition 1, it follows that $d$ is an even number. Therefore, we need an algorithm that outputs all codewords of a (2,2)-separating code within distance $d/2$ of $z$. Since the error-correcting bound of the code is $\lfloor (d-1)/2 \rfloor = d/2 - 1$, in both cases, degenerated star and triangle, we need to correct one more error than the error-correcting bound of the code. As is shown below, this can be done by modifying the Viterbi algorithm.

5.1. Recovering secrets with the VA

In [9] it is shown that maximum likelihood decoding of any $[n,k]$ block code can be accomplished by applying the VA to a trellis representing the code. However, the algorithm discussed in [9] falls into the category of unique decoding algorithms, since it outputs a single codeword, and

is therefore not fully adequate for our purposes. In this section we present a modified version of the Viterbi algorithm that, when applied to the guessing secrets problem and given a sequence of answers, outputs a list that contains the codewords corresponding to the appropriate triangle or star configuration. The algorithm we present falls into the category of list Viterbi decoding algorithms [8].

We first give an intuitive description of the algorithm. Recall that, given a sequence of answers $z$, we need to find either the unique codeword at a distance less than or equal to $d/2 - 1$ from $z$, or the one, two or three codewords at distance $d/2$ from $z$. Let $z = (z_1, z_2, \ldots, z_n)$ be a descendant. Let $\theta^c = \{\theta_0^{0,l}, \ldots, \theta_{n-1}^{j,0}\}$ be the sequence of edges in the path associated with codeword $c = (c_1, \ldots, c_n)$. As defined in Section 3.2, we have that $c_t = \mathrm{label\_of}(\theta_{t-1}^{i,j})$. Each distinct path of the trellis corresponds to a distinct codeword, and since we need to search for codewords within a given distance of $z$, it seems natural to define the length of the edge $\theta_{t-1}^{i,j}$ as

$$l[\theta_{t-1}^{i,j}] := d(z_t, c_t) = d(z_t, \mathrm{label\_of}(\theta_{t-1}^{i,j})).$$

Since we expect the algorithm to return all codewords within distance $d/2$ of $z$, we can have more than one survivor for each node. For node $s_t^j$, we denote the $l$th survivor as $\psi_t^{j,l}$. Using the above length definition for $l[\theta_{t-1}^{i,j}]$, we define the length of the path $\psi_t^{j,l}$ associated with codeword $c$ as the Hamming distance between $z$ and $c$, both truncated to the first $t$ symbols,

$$L[\psi_t^{j,l}] := d(z^t, c^t) = \sum_{m=1}^{t} d(z_m, \mathrm{label\_of}(\theta_{m-1}^{i,j})),$$

where $z^t$ and $c^t$ denote the truncated words. Then, whenever $L[\psi_t^{j,l}] > r$, where $r$ is the search radius (here $r = d/2$), we can remove the path $\psi_t^{j,l}$ from consideration. Note that, for a given node, the different survivors do not necessarily need to have the same length. For each node (state) $s_t^j$ in the trellis, we maintain a list $\Psi_t^j$ of tuples $(\psi_t^{j,k}, L[\psi_t^{j,k}])$, $k \in \{1, \ldots, |\Psi_t^j|\}$, where $\psi_t^{j,k}$ is a path passing through $s_t^j$ and $L[\psi_t^{j,k}]$ is its corresponding length.

Tracing Viterbi Algorithm (TVA).

Variables:
  $t$ = time index.
  $\psi_t^{j,m}$, $\forall j \in I_t$: $m$th survivor terminating at $s_t^j$.
  $L[\psi_t^{j,m}]$, $\forall j \in I_t$: $m$th survivor length.
  $L[\psi_t^{j,m}, \theta_{t-1}^{i,j}]$: length of the path $(\psi_{t-1}^{i,k} \,\|\, \theta_{t-1}^{i,j})$.
  $\Psi_t^j$, $\forall j \in I_t$: list of survivors terminating at $s_t^j$.

Initialization:
  $t = 0$; $\psi_0^{0,1} = s_0^0$; $L[\psi_0^{0,1}] = 0$;
  $\Psi_0^0 = \{(\psi_0^{0,1}, L[\psi_0^{0,1}])\}$; $\Psi_t^j = \emptyset$ for $t \neq 0$.

Recursion: ($1 \le t \le n$)
  for every $s_t^j \in S_t$ do
    $m := 0$
    for every $s_{t-1}^i$ such that $\theta_{t-1}^{i,j}$ is defined do
      for every $\psi_{t-1}^{i,k} \in \Psi_{t-1}^i$ do
        Compute $L[\psi_t^{j,m}, \theta_{t-1}^{i,j}] = L[\psi_{t-1}^{i,k}] + l[\theta_{t-1}^{i,j}]$
        if $L[\psi_t^{j,m}, \theta_{t-1}^{i,j}] \le r$ then
          add $(\psi_t^{j,m}, L[\psi_t^{j,m}])$ to $\Psi_t^j$
          $m := m + 1$
        end if

Termination:
  The codewords associated with each path $\psi_n^{0,m} \in \Psi_n^0$ are all within distance $r$ of $z$.

6. Conclusions

This paper discusses an explicit set of questions that solves the guessing secrets game, together with an efficient algorithm to recover the secrets. The explicit set of questions is based on a (2,2)-separating code that is also a parity check code. The recovery of the secrets consists in decoding a block code beyond its error correction bound, using a modification of the Viterbi algorithm that, in a single pass through the trellis representing the block code, returns all the codewords of the (2,2)-separating code within distance $d/2$ of a given descendant.

References

[1] I've got a secret. A classic TV gameshow. http://www.timvp.com/ivegotse.html.
[2] N. Alon, V. Guruswami, T. Kaufman, and M. Sudan. Guessing secrets efficiently via list-decoding. In Proc. of the 13th Annual ACM-SIAM SODA, pages 254–262, 2002.
[3] F. Chung, R. Graham, and T. Leighton. Guessing secrets. The Electronic Journal of Combinatorics, 8:R13, 2001.
[4] G. Cohen, S. Encheva, and H. G. Schaathun. On separating codes. Technical report, ENST, Paris, 2001.
[5] J. Domingo-Ferrer and J. Herrera-Joancomartí. Simple collusion-secure fingerprinting schemes for images. In ITCC'00, pages 128–132. IEEE Computer Society, 2000.
[6] G. D. Forney. The Viterbi algorithm. Proc. IEEE, 61:268–278, 1973.
[7] Y. L. Sagalovich. Separating systems. Probl. Inform. Trans., 30(2):14–35, 1994.
[8] N. Seshadri and C.-E. W. Sundberg. List Viterbi decoding algorithms with applications. IEEE Trans. Comm., 42:313–323, 1994.
[9] J. K. Wolf. Efficient maximum likelihood decoding of linear block codes using a trellis. IEEE Trans. Inform. Theory, 24:76–80, 1978.
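As a supplementary illustration of the Tracing Viterbi Algorithm of Section 5.1, the following sketch keeps a list of survivors per state and discards any path whose length exceeds the radius $r$. It is a simplified reading of the algorithm under stated assumptions, not the authors' implementation: states are syndrome tuples generated on the fly, the pruning step 3 of the trellis construction is replaced by keeping only the paths that end in the all-zero state, and the matrix `H` (a parity check matrix of the equidistant $[7,3,4]$ simplex code) and the name `tracing_viterbi` are chosen here for the example.

```python
def tracing_viterbi(H, z, r):
    """Return every codeword c (H c = 0) with d(z, c) <= r, with its distance."""
    n = len(H[0])
    cols = [tuple(row[t] for row in H) for t in range(n)]
    zero = tuple(0 for _ in H)
    lists = {zero: [((), 0)]}      # state -> list of (edge labels, length)
    for t in range(n):
        new = {}
        for s, survivors in lists.items():
            for bit in (0, 1):
                nxt = tuple((si + bit * hi) % 2 for si, hi in zip(s, cols[t]))
                for path, length in survivors:
                    ext = length + (bit != z[t])   # Hamming edge length
                    if ext <= r:                   # drop paths longer than r
                        new.setdefault(nxt, []).append((path + (bit,), ext))
        lists = new
    # Only paths ending in the all-zero state correspond to codewords.
    return sorted(lists.get(zero, []))

# Assumed example: parity check matrix of the [7,3,4] simplex code, r = d/2 = 2.
H = [[1, 1, 1, 0, 0, 0, 0],
     [1, 0, 0, 1, 1, 0, 0],
     [0, 1, 0, 1, 0, 1, 0],
     [1, 0, 0, 0, 0, 1, 1]]

# Triangle: z is the bitwise majority of 0001111, 0110011 and 1010101.
triangle = tracing_viterbi(H, (0, 0, 1, 0, 1, 1, 1), 2)
# Star: z is within d/2 - 1 = 1 of the single codeword 0001111.
star = tracing_viterbi(H, (1, 0, 0, 1, 1, 1, 1), 2)
print(triangle)
print(star)
```

In the triangle case the list contains three codewords, each at distance exactly $d/2$, while in the star case a single codeword at distance at most $d/2 - 1$ is returned, matching the case analysis of Section 5.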