Statistical theory/distribution for tuples/vectors of LLR valuesby jamie_m Tags: kullbackleibler, llr, loglikelihood ratio 

#1
Nov312, 05:19 PM

P: 15

I've been doing some cryptographic research in which I need to be able to assign "scores" to vectors of floatingpoint values. Each of these floatingpoint values is a loglikelihood ratio LLR(\hat{q}, p, q) where:
\hat{q} is empirical data. The empirical data for element i in the vector is not necessarily the same as that for element j in the vector. p is a theoretical distribution. q is the uniform distribution. (We're dealing with Bernoullidistributed data, so q can be represented as the 2element array {0.5, 0.5}) The idea is that the higher the LLR values in the vector are, the more likely it is that the correct key has been found, and the higher the score assigned to the vector should be. Currently, I'm using the following scoring system:
For the sake of analysing this system, and in particular finding out how large a sample size is required for the correct key to have probability p_k of resulting in the largest score, I'd be interested in knowing if anyone knows anything about the statistical theory (if any!) governing vectors of LLR values, and in particular if anyone knows of a better/more typically used scoring system than the variant on Euclidean distance I've used here. (Please provide evidence for its being "better" if you can!) This would also apply to situations where the correct key need not result in the single highest score, just one of the X highest for some predecided X. An earlier version of this work used vectors of KullbackLeibler distances (between \hat{q} and p) instead. This didn't prove to be as effective, but if anyone can supply any information, similar to the aboverequested, for this scenario I'd be very grateful. Many thanks, James McLaughlin. 



#2
Nov312, 09:45 PM

Sci Advisor
P: 3,172

I suggest that you make another attempt at defining the problem.
Make it clear whether an individual random variable in the vector of random variables is a bernoulli random variable or not. Explain if there is any dependence between the random variables in a vector. (It might be best to explain what the random varibles represent. For example, are are counts of some features of a text message? Is each sample vector computed from a text message that has the same size? ) Is the "theoretical" distribution actually correct? For example, there are theoretical distributions for character frequences and ncharacter combinations in English text. However, these features are not independent of each other. I don't know if there is an theoretical joint distribution for them. 



#3
Nov412, 02:00 PM

P: 15

> I suggest that you make another attempt at defining the problem.
I'll try to do so; and to go into a bit more detail in the process. This is going to be quite long and I'm not sure I can condense it into a tl;dr, plus this will involve some Binomial random variables which are used to derive empirical Bernoulli distributions. Firstly, I referred to the research as "cryptographic", so I should say something about at least one of the ciphers it applies to. The most straightforward one to use in this description would be the one described in http://www.engr.mun.ca/~howard/PAPERS/ldc_tutorial.pdf (This cipher is intended to be used in experiments and in teaching people about cryptography, but isn't strong enough to be used in any realworld applications. It's not the only one this research is relevant to, but it is the simplest.) The cipher key is eighty bits in length, and is divided into five separate sixteen bit "round keys". A function RF() is applied four times, whereby RF(16bit integer i, integer j between 1 and 4) = g(bitwise xor of i and the jth round key) for some function g. The cipher C is then defined as C(i, key) = bitwise xor of RF(RF(RF(RF(i, 1), 2), 3), 4) and the 5th round key. One of the features of the function g is that, given some (but not all) of g's output bits, it's possible to use the round key to partially compute g's inverse and obtain some of its input bits. This is because g's output is the concatentaion of four separate functions, each taking four of g's input bits as their own input. Now, this is what I'm trying to do: Assume that we have obtained N pairs (plaintext P_i, encryption C_i of P_i under key K), and we'd like to try to deduce some of the bits of K from them. Call them (P, C)pairs. The set of bits we are attempting to deduce is a subset of the set of bits in the fifth round key. Let the size of this subset be denoted l_1. Furthermore, we will also be working with  and may be able to obtain some partial information on  a subset of the bits in the fourth round key. Let the number of such fourthround key bits be denoted l_2. Define a Boolean function FB, taking several bits as input, and outputting one bit. More precisely, let FB take an input (n_1 + n_2) bits in length. Let FC be a function mapping the first n_1 bits to some onebit value x and FD be a function mapping the remaining n_2 bits to some onebit value y. Then FB is the value (x xor y). Define a twodimensional array  call it "counters". counters[k_a][k_b] is defined as: #((P, C)pairs) such that, for (l_1)bit value k_a and (l_2)bit value k_b, FB(a particular set of n_1 bits in the plaintext, another set of n_2 bits resulting from the use of k_a and k_b to xor part of the ciphertext with k_a and compute some of the inverse of RF) = 0. Clearly the values in counters are Binomially distributed. So, for each value of (k_a, k_b), and for each ciphertext C_i, we have computed part of RF^{1}((part of C_i) ^ k_a) and counted how many times this was equal to a second function computed on the plaintext bits P_i. Now, for the wrong values of k_a, we expect the values in counters[][] to be approximately N/2 (i.e. uniformly distributed). For the correct guesses, we expect the distribution to be different. We need to have analysed the cipher in some way to get information on the distribution, but we certainly don't expect it to be uniform. For example, let l_2 be 4 and let the predicted distribution (as you can see, it is in fact a sequence of Bernoulli distributions) be: Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and k_b) = 0) = 0.875 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and k_b) = 1) = 10.875 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and (correct k_b xor 0001)) = 0) = 0.375 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and (correct k_b xor 0001)) = 1) = 10.375 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and any other k_b) = 0) = 0.5 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and any other k_b) = 1) = 0.5 (Now, I'm lucky enough to be able to do experiments in a situation where the random variables corresponding to each of those Bernoulli distributions are independent of each other. But this won't always be the case. I'm hoping to be able to get some theoretical results for a situation in which they're independent; and then maybe to be able to generalise to at least some situations where they're not.) This yields the following expectations for the counter values  which as I've said are binomially, not Bernoulli, distributed: counters[correct k_a][correct k_b] \approx (7/8)*N counters[correct k_a][correct k_b xor 0001] \approx (3/8)*N counters[correct k_a][all other values] \approx N/2. counters[incorrect k_a][any value] \approx N/2. (I say \approx because the values are very unlikely to be precisely equal to their theoretical expectations.) Or it may be that the distribution will be either that or Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and k_b) = 0) = (1  0.875) = 0.125 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and (correct k_b xor 0001)) = 0) = (1  0.375) = 0.625 Pr(FB(plaintext bits, bits obtained from partial decryption with correct k_a and any other k_b) = 0) = (1  0.5) = 0.5 with corresponding expectations: counters[correct k_a][correct k_b] \approx (1/8)*N counters[correct k_a][correct k_b xor 0001] \approx (5/8)*N counters[correct k_a][all other values] \approx N/2. counters[incorrect k_a][any value] \approx N/2. We have no a priori way of predicting which of these two distributions (or "these two sets of Bernoulli distributions") is correct. Unfortunately this twodistribution situation always applies, where each probability value predicted by the first distribution is 1.0  the value predicted by the second.) At this stage, let counters_2 be a vector of floating point values such that counters_2[k_a][k_b] = counters[k_a][k_b]/N. We compute this as soon as we've processed all N (P, C)pairs and computed all the values in counters. The theoretical expectations as described above are now: counters_2[correct k_a][correct k_b] \approx 0.875. counters_2[correct k_a][correct k_b xor 0001] \approx 0.375. counters_2[correct k_a][all other values] \approx 0.5. counters_2[incorrect k_a][any value] \approx 0.5. and counters_2[correct k_a][correct k_b] \approx 0.125. counters_2[correct k_a][correct k_b xor 0001] \approx 0.625. counters_2[correct k_a][all other values] \approx 0.5. counters_2[incorrect k_a][any value] \approx 0.5.  i.e. the theoretical probability that each Bernoulli trial (or computation of FB) would result in the value 0. Now, let's define a new array of floatingpoint values, Scores, with l_1 elements. Let's also define two 2D arrays of floating point values, LLRs_with_first_theoretical[][] and LLRs_with_second_theoretical[][].




#4
Nov512, 07:12 PM

Sci Advisor
P: 3,172

Statistical theory/distribution for tuples/vectors of LLR valuesIs "k_a" the value of the bit (i.e. 0 or 1) or is it the index of the bit position in the key? (e.g. the 3rd bit in the 5th round key) 



#5
Nov712, 07:33 AM

P: 15

(So, for instance, the 8bit values would be the integers 0 to 255.) 



#6
Nov712, 09:56 PM

Sci Advisor
P: 3,172

I'll attempt some editoral work on your description until I reach the point of total confusion. Bits K[1] through K[16] form round_key[1], the "first round key",. Bits K[17] through K[32] form the "second round key", round_key[2]. etc. Define the fuction RF(t,i) that outputs a 16 bit string as a function of the 16 bit input string t and the integer i, i = 1,2,3,4, as follows: RF(t,i) = g( bitwise xor of t with round_key[i] ) To encrypt a 16 bit string of plain text p into a 16 bit string of cypher text C the following algorithm is used: c = p for i = 1 to 4 { c = RF(c,i) } c = bitwise xor of c with round_key[5] Let L1 be a given subsequence of bits of round_key[5]. Let the number of bits in L1 be l_1. Let L2 be a given subsequence of bits of round_key[4]. Let hte number of bits in L2 be l_2. I'll work on this more later. I'll post this now so I won't lose the effort to some computer glitch. 



#7
Nov812, 02:02 AM

Sci Advisor
P: 3,172

I can believe, for example, that B[2][4] and B[3][1] are independent random variables if we make an independent draw of a (P,C) pair for each. But we if we evaluating them on the same (P,C) pair and making only one random draw, then this is less plausible. A similar situation holds for the counts as random variables. 



#8
Nov812, 05:31 PM

P: 15

Thanks for the detailed reply  I'll need to study it in more depth tomorrow to reply properly. But I think I may be able to shed some light on one thing you said:
"Why have we reached a choice between two distributions that are related in this way?" I'm afraid that it's due to the inner workings of the cipher  which of the two distributions is the correct one depends on the parity of a particular subset of the bits in the first three round keys. Since we don't know what these bits are, we can't compute their parity, and so we don't know in advance whether the distribution corresponding to parity 0, or the distribution corresponding to parity 1, is correct. 



#9
Nov912, 09:55 AM

P: 15

E[A[correct k_a][correct k_b]] = 0.875 * N E[A[correct k_a][correct k_b xor 0001]] = 0.375 * N So if the correct value of k_b was 13 in decimal (1101 in binary), then "correct k_b xor 0001" is 1100 in binary (12 in decimal) and we have: E[A[correct k_a][13]] = 0.875 * N E[A[correct k_a][12]] = 0.375 * N Now, at this point, we come to the tworelateddistributions situation. I've explained why there are two such distributions in my comment above, and I want to make it clear that the two distributions are for the same set of (P, C)pairs. The reason I didn't at first mention the twodistribution situation was that I thought it would overcomplicate the description if I tried to deal with it too early on, and so I mentioned it near to the end of my explanation. Clearly this did in fact serve to obfuscate, not clarify, the situation, which wasn't my intention. Apologies for that. 



#10
Nov912, 12:39 PM

Sci Advisor
P: 3,172

That clears up many minor points, but I still don't understand the fundamental question: What are the random variables that are involved in this problem? How exactly are they defined?
What I understand so far is that one random variable is a random choice of a (P,C) pair. We hold various parameters constant and apply some process whose inputs are these parameters and the (P,C) pair. I think the process produces a 0 or 1, so the process is a random variable by virtue of being function of the (P,C)pair, which is a random variable. When we take a sample of N indpendently chosen (P,C)pairs, the output of the process gives a total number 0's and a total number of 1's. However, apparently the random variable of interest is not the number of 1's produced by the process and it is not the number of 0's produced by the process. It is somehow ambiguous which count we are interested in. But you haven't defined any probability associated with this ambiguity. So you haven't defined the bottomline random variable. The data is an array A[i][j] of "counts". I'm assuming A[i][j] is the count of how many times the process produced a 0. If i and j are given, do you know whether the random variable of interest is the count of 0's or the count of 1's for those particular i,j? From what you've said so far, I'm not sure. Your original post talked about using log liklihoods. But if you used log liklihoods, you had to have a specific distribution didn't you? 



#11
Nov1212, 11:53 AM

P: 15

I think I can clarify further and hopefully eliminate the remaining ambiguity:
Now, for the correct k_a, we have two sets of theoretical distributions. Each set contains a theoretical distribution for the value of counters[correct k_a][k_b] for all valid values of k_b. So you could say that the number of 0s is the random variable as far as each of these distributions is concerned  though since the number of 1s is equal in all cases to Nthis, it contains no more information than the number of 0s.




#12
Nov1212, 12:40 PM

Sci Advisor
P: 3,172

I think I'm beginning to get it.
So, for example, assume we have 50 PCpairs. if k_a = 11 is the correct key bit string and k_b varies over the strings 00,01,10,11 then one set of distributions is a vector of "uniform" distributions of counts: We can summarize this vector of uniform distributions in various ways. We could write a vector of the expected number of 0's out of 50 PC pairs, which would be (25,25,25,25) . Or could write the vector that gives the probability of a 0 in each distribution as (0.5,0.5,0.5,0.5). The other vector of distributions either comes from something you can prove about the behavior of using the correct k_a or it comes from the vector of empirical counts, (count[11][00],count[11][01],count[11][10],count[11][11]). ( I can see why its convenient not to distringuish between a bit string and the value of the base 2 integer that it represents, but I am making a distinction. So I mean "10" to be a string that represents the number (1)( 2^1 )+ (0)(2^0) = 2, not the number ten.) 



#13
Nov1312, 08:23 AM

P: 15

Moreover, as you note in your last post, the experimental data has enabled us to construct a set of empirical distributions  the main reason why I referred to sets of theoretical distributions was to distinguish them from the set of empirical distributions. If we express one of these two sets of theoretical distributions as a vector v such that v[i] is the probability of 0 in the ith distribution, and index from 1 to m, then the other theoretical distribution would be expressible as a vector of the form: ((1.0  v[1]), (1.0  v[2]), ..., (1.0  v[m])) 



#14
Nov1312, 09:38 AM

Sci Advisor
P: 3,172





#15
Nov1412, 04:52 PM

P: 15

If 00 is the correct value of k_b, and if you guess at, say, 01 as the correct value, this pretty much only means that your empirical data won't fit f1 as well as it would have if you'd guessed the correct value, 00. But [which of f1 and f2 was correct] was already fixed before you started obtaining the (P, C)pairs, and does not change. 



#16
Nov1412, 07:00 PM

Sci Advisor
P: 3,172

Suppose k_a = 00 is the correct key. Suppose the (correct) probability that a randomly selected PC pair produces a 0 that adds one to the count A[00][00] is 0.8. Does this also mean that the probability that a randomly selected PC pair also has a probability of 0.8 of adding one to the count of A[00][01] ? 


Register to reply 
Related Discussions  
Statistical Ph.  Distribution of Occupation  Classical Physics  0  
statistical physics:boltzman distribution  Advanced Physics Homework  1  
Vectors, ntuples, and headaches oh my!  Calculus & Beyond Homework  2  
Ntuples and Vectors  General Math  2  
Question of Statistical Thermodynamics (Boltzmann Distribution)  Advanced Physics Homework  6 