I've been doing some cryptographic research in which I need to be able to assign "scores" to vectors of floating-point values. Each of these floating-point values is a log-likelihood ratio LLR(\hat{q}, p, q) where:
\hat{q} is empirical data. The empirical data for element i in the vector is not necessarily the same as that for element j in the vector.
p is a theoretical distribution.
q is the uniform distribution.
(We're dealing with Bernoulli-distributed data, so q can be represented as the 2-element array {0.5, 0.5})
The idea is that the higher the LLR values in the vector are, the more likely it is that the correct key has been found, and the higher the score assigned to the vector should be.
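(For concreteness, a sketch of one standard way such an LLR can be computed in the Bernoulli case, assuming the form LLR(\hat{q}, p, q) = n \sum_x \hat{q}(x) \log(p(x)/q(x)) with n the sample size; the function name and parameters below are illustrative rather than the exact code used in this work.)
Code:
#include <cmath>

// Sketch only: LLR(q_hat, p, q) = n * sum_x q_hat(x) * log(p(x)/q(x)) for
// Bernoulli data, where q_hat_one is the empirical frequency of ones over
// n samples, p_one is the theoretical probability of a one, and q is the
// uniform distribution {0.5, 0.5}.
double bernoulli_llr(double q_hat_one, double p_one, int n)
{
    double q_hat[2] = {1.0 - q_hat_one, q_hat_one};
    double p[2]     = {1.0 - p_one,     p_one};
    double q[2]     = {0.5,             0.5};

    double llr = 0.0;
    for (int x = 0; x < 2; x++)
    {
        if (q_hat[x] > 0.0)     // terms with q_hat(x) = 0 contribute nothing
            llr += q_hat[x] * std::log(p[x] / q[x]);
    }
    return n * llr;
}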
Currently, I'm using the following scoring system:
Code:
// v is the vector of LLR values (e.g. std::vector<double>)
double score = 0.0;
for (std::size_t i = 0; i < v.size(); i++)
{
    if (v[i] > 0)
    {
        score += v[i] * v[i];
    }
    else
    {
        score -= v[i] * v[i];
        // Yes, the score can take a negative value.
    }
}
There are clearly other systems I could use; for example a straightforward summation of the values in the vector.
To analyse this system, and in particular to work out how large a sample size is required for the correct key to have probability p_k of producing the largest score, I'd be interested to know whether there is any statistical theory governing vectors of LLR values, and whether anyone knows of a better or more commonly used scoring system than the variant on Euclidean distance I've used here.
(Please provide evidence for its being "better" if you can!)
This would also apply to situations where the correct key need not result in the single highest score, just one of the X highest for some pre-decided X.
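(To make the top-X criterion concrete, here is an illustrative sketch, assuming a score has already been assigned to every candidate key and that correct_index identifies the correct key; the names are hypothetical.)
Code:
#include <algorithm>
#include <functional>
#include <vector>

// Sketch only: returns true if the correct key's score is among the X
// largest scores. Assumes 1 <= X <= scores.size(); ties at the threshold
// count as being in the top X.
bool correct_key_in_top_x(const std::vector<double>& scores,
                          std::size_t correct_index, std::size_t X)
{
    std::vector<double> sorted(scores);
    std::sort(sorted.begin(), sorted.end(), std::greater<double>());
    return scores[correct_index] >= sorted[X - 1];
}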
An earlier version of this work used vectors of Kullback-Leibler distances (between \hat{q} and p) instead. This didn't prove to be as effective, but if anyone can supply similar information for that scenario, I'd be very grateful.
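(For the KL-based version, a purely illustrative Bernoulli-case sketch, assuming the usual definition D(\hat{q} || p) = \sum_x \hat{q}(x) \log(\hat{q}(x)/p(x)):)
Code:
#include <cmath>

// Sketch only: Kullback-Leibler distance D(q_hat || p) between the empirical
// Bernoulli distribution q_hat and the theoretical distribution p, with
// q_hat_one and p_one the respective probabilities of a one.
double bernoulli_kl(double q_hat_one, double p_one)
{
    double q_hat[2] = {1.0 - q_hat_one, q_hat_one};
    double p[2]     = {1.0 - p_one,     p_one};

    double kl = 0.0;
    for (int x = 0; x < 2; x++)
    {
        if (q_hat[x] > 0.0)     // 0 * log(0/p) is taken as 0
            kl += q_hat[x] * std::log(q_hat[x] / p[x]);
    }
    return kl;
}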
Many thanks,
James McLaughlin.