Thursday, January 31, 2013

Half Way There

Hello Blog! I am back after a month of stressful activity, starting from finishing Grad Apps, finishing and presenting my independent work, finishing my image processing final assignment, studying for finals, participating in MIT Mystery Hunt (we got 3rd!), and ending with taking my finals. Now after recuperating, I am ready to make a post about the mid-year results of my project.

The main result of my mid-year report was finding the probability of error of the optimal key difference guess given n samples. The basis of this analysis depends on the fact that we know the distributions of encryption times given whether plaintext pair "matched" or not. This gives us some information on what the key is. 

To find the probability of error, first, we find the optimal decision regions for guessing each key difference based on the Bayesian probability of error. Once we have the optimal decision regions, we used Sanov's theorem to find the asymptotic probability of an event generated by one distribution falls in the optimal region for another distribution. Then, we used Chernoff's Information to find the best bound on the error exponent. 

Applying these methods to our problem, we get the following equation for the best error exponent.



where Q_0 is the distribution of times given that a specific plaintext pair matches and Q_1 is the distribution of times given that a specific plaintext pair does not match. L is the number of possible hypotheses for a certain key byte.

I thought the expression above was cool because you can use it to prove Cauchy-Schwarz Inequality (since the equation above is a result of finding relative entropy which is always positive). Also, if you let Q_0 and Q_1 be Gaussians with different variances, then we can prove AMGM.  

I used the above expression to find an approximation on my simplified model, but unfortunately, the results were not great. It conflicted with experimental data. It predicted that the cache needs to have a 90% miss rate for the experimental data to be possible. Sadly, caches normally have less than 5% miss rate. Anyways, this is something we need to still examine. Most likely, the problem is that our simplified model doesn't capture the actual situations in experimental data. 

If you are interested in seeing my presentation, you can click the link here.

Other than the probability of error, I tried looking at minimal statistics. This was sparked by Bonneau's original method for the cache timing attack, which I thought was losing information because  his method only keeps track of averages. However, I didn't find anything too interesting. 

Trivia!

You have a list of integers. The list is special in that all the numbers have a unique duplicate except for one number, which has no duplicate. How do you find the number which does not have a duplicate using almost constant memory? 

3 comments:

  1. Wow congrats on making it through all that! :)

    I'm glad your project is coming along! I was just wondering, it seems like you work with your model in a step-wise fashion, first Bayes, then Sanov, then Chernoff? Is the sequence in which you apply these theories essential? and also, are there other theories you could use in place of those? :)

    ReplyDelete
    Replies
    1. I guess I described my result in a step-wise fashion because that is how I thought about it. I think there are other ways of thinking about the problem, but to get the result I got, you have to something very similar to the theories I used.

      Delete
  2. Seems like you're keeping busy! Anyways, I was wondering how you know the distributions of encryption times given whether plaintext pair "matched" or not?

    ReplyDelete