Thursday, January 31, 2013

Half Way There

Hello Blog! I am back after a month of stressful activity, starting with finishing grad apps, then finishing and presenting my independent work, finishing my image processing final assignment, studying for finals, participating in MIT Mystery Hunt (we got 3rd!), and ending with taking my finals. Now, after recuperating, I am ready to make a post about the mid-year results of my project.

The main result of my mid-year report was finding the probability of error of the optimal key-difference guess given n samples. This analysis rests on the fact that we know the distributions of encryption times given whether a plaintext pair "matched" or not, which gives us some information about what the key is.

To find the probability of error, we first find the optimal decision regions for guessing each key difference based on the Bayesian probability of error. Once we have the optimal decision regions, we use Sanov's theorem to find the asymptotic probability that an event generated by one distribution falls in the optimal region for another distribution. Then we use the Chernoff information to find the best bound on the error exponent.

Applying these methods to our problem, we get the following expression, where the Chernoff information C(Q_0, Q_1) gives the best error exponent:

P_e ≈ (L - 1) * 2^(-n * C(Q_0, Q_1))

C(Q_0, Q_1) = - min_{0 <= lambda <= 1} log( sum_x Q_0(x)^lambda * Q_1(x)^(1 - lambda) )
where Q_0 is the distribution of times given that a specific plaintext pair matches and Q_1 is the distribution of times given that a specific plaintext pair does not match. L is the number of possible hypotheses for a certain key byte.

I thought the expression above was cool because you can use it to prove the Cauchy-Schwarz inequality (the expression comes from a relative entropy, which is always non-negative; at lambda = 1/2, non-negativity says sum_x sqrt(Q_0(x) * Q_1(x)) <= 1, which is exactly Cauchy-Schwarz applied to the vectors sqrt(Q_0) and sqrt(Q_1)). Also, if you let Q_0 and Q_1 be Gaussians with different variances, then you can prove the AM-GM inequality.
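As a sanity check, here is a small numerical sketch in Python (my own illustration; the two example distributions are made up) that finds the Chernoff information by brute-force search over lambda:

    import numpy as np

    # Chernoff information C(Q_0, Q_1) = -min over 0 <= lam <= 1 of
    # log2( sum_x Q_0(x)^lam * Q_1(x)^(1 - lam) ), found by brute force.
    def chernoff_information(q0, q1, grid=1001):
        q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
        lams = np.linspace(0.0, 1.0, grid)
        vals = [np.log2(np.sum(q0**lam * q1**(1.0 - lam))) for lam in lams]
        return -min(vals)

    # Hypothetical two-point distributions: P(hit), P(miss) in each case.
    q_match = [0.9, 0.1]      # times given the plaintext pair matches
    q_nomatch = [0.8, 0.2]    # times given the pair does not match
    print(chernoff_information(q_match, q_nomatch))  # error exponent in bits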

I used the above expression to find an approximation on my simplified model, but unfortunately, the results were not great: they conflicted with the experimental data. The expression predicted that the cache would need a 90% miss rate for the experimental data to be possible. Sadly, real caches normally have miss rates below 5%. This is something we still need to examine. Most likely, the problem is that our simplified model doesn't capture what actually happens in the experiments.

If you are interested in seeing my presentation, you can click the link here.

Other than the probability of error, I tried looking at minimal sufficient statistics. This was sparked by Bonneau's original method for the cache-timing attack, which I thought was losing information because it only keeps track of averages. However, I didn't find anything too interesting.

Trivia!

You have a list of integers. The list is special in that every number in it appears exactly twice, except for one number, which appears only once. How do you find the number that has no duplicate using almost constant memory?

Friday, December 7, 2012

CSoI Conference

I had the pleasure of attending the NSF Center for Science of Information Site Visit earlier this week. It was a great experience. I met some of the students, post-docs, and professors in the information theory field and I got to learn about some of the projects people were doing.

I particularly liked the discussions of two projects by Tom Courtade, a post-doc at Stanford. He introduced a problem on boolean functions and a problem on lossy compression of graphs using the log-loss distortion function. The first project involves finding I(X^n; b(Y^n)), where b is a boolean function of n bits and Y^n is a noisier version of X^n. The question seems simple, but as Tom explained, it is actually very complicated. The second problem presented an interesting way of encoding graphs. There were, however, some disagreements on whether log-loss is a good measure of distortion for applications of lossy graph encoding.

I had the chance to meet many of the professors at the graduate schools I am applying to, and I shared my poster with many of them (click here to see my poster). I was amused when Professor David Tse came by to see my poster. When I told him that I was an undergrad applying to Berkeley, he said to me in a very serious voice, "Consider this your interview". When I finished telling him about my project, he said again in a very serious voice, "I will be reading your application". I guess I should consider that a good thing, since he could have told me that he wasn't going to read my application.

Overall, I think the conference was a great opportunity for me to see what goes on in the field of information theory. I now have a better understanding than before of what the research projects are like. I enjoyed the academic atmosphere, and I enjoyed meeting the grad students, who told me what grad student life is like.

Unrelated to the conference, today I got a chance to speak with Professor Ruby Lee, who has a strong background in cache architecture. When I explained what I was trying to do for my project, she suggested that I run some tests to collect data before making assumptions about the timing distributions of cache hits and cache misses. Though I find gathering data much less interesting than theory, I guess she makes a good point. I will see what I can do about this. Meanwhile, I will continue to see what the Chernoff information can tell us about the error in making a guess about the distribution.

Here is a question from the Putnam exam from last weekend. It dutifully honors the year 2012:

Given any set of 12 numbers selected from the range (1, 12), is it possible to choose three distinct elements from this set which are the side lengths of an acute triangle?

I will have a busy next few weeks ahead of me due to grad school apps. Hopefully I will have time to update this blog.

Friday, November 23, 2012

Thanksgiving!

Happy Thanksgiving!


For the past two weeks, I have been trying to see if I can calculate how many samples it takes to determine the key with a certain probability. The good news is that in the process, I was able to understand large deviation theory better. The bad news is that I don't feel like I am making significant progress. It takes me a long time to figure out what I am doing; I think I spend most of my time just staring at my paper. One of my professor's grad students tried to assure me that this always happens at the beginning of research. While I have hope that I will get better at this, I think I am doomed for the poster presentation I have to give in 13 days.

The ideas I am working on involve finding the probability that two nibbles of the key have a specified difference when XOR-ed together. In other words, I am trying to figure out the probability distribution of Y = K0 ^ K4, which can take on the values y0, y1, y2, ..., y15.

I am assuming that for each plaintext I know the time it takes for the plaintext to get encrypted, which we will call X. Say this plaintext has an associated difference of Z. I want to figure out what Y is given that we know X. The weird thing about this model is that X is useful for determining the probability that Y = y_i only if Z = y_i. If Z = y_j for some j != i, then X tells us as much about Y = y_i as it does about any other value of Y besides y_j. Therefore, we can just set
P( Y = y_i | Z = y_j, j != i) = (1 - P ( Y = y_j | Z = y_j ) ) / 15.   

Using this model, along with Bayes' rule, assuming an i.i.d. distribution of X and a uniform prior on Y (and that my math is correct), we can derive that

P( Y = y_i | X^n ) = C * P(X^n | Y = y_i) / P(X^n | Y = y_j, j != i)

where C is a normalizing constant and n is the number of plaintext timing samples we have such that Z = y_i. If we write that in terms of relative entropy, we can get

P( Y = y_i | X^n ) = C * 2^(-n * ( D(P_x || Q_i) - D(P_x || Q_j) ))

where P_x is the empirical distribution of X^n, and Q_i is shorthand for the distribution P(X | Y = y_i) (similarly for Q_j). This relative-entropy form may be helpful for applying some known theorems; however, I am still not entirely sure what to do with it. The current plan is to see if I can use this to calculate a lower bound on the probability of a given key difference, using a simple Bernoulli distribution to model P(X | Y = y_i, Z = y_i) and P(X | Y = y_j, Z = y_i, i != j). Hopefully there will be interesting results.
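To make this concrete, here is a rough Python sketch of the relative-entropy form above (my own illustration; the Bernoulli parameters are made up, and X = 1 is taken to mean a "fast" encryption time):

    import numpy as np

    # Empirical distribution of a 0/1 sample (two-point alphabet).
    def empirical(xs):
        p1 = np.mean(xs)
        return np.array([1.0 - p1, p1])

    # Relative entropy D(p || q) in bits.
    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    # Hypothetical Bernoulli models for X:
    q_i = np.array([0.4, 0.6])    # P(X | Y = y_i, Z = y_i)
    q_j = np.array([0.5, 0.5])    # P(X | Y = y_j, Z = y_i), j != i

    # Simulate n samples that really do come from the matching model q_i.
    rng = np.random.default_rng(0)
    n = 200
    xs = (rng.random(n) < q_i[1]).astype(int)

    # Exponent from the post: P(Y = y_i | X^n) behaves like
    # 2^(-n * (D(P_x || Q_i) - D(P_x || Q_j))).
    p_x = empirical(xs)
    exponent = -n * (kl(p_x, q_i) - kl(p_x, q_j))
    print(exponent)   # positive here, i.e., the evidence favors Y = y_i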

Question for this week. I got this from our group meeting:

You are in a room with 3 light switches. Each of these switches controls one of three lights in an adjacent room. From your room, you can't see which switch controls which light. You are allowed to flip some combination of switches and walk into the adjacent room to see which lights are on. How can you determine which switch controls which light by making only one trip to the adjacent room?

Sunday, November 11, 2012

A Simplified Model

Recap: We are trying to analyze the cache-timing side-channel attack, which is used to extract the key from AES encryption. As each plaintext is encrypted, the attacker can measure the time it takes the cipher to complete the encryption. Since AES encryption uses table look-ups, the algorithm will need to make calls to the cache. Since there can be cache hits and cache misses, the values which are looked up will have some relationship with the time it takes to look up each value in the cache. We can use this information to extract the key.

There are many different ways an attacker can use the timing information. We will focus on one model at a time. The current model we will look at is the first-round model described in Bonneau's paper [1]. In this post, I will explain some of the ideas I have to simplify this model in order to make the problem easier to analyze.

Now, since we are looking at the first-round attack, we will make the following assumption: if two look-ups that occur in the first round use the same index, then the second of these look-ups is likely to be a cache hit. Therefore, if the same index is looked up twice in the first round, the total time will on average be shorter than if all the looked-up indices are different.

So let us specifically consider two indices being looked up in the first round, say i_0 and i_1. If we know that i_0 = i_1, then we can figure out certain bits of the key. We want to figure out whether, given timing data, we can infer if i_0 = i_1.
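To see why a collision leaks key bits, here is a toy Python illustration (my own, with made-up byte values; in the first round of AES, each table index is, roughly, a plaintext byte XOR-ed with a key byte):

    # Made-up plaintext and key bytes, chosen so that a collision occurs.
    p0, p1 = 0x3A, 0x7C   # plaintext bytes (known to the attacker)
    k0, k1 = 0x5F, 0x19   # key bytes (secret)

    i0 = p0 ^ k0          # first-round table look-up indices
    i1 = p1 ^ k1

    if i0 == i1:
        # i0 == i1 implies k0 ^ k1 == p0 ^ p1, which the attacker can compute.
        print("leaked key relation k0 ^ k1 =", hex(p0 ^ p1))
        assert (k0 ^ k1) == (p0 ^ p1)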

The problem is, there will be other cache hits and cache misses from the remaining look-ups, so it is very hard to tell if i_0 = i_1. However, what we can do is model the times that come from the other look-ups as noise. We can then approximate the distribution of the timing data for one encryption, both given that i_0 = i_1 and given that i_0 != i_1.

To do this, let us make certain assumptions about the system doing the encryption. We will suppose that the time for each encryption, excluding the time it takes to process the cache hits and misses, is the same each time. We will assume that each cache hit takes a fixed H time units to process and each miss takes a fixed M time units to process (not true with hierarchical memory). Let D = M - H. Then the distribution of encryption times is supported on steps of size D, and in this situation, the timing data tells us exactly the number of misses.

We know that there are 160 table look-ups in an AES encryption. So here we can model each look-up as a Bernoulli trial with cache-miss probability P, making the number of misses binomial. Even though 160 is a small number, the binomial distribution can be approximated by a Gaussian. Depending on whether i_0 = i_1 or i_0 != i_1, we get slightly different distributions, and we can use these distributions to do hypothesis testing and to find the probability of error given a number of samples.
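Here is a quick simulation of this fixed-cost model (a sketch only; the costs and miss rate are numbers I made up, and a collision is crudely modeled as one look-up becoming a guaranteed hit):

    import numpy as np

    rng = np.random.default_rng(1)

    LOOKUPS = 160     # table look-ups per encryption
    H, M = 10, 50     # hypothetical hit and miss costs; D = M - H
    BASE = 5000       # fixed non-cache cost per encryption (made up)
    P_MISS = 0.05     # hypothetical cache-miss probability

    # Total time = BASE + H * (#hits) + M * (#misses). A collision makes
    # one look-up a guaranteed hit, so one fewer look-up is random.
    def encryption_times(collision, n_samples):
        random_lookups = LOOKUPS - 1 if collision else LOOKUPS
        misses = rng.binomial(random_lookups, P_MISS, size=n_samples)
        hits = LOOKUPS - misses
        return BASE + H * hits + M * misses

    t_match = encryption_times(True, 100000)
    t_nomatch = encryption_times(False, 100000)
    print(t_match.mean(), t_nomatch.mean())  # match is slightly faster on average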

However, we made many assumptions in this model that may not reflect how a real system behaves. We will need to look into these assumptions some more.

Okay, here is a brain teaser for this post. This is a USAMTS problem from some years ago.

Suppose that you have 14 coins, each with a unique year written on it so that the coins can be distinguished. Only 7 of these coins are real; the other 7 are fake. Suppose real coins weigh 1 gram and fake coins weigh 0.9999 grams. You only have a strange futuristic scale to weigh these coins. The scale compares two groups of coins and tells you which group is heavier. If the two groups weigh the same, the scale returns all your coins. However, if one group is heavier than the other, the scale still tells you which group is heavier, but then it takes one coin uniformly at random from the heavier group and keeps it as payment. How can you use this scale to determine, and keep in your possession, one real coin?


References:
[1] Bonneau, J., & Mironov, I. (2006). Cache-collision timing attacks against AES. Lecture Notes in Computer Science, vol. 4249, pp. 201-215.

Saturday, November 3, 2012

Ideas to Think About


This week was fall break for Princeton (which coincided with Hurricane Sandy. It's amazing that Princeton can plan its academic calendar around natural disasters). The week before that, Eva and I met with Professor Ruby Lee and Professor Cuff to discuss some approaches to tackling this project. I will present some of these here. 

From Professor Lee's perspective, there are four main questions we should be asking in this project:

1. How do we make the cache-timing side-channel attacks go faster?
2. What are the root causes that allow this attack to work?
3. How can we defend against these attacks?
4. What kinds of quantitative metrics can we use to measure the amount of information leakage? 

(If you don't know what the cache-timing side-channel attack is, see post below titled Project Description.)

The question that I would most like to answer for this project is the fourth one. (Eva and I also currently believe it is the easiest one to answer.) We can narrow question 4 down to the following: what is the equivocation of the key given a set of N timing samples, i.e., the conditional entropy H(K | T_1, ..., T_N)?

Professor Cuff gave us some ideas about how we can answer this. He suggested that for the attack, we assume that only the plaintext and the timing data are known. If we assume that the attacker knows both the plaintext and the ciphertext, then given infinite resources the attacker will be able to figure out the key, which would make the problem degenerate. Professor Cuff also suggested that we look into ideas from hypothesis testing. In particular, he recommended using Sanov's theorem.

I am currently looking into some of the ideas Professor Cuff proposed. Looking at the problem through hypothesis testing makes sense, since for any variation of the cache-timing attack, the attacker needs to decide between possible key distributions at some point. Therefore, using already developed theory in hypothesis testing, we may be able to find some kind of bound on the amount of information leaked.

If we were analyzing the first-round-attack variation of the cache-timing attack, then we could break down the problem by only looking at 4-bit segments (or nibbles) at a time. For each key nibble, we would only have to test 16 hypotheses. This is much more tractable than testing hypotheses on the whole key, which would require 2^128 hypotheses.
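As a rough picture of what this per-nibble testing might look like, here is a schematic Python sketch (everything here is invented for illustration: the Gaussian timing models, their parameters, and the grouping of samples by hypothesis):

    import numpy as np

    rng = np.random.default_rng(2)

    # Invented Gaussian timing models: "match" encryptions run slightly faster.
    MU_MATCH, MU_NOMATCH, SIGMA = 7990.0, 8000.0, 40.0

    def log_lik(t, mu):
        # Gaussian log-likelihood up to an additive constant.
        return -0.5 * ((t - mu) / SIGMA) ** 2

    # Fake data: samples are grouped by which nibble-difference hypothesis
    # their plaintext pair would collide under; hypothesis 11 is the truth.
    TRUE = 11
    samples = {h: rng.normal(MU_MATCH if h == TRUE else MU_NOMATCH, SIGMA, 500)
               for h in range(16)}

    # Under hypothesis h, group h should follow the "match" model and every
    # other group the "no match" model; pick the best-scoring hypothesis.
    scores = [sum(log_lik(samples[g], MU_MATCH if g == h else MU_NOMATCH).sum()
                  for g in range(16))
              for h in range(16)]
    print(int(np.argmax(scores)))  # should print 11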

Don't have a good brain teaser question for this week. I'll try to find one for a later post.

For those not familiar with cryptography, note that the plaintext is the input to an encryption system and the output is the ciphertext. The key determines what the ciphertext will be. The goal of most cipher attacks is to extract the key. Now, given the plaintext and the resulting ciphertext, the key is uniquely determined. However, calculating the key from just the plaintext and ciphertext is too computationally intensive to be feasible.

Saturday, October 27, 2012

Project Description

Apologies for not updating sooner! In this post, I will give a description of the project I plan to be working on, but first! some background on cryptography! 

Currently, the standard for symmetric block ciphers is AES (Advanced Encryption Standard), which is used in many security protocols today. The standard was selected by the National Institute of Standards and Technology for its security and computational efficiency. However, since its selection in 2001, a number of attacks have been developed against AES. Among them is a class of attacks called side-channel attacks, which research has shown to work well.

Side-channel attacks target information leaked from the physical implementation of the cipher. By measuring side-channel information, like power consumption or computation time, the attacker can figure out what kinds of computations and values were used by the encryption device, which leads to information about the secret key. (Note: many modern cipher attacks focus on extracting the key from the encryption. This will also be the goal of this project.)

The particular side-channel attack we will focus on is the cache-timing attack. Many software implementations of AES rely on table look-ups, which means there will be a significant number of memory accesses. Each memory access can result in a cache hit or a cache miss, events which take different amounts of time. Therefore, the total time taken by an encryption can leak information about which values were looked up, leading to information about the key.

Our project goal is to formally analyze the cache-timing attack using ideas from information theory. So the cool part: we are dealing with code-breaking. (Yay code-breaking!) The hard part: The cache-timing attack has many different forms. Implementations of AES, hardware used, form of the attack, and the key information obtained vary significantly. It is difficult to isolate the particular problem we want to work on. 

There is more to say about this project, but I don't want to squeeze everything into one post, so I will end here and add another brain-teaser question for readers to ponder. This question is not so hard if you have some basic knowledge of number theory; otherwise the problem is pretty impossible.

Imagine that you are standing at the origin of the xy-plane. Suppose at every pair of integer coordinates there is a pole extending up in the z direction. These poles are infinitely thin and extend way over your head. What is the probability that you can see a pole selected at random?

Thursday, October 4, 2012

Post 2

I met with my professor's grad student, Eva, last week, and we discussed some possible ideas for a project. The conclusion of the meeting was that I still need to do more reading. The paper I am reading now is "The Wire-Tap Channel" by Wyner. It is a very old paper (from 1975), and since it has survived until today, it is clearly very influential. I'm guessing my project will relate to this paper in some way.

Something I found exciting this week was that in my probability class, we proved the law of large numbers and followed up with a corollary about entropy. The corollary was the asymptotic equipartition property. I had learned about this before, but it was cool to see it presented in another class. 

Another thing I learned which I thought was interesting was how the information-theoretic notion of entropy relates to the notion of entropy in the second law of thermodynamics. The second law of thermodynamics states that entropy always increases, but when using Markov chains to analyze a physical system, the information-theoretic entropy of the distribution over time can actually decrease. This was surprising.

Last week, I posted a trivia question. It would make sense for me to post the answer to the question this week, but I don't think anyone has actually read this blog yet. So I will postpone posting the answer.

In the meantime, here is another question which I thought was cute. It is not super difficult, but the idea is nice.

Suppose you have a box with N pieces of string. You reach into the box, randomly take out two ends, and tie them together. These two ends can come from the same string or from different strings. You repeat this action until there are no more string ends to tie. What is the expected number of loops?