Friday, November 23, 2012

Thanksgiving!

Happy Thanksgiving!


For the past two weeks, I have been trying to see if I can calculate how many samples it takes to determine the key with a certain probability. The good news is that in the process, I have come to understand large deviation theory better. The bad news is that I don't feel like I am making significant progress. It takes me a long time to figure out what I am doing; I think I spend most of my time just staring at my paper. One of my professor's grad students tried to assure me that this always happens at the beginning of research. While I have hope that I will get better at this, I think I am doomed for the poster presentation I have to give in 13 days.

The ideas I am working on involve finding the probability that two nibbles of the key have a specified difference when XOR-ed together. In other words, I am trying to figure out the probability distribution of Y = K0 ^ K4, which can take on the values y0, y1, y2, ..., y15.

I am assuming that for each plaintext I know the time it takes for that plaintext to get encrypted, which we will call X. Each plaintext also has an associated difference, say Z. I want to figure out what Y is given that we know X. The odd thing about this model is that X is useful for determining the probability that Y = y_i only if Z = y_i. If Z = y_j for j != i, then X tells us as much about Y = y_i as it does about any other value besides y_j. Therefore, we can just set
P( Y = y_i | Z = y_j, j != i) = (1 - P ( Y = y_j | Z = y_j ) ) / 15.   
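
To make this concrete, here is a minimal sketch of the conditional distribution this rule produces, written out as a 16x16 matrix. The collision probability p_match is a made-up stand-in for P( Y = y_j | Z = y_j ), which in reality would come out of the timing model:

import numpy as np

def conditional_key_diff_matrix(p_match):
    # cond[i, j] = P( Y = y_i | Z = y_j ): a sample with difference Z = y_j
    # is informative only about Y = y_j; the remaining probability mass is
    # spread evenly over the other 15 values, as in the rule above.
    cond = np.zeros((16, 16))
    for j in range(16):
        for i in range(16):
            cond[i, j] = p_match if i == j else (1.0 - p_match) / 15.0
    return cond

# e.g. if P( Y = y_0 | Z = y_0 ) = 0.2, the other 15 values share 0.8 equally
print(conditional_key_diff_matrix(0.2)[:, 0])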

Using this model, along with Bayes' rule, assuming the X samples are i.i.d. and a uniform prior on Y (and that my math is correct), we can derive that

P( Y = y_i | X^n ) = C * P(X^n | Y = y_i) / P(X^n | Y = y_j, j != i)

where C is a normalizing constant and n is the number of plaintext timing samples we have for which Z = y_i. If we write this in terms of relative entropy, we get

P( Y = y_i | X^n ) = C * 2 ^ ( -n * ( D( P_x || P_{y_i} ) - D( P_x || P_{y_j} ) ) )

where P_x is the empirical distribution of X^n and P_{y_i} denotes the distribution of X given Y = y_i. This relative entropy form may be helpful for applying some known theorems; however, I am still not entirely sure what to do with it. The current plan is to see if I can use it to calculate a lower bound on the probability of a given key difference, using a simple Bernoulli distribution to model P(X | Y = y_i, Z = y_i) and P(X | Y = y_j, Z = y_i, i != j). Hopefully there will be interesting results.
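
As a sanity check on the relative entropy form, here is a small sketch under the planned Bernoulli model. The miss probabilities q_match and q_other are hypothetical placeholders; the point is that the exponent -n * ( D(P_x || P_{y_i}) - D(P_x || P_{y_j}) ) agrees with the log-likelihood ratio:

import numpy as np

def kl_bernoulli(p, q):
    # D( Bern(p) || Bern(q) ) in bits
    t = 0.0
    if p > 0:
        t += p * np.log2(p / q)
    if p < 1:
        t += (1 - p) * np.log2((1 - p) / (1 - q))
    return t

# Hypothetical miss probabilities: q_match when Y matches the sampled
# difference, q_other when it does not.
q_match, q_other = 0.45, 0.5

# x is n binary samples (1 = cache miss); p_x is its empirical distribution.
rng = np.random.default_rng(0)
n = 200
x = rng.random(n) < q_match
p_x = x.mean()

# Exponent from the relative-entropy form above:
exponent = -n * (kl_bernoulli(p_x, q_match) - kl_bernoulli(p_x, q_other))
# It matches the log-likelihood ratio log2 P(x^n | match) - log2 P(x^n | other):
llr = (x.sum() * np.log2(q_match / q_other)
       + (n - x.sum()) * np.log2((1 - q_match) / (1 - q_other)))
print(exponent, llr)  # these agree up to floating-point error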

Question for this week, which I got from our group meeting:

You are in a room with 3 light switches. Each of these switches controls one of three lights in an adjacent room. From your room, you can't see which switch controls which light. You are allowed to flip some combination of switches and then walk into the adjacent room to see which lights are on. How can you determine which switch controls which light while making only one trip to the adjacent room?

Sunday, November 11, 2012

A Simplified Model

Recap: We are trying to analyze the cache-timing side-channel attack, which is used to extract the key from AES encryption. As each plaintext is encrypted, the attacker can measure the time it takes the cipher to complete the encryption. Since AES encryption uses table look-ups, the algorithm needs to make accesses to the cache. Because there can be cache hits and cache misses, the values being looked up have some relationship with the time it takes to look up each value in the cache. We can use this information to extract the key.

There are many different ways an attacker can use the timing information. We will focus on one model at a time. The current model we will look at is the first-round model described in Bonneau's paper [1]. In this post, I will explain some ideas I have for simplifying this model in order to make the problem easier to analyze.

Now, since we are looking at the first-round attack, we will make the following assumption: if two look-ups in the first round access the same index, then the second of these look-ups is likely to result in a cache hit. Therefore, if two of the indices looked up in the first round are the same, the total time will on average be shorter than if all the looked-up values are different.

So let us specifically consider two different indices being looked up in the first round, say indices i_0 and i_1. If we know that i_0 = i_1, then we can figure out certain bits of the key: in the first round, each index is a plaintext byte XOR-ed with a key byte, so a collision reveals the XOR of two key bytes. What we want to figure out is whether, given timing data, we can infer if i_0 = i_1.
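
For concreteness, here is the arithmetic behind "we can figure out certain bits of the key", with made-up byte values (and ignoring cache-line granularity, which in practice limits this to the high-order bits of each byte):

# First-round AES table indices are i_0 = p_0 ^ k_0 and i_1 = p_1 ^ k_1,
# so a detected collision i_0 = i_1 implies k_0 ^ k_1 = p_0 ^ p_1.
p0, p1 = 0x3A, 0x7C      # plaintext bytes (known to the attacker)
k0, k1 = 0x15, 0x53      # key bytes (unknown)
assert (p0 ^ k0) == (p1 ^ k1)   # the collision we assume was detected
recovered = p0 ^ p1             # the attacker computes this...
assert recovered == (k0 ^ k1)   # ...and it equals the key-byte XOR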

The problem is that there will be other cache hits and misses from the remaining look-ups, so it is very hard to tell whether i_0 = i_1. What we can do, however, is model the times that come from the other look-ups as noise. We can then approximate the distribution of the timing data for one encryption, given that i_0 = i_1 and given that i_0 != i_1.

To do this, let us make certain assumptions about the system doing the encryption. We will suppose that the time for each encryption, excluding the time it takes to process the cache hits and misses, is the same every time. We will assume that each cache hit takes a fixed H time units to process, and each miss takes a fixed M time units to process (not true in hierarchical memory models). Let D = M - H. The total encryption time then varies in steps of size D, so in this situation the timing data tells us exactly the number of misses.
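
A small sketch of this fixed-cost model, with made-up time constants, showing that the miss count can be read off exactly from the total time:

# T = T_BASE + misses*M + hits*H, and with L total look-ups (hits + misses = L)
# this becomes T = (T_BASE + L*H) + misses*D, so misses can be read off exactly.
T_BASE, H, M, L = 1000, 5, 40, 160   # hypothetical time units
D = M - H

def misses_from_time(T):
    return (T - (T_BASE + L * H)) // D

# e.g. an encryption with 30 misses and 130 hits:
T = T_BASE + 30 * M + 130 * H
assert misses_from_time(T) == 30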

We know that there are 160 table look-ups in an AES encryption. So here we can model the number of cache misses as a binomial distribution, setting the probability of a cache miss to be P. Even though 160 is a small number, the binomial distribution can be approximated by a Gaussian. Depending on whether i_0 = i_1 or i_0 != i_1, we get slightly different distributions, and we can use these distributions to do hypothesis testing and to find the probability of error given a number of samples.
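
Here is a rough sketch of what that hypothesis test could look like. The two miss probabilities are hypothetical placeholders (a collision removes at most one miss on average, so they are close together), and the test simply thresholds the average miss count halfway between the two Gaussian means:

import math

L = 160                 # table look-ups per encryption
p1, p0 = 0.50, 0.49     # hypothetical miss probabilities: i_0 != i_1 vs i_0 = i_1

def Q(z):               # Gaussian tail probability
    return 0.5 * math.erfc(z / math.sqrt(2))

def error_prob(n):
    # Average miss count over n encryptions is approximately Gaussian:
    # mean L*p, variance L*p*(1-p)/n. Threshold halfway between the means,
    # assuming equal priors on the two hypotheses.
    mean0, mean1 = L * p0, L * p1
    sd0 = math.sqrt(L * p0 * (1 - p0) / n)
    sd1 = math.sqrt(L * p1 * (1 - p1) / n)
    t = (mean0 + mean1) / 2
    return 0.5 * Q((t - mean0) / sd0) + 0.5 * Q((mean1 - t) / sd1)

for n in (10, 100, 1000):
    print(n, error_prob(n))   # error probability shrinks as samples accumulate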

However, we made many assumptions in this model that may not correctly reflect how a system behaves in the real world. We will need to look into these assumptions some more.

Okay, here is a brain teaser for this post. This is a USAMTS problem from some years ago.

Suppose that you have 14 coins, each with a unique year written on it so that they can be distinguished. Only 7 of these coins are real; the other 7 are fake. Real coins weigh 1 gram and fake coins weigh 0.9999 grams. You only have a strange futuristic scale to weigh these coins. The scale compares two groups of coins and tells you which group is heavier. If the two groups weigh the same, the scale returns all your coins. However, if one group is heavier than the other, the scale still tells you which group is heavier, but then takes one coin uniformly at random from the heavier group and keeps it as payment. How can you use this scale to determine, and keep in your possession, one real coin?


References:
[1] Bonneau, J., & Mironov, I. (2006). Cache-collision timing attacks against AES. Lecture Notes in Computer Science, 4249, 201-215.

Saturday, November 3, 2012

Ideas to Think About


This week was fall break at Princeton (which coincided with Hurricane Sandy; it's amazing that Princeton can plan its academic calendar around natural disasters). The week before that, Eva and I met with Professor Ruby Lee and Professor Cuff to discuss some approaches to tackling this project. I will present some of these here.

From Professor Lee's perspective, there are four main questions we should be asking in this project:

1. How do we make the cache-timing side-channel attacks go faster?
2. What are the root causes that allow this attack to work?
3. How can we defend against these attacks?
4. What kinds of quantitative metrics can we use to measure the amount of information leakage? 

(If you don't know what the cache-timing side-channel attack is, see post below titled Project Description.)

The question that I would most like to answer for this project is the fourth one. (Eva and I also currently believe it is the easiest one to answer.) We can narrow question 4 down to the following: what is the equivocation of the key given a set of N timing samples?
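
For concreteness, equivocation here just means the conditional entropy of the key given the observations. A toy computation, with an entirely made-up joint distribution over 4 keys and 3 observable timing values, looks like this:

import numpy as np

# Toy joint distribution P(K, T); rows are keys, columns are timing values.
# All numbers are made up for illustration.
joint = np.array([[0.10, 0.05, 0.10],
                  [0.05, 0.10, 0.10],
                  [0.10, 0.10, 0.05],
                  [0.10, 0.05, 0.10]])
joint /= joint.sum()

p_t = joint.sum(axis=0)      # marginal P(T)
cond = joint / p_t           # P(K | T), column by column
# Equivocation H(K | T) = -sum_{k,t} P(k, t) * log2 P(k | t)
equivocation = -(joint * np.log2(cond)).sum()
print(equivocation)          # bits of key uncertainty left after observing T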

Professor Cuff gave us some ideas about how we might answer this. He suggested that for the attack, we assume that only the plaintext and the timing data are known. If we assume the attacker knows both the plaintext and the ciphertext, then given infinite resources the attacker can always figure out the key, which would make the problem degenerate. Professor Cuff also suggested that we look into ideas from hypothesis testing. In particular, he recommended using Sanov's theorem.

I am currently looking into some of the ideas Professor Cuff proposed. Looking at the problem through hypothesis testing makes sense, since for any variation of the cache-timing attack, the attacker at some point needs to decide between possible key distributions. Therefore, using already-developed theory in hypothesis testing, we may be able to find some kind of bound on the amount of information leaked.
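
For reference, Sanov's theorem says that if X_1, ..., X_n are drawn i.i.d. from a distribution Q, and P_x is their empirical distribution, then for a set of distributions E,

P( P_x in E ) <= (n + 1)^|X| * 2 ^ ( -n * D( P* || Q ) )

where P* is the distribution in E closest to Q in relative entropy. This is exactly the kind of exponential bound we would like to have on the probability of deciding on the wrong key distribution.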

If we analyze the first-round-attack variation of the cache-timing attack, then we can break the problem down by looking at only 4-bit segments (or nibbles) at a time. For each key nibble, we would only have to try 16 hypotheses, or 16 * 32 = 512 hypotheses across all 32 nibbles of a 128-bit key. This is much more tractable than testing hypotheses on the whole key at once, which would take 2^128 hypotheses.

Don't have a good brain teaser question for this week. I'll try to find one for a later post.

For those not familiar with cryptography, note that the plaintext is the input to an encryption system and the ciphertext is the output. The key determines what the ciphertext will be. The goal of most cipher attacks is to extract the key. Given the plaintext and the resulting ciphertext, the key is uniquely determined; however, calculating the key from just the plaintext and ciphertext is too computationally intensive to be feasible.