Tuesday, 7 May 2013

Speeding up CRC calculations for Bluetooth Low Energy

Over the past few days Mike Ryan has been working hard to cram as much of the Bluetooth Low Energy (BTLE) functionality as possible in to the Ubertooth firmware.  In doing so he plans to relieve the host system of the work involved in finding and processing packets.  In time this will allow Ubertooth to monitor and inject packets in to BTLE connections while running from a very low powered host, or possibly without a host system at all.

This has involved some excellent work using the CC2400 chip to automatically detect BTLE packets, a task which it is unfortunately unable to achieve for basic rate Bluetooth.  Once we know where a packet starts we are able to handle the packet data as a set of bytes rather than needing to break the data up in to bits before running through the whitening and CRC algorithms.

While Mike worked on the whitening algorithm, he set the CRC as an open challenge, which I gladly took up.  I thought that it may make an interesting post to explain how CRC algorithms are implemented and show how to trade off time for memory, or time for space complexity for the computational theorists among us, by using a look up table (LUT).  This may be common knowledge to many people and there are automated tools to achieve it, but I wanted to work it out by hand.

This part is, at least in part, for my own reference when I look at the code in a year's time and ask "who did that?  And how o we know it's correct?"

Linear Feedback Shift Registers

Linear Feedback Sift Registers (LFSRs) are often used for CRC checks, forward error correction or to generate pseudo-random data.  They are computationally cheap and simple to implement in hardware if required, so they are perfect for low cost networking chips.  Bluetooth uses them to implement data whitening, header error checks, CRCs and forward error correction on packet data.

The LFSR that implements the CRC on BTLE packets looks something like this:
The LFSR for CRC on BTLE packets as drawn by me.  See Vol 6, part B, Section 3.2 of the Bluetooth specification for a better, but non-free version of the diagram.
For simplicity we can imagine the LFSR as parts, a shift register and the feedback element, using XOR.  Each incoming bit of packet data is XOR'd with the right-most bit of the register, for consistency we'll assume that the bits are numbered 0-23 from left to right.  Bit 23 is XOR'd with the incoming data bit and becomes next_bit.  The register is shifted one bit to the right and next_bit is added to the end, becoming bit 0.  This is a shift register.

Now for the feedback part, each of those arrows feeding in to the top of the register represents a bit in the register that will be XOR'd with next_bit.  T\hat is all you need to know about LFSRs for most usese, in fact it should be trivial to implement one using the above information.  Here's our implementation of the above LFSR:

u32 btle_calc_crc(u32 crc_init, u8 *data, int len) {
    u32 state = crc_init;
    u32 lfsr_mask = 0x5a6000; // 010110100110000000000000
    int i, j;

    for (i = 0; i < len; ++i) {
        u8 cur = data[i];
        for (j = 0; j < 8; ++j) {
            int next_bit = (state ^ cur) & 1;
            cur >>= 1;
            state >>= 1;
            if (next_bit) {
                state |= 1 << 23;
                state ^= lfsr_mask;

    return state;

Optimising the LFSR

As you can see, we run through the inner loop for each bit of data, although we only perform the XOR if we next_bit was set.  This is a very small optimisation that makes use of the shift operation filling with 0s and the fact that XOR with 0 would have no effect.  Logically this process looks a little like this:
The LFSR split in to a shift and a feedback, or XOR, component.
The diagram above shows the two stage LFSR, with the second stage containing the different masks to be XOR'd with the register depending on the state of next_bit.  This is a two value look up table holding 24 bits od XOR mask.

If we can shift then look up the XOR for one bit, why not more?  As long as we shift by the appropriate amount, the XOR result only relies on the incoming data and the state of the register.  Even better, there is no feedback in to the lowest byte of the register, so early bits in an incoming byte don't affect the value of later bits.

Working with Bytes

Taking a byte of input data, we first XOR it with the lowest byte of the register to get next_byte, then we shift the register to the right by a byte and append next_byte.  This takes care of the shift.

To finish off we need to apply the eight XOR masks based on the content of next_byte.  As the register is shifted for each bit, the masks are XOR'd together with each successive mask shifted by one bit, this is shown in the diagram below.
The final mask is produced by XORing the mask for each bit of next_byte.

The derived mask is specific to the next_byte value of 01101101, so we are able to store it in a table and retrieve it for future use.  If we do this for all 256 values of next_byte we can build a full look up table, and use it to calculate the CRC.

The following code implements the CRC using a LUT:

u32 crcgen_lut(u32 crc_init, char *payload, int len)
    u32 state
= crc_init;
    int i;
    u8 key;
    for (i = 0; i < len; ++i) {
        key = payload[i] ^ (state & 0xff);
        state = (state >> 8) ^ crc_lut[key];
    return state;

The LUT itself consists of 256 32bit values, so is too large to reproduce here, but it can be found on Github.

While it is possible to write code to that builds the LUT from shifted masks for each value of next_byte, it was easier to use the known good implementation of the CRC algorithm given earlier to provide the final state of the register for all one byte payloads and then XOR it with the pre-mask state, as shown below.
The XOR mask for each key is calculated and then stored in the LUT.
After looking at the code, Michael Ossmann pointed out that leaving the key byte blank while building the LUT would yield the same result and avoid a pointless shift operation in the final algorithm.  It seems that no matter how nerdy you try to be, someone will out-geek you.

Thursday, 7 February 2013

Motivating the Problem

One of the most difficult aspects of talking to people about Bluetooth packet sniffing is what my university supervisor called "motivating the problem". What he meant by this was trying to convince others that the problem which you were trying to solve was really as hard as you know it to be.
Sniffing Bluetooth is Hard
Over the past five years we have dedicated a lot of time to motivating the problem when we give presentations on Bluetooth security, often resulting in glazed looks from some audience members. This post is intended to go some way towards explaining the challenges facing our project and to encourage anyone interested to participate.

Bluetooth packet sniffing falls foul of the motivation problem because it is so often compared to other wireless protocols that appeared at around the same time, such as 802.11 and Zigbee, which had promiscuous packet sniffing solutions available, using commodity hardware, soon after their release.

Another reason that Bluetooth sniffing is hard to discuss is the set of terms that need to be defined before we can even begin to describe the problems involved. At a minimum, the following are useful to know before entering in to a discussion about Bluetooth packets:

Piconet - A personal area network with one master device connected to potentially many slave devices.
Master device - The device that defines a piconet, often but not always the "smartest" device, e.g. a PC or phone.
Slave device - The device being connected to the master, e.g. a keyboard, mouse or headset.
Bluetooth Device Address - A 48 bit unique device address, usually shown in the same format as IEEE 802 MAC addresses and issued from the same address space. It is common for smartphones to have consecutive Bluetooth device and wifi MAC addresses.
NAP - Non-significant Address Part. The first two bytes of the device address.
UAP - Upper Address Part. The third byte of the device address. Forms the organizationally unique identifier when combined with the NAP.
LAP - Lower Address Part. The lower three bytes of the device address, assigned by the manufacturer.
CLK27 - A 27 bit counter that increments 3200 times per second and wraps in slightly less than 24 hours. Every device maintains an internal clock value, although we are mostly concerned with the master device's clock. Often referred to as "the clock".
CLKN - The upper 26 bits of CLK27.  This is a clock that ticks 1600 times per second, once per packet "slot".
AFH Map - Adaptive Frequency Hopping allows Bluetooth connections to avoid using noisy channels, such as channels that overlap nearby wireless networks. The map specifies which channels are available for a given connection.

The feature of Bluetooth that makes packet sniffing so hard is frequency hopping. Originally designed to ensure robust connections, it causes more problems and confusion for packet sniffers than any other feature. The pseudo-random hopping sequence that all devices within a piconet share is determined by the LAP, UAP and CLKN of the master device. To have any chance of extracting useful data from a Bluetooth connection we need to know these three values.

The situation gets even worse for encrypted links. To have any chance of sniffing the pairing process, and using the extracted data to find the pin (see: http://www.eng.tau.ac.il/~yash/shaked-wool-mobisys05/index.html), we must know these three values before the target devices begin to communicate.

One solution to this is to begin sniffing all traffic in a target piconet on the assumption that a new device will be paired with it in the future. In many ways it is good that this is not a practical attack vector, however it makes research and investigation in to Bluetooth authentication and encryption a harder task.

Devices that support Bluetooth v2.0+ also support adaptive frequency hopping (AFH), which adds an additional variable to the list that we have to find before we can monitor a connection.

The Ubertooth tools passively monitor each channel in turn to find piconets and build up information on the LAP, UAP, CLKN and AFH map, this behaviour can be seen in the Kismet plugin.  Ubertooth-follow is the one exception to this method as it uses a Bluetooth dongle to acquire the values that the Ubertooth needs to follow a hopping pattern.

Hopefully this has provided a crash course in Bluetooth packet sniffing for anyone who wants to get involved or try out the Ubertooth tools.  We're working hard to improve the amount of data that we are able to collect as well as adding features such as packet injection, Bluetooth Low Energy support, integration with external tools such as Wireshark and Kismet and support for more low cost embedded platforms such as ARM (raspberryPi, BeagleBone) and Android.

If you would like to get involved in the Ubertooth project, you can join the discussion in #ubertooth on irc.freenode.net or join the Ubertooth mailing list.