BlinkUp Protocol Reversing

Written 2018-04-06

Tags:Toymail Modem WiFi ElectricImp 

ElectricImp's BlinkUp protocol is an optical modulation protocol designed for low-speed data transfer using mobile phone displays or LEDs as transmitters. Since the transmitter runs on my phone, we can begin to reverse engineer the protocol strictly from the smali assembly extracted from the APK file.

To start, smali/com/electricimp/blinkup/'s createFromIntent() shows there are currently four different packet types sent over BlinkUp, with (packet type numbers used):

For each packet type, the packet coder is told the packet size, contents, then told the packet is complete, at which point a 16 bit CRC is appended to the packet.

Packet Bitstream Structure


Packet Header

The packet header is quite simple, a preamble of 8 times the byte 85. In binary, this creates an alternating pattern of ones and zeros. Usually, in modem design, a preamble such as this allows the receiver to notice that data is incoming, and to synchronize its receive clock with the transmitted clock. Next, the header byte 42(0b00101010) comes. At this point, the CRC accumulator is zeroed out. Finally, the packet length is appended, updating the CRC.

Packet Payload

This varies with the packet type, but starts with a 1 byte indication of the packet type, followed by a UTF8 string ending with a NULL terminator. For packets without a string argument, the payload is simply the packet type byte followed directly by a NULL terminator. Multiple strings can be encoded in a simple integer-to-string dictionary by appending more packet type, string pairs. Perhaps packet type is not the best descriptor, perhaps string type?


Luckily, the implementation uses a CRC table. This means we can take the lazy path to fingerprinting it by searching github for the strings of numbers we find in the CRC table until we find an identical table in somebody else's source code. In this case, the CRC is a CRC-16, The poly is 0x8005 (x^16 + x^15 + x^2 + 1)

Encoding options

No protocol would be complete without a bunch of random cruft tacked on over time as the details of whatever mobile phone you used change.

Baud Rate

There are options for encoding the bitstream at both half and two thirds speeds, I suspect to deal with faster and faster mobile phone displays.

Binary vs Trinary Encoding

In the simple case, bits are encoded with a black screen meaning zero, and a white screen meaning one.

But another option is the ability to encode in tri-level flashes. In this case, bit 0 is presented as black followed by grey, and bit 1 is presented as black followed by white.

Toymail Talkie Network Services

Written 2018-03-31

Tags:Toymail WiFi ElectricImp 

Toymail makes some pretty neat little IoT toys that let kids and parents send short voice clips to each other, or to other friends. @that_guy_ego recently brought on to SecKC to play with. I downloaded the Android APK and started disassembling it. Of note, the toys heavily leverage a technlogy from ElectricImp, which makes small IoT modules with WiFi. These are known as Imps. The Imp chip forms the majority of complexity inside the Toymail toys. Toymail runs an app on the Imp that implements the toy's behaviours and features.



During mobile app setup, the user is prompted for a wifi network and password. This is sent using ElectricImp's BlinkUp protocol. BlinkUp works like a simple modem, but instead of encoding data in sound, the setup information is encoded with light using the mobile device screen. Of note, the data flow is only to the toy, there's no response back. In addition to the WiFi network credentials, a unique token is generated and sent to the device. Once the toy connects to the network, it uploads the token, informing ElectricImp that it has a network connection, and ElectricImp informs the mobile app that the toy is now online.

The details of the BlinkUp protocol may be a future post

Firmware Updates

Firmware updates are served from, and appear to be encrypted. We haven't yet verified this, but I'm not surprised.

Imp->ElectricImp Communications

Each model of Imp uses a different DNS name to call home. For example:

The clients, Imps, may have very different TLS stacks limited in different ways. This allows ElectricImp to upgrade TLS independently, without breaking clients that may not support the latest TLS.

Interestingly, all of the above URLs require TLS client authentication. From Petasense, who also uses Imp modules, we see that they're used to identify each client to the server.

App->Toymail Communications

Interestingly, the Android app does not use TLS pinning, and instead relies upon the mobile platform to handle security.

The following shell script will dump the possible API endpoints once the app is disassembled:

grep -r https . | cut -d "\"" -f 2 | sort -u 


Future Work

Why Duffs Device is Awesome

Written 2018-03-11


Recently, I had the need to optimize some low level graphics code on a Cortex-M4, I ran into a need to issue a large number of consecutive stores, similarly to what Tom Duff once needed to do, but I needed to write a value to consecutive memory addresses instead of copying a large buffer into a memory-mapped I/O port.

I'm using GCC6.3.0 to count instructions, with 'arm-linux-gnueabihf-gcc-6 -O3 -mthumb -mcpu=cortex-m4 -S'

To express my problem in a sentence, write a function that takes a pointer to an array, a value to fill the array with, and a number of elements to fill, and fills the array with the value as efficiently as possible. This is quite similar to memset, though the array elements and value are the same type, and count will always be positive.

A simple loop approach:

void memset32(uint32_t * buf, uint32_t val, uint32_t count)
while( count-- )
	*buf++ = val;

This takes 4 instructions per value to fill. Ideally this number should be as close to 1 as possible. But it's simple and clean. Like what you'd see in OpenBSD.

A simple unrolled approach:

void memset32_unroll(uint32_t * buf, uint32_t val, uint32_t count)
while( count > 8 ) {
    *buf++ = val;*buf++ = val;
    *buf++ = val;*buf++ = val;
    *buf++ = val;*buf++ = val;
    *buf++ = val;*buf++ = val;
    count -= 8;
while( count-- )
    {*buf++ = val;}

By unrolling the loop 8x, this comes to 12 instructions per 8 values to fill, with 4 instructions per value for anything left after the unrolling. This has the downside of being slightly larger than necessary. Additionally, we could save an instruction from the unrolled loop by subtracting 8 and using the architecture's built-in compare with 0, bringing the expected total to 11 instructions per 8 stores.

An unrolled switch case

void memset32_naive_switchloop(uint32_t * buf, uint32_t val, uint32_t count)
switch( count ) {
    case 8:*buf++=val;count--;
    case 7:*buf++=val;count--;
    case 6:*buf++=val;count--;
    case 5:*buf++=val;count--;
    case 4:*buf++=val;count--;
    case 3:*buf++=val;count--;
    case 2:*buf++=val;count--;
    case 1:*buf++=val;count--;
        goto loop;
    case 0:return;

On ARM, this uses a table-branch to locate which label to use in the switch case, then executes unrolled stores until reaching 'case 1' and repeating the loop. Coincidentally, this exposes GCC bug 70359, and doubles the number of instructions per store! Outside of that, one downside is that two branches are used per loop, one for the 'goto'(or while loop if you prefer), and one for the switch-case.

Duff's Device

We should be able to do better than the above approaches. We want a single block of store instructions, we want most passes through the loop to consist of unrolled loops. We want the block of stores that doesn't fit in the unrolling to be done as quickly as possible, ideally without a per-store loop. And we don't want multiple branches for the majority of loops. Duff's Device gives us all of these.

void memset32_duff(uint32_t * buf, uint32_t val, uint32_t count)
uint32_t n=(count+7)/8;
    case 0: do{ *buf++ = val;
    case 7:     *buf++ = val;
    case 6:     *buf++ = val;
    case 5:     *buf++ = val;
    case 4:     *buf++ = val;
    case 3:     *buf++ = val;
    case 2:     *buf++ = val;
    case 1:     *buf++ = val;

Duff's device uses the count%8 to figure out how many stores would not have unrolled cleanly, and does them in the first pass through the loop by jumping into the middle of it. The compiler treats the remaining passes as a conditional branch from the end end of 'case 1' directly to 'case 0'.

On ARM32, if you ignore the GCC bug by combining STR and ADDS instructions, this comes to a total of 10 instructions per 8 stores, and is the shortest unrolled implementation.

Google your UUIDs

Written 2018-01-20

Tags:OURA Google UUID Bluetooth 

When reverse engineering a device, it is often times handy to Google any UUIDs you might find.

In my case, I otherwise would not have noticed that Antti works at OURA, because a github issue was posted with an OURA Bluetooth UUID(98ED0001-A541-11E4-B6A0-0002A5D5C51B) that is only present in the OURA APP and AES encrypted ring firmware.

In my particular case, this corroborates my suspicion that the following UUIDs:

belong to the OURA's bluetooth communications subsystem. This is trivial to prove if one owns an OURA, but as I do not, the github issue suffices.

Quarters In My Change Jar

Written 2018-01-01

Tags:Statistics Math 


To evaluate the feasibility of an upcoming project, I needed to know the approximate distribution of quarters in circulation in the local area. Upon filling my change jar, I filtered out the quarters and counted them. There were 102 quarters.

Trial was repeated 5 days later with 400 quarters from a local bank, then another 400 quarters and results were summed.

By Year

102 Quarters502 Quarters901 Quarters
ByYear ByYear502 ByYear901
By year the data is a little hard to digest.

By Decade

102 Quarters502 Quarters901 Quarters
ByDecade ByDecade502 ByDecade901
By binning the data into a point per decade we can get a clearer picture. I think there are a few possible effects here, but the data is so noisy I wouldn't claim anything:

Other work

Rhett Allain at wired has a similar piece up.