Marsaglia's XORSHIFT32 in PSIMD

Written 2018-08-03

Tags:PRNG XORSHIFT PSIMD 

Here's a simple C program using PSIMD to implement Marsaglia's XORSHIFT32 PRNG with four parallel generators.

#include "psimd.h"
#include 

PSIMD_INTRINSIC psimd_u32 psimd_lsr_u32(psimd_u32 a, uint32_t b)  {return a >> b;}
PSIMD_INTRINSIC psimd_u32 psimd_lsl_u32(psimd_u32 a, uint32_t b)  {return a << b;}
PSIMD_INTRINSIC psimd_u32 psimd_xor_u32(psimd_u32 a, psimd_u32 b) {return a ^  b;}

psimd_u32 xorshift32_4( psimd_u32 x )
{
    x = psimd_xor_u32( x,  psimd_lsl_u32( x, 13 ) );
    x = psimd_xor_u32( x,  psimd_lsr_u32( x, 17 ) );
    x = psimd_xor_u32( x,  psimd_lsl_u32( x, 5  ) );
    return x;
}

uint32_t xorshift32(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

#define print_psimd_u32( _x ) printf("%08x %08x %08x %08x\n", _x[0], _x[1], _x[2], _x[3] )

int main()
{
psimd_u32 state_start = { 1, 2, 3, 4 };
print_psimd_u32( state_start );

psimd_u32 state_vector = xorshift32_4( state_start );
print_psimd_u32( state_vector );

psimd_u32 state_scalar = { xorshift32(state_start[0]),xorshift32(state_start[1]),xorshift32(state_start[2]),xorshift32(state_start[3]) };
print_psimd_u32( state_vector );

return 0;
}

And we get the same results for both PSIMD and scalar execution

0x000000010x000000020x000000030x00000004Starting States
0x000420210x000840420x000c60630x00108084PSIMD States
0x000420210x000840420x000c60630x00108084Scalar States

With GCC6.3, -O3, assembly for the two algorithms comes out to the same number of instructions, even though one does four times the work.

xorshift32_4:
    .cfi_startproc
    movdqa  %xmm0, %xmm1
    pslld   $13, %xmm1
    pxor    %xmm0, %xmm1
    movdqa  %xmm1, %xmm0
    psrld   $17, %xmm0
    pxor    %xmm1, %xmm0
    movdqa  %xmm0, %xmm1
    pslld   $5, %xmm1
    pxor    %xmm1, %xmm0
    ret
    .cfi_endproc

xorshift32:
    .cfi_startproc
    movl    %edi, %eax
    sall    $13, %edi
    xorl    %edi, %eax
    movl    %eax, %edi
    shrl    $17, %edi
    xorl    %edi, %eax
    movl    %eax, %edi
    sall    $5, %edi
    xorl    %edi, %eax
    ret
    .cfi_endproc

PSIMD

Written 2018-08-03

Tags:SSE NEON ARM PSIMD Intel 

PSIMD is a set of Single Instruction, Multiple Data intrinsics portable across multiple CPU backends for GCC and Clang. These instrinsics map nicely to 128-bit SIMD units like Cortex-A ARM NEON, as well as SSE2 on Intel. It may also take advantage of SIMD units on other processors as supported by GCC and Clang, but these are the only ones I'm familiar with.

PSIMD is the first attempt I'm aware of to fully abstract the underlying hardware into a medium level compiler intrinsic. For another post, I'm working on replacing an example of Marsaglia's XORSHIFT PRNG using PSIMD, and I haven't yet had to worry at all about running on ARM or Intel.

Park Place Hotel Implosion

Written 2018-06-24

Tags:implosion destruction 

ParkPlaceHotelImplosion

UUID Smali Regex

Written 2018-05-31

Tags:BLE Bluetooth 

This is just a note so that I don't have to type it every time. But if you...
egrep -i "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
it finds UUIDs. I use it to search for BLE UUIDs in smali disassembly.

ANKR Bluetooth Tracker Hacks

Written 2018-05-31

Tags:BLE ANKR Bluetooth 

ANKR

ANKR makes a neat little device you can clip on your keychain and make your keys beep from your phone. You can also make your phone ring by pressing a button on the keychain device.

Bluetooth Low Energy (BLE) Hacking

Download the APK for the ANKR app, unpack it with apktool, and grep the smali disassembly for 'java/util/UUID'. com/bleon/ankrmanager/Bluetooth/NordicBasedDevice/NordicBeaconDevice.smali contains the following handy Bluetooth service and characteristic UUIDs: Funny enough, if we write a nonzero byte between 0 and 10(0xA) to the buzzer characteristic, the device emits that number of beeps. From this, it's trivial to develop an app or nRF52 based device that scans for the ANKR Custom Service and causes them all to beep.

Older