ALSR function pointer addresses for PRNG seeding

Written 2020-12-26

Tags:ASLR PRNG C 

On a recent /r/c_programming post, someone noticed that srand(time(NULL)) would duplicate libc random number generator initialization whenever two programs ran it within the same second. /u/jujijengo commented that you could instead use srand(time(NULL) ^ malloc ^ printf), which on ASLR-aware operating systems should throw in some bits related to the addresses of dynamically linked functions like malloc and printf. I thought this was a pretty slick idea, but something bothered me about using two functions that could be in the same library.

As an aside, you should always evaluate random number generators against your application requirements. libc srand()/rand() aren't great, but today we'll focus only on reducing the number of repeated seeds at the same time on ASLR-aware operating systems.

On 64-bit ASLR systems, the address of a library should have at least a few random bits (How many? How random? Up to the OS and address space) that tend to vary between program executions, so xoring them with the time should help.

But, the addresses of malloc and printf usually aren't independently random - at least on Debian Buster, they're both provided by GNU libc. Effectively, the contiguous pages of libc's code will be placed at a random location, but the following are static across process creations, even with ASLR:

Test Program

A simple C program with the following snippet of code can be executed many times to output ASLR-based libc function addresses:

printf("%p %p %" PRIxPTR" %" PRIiPTR "\n",malloc, printf, (uintptr_t)malloc^(uintptr_t)printf, (intptr_t)((uintptr_t)malloc - (uintptr_t)printf));

Results on Debian Buster on AMD64

Here are some statistics on 100k runs on a 64-bit Debian Buster Intel laptop from the snippet above:
number of samples in run:100000
unique locations of malloc:99981
unique locations of printf:99981
unique results of malloc xor printf:70
up to 10 most common counts and offsets between malloc and printf:
 100000 179696
up to 10 most common counts and values(hex) of malloc xor printf:
   3156 7c630
   3222 1d4630
   6185 74630
   6239 d4630
   6243 6c630
   6324 5c630
   6334 3c630
  12409 2c630
  12457 54630
  12479 34630

Offset between malloc and printf is probably fixed

This makes sense because if malloc and printf are in the same text section of libc, we would expect the dynamic linker would always locate into contiguous memory pages.

ASLR can repeat libc load addresses on a 64-bit machine

"ASLR: How Robust is the Randomness?" by Ganz and Peisert suggests 64-bit ASLR can use 48-bits of random virtual address, which seems reasonable, since it must be less than 64-log2(PAGE_SIZE), which is 52 bits on this system. Also, at least on this system, the upper four bits of text addresses in libc appear to always be 0x7.

But even assuming the ASLR implementation uses a good PRNG, there's still a birthday problem limitation. But none of the online birthday attack calculators I tried support computing the probability of collisions between 2^48 unique locations over 100000 samples.

Whatever the cause, we do see that over 100k program invocations, repeated addresses do occur, but rarely. It's not always 19 times per 100k, that's just what happened this run.

ASLR repeats malloc xor printf much more often than either function independently.

This makes sense to me, as any time the upper bits of both function addresses are the same, xor will zero them, and this is where most of the randomness of ASLR lives.

ASLR repeats malloc xor printf more seldom than I thought it would.

Initially I had worried was that malloc^printf would evaluate to a constant - this was not quite correct - it seems to depend on the offset between them. If malloc and printf are located in the same virtual address page, then malloc ^ printf's upper ASLR bits would always cancel to zero and the lower bits would represent only xor of the offset differences.

At least on this system, malloc and printf at around 175 kilobytes apart, so malloc ^ printf is a mix of behaviours. The upper bits are still zeroed. The lower bits still represent the xor of the offset differences. But the middle bits change whenever the upper bits of the offset differences overlap the lower bits of the page address differences.

Conclusions

Using ASLR addresses to scavenge a few noisy bits can work for simple PRNG seeding, but care must be taken that each address is independent. /u/jujijengo suggested Melissa O'Neill's approach in the PCG entropy fallback generator, which uses time(NULL) ^ current_function ^ address_of_a_stack_variable if /dev/random is not available, which should work well, since stacks and text-segments are always separately locatable by ASLR.