Introducing... RHex

Written 2011-04-05

Tags: Regular expression, Programming, Hex editor, Editors

What is RHex? Richard's Hexeditor. For now it is just a list of features I want in a hex editor. Eventually I would like to implement it, since every hex editor I have tried is missing something I need. Currently I use hexedit and codewright as hex editors. Here's what I want:
  • Side-by-side text and hex display (almost all editors have this)
  • Text regex (most have this)
  • Binary regex (some have this)
  • Replacing a section with one of a different size (few have this)
  • Asynchronous editing (if I insert a byte into a 7 GB file, I don't want to wait for the rest of the file to be rewritten)
  • libmagic support (none I know of do this; I also want a mode where each byte is treated as a possible start of a magic signature)
  • Scriptable file decoder (if I'm editing a JPEG, I want color-highlighted text for the different areas of the file)
  • Big file support (> 4 GB, but I'm not opposed to the 32-bit build only supporting files of 2 GB or smaller)
  • Variable-sized groupings (if I'm looking at a structure that consists of uint32s, I want the editor to reflect that), and...
  • Offset groupings (if I'm looking at a structure that consists of uint32s, but at an address not divisible by 4, I want the editor to reflect that)
  • Reverse-endian groupings (if I'm looking at a structure that consists of uint32s, I want the editor to be able to reverse the bytes in each group for user I/O)
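The last three items (variable-sized, offset, and reverse-endian groupings) are easier to picture with a small example. What follows is only a rough Python sketch of the idea, not RHex code: the function name `group_words` and its parameters are made up for illustration, and a real editor would do this per visible row rather than over a whole buffer.

```python
import struct

def group_words(data, start=0, size=4, big_endian=False):
    """Hypothetical helper: decode `data` as unsigned integers of `size` bytes.

    Grouping begins at byte `start`, so a structure that is not aligned to
    `size` can still be shown correctly. Returns (leading bytes, words,
    trailing bytes) so the leftovers can still be displayed as raw bytes.
    """
    code = {1: "B", 2: "H", 4: "I", 8: "Q"}[size]   # struct codes for uint8..uint64
    order = ">" if big_endian else "<"              # explicit byte order, no padding
    count = max(0, (len(data) - start) // size)     # whole groups that fit
    end = start + count * size
    words = list(struct.unpack_from(f"{order}{count}{code}", data, start))
    return data[:start], words, data[end:]

# Two uint32s starting at offset 1, followed by one stray byte.
sample = bytes([0xAA, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0xFF])
head, words, tail = group_words(sample, start=1)
print(head.hex(), [f"{w:08x}" for w in words], tail.hex())
```

Passing `big_endian=True` reads the same bytes with the opposite byte order, which is the "reverse-endian" view; writing user input back would use `struct.pack` with the same format string.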
To implement these features, I'd like to design RHex as a library for file editing. This library would be responsible for memory management and for either threading, asio, or nonblocking I/O. Hopefully that way you could build a GUI or console version on top of it without too much work. However, the lowest level has a lot of bookkeeping to do to allow asynchronous editing. I might want two modes: a 'safe mode', where the editor writes a new version of the file, and a 'live mode', where the editor edits the file directly, probably through mmap(). Either way, the library has to manage outstanding changes so that the editor always reads back whatever it wrote into the library, while the library works to get those changes synced to disk as quickly as it can.
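One way the "outstanding changes" bookkeeping could work is a piece table: the original file is never touched, new bytes go into an append-only buffer, and a list of pieces says which bytes come from where. This is a minimal Python sketch under that assumption, not a design I've settled on. The class name `PieceBuffer` and its methods are hypothetical, and the background sync to disk is left out entirely.

```python
class PieceBuffer:
    """Hypothetical edit overlay; `original` is only ever read, never modified."""

    def __init__(self, original):
        self.original = original      # bytes, bytearray, or an mmap object
        self.added = bytearray()      # append-only store for newly written bytes
        # Each piece is (from_added, start, length), kept in document order.
        self.pieces = [(False, 0, len(original))] if len(original) else []

    def __len__(self):
        return sum(length for _, _, length in self.pieces)

    def _split(self, pos):
        """Make sure a piece boundary exists at pos; return the index after it."""
        if pos < 0:
            raise IndexError("negative position")
        offset = 0
        for i, (src, start, length) in enumerate(self.pieces):
            if pos == offset:
                return i
            if pos < offset + length:
                cut = pos - offset
                self.pieces[i:i + 1] = [(src, start, cut),
                                        (src, start + cut, length - cut)]
                return i + 1
            offset += length
        if pos == offset:
            return len(self.pieces)
        raise IndexError("position past end of buffer")

    def insert(self, pos, data):
        if not data:
            return
        i = self._split(pos)
        self.pieces.insert(i, (True, len(self.added), len(data)))
        self.added += data

    def delete(self, pos, count):
        i = self._split(pos)
        j = self._split(pos + count)
        del self.pieces[i:j]

    def replace(self, pos, count, data):
        """Replace `count` bytes at `pos` with `data`, which may differ in size."""
        self.delete(pos, count)
        self.insert(pos, data)

    def read(self, pos, count):
        """Return the bytes the editor should see, with all pending edits applied."""
        out = bytearray()
        offset = 0
        for src, start, length in self.pieces:
            lo = max(pos, offset)
            hi = min(pos + count, offset + length)
            if lo < hi:
                store = self.added if src else self.original
                out += store[start + lo - offset:start + hi - offset]
            offset += length
        return bytes(out)
```

The cost of an insert depends on the number of pieces, not on the file size, so inserting a byte near the start of a huge file does not shift the rest of it. In 'safe mode' the library could stream the pieces out to a new file in the background; 'live mode' would need more care, because writing into the mapped original would invalidate pieces that still point at it.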

Tough stuff, but manageable. If only I had time to write it.