Several years ago, I spent a serious chunk of time figuring out how to make really teensy ELF executable files. I started down this path because I was annoyed that all of my programs, no matter how short they were, never got smaller than 4k or so. I felt that was excessive, for C, and so I started looking at what ELF files contained, and how much of that actually needed to be there. (And then, after a while, how much of it was supposed to be there but could be ripped out anyway.) Anyway, I eventually managed to shrink an executable down to 45 bytes, and I was able to demonstrate that that was the smallest possible size an ELF executable could be and still run, under x86 Linux at least.
I wrote up my findings, and some people found it interesting, and I got some positive feedback. A couple of people naturally pointed out that a shell script that did the same thing was much shorter than 45 bytes, to which my response was always that a shell script is not an executable, and if you want to consider scripts then you need to include the size of the interpreter binary along with the script size.
But then one Internet Random Person™ pointed out that I could have made a smaller executable if I had created an aout binary instead. If you don't know what aout files are, don't worry — that just means you're not old. (They are also called "a.out files", but that can be easily confused with just a file named "a.out", so I prefer to spell the name of the format "aout".) At one time the aout format was widely used on Linux, because it was Linux's only binary format. ELF was not introduced to Linux until version 2.0 (or more precisely in one of the 1.x experimental kernels). aout was, and still is, a very simple format. It sports a 32-byte header, along with a handful of other metadata. Unfortunately the aout format has some annoying limitations around dynamic linking, so the fact that Linux switched away from it early on is not too surprising. ELF is a much nicer format for a mature system. But even though aout binaries were no longer fashionable, they still worked.
However, when I first tried to run the aout executable the person had sent me, I got an "Exec format error" message — i.e. this file format is not supported. It turned out that a security issue had been uncovered at one point that involved aout core dumps. (Did you know? Executable file formats come with their own core dump file formats to match.) I'm vague on the details, but as a result most distros started compiling their kernels without aout support. The format was considered to be pretty thoroughly deprecated by that time, so it wasn't seen as a difficult call to make.
(Support for aout is still present in the kernel source tree, however, and you're free to include it if you compile your own kernel. At the time of this writing there was talk about removing it entirely, but apparently some architectures still have a use for it. Some people have suggested removing support just for aout core dumps, but in the absence of a pressing issue it remains as it is. The whole conversation is a good reminder that adding a feature to software is often far easier than removing it.)
But so I did compile a kernel with aout support, and verify that the 35-byte binary did in fact work. And the whole thing got me to wondering, how many executable file formats does Linux actually support? I looked into it, and I found out that the way in which Linux handles binary formats is a dynamic feature of the kernel. That is to say, it's relatively straightforward to add support for a new format, without having to recompile your kernel, or reboot your machine even.
.jarcan automatically invoke the JavaVM for you. That feature is controlled via the
/proc/sys/fs/binfmt_miscsystem, so check out the
binfmt_miscdocumentation if you're curious. Interpreters are not what I'm interested in here, though; I'm focusing exclusively on actual binary files.
What I was secretly hoping to discover was if a "flat" format existed — that is, a binary file format with no metadata at all. Obviously, such a format would allow for even smaller executables. No such format was supported, however. But that isn't too surprising, as such a format isn't very useful. Where there is no metadata, there are no features, no options. A flat format is a one-size-fits-all approach, and that's not what most people need from their binary format standards.
Despite such shortcomings, it so happens that there is a flat,
metadata-less executable file format that is well supported on another
popular OS. If you haven't already guessed, I'm referring to the
.com file format that MS Windows supports, having inherited it from
MS-DOS, which in turn inherited it from CP/M. It is truly flat. When
you run a
.com file, the OS loads the whole thing in memory at a
standard address and runs it. And this approach works okay on a
single-tasking system like MS-DOS. (Or rather, on the MS-DOS-like
subsystem that MS Windows presents to a running
.com file.) In that
environment, the OS can say, "Here you go, program. You have 640
kilobytes of RAM. Have fun!"
And so naturally I asked myself, what would it take to get a
file format working under Linux? I mean yes it would not be terribly
useful … but it would let me make the smallest executable file ever.
Sure, I could never hope to get support for such a format added to the
actual Linux kernel. But I could add it to my own kernel, where at
least I would be able to use it myself. I could be living in my own
Linux makes it easy to do this sort of thing via loadable kernel modules. What exactly is a kernel module? Basically, a kernel module is an object file built for the kernel that just hasn't been linked yet. The trick is that you can link one into a running kernel without having to stop the kernel or even pull over. (This isn't quite the same thing as what is generally meant by "dynamic linking", though it is very similar in spirit.) Kernel modules mainly allow users to manage support for various kinds of hardware dynamically, but they allow you to add support for all kinds of things without having to recompile the kernel. So let's take a moment and look at how to create a kernel module.
Before we get started, we need to make sure that the kernel's header files are present on the machine. For users on Debian-based systems, this is typically done with the shell command:
uname(1)is used to ensure that you're getting the files that match the specific version of the kernel you're currently running.
This will install a directory under
/usr/src corresponding to the
your current kernel version. (In fact, you may find that you already
have several directories located there, one for every version of the
kernel that you can currently boot into.) This directory contains,
among other things, all of the header files that we'll need to build
We'll start with a very simple one, predictably named "hello kernel".
A quick rundown of what is being done here:
|The header file |
|These macros just insert some metadata into
our module. You can view this information via the |
|And then there are two special functions,
the init function and the exit function. The |
The kernel folks have made it extremely easy to build kernel modules. Here's the makefile:
All you do is put the name of your object file on the first line, and everything else is done for you.
So, if we run
make(1), a bunch of stuff will get created:
hello.o, a regular object file, but we also have an object
hello.ko. This is the kernel module. We have become
insmod(8) tool can be used to load this module into the running
When we type
lsmod(8) to list the active modules, it's right there at
the top, as the most recently added module.
The actual effect of this module is simply to output some log
messages. We can use
dmesg(8) to verify that our module really did
We can then use
rmmod(8) to remove the module from the running
kernel whenever we want:
So that's all there is to writing kernel modules.
No, of course that's not true. Writing a useful kernel module does
require some specialized knowledge. For example, kernel modules don't
have access to
libc itself is mainly an abstraction
layer that sits atop the kernel. That said, much of the functionality
you're used to having easy access to as a C programmer is also present
in the kernel, though sometimes in slightly different dress.
(Filesystems, for example, are one bit of detail that we as Unix
programmers often ignore, but are obviously a major concern inside the
But don't let all that deter you. The rewards of writing kernel
modules are worth the inconveniences. There is a great deal that you
can do inside a kernel module that is simply impossible outside of it.
Remember, Linux is a monolithic kernel — which means that once you
are loaded, you have the keys to the kingdom. Your lowly, nonstandard
kernel module can do anything. Of course, this is a double-edged
sword, because that also means that you can accidentally do
anything. As it happens, a lot of bad things are actually pretty hard
to do accidentally, such as mucking up some other process's code.
You'd need to jump through a few hoops just to get a pointer to
someone else's memory. But some unfortunate things are remarkably easy
to do. For example, one time while working on my kernel module, I
--i instead of
++i in the iterator of my
loop. I inserted that module into my kernel to test it, and my mouse
cursor disappeared, and my music stopped playing … and then it was
time to reboot my computer.
But that sort of risk shouldn't scare you away. With modern journaling file systems and the like, you're never going to be at any real risk of losing data. (I mean, unless you're actually working on implementing a filesystem, in which case please back up your files regularly.) I encourage you to experiment and try out your own ideas for kernel modules.
All right, but what exactly does this have to do with making smaller executables? Well, as I mentioned earlier, the kernel's list of accepted binary file formats is dynamic. Specifically, this means that there are functions inside the kernel that allow code to add and remove binary formats from this list.
This is done by registering a set of callback functions, and these callbacks get invoked when the kernel is asked to execute a binary file. The kernel invokes the callbacks on this list, and the first one that claims to recognize the file takes responsibility for getting it properly loaded into memory. If nobody on the list accepts it, then as a last resort the kernel will attempt to treat it as a shell script without a shebang line. And if that doesn't fly, then you'll get that "Exec format error" message described above.
\n" near the top can produce some odd-looking error messages if you try to execute it.
So we find ourselves in possession of the following facts:
Obviously, there is only one possible response to this situation.
We wish to write a kernel module that implements a flat, metadata-less binary file format for Linux. So, that's what I did.
Unlike our first kernel module, this one is actually doing some interesting work. So let's take the time to walk through this code and understand what's going on.
|Our very-short init function just calls
|The important fields of the |
|And this function is where all the work gets
done. It will be invoked by the kernel every time someone is
attempting to execute one of our files, and its purpose is to get the
file's contents into memory and running. The function is passed a
single argument, |
Recall how a program is launched under Unix: first the
call is used to duplicate the process, and then the
system call replaces the process's current program with a new one.
forksystem call is not quite the same thing as the
fork()function supplied by
libc, although the latter is just a thin wrapper around the former. Similarly,
libcprovides a family of seven different "exec" functions, but they all ultimately invoke the
The nice thing about this system is that we never have to worry about
actually creating a process from scratch. That's done for us. Every
program's process is a copy of pid 1, duplicated through a succession
forks. Our callback will instead be invoked during the
system call. In effect, when the kernel calls us, it is asking, "Hey,
I've got a file here. The user claims it's an executable binary, but
it's not an ELF file. Do you want to deal with it?" Every callback
function that has been registered with
called, in order, going down the list, until someone takes
responsibility for the file.
So that's the first thing our callback function needs to do: it needs
to decide whether or not this is actually a
.com file. Which raises
the obvious question: how do we even do that? Most binary formats
looks for a magic-number identifier in the first few bytes of metadata
— but we have no metadata. So then what?
Well, how does MS Windows identify
.com files? Answer: it looks at
the filename. When you try to execute a file with a name ending in
.com", that's all MS Windows really cares about. "Oh, you're a
.com file, are you? Okay: here's 640k and an interrupt table. Call
me when you're done."
|So that's what we do, too. One of the
fields of the |
All we have to do now is load and run the file. No pressure, right? Luckily for us, the kernel provides lots of functions that will do almost all of the heavy lifting for us. We just need to oversee the whole process. So let's quickly run down the sequence of events.
|The very first thing we do is call
|Personality is an obscure feature that allows certain behaviors of the kernel to vary on a per-process basis. For whatever reason, it's not reset by the flush.|
|At this point we are now cleared to start
defining our memory image, which is currently very empty. So the first
thing we want to do is determine how big the file is, since that's
also the size of program. Inside the kernel, we don't have the
familiar file descriptors. Instead, we have file objects. As you might
expect, the |
|We aren't setting up anything else that
processes typically contain, because we're just so down to earth like
that, so we can go straight to calling |
|And now that that's official, let's
actually load something into memory. Yes folks, it's finally time to
|We now call |
|The function |
|And then, at last, we call
Phew. It's definitely not trivial, the process of setting up a process. But as I said, all of the real work is done by other code.
Now in order to actually put this kernel module to the test, we'll need a program to execute. Specifically, we need to create a binary file in our flat, metadata-less format. One that actually does something.
The down side of creating our own binary file format is that none of our usual tools know anything about it. If we want to build a program in this format, we're on our own here. But, our format is so utterly simple that this shouldn't be hard. However, it does mean that we'll need to use assembly code.
As is traditional, our minimal test program will be one that exits
with a status code of 42. In order to make a system call under 64-bit
Linux, we need to set
rax to the system call number, and
the (first) argument, and then use the
syscall instruction. The
exit system call is assigned the ID number 60, so this should be all
bin format is
nasm's name for its flat binary output format,
so what we get in our output file is nothing more than the assembly
code that we specified.
Our project has borne fruit. Behold: it works, and it's twelve bytes in size. And we can verify that it is, in fact, our kernel module that is actually loading and running it:
This delightfully unadulterated binary file is almost a quarter the size of the smallest possible ELF executable, and less than a third the size of the aout executable that inspired this (admittedly ridiculous) exploration. And with zero bytes of overhead in our file format, we can be confident that no binary using a format that includes metadata can touch this one.
Although, of course, if we're going to start crowing about the size, then we should probably go ahead and use the smallest possible instructions.
rdi can be initialized in only three bytes of machine code, and
rax can be initialized in even less, thanks to the fact that it is
pre-initialized to zero.
Seven bytes. Seven!
To be clear, this is very much a nonstandard binary, and therefore it in no way invalidates or supplants my 45-byte ELF executable (or the aout executable). But it does make me very happy.
We should try writing a few more programs, just to verify that our kernel module really does work in general. Let's try a proper hello-world program.
More assembly, yes, but it's very straightforward for assembly. It compiles down to a 43-byte binary, and it does work:
By using shorter instructions, we could reduce this program to 35 bytes, perhaps less. I will leave that as an exercise to the interested reader.
As long as our programs only use a fixed amount of data, we can allocate space by just adding it to our binary file. If we need to allocate space dynamically, however, then that's going to be a problem. Why? Because our processes don't have a heap. Why not? Because we didn't set one up in our loader. Oops.
Let's address this oversight by going back to our kernel module and adding a few more lines of code. It's actually pretty easy. We just need to let the memory manager know what we want.
Most programs use a memory layout that looks something like this:
The sections are frequently broken up for the purpose of providing
different access rights. Code sections are marked executable but not
writeable, and the other sections are marked writeable and not
executable. The code and data sections have a constant size, while the
heap and the stack change in size as the program runs, with the heap
growing upwards and the stack growing downwards. (And if they meet in
the middle, then you've run out of memory — although on a 64-bit
machine you'll run out of physical RAM long before that point.) For
historical reasons, the end of the heap is called the program break,
brk system call can be used to move it around.
So let's tell the memory manager that we want our processes to enjoy the benefits of a heap:
The only measurement we have available to us is the file size, which
we're using to determine the size of the code section. The
PAGE_ALIGN macro rounds a value up to the next page boundary. Since
we can't allocate a fractional number of memory pages, we can take
whatever padding we'll get at the page's end, and let that be our data
section. Our heap will then be located directly following this. It
starts off with a size of zero, which the program can then expand as
(There's no reason why we couldn't define a larger data section for
our binaries, by the way. We would just need to hard-code a size for
it. Perhaps, to stay true to the MS-DOS roots of our format, we should
allocate a data section of 640k. But this requires making a second
vm_mmap() call to allocate non-file-backed memory, so I've chosen to
punt on that modification for the moment.)
After memory has actually been allocated, we will set the process's program break to its starting value. Our process should now have a functional heap that the program can dynamically modify.
In order to test this new feature, I've written an implementation of
cat, one that reads all of standard input into memory before doing
any output. It's not an interesting program beyond being a basic
demonstration of low-level heap allocation, so I won't go into it. If
you're curious, you can see the source here:
cat.asm. For now, we'll just note that it
succeeds at allocating memory at runtime:
/etc/mailcapis a text file well over 64k in size.
This version of
cat is not really a
cat utility, however, as it
can only read from standard input. A proper
cat program — and,
indeed, a majority of proper programs — will need to be able to open
files named on the command line. So how do we access the command-line
Frankly, we can't, at least not as things stand. You see, when an ELF
binary runs, it has values for
envp placed at
the top of its stack. Surprise! Those values are put there by the
loader. Yes, this issue is also the responsiblity of our kernel
So let's add support for this, too.
To be sure, the strings that make up the command-line arguments are
present in our process's image — specifically, they're sitting in
memory just above the stack. (Which means, given that the stack is
currently empty, that the
rsp register currently points to them.)
But this is no neat array of string pointers. It's just a lot of
strings, one after the other. A string of strings, if you will. Worse,
the environment variables come immediately after the command-line
arguments, without any indication of where one set ends and the other
begins. So they aren't really usable, as they stand.
At the absolute least, a program needs to know how many of the strings
are in each set. Well, it turns out that our kernel module has exactly
that information. In the
linux_binprm struct (provided via the
argument to our callback function, remember) there are two fields
envc. These are the number of command-line
arguments and environment variables, respectively. In theory, if we
transmit these two values to the running program, that would be enough
for the code to safely access the data. Of course, if that's all we
did, then every
.com program would need to trawl through their
string of strings, to determine where each item begins and ends. We
could just accept that as a fact of life for programmers using our
format, but since we're doing this work anyway, why not take the time
to do it right? We should provide our processes with
arguments — neat arrays of pointers to the strings in question —
like all the cool binary formats do.
Since these two arrays don't currently exist, we'll need to reserve
some memory for them. It may feel intuitive to tap our newly-minted
heap, but for this it actually makes more sense to just take it off
the top of the stack. (In fact, these arrays can be seen as forming a
sort of zeroth stack frame.) The top of the stack is at the top of
memory, which is stored in the
linux_binprm struct in the
p. So we want to build arrays in the memory
immediately preceding this address, and then move the top of the stack
down to precede our arrays.
Note, however, that we cannot use familiar functions like
to walk through these strings. Why? Because the memory holding these
strings isn't owned by the kernel; it belongs to the process itself.
So far, we've only been dealing with addresses in the process's
memory. We haven't tried to access that memory, except through other
functions, such as
vm_mmap(). It can thus be easy to forget that
kernel memory and user memory exist in two different address spaces
(not to mention different permission rings). The kernel is allowed to
access user memory, of course, but it needs to be intentional about
it, and this requires some extra work.
Within our kernel module, we use the
__user annotation to declare
pointers to user memory. And instead of using the
* operator to
dereference such pointers, we have two special macros:
read through a user pointer, and
put_user() to write through one.
And the kernel provides a handful of convenient functions, like
strnlen_user(), that will operate on strings stored in user space.
Once we have gone through the strings and populated our two arrays,
we'll still need to communicate the values for
envp to the program. The usual way to do this is to place them on
the stack, allowing the program to access them at its convenience.
So let's add all this to our kernel module. We'll start by defining a separate function to handle the work of walking through the strings and building the two arrays.
There's a fair bit of casting in this function because the kernel
tends to store user addresses as
unsigned long values. This may
sound counterproductive, but it's somewhat natural given that kernel
code rarely dereferences such addresses. But our function wants to
work with them as pointer types, in order to take advantage of pointer
arithmetic. (We also have to cast
argc into a pointer while we
briefly pretend that the stack is an array.)
With this function in our code, we just need to use it at the appropriate time, after the stack has been created and its address has been finalized.
In order to verify that all of these changes do in fact work, we can
write a couple more program,
env.asm. Again, they aren't particularly
interesting in the details, so I won't dissect them here. But if
you're at all familiar with reading x86 assembly, they should be
relatively straightforward to understand.
At this point, we have created a binary format for the Linux kernel that functions without metadata. Writing code for it is a bit of a pain, though — we have to write everything in assembly, and none of the standard tools work with our format.
Well, it so happens that there is something that we can do about that.
With some investment of effort, we can coax our familiar tools into
.com binaries, allowing us to use things like C compilers
once more. There's a number of steps to the whole journey, however, so
I've decided to put the gory details in a separate appendix, and I
encourage you to peruse it if you are at all curious.
Click here to check out the appendix.
But for now, I don't want to be sidelined by meandering distractions like "usability". The focus of this essay, after all, is using kernel modules to let us produce working binaries that are really teensy. We have already produced a valid seven-byte executable file, and it is undeniably a thing of beauty. But a question immediately presents itself.
Could it be even smaller?
Well, we can't shrink the program itself down any further. It's as
small as it can get. But maybe we could get by with a simpler program,
if we changed the binary loader a little bit. Nothing too ridiculous,
mind you. But I'm thinking … what if our binaries could just
automatically exit when they came to the end, instead of forcing the
programmer to use the
exit system call? I mean, a majority of
programming languages work that way, right? If a Python program makes
it to the end of the file, it just quietly exits. Could we make our
binaries do that as well?
We absolutely can. What we would need to do is append a few extra bytes of machine code to the end of our file image. This code will only be executed if the program would have crashed otherwise, so one could argue that this is a purely beneficial modification to our binary format.
However, it does mean that the code size will now be larger than the file size. This alters the addresses of how we lay out memory, which could mess with programmers' assumptions, given that programs are generally written in low-level assembly.
Rather than calling this a minor feature upgrade, I therefore feel
compelled to fork the code base at this point, and give this file
format a new name, so as not to muddy the until-now transparent ABI of
Originally, I was planning on calling this new, extended format "dot-e-com", or "dot-x-com" or something equally trite. However, after some reflection, I decided to be less boring and instead I called it the "keep-calm" (and carry on) format. This name refers naturally to the fact that you no longer have to fret about setting up mandatory instructions for exiting. Instead you can just carry on, right up to the end of the program.
However, I decided I didn't like using
.calm as a filename
extension. It was too easy to confuse with
.com when said aloud. So
I instead chose to adopt the Unicode character for a crown as the
.♚. (This is U+265A, one of the Unicode chess piece
Now I realize that some people may find it a bit controversial to use a non-ASCII symbol in a filename extension. But hey, the Unicode Consortium has gone to a great deal of trouble to standardize thousands upon thousands of glyphs, and we programmers continue to draw upon the same 100 or so characters for our symbols, keywords, and filenames. All these others are just sitting around, neglected. We should start using them more. I mean, why not? Heck, put emojis in your filenames. I'm not your dad.
(If for whatever reason you don't trust your compiler to properly
handle Unicode characters, you can instead assume a UTF-8 environment
and write the second argument to
Anyway. The machine code that we want to append to the program can be squeezed into eight bytes:
That's convenient, as it means that we can stuff it all into a
value, which can be stored in user memory with a simple
The next change we need to make is to reserve an extra eight bytes in
the layout that we report to the memory manager.
We will still use
filesize when we call
vm_mmap(), since we can
only map what's in the file. And here we hit a subtle point. Due to
the fact that memory is always mapped in page-sized chunks, we
typically get more memory than we ask for from
vm_mmap() — but not
always. If the binary file happens to be exactly (or nearly exactly)
page-sized, then there won't be enough memory mapped for our
eight-byte epilog. So, we need to check for this edge case, and when
it happens we need to call
vm_mmap() a second time to reserve a page
of anonymous memory immediately following:
And of course that ridiculous-looking 64-bit magic number at the bottom is the little-endian encoding of our epilog.
rmmodthe previous module this time, since we are defining a new binary format. The two can be active simultaneously without interfering with each other.
Of the programs we've currently written,
is the best candidate to benefit from this change, having twelve
bytes' worth of machine code that can be omitted with the new format:
The only down side of this change is that the exit status no longer reports error values from read failures.
No, that's not true. The other, more signficant, down side of this change is that this new feature can't be used to improve the size of our seven-byte executable! That program still has to explicitly exit, in order to exit with a non-zero status. (And the non-zero exit is how we can be certain that the program actually ran, and wasn't, say, mistakenly handled as a do-nothing shell script.)
Well, it so happens that I have the perfect answer to both of these concerns.
The way to address these issues is to modify our epilog so that instead of always returning zero, it uses the value at the top of the stack as the exit code. Additionally (and here's the brilliant part), the loader will set up our stack so that it starts out with a zero entry at the top. That way, if the program is well-behaved and pops everything off the stack that it pushed, it will automatically exit with a successful zero status — but it can also quit prematurely at any time, leaving an error code on the stack that will be automatically transmitted to the user. This is clearly an improvement to our original idea, right? I think it makes perfect sense, and isn't at all contrived.
Fortunately, this new epilog will still fit snugly into eight bytes:
This improvement requires only a three-line change to our kernel module:
Once we've verified that it builds, let's revisit our current-best seven-byte binary. That same program should work just as well in our new format:
But it should also work if we push the number 42 onto the stack
and just leave it there. In other words, if the program just
consists of the
Two bytes. Two bytes! Our program is the size of a single machine-language instruction! This is the limit! There's no way to make a working program any smaller than that!
Well, I mean. I don't think there is. That is, obviously, there does exist a number that is less than two, so in theory it could be smaller, but how would that even be possible? Like, if there was a one-byte instruction for storing an arbitrary value on the stack, then maybe. But there isn't. The only other realistic possibility I can think of would be if the binary file could make use of some metadata to request a particular value to place on the stack initially, instead of having it be a hard-coded zero value. But there is no metadata in our file! That's the whole point of the format, right? I mean, of course, all files have some metadata; that's just the nature of filesystems. You could argue that the filename itself counts as metadata. I mean, we are basically using the filename as metadata already, I suppose. We're looking at the extension to determine the file type. So you could in theory potentially maybe argue that, just for example, using the character immediately preceding the extension would also be valid metadata, and that could be defined to specify optional behavior, like the default stack value for example …
ext is the pointer to the filename's extension, that we initialized
way back at the top of the function, so
ext[-1] is the character
directly to the left of the dot. Don't get me wrong; I fully realize
that this is indefensibly contrived — especially since, as it turns
out, the ASCII character for 42 is the asterisk. That's quite an
inconvenient character to have in a filename.
But having come this far, how can I not continue?
Beat that, Internet Random Person™!