Download tiny.tar.gz   [34k]
(includes source and binaries)

Tiny Programs

The programs listed here are provided as exhibitions of just how compressed a Linux ELF executable can be and still function.

Take a look through /usr/bin: how big is the smallest ELF executable in there? Now try writing the smallest hello-world program you can. Can you get it under 1K? Can you get it under 100 bytes?

This is precisely what I set out to do one day, and I ended up with a minor obsession over creating the smallest possible executables.

The distribution comes with preassembled binaries along with the assembly source code. If you wish to be able to rebuild them yourself, you will need to have nasm installed.


Some Introductory Blather Which You Should Feel Free to Skip if You Want to Get to the Good Stuff

Your average ELF executable, even after being stripped, contains a fair bit of overhead, some of it completely superfluous. And when you rip out everything that doesn't contribute to a program's memory image, much of what remains, beside the program code itself, is information needed by the dynamic linker. This is of course very important information to have, normally — but if the program makes its own system calls, it can dispense with that overhead as well.

Of course, there are few tools out there that will create an executable that has none of this extra machinery. But if one is willing to define the file entirely by hand, an executable can be created with the absolute minimum contents, including only the information that Linux needs in order to get it into memory and start it running.

The minimum information an ELF executable file needs to have is an ELF header and a program header table with at least one entry. (For more information about these entities, consult the ELF standard.) But, even in these two remaining structures, there are fields that aren't used, particularly with a bare-bones executable. Furthermore, there are other fields in these structures which do serve a purpose, but which are not, at this point in time, actually examined by the Linux OS. So, these fields can be used, if the programmer is desperate enough, to hold other bits of data, or even small amounts of code....

Let me pause for a moment and make this perfectly clear: I emphatically do not advocate the wilfull, widespread disregard of standards. Certainly, just because the ELF standard says that a given field will hold a certain value does not automatically mean that Linux ought to examine that field and reject any ELF file with something else there. But by the same token, just because (the current version of) Linux ignores the contents of that field does NOT give programmers implicit permission to fill it in with anything they like. When programmers en masse adhere to existing implementations instead of existing standards is exactly when a well-written standard becomes an obstacle instead of a tool, when future standards are forced to canonize ludicrous practices in the name of backward-compatibility, and when everything generally goes to hell. So: where these programs violate requirements of the ELF standard, do not take them seriously. These programs have been offered up as entertainment, and perhaps as an education in existing practice. But please do not seek to emulate such behavior in other, more serious programming work.

This truth is illustrated by the fact that I have had portability issues with these programs more than once in their history. For example, the 2.0 Linux kernel ignored the e_phentsize entry in the ELF header. Early 2.2 and 2.4 kernels did as well, but then at some point it began to be checked, presumably as part of adding support for 64-bit ELF files. Starting with the 2.6 kernel, program segments that had a different memory size than their file size were no longer allowed to be non-writeable. Some 2.6 kernels permit ELF binaries to load to address zero, and some do not. All of the above cheats were ones that I have had to abandon in order to keep my smallest programs working. Partly because of that, you will see that with the more interesting programs, these tricks are ignored in favor of full compliance with the ELF standard, while still being small.

Of course, violating standards requirements is only one way in which these programs achieve such compression. Other aspects regarding the organization of the ELF structures in these programs is admittedly unorthodox, but nonetheless in complete adherence to the standard. Most important of all, of course, is the creative reworking of algorithms. Finally, as with any CISC assembly programming, there are also plenty of uncommon instructions that can be used to accomplish common tasks.

The path of optimizing an ELF executable for size is described in loving detail in my essay:


The Tiny Programs Collection

true.asm source
true binary

This program returns an exit code of either zero or one, depending on whether it is invoked with the name "true" or "false", respectively. This one is the runt of the litter. Its size is 45 bytes. Count them. This is the smallest it is possible for a Linux ELF executable to be.

hello.asm source
hello binary

The program that started me off on this whole pursuit: hello world. It is 62 bytes long. This one is quite dense. With the program header table overlaid on top of the ELF header, and program itself running through both of them, some of the bytes have no less than three different purposes.

Here you can find an even smaller hello-world program, apparently written by someone named Kikuyan (菊やん). It is only 58 bytes long, and would have been 57 if the author had used the same version of the greeting as myself. The program uses the technique of loading to absolute address zero, which permits a number of tricks that further reduce code size. I have not used this technique myself, because sadly current versions of Linux do not permit executables to load to address zero. It is an ingenious little program, though.

keepalive.asm source
keepalive binary

Now here's a program that I actually use. It simply runs forever, printing a bell character every four minutes or so. I keep it in the background after logging into a remote machine that times out connections when they're idle. At 56 bytes, it's a bit longer than the one-line shell script I originally used, but on the plus side it doesn't take up an extra process in order to sleep.

bf.asm source
bf binary

Brainfuck is a very simplistic programming language, which was invented by Urban Mueller solely for the purpose of being able to create a compiler that was less than 256 bytes in size, for the Amiga OS. (More information about Brainfuck can be found at my Brainfuck page.) I eventually decided to take up the challenge as well, and create a Brainfuck compiler for Linux using less than 256 bytes. The compiler works as a filter: redirect the source code to the compiler's input and the compiler's output to a file. (And don't forget to chmod +x the output file.) At the beginning I didn't expect that I would actually succeed, since I thought I would need almost half of that just for the headers. I think my first cut was 458 bytes in size. I was quite pleased with myself when I trimmed that down to less than 300 bytes, ecstatic when I finally reached the 255-byte goal, and downright triumphant when I later got it to work in 238 bytes. By the time I had a 198-byte version, I was just plain dumbfounded. It's now at 166 bytes. And though I can't quite believe it, it works just as well as the first one did. As useless as it is, I think this one would have to be the crown jewel of this little collection. It is quite possibly also the ugliest and most opaque program I have ever written.

hexdump.asm source
hexdump binary

This is your classic hexdump program. I wrote it one day in a fit of pique, after trying (and failing) to convince the standard Linux hexdump program to output bytes in the good old hexdump formatting style that I've been accustomed to since before reaching my majority. This program (and the ones that follow) are different from the previous ones in that they completely adhere to the requirements of the ELF standard. I did this not only because I got tired of fixing them when newer versions of Linux broke my old versions, but also because, unlike the previous programs, these actually do something useful. I still use hexdump on a regular basis, as the standard Linux hexdump program still can't be coaxed into this output format. The initial version of hexdump simply read from standard input, but I found this limitation irritating enough that I sacrificed a few more bytes in order to specify a filename on the command line. Its size is therefore a whopping 202 bytes.

ls.asm source
ls binary

After hexdump, I came into contact with Konstantin Boldyshev, who headed the asmutils project, and I got involved in creating more tiny programs that were actually useful. ls was my first real achievement along these lines. It is 1017 bytes in size, and sports a number of command-line options:

-C columnar output; default if stdout is a tty
-1 one file per output line; default if stdout is not a tty
-l "long" output
-F show file type suffixes
-i show inode numbers
-s show size of files in 1K-blocks
-d don't expand directories
-a show all (hidden) files
-B don't show files ending in ~ (backups)
-N don't replace non-graphic characters with a question mark
-b show non-graphic characters as octal escape sequences
-R recursively list contents of subdirectories

The "long" output format only displays numeric user IDs, and it displays timestamps as an age, instead of the actual timestamp. There is no sorting capability, and the columnar output is much less intelligent than GNU's (though it looks pretty good most of the time). Beyond that, however, it conforms pretty closely to the standard ls program we all know and love.

base64.asm source
base64 binary

This is a simple base64 decoder utility. Like hexdump, it can either accept a filename on the command line, or work on standard input. Since sometimes very large files are encoded in base64, I violated my prime directive a little bit and let the program grow more than strictly necessary, in order to optimize for speed. This version is actually noticeably faster than the utility included in GNU coreutils for 32-bit Linux. Its size is exactly 256 bytes.

factor.asm source
factor binary

This is an implementation of the standard Unix utility. It displays the prime factors of the integers specified on the command line, or on standard input if no arguments are given. This program was a bit of a departure for me. Instead of simply trying to achieve the smallest possible size, I decided to shoot for real portability, for a change. This program not only conforms to the requirements of the ELF standard, it is also the only program in this collection that actually dynamically links with libc (as opposed to making its own system calls). Thus it should continue to work with any future version of Linux, as long as new versions of the libc ABI and the ELF standard remain backwards-compatible. And since, like base64, factor is a program that can sometimes run for a long time, I attempted to balance optimizing for both size and speed. This program therefore makes significant use of the FPU, in order to increase parallelization in the inner loop. It also has online help, version information, and error messages. It therefore arguably stands as a completely functional replacement for the version in GNU coreutils. And, back when I first wrote it, it was significantly faster than the GNU program. (Since then, they've upgraded the coreutils factor with some advanced mathematics, and it's now the faster program.) Its size is 1020 bytes.

puzzle.asm source
puzzle binary

As a sort of oversized finale to this list, puzzle is a game that runs under X. You've probably seen a 15-16 sliding tile puzzle at some point in your life. Some of you may remember the program that came with the original 1984 Macintosh. But are you aware that, in order to justify including the puzzle in the limited memory and disk space available, the author squeezed it down to under 600 bytes? (See here for the details.) Alas, while I would have loved to be able to present a Linux binary that matched that size, the ELF structures required to link to all of the required X functions alone needed well over 600 bytes, before I included a single instruction of code or a pixel of data. Maybe someone more knowledgeable about X programming than I could reduce the number of functions needed, but for now here's my relatively huge version. It uses the exact same graphics as the original (which unfortunately means that the window is only slightly larger than its own title bar on a modern display). Lacking access to a Macintosh emulator, I had to rely on my own 25-year-old memory of the program's behavior, so I may have gotten some details wrong. But it is fully functional. And despite all this hedging about its enormous size, at 1713 bytes I'd still wager that it's the smallest X program you've ever used.


Software
Brian Raiter
Muppetlabs