I wrote "A Whirlwind Tutorial ..." on July 2, 1999, mostly in one sitting. Having spent months figuring out the ELF executable file format, and having had all that effort actually pay off well beyond my original expectations, I wanted to write down all the paths and blind alleys I had just finished exploring.
I had created a 45-byte executable, and I wanted to share it with the world.
Since then, a few omissions in the original text have come to my attention, but I can no longer bring myself to alter the original. So I've added this postscript instead.
When I wrote "A Whirlwind Tutorial ...", I did not know that the -R option to strip, which removes sections by name, can also remove sections that strip leaves in place by default. With judicious use of this feature of strip, both the C version and the assembly version of the executable can be reduced a bit more.
Of course, strip won't remove the section header table itself (nor the section header string table), so the switch to hand-coding the binary image is still necessary to get below the 200-byte mark.
(It may be worth noting here, just for the sake of general edification, that stripping sections out of a nontrivial program can be hazardous. ELF requires that the file offsets in an executable mirror the memory offsets to a certain degree. So removing a non-empty section out of the middle of an executable runs the risk of invalidating a memory offset in your code. Of course, the linker tends to put unnecessary sections near the end of the file so as to avoid this issue, but if you're using the -R option in the first place, it's probably because you and the linker disagree on what's necessary.)
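As a rough illustration of the constraint mentioned above, the sketch below checks one ELF32 program header for the rule that a loadable segment's file offset and virtual address must agree modulo p_align. The function name and the sample segment values are hypothetical, chosen only for the example, and the check covers this one rule, not everything that can go wrong when a section is removed.

```python
import struct

# ELF32 program header: p_type, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_flags, p_align (eight little-endian 32-bit words).
PHDR32 = struct.Struct("<8I")
PT_LOAD = 1


def offsets_mirror_memory(phdr_bytes):
    """True if p_offset and p_vaddr agree modulo p_align for a PT_LOAD entry."""
    p_type, p_offset, p_vaddr, _, _, _, _, p_align = PHDR32.unpack(phdr_bytes)
    if p_type != PT_LOAD or p_align <= 1:
        return True  # the congruence rule does not apply
    return p_offset % p_align == p_vaddr % p_align


# Hypothetical segment as a linker might lay it out.
intact = PHDR32.pack(PT_LOAD, 0x1200, 0x08049200, 0x08049200,
                     0x100, 0x100, 6, 0x1000)
# The same segment after 0x20 bytes were cut from the file ahead of it
# and only p_offset was adjusted to follow the data.
shifted = PHDR32.pack(PT_LOAD, 0x11E0, 0x08049200, 0x08049200,
                      0x100, 0x100, 6, 0x1000)

print(offsets_mirror_memory(intact), offsets_mirror_memory(shifted))
```

The second header is the kind of result a careless removal could produce: the data moved in the file, but the address the code expects did not.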
After producing the seven-byte version of the program, I comment: "I think it's pretty safe to say that we're not going to make this program any smaller than that."
Well, actually, it could be made smaller. When Linux starts up a new executable, one of the things it does is zero out the accumulator (as well as most of the other registers). Taking advantage of this fact would have allowed me to remove the xor, bringing the program down to five bytes. However, this behavior is certainly not documented, and there's no guarantee that it can be counted on to stay that way (other than the lack of any obvious reason to change it). And in any case, such a change wouldn't have had any effect on the size of the final versions.
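To make the byte counts concrete, here is a small sketch that spells out the machine code, assuming the seven-byte program is the xor / inc / mov / int 0x80 sequence from the tutorial and using the usual 32-bit encodings for those instructions. The variable names are just for illustration.

```python
# Machine code for the seven-byte program (exit with status 42).
seven = bytes([
    0x31, 0xC0,  # xor eax, eax
    0x40,        # inc eax      ; eax = 1, the exit system call number
    0xB3, 0x2A,  # mov bl, 42   ; low byte of ebx is the exit status
    0xCD, 0x80,  # int 0x80
])

# Dropping the xor relies on the kernel having already zeroed eax.
five = seven[2:]

print(len(seven), len(five))
```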
"None of the standard tools will deign to make an executable without a section header table of some kind." When I wrote that, I used the word "standard" quite intentionally. There is a non-standard tool that can remove a section header table from an existing executable, namely my own sstrip program. (See http://www.muppetlabs.com/~breadbox/software/elfkickers.html.) However, at the time I had just created it, and despite its incredibly simple nature I wasn't yet convinced of its robustness. So I decided it was safer not to bring it up in the first place.
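For readers curious what removing a section header table involves, the following is a minimal sketch of the idea for a little-endian ELF32 image: find the end of the last byte any program header refers to, truncate the file there, and clear the header fields that point at the section header table. This is not sstrip's actual code, and the function name and the synthetic test image are assumptions made up for the example.

```python
import struct

# ELF32 file header and program header layouts (little-endian).
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")
PHDR32 = struct.Struct("<8I")


def drop_section_headers(image):
    """Return a copy of an ELF32 image without its section header table."""
    fields = list(EHDR32.unpack_from(image, 0))
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

    # Keep the file header, the program header table, and every byte
    # that some segment loads from the file.
    end = max(EHDR32.size, e_phoff + e_phentsize * e_phnum)
    for i in range(e_phnum):
        phdr = PHDR32.unpack_from(image, e_phoff + i * e_phentsize)
        p_offset, p_filesz = phdr[1], phdr[4]
        end = max(end, p_offset + p_filesz)

    fields[6] = 0   # e_shoff
    fields[12] = 0  # e_shnum
    fields[13] = 0  # e_shstrndx
    return EHDR32.pack(*fields) + bytes(image[EHDR32.size:end])


# Synthetic 320-byte image: one 100-byte segment, then filler, then a
# pretend section header table of three 40-byte entries at offset 200.
ident = b"\x7fELF\x01\x01\x01" + bytes(9)
ehdr = EHDR32.pack(ident, 2, 3, 1, 0x08048054, 52, 200, 0, 52, 32, 1, 40, 3, 2)
phdr = PHDR32.pack(1, 0, 0x08048000, 0x08048000, 100, 100, 5, 0x1000)
image = (ehdr + phdr).ljust(320, b"\0")

small = drop_section_headers(image)
print(len(image), len(small))
```

A real tool would also have to handle big-endian and 64-bit files and validate the headers before trusting them, which this sketch does not attempt.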
Of course, the biggest omission in the original document is a question I left unanswered. The following passage used to appear near the end:
"... it turns out that, contrary to every expectation, the executable bit can be dropped from the p_flags field, and Linux will set it for us anyway. Why this works, I honestly don't know -- maybe because Linux sees that the entry point goes to this segment? In any case, it works."
As it turns out, my guess was right -- in a very twisted sort of way.
I knew, of course, that Linux uses a flat-memory model, in which every selector register points to the same physical memory area. What I didn't know was that Linux memory is even flatter than that: every process (except the kernel) uses the exact same set of selectors.
When the kernel boots up, it creates the global descriptor table. One of the entries in this table is marked as being readable and executable, and another is marked as being readable and writeable. These two descriptors are then used as every program's cs and ds/es/ss registers. Changing what these selectors actually point to is then handled at the paging level, in the linear-to-physical memory translation.
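The "readable and executable" and "readable and writeable" markings live in the access byte of each x86 segment descriptor. The sketch below decodes that byte according to the standard descriptor layout. The two sample values, 0xFA and 0xF2, are the conventional flat-model values for ring-3 code and data; they are assumed here for illustration and are not taken from any particular kernel source.

```python
def describe_access_byte(access):
    """Decode the access byte of an x86 code or data segment descriptor."""
    if not access & 0x10:
        raise ValueError("system descriptor, not a code/data segment")
    is_code = bool(access & 0x08)  # bit 3: executable
    rw = bool(access & 0x02)       # bit 1: readable (code) or writeable (data)
    return {
        "present": bool(access & 0x80),
        "ring": (access >> 5) & 3,
        "executable": is_code,
        # Data segments are always readable; code segments only if bit 1 is set.
        "readable": rw if is_code else True,
        # Code segments are never writeable; data segments only if bit 1 is set.
        "writeable": False if is_code else rw,
    }


user_code = describe_access_byte(0xFA)  # assumed value for the cs descriptor
user_data = describe_access_byte(0xF2)  # assumed value for the ds/es/ss descriptor
print(user_code)
print(user_data)
```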
Of course, memory pages have their own, independent protection flags, but they only indicate read-write vs. read-only. You can't mark a page as being executable or non-executable. (As it turns out, you never need to set more than one bit of the p_flags field. Setting either the readable or the executable bit will create a read-only page, and setting the writeable bit will create a read-write page.)
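The rule in the parenthetical above can be written down as a tiny function. This is only a restatement of the behavior described in the text, using the standard PF_X, PF_W and PF_R bit values from the ELF specification. The function name is hypothetical, and the case of no bits set is left unspecified because the text does not cover it.

```python
# p_flags bits as defined by the ELF specification.
PF_X = 1
PF_W = 2
PF_R = 4


def page_kind(p_flags):
    """Map p_flags to the kind of page described in the text above."""
    if p_flags & PF_W:
        return "read-write"
    if p_flags & (PF_R | PF_X):
        return "read-only"
    return "unspecified"  # the text says nothing about p_flags == 0


for flags in (PF_R, PF_X, PF_R | PF_X, PF_W, PF_R | PF_W | PF_X):
    print(flags, page_kind(flags))
```

Under this rule PF_R alone, PF_X alone, and PF_R | PF_X all land on the same kind of page, which is why dropping the executable bit made no difference.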
So, the actual error in my thinking was assuming that Linux was allocating selectors for every process. I couldn't see why Linux would even create an executable selector when none of the loadable segments of the ELF file were marked as being executable. But in reality, Linux had created the executable selector long before my program was even compiled.
[3 February 2001]
Okay, I lied about not being able to edit the original. What I didn't want was to make it look like I knew more than I did when I originally wrote it. Part of the fun of writing it was to communicate my still-fresh sense of discovery.
But then, newer versions of the 2.2 kernel were released which refused to execute the last few versions of the program. (My thanks to François-René Rideau for bringing this development to my attention.) I waited until I could verify that this new restriction was only in 2.2.17 and later 2.2 kernels, and not in 2.4.0. I then inserted this new information into the text and fixed the offending programs.
[13 March 2001]
When it rains, it pours. I had to make another edit to the original. I'm not sure when this changed, but the final version of the assembly program that still used the libc exit() function now causes a segfault on my system (2.2.14). It appears that calling exit() is a bad idea if you've bypassed the standard startup code. So I had to change the program to use _exit() instead. (Not that I expect a lot of readers are actually assembling and running all of the programs in the document, but far be it from me to knowingly disseminate incorrect information.)
[9 May 2009]
After many years, I have finally gotten around to updating this web page for the 2.6 kernels. Linux made a serious change in that it is no longer possible for p_memsz to be different from p_filesz if the segment is not marked writeable. Fortunately this can be easily fixed by just increasing p_filesz to match p_memsz, at least for the time being, but it took me a while to figure this out. While I was here, I also removed some text added in 2001 that referred to differences between various 2.2.x kernels. Those details are all ancient history now.
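The restriction and its fix can be sketched against a single ELF32 program header. The check below encodes the rule exactly as stated above (sizes must match unless the segment is writeable) and is not derived from kernel source. The function names and the sample segment are hypothetical.

```python
import struct

# ELF32 program header: p_type, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_flags, p_align (eight little-endian 32-bit words).
PHDR32 = struct.Struct("<8I")
PT_LOAD = 1
PF_W = 2


def sizes_acceptable(phdr_bytes):
    """True if p_memsz equals p_filesz or the segment is writeable."""
    fields = PHDR32.unpack(phdr_bytes)
    p_filesz, p_memsz, p_flags = fields[4], fields[5], fields[6]
    return p_memsz == p_filesz or bool(p_flags & PF_W)


def match_filesz_to_memsz(phdr_bytes):
    """Return a copy of the header with p_filesz set equal to p_memsz."""
    fields = list(PHDR32.unpack(phdr_bytes))
    fields[4] = fields[5]
    return PHDR32.pack(*fields)


# Hypothetical read-only segment whose memory size exceeds its file size.
before = PHDR32.pack(PT_LOAD, 0, 0x08048000, 0x08048000,
                     0x40, 0x80, 4, 0x1000)
after = match_filesz_to_memsz(before)
print(sizes_acceptable(before), sizes_acceptable(after))
```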