Download chd-1.1.tar.gz
Sourcehut page

A Unicode-aware hexdump utility

The program chd is a simple variation on the standard hexdump utility xxd(1) that dumps Unicode codepoints instead of raw bytes. This is particularly useful when dealing with UTF-8-encoded files, where each character can occupy anywhere from one to four bytes.

Here's an example using a file containing the string "Îņţëřñåṭıőňⱥłȉʑḁŧīòń". The output from xxd:

00000000: c38e c586 c5a3 c3ab c599 c3b1 c3a5 e1b9  ................
00000010: adc4 b1c5 91c5 88e2 b1a5 c582 c889 ca91  ................
00000020: e1b8 81c5 a7c4 abc3 b2c5 840a            ............

And the output from chd:

00000000: CE 146 163 EB 159 F1 E5 1E6D  Î ņ ţ ë ř ñ å ṭ
00000008: 131 151 148 2C65 142 209 291 1E01  ı ő ň ⱥ ł ȉ ʑ ḁ
00000010: 167 12B F2 144 0A  ŧ ī ò ń ␊

(The character output in the right-hand column is spaced out to permit double-width characters to be displayed correctly.)
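
To see how the raw bytes in the xxd dump relate to the codepoints in the chd dump, here is a minimal sketch of decoding a single UTF-8 sequence by hand. The function name utf8_decode is hypothetical and this is not how chd itself works (chd leaves decoding to the C library's wide-character functions). The sketch is also simplified: it does not reject overlong encodings or surrogate values.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed (1 to 4), or 0 if the input is
 * truncated or malformed. Hypothetical helper, for illustration only:
 * overlong forms and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;
    unsigned long value;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: one byte */
        len = 1;
        value = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: two bytes */
        len = 2;
        value = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: three bytes */
        len = 3;
        value = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: four bytes */
        len = 4;
        value = s[0] & 0x07;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }

    if (len > n)
        return 0;                         /* sequence is cut short */

    /* Each continuation byte has the form 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }

    *cp = value;
    return len;
}
```

Applied to the sample file, the first two bytes in the xxd output, c3 8e, decode to CE, and the three bytes e1 b9 ad that straddle the first and second xxd lines decode to 1E6D, matching the first and last codepoints on the first line of the chd output.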

The source code also provides a simple example of using C's wide-character functions to portably handle Unicode I/O, without hard-coding the use of e.g. UTF-8.
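
As a rough illustration of that approach (not an excerpt from the chd source), the sketch below reads characters with fgetwc() and prints each value in hexadecimal. The names dump_codepoints and dump_stdin are hypothetical. It assumes a platform where wchar_t is wide enough to hold any Unicode codepoint and where wide-character values are Unicode codepoints, which is common on Unix-like systems but not guaranteed by the C standard.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Read wide characters from `in` until end of input and write each
 * value in uppercase hexadecimal, one per line, to `out`.
 * Returns the number of characters read, or -1 on a read error
 * (for example, a byte sequence that is invalid in the current locale).
 * Hypothetical helper, not taken from chd. */
long dump_codepoints(FILE *in, FILE *out)
{
    wint_t wc;
    long count = 0;

    assert(in != NULL && out != NULL);

    /* fgetwc() decodes the input using whatever encoding the current
     * locale specifies, so no encoding is hard-coded here. */
    while ((wc = fgetwc(in)) != WEOF) {
        fwprintf(out, L"%lX\n", (unsigned long)wc);
        count++;
    }
    return ferror(in) ? -1 : count;
}

/* Adopt the locale from the environment, then dump standard input.
 * Returns 0 on success, -1 on failure. */
int dump_stdin(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    return dump_codepoints(stdin, stdout) < 0 ? -1 : 0;
}
```

The call to setlocale(LC_ALL, "") is what lets the C library pick the input encoding from the user's environment. Once a stream has been used with wide-character functions it should only be used with wide-character functions, which is why the output goes through fwprintf() rather than fprintf().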

The code in this distribution is made available under the MIT license. Share and Enjoy. Questions and comments should be directed to me at breadbox@muppetlabs.com.


Brian Raiter