I was recently tasked with creating a simple command line program that would take an input of a file of unknown contents and print a hex dump as the output. However, I didn’t really know how I could access the data of the file to begin with, and I didn’t know what a hex dump was. So I’m going to share with you what I learned and what I wrote to accomplish this task.

Since I’m most familiar with JavaScript, I decided to do this in Node. The aim is to write a command like this:

node hexdump.js data

This will run the hexdump.js program on a file (data) and output the hex dump.

The file can be anything - an image, a binary, a regular text file, or a file with other encoded data. In my particular case, it was a ROM.

If you’ve ever tried opening a non text-based file with a text editor, you’ll remember seeing a jumbled mess of random characters. If you’ve ever wondered how a program could access that raw data and work with it, this article might be enlightening.
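As a first taste of how a program gets at that raw data, here is a minimal sketch of reading a file’s bytes in Node. The helper name readBytes is made up for illustration. It relies on fs.readFileSync returning a Buffer of raw bytes when no encoding is passed.

```javascript
const fs = require('fs');

// Hypothetical helper (the name is an assumption, not part of any library):
// read a whole file into a Buffer of raw bytes.
function readBytes(filePath) {
  // With no encoding argument, readFileSync returns a Buffer, not a string.
  return fs.readFileSync(filePath);
}

// When run as `node hexdump.js data`, process.argv[2] is "data".
const filePath = process.argv[2];
if (filePath) {
  const bytes = readBytes(filePath);
  console.log(`${filePath}: ${bytes.length} bytes`);
}
```

Each element of the Buffer is a number from 0 to 255, which is all a hex dump needs to work with.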

This article will consist of two parts: the first, background information explaining what a hex dump is, what bits and bytes are, how to calculate values in base 2, base 10, and base 16, and an explanation of printable ASCII characters. The second part will be writing the hex dump function in Node.

What’s a Hex Dump?

To understand what a hex dump is, we can create a file and view a hex dump of it. I’ll make a simple text file consisting of a Bob Ross quote.

echo -en "Just make a decision and let it go." > data

Here, -n suppresses the trailing newline and -e enables interpretation of backslash-escaped characters, which will come in handy in a bit. Also, data is just a filename, not any sort of command or keyword.

Unix systems already have a hexdump command, and I’ll use the canonical (-C) flag to format the output.

hexdump -C data

Here’s what I get.

00000000  4a 75 73 74 20 6d 61 6b  65 20 61 20 64 65 63 69  |Just make a deci|
00000010  73 69 6f 6e 20 61 6e 64  20 6c 65 74 20 69 74 20  |sion and let it |
00000020  67 6f 2e                                          |go.|
00000023

Okay, so it looks like I have a bunch of numbers, and on the right we can see the text characters from the string I just echoed. The man page tells us that hexdump “displays file contents in hexadecimal, decimal, octal, or ascii”. The specific format used here (canonical) is further explained:

-C, --canonical

Canonical hex+ASCII display. Display the input offset in hexadecimal, followed by sixteen space-separated, two-column, hexadecimal bytes, followed by the same sixteen bytes in %_p format enclosed in ‘|’ characters.

So now we can see that each line is a hexadecimal input offset (address) which is kind of like a line number, followed by 16 hexadecimal bytes, followed by the same bytes in ASCII format between two pipes.

Address   Hexadecimal bytes                                ASCII
00000000  4a 75 73 74 20 6d 61 6b 65 20 61 20 64 65 63 69  |Just make a deci|
00000010  73 69 6f 6e 20 61 6e 64 20 6c 65 74 20 69 74 20  |sion and let it |
00000020  67 6f 2e                                         |go.|
00000023
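The middle column can be sketched in Node like this. The function name hexColumn is hypothetical, and the layout (two groups of eight bytes separated by an extra space) is copied from the hexdump -C output shown earlier. This is an illustration, not hexdump’s actual implementation.

```javascript
// Hypothetical helper: format up to 16 bytes as two-digit hex values,
// with an extra space between the first and second group of eight.
function hexColumn(bytes) {
  // Array.from is used so the result is a plain array of strings.
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0'));
  const left = hex.slice(0, 8).join(' ');
  const right = hex.slice(8, 16).join(' ');
  return right ? `${left}  ${right}` : left;
}

console.log(hexColumn(Buffer.from('Just make a deci')));
// 4a 75 73 74 20 6d 61 6b  65 20 61 20 64 65 63 69
```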

First, let’s take a look at the input offset, also referred to as an address. We can see it has leading zeros and a number. In a text editor, for example, we have line numbers in decimal, incremented by one. Line 1, line 2, all the way down to line 382, or however many lines long the program is.

The address of a hex dump tracks the number of bytes in the data and offsets each line by that number. So the first line starts at offset 0, and the second line starts at offset 16, which is how many bytes precede it. 10 is 16 in hexadecimal, which we’ll go into further along in this article.
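To make the offset arithmetic concrete, here is a small sketch that formats a byte offset the way the address column shows it. The name formatOffset is an assumption for illustration, and the width of eight digits is taken from the output above.

```javascript
// Hypothetical helper: render a byte offset as eight hexadecimal digits.
function formatOffset(offset) {
  return offset.toString(16).padStart(8, '0');
}

// Offsets of the three lines in the example, then the total size (35 bytes).
for (const offset of [0, 16, 32, 35]) {
  console.log(formatOffset(offset));
}
// 00000000
// 00000010
// 00000020
// 00000023
```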

Next we have the ASCII. If you’re not familiar, ASCII is a character encoding standard. It matches control characters and printable characters to numbers. Here is a full ASCII table.
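The mapping between characters and numbers can be checked from Node directly. This is a minimal sketch with made-up helper names (charToHex and hexToChar), using the built-in charCodeAt and String.fromCharCode.

```javascript
// Hypothetical helpers for illustration; the names are not from any library.

// Look up a character's code and show it as two hex digits.
function charToHex(ch) {
  return ch.charCodeAt(0).toString(16).padStart(2, '0');
}

// Go the other way: from a hex string back to the character.
function hexToChar(hex) {
  return String.fromCharCode(parseInt(hex, 16));
}

console.log(charToHex('J')); // 4a
console.log(hexToChar('6f')); // o
```

This matches the dump above, where the first byte 4a sits next to the J of "Just".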

Now this hex dump kind of makes sense for viewing ASCII text, but what about data that can’t be represented by ASCII? Not every byte or number has an ASCII match, so how will that look?

In another example, I’ll echo 0-15 represented in base 16/hexadecimal, which will be 00 to 0f. To escape hexadecimal numbers using echo, the number must be preceded by \x.

echo -en "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" > data2

These numbers don’t correspond to any printable ASCII characters (they are control characters), and they cannot be viewed in a regular text editor. If you try opening the file in VSCode, for example, you’ll see “The file is not displayed in the editor because it is either binary or uses an unsupported text encoding.”

If you do decide to open it anyway, you’ll probably see what appears to be a question mark. Fortunately, we can view the raw contents with hexdump.

00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000010

As you can see, unprintable ASCII characters are represented by a period (.), and the bytes are shown in hexadecimal. The address on the second line is 10 because the file is 16 bytes long, so the next byte would sit at offset 16, and 16 is 10 in hexadecimal.
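The right-hand column can be sketched the same way. The function name asciiColumn is hypothetical, and the printable range 0x20 to 0x7e (space through tilde) is an assumption about what counts as printable ASCII. Everything outside that range becomes a period.

```javascript
// Hypothetical helper: show printable ASCII bytes as characters and
// everything else as a period, wrapped in pipes like hexdump -C.
function asciiColumn(bytes) {
  const chars = Array.from(bytes, (b) =>
    // Assumed printable range: 0x20 (space) through 0x7e (~).
    b >= 0x20 && b <= 0x7e ? String.fromCharCode(b) : '.'
  );
  return `|${chars.join('')}|`;
}

console.log(asciiColumn(Buffer.from('go.'))); // |go.|
console.log(asciiColumn(Buffer.from([0x00, 0x01, 0x0a, 0x0f]))); // |....|
```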


Understanding Bits, Bytes, and Numerical Bases