Previously, the bytes displayed for invalid characters included bytes
from the byte stream that were peeked rather than consumed. This
resulted in certain bytes being displayed multiple times, since the
peeked byte could appear in the following character.
For example, `printf '\xce\x61' | utfdump_bin` would result in the byte
0x61 being displayed twice, once at the end of the invalid character and
once as the valid character `a`.
This patch modifies `utfdump::utf8::Utf8Error` so it also stores the
number of consumed bad bytes, enabling the binary to output only the
consumed bad bytes.
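
The idea can be sketched as follows. This is a minimal illustration, not
the actual `utfdump::utf8::Utf8Error`: the field names, the
`consumed_bytes` helper and the `decode_one` function are all
hypothetical, and the decoder skips overlong-encoding checks for brevity.
It shows an error that records how many bytes were really consumed, so a
peeked byte is never reported as part of the invalid character.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for the real error type (assumed layout).
#[derive(Debug, PartialEq)]
struct Utf8Error {
    /// Buffer for the bad sequence; only the first `consumed` bytes are meaningful.
    bytes: [u8; 4],
    /// Number of bytes actually taken from the stream.
    consumed: usize,
}

impl Utf8Error {
    /// Only the bytes that were consumed, never the peeked one.
    fn consumed_bytes(&self) -> &[u8] {
        &self.bytes[..self.consumed]
    }
}

/// Decode one character, peeking at continuation bytes before consuming them.
/// Simplified: overlong encodings are not rejected.
fn decode_one<I>(input: &mut Peekable<I>) -> Option<Result<char, Utf8Error>>
where
    I: Iterator<Item = u8>,
{
    let lead = input.next()?;
    let (len, mut value) = match lead {
        0x00..=0x7f => return Some(Ok(lead as char)),
        0xc0..=0xdf => (2, u32::from(lead & 0x1f)),
        0xe0..=0xef => (3, u32::from(lead & 0x0f)),
        0xf0..=0xf7 => (4, u32::from(lead & 0x07)),
        // Stray continuation byte or invalid lead byte.
        _ => {
            return Some(Err(Utf8Error { bytes: [lead, 0, 0, 0], consumed: 1 }));
        }
    };

    let mut bytes = [lead, 0, 0, 0];
    let mut consumed = 1;
    while consumed < len {
        match input.peek().copied() {
            // A valid continuation byte: consume it and keep going.
            Some(b) if b & 0xc0 == 0x80 => {
                input.next();
                bytes[consumed] = b;
                consumed += 1;
                value = (value << 6) | u32::from(b & 0x3f);
            }
            // Anything else stays in the stream; report only what was consumed.
            _ => return Some(Err(Utf8Error { bytes, consumed })),
        }
    }
    // Surrogates and out-of-range values are rejected by `char::from_u32`.
    Some(char::from_u32(value).ok_or(Utf8Error { bytes, consumed }))
}
```

With the input bytes 0xce 0x61, the first call to `decode_one` returns an
error whose `consumed_bytes()` is just 0xce, and the second call returns
the valid character `a`, so 0x61 is reported only once.
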
This patch introduces a `CombiningClass` type to represent the canonical
combining class of a codepoint. The `fmt::Display` implementation of
this type writes the human-readable name of the combining class, if
there is one. This replaces the previous behaviour, which printed only
the raw numeric value of the combining class.
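
A rough sketch of what such a type might look like is given below. The
tuple-struct layout, the `name` helper and the numeric fallback for
unnamed classes are assumptions made for illustration; only a handful of
the Unicode `Canonical_Combining_Class` value names are listed.

```rust
use std::fmt;

/// Hypothetical newtype over the numeric combining class (assumed layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CombiningClass(u8);

impl CombiningClass {
    /// Human-readable name, if this class has one in the (partial) table below.
    fn name(self) -> Option<&'static str> {
        match self.0 {
            0 => Some("Not_Reordered"),
            1 => Some("Overlay"),
            7 => Some("Nukta"),
            8 => Some("Kana_Voicing"),
            9 => Some("Virama"),
            202 => Some("Attached_Below"),
            220 => Some("Below"),
            230 => Some("Above"),
            240 => Some("Iota_Subscript"),
            _ => None,
        }
    }
}

impl fmt::Display for CombiningClass {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.name() {
            Some(name) => f.write_str(name),
            // Assumption: fall back to the number when there is no name.
            None => write!(f, "{}", self.0),
        }
    }
}
```

Under these assumptions, `CombiningClass(230).to_string()` gives `Above`,
while a class missing from the table is formatted as its number.
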
This patch adds `clap` as a dependency for command-line argument
parsing, and introduces a `-f` flag to allow toggling whether or not the
full names of character classes are displayed.