10 Commits (main)

Author SHA1 Message Date
pantonshire e90b77256a version bump 3 years ago
pantonshire 9d6eae0bd0 make lib no-std compatible 3 years ago
pantonshire effe6916d3 move unicode_data_encoded.gz to lib 3 years ago
pantonshire c274ba6f01 🐛 only show consumed bad bytes for invalid characters
Previously, the bytes displayed for invalid characters included bytes
from the byte stream that were peeked rather than consumed. This
resulted in certain bytes being displayed multiple times, since the
peeked byte could appear in the following character.

For example, `printf '\xce\x61' | utfdump_bin` would result in the byte
0x61 being displayed twice, once at the end of the invalid character and
once as the valid character `a`.

This patch modifies `utfdump::utf8::Utf8Error` so it also stores the
number of consumed bad bytes, enabling the binary to output only the
consumed bad bytes.
3 years ago
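
The peeked-versus-consumed distinction described in c274ba6f01 can be sketched as below. This is only an illustration under assumptions, not the actual `utfdump::utf8::Utf8Error`: the names `BadSequence`, `decode_one` and `dump` are hypothetical, and the validation is deliberately simplified.

```rust
/// Hypothetical error type (not the real `Utf8Error`): records how many
/// bytes were actually consumed before the sequence turned out invalid.
#[derive(Debug, PartialEq, Eq)]
struct BadSequence {
    consumed: usize,
}

/// Decodes one character from the front of `bytes`.
/// Returns `None` on empty input, otherwise the character with its encoded
/// length, or a `BadSequence` carrying the number of consumed bad bytes.
fn decode_one(bytes: &[u8]) -> Option<Result<(char, usize), BadSequence>> {
    let lead = *bytes.first()?;
    // Number of continuation bytes expected, plus the payload bits of the lead byte.
    let (expected, mut code_point) = match lead {
        0x00..=0x7f => return Some(Ok((lead as char, 1))),
        0xc2..=0xdf => (1, u32::from(lead & 0x1f)),
        0xe0..=0xef => (2, u32::from(lead & 0x0f)),
        0xf0..=0xf4 => (3, u32::from(lead & 0x07)),
        _ => return Some(Err(BadSequence { consumed: 1 })),
    };
    let mut consumed = 1;
    for _ in 0..expected {
        match bytes.get(consumed) {
            // Peek at the next byte; consume it only if it is a continuation byte.
            Some(&b) if b & 0xc0 == 0x80 => {
                code_point = (code_point << 6) | u32::from(b & 0x3f);
                consumed += 1;
            }
            // The peeked byte is left in the stream for the next character.
            _ => return Some(Err(BadSequence { consumed })),
        }
    }
    // Reject surrogates, out-of-range values and overlong encodings.
    match char::from_u32(code_point) {
        Some(c) if c.len_utf8() == consumed => Some(Ok((c, consumed))),
        _ => Some(Err(BadSequence { consumed })),
    }
}

/// Renders each decoded item as one line of text, showing only the
/// consumed bytes for invalid sequences.
fn dump(mut bytes: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(result) = decode_one(bytes) {
        let used = match result {
            Ok((c, len)) => {
                out.push(format!("{:?}", c));
                len
            }
            Err(BadSequence { consumed }) => {
                let hex: Vec<String> = bytes[..consumed]
                    .iter()
                    .map(|b| format!("{:02x}", b))
                    .collect();
                out.push(format!("invalid: {}", hex.join(" ")));
                consumed
            }
        };
        // Advance past consumed bytes only, so no byte is reported twice.
        bytes = &bytes[used..];
    }
    out
}
```

With the input bytes `ce 61`, `dump` in this sketch reports `invalid: ce` followed by `'a'`, so the peeked byte 0x61 appears only once.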
pantonshire f9430db2f9 update binary to use new lib 3 years ago
pantonshire 25dae48064 refactoring 3 years ago
pantonshire a3750b5732 working rust decoder for new encoded data format 3 years ago
pantonshire dc4650262d packed struct for character data 3 years ago
pantonshire 24abc7ed79 work on rust side of new encoded data format 3 years ago
pantonshire ef6765e037 ♻️ refactoring 3 years ago