32 Commits (main)
 

Author SHA1 Message Date
pantonshire 69660a7e10 update dependencies and resolver 1 year ago
pantonshire e90b77256a version bump 3 years ago
pantonshire 402da43bc1 fix utfdump_wasm Cargo.toml 3 years ago
pantonshire 673364873d specify msrv 3 years ago
pantonshire 0d1872b902 wasm library 3 years ago
pantonshire 9d6eae0bd0 make lib no-std compatible 3 years ago
pantonshire effe6916d3 move unicode_data_encoded.gz to lib 3 years ago
pantonshire c274ba6f01 🐛 only show consumed bad bytes for invalid characters
Previously, the bytes displayed for invalid characters included bytes
from the byte stream that were peeked rather than consumed. This
resulted in certain bytes being displayed multiple times, since the
peeked byte could appear in the following character.

For example, `printf '\xce\x61' | utfdump_bin` would result in the byte
0x61 being displayed twice, once at the end of the invalid character and
once as the valid character `a`.

This patch modifies `utfdump::utf8::Utf8Error` so it also stores the
number of consumed bad bytes, enabling the binary to output only the
consumed bad bytes.
3 years ago
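
The idea in c274ba6f01 can be illustrated with a minimal sketch. This is not the actual `utfdump::utf8` code: `BadSequence`, `decode_one` and `split_units` are hypothetical names, and the validation is simplified. The point it shows is that a byte which is only peeked (because it turned out not to be a continuation byte) is not counted in `consumed`, so it is decoded again as the start of the next character and shown exactly once.

```rust
/// Hypothetical error type (an assumption, not the real `Utf8Error`):
/// records how many bytes were actually consumed by the invalid sequence.
#[derive(Debug, PartialEq, Eq)]
pub struct BadSequence {
    pub consumed: usize,
}

/// Decodes one character from the front of `input`.
/// Returns `None` on empty input, otherwise the character and its encoded
/// length, or an error carrying the number of consumed bad bytes.
pub fn decode_one(input: &[u8]) -> Option<Result<(char, usize), BadSequence>> {
    let first = *input.first()?;
    // Work out the expected sequence length and payload bits from the lead byte.
    let (len, init) = match first {
        0x00..=0x7f => return Some(Ok((first as char, 1))),
        0xc2..=0xdf => (2, (first & 0x1f) as u32),
        0xe0..=0xef => (3, (first & 0x0f) as u32),
        0xf0..=0xf4 => (4, (first & 0x07) as u32),
        // Stray continuation byte or a byte that can never start a sequence.
        _ => return Some(Err(BadSequence { consumed: 1 })),
    };
    let mut code = init;
    for i in 1..len {
        match input.get(i) {
            // A continuation byte is consumed into the current character.
            Some(&b) if (b & 0xc0) == 0x80 => code = (code << 6) | (b & 0x3f) as u32,
            // Anything else was only peeked: report the bytes consumed so far
            // and leave this byte for the next call.
            _ => return Some(Err(BadSequence { consumed: i })),
        }
    }
    // Reject overlong encodings; `char::from_u32` rejects surrogates and
    // values above U+10FFFF.
    let min = match len {
        2 => 0x80,
        3 => 0x800,
        _ => 0x10000,
    };
    if code < min {
        return Some(Err(BadSequence { consumed: len }));
    }
    match char::from_u32(code) {
        Some(c) => Some(Ok((c, len))),
        None => Some(Err(BadSequence { consumed: len })),
    }
}

/// Splits `input` into display units: the bytes of each unit and the decoded
/// character, or `None` for an invalid sequence.
pub fn split_units(mut input: &[u8]) -> Vec<(Vec<u8>, Option<char>)> {
    let mut out = Vec::new();
    while let Some(res) = decode_one(input) {
        let (n, ch) = match res {
            Ok((c, n)) => (n, Some(c)),
            Err(e) => (e.consumed, None),
        };
        out.push((input[..n].to_vec(), ch));
        input = &input[n..];
    }
    out
}
```

With the input from the commit message, `split_units(b"\xce\x61")` yields one invalid unit holding only 0xce, followed by one valid unit holding 0x61 (`a`), so no byte appears twice.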
pantonshire ecf2abbdad retrieve data from unicode.org in data.py 3 years ago
pantonshire 6b911caa15 update Cargo.lock 3 years ago
pantonshire 6e8d197ae4 remove core 3 years ago
pantonshire f9430db2f9 update binary to use new lib 3 years ago
pantonshire cc47c90f50 include compressed encoded data in repo 3 years ago
pantonshire 25dae48064 refactoring 3 years ago
pantonshire a3750b5732 working rust decoder for new encoded data format 3 years ago
pantonshire dc4650262d packed struct for character data 3 years ago
pantonshire 24abc7ed79 work on rust side of new encoded data format 3 years ago
pantonshire d9b0c049ab gzip compress the unicode data 3 years ago
pantonshire a427b7f58f output encoded data 3 years ago
pantonshire 6791ac8d35 character data encoding 3 years ago
pantonshire d213c81bf6 🎉 start new encoder for unicode data 3 years ago
pantonshire ef6765e037 ♻️ refactoring 3 years ago
pantonshire 96d328c829 support for invalid utf8 3 years ago
pantonshire 26ee43af2e make Combining column human-readable
This patch introduces a `CombiningClass` type to represent the canonical
combining class of a codepoint. The `fmt::Display` implementation of
this type writes the human-readable name of the combining class, if
there is one. This replaces the previous behaviour, which was to just
print the raw byte value of the combining class.
3 years ago
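
A minimal sketch of the approach described in 26ee43af2e, under stated assumptions: the tuple-struct layout, the `name` helper, and the fallback to the raw number for unnamed classes are illustrative guesses rather than the repository's actual code, and only a handful of the Unicode canonical combining class names are included.

```rust
use std::fmt;

/// Canonical combining class of a codepoint (layout is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CombiningClass(pub u8);

impl CombiningClass {
    /// Human-readable name of the class, if it has one.
    /// Only a subset of the named classes is listed here.
    pub fn name(self) -> Option<&'static str> {
        match self.0 {
            0 => Some("Not_Reordered"),
            1 => Some("Overlay"),
            7 => Some("Nukta"),
            9 => Some("Virama"),
            220 => Some("Below"),
            230 => Some("Above"),
            _ => None,
        }
    }
}

impl fmt::Display for CombiningClass {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.name() {
            // Prefer the readable name when one exists.
            Some(name) => f.write_str(name),
            // Assumed fallback: print the raw class value.
            None => write!(f, "{}", self.0),
        }
    }
}
```

Putting the lookup in `Display` means any caller that formats the value with `{}` gets the readable name without needing to know the mapping.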
pantonshire f1355b5fe3 update README to include link to releases page 3 years ago
pantonshire edb418216f update gitignore 3 years ago
pantonshire 9c1a98adb7 add help text for -f flag
This patch adds a documentation comment to `Args.full_category_names`,
which `clap` uses to generate help text for the `-f` flag.
3 years ago
pantonshire 22d7e6a2a7 add README.md 3 years ago
pantonshire 885cee0e1f add LICENSE file 3 years ago
pantonshire 6ed0de3c7c update Cargo.toml information
This patch updates the Cargo.toml files for both crates in this
repository, adding the author, license, description and repository
fields.
3 years ago
pantonshire 6cb3b0a412 add command line option to show full category name
This patch adds `clap` as a dependency for command-line argument
parsing, and introduces a `-f` flag to allow toggling whether or not the
full names of character classes are displayed.
3 years ago
pantonshire 172d1a14fe Initial commit 3 years ago