Lines Matching +full:rust +full:- +full:embedded

1 …b.com/dtolnay/unicode-ident) [![crates-io]](https://crates.io/crates/unicode-ident) [![d…
3 //! [github]: https://img.shields.io/badge/github-8da0cb?style=for-the-badge&labelColor=555555&logo…
4 //! [crates-io]: https://img.shields.io/badge/crates.io-fc8d62?style=for-the-badge&labelColor=55555…
5 //! [docs-rs]: https://img.shields.io/badge/docs.rs-66c2a5?style=for-the-badge&labelColor=555555&lo…
14 //! This crate is a better optimized implementation of the older `unicode-xid`
16 //! ASCII and non-ASCII codepoints with better performance, 2–10×
17 //! faster than `unicode-xid`.
26 //! - `unicode-ident` is this crate;
27 //! - [`unicode-xid`] is a widely used crate run by the "unicode-rs" org;
28 //! - `ucd-trie` and `fst` are two data structures supported by the
29 //! [`ucd-generate`] tool;
30 //! - [`roaring`] is a Rust implementation of Roaring bitmap.
37 //! comparing across different ratios of ASCII to non-ASCII codepoints in the
40 //! [`unicode-xid`]: https://github.com/unicode-rs/unicode-xid
41 //! [`ucd-generate`]: https://github.com/BurntSushi/ucd-generate
42 //! [`roaring`]: https://github.com/RoaringBitmap/roaring-rs
45 //! |---|---|---|---|---|---|
46 //! | **`unicode-ident`** | 9.75 K | 0.96 ns | 0.95 ns | 1.09 ns | 1.55 ns |
47 //! | **`unicode-xid`** | 11.34 K | 1.88 ns | 2.14 ns | 3.48 ns | 15.63 ns |
48 //! | **`ucd-trie`** | 9.95 K | 1.29 ns | 1.28 ns | 1.36 ns | 2.15 ns |
59 //! #### unicode-xid
64 //! ```rust
67 //! ('\u{30}', '\u{39}'), // 0-9
68 //! ('\u{41}', '\u{5a}'), // A-Z
79 //! consumes 8 bytes, because it consists of a pair of 32-bit `char` values.
85 //! which is surrounded by non-identifier codepoints consumes 64 bits in the
88 //! On a system with 64-byte cache lines, binary searching the table touches 7
93 //! Overall, the crate ends up being about 10× slower on non-ASCII input
97 //! Rust's `char` type is a 21-bit integer padded to 32 bits, which means every
106 //! #### ucd-trie
110 //! [rust-lang/rust#33098].
112 //! [rust-lang/rust#33098]: https://github.com/rust-lang/rust/pull/33098
114 //! ```rust
126 //! final states of the trie are embedded in leaves or "chunks", where each
127 //! chunk is a 64-bit integer. Each bit position of the integer corresponds to
138 //! byte UTF-8 encoded codepoints, 3 byte UTF-8 encoded codepoints and 4 byte
139 //! UTF-8 encoded codepoints, respectively.
146 //! to query based on a UTF-8 encoded string, returning the Unicode property
148 //! caller is required to tokenize their UTF-8 encoded input data into `char`,
149 //! hand the `char` into `ucd-trie`, only for `ucd-trie` to undo that work by
150 //! converting back into the variable-length representation for trie traversal.
155 //! [ucd-generate] but I am not aware of any advantage over the `ucd-trie`
156 //! representation. In particular `ucd-trie` is optimized for storing Unicode
160 //! [ucd-generate]: https://github.com/BurntSushi/ucd-generate
163 //! and slow lookups for this use case relative to `ucd-trie` is that it does
170 //! This crate is a pure-Rust implementation of [Roaring Bitmap], a data
171 //! structure designed for storing sets of 32-bit unsigned integers.
181 //! substantially slower than the Unicode-optimized crates. Meanwhile the
187 //! about 15% slower than pure-Rust `roaring`, which could just be FFI overhead.
192 //! #### unicode-ident
194 //! This crate is most similar to the `ucd-trie` library, in that it's based on
200 //! - Uses a single 2-level trie, rather than 3 disjoint partitions of different
202 //! - Uses significantly larger chunks: 512 bits rather than 64 bits.
203 //! - Compresses the XID\_Start and XID\_Continue properties together
208 //! properties in uncompressed form, in row-major order:
213 …rt bitmap" width="256" src="https://user-images.githubusercontent.com/1940490/168647353-c6eeb922-a…
214 …ue bitmap" width="256" src="https://user-images.githubusercontent.com/1940490/168647367-f447cca7-2…
223 //! This crate stores one 512-bit "row" of the above bitmaps in the leaf level
225 //! out there are 124 unique 512-bit chunks across the two bitmaps so 7 bits are
234 //! In fact since there are only 124 unique chunks, we can use an 8-bit index
235 //! with a spare bit to index at the half-chunk level. This achieves an
241 //! In contrast to binary search or the `ucd-trie` crate, performing lookups in
242 //! this data structure is straight-line code with no need for branching.
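The matched comments above describe a 2-level trie whose leaves are 512-bit chunks selected through a small index. The following is a minimal sketch of that general idea, not the crate's actual tables or code. The `Trie` type, its `build` and `contains` methods, and the predicate-driven construction are all hypothetical names introduced for illustration. It indexes whole chunks only (no half-chunk trick) and keeps a separate table per property, so it does not reproduce the compression described in the comments.

```rust
const CHUNK: usize = 64; // bytes per leaf chunk: 64 * 8 = 512 bits

/// Hypothetical toy table: `index[i]` names the leaf chunk that holds
/// codepoints `i * 512 .. (i + 1) * 512`; identical chunks are stored once.
struct Trie {
    index: Vec<u8>,
    leaf: Vec<u8>,
}

impl Trie {
    /// Build the table from any predicate over codepoints below `limit`.
    fn build(limit: u32, pred: impl Fn(char) -> bool) -> Trie {
        // Chunk 0 is all zeros so that out-of-range lookups can fall back to it.
        let mut leaf = vec![0u8; CHUNK];
        let mut index = Vec::new();
        let bits = (CHUNK * 8) as u32;
        let mut base = 0u32;
        while base < limit {
            let mut chunk = [0u8; CHUNK];
            for i in 0..bits {
                let cp = base + i;
                // Surrogates are not valid `char`s and are treated as false.
                if cp < limit && char::from_u32(cp).map_or(false, &pred) {
                    chunk[(i / 8) as usize] |= 1 << (i % 8);
                }
            }
            // Reuse an identical chunk if one has already been stored.
            let pos = leaf.chunks(CHUNK).position(|c| c == &chunk[..]);
            let id = pos.unwrap_or_else(|| {
                leaf.extend_from_slice(&chunk);
                leaf.len() / CHUNK - 1
            });
            assert!(id < 256, "toy index is 8 bits wide");
            index.push(id as u8);
            base += bits;
        }
        Trie { index, leaf }
    }

    /// Lookup: one index read, one leaf read, one shift and mask.
    fn contains(&self, ch: char) -> bool {
        let cp = ch as usize;
        let chunk = *self.index.get(cp / 8 / CHUNK).unwrap_or(&0) as usize;
        let byte = self.leaf[chunk * CHUNK + cp / 8 % CHUNK];
        (byte >> (cp % 8)) & 1 != 0
    }
}
```

A table built with, say, `Trie::build(0x3000, |c| c.is_alphabetic())` answers `contains` with two array reads and a bit test. Codepoints past the end of the index fall back to the all-zero chunk and report `false`.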
245 #![doc(html_root_url = "https://docs.rs/unicode-ident/1.0.8")]
253 pub fn is_xid_start(ch: char) -> bool { in is_xid_start()
262 pub fn is_xid_continue(ch: char) -> bool { in is_xid_continue()
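The two functions matched above both take a `char` and return a `bool`. As a rough sketch of how such a pair of predicates might be used to validate an identifier, the hypothetical helper below accepts them as parameters. `is_identifier` is an illustrative name, not part of the crate. With `unicode-ident` as a dependency one would presumably pass `unicode_ident::is_xid_start` and `unicode_ident::is_xid_continue`; any other `Fn(char) -> bool` pair works equally well.

```rust
/// Hypothetical helper: true if `s` is non-empty, its first char satisfies
/// `is_start`, and every remaining char satisfies `is_continue`.
fn is_identifier(
    s: &str,
    is_start: impl Fn(char) -> bool,
    is_continue: impl Fn(char) -> bool,
) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // Check the first char, then the rest of the iterator.
        Some(first) => is_start(first) && chars.all(is_continue),
        // The empty string is not an identifier.
        None => false,
    }
}
```

Taking the predicates as parameters keeps the sketch independent of any particular Unicode table, so the same helper can be exercised with ASCII-only stand-ins.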