public final class BytesToNameCanonicalizer extends Object
A caching symbol table used for canonicalizing JSON field names (as Names which are constructed directly from a byte-based input source). Complications arise from trying to do efficient reuse and merging of symbol tables, so that the vocabulary usually shared by subsequent parsing runs can be reused.

| Modifier and Type | Field and Description |
|---|---|
| protected int | _collCount: Total number of Names in collision buckets (included in _count along with primary entries). |
| protected int | _collEnd: Index of the first unused collision bucket entry (== size of the used portion of the collision list); less than or equal to 0xFF (255), since the maximum number of entries is 255 (8 bits, minus 0 used as the 'empty' marker). |
| protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] | _collList: Array of heads of collision bucket chains; sized dynamically. |
| protected int | _count: Total number of Names in the symbol table; only used for child tables. |
| protected boolean | _intern: Whether canonical symbol Strings are to be intern()ed before being added to the table. |
| protected int | _longestCollisionList: Length of the longest collision list, tracked both to detect possible hash-collision attacks and to allow flushing in other cases. |
| protected int[] | _mainHash: Array of 2^N size; each entry combines 24 bits of hash (0 indicates an 'empty' slot) and an 8-bit collision bucket index (0 indicates an empty collision bucket chain; otherwise subtract one to get the index). |
| protected int | _mainHashMask: Mask used to truncate a 32-bit hash value to the current hash array size; essentially hash array size - 1 (since hash array sizes are 2^N). |
| protected Name[] | _mainNames: Array that contains Name instances matching entries in _mainHash. |
| protected BytesToNameCanonicalizer | _parent: Reference to the root symbol table, for child tables, so that they can merge table information back as necessary. |
| protected AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> | _tableInfo: Member that is only used by the root table instance: the root passes immutable state into child instances, and children may return new state if they add entries to the table. |
| protected static int | DEFAULT_TABLE_SIZE |
| protected static int | MAX_TABLE_SIZE: Symbol tables are not expanded past this maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names. |

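
The _mainHash and _mainHashMask descriptions imply that each slot packs 24 hash bits and an 8-bit collision bucket index into one int. The following minimal sketch assumes the hash bits occupy the upper 24 bits and the bucket index plus one occupies the lower 8 bits; these bit positions and the hypothetical SlotLayout class are illustrative assumptions, not the Jackson implementation.

```java
/**
 * Hypothetical helper illustrating a packed slot layout like the one described for _mainHash.
 * Assumed layout: upper 24 bits = low 24 bits of the name hash,
 *                 lower 8 bits  = collision bucket index + 1 (0 means "no chain").
 */
final class SlotLayout {
    private SlotLayout() { }

    /** Slot index in a table of 2^N entries; (tableSize - 1) plays the role of _mainHashMask. */
    static int slotIndex(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    /** Packs a primary entry that has no collision chain yet. */
    static int packPrimary(int hash) {
        return hash << 8; // keeps the low 24 bits of the hash in the top 24 bits
    }

    /** Attaches a collision bucket index (0..254) to an existing slot value. */
    static int withBucket(int slot, int bucketIndex) {
        if (bucketIndex < 0 || bucketIndex > 254) {
            throw new IllegalArgumentException("bucket index must fit in 8 bits minus the 0 marker");
        }
        return (slot & ~0xFF) | (bucketIndex + 1);
    }

    /** True if the stored 24 hash bits equal the low 24 bits of the given hash. */
    static boolean hashMatches(int slot, int hash) {
        return (slot >>> 8) == (hash & 0xFFFFFF);
    }

    /** Returns the collision bucket index, or -1 if the slot has no chain. */
    static int bucketIndex(int slot) {
        int raw = slot & 0xFF;
        return raw == 0 ? -1 : raw - 1;
    }

    /** Assumes hashes whose low 24 bits are all zero are never stored, so 0 means empty. */
    static boolean isEmpty(int slot) {
        return slot == 0;
    }
}
```

Because the index field reserves 0 as the "no chain" marker, at most 255 buckets can be addressed with 8 bits, which matches the 0xFF bound documented for _collEnd.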
| Modifier and Type | Method and Description | 
|---|---|
| Name | addName(String symbolStr, int[] quads, int qlen) |
| Name | addName(String symbolStr, int q1, int q2) |
| int | bucketCount() |
| int | calcHash(int firstQuad) |
| int | calcHash(int[] quads, int qlen) |
| int | calcHash(int firstQuad, int secondQuad) |
| protected static int[] | calcQuads(byte[] wordBytes) |
| int | collisionCount(): Method mostly needed by unit tests; calculates the number of entries that are in the collision list. |
| static BytesToNameCanonicalizer | createRoot(): Factory method to call to create a symbol table instance with a randomized seed value. |
| protected static BytesToNameCanonicalizer | createRoot(int hashSeed): Factory method that should only be called from unit tests, where the seed value should remain the same. |
| Name | findName(int firstQuad): Finds and returns the name matching the specified symbol, if such a name already exists in the table. |
| Name | findName(int[] quads, int qlen): Finds and returns the name matching the specified symbol, if such a name already exists in the table; or if not, creates a name object, adds it to the table, and returns it. |
| Name | findName(int firstQuad, int secondQuad): Finds and returns the name matching the specified symbol, if such a name already exists in the table. |
| static Name | getEmptyName() |
| int | hashSeed() |
| BytesToNameCanonicalizer | makeChild(boolean canonicalize, boolean intern): Factory method used to create the actual symbol table instance to use for parsing. |
| int | maxCollisionLength(): Method mostly needed by unit tests; calculates the length of the longest collision chain. |
| boolean | maybeDirty(): Method called to quickly check whether a child symbol table may have gotten additional entries. |
| void | release(): Method called by the using code to indicate it is done with this instance. |
| protected void | reportTooManyCollisions(int maxLen) |
| int | size() |

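
findName looks up an existing Name and addName inserts a new one. The sketch below shows that lookup-then-add pattern over a power-of-two primary array with per-slot overflow chains. It is a simplification under stated assumptions: plain Strings stand in for Name objects, String.hashCode() replaces the quad-based calcHash, and TinySymbolTable is a hypothetical class, not part of Jackson.

```java
/**
 * Hypothetical, simplified canonicalizing symbol table: one primary slot per
 * index plus a singly linked overflow chain. Strings stand in for Name objects.
 */
final class TinySymbolTable {
    private static final class Bucket {
        final String name;
        final Bucket next;
        Bucket(String name, Bucket next) { this.name = name; this.next = next; }
    }

    private final String[] mainNames;  // primary entries (null = unused)
    private final Bucket[] collisions; // overflow chain heads, same indexing as mainNames
    private final int mask;            // table size - 1 (size is a power of two)
    private int count;

    TinySymbolTable(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        mainNames = new String[sizePowerOfTwo];
        collisions = new Bucket[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    private int index(String name) {
        int h = name.hashCode();
        h ^= (h >>> 16); // fold high bits down, since masking keeps only low bits
        return h & mask;
    }

    /** Lookup only: returns the stored canonical instance, or null if absent. */
    String find(String name) {
        int ix = index(name);
        String primary = mainNames[ix];
        if (primary == null) return null; // empty primary slot implies no chain
        if (primary.equals(name)) return primary;
        for (Bucket b = collisions[ix]; b != null; b = b.next) {
            if (b.name.equals(name)) return b.name;
        }
        return null;
    }

    /** Adds the name if absent and returns the canonical instance. */
    String add(String name) {
        String existing = find(name);
        if (existing != null) return existing;
        int ix = index(name);
        if (mainNames[ix] == null) {
            mainNames[ix] = name;
        } else {
            collisions[ix] = new Bucket(name, collisions[ix]); // push onto chain
        }
        ++count;
        return name;
    }

    int size() { return count; }
}
```

Returning the already-stored instance from add() is what makes the table canonicalizing: equal names seen at different times resolve to the same object.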
protected static final int DEFAULT_TABLE_SIZE

protected static final int MAX_TABLE_SIZE

protected final BytesToNameCanonicalizer _parent

protected final AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo

protected final boolean _intern

protected int _count

protected int _longestCollisionList

protected int _mainHashMask

protected int[] _mainHash

protected Name[] _mainNames

Array that contains Name instances matching entries in _mainHash. Contains nulls for unused entries.

protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] _collList

protected int _collCount

Total number of Names in collision buckets (included in _count along with primary entries).

protected int _collEnd

public static BytesToNameCanonicalizer createRoot()

protected static BytesToNameCanonicalizer createRoot(int hashSeed)

public BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern)

intern - Whether canonical symbol Strings should be interned or not.

public void release()

public int size()

public int bucketCount()

public boolean maybeDirty()
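
The class description, the _parent and _tableInfo fields, and makeChild, maybeDirty and release together describe a root table that hands state to per-parser child tables and takes back any additions. A minimal sketch of that snapshot-and-merge idea follows, assuming copy-on-write child state and a "larger state wins" merge rule; Snapshot, RootTable and ChildTable are hypothetical names, and Jackson's own TableInfo handling may differ.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Table state handed from the root to children; treated as read-only once shared (hypothetical). */
final class Snapshot {
    final Set<String> names;
    Snapshot(Set<String> names) { this.names = names; }
}

/** Root owns the shared state and is not used for parsing directly. */
final class RootTable {
    final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(new HashSet<>()));

    ChildTable makeChild() {
        return new ChildTable(this, state.get());
    }

    /** Accepts a child's state only if it holds more names than the current state. */
    void mergeChild(Snapshot childState) {
        Snapshot current = state.get();
        if (childState.names.size() > current.names.size()) {
            // If another child merged concurrently, this update is simply dropped.
            state.compareAndSet(current, childState);
        }
    }
}

/** Per-parser table: copies shared state on first write, merges back on release. */
final class ChildTable {
    private final RootTable parent;
    private Set<String> names;
    private boolean dirty; // similar in spirit to maybeDirty()

    ChildTable(RootTable parent, Snapshot initial) {
        this.parent = parent;
        this.names = initial.names; // shared; never mutated while dirty == false
    }

    void add(String name) {
        if (names.contains(name)) return;
        if (!dirty) {
            names = new HashSet<>(names); // copy-on-write before the first mutation
            dirty = true;
        }
        names.add(name);
    }

    boolean maybeDirty() { return dirty; }

    /** Called when parsing is done; pushes new entries back to the root. */
    void release() {
        if (dirty) {
            parent.mergeChild(new Snapshot(names));
            dirty = false; // later adds will copy again, so the root's set stays untouched
        }
    }
}
```

Using compareAndSet means a child whose state lost a race with another child's merge is dropped, which is tolerable for a cache: the names are simply added again on a later run.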

public int hashSeed()

public int collisionCount()

The value can be at most (size() - 1), but should usually be much lower, ideally 0.

public int maxCollisionLength()

The longest chain may reach size() - 1 in the pathological case.

public static Name getEmptyName()

public Name findName(int firstQuad)

Note: separate methods exist to optimize the common case of short element/attribute names (4 or fewer ASCII characters).

firstQuad - int32 containing the first 4 bytes of the name; if the whole name is shorter than 4 bytes, it is padded with zero bytes in front (zero MSBs, i.e. right-aligned).

public Name findName(int firstQuad, int secondQuad)

Note: separate methods exist to optimize the common case of relatively short element/attribute names (8 or fewer ASCII characters).

firstQuad - int32 containing the first 4 bytes of the name.

secondQuad - int32 containing bytes 5 through 8 of the name; if the name is shorter than 8 bytes, it is padded with up to 3 zero bytes in front (zero MSBs, i.e. right-aligned).

public Name findName(int[] quads, int qlen)

Note: this is the general-purpose method that can be called for names of any length. However, if the name is less than 9 bytes long, it is preferable to call the version optimized for short names.

quads - Array of int32s, each of which contains 4 bytes of the encoded name.

qlen - Number of int32s, starting from index 0, in the quads parameter.

public final int calcHash(int firstQuad)

public final int calcHash(int firstQuad, int secondQuad)

public final int calcHash(int[] quads, int qlen)
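
The calcHash overloads hash one, two or qlen quads, and createRoot() seeds the hash randomly while createRoot(int hashSeed) keeps the seed fixed for tests. The mixing function below is a hypothetical illustration of a seeded quad hash: the multiplier and the shortened form of MurmurHash3's fmix32 finalizer are assumptions, not Jackson's formula. It shows that a fixed seed gives reproducible slot indexes and that a hash is reduced to a slot with hash & (size - 1).

```java
/** Hypothetical seeded hash over int32 quads; not Jackson's actual formula. */
final class QuadHash {
    private static final int MULT = 33; // arbitrary odd multiplier (assumption)
    private final int seed;

    QuadHash(int seed) { this.seed = seed; }

    int hash(int firstQuad) {
        return mix(firstQuad ^ seed);
    }

    int hash(int firstQuad, int secondQuad) {
        return mix((firstQuad * MULT + secondQuad) ^ seed);
    }

    /** Combines the first qlen quads; agrees with the 1- and 2-quad overloads. */
    int hash(int[] quads, int qlen) {
        if (qlen < 1 || qlen > quads.length) {
            throw new IllegalArgumentException("qlen out of range");
        }
        int h = quads[0];
        for (int i = 1; i < qlen; ++i) {
            h = h * MULT + quads[i];
        }
        return mix(h ^ seed);
    }

    /** Spreads high bits into low bits, since only low bits survive masking. */
    private static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85EBCA6B; // constant from MurmurHash3's fmix32
        h ^= (h >>> 13);
        return h;
    }

    /** Reduces a hash to a slot; tableSize must be a power of two. */
    static int slot(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }
}
```

With a random seed per root table, an attacker cannot precompute names that all land in the same slot; with a fixed seed, unit tests can assert exact slot positions.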
protected static int[] calcQuads(byte[] wordBytes)
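
The findName parameter descriptions state that name bytes are packed four per int32, with a trailing partial quad right-aligned (zero bytes in front). Since calcQuads is protected, the following is a standalone sketch of packing that follows that convention; QuadPacker is a hypothetical name.

```java
/** Packs bytes into big-endian int32 quads; a short final quad is right-aligned. */
final class QuadPacker {
    private QuadPacker() { }

    static int[] toQuads(byte[] bytes) {
        int[] quads = new int[(bytes.length + 3) / 4];
        for (int q = 0; q < quads.length; ++q) {
            int start = q * 4;
            int end = Math.min(start + 4, bytes.length);
            int value = 0;
            for (int i = start; i < end; ++i) {
                value = (value << 8) | (bytes[i] & 0xFF); // & 0xFF avoids sign extension
            }
            quads[q] = value; // fewer than 4 bytes leaves zero MSBs (right-aligned)
        }
        return quads;
    }
}
```

Per the notes on the findName overloads, a name of 4 or fewer bytes yields one quad (findName(int firstQuad)), 5 to 8 bytes yield two (findName(int firstQuad, int secondQuad)), and longer names go through findName(int[] quads, int qlen).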
protected void reportTooManyCollisions(int maxLen)
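
collisionCount() and maxCollisionLength() report collision statistics, and _longestCollisionList is tracked partly to detect attacks; reportTooManyCollisions(int maxLen) presumably signals when a chain grows too long. The sketch below computes both statistics from a hypothetical array of per-bucket chain lengths and fails fast past a threshold; the threshold value, the exception type and the CollisionStats class are all assumptions.

```java
/** Hypothetical collision bookkeeping over per-bucket chain lengths. */
final class CollisionStats {
    static final int MAX_CHAIN_LENGTH = 100; // assumed threshold, for illustration only

    private CollisionStats() { }

    /** Entries stored outside primary slots: a chain of length n contributes n. */
    static int collisionCount(int[] chainLengths) {
        int total = 0;
        for (int len : chainLengths) total += len;
        return total;
    }

    /** Length of the longest chain (0 if there are no collisions). */
    static int maxCollisionLength(int[] chainLengths) {
        int max = 0;
        for (int len : chainLengths) max = Math.max(max, len);
        return max;
    }

    /** Fails fast when a chain looks like the result of deliberate hash collisions. */
    static void checkChain(int chainLength) {
        if (chainLength > MAX_CHAIN_LENGTH) {
            throw new IllegalStateException("Longest collision chain (" + chainLength
                    + ") exceeds " + MAX_CHAIN_LENGTH + "; possible hash-collision attack");
        }
    }
}
```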