Class JCasHashMap

java.lang.Object
org.apache.uima.jcas.impl.JCasHashMap

public class JCasHashMap extends Object
Version 2 (2014) of map between CAS addr and JCasCover Objects Note: in the general case, the cover object may *not* be a JCas one, but rather the general one This happens if there is no JCas cover object defined for the type. Assumptions: Each addr has a corresponding JCas; it is not permitted to "update" an addr with a different JCas cover class (unless the table is cleared first). Table always a power of 2 in size - permits faster hashing Accesses to this table are threadsafe, in order to support read-only CASes being shared by multiple threads. Multiple iterators in different threads could be accessing the map and updating it. Load factor tuning. 2,000,000 random inserts, 50 reps (for JIT) .5 (2x to 4x entries) 99% 5 probes 250 ms .6 (1.67 to 3.3) 99% 6 probes 285 ms .7 (1.43 to 2.86) 99% 8 probes 318 ms .8 (1.25 to 2.5) 99% 11 probes 360 ms version 1 at load factor .5 ran about 570 ms * 1.x (did 2 lookups for fetches if not found,) No "get" method, only getReserve. This method, if it doesn't find the key, eventually finds an empty (null) slot - it then stores a special "reserve" item with the same key value in that slot. Other threads doing getReserve calls, upon encountering this reserve item, wait until the reserve is converted to a real value (a notifyAll happens when this is done), and then the getReserve returns the real item. getReserve calls - when they find the item operate without any locking Locking: There is one lock used for reading and updating the table -- not used for reading when item found, only if item not found or is reserved Strategy: have 1 outer implementation delegating to multiple inner ones number = concurrency level (a power of 2) The hash uses some # of low order bits to address the right inner one. This table is used to hold JCas cover classes for CAS feature structures. There is one instance of this table associated with each CAS that is using it.

The update occurs in the code in JCasGenerated classes, which do: a call to get the value of the map for a key if that is "null", it creates the new JCas cover object, and does a "put" to add the value.

The creation of the new JCas cover object can, in turn, run arbitrary user code, which can result in updates to the JCasHashMap which occur before this original update occurs.

In a multi-threaded environment, multiple threads can do a "get" for the same Feature Structure instance. If it's not in the Map, the correct behavior is:

one of the threads adds the new element the other threads wait for the one thread to finish adding, and then return the object that the one thread added.

The implementation works as follows:

1) The JCasHashMap is split into "n" sub-maps. The number is the number of cores, but grows more slowly as the # of cores > 16. This number can be specified, but this is not currently exposed in the tuning parameters Locking occurs on the sub-maps; the outer method calls are not synchronized 2) The number of sub maps is rounded to a power of 2, to allow the low order bits of the hash of the key to be used to pick the map (via masking). 3) A getReserve that results in not-found returns a null, but adds to the table a special reserved element. 3a) This adding may result in table resizing 4) A getReserve that finds a special reserved element, knows that some other thread is in the process of adding an entry for that key, so it waits. 5) A put, if it finds a reserved-for-that-key element, replaces that with the real element, and then does a notifyAll to wake up any threads that were waiting (on this sub-map), and these threads then re-do the get. Multiple threads could be waiting on this, and they will all wake-up.

All calls are of the getReserved, followed by a put if the getReserved returns null. (Experiment - disabled after no change noted To improve locality of reference, an aux data structure of size to fit in one cache line of a Power7 (128 bytes) caches the latest lookups)

  • Field Details

    • TUNE

      static final boolean TUNE
      See Also:
    • check

      static final boolean check
      See Also:
    • DEFAULT_CONCURRENCY_LEVEL

      static int DEFAULT_CONCURRENCY_LEVEL
      must be a power of 2, > 0 package private for testing not final to allow test case to reset it must not be changed during multi-thread operation
    • loadFactor

      private final float loadFactor
      See Also:
    • initialCapacity

      private final int initialCapacity
    • useCache

      private final boolean useCache
    • concurrencyLevel

      private final int concurrencyLevel
    • concurrencyBitmask

      private final int concurrencyBitmask
    • concurrencyLevelBits

      private final int concurrencyLevelBits
    • subMaps

      private final JCasHashMapSubMap[] subMaps
    • subMapInitialCapacity

      private final int subMapInitialCapacity
    • oneSubmap

      private final JCasHashMapSubMap oneSubmap
    • C1

      private static final int C1
      See Also:
    • C2

      private static final int C2
      See Also:
    • seed

      private static final int seed
      See Also:
  • Constructor Details

    • JCasHashMap

      JCasHashMap(int capacity, boolean doUseCache)
    • JCasHashMap

      JCasHashMap(int capacity, boolean doUseCache, int aConcurrencyLevel)
  • Method Details

    • getDEFAULT_CONCURRENCY_LEVEL

      static int getDEFAULT_CONCURRENCY_LEVEL()
    • setDEFAULT_CONCURRENCY_LEVEL

      static void setDEFAULT_CONCURRENCY_LEVEL(int dEFAULT_CONCURRENCY_LEVEL)
    • concurrencyLimitedByInitialCapacity

      static boolean concurrencyLimitedByInitialCapacity(int currentConcurrencyLevel, int curMapSize)
      initial capacity (other than testing), is by default (from JCasImpl) is bigger of 256 and cas heap initial size (500,000) / 16 = 31K but users may set it lower in their uima configuration We use the current capacity of the JCasHashMap to set the concurrency limit
      Parameters:
      casCapacity - the capacity
      Returns:
      true if the concurrency is limited, and could increase with reallocation
    • sizeAdjustedConcurrency

      static int sizeAdjustedConcurrency(int curMapSize)
    • clear

      public void clear()
    • getSubMap

      private JCasHashMapSubMap getSubMap(int hash)
    • getReserve

      public FeatureStructureImpl getReserve(int key)
    • put

    • hashInt

      public static int hashInt(int k1)
    • getCapacities

      int[] getCapacities()
    • getCapacity

      int getCapacity()
    • getApproximateSize

      int getApproximateSize()
    • showHistogram

      public void showHistogram()
    • getConcurrencyLevel

      public int getConcurrencyLevel()