Package org.apache.uima.cas.impl
Class FSIndexFlat<T extends FeatureStructure>
java.lang.Object
org.apache.uima.cas.impl.FSIndexFlat<T>
Flattened indexes, built as a faster alternative to Sorted indexes.
(This might someday be extended to bag and set indexes, but iterators over those don't need to "sort" among subtypes.)
The flattened version has these performance benefits over the normal sorted iterators:
- the ordering among subtypes does not have to be maintained while iterating (via the heapifyUp and heapifyDown methods)
- the conversion from the CAS int heap format to the Java cover class instance is done
once, when the iterator is constructed.
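To make the first benefit concrete, here is a minimal sketch of merging several already-sorted per-type lists into one flat list, once. It is not the UIMA implementation: the class FlattenSketch, its Cursor helper, and the use of java.util.PriorityQueue are assumptions made for illustration, whereas the real code works on CAS index structures with its own heap routines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of UIMA.
final class FlattenSketch {

  /** Read position within one already-sorted per-type list. */
  private static final class Cursor<E> {
    final List<E> items;
    int pos = 0;

    Cursor(List<E> items) {
      this.items = items;
    }

    E current() {
      return items.get(pos);
    }
  }

  /** Merges the sorted lists once; the result can then be walked without any heap work. */
  static <E> List<E> flatten(List<List<E>> sortedPerType, Comparator<? super E> order) {
    // Order the cursors by the element each one currently points at.
    Comparator<Cursor<E>> byCurrent = (a, b) -> order.compare(a.current(), b.current());
    PriorityQueue<Cursor<E>> heap = new PriorityQueue<>(byCurrent);
    int total = 0;
    for (List<E> list : sortedPerType) {
      total += list.size();
      if (!list.isEmpty()) {
        heap.add(new Cursor<>(list));
      }
    }
    List<E> flat = new ArrayList<>(total);
    while (!heap.isEmpty()) {
      Cursor<E> smallest = heap.poll();
      flat.add(smallest.current());
      smallest.pos++;
      if (smallest.pos < smallest.items.size()) {
        heap.add(smallest); // re-insert: the reordering cost is paid here, once
      }
    }
    return flat;
  }
}
```

After flattening, each later iteration is a plain walk over the result, with no per-step reordering among the subtype lists.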
A flattened version is built only for Sorted indexes that have subtypes (and so need merging to produce the total sort ordering).
Each FsLeafIndexImpl (one per CAS view, per index, per type and subtype of that index definition)
has a lazily created associated instance of this class. It is created lazily because there may in general be
thousands of types/subtypes that are never iterated over.
The instance is created when the iicp cache is created, which is when the first iterator over this
CAS view/index/(type or subtype) is created.
The flattened version is "thrown away" if an index update occurs to the type or any of the subtypes included in
the iteration, because it is then no longer valid.
This condition is checked when the iterator is created, but not afterwards.
This means that these iterators are not "fail fast".
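The validity check described above can be sketched as a comparison between live per-type update counters and a copy captured when the flattened version was built. This is only an illustration under assumptions: the class UpdateSnapshotSketch and all of its member names are hypothetical, and the int arrays stand in for whatever structure the real class uses.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of UIMA.
final class UpdateSnapshotSketch {
  // One update counter per type/subtype covered by the iteration.
  private final int[] liveUpdateCounts;
  // Values of those counters at the time the flattened version was built.
  private int[] countsAtBuild;

  UpdateSnapshotSketch(int numberOfTypes) {
    liveUpdateCounts = new int[numberOfTypes];
    countsAtBuild = liveUpdateCounts.clone();
  }

  /** Called by the (single) updating thread whenever an index for this type changes. */
  void recordIndexUpdate(int typeSlot) {
    liveUpdateCounts[typeSlot]++;
  }

  /** Called when a flattened version is built. */
  void captureUpdateCounts() {
    countsAtBuild = liveUpdateCounts.clone();
  }

  /** Meant to be checked only at iterator creation, hence "not fail fast". */
  boolean isUpdateFreeSinceCapture() {
    return Arrays.equals(countsAtBuild, liveUpdateCounts);
  }
}
```

An iterator that was handed the flattened array before an update keeps walking its snapshot; only the next iterator creation notices the mismatch and discards the flattened version.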
The flattened version is built only after some amount of
normal iterating has been done with no intervening index update. This is done
by keeping a counter of the number of times "heapify up" or "heapify down"
is called, and comparing it against the total number of items in the index.
The counter is reset when an iterator is requested and the code detects that an update has happened to
the type or its subtypes since monitoring for updates last started.
The effect is to delay creating flattened
versions until it's fairly certain that they'll be stable for a while.
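A rough sketch of this trigger follows. The class FlattenTriggerSketch and its method names are hypothetical, and the exact comparison (reordering count at least equal to the index size) is an assumption; the text above only says the counter is compared against the number of items in the index.

```java
// Hypothetical illustration only; not part of UIMA.
final class FlattenTriggerSketch {
  // Plain int on purpose: an approximate value is good enough for this heuristic.
  private int reorderingCount = 0;

  /** Called each time a normal sorted iterator does a heapify-up or heapify-down. */
  void incrementReorderingCount() {
    reorderingCount++;
  }

  /** Called when iterator creation detects an index update since monitoring started. */
  void resetAfterUpdateDetected() {
    reorderingCount = 0;
  }

  /** Assumed rule: flatten once the reordering work reaches the index size. */
  boolean shouldFlatten(int indexSize) {
    return indexSize > 0 && reorderingCount >= indexSize;
  }
}
```

Because any detected update zeroes the counter, an index that is modified frequently never accumulates enough reordering work to be flattened.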
Threading
The flattened version creation is done on the same thread as the iterator causing it.
An experimental version was tried which ran these on separate threads, but that created a lot of complex
synchronization code, including handling cases where a CAS Reset occurs, but the index flattening thread is
still running. Also, much more synchronization / volatile / atomic kinds of operations were required, which
can slow down the iterating.
The CAS is single threaded for updates, but can have multiple threads "reading" it. With this feature,
"reading" the CAS using an iterator can result in the creation of new flattened indexes.
So the creation activity is locked, using an AtomicBoolean, so that only one thread does it.
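One common way to get this "only one thread builds" behavior from an AtomicBoolean is compareAndSet, sketched below. The class SingleBuilderSketch and its members are hypothetical; the source only states that an AtomicBoolean is used, not how.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of UIMA.
final class SingleBuilderSketch<T> {
  private final AtomicBoolean isLocked = new AtomicBoolean(false);
  private volatile T built;

  /**
   * Runs the builder unless a build is already in progress.
   *
   * @return true if this call built the value, false if it was skipped
   */
  boolean buildIfFree(Supplier<T> builder) {
    // Only the caller that flips false -> true gets to build.
    if (!isLocked.compareAndSet(false, true)) {
      return false;
    }
    try {
      built = builder.get();
      return true;
    } finally {
      isLocked.set(false); // always release, even if the builder throws
    }
  }

  T get() {
    return built;
  }
}
```

A caller that loses the race does not wait; it simply carries on with the normal (non-flat) iterator, which matches the true/false result documented for createFlattened.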
Many variables that would normally be volatile are not marked that way, because their values only need to be approximate.
An example is the counters used to determine whether it's time to build the flat iterator. These are potentially
updated on multiple threads, so strictly they should be atomic, but this is not really needed: the effect of
using a locally cached value instead of the real one from another thread is only to delay the creation point somewhat.
ConcurrentModificationException is checked for using the isUpdateFreeSinceLastCounterReset method.
moveToFirst, moveToLast and moveTo(fs) don't "reset" the CME as is done in other iterators, because this iterator is looking at a flattened snapshot.
-
Nested Class Summary
Modifier and Type | Class | Description
static class | FSIndexFlat.FSIteratorFlat<TI extends FeatureStructure> |
-
Field Summary
Modifier and Type | Field | Description
(package private) final int | casId |
(package private) int | casResetCount |
private final int | debugTypeCode |
private static final boolean | debugTypeCodeUnstable |
private static final Thread | dumpMeasurements |
static final boolean | enabled |
private static final AtomicLong | flattenTime |
private SoftReference<T[]> | fsa | The flattened version of the above, or null; set under fsaLock
private final FSIndexRepositoryImpl.IndexIteratorCachePair<T> | iicp | A reference to the non-flat shared index iterator cache pair
(package private) final Int2IntArrayMapFixedSize | indexUpdateCountsResetValues | The values of the index update count, for all types/subtypes. These are set on multiple threads, under the lock, whenever a flattened index is created as part of an iterator creation. They are read on multiple threads to determine whether a flattened iterator is still valid.
private AtomicBoolean | isInIteratedSortedIndexes | This flag is reset when the index is flushed.
private final AtomicBoolean | isLocked | Set from false to true by the thread reading or updating the shared structure, which includes creating an iterator (to prevent an fsa reset while setting up) and flattening an index
private int | iteratorReorderingCount | Counter incremented by heapifyUp and heapifyDown while iterating, perhaps on multiple threads. Even so, we don't bother with thread synchronization, given the use.
private static final int | NUMBER_DISCARDED_RESETABLE_MAX |
private static final AtomicInteger | numberDiscardedDueToUpdates |
private int | numberDiscardedResetable |
private static final AtomicInteger | numberFlatIterators |
private static final AtomicInteger | numberFlattened |
private static final boolean | smalltrace |
static final int | THRESHOLD_FOR_FLATTENING |
(package private) static final boolean | trace |
private static final boolean | tune |
-
Constructor Summary
Constructor | Description
FSIndexFlat | Constructor
-
Method Summary
Modifier and Type | Method | Description
(package private) void | captureIndexUpdateCounts() |
private boolean | createFlattened() | Called when it is determined that a flattened index would be good to have, and may not exist.
private void | discardFlattened() |
(package private) void | flush() | Called when the index is cleared.
(package private) boolean | hasFlatIndex() | An approximate test for whether this has a valid flat index. It's approximate because another thread (running GC, for example) could sneak in and invalidate the result.
private String | idInfo() |
(package private) void | incrementReorderingCount() |
(package private) void | incrementReorderingCount(int n) |
 | iterator() | Returns either an iterator over the flattened index, or null.
 | iterator (with parameter fs) | As of July 2015, flattened indexes are disabled: too little benefit, too many edge cases. Returns either an iterator over the flattened index, or null.
private FSIndexFlat.FSIteratorFlat<T> | iteratorCore(FeatureStructure fs, T[] localFsa) |
private FSIndexFlat.FSIteratorFlat<T> | tryFlatIterator |
(package private) String | verifyFsaSubsumes(FeatureStructure[] localFsa) |
-
Field Details
-
enabled
public static final boolean enabled
-
trace
static final boolean trace
-
smalltrace
private static final boolean smalltrace
-
tune
private static final boolean tune
-
debugTypeCodeUnstable
private static final boolean debugTypeCodeUnstable
-
THRESHOLD_FOR_FLATTENING
public static final int THRESHOLD_FOR_FLATTENING
-
NUMBER_DISCARDED_RESETABLE_MAX
private static final int NUMBER_DISCARDED_RESETABLE_MAX
-
flattenTime
-
iicp
A reference to the non-flat shared index iterator cache pair.
-
fsa
The flattened version of the above, or null; set under fsaLock.
-
isLocked
Set from false to true by the thread reading or updating the shared structure, which includes creating an iterator (to prevent an fsa reset while setting up) and flattening an index.
-
iteratorReorderingCount
private int iteratorReorderingCount
Counter incremented by heapifyUp and heapifyDown while iterating, perhaps on multiple threads. Even so, we don't bother with thread synchronization, given the use.
-
indexUpdateCountsResetValues
The values of the index update count, for all types/subtypes. These are set on multiple threads, under the lock, whenever a flattened index is created as part of an iterator creation. They are read on multiple threads to determine whether a flattened iterator is still valid.
-
isInIteratedSortedIndexes
This flag is reset when the index is flushed. Its being reset causes the first flat iterator created to add it back into the list of things needing "flushing". The iterator creation may occur on multiple threads.
-
numberFlattened
-
numberDiscardedDueToUpdates
-
numberDiscardedResetable
private volatile int numberDiscardedResetable
-
numberFlatIterators
-
casResetCount
volatile int casResetCount
-
casId
final int casId
-
debugTypeCode
private final int debugTypeCode
-
dumpMeasurements
-
-
Constructor Details
-
FSIndexFlat
Constructor
Parameters:
iicp - the sorted index for a type being cached
-
-
Method Details
-
incrementReorderingCount
void incrementReorderingCount()
-
incrementReorderingCount
void incrementReorderingCount(int n)
-
flush
void flush()
Called when the index is cleared.
-
idInfo
-
createFlattened
private boolean createFlattened()
Called when it is determined that a flattened index would be good to have, and may not exist. This builds the flattened index, or returns if another thread is already building it.
Returns:
true if the flat index was created, false if skipped because another thread is building it.
-
verifyFsaSubsumes
-
captureIndexUpdateCounts
void captureIndexUpdateCounts()
-
iterator
Returns either an iterator over the flattened index, positioned at the first element (if non-empty), or null.
Returns:
the iterator
-
iterator
As of July 2015, flattened indexes are disabled: too little benefit, too many edge cases. Edge cases to handle:
going from non-JCas to JCas requires existing flat indexes to be invalidated;
entering a PEAR may require a different implementation of flattened indexes while in the PEAR, plus restoration of the previous versions upon PEAR exit.
Returns either an iterator over the flattened index, or null. As a side effect, if there is no flattened index, checks the counts and, if there are enough, kicks off a subtask to create the flattened one.
Parameters:
fs - the feature structure to use as a template for setting the initial position of this iterator
Returns:
the iterator, or null if there's no flattened iterator (the caller will construct the appropriate iterator)
-
tryFlatIterator
-
discardFlattened
private void discardFlattened()
-
iteratorCore
-
hasFlatIndex
boolean hasFlatIndex()
An approximate test for whether this has a valid flat index. It's approximate because another thread (running GC, for example) could sneak in and invalidate the result.
Returns:
true if fsa is not null and the index hasn't been updated
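
The reason such a test can only be approximate is easier to see with a small sketch of a softly referenced array. The class SoftFlatHolderSketch and its methods are hypothetical and leave out the index-update check; only the use of a SoftReference to an array for fsa comes from the field summary.

```java
import java.lang.ref.SoftReference;

// Hypothetical illustration only; not part of UIMA.
final class SoftFlatHolderSketch<T> {
  // null means "no flattened version"; the referent may also be cleared by the GC.
  private volatile SoftReference<T[]> fsa = null;

  void set(T[] flat) {
    fsa = new SoftReference<>(flat);
  }

  void discard() {
    fsa = null;
  }

  /** Approximate: the array may be cleared or discarded right after this returns true. */
  boolean hasFlat() {
    SoftReference<T[]> ref = fsa;
    return ref != null && ref.get() != null;
  }

  /** Callers should take a strong reference here and test it for null themselves. */
  T[] getOrNull() {
    SoftReference<T[]> ref = fsa;
    return ref == null ? null : ref.get();
  }
}
```

In this sketch a caller that needs the array fetches it once with getOrNull and keeps the returned local reference, rather than calling hasFlat first and fetching afterwards.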
-