Class NgramFrequencyData

java.lang.Object
com.optimaize.langdetect.NgramFrequencyData

public final class NgramFrequencyData extends Object
Contains frequency information for n-grams coming from multiple LanguageProfiles.

For each n-gram string it knows the locales (languages) in which it occurs, and how frequently it occurs in those languages relative to other n-grams of the same length in those same languages.

Immutable by definition: Java arrays cannot be made unmodifiable, so callers must not modify the arrays this class hands out.

  • Field Details

    • wordLangProbMap

      @NotNull private final Map<String,double[]> wordLangProbMap
      Key = n-gram. Value = array with probabilities per loaded language, in the same order as langlist.
    • langlist

      @NotNull private final List<LdLocale> langlist
      All the loaded languages, in exactly the same order as the data in the double[] values of wordLangProbMap. Example: if wordLangProbMap has an entry for the n-gram "foo", then that entry's array holds one value for each locale in langlist. Languages that don't know the n-gram have the value 0d.
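
The alignment between langlist and the double[] values can be illustrated with a minimal, self-contained sketch. It is not the library's code: the class AlignedNgramTable, its build method, the use of plain language-code strings instead of LdLocale, and the per-language input map are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in only; not the NgramFrequencyData implementation. */
final class AlignedNgramTable {

    /**
     * Builds an n-gram -> double[] table whose array positions follow langs.
     * perLanguage is a hypothetical input: language code -> (n-gram -> probability).
     */
    static Map<String, double[]> build(List<String> langs,
                                       Map<String, Map<String, Double>> perLanguage) {
        Map<String, double[]> table = new LinkedHashMap<>();
        for (int i = 0; i < langs.size(); i++) {
            Map<String, Double> probs = perLanguage.getOrDefault(langs.get(i), Map.of());
            for (Map.Entry<String, Double> e : probs.entrySet()) {
                // A new double[] is zero-filled, so languages lacking the n-gram keep 0d.
                double[] row = table.computeIfAbsent(e.getKey(), k -> new double[langs.size()]);
                row[i] = e.getValue();
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> langs = List.of("en", "de");
        Map<String, Map<String, Double>> perLanguage = Map.of(
                "en", Map.of("foo", 0.02, "the", 0.30),
                "de", Map.of("der", 0.25, "the", 0.01));
        Map<String, double[]> table = build(langs, perLanguage);
        // Position 0 belongs to "en", position 1 to "de".
        System.out.println(Arrays.toString(table.get("foo")));
    }
}
```

Every array has the same length as the language list, so a position in any array can be mapped back to its language by index.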
  • Constructor Details

    • NgramFrequencyData

      private NgramFrequencyData(@NotNull Map<String,double[]> wordLangProbMap, @NotNull List<LdLocale> langlist)
  • Method Details

    • create

      @NotNull public static NgramFrequencyData create(@NotNull Collection<LanguageProfile> languageProfiles, @NotNull Collection<Integer> gramLengths) throws IllegalArgumentException
      Parameters:
      gramLengths - for example [1,2,3]
      Throws:
      IllegalArgumentException - if languageProfiles or gramLengths is empty, or if one of the languageProfiles does not have the grams of the required sizes.
    • getLanguageList

      @NotNull public List<LdLocale> getLanguageList()
    • getLanguage

      @NotNull public LdLocale getLanguage(int pos)
    • getProbabilities

      @Nullable public double[] getProbabilities(String ngram)
      Don't modify the returned array! (Arrays can't be made immutable, so the internal array is returned as is.)
      Returns:
      null if no language profile knows that n-gram. Otherwise an array in the order of the getLanguageList() language list, with exactly that size; entries are 0 for languages that don't know that n-gram at all. Implementation note: returning null lets the caller handle the unknown case more efficiently than returning an empty array would.
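
A caller-side sketch of this contract follows. It is self-contained and hedged: a plain Map<String,double[]> stands in for NgramFrequencyData, table.get(ngram) stands in for getProbabilities(ngram), and the class ProbabilityLookupSketch and its accumulate method are hypothetical names, not part of the library.

```java
import java.util.Arrays;
import java.util.Map;

/** Illustrative caller; the map lookup stands in for getProbabilities(ngram). */
final class ProbabilityLookupSketch {

    /**
     * Adds the probabilities of one n-gram onto scores (one slot per language).
     * Returns false when the n-gram is unknown to every language.
     */
    static boolean accumulate(Map<String, double[]> table, String ngram, double[] scores) {
        double[] probs = table.get(ngram);
        if (probs == null) {
            // Unknown n-gram: a single null check, no loop over an all-zero array.
            return false;
        }
        for (int i = 0; i < scores.length; i++) {
            scores[i] += probs[i]; // read only; never write into probs
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, double[]> table = Map.of("the", new double[] {0.30, 0.01});
        double[] scores = new double[2];
        System.out.println(accumulate(table, "the", scores)); // known n-gram
        System.out.println(accumulate(table, "xyz", scores)); // unknown n-gram
        System.out.println(Arrays.toString(scores));
    }
}
```

If a caller needs to change the values, copying first (for example with Arrays.copyOf) keeps the shared array intact.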