Class XmlDetagger

All Implemented Interfaces:
AnalysisComponent

public class XmlDetagger extends CasAnnotator_ImplBase
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
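The detagging step described above can be approximated outside UIMA with a plain SAX handler. The following is a minimal sketch, not the source of this class: the names `DetagSketch`, `TextCollector`, and `detag` are hypothetical, the XML is read from a string rather than from a sofa, and the optional tag filter is an assumption about how restricting text to a single element might work.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DetagSketch {

    /** Collects character data, optionally only inside elements named textTag. */
    static class TextCollector extends DefaultHandler {
        private final String textTag;    // null means "keep all text"
        private final StringBuilder out = new StringBuilder();
        private int depthInsideTag = 0;  // > 0 while inside the chosen element

        TextCollector(String textTag) {
            this.textTag = textTag;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(textTag)) depthInsideTag++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals(textTag)) depthInsideTag--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (textTag == null || depthInsideTag > 0) out.append(ch, start, length);
        }

        String text() {
            return out.toString();
        }
    }

    /** Parses xml with the JVM's default SAX parser and returns the plain text. */
    static String detag(String xml, String textTag) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        TextCollector handler = new TextCollector(textTag);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Report</title><text>Hello <b>world</b></text></doc>";
        System.out.println(detag(xml, null));    // all text content
        System.out.println(detag(xml, "text"));  // only text inside <text>
    }
}
```

In the annotator itself, the resulting string would be written to the "plainTextDocument" sofa rather than returned.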
  • Field Details

    • PARAM_TEXT_TAG

      public static final String PARAM_TEXT_TAG
      Name of the optional configuration parameter that contains the name of an XML tag appearing in the input file. Only text that falls within this XML tag is considered part of the "document" that is added to the CAS by this annotator. If not specified, the entire file is considered the document.
    • parserFactory

      private SAXParserFactory parserFactory
    • sourceDocInfoType

      private Type sourceDocInfoType
    • mXmlTagContainingText

      private String mXmlTagContainingText
  • Constructor Details

    • XmlDetagger

      public XmlDetagger()
  • Method Details