Class TextExtractorOptions

TextExtractorOptions class

Represents text extraction options for the TextExtractor plugin.

public sealed class TextExtractorOptions : PdfExtractorOptions

Constructors

| Name | Description |
| --- | --- |
| TextExtractorOptions() | Initializes a new instance of the TextExtractorOptions object with ‘Raw’ (default) text formatting mode. |
| TextExtractorOptions(TextFormattingMode) | Initializes a new instance of the TextExtractorOptions object for the specified text formatting mode. |

Properties

| Name | Description |
| --- | --- |
| FormattingMode { get; } | Gets the text formatting mode. |
| Inputs { get; } | Returns the PdfExtractor plugin data collection. |
| override OperationName { get; } | Returns the name of the operation. |

Methods

| Name | Description |
| --- | --- |
| AddInput(IDataSource) | Adds a new data source to the PdfExtractor plugin data collection. |

Other Members

| Name | Description |
| --- | --- |
| enum TextFormattingMode | Defines different modes which can be used while converting a PDF document into text. See TextExtractorOptions class. |

Remarks

The TextExtractorOptions object is used to set the TextFormattingMode and other options for the text extraction operation. It also inherits the methods for adding data sources (files, streams) that represent the input PDF documents.

Examples

The example demonstrates how to extract the text content of a PDF document.

// create TextExtractor object to extract PDF contents
using (TextExtractor extractor = new TextExtractor())
{
    // create TextExtractorOptions object to set TextFormattingMode (Pure, or Raw - default)
    TextExtractorOptions extractorOptions = new TextExtractorOptions(TextExtractorOptions.TextFormattingMode.Pure);
    
    // add input file path to data sources
    extractorOptions.AddInput(new FileDataSource(inputPath));
    
    // perform extraction process
    ResultContainer resultContainer = extractor.Process(extractorOptions);
    
    // get the extracted text from the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToString();
}
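
The Remarks mention stream inputs and the parameterless constructor, neither of which appears in the example above. The following is a minimal sketch of that variation, not a definitive implementation. It assumes the types live in the Aspose.Pdf.Plugins namespace and that the library provides a StreamDataSource class wrapping a Stream as an IDataSource; neither is stated on this page. The class name and the command-line argument are hypothetical.

```csharp
using System;
using System.IO;
using Aspose.Pdf.Plugins; // assumed namespace, not stated on this page

class StreamExtractionSketch // hypothetical class name
{
    static void Main(string[] args)
    {
        // hypothetical: path to the input PDF is passed as the first argument
        string inputPath = args[0];

        using (FileStream stream = File.OpenRead(inputPath))
        using (TextExtractor extractor = new TextExtractor())
        {
            // parameterless constructor: uses the default 'Raw' formatting mode
            TextExtractorOptions options = new TextExtractorOptions();

            // assumption: StreamDataSource is the stream-based IDataSource
            options.AddInput(new StreamDataSource(stream));

            // perform extraction and print the first result as text
            ResultContainer results = extractor.Process(options);
            Console.WriteLine(results.ResultCollection[0].ToString());
        }
    }
}
```

The stream is opened in an outer using statement so that it stays open until Process has finished reading from it.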

See Also