PdfTextExtractionOptions

PdfTextExtractionOptions class

Represents text extraction options for the PdfExtractor plugin.

public sealed class PdfTextExtractionOptions : PdfExtractorOptions

Constructors

Name | Description
PdfTextExtractionOptions() | Initializes a new instance of the PdfTextExtractionOptions class with the default 'Raw' text formatting mode.
PdfTextExtractionOptions(TextFormattingMode) | Initializes a new instance of the PdfTextExtractionOptions class with the specified text formatting mode.
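A minimal sketch of the two constructors. It uses only the members listed on this page and assumes the plugin types live in the Aspose.Pdf.Plugins namespace. The parameterless constructor should report the Raw mode through FormattingMode, and the second constructor keeps whichever mode it is given.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the PdfExtractor plugin types

// parameterless constructor: uses the default 'Raw' text formatting mode
PdfTextExtractionOptions defaultOptions = new PdfTextExtractionOptions();

// explicit constructor: selects the Pure text formatting mode
PdfTextExtractionOptions pureOptions =
    new PdfTextExtractionOptions(PdfTextExtractionOptions.TextFormattingMode.Pure);

// FormattingMode only has a getter, so the mode is chosen at construction time
Console.WriteLine(defaultOptions.FormattingMode);
Console.WriteLine(pureOptions.FormattingMode);
```

Because FormattingMode exposes no setter, switching modes means creating a new options object rather than mutating an existing one.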

Properties

Name | Description
DataCollection { get; } | Returns the PdfExtractor plugin data collection.
FormattingMode { get; } | Gets the text formatting mode.
override OperationName { get; } | Returns the name of the operation.

Methods

Name | Description
AddDataSource(IDataSource) | Adds a new data source to the PdfExtractor plugin data collection.
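A sketch of how AddDataSource fills the DataCollection property. The file names are hypothetical, the Aspose.Pdf.Plugins namespace is assumed, and the sketch assumes DataCollection exposes a Count (for example, a List&lt;IDataSource&gt;); this page does not state its exact type.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the PdfExtractor plugin types

var options = new PdfTextExtractionOptions();

// hypothetical input files, each wrapped in a FileDataSource
options.AddDataSource(new FileDataSource("first.pdf"));
options.AddDataSource(new FileDataSource("second.pdf"));

// assumption: DataCollection is a list-like collection with a Count property
Console.WriteLine(options.DataCollection.Count);

// operation name reported by these options (its value is not documented here)
Console.WriteLine(options.OperationName);
```

Whether a single PdfExtractor.Process call handles every registered source is not stated on this page, so check the plugin documentation before relying on it.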

Other Members

Name | Description
enum TextFormattingMode | Defines the modes that can be used when converting a PDF document into text. See PdfTextExtractionOptions class.
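Because the set of TextFormattingMode values may differ between plugin versions, a sketch like the following enumerates them at run time instead of hard-coding the names. It assumes the Aspose.Pdf.Plugins namespace and creates one options object per mode, for example to compare extraction output side by side.

```csharp
using System;
using Aspose.Pdf.Plugins; // assumed namespace for the PdfExtractor plugin types

// iterate over every mode defined by the installed version of the enum
foreach (PdfTextExtractionOptions.TextFormattingMode mode in
         Enum.GetValues(typeof(PdfTextExtractionOptions.TextFormattingMode)))
{
    // one options object per mode; FormattingMode reflects the constructor argument
    var options = new PdfTextExtractionOptions(mode);
    Console.WriteLine($"{mode}: {options.FormattingMode}");
}
```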

Remarks

The PdfTextExtractionOptions object is used to set the TextFormattingMode and other options for the text extraction operation. It also inherits methods for adding data sources (files, streams) that represent the input PDF documents.

Examples

This example demonstrates how to extract the text content of a PDF document.

using Aspose.Pdf.Plugins; // assumed namespace for the PdfExtractor plugin types

// hypothetical path to the input PDF document
string inputPath = "input.pdf";

// create PdfExtractor object to extract PDF contents
using (PdfExtractor extractor = new PdfExtractor())
{
    // create PdfTextExtractionOptions object to set TextFormattingMode (for example Pure, or Raw, which is the default)
    PdfTextExtractionOptions extractorOptions = new PdfTextExtractionOptions(PdfTextExtractionOptions.TextFormattingMode.Pure);

    // add input file path to data sources
    extractorOptions.AddDataSource(new FileDataSource(inputPath));

    // perform extraction process
    ResultContainer resultContainer = extractor.Process(extractorOptions);

    // get the extracted text from the first result in the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToText();
}

See Also