TextExtractor

Inheritance: java.lang.Object, com.aspose.pdf.groupprocessor.IVentureLicenseTarget

All Implemented Interfaces: com.aspose.pdf.groupprocessor.interfaces.IPdfTypeExtractor

public final class TextExtractor extends IVentureLicenseTarget implements IPdfTypeExtractor

Represents an instance used to interact with the extractor.

Constructors

Constructor | Description
TextExtractor() | Creates a TextExtractor instance.

Fields

Field | Description
_numberedPages |

Methods

Method | Description
initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization) | Initializes the TextExtractor instance.
initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization) | Initializes the TextExtractor instance.
initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization) | Initializes the TextExtractor instance.
initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization) | Initializes the TextExtractor instance.
initializeAlternative(String pdfDocumentPath) | Initializes the TextExtractor instance.
initializeAlternative(System.IO.Stream pdfDocumentStream) | Initializes the TextExtractor instance.
initializeAlternative(String pdfDocumentPath, String password) | Initializes the TextExtractor instance.
initializeAlternative(System.IO.Stream pdfDocumentStream, String password) | Initializes the TextExtractor instance.
buildProperties(ByteRange range, PdfTreeNode parentNode) | Builds a tree of nodes containing all PDF parameters and their values.
buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue) | Builds a tree of nodes containing all PDF parameters and their values.
extractAllText() | Extracts text from the document.
extractAllTextInternal() |
extractPageText(int pageNumber) | Extracts text from the specified page.
getPageCount() | Gets the number of pages in the document.
close() | Closes all resources used by this instance.
dispose() | Disposes the object. This method is obsolete; use close() instead.
getVersion() | For internal use only.
isFastExtractionUsed() | Returns true if fast extraction was used.
setVentureLicense(VentureLicense license) |
getVentureLicense() |

TextExtractor()

public TextExtractor()

Creates a TextExtractor instance.

_numberedPages

public final System.Collections.Generic.Dictionary<Integer,Page> _numberedPages

initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)

public void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentPath | java.lang.String | Path to a PDF document.
bufferSize | int | Maximum size of content, in bytes, that can be kept in memory.
allowAsyncInitialization | boolean | Allows asynchronous initialization of resources.
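The bufferSize argument is an int measured in bytes, so a limit written as "megabytes times 1024 * 1024" can silently overflow. A minimal sketch of a helper that fails loudly instead; BufferSizes and fromMebibytes are hypothetical names, not part of the library, and rejecting non-positive values is this sketch's own policy rather than a documented requirement.

```java
/**
 * Hypothetical helper, not part of the Aspose API: computes a value for the
 * int bufferSize parameter, which is documented as a maximum number of bytes.
 */
final class BufferSizes {
    private BufferSizes() {}

    static int fromMebibytes(int mebibytes) {
        // This helper's own policy; this page does not say which values the library accepts.
        if (mebibytes <= 0) {
            throw new IllegalArgumentException("mebibytes must be positive: " + mebibytes);
        }
        // 1 MiB = 1024 * 1024 bytes. Math.multiplyExact throws ArithmeticException
        // instead of wrapping around, which would happen from 2048 MiB upward
        // because 2048 * 1024 * 1024 = 2^31 > Integer.MAX_VALUE.
        return Math.multiplyExact(mebibytes, 1024 * 1024);
    }
}
```

Under these assumptions, a call might read extractor.initialize(path, BufferSizes.fromMebibytes(64), false), where extractor and path are placeholder variables.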

initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)

public void initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing a PDF document.
bufferSize | int | Maximum size of content, in bytes, that can be kept in memory.
allowAsyncInitialization | boolean | Allows asynchronous initialization of resources.

initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)

public void initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentPath | java.lang.String | Path to a PDF document.
password | java.lang.String | Document password.
bufferSize | int | Maximum size of content, in bytes, that can be kept in memory.
allowAsyncInitialization | boolean | Allows asynchronous initialization of resources.

initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)

public void initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing a PDF document.
password | java.lang.String | Document password.
bufferSize | int | Maximum size of content, in bytes, that can be kept in memory.
allowAsyncInitialization | boolean | Allows asynchronous initialization of resources.

initializeAlternative(String pdfDocumentPath)

public void initializeAlternative(String pdfDocumentPath)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentPath | java.lang.String | Path to a PDF document.

initializeAlternative(System.IO.Stream pdfDocumentStream)

public void initializeAlternative(System.IO.Stream pdfDocumentStream)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing a PDF document.

initializeAlternative(String pdfDocumentPath, String password)

public void initializeAlternative(String pdfDocumentPath, String password)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentPath | java.lang.String | Path to a PDF document.
password | java.lang.String | Document password.

initializeAlternative(System.IO.Stream pdfDocumentStream, String password)

public void initializeAlternative(System.IO.Stream pdfDocumentStream, String password)

Initializes the TextExtractor instance.

Parameters:

Parameter | Type | Description
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing a PDF document.
password | java.lang.String | Document password.

buildProperties(ByteRange range, PdfTreeNode parentNode)

public long buildProperties(ByteRange range, PdfTreeNode parentNode)

Builds a tree of nodes containing all PDF parameters and their values.

Parameters:

Parameter | Type | Description
range | com.aspose.pdf.groupprocessor.ByteRange | Byte range in which to parse parameters.
parentNode | com.aspose.pdf.groupprocessor.PdfTreeNode | Initial (root) node for building the tree.

Returns: long - The last index of the parsed range.

buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)

public long buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)

Builds a tree of nodes containing all PDF parameters and their values.

Parameters:

Parameter | Type | Description
range | com.aspose.pdf.groupprocessor.ByteRange | Byte range in which to parse parameters.
parentNode | com.aspose.pdf.groupprocessor.PdfTreeNode | Initial (root) node for building the tree.
extractJustValue | boolean | Used for recursive calls; indicates that the next recursive call should find the parameter's value rather than the parameter itself.

Returns: long - Last index of the parsed range.
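The recursion described for buildProperties can be illustrated with a toy parser. ByteRange and PdfTreeNode are not documented on this page, so Range and Node below are hypothetical stand-ins, and only a small subset of PDF dictionary syntax is handled (names, bare tokens, nested << >> dictionaries; no strings, arrays, or indirect references). This is a sketch of the documented contract, not the library's implementation: the justValue flag plays the role of extractJustValue, and the return value is the index of the last byte parsed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the buildProperties contract; not the Aspose implementation. */
final class PropertyTreeSketch {
    /** Hypothetical stand-in for ByteRange: the half-open interval [start, end). */
    record Range(int start, int end) {}

    /** Hypothetical stand-in for PdfTreeNode. */
    static final class Node {
        final String name;
        String value; // stays null for dictionary nodes
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /**
     * justValue == false: parse a dictionary "<< /Key value ... >>" into children of parent.
     * justValue == true: parse one value (a token or a nested dictionary) into parent itself.
     * Returns the index of the last byte consumed.
     */
    static int build(byte[] pdf, Range range, Node parent, boolean justValue) {
        int i = skipWhitespace(pdf, range.start(), range.end());
        if (justValue && !startsWith(pdf, i, range.end(), "<<")) {
            int j = i;
            if (j < range.end() && pdf[j] == '/') j++; // name values such as /Page keep their slash
            while (j < range.end() && !isDelimiter(pdf[j])) j++;
            parent.value = new String(pdf, i, j - i, StandardCharsets.US_ASCII);
            return j - 1;
        }
        if (!startsWith(pdf, i, range.end(), "<<")) {
            throw new IllegalArgumentException("expected '<<' at index " + i);
        }
        i += 2;
        while (true) {
            i = skipWhitespace(pdf, i, range.end());
            if (startsWith(pdf, i, range.end(), ">>")) return i + 1; // index of the final '>'
            if (i >= range.end() || pdf[i] != '/') {
                throw new IllegalArgumentException("expected a /Key at index " + i);
            }
            int k = i + 1;
            while (k < range.end() && !isDelimiter(pdf[k])) k++;
            Node child = new Node(new String(pdf, i + 1, k - i - 1, StandardCharsets.US_ASCII));
            parent.children.add(child);
            // Recurse with justValue = true: find this key's value, not another key.
            i = build(pdf, new Range(k, range.end()), child, true) + 1;
        }
    }

    private static int skipWhitespace(byte[] b, int i, int end) {
        while (i < end && Character.isWhitespace(b[i])) i++;
        return i;
    }

    private static boolean startsWith(byte[] b, int i, int end, String s) {
        if (i + s.length() > end) return false;
        for (int n = 0; n < s.length(); n++) {
            if (b[i + n] != s.charAt(n)) return false;
        }
        return true;
    }

    private static boolean isDelimiter(byte c) {
        return Character.isWhitespace(c) || c == '/' || c == '<' || c == '>';
    }
}
```

For the input << /Type /Page /Count 3 /Res <</Font /F1>> >>, the root gains three children, Res gets one nested child, and the call returns the index of the final '>'.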

extractAllText()

public String[] extractAllText()

Extracts text from the document.

Returns: java.lang.String[] - Array of strings representing the document text.

extractAllTextInternal()

public String[] extractAllTextInternal()

Returns: java.lang.String[]

extractPageText(int pageNumber)

public String extractPageText(int pageNumber)

Extracts text from the specified page.

Parameters:

Parameter | Type | Description
pageNumber | int | 1-based page number.

Returns: java.lang.String - The text of the page.

getPageCount()

public int getPageCount()

Gets the number of pages in the document.

Returns: int - The page count.
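Because extractPageText takes a 1-based page number, a loop over the whole document runs from 1 to getPageCount() inclusive. This sketch takes the two methods as functional parameters so it compiles and runs without the library; PageLoop and allPages are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.IntSupplier;

/**
 * Sketch of reading every page through getPageCount() and extractPageText(int),
 * passed in as functions. Not part of the Aspose API.
 */
final class PageLoop {
    private PageLoop() {}

    static List<String> allPages(IntSupplier pageCount, IntFunction<String> pageText) {
        int count = pageCount.getAsInt();
        List<String> pages = new ArrayList<>(count);
        // Pages are numbered from 1, so iterate 1..count inclusive, not 0..count-1.
        for (int page = 1; page <= count; page++) {
            pages.add(pageText.apply(page));
        }
        return pages;
    }
}
```

Assuming the signatures shown on this page, an initialized extractor could be passed as PageLoop.allPages(extractor::getPageCount, extractor::extractPageText).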

close()

public void close()

Closes all resources used by this instance.

dispose()

public void dispose()

Disposes the object. This method is obsolete; use close() instead.
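The class summary lists IPdfTypeExtractor but not AutoCloseable, so try-with-resources may not accept a TextExtractor directly (if IPdfTypeExtractor already extends AutoCloseable, this adapter is unnecessary). A minimal sketch of a hypothetical adapter, Closing, that guarantees close() runs when the block exits:

```java
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical adapter, not part of the Aspose API: lets any object with a
 * close-style method take part in try-with-resources.
 */
final class Closing<T> implements AutoCloseable {
    private final T resource;
    private final Consumer<? super T> closer;

    Closing(T resource, Consumer<? super T> closer) {
        this.resource = Objects.requireNonNull(resource, "resource");
        this.closer = Objects.requireNonNull(closer, "closer");
    }

    T get() {
        return resource;
    }

    /** try-with-resources calls this on exit, including when the body throws. */
    @Override
    public void close() {
        closer.accept(resource);
    }
}
```

Assuming the documented no-argument close(), usage would look like try (Closing<TextExtractor> c = new Closing<>(new TextExtractor(), TextExtractor::close)) { ... }, with initialize and the extraction calls made through c.get().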

getVersion()

public String getVersion()

For internal use only.

Returns: java.lang.String - string object

isFastExtractionUsed()

public boolean isFastExtractionUsed()

Returns true if fast extraction was used.

Returns: boolean - true if fast extraction was used; otherwise, false.

setVentureLicense(VentureLicense license)

public final void setVentureLicense(VentureLicense license)

Parameters:

Parameter | Type | Description
license | VentureLicense |

getVentureLicense()

public final VentureLicense getVentureLicense()

Returns: VentureLicense