TextExtractor

Inheritance: java.lang.Object, com.aspose.pdf.groupprocessor.IVentureLicenseTarget

All Implemented Interfaces: com.aspose.pdf.groupprocessor.interfaces.IPdfTypeExtractor

public final class TextExtractor extends IVentureLicenseTarget implements IPdfTypeExtractor

Represents an instance used to interact with the text extractor.

Constructors

Constructor	Description
TextExtractor()	Creates TextExtractor instance.

Fields

Field	Description
_numberedPages

Methods

Method	Description
initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)	Initializes TextExtractor instance.
initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)	Initializes TextExtractor instance.
initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)	Initializes TextExtractor instance.
initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)	Initializes TextExtractor instance.
initializeAlternative(String pdfDocumentPath)	Initializes TextExtractor instance.
initializeAlternative(System.IO.Stream pdfDocumentStream)	Initializes TextExtractor instance.
initializeAlternative(String pdfDocumentPath, String password)	Initializes TextExtractor instance.
initializeAlternative(System.IO.Stream pdfDocumentStream, String password)	Initializes TextExtractor instance.
buildProperties(ByteRange range, PdfTreeNode parentNode)	Builds a tree of nodes that contain all PDF parameters with their values.
buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)	Builds a tree of nodes that contain all PDF parameters with their values.
extractAllText()	Extracts text from the document.
extractAllTextInternal()
extractPageText(int pageNumber)	Extracts text from the page.
getPageCount()	Gets count of pages in the document.
close()	Closes all resources used by this instance.
dispose()	Disposes the object. This method is obsolete; use close() instead.
getVersion()	For internal use only.
isFastExtractionUsed()	Returns true if fast extraction was used.
setVentureLicense(VentureLicense license)
getVentureLicense()

TextExtractor()

public TextExtractor()

Creates TextExtractor instance.

_numberedPages

public final System.Collections.Generic.Dictionary<Integer,Page> _numberedPages

initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)

public void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentPath	java.lang.String	Path to a pdf document.
bufferSize	int	Maximum size of content in bytes that can be kept in memory.
allowAsyncInitialization	boolean	Allows async initialization of resources.

initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)

public void initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentStream	com.aspose.ms.System.IO.Stream	Stream containing pdf document.
bufferSize	int	Maximum size of content in bytes that can be kept in memory.
allowAsyncInitialization	boolean	Allows async initialization of resources.

initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)

public void initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentPath	java.lang.String	Path to a pdf document.
password	java.lang.String	Document password.
bufferSize	int	Maximum size of content in bytes that can be kept in memory.
allowAsyncInitialization	boolean	Allows async initialization of resources.

initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)

public void initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentStream	com.aspose.ms.System.IO.Stream	Stream containing pdf document.
password	java.lang.String	Document password.
bufferSize	int	Maximum size of content in bytes that can be kept in memory.
allowAsyncInitialization	boolean	Allows async initialization of resources.

initializeAlternative(String pdfDocumentPath)

public void initializeAlternative(String pdfDocumentPath)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentPath	java.lang.String	Path to a pdf document.

initializeAlternative(System.IO.Stream pdfDocumentStream)

public void initializeAlternative(System.IO.Stream pdfDocumentStream)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentStream	com.aspose.ms.System.IO.Stream	Stream containing pdf document.

initializeAlternative(String pdfDocumentPath, String password)

public void initializeAlternative(String pdfDocumentPath, String password)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentPath	java.lang.String	Path to a pdf document.
password	java.lang.String	Document password.

initializeAlternative(System.IO.Stream pdfDocumentStream, String password)

public void initializeAlternative(System.IO.Stream pdfDocumentStream, String password)

Initializes TextExtractor instance.

Parameters:

Parameter	Type	Description
pdfDocumentStream	com.aspose.ms.System.IO.Stream	Stream containing pdf document.
password	java.lang.String	Document password.

buildProperties(ByteRange range, PdfTreeNode parentNode)

public long buildProperties(ByteRange range, PdfTreeNode parentNode)

Builds a tree of nodes that contain all PDF parameters with their values.

Parameters:

Parameter	Type	Description
range	com.aspose.pdf.groupprocessor.ByteRange	Byte range where to parse parameters.
parentNode	com.aspose.pdf.groupprocessor.PdfTreeNode	Initial (root) node for building tree.

Returns: long - Last index of the parsed range.

buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)

public long buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)

Builds a tree of nodes that contain all PDF parameters with their values.

Parameters:

Parameter	Type	Description
range	com.aspose.pdf.groupprocessor.ByteRange	Byte range where to parse parameters.
parentNode	com.aspose.pdf.groupprocessor.PdfTreeNode	Initial (root) node for building tree.
extractJustValue	boolean	Used for recursive calls. Indicates that the next recursive call should find the parameter value rather than the parameter itself.

Returns: long - Last index of the parsed range.

extractAllText()

public String[] extractAllText()

Extracts text from the document.

Returns: java.lang.String[] - Array of strings representing document text
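
Because the result is an array, callers often want a single string. The following is a minimal sketch of one way to combine it, using only the Java standard library. The class `JoinTextSketch`, the helper `joinText`, the newline separator, and the null filtering are illustrative assumptions and not part of this API; this reference does not state how the text is split across array elements.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

class JoinTextSketch {

    /**
     * Joins the elements of a String[] (such as the array returned by
     * extractAllText()) with a newline, skipping null entries.
     * Hypothetical helper; the separator is an assumption.
     */
    static String joinText(String[] parts) {
        return Arrays.stream(parts)
                .filter(Objects::nonNull)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Stand-in data; a real caller would pass the extractor's result.
        String[] parts = { "first chunk", "second chunk" };
        System.out.println(joinText(parts));
    }
}
```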

extractAllTextInternal()

public String[] extractAllTextInternal()

Returns: java.lang.String[]

extractPageText(int pageNumber)

public String extractPageText(int pageNumber)

Extracts text from the page.

Parameters:

Parameter	Type	Description
pageNumber	int	1-based number of the page

Returns: java.lang.String - Text

getPageCount()

public int getPageCount()

Gets count of pages in the document.

Returns: int - page count
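
Since extractPageText(int pageNumber) takes a 1-based page number, a loop over all pages would run from 1 to getPageCount() inclusive. The sketch below illustrates that loop. To keep it runnable without the library, it uses a hypothetical `PageTextSource` interface that mirrors only these two documented signatures, plus an in-memory fake; neither is part of this API.

```java
import java.util.ArrayList;
import java.util.List;

class PageLoopSketch {

    /** Hypothetical stand-in mirroring two documented method signatures. */
    interface PageTextSource {
        int getPageCount();
        String extractPageText(int pageNumber); // pageNumber is 1-based
    }

    /** Collects the text of every page, iterating from 1 to getPageCount() inclusive. */
    static List<String> collectPageTexts(PageTextSource source) {
        List<String> texts = new ArrayList<>();
        int pageCount = source.getPageCount();
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
            texts.add(source.extractPageText(pageNumber));
        }
        return texts;
    }

    /** In-memory fake so the sketch runs without the library (assumption). */
    static PageTextSource fakeSource(String... pages) {
        return new PageTextSource() {
            @Override
            public int getPageCount() {
                return pages.length;
            }

            @Override
            public String extractPageText(int pageNumber) {
                if (pageNumber < 1 || pageNumber > pages.length) {
                    throw new IllegalArgumentException("page out of range: " + pageNumber);
                }
                return pages[pageNumber - 1]; // convert 1-based page to 0-based index
            }
        };
    }

    public static void main(String[] args) {
        List<String> texts = collectPageTexts(fakeSource("first page", "second page"));
        for (int i = 0; i < texts.size(); i++) {
            System.out.println("Page " + (i + 1) + ": " + texts.get(i));
        }
    }
}
```

With the real class, the same loop would presumably call the methods on an initialized TextExtractor instead of the fake.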

close()

public void close()

Closes all resources used by this instance.
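
One common way to make sure close() runs even when extraction fails is a try/finally block. The sketch below shows that pattern under stated assumptions: the `Extractor` interface is a hypothetical stand-in that mirrors three documented signatures, and `FakeExtractor` exists only so the example runs without the library. This reference does not say whether TextExtractor implements AutoCloseable, so try-with-resources is deliberately not used.

```java
class CloseInFinallySketch {

    /** Hypothetical stand-in mirroring three documented method signatures. */
    interface Extractor {
        void initializeAlternative(String pdfDocumentPath);
        String[] extractAllText();
        void close();
    }

    /** Initializes, extracts, and always closes, even if a step throws. */
    static String[] extractAndClose(Extractor extractor, String path) {
        try {
            extractor.initializeAlternative(path);
            return extractor.extractAllText();
        } finally {
            extractor.close();
        }
    }

    /** Fake that records whether close() was called (assumption, not the real class). */
    static final class FakeExtractor implements Extractor {
        boolean closed = false;
        private final boolean failOnExtract;

        FakeExtractor(boolean failOnExtract) {
            this.failOnExtract = failOnExtract;
        }

        @Override
        public void initializeAlternative(String pdfDocumentPath) {
            // Nothing to open in the fake.
        }

        @Override
        public String[] extractAllText() {
            if (failOnExtract) {
                throw new IllegalStateException("simulated extraction failure");
            }
            return new String[] { "sample text" };
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        FakeExtractor extractor = new FakeExtractor(false);
        String[] text = extractAndClose(extractor, "sample.pdf");
        System.out.println(text.length + " chunk(s), closed=" + extractor.closed);
    }
}
```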

dispose()

public void dispose()

Disposes the object. This method is obsolete; use close() instead.

getVersion()

public String getVersion()

For internal use only.

Returns: java.lang.String - string object

isFastExtractionUsed()

public boolean isFastExtractionUsed()

Returns true if fast extraction was used.

Returns: boolean - true if fast extraction was used; otherwise false.

setVentureLicense(VentureLicense license)

public final void setVentureLicense(VentureLicense license)

Parameters:

Parameter	Type	Description
license	VentureLicense

getVentureLicense()

public final VentureLicense getVentureLicense()

Returns: VentureLicense

PdfArrayInBuffer