TextExtractor
Inheritance: java.lang.Object, com.aspose.pdf.groupprocessor.IVentureLicenseTarget
All Implemented Interfaces: com.aspose.pdf.groupprocessor.interfaces.IPdfTypeExtractor
public final class TextExtractor extends IVentureLicenseTarget implements IPdfTypeExtractor
Represents an instance used to interact with the text extractor.
Constructors
Constructor | Description |
---|---|
TextExtractor() | Creates TextExtractor instance. |
Fields
Field | Description |
---|---|
_numberedPages |
Methods
TextExtractor()
public TextExtractor()
Creates TextExtractor instance.
_numberedPages
public final System.Collections.Generic.Dictionary<Integer,Page> _numberedPages
initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)
public void initialize(String pdfDocumentPath, int bufferSize, boolean allowAsyncInitialization)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentPath | java.lang.String | Path to a pdf document. |
bufferSize | int | Maximum size of content in bytes that can be kept in memory. |
allowAsyncInitialization | boolean | Allows async initialization of resources. |
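Because bufferSize is an int measured in bytes, a size expressed in mebibytes has to be converted, and the result must fit in an int. The following is a minimal sketch of that conversion. The class BufferSizes and its method are hypothetical helpers for illustration and are not part of the library.

```java
/** Hypothetical helper (not part of the library): converts MiB to a byte count that fits in an int. */
final class BufferSizes {
    private static final int BYTES_PER_MIB = 1024 * 1024;

    private BufferSizes() {
    }

    /**
     * Returns the given number of mebibytes expressed in bytes.
     * Throws IllegalArgumentException for negative input and
     * ArithmeticException if the byte count does not fit in an int.
     */
    static int mebibytes(int mebibytes) {
        if (mebibytes < 0) {
            throw new IllegalArgumentException("size must not be negative: " + mebibytes);
        }
        // multiplyExact fails loudly instead of silently wrapping around on overflow
        return Math.multiplyExact(mebibytes, BYTES_PER_MIB);
    }
}
```

With such a helper, BufferSizes.mebibytes(8) yields 8388608, which could then be passed as the bufferSize argument.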
initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)
public void initialize(System.IO.Stream pdfDocumentStream, int bufferSize, boolean allowAsyncInitialization)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing pdf document. |
bufferSize | int | Maximum size of content in bytes that can be kept in memory. |
allowAsyncInitialization | boolean | Allows async initialization of resources. |
initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)
public void initialize(String pdfDocumentPath, String password, int bufferSize, boolean allowAsyncInitialization)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentPath | java.lang.String | Path to a pdf document. |
password | java.lang.String | Document password. |
bufferSize | int | Maximum size of content in bytes that can be kept in memory. |
allowAsyncInitialization | boolean | Allows async initialization of resources. |
initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)
public void initialize(System.IO.Stream pdfDocumentStream, String password, int bufferSize, boolean allowAsyncInitialization)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing pdf document. |
password | java.lang.String | Document password. |
bufferSize | int | Maximum size of content in bytes that can be kept in memory. |
allowAsyncInitialization | boolean | Allows async initialization of resources. |
initializeAlternative(String pdfDocumentPath)
public void initializeAlternative(String pdfDocumentPath)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentPath | java.lang.String | Path to a pdf document. |
initializeAlternative(System.IO.Stream pdfDocumentStream)
public void initializeAlternative(System.IO.Stream pdfDocumentStream)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing pdf document. |
initializeAlternative(String pdfDocumentPath, String password)
public void initializeAlternative(String pdfDocumentPath, String password)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentPath | java.lang.String | Path to a pdf document. |
password | java.lang.String | Document password. |
initializeAlternative(System.IO.Stream pdfDocumentStream, String password)
public void initializeAlternative(System.IO.Stream pdfDocumentStream, String password)
Initializes TextExtractor instance.
Parameters:
Parameter | Type | Description |
---|---|---|
pdfDocumentStream | com.aspose.ms.System.IO.Stream | Stream containing pdf document. |
password | java.lang.String | Document password. |
buildProperties(ByteRange range, PdfTreeNode parentNode)
public long buildProperties(ByteRange range, PdfTreeNode parentNode)
Builds a tree of nodes that contain all PDF parameters with their values.
Parameters:
Parameter | Type | Description |
---|---|---|
range | com.aspose.pdf.groupprocessor.ByteRange | Byte range where to parse parameters. |
parentNode | com.aspose.pdf.groupprocessor.PdfTreeNode | Initial (root) node for building tree. |
Returns: long - Last index of the parsed range.
buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)
public long buildProperties(ByteRange range, PdfTreeNode parentNode, boolean extractJustValue)
Builds a tree of nodes that contain all PDF parameters with their values.
Parameters:
Parameter | Type | Description |
---|---|---|
range | com.aspose.pdf.groupprocessor.ByteRange | Byte range where to parse parameters. |
parentNode | com.aspose.pdf.groupprocessor.PdfTreeNode | Initial (root) node for building tree. |
extractJustValue | boolean | Used for recursive calls. Indicates that the next recursive call should find the parameter value rather than the parameter itself. |
Returns: long - Last index of the parsed range.
extractAllText()
public String[] extractAllText()
Extracts text from the document
Returns: java.lang.String[] - Array of strings representing document text
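Since extractAllText() returns a String[], callers that need a single String have to merge the elements themselves. The sketch below shows one way to do that. The class ExtractedText is a hypothetical helper, not part of the library, and because this page does not state what each array element holds, the newline separator is an assumption.

```java
/** Hypothetical helper (not part of the library): merges an extracted String[] into one String. */
final class ExtractedText {
    private ExtractedText() {
    }

    /** Joins the non-null elements of parts with a newline; a null array yields an empty string. */
    static String join(String[] parts) {
        if (parts == null) {
            return "";
        }
        StringBuilder joined = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (part == null) {
                continue; // skip missing entries defensively
            }
            if (!first) {
                joined.append('\n'); // assumed separator between elements
            }
            joined.append(part);
            first = false;
        }
        return joined.toString();
    }
}
```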
extractAllTextInternal()
public String[] extractAllTextInternal()
Returns: java.lang.String[]
extractPageText(int pageNumber)
public String extractPageText(int pageNumber)
Extracts text from the page
Parameters:
Parameter | Type | Description |
---|---|---|
pageNumber | int | 1-based number of the page |
Returns: java.lang.String - Text
getPageCount()
public int getPageCount()
Gets count of pages in the document.
Returns: int - page count
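Because extractPageText(int) takes a 1-based page number, a loop over all pages runs from 1 through getPageCount() inclusive. The sketch below illustrates that loop against a hypothetical stand-in interface, PageTextSource, which only mirrors the two documented method signatures so that the example compiles without the library. PageWalker is likewise a name introduced for illustration.

```java
/** Hypothetical stand-in that mirrors the documented getPageCount() and extractPageText(int) signatures. */
interface PageTextSource {
    int getPageCount();

    String extractPageText(int pageNumber); // pageNumber is 1-based
}

/** Hypothetical helper (not part of the library). */
final class PageWalker {
    private PageWalker() {
    }

    /** Collects the text of every page; array index 0 holds the text of page 1. */
    static String[] readAllPages(PageTextSource source) {
        int pageCount = source.getPageCount();
        String[] pages = new String[pageCount];
        // Page numbers start at 1, array indexes at 0.
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
            pages[pageNumber - 1] = source.extractPageText(pageNumber);
        }
        return pages;
    }
}
```

An anonymous class that delegates both methods to an initialized TextExtractor could serve as the source.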
close()
public void close()
Closes all resources used by this instance.
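This page does not list AutoCloseable among the implemented interfaces, so a try/finally block is a conservative way to make sure close() runs even when extraction throws. The sketch below shows the pattern against a hypothetical stand-in interface, ClosableTextSource, which only mirrors the documented extractAllText() and close() signatures. SafeExtraction is a name introduced for illustration.

```java
/** Hypothetical stand-in that mirrors the documented extractAllText() and close() signatures. */
interface ClosableTextSource {
    String[] extractAllText();

    void close();
}

/** Hypothetical helper (not part of the library). */
final class SafeExtraction {
    private SafeExtraction() {
    }

    /** Extracts all text and always calls close(), even if extraction throws. */
    static String[] extractAndClose(ClosableTextSource source) {
        try {
            return source.extractAllText();
        } finally {
            source.close(); // runs on both the normal and the exceptional path
        }
    }
}
```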
dispose()
public void dispose()
Disposes the object. This method is obsolete; use close() instead.
getVersion()
public String getVersion()
For internal use only.
Returns: java.lang.String - string object
isFastExtractionUsed()
public boolean isFastExtractionUsed()
Returns true if fast extraction was used.
Returns: boolean - true if fast extraction was used; otherwise false.
setVentureLicense(VentureLicense license)
public final void setVentureLicense(VentureLicense license)
Parameters:
Parameter | Type | Description |
---|---|---|
license | VentureLicense |
getVentureLicense()
public final VentureLicense getVentureLicense()
Returns: VentureLicense