TextAbsorber
Inheritance: java.lang.Object
public class TextAbsorber
Represents an absorber object of a text. Performs text extraction and provides access to the result via the getText() method.
The example demonstrates how to extract text from the first page of a PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for first page
doc.getPages().get(1).accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
The TextAbsorber object is used to extract text from a PDF document or from a single page of the document.
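The examples on this page use two call styles: `page.accept(absorber)` and `absorber.visit(page)`. The sketch below illustrates how the two could relate, assuming `accept` simply hands the page to the absorber as in the classic visitor pattern. `SketchPage` and `SketchAbsorber` are hypothetical stand-ins written for this illustration; they are not Aspose.PDF classes and do not parse PDF content.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in types only; none of these are Aspose.PDF classes. */
final class VisitorSketch {

    /** Hypothetical page that only carries its text. */
    static final class SketchPage {
        private final String text;

        SketchPage(String text) {
            this.text = text;
        }

        String text() {
            return text;
        }

        /** Assumed behavior: accept hands this page to the absorber. */
        void accept(SketchAbsorber absorber) {
            absorber.visit(this);
        }
    }

    /** Hypothetical absorber that accumulates the text of every visited page. */
    static final class SketchAbsorber {
        private final StringBuilder buffer = new StringBuilder();

        void visit(SketchPage page) {
            buffer.append(page.text());
        }

        void visit(List<SketchPage> pages) {
            for (SketchPage page : pages) {
                visit(page);
            }
        }

        String getText() {
            return buffer.toString();
        }
    }

    /** Extracts the text of one page by calling accept on the page. */
    static String viaAccept(SketchPage page) {
        SketchAbsorber absorber = new SketchAbsorber();
        page.accept(absorber);
        return absorber.getText();
    }

    /** Extracts the text of several pages by calling visit on the absorber. */
    static String viaVisit(List<SketchPage> pages) {
        SketchAbsorber absorber = new SketchAbsorber();
        absorber.visit(pages);
        return absorber.getText();
    }

    public static void main(String[] args) {
        List<SketchPage> pages =
                Arrays.asList(new SketchPage("first "), new SketchPage("second"));
        System.out.println(viaAccept(pages.get(0)));
        System.out.println(viaVisit(pages));
    }
}
```

In this model both call styles fill the same buffer, so the text is always read back through `getText()` on the absorber.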
Constructors
Constructor | Description |
---|---|
TextAbsorber() | Initializes a new instance of the TextAbsorber. |
TextAbsorber(TextExtractionOptions extractionOptions) | Initializes a new instance of the TextAbsorber with extraction options. |
TextAbsorber(TextExtractionOptions extractionOptions, TextSearchOptions textSearchOptions) | Initializes a new instance of the TextAbsorber with extraction and text search options. |
TextAbsorber(TextSearchOptions textSearchOptions) | Initializes a new instance of the TextAbsorber with text search options. |
Methods
Method | Description |
---|---|
getText() | Gets the text that the TextAbsorber extracted from the PDF document or page. |
hasErrors() | Indicates whether errors were found during text extraction. |
getErrors() | List of TextExtractionError objects. |
visit(Page page) | Extracts text on the specified page. |
visit(XForm form) | Extracts text on the specified XForm. |
visit(IDocument pdf) | Extracts text on the specified document. |
getExtractionOptions() | Gets text extraction options. |
setExtractionOptions(TextExtractionOptions value) | Sets text extraction options. |
getTextSearchOptions() | Gets text search options. |
setTextSearchOptions(TextSearchOptions value) | Sets text search options. |
TextAbsorber()
public TextAbsorber()
Initializes a new instance of the TextAbsorber.
The example demonstrates how to extract text from all pages of the PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Performs text extraction and provides access to the extracted text via the getText() method.
TextAbsorber(TextExtractionOptions extractionOptions)
public TextAbsorber(TextExtractionOptions extractionOptions)
Initializes a new instance of the TextAbsorber with extraction options.
The example demonstrates how to extract text from all pages of the PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text with formatting
TextAbsorber absorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Performs text extraction and provides access to the extracted text via the getText() method.
Parameters:
Parameter | Type | Description |
---|---|---|
extractionOptions | TextExtractionOptions | Text extraction options |
TextAbsorber(TextExtractionOptions extractionOptions, TextSearchOptions textSearchOptions)
public TextAbsorber(TextExtractionOptions extractionOptions, TextSearchOptions textSearchOptions)
Initializes a new instance of the TextAbsorber with extraction and text search options.
Parameters:
Parameter | Type | Description |
---|---|---|
extractionOptions | TextExtractionOptions | Text extraction options |
textSearchOptions | TextSearchOptions | Text search options |
Performs text extraction and provides access to the extracted text via the getText() method.
TextAbsorber(TextSearchOptions textSearchOptions)
public TextAbsorber(TextSearchOptions textSearchOptions)
Initializes a new instance of the TextAbsorber with text search options.
Parameters:
Parameter | Type | Description |
---|---|---|
textSearchOptions | TextSearchOptions | Text search options |
Performs text extraction and provides access to the extracted text via the getText() method.
getText()
public String getText()
Gets the text that the TextAbsorber extracted from the PDF document or page.
Returns: java.lang.String - String value
The example demonstrates how to extract text from all pages of the PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
hasErrors()
public boolean hasErrors()
Indicates whether errors were found during text extraction. Errors are searched for only when TextSearchOptions.LogTextExtractionErrors is set to true, which may decrease performance.
Returns: boolean - boolean value
getErrors()
public List<TextExtractionError> getErrors()
List of TextExtractionError objects. It contains information about the errors found during text extraction. Errors are searched for only when TextSearchOptions.LogTextExtractionErrors is set to true, which may decrease performance.
Returns: java.util.List<com.aspose.pdf.TextExtractionError> - List of TextExtractionError objects
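The opt-in behavior described for hasErrors() and getErrors() can be modeled with a small, self-contained sketch. It assumes that errors are recorded only while a logging flag is enabled and that extraction otherwise skips problem content silently. `SketchExtractor`, its `logErrors` flag, and the use of a `null` chunk to represent unreadable content are all hypothetical; the sketch does not call Aspose.PDF and does not reproduce its error type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-in model of opt-in error logging; not an Aspose.PDF class. */
final class ErrorLoggingSketch {

    static final class SketchExtractor {
        private final boolean logErrors;
        private final List<String> errors = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private int index = 0;

        SketchExtractor(boolean logErrors) {
            this.logErrors = logErrors;
        }

        /** Appends a chunk; a null chunk stands in for content that could not be read. */
        void absorb(String chunk) {
            if (chunk == null) {
                // The error is recorded only when logging was requested.
                if (logErrors) {
                    errors.add("unreadable chunk at index " + index);
                }
            } else {
                text.append(chunk);
            }
            index++;
        }

        boolean hasErrors() {
            return !errors.isEmpty();
        }

        List<String> getErrors() {
            return Collections.unmodifiableList(errors);
        }

        String getText() {
            return text.toString();
        }
    }

    /** Runs the hypothetical extractor over the given chunks. */
    static SketchExtractor run(boolean logErrors, String... chunks) {
        SketchExtractor extractor = new SketchExtractor(logErrors);
        for (String chunk : chunks) {
            extractor.absorb(chunk);
        }
        return extractor;
    }

    public static void main(String[] args) {
        SketchExtractor logged = run(true, "a", null, "b");
        SketchExtractor silent = run(false, "a", null, "b");
        System.out.println(logged.hasErrors() + " " + logged.getErrors());
        System.out.println(silent.hasErrors() + " " + silent.getErrors());
    }
}
```

In this model the extracted text is identical in both runs; only the error list differs, which is why a false result from hasErrors() is meaningful only when logging was enabled beforehand.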
visit(Page page)
public void visit(Page page)
Extracts text on the specified page.
The example demonstrates how to extract text from the first page of a PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the first page of the document
absorber.visit(doc.getPages().get(1));
// get the extracted text
String extractedText = absorber.getText();
Parameters:
Parameter | Type | Description |
---|---|---|
page | Page | PDF document page object. |
visit(XForm form)
public void visit(XForm form)
Extracts text on the specified XForm.
The example demonstrates how to extract text from an XForm on the first page of a PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the XForm named "Xform1" in the resources of the first page
absorber.visit(doc.getPages().get(1).getResources().getForms().get("Xform1"));
// get the extracted text
String extractedText = absorber.getText();
Parameters:
Parameter | Type | Description |
---|---|---|
form | XForm | PDF form (XForm) object. |
visit(IDocument pdf)
public void visit(IDocument pdf)
Extracts text on the specified document.
The example demonstrates how to extract text from a PDF document.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the whole document with the absorber
absorber.visit(doc);
// get the extracted text
String extractedText = absorber.getText();
Parameters:
Parameter | Type | Description |
---|---|---|
pdf | IDocument | PDF document object. |
getExtractionOptions()
public TextExtractionOptions getExtractionOptions()
Gets text extraction options.
The example demonstrates how to set Pure text formatting mode and perform text extraction.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text with formatting
TextAbsorber absorber = new TextAbsorber();
// set pure text formatting mode
absorber.setExtractionOptions(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Allows you to define the text formatting mode (see TextExtractionOptions) used during extraction. The default mode is TextExtractionOptions.TextFormattingMode.Pure.
Returns: TextExtractionOptions - TextExtractionOptions value
setExtractionOptions(TextExtractionOptions value)
public void setExtractionOptions(TextExtractionOptions value)
Sets text extraction options.
The example demonstrates how to set Pure text formatting mode and perform text extraction.
// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text with formatting
TextAbsorber absorber = new TextAbsorber();
// set pure text formatting mode
absorber.setExtractionOptions(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all document's pages
doc.getPages().accept(absorber);
// get the extracted text
String extractedText = absorber.getText();
Allows you to define the text formatting mode (see TextExtractionOptions) used during extraction. The default mode is TextExtractionOptions.TextFormattingMode.Pure.
Parameters:
Parameter | Type | Description |
---|---|---|
value | TextExtractionOptions | TextExtractionOptions value |
getTextSearchOptions()
public TextSearchOptions getTextSearchOptions()
Gets text search options.
Allows you to define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.
Returns: TextSearchOptions - TextSearchOptions value
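The effect of the delimiting rectangle can be illustrated with a simplified, self-contained model. It assumes each piece of text has a single anchor point and is kept only when that point lies inside the active region; an empty rectangle falls back to the page boundaries, as described above. `Rect`, `Fragment`, and `extract` are hypothetical names for this sketch, and the real library may treat partially overlapping text differently.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of rectangle-limited extraction; not Aspose.PDF code. */
final class RegionSketch {

    /** Axis-aligned rectangle given by lower-left and upper-right corners. */
    static final class Rect {
        final double llx, lly, urx, ury;

        Rect(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        /** A rectangle with no area counts as empty. */
        boolean isEmpty() {
            return urx <= llx || ury <= lly;
        }

        boolean contains(double x, double y) {
            return x >= llx && x <= urx && y >= lly && y <= ury;
        }
    }

    /** Hypothetical text fragment positioned by a single anchor point. */
    static final class Fragment {
        final String text;
        final double x, y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Keeps fragments inside the limit, or inside the page when the limit is empty. */
    static String extract(List<Fragment> fragments, Rect page, Rect limit) {
        Rect region = limit.isEmpty() ? page : limit;
        StringBuilder result = new StringBuilder();
        for (Fragment fragment : fragments) {
            if (region.contains(fragment.x, fragment.y)) {
                result.append(fragment.text);
            }
        }
        return result.toString();
    }

    /** Two sample fragments: one near the top of the page, one in the middle. */
    static List<Fragment> sample() {
        return Arrays.asList(
                new Fragment("Header", 50, 800),
                new Fragment("Body", 50, 400));
    }

    public static void main(String[] args) {
        Rect page = new Rect(0, 0, 595, 842);
        System.out.println(extract(sample(), page, new Rect(0, 0, 0, 0)));
        System.out.println(extract(sample(), page, new Rect(0, 700, 595, 842)));
    }
}
```

With the empty rectangle both sample fragments are kept; with the rectangle covering only the top strip of the page, only the fragment anchored there is kept.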
setTextSearchOptions(TextSearchOptions value)
public void setTextSearchOptions(TextSearchOptions value)
Sets text search options.
Allows you to define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.
Parameters:
Parameter | Type | Description |
---|---|---|
value | TextSearchOptions | TextSearchOptions value |