TextDevice
Inheritance: java.lang.Object, com.aspose.pdf.devices.Device, com.aspose.pdf.devices.PageDevice
public final class TextDevice extends PageDevice
Represents a class for converting PDF document pages into text.
The example demonstrates how to extract text from the first page of a PDF document.
Document doc = new Document(inFile);
String extractedText;
ByteArrayOutputStream ms = new ByteArrayOutputStream();
try
{
    // create text device
    TextDevice device = new TextDevice();
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), ms);
    // use the extracted text
    extractedText = Encoding.getUnicode().getString(ms.toByteArray());
    ms.close();
} catch (IOException e) {
    e.printStackTrace();
}
A TextDevice object is used to extract text from a PDF page.
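The example above decodes the stream contents with `Encoding.getUnicode()`. As a rough sketch of the same step using only JDK classes, and assuming that the default "Unicode" encoding corresponds to UTF-16LE (an assumption worth checking against your library version), the bytes collected in a `ByteArrayOutputStream` could be decoded as shown here. `DecodeExtractedText` and `decodeUtf16Le` are illustrative names, not part of Aspose.PDF, and the device output is simulated by writing bytes directly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class DecodeExtractedText {
    // Decodes the buffered bytes as UTF-16LE (assumed encoding) and
    // drops a leading byte-order mark if one is present.
    public static String decodeUtf16Le(ByteArrayOutputStream stream) {
        String text = new String(stream.toByteArray(), StandardCharsets.UTF_16LE);
        // The UTF-16LE decoder keeps a BOM as the character U+FEFF.
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Stand-in for device.process(page, ms): UTF-16LE bytes with a BOM.
        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        byte[] simulated = "\uFEFFHello, PDF".getBytes(StandardCharsets.UTF_16LE);
        ms.write(simulated, 0, simulated.length);
        System.out.println(decodeUtf16Le(ms));
    }
}
```

If the device is constructed with a different encoding, the same charset would need to be used when decoding.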
Constructors
Constructor | Description |
---|---|
TextDevice(TextExtractionOptions extractionOptions) | Initializes a new instance of the TextDevice with text extraction options. |
TextDevice() | Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding. |
TextDevice(TextEncodingInternal encoding) | Initializes a new instance of the TextDevice for the specified encoding. |
TextDevice(Charset encoding) | Initializes a new instance of the TextDevice for the specified encoding. |
TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding) | Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |
TextDevice(TextExtractionOptions extractionOptions, Charset encoding) | Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |
Methods
Method | Description |
---|---|
getExtractionOptions() | Gets text extraction options. |
setExtractionOptions(TextExtractionOptions value) | Sets text extraction options. |
getEncodingInternal() | Gets encoding of extracted text. |
getEncoding() | Gets encoding of extracted text. |
setEncodingInternal(TextEncodingInternal value) | Sets encoding of extracted text. |
setEncoding(Charset value) | Sets encoding of extracted text. |
processInternal(Page page, System.IO.Stream output) | Converts the page and saves it as a text stream. |
process(Page page, OutputStream output) | Converts the page and saves it as a text stream. |
TextDevice(TextExtractionOptions extractionOptions)
public TextDevice(TextExtractionOptions extractionOptions)
Initializes a new instance of the TextDevice with text extraction options.
Parameters:
Parameter | Type | Description |
---|---|---|
extractionOptions | TextExtractionOptions | Text extraction options. |
TextDevice()
public TextDevice()
Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding.
TextDevice(TextEncodingInternal encoding)
public TextDevice(TextEncodingInternal encoding)
Initializes a new instance of the TextDevice for the specified encoding.
Parameters:
Parameter | Type | Description |
---|---|---|
encoding | TextEncodingInternal | Encoding of extracted text. |
TextDevice(Charset encoding)
public TextDevice(Charset encoding)
Initializes a new instance of the TextDevice for the specified encoding.
Parameters:
Parameter | Type | Description |
---|---|---|
encoding | java.nio.charset.Charset | Encoding of extracted text. |
TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding)
public TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding)
Initializes a new instance of the TextDevice for the specified encoding with text extraction options.
Parameters:
Parameter | Type | Description |
---|---|---|
extractionOptions | TextExtractionOptions | Text extraction options. |
encoding | TextEncodingInternal | Encoding of extracted text. |
TextDevice(TextExtractionOptions extractionOptions, Charset encoding)
public TextDevice(TextExtractionOptions extractionOptions, Charset encoding)
Initializes a new instance of the TextDevice for the specified encoding with text extraction options.
Parameters:
Parameter | Type | Description |
---|---|---|
extractionOptions | TextExtractionOptions | Text extraction options. |
encoding | java.nio.charset.Charset | Encoding of extracted text. |
getExtractionOptions()
public TextExtractionOptions getExtractionOptions()
Gets text extraction options.
Returns: TextExtractionOptions - TextExtractionOptions element
The example demonstrates how to extract text in raw order.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
setExtractionOptions(TextExtractionOptions value)
public void setExtractionOptions(TextExtractionOptions value)
Sets text extraction options.
Parameters:
Parameter | Type | Description |
---|---|---|
value | TextExtractionOptions | TextExtractionOptions element |
The example demonstrates how to extract text in raw order.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
### getEncodingInternal() {#getEncodingInternal--}
public TextEncodingInternal getEncodingInternal()
Gets encoding of extracted text.
**Returns:**
[TextEncodingInternal](../../com.aspose.pdf/textencodinginternal) - TextEncodingInternal element
--------------------
The example demonstrates how to represent extracted text in UTF-8 encoding.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
### getEncoding() {#getEncoding--}
public Charset getEncoding()
Gets encoding of extracted text.
**Returns:**
java.nio.charset.Charset - Charset element
--------------------
The example demonstrates how to represent extracted text in UTF-8 encoding.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
### setEncodingInternal(TextEncodingInternal value) {#setEncodingInternal-com.aspose.pdf.TextEncodingInternal-}
public void setEncodingInternal(TextEncodingInternal value)
Sets encoding of extracted text.
**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| value | [TextEncodingInternal](../../com.aspose.pdf/textencodinginternal) | TextEncodingInternal element |
--------------------
The example demonstrates how to represent extracted text in UTF-8 encoding.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(TextEncodingInternal.getUTF8());
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
### setEncoding(Charset value) {#setEncoding-java.nio.charset.Charset-}
public void setEncoding(Charset value)
Sets encoding of extracted text.
**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| value | java.nio.charset.Charset | Charset element |
--------------------
The example demonstrates how to represent extracted text in UTF-8 encoding.
Document doc = new Document(inFile);
String extractedText;
// create text device
TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
// convert the page and save text to the stream
device.process(doc.getPages().get_Item(1), outFile);
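
Whichever encoding the device is given, the bytes it writes need to be decoded with the same `Charset` when they are read back. The following sketch uses only JDK classes to illustrate that point; `CharsetChoice` and `transcode` are hypothetical names, not part of Aspose.PDF. It also shows that `StandardCharsets.UTF_8` equals the charset returned by `Charset.forName("UTF-8")`, so either form should be usable wherever a `Charset` is expected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetChoice {
    // Encodes text with one charset and decodes the resulting bytes with another.
    public static String transcode(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        Charset byName = Charset.forName("UTF-8");
        // Both lookups yield an equal Charset: prints true.
        System.out.println(byName.equals(StandardCharsets.UTF_8));

        String sample = "caf\u00E9";
        // Matching charsets preserve the text: prints true.
        System.out.println(sample.equals(transcode(sample, byName, StandardCharsets.UTF_8)));
        // A mismatched charset garbles the non-ASCII character: prints false.
        System.out.println(sample.equals(transcode(sample, byName, StandardCharsets.ISO_8859_1)));
    }
}
```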
### processInternal(Page page, System.IO.Stream output) {#processInternal-com.aspose.pdf.Page-com.aspose.ms.System.IO.Stream-}
public void processInternal(Page page, System.IO.Stream output)
Converts the page and saves it as a text stream.
--------------------
The example demonstrates how to extract text from the first page of a PDF document.
Document doc = new Document(inFile);
String extractedText;
ByteArrayOutputStream ms = new ByteArrayOutputStream();
try {
    // create text device
    TextDevice device = new TextDevice();
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), ms);
    // use the extracted text
    extractedText = Encoding.getUnicode().getString(ms.toByteArray());
    ms.close();
} catch (IOException e) {
    e.printStackTrace();
}
**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| page | [Page](../../com.aspose.pdf/page) | The page to convert. |
| output | com.aspose.ms.System.IO.Stream | Result stream. |
### process(Page page, OutputStream output) {#process-com.aspose.pdf.Page-java.io.OutputStream-}
public void process(Page page, OutputStream output)
Converts the page and saves it as a text stream.
--------------------
The example demonstrates how to extract text from the first page of a PDF document.
Document doc = new Document(inFile);
String extractedText;
ByteArrayOutputStream ms = new ByteArrayOutputStream();
try {
    // create text device
    TextDevice device = new TextDevice();
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), ms);
    // use the extracted text
    extractedText = Encoding.getUnicode().getString(ms.toByteArray());
    ms.close();
} catch (IOException e) {
    e.printStackTrace();
}
**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| page | [Page](../../com.aspose.pdf/page) | The page to convert. |
| output | java.io.OutputStream | Result stream. |
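
Because `process` writes to any `java.io.OutputStream`, the stream can be managed with try-with-resources so that it is closed even if an exception occurs. The sketch below is a JDK-only illustration of that pattern: the `PageTextWriter` interface is a hypothetical stand-in for a call such as `device.process(page, output)`, and `StreamHandling` and `capture` are likewise illustrative names, not part of Aspose.PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamHandling {
    // Hypothetical stand-in for anything that writes page text to a stream.
    public interface PageTextWriter {
        void writeTo(OutputStream output) throws IOException;
    }

    // Runs the writer against an in-memory stream, closes the stream
    // automatically, and decodes the captured bytes with the given charset.
    public static String capture(PageTextWriter writer, Charset charset) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            writer.writeTo(buffer);
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated writer; with the library this would presumably wrap device.process.
        String text = capture(
                out -> out.write("page text".getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

The charset passed to `capture` is assumed to match the encoding the device was configured with.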