PdfExtractor

PdfExtractor class

Class for extracting images and text from a PDF document.

public sealed class PdfExtractor : Facade

Constructors

Name Description
PdfExtractor() Initializes a new PdfExtractor object.
PdfExtractor(Document) Initializes a new PdfExtractor object based on the given document.

Properties

Name Description
Document { get; } Gets the document the facade is working on.
EndPage { get; set; } Gets or sets the end page of the page range on which the extraction is performed.
ExtractImageMode { get; set; } Gets or sets the mode used for image extraction.
ExtractTextMode { get; set; } Gets or sets the mode for the text extraction result.
IsBidi { get; } True when the text contains Hebrew or Arabic characters. This case needs special handling because string functions change their behavior and process the text from right to left (except numbers and other non-text characters).
Password { get; set; } Gets or sets the input file’s password.
Resolution { get; set; } Gets or sets the resolution for extracted images. The default value is 150. Images with a higher resolution are clearer, but increasing the resolution also increases the time and memory needed to extract them. A resolution of 150 or 300 is usually enough to get a clear image.
StartPage { get; set; } Gets or sets the start page of the page range on which the extraction is performed.
TextSearchOptions { get; set; } Gets or sets text search options.

Methods

Name Description
virtual BindPdf(Document) Initializes the facade.
override BindPdf(Stream) Binds a PDF document from a stream.
override BindPdf(string) Binds the input PDF file.
virtual Close() Disposes the Aspose.Pdf.Document bound to the facade.
Dispose() Disposes the facade.
ExtractAttachment() Extracts attachments from a PDF document.
ExtractAttachment(string) Extracts an attachment from the PDF file by attachment name.
ExtractImage() Extracts images from the PDF file.
ExtractText() Extracts text from a PDF document using Unicode encoding.
ExtractText(Encoding) Extracts text from a PDF document using the specified encoding.
GetAttachment() Saves all attachment files to streams.
GetAttachment(string) Stores the attachment in a file.
GetAttachmentInfo() Gets the list of attachments.
GetAttachNames() Returns the list of attachments in the PDF file. Note: ExtractAttachment must be called before using this method.
GetNextImage(Stream) Retrieves the next image from the PDF file and stores it in a stream.
GetNextImage(string) Retrieves the next image from the PDF document. Note: ExtractImage must be called before using this method.
GetNextImage(Stream, ImageFormat) Retrieves the next image from the PDF file and stores it in a stream in the given image format.
GetNextImage(string, ImageFormat) Retrieves the next image from the PDF document in the given image format. Note: ExtractImage must be called before using this method.
GetNextPageText(Stream) Saves the next page’s text to a stream.
GetNextPageText(string) Saves the next page’s text to a file.
GetText(Stream) Saves text to a stream. See also: ExtractText.
GetText(string) Saves text to a file. See also: ExtractText.
GetText(Stream, bool) Saves text to a stream. See also: ExtractText.
HasNextImage() Checks whether more images are available in the PDF document. Note: ExtractImage must be called before using this method.
HasNextPageText() Indicates whether more page text is available.
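
Example

The members listed above are typically combined in a bind, extract, retrieve sequence. The following is a minimal sketch of that sequence, not a definitive implementation. It assumes the class lives in the Aspose.Pdf.Facades namespace (the tables above do not state the namespace), that page numbers are 1-based, and that the file names "input.pdf", "output.txt" and "image-N.jpg" are placeholders. The default image format written by GetNextImage(string) is not stated above, so the .jpg extension is also an assumption.

```csharp
using Aspose.Pdf.Facades; // assumed namespace for PdfExtractor

class PdfExtractorSketch
{
    static void Main()
    {
        var extractor = new PdfExtractor();
        try
        {
            // Bind the input PDF file (placeholder path).
            extractor.BindPdf("input.pdf");

            // Restrict extraction to a page range (assumed 1-based).
            extractor.StartPage = 1;
            extractor.EndPage = 2;

            // Extract text using Unicode encoding, then save it to a file.
            extractor.ExtractText();
            extractor.GetText("output.txt");

            // Extract images; ExtractImage must be called before
            // HasNextImage and GetNextImage.
            extractor.Resolution = 150; // the documented default
            extractor.ExtractImage();

            int index = 1;
            while (extractor.HasNextImage())
            {
                // Placeholder file name; the .jpg extension is an assumption.
                extractor.GetNextImage("image-" + index + ".jpg");
                index++;
            }
        }
        finally
        {
            // Dispose the document bound to the facade.
            extractor.Close();
        }
    }
}
```

To save text one page at a time instead, the same pattern should apply with HasNextPageText() and GetNextPageText(string) after the call to ExtractText().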
