Aspose::Pdf::Text::TextAbsorber Class Reference

Represents a text absorber object. Performs text extraction and provides access to the result via the TextAbsorber::Text object.

#include "TextAbsorber.h"

Inherits System::Object.

Inherited by Aspose::Pdf::Text::TextFragmentAbsorber, and Aspose::Pdf::Text::TextParagraphAbsorber.

Public Member Functions

virtual ASPOSE_PDF_SHARED_API System::String get_Text ()
 Gets the text that the TextAbsorber extracted from the PDF document or page.
 
ASPOSE_PDF_SHARED_API bool get_HasErrors () const
 Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.
 
ASPOSE_PDF_SHARED_API System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< TextExtractionError > > > get_Errors () const
 List of TextExtractionError objects. It contains information about errors that were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.
 
virtual ASPOSE_PDF_SHARED_API System::SharedPtr< TextExtractionOptions > get_ExtractionOptions ()
 Gets text extraction options.
 
virtual ASPOSE_PDF_SHARED_API void set_ExtractionOptions (System::SharedPtr< TextExtractionOptions > value)
 Sets text extraction options.
 
virtual ASPOSE_PDF_SHARED_API System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > get_TextSearchOptions ()
 Gets text search options.
 
virtual ASPOSE_PDF_SHARED_API void set_TextSearchOptions (System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value)
 Sets text search options.
 
virtual ASPOSE_PDF_SHARED_API void Visit (System::SharedPtr< Page > page)
 Extracts text from the specified page.
 
virtual ASPOSE_PDF_SHARED_API void Visit (System::SharedPtr< XForm > form)
 Extracts text from the specified XForm.
 
virtual ASPOSE_PDF_SHARED_API void Visit (System::SharedPtr< Document > pdf)
 Extracts text from the specified document.
 
ASPOSE_PDF_SHARED_API TextAbsorber ()
 Initializes a new instance of the TextAbsorber.
 
ASPOSE_PDF_SHARED_API TextAbsorber (System::SharedPtr< TextExtractionOptions > extractionOptions)
 Initializes a new instance of the TextAbsorber with extraction options.
 
ASPOSE_PDF_SHARED_API TextAbsorber (System::SharedPtr< TextExtractionOptions > extractionOptions, System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions)
 Initializes a new instance of the TextAbsorber with extraction and text search options.
 
ASPOSE_PDF_SHARED_API TextAbsorber (System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions)
 Initializes a new instance of the TextAbsorber with text search options.
 
Public Member Functions inherited from System::Object
ASPOSECPP_SHARED_API Object ()
 Creates object. Initializes all internal data structures.
 
virtual ASPOSECPP_SHARED_API ~Object ()
 Destroys object. Frees all internal data structures.
 
ASPOSECPP_SHARED_API Object (Object const &x)
 Copy constructor. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object & operator= (Object const &x)
 Assignment operator. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object * SharedRefAdded ()
 Increments shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int SharedRefRemovedSafe ()
 Decrements and returns shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int RemovedSharedRefs (int count)
 Decreases shared reference count by specified value.
 
Detail::SmartPtrCounter * WeakRefAdded ()
 Increments weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
void WeakRefRemoved ()
 Decrements weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
Detail::SmartPtrCounter * GetCounter ()
 Gets reference counter data structure associated with the object.
 
int SharedCount () const
 Gets current value of shared reference counter.
 
ASPOSECPP_SHARED_API void Lock ()
 Implements C# lock() statement locking. Call directly or use LockContext sentry object.
 
ASPOSECPP_SHARED_API void Unlock ()
 Implements C# lock() statement unlocking. Call directly or use LockContext sentry object.
 
virtual ASPOSECPP_SHARED_API bool Equals (ptr obj)
 Compares objects using C# Object.Equals semantics.
 
virtual ASPOSECPP_SHARED_API int32_t GetHashCode () const
 Analog of C# Object.GetHashCode() method. Enables hashing of custom objects.
 
virtual ASPOSECPP_SHARED_API String ToString () const
 Analog of C# Object.ToString() method. Enables converting custom objects to string.
 
virtual ASPOSECPP_SHARED_API ptr MemberwiseClone () const
 Analog of C# Object.MemberwiseClone() method. Enables cloning custom types.
 
virtual ASPOSECPP_SHARED_API const TypeInfo & GetType () const
 Gets actual type of object. Analog of C# System.Object.GetType() call.
 
virtual ASPOSECPP_SHARED_API bool Is (const TypeInfo &targetType) const
 Checks whether the object represents an instance of the type described by targetType. Analog of C# 'is' operator.
 
virtual ASPOSECPP_SHARED_API void SetTemplateWeakPtr (uint32_t argument)
 Sets the n-th template argument to be a weak pointer (rather than shared). Allows switching pointers in containers to weak mode.
 
virtual ASPOSECPP_SHARED_API bool FastCast (const Details::FastRttiBase &helper, void **out_ptr) const
 For internal purposes only.
 
template<>
bool Equals (float const &objA, float const &objB)
 Emulates C#-style floating point comparison where two NaNs are considered equal even though according to IEC 60559:1989 NaN is not equal to any value, including NaN.
 
template<>
bool Equals (double const &objA, double const &objB)
 Emulates C#-style floating point comparison where two NaNs are considered equal even though according to IEC 60559:1989 NaN is not equal to any value, including NaN.
 
template<>
bool ReferenceEquals (String const &str, std::nullptr_t)
 Specialization of Object::ReferenceEquals for the case of a string and nullptr.
 
template<>
bool ReferenceEquals (String const &str1, String const &str2)
 Specialization of Object::ReferenceEquals for the case of strings.
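
The floating-point Equals specializations listed above treat two NaNs as equal, unlike the built-in operator==. The following is a minimal standalone sketch of that comparison rule, not the library's implementation; the function name CSharpStyleEquals is hypothetical and chosen only for this illustration.

```cpp
#include <cmath>

// Hypothetical helper (not part of Aspose or System::Object):
// compares two floating-point values so that NaN equals NaN,
// and falls back to operator== for everything else.
template <typename T>
bool CSharpStyleEquals(T a, T b) {
    if (std::isnan(a) && std::isnan(b)) {
        return true;  // both NaN: considered equal under this rule
    }
    return a == b;    // ordinary comparison; NaN vs. number is false
}
```

With this rule a NaN compares equal to another NaN, while a NaN compared with any ordinary number still yields false.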
 

Protected Member Functions

System::SharedPtr< System::Collections::Generic::List< int32_t > > get_PageTextLengthes () const
 
System::String GetTotalText (System::SharedPtr< Aspose::Pdf::Engine::CommonData::Text::Segmenting::TextSegmenter > segmenter, bool isFormatted)
 

Protected Attributes

System::SharedPtr< System::Text::StringBuilder > extractedText
 

Additional Inherited Members

Public Types inherited from System::Object
typedef SmartPtr< Object > ptr
 Alias for smart pointer type.
 
Static Public Member Functions inherited from System::Object
static bool ReferenceEquals (ptr const &objA, ptr const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, T const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, std::nullptr_t)
 Reference-compares value type object with nullptr.
 
template<typename T1 , typename T2 >
static std::enable_if< IsSmartPtr< T1 >::value && IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares reference type objects in C# style.
 
template<typename T1 , typename T2 >
static std::enable_if<!IsSmartPtr< T1 >::value && !IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares value type objects in C# style.
 
static const TypeInfo & Type ()
 Implements C# typeof(System.Object) construct.
 

Detailed Description

Represents a text absorber object. Performs text extraction and provides access to the result via the TextAbsorber::Text object.

The TextAbsorber object is used to extract text from a PDF document or from a single page of the document.
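
The call pattern this class suggests (create an absorber, pass it a document or a page through Visit, then read the result with get_Text) can be sketched with stand-in types. This is an illustration under loud assumptions: SketchPage, SketchDocument, SketchAbsorber and ExtractAll are hypothetical names and are not Aspose classes, std::shared_ptr stands in for System::SharedPtr, and accumulating text across Visit calls is a choice made for the sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the Aspose classes.
struct SketchPage { std::string content; };
struct SketchDocument { std::vector<std::shared_ptr<SketchPage>> pages; };

class SketchAbsorber {
public:
    // Same shape as Visit(page): collect the text of one page.
    void Visit(const std::shared_ptr<SketchPage>& page) {
        assert(page != nullptr);
        extracted_ += page->content;
    }
    // Same shape as Visit(pdf): collect the text of every page in order.
    void Visit(const std::shared_ptr<SketchDocument>& pdf) {
        assert(pdf != nullptr);
        for (const auto& page : pdf->pages) Visit(page);
    }
    // Same shape as get_Text(): return what has been collected so far.
    std::string get_Text() const { return extracted_; }

private:
    std::string extracted_;  // plays the role of the extractedText buffer
};

// Hypothetical helper: run one absorber over a whole document.
std::string ExtractAll(const std::shared_ptr<SketchDocument>& doc) {
    SketchAbsorber absorber;
    absorber.Visit(doc);
    return absorber.get_Text();
}
```

In this sketch, calling ExtractAll on a stand-in document whose two pages hold "Hello, " and "world" returns "Hello, world". With the real library the same three steps would use the TextAbsorber constructors, Visit overloads and get_Text() documented on this page.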

Constructor & Destructor Documentation

◆ TextAbsorber() [1/4]

ASPOSE_PDF_SHARED_API Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( )

Initializes a new instance of the TextAbsorber.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

◆ TextAbsorber() [2/4]

ASPOSE_PDF_SHARED_API Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< TextExtractionOptions > extractionOptions )

Initializes a new instance of the TextAbsorber with extraction options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

Parameters
extractionOptions  Text extraction options

◆ TextAbsorber() [3/4]

ASPOSE_PDF_SHARED_API Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< TextExtractionOptions > extractionOptions,
System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions
)

Initializes a new instance of the TextAbsorber with extraction and text search options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

Parameters
extractionOptions  Text extraction options
textSearchOptions  Text search options

◆ TextAbsorber() [4/4]

ASPOSE_PDF_SHARED_API Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions )

Initializes a new instance of the TextAbsorber with text search options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

Parameters
textSearchOptions  Text search options

Member Function Documentation

◆ get_Errors()

ASPOSE_PDF_SHARED_API System::SharedPtr<System::Collections::Generic::List<System::SharedPtr<TextExtractionError> > > Aspose::Pdf::Text::TextAbsorber::get_Errors ( ) const

List of TextExtractionError objects. It contains information about errors that were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.

◆ get_ExtractionOptions()

virtual ASPOSE_PDF_SHARED_API System::SharedPtr<TextExtractionOptions> Aspose::Pdf::Text::TextAbsorber::get_ExtractionOptions ( )
virtual

Gets text extraction options.

Allows the text formatting mode (see TextExtractionOptions) used during extraction to be defined. The default mode is TextExtractionOptions::TextFormattingMode::Pure.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ get_HasErrors()

ASPOSE_PDF_SHARED_API bool Aspose::Pdf::Text::TextAbsorber::get_HasErrors ( ) const

Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.

◆ get_PageTextLengthes()

System::SharedPtr<System::Collections::Generic::List<int32_t> > Aspose::Pdf::Text::TextAbsorber::get_PageTextLengthes ( ) const
protected

◆ get_Text()

virtual ASPOSE_PDF_SHARED_API System::String Aspose::Pdf::Text::TextAbsorber::get_Text ( )
virtual

Gets the text that the TextAbsorber extracted from the PDF document or page.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ get_TextSearchOptions()

virtual ASPOSE_PDF_SHARED_API System::SharedPtr<Aspose::Pdf::Text::TextSearchOptions> Aspose::Pdf::Text::TextAbsorber::get_TextSearchOptions ( )
virtual

Gets text search options.

Allows a rectangle that delimits the extracted text to be defined. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ GetTotalText()

System::String Aspose::Pdf::Text::TextAbsorber::GetTotalText ( System::SharedPtr< Aspose::Pdf::Engine::CommonData::Text::Segmenting::TextSegmenter >  segmenter,
bool  isFormatted 
)
protected

◆ set_ExtractionOptions()

virtual ASPOSE_PDF_SHARED_API void Aspose::Pdf::Text::TextAbsorber::set_ExtractionOptions ( System::SharedPtr< TextExtractionOptions > value )
virtual

Sets text extraction options.

Allows the text formatting mode (see TextExtractionOptions) used during extraction to be defined. The default mode is TextExtractionOptions::TextFormattingMode::Pure.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ set_TextSearchOptions()

virtual ASPOSE_PDF_SHARED_API void Aspose::Pdf::Text::TextAbsorber::set_TextSearchOptions ( System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value )
virtual

Sets text search options.

Allows a rectangle that delimits the extracted text to be defined. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ Visit() [1/3]

virtual ASPOSE_PDF_SHARED_API void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< Page > page )
virtual

Extracts text from the specified page.

Parameters
page  PDF document page object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber, and Aspose::Pdf::Text::TextParagraphAbsorber.

◆ Visit() [2/3]

virtual ASPOSE_PDF_SHARED_API void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< XForm > form )
virtual

Extracts text from the specified XForm.

Parameters
form  PDF form object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ Visit() [3/3]

virtual ASPOSE_PDF_SHARED_API void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< Document > pdf )
virtual

Extracts text from the specified document.

Parameters
pdf  PDF document object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

Member Data Documentation

◆ extractedText

System::SharedPtr<System::Text::StringBuilder> Aspose::Pdf::Text::TextAbsorber::extractedText
protected