ParagraphAbsorber

ParagraphAbsorber class

Represents an absorber of page structure objects such as sections and paragraphs. It searches for sections and paragraphs of text and provides access to the rectangles and polygons that describe them in text coordinate space. It also searches for text segments and exposes the results as TextFragment collections grouped by structure element.

public class ParagraphAbsorber

Constructors

Name Description
ParagraphAbsorber() Initializes a new instance of the ParagraphAbsorber class that searches for sections and paragraphs of a document or page.
ParagraphAbsorber(int) Initializes a new instance of the ParagraphAbsorber class that searches for sections and paragraphs of a document or page. The integer argument specifies the search depth (see SectionsSearchDepth).

Properties

Name Description
IsMulticolumnParagraphsAllowed { get; set; } Gets or sets a value indicating whether the first text lines of a section may be treated as a continuation of the last paragraph of the preceding section.
PageMarkups { get; } Gets the collection of PageMarkup objects that were absorbed.
SectionsSearchDepth { get; set; } Gets or sets how many successive searches for finer structure elements are performed. The default depth is 3, which means three searches for horizontally divided sections (headers, paragraphs, etc.) and three searches for vertically divided ones (columns).
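Both settable properties can be configured before calling Visit. The following is a minimal sketch, assuming the Aspose.Pdf package is referenced and a file named input.pdf exists; the depth value 2 is purely illustrative. The sketch sets SectionsSearchDepth directly rather than passing a value to the ParagraphAbsorber(int) constructor.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open the source document (the file name is a placeholder)
Document doc = new Document("input.pdf");

// Configure the absorber before visiting any page
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.SectionsSearchDepth = 2;               // illustrative value; the default is 3
absorber.IsMulticolumnParagraphsAllowed = true;  // a section's first lines may continue the previous section's last paragraph

// Search the first page (page indexes in Aspose.Pdf start at 1)
absorber.Visit(doc.Pages[1]);

// Count the structure elements found on that page
PageMarkup markup = absorber.PageMarkups[0];
int sectionCount = 0;
int paragraphCount = 0;
foreach (MarkupSection section in markup.Sections)
{
    sectionCount++;
    foreach (MarkupParagraph paragraph in section.Paragraphs)
        paragraphCount++;
}
Console.WriteLine($"Page 1: {sectionCount} sections, {paragraphCount} paragraphs");
```

A lower depth performs fewer refinement passes, so it may be worth comparing section and paragraph counts across depth values on your own documents.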

Methods

Name Description
Visit(Document) Performs search for sections and paragraphs on the specified Document.
Visit(Page) Performs search for sections and paragraphs on the specified Page.
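Visit(Document) runs the same search across every page. The following is a minimal sketch, assuming input.pdf exists and that MarkupParagraph.Fragments can be enumerated as TextFragment objects, as in the Examples section. Joining fragment texts with spaces is only a rough reconstruction of each paragraph's text, not a feature of the API.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open the source document (the file name is a placeholder)
Document doc = new Document("input.pdf");

// Search every page of the document in one call
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.Visit(doc);

int markupIndex = 0;
int paragraphTotal = 0;
foreach (PageMarkup markup in absorber.PageMarkups)
{
    markupIndex++;
    foreach (MarkupSection section in markup.Sections)
    {
        foreach (MarkupParagraph paragraph in section.Paragraphs)
        {
            paragraphTotal++;

            // Collect the text of each fragment and join with spaces (approximate paragraph text)
            var parts = new List<string>();
            foreach (TextFragment fragment in paragraph.Fragments)
                parts.Add(fragment.Text);

            Console.WriteLine($"[markup {markupIndex}] {string.Join(" ", parts)}");
        }
    }
}
Console.WriteLine($"Paragraphs found: {paragraphTotal}");
```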

Remarks

When the search is complete, the PageMarkups collection contains PageMarkup objects that represent the page structure as collections of MarkupSection and MarkupParagraph objects. Each TextFragment object provides access to the text of a search occurrence and its text properties, and lets you edit the text and change its text state (font, font size, color, etc.).

Examples

The following example finds the first text segment of each paragraph on the first page of a PDF document and highlights it.

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open document
Document doc = new Document("input.pdf");

// Create ParagraphAbsorber object
ParagraphAbsorber absorber = new ParagraphAbsorber();

// Run the absorber on the first page (page indexes in Aspose.Pdf start at 1)
absorber.Visit(doc.Pages[1]);

// Get the markup object for the first page
PageMarkup markup = absorber.PageMarkups[0];

// Walk the page structure to find the first text fragment of each paragraph
foreach (MarkupSection section in markup.Sections)
{
    foreach (MarkupParagraph paragraph in section.Paragraphs)
    {
        // Skip paragraphs that contain no fragments
        if (paragraph.Fragments.Count == 0)
            continue;

        TextFragment fragment = paragraph.Fragments[0];
        // Highlight the fragment by changing its background color
        fragment.TextState.BackgroundColor = Color.LightBlue;
    }
}

// Save the modified document
doc.Save("output.pdf");

See Also