Document class

Represents a Word document. To learn more, visit the Working with Document documentation article.

The Document is a central object in the Aspose.Words library.

To load an existing document in any of the LoadFormat formats, pass a file name or a stream into one of the Document constructors. To create a blank document, call the constructor without parameters.

Use one of the save method overloads to save the document in any of the SaveFormat formats.

Document.mail_merge is the Aspose.Words reporting engine. It lets you quickly populate reports designed in Microsoft Word with data from various data sources.

Document stores document-wide information such as DocumentBase.styles, Document.built_in_document_properties, Document.custom_document_properties, lists and macros. Most of these objects are accessible via the corresponding properties of the Document.

The Document is the root node of a tree that contains all other nodes of the document. The tree follows the Composite design pattern and is in many ways similar to XmlDocument. The content of the document can be freely manipulated programmatically.
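To make the Composite pattern concrete, here is a toy sketch in plain Python. The classes `ToyNode` and `ToyCompositeNode` and the function `walk_pre_order` are hypothetical and are not part of Aspose.Words; they only mirror the idea that a composite node holds children while a leaf node does not, and that the whole tree can be walked from the root.

```python
class ToyNode:
    """A leaf node: it has a name and a parent, but no children."""

    def __init__(self, name):
        self.name = name
        self.parent = None


class ToyCompositeNode(ToyNode):
    """A node that can contain other nodes, including other composites."""

    def __init__(self, name):
        super().__init__(name)
        self.children = []

    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def walk_pre_order(node):
    """Yield the node first, then each subtree from left to right."""
    yield node
    for child in getattr(node, "children", []):
        yield from walk_pre_order(child)


# Build a tree shaped like a minimal document: one section, one body, one paragraph, one run.
document = ToyCompositeNode("Document")
section = document.append_child(ToyCompositeNode("Section"))
body = section.append_child(ToyCompositeNode("Body"))
paragraph = body.append_child(ToyCompositeNode("Paragraph"))
paragraph.append_child(ToyNode("Run"))

names = [node.name for node in walk_pre_order(document)]
```

In the real library the equivalent operations are the inherited CompositeNode methods listed in the Methods table, such as append_child, insert_before, remove_child and get_child_nodes.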

Consider using DocumentBuilder that simplifies the task of programmatically creating or populating the document tree.

The Document can contain only Section objects.

In Microsoft Word, a valid document needs to have at least one section.

Inheritance: Document → DocumentBase → CompositeNode → Node

Constructors

Name Description
Document() Creates a blank Word document.
Document(file_name) Opens an existing document from a file. Automatically detects the file format.
Document(file_name, load_options) Opens an existing document from a file. Allows you to specify additional options such as an encryption password.
Document(stream) Opens an existing document from a stream. Automatically detects the file format.
Document(stream, load_options) Opens an existing document from a stream. Allows you to specify additional options such as an encryption password.

Properties

Name Description
attached_template Gets or sets the full path of the template attached to the document.
automatically_update_styles Gets or sets a flag indicating whether the styles in the document are updated to match the styles in the attached template each time the document is opened in MS Word.
background_shape Gets or sets the background shape of the document. Can be None.
(Inherited from DocumentBase)
built_in_document_properties Returns a collection that represents all the built-in document properties of the document.
child_nodes Gets all immediate child nodes of this node.
(Inherited from CompositeNode)
compatibility_options Provides access to document compatibility options (that is, the user preferences entered on the Compatibility tab of the Options dialog in Word).
compliance Gets the OOXML compliance version determined from the loaded document content. Makes sense only for OOXML documents.
count Gets the number of immediate children of this node.
(Inherited from CompositeNode)
custom_document_properties Returns a collection that represents all the custom document properties of the document.
custom_node_id Specifies custom node identifier.
(Inherited from Node)
custom_xml_parts Gets or sets the collection of Custom XML Data Storage Parts.
default_tab_stop Gets or sets the interval (in points) between the default tab stops.
digital_signatures Gets the collection of digital signatures for this document and their validation results.
document Gets the document to which this node belongs.
(Inherited from Node)
endnote_options Provides options that control numbering and positioning of endnotes in this document.
field_options Gets a FieldOptions object that represents options to control field handling in the document.
first_child Gets the first child of the node.
(Inherited from CompositeNode)
first_section Gets the first section in the document.
font_infos Provides access to properties of fonts used in this document.
(Inherited from DocumentBase)
font_settings Gets or sets document font settings.
footnote_options Provides options that control numbering and positioning of footnotes in this document.
frameset Returns a Frameset instance if this document represents a frames page.
glossary_document Gets or sets the glossary document within this document or template. A glossary document is a storage for AutoText, AutoCorrect and Building Block entries defined in a document.
grammar_checked Returns True if the document has been checked for grammar.
has_child_nodes Returns True if this node has any child nodes.
(Inherited from CompositeNode)
has_macros Returns True if the document has a VBA project (macros).
has_revisions Returns True if the document has any tracked changes.
hyphenation_options Provides access to document hyphenation options.
is_composite Returns True if this node can contain other nodes.
(Inherited from Node)
last_child Gets the last child of the node.
(Inherited from CompositeNode)
last_section Gets the last section in the document.
layout_options Gets a LayoutOptions object that represents options to control the layout process of this document.
lists Provides access to the list formatting used in the document.
(Inherited from DocumentBase)
mail_merge Returns a MailMerge object that represents the mail merge functionality for the document.
mail_merge_settings Gets or sets the object that contains all of the mail merge information for a document.
next_sibling Gets the node immediately following this node.
(Inherited from Node)
node_changing_callback Called when a node is inserted or removed in the document.
(Inherited from DocumentBase)
node_type Returns NodeType.DOCUMENT.
original_file_name Gets the original file name of the document.
original_load_format Gets the format of the original document that was loaded into this object.
package_custom_parts Gets or sets the collection of custom parts (arbitrary content) that are linked to the OOXML package using “unknown relationships”.
page_color Gets or sets the page color of the document. This property is a simpler version of DocumentBase.background_shape.
(Inherited from DocumentBase)
page_count Gets the number of pages in the document as calculated by the most recent page layout operation.
parent_node Gets the immediate parent of this node.
(Inherited from Node)
previous_sibling Gets the node immediately preceding this node.
(Inherited from Node)
protection_type Gets the currently active document protection type.
range Returns a Range object that represents the portion of a document that is contained in this node.
(Inherited from Node)
remove_personal_information Gets or sets a flag indicating that Microsoft Word will remove all user information from comments, revisions and document properties upon saving the document.
resource_loading_callback Allows you to control how external resources are loaded.
(Inherited from DocumentBase)
revisions Gets a collection of revisions (tracked changes) that exist in this document.
revisions_view Gets or sets a value indicating whether to work with the original or revised version of a document.
sections Returns a collection that represents all sections in the document.
shade_form_data Specifies whether to turn on the gray shading on form fields.
show_grammatical_errors Specifies whether to display grammar errors in this document.
show_spelling_errors Specifies whether to display spelling errors in this document.
spelling_checked Returns True if the document has been checked for spelling.
styles Returns a collection of styles defined in the document.
(Inherited from DocumentBase)
theme Gets the Theme object for this document.
track_revisions True if changes are tracked when this document is edited in Microsoft Word.
variables Returns the collection of variables added to a document or template.
vba_project Gets or sets a VbaProject.
versions_count Gets the number of document versions that were stored in the DOC document.
view_options Provides options to control how the document is displayed in Microsoft Word.
warning_callback Called during various document processing procedures when an issue is detected that might result in data or formatting fidelity loss.
(Inherited from DocumentBase)
watermark Provides access to the document watermark.
web_extension_task_panes Returns a collection that represents a list of task pane add-ins.
write_protection Provides access to the document write protection options.

Methods

Name Description
accept(visitor) Accepts a visitor.
accept_all_revisions() Accepts all tracked changes in the document.
append_child(new_child) Adds the specified node to the end of the list of child nodes for this node.
(Inherited from CompositeNode)
append_document(src_doc, import_format_mode) Appends the specified document to the end of this document.
append_document(src_doc, import_format_mode, import_format_options) Appends the specified document to the end of this document.
cleanup() Cleans unused styles and lists from the document.
cleanup(options) Cleans unused styles and lists from the document depending on given CleanupOptions.
clone() Performs a deep copy of the Document.
clone(is_clone_children) Performs a deep copy of the Document.
compare(document, author, date_time) Compares this document with another document, producing the changes as a number of edit and format revisions (Revision).
compare(document, author, date_time, options) Compares this document with another document, producing the changes as a number of edit and format revisions (Revision). Allows you to specify comparison options using CompareOptions.
copy_styles_from_template(template) Copies styles from the specified template to a document.
copy_styles_from_template(template) Copies styles from the specified template to a document.
ensure_minimum() If the document contains no sections, creates one section with one paragraph.
expand_table_styles_to_direct_formatting() Converts formatting specified in table styles into direct formatting on tables in the document.
extract_pages(index, count) Returns the Document object representing specified range of pages.
get_ancestor(ancestor_type) Gets the first ancestor of the specified NodeType.
(Inherited from Node)
get_child(node_type, index, is_deep) Returns an Nth child node that matches the specified type.
(Inherited from CompositeNode)
get_child_nodes(node_type, is_deep) Returns a live collection of child nodes that match the specified type.
(Inherited from CompositeNode)
get_page_info(page_index) Gets the page size, orientation and other information about a page that might be useful for printing or rendering.
get_text() Gets the text of this node and of all its children.
(Inherited from Node)
import_node(src_node, is_import_children) Imports a node from another document to the current document.
(Inherited from DocumentBase)
import_node(src_node, is_import_children, import_format_mode) Imports a node from another document to the current document with an option to control formatting.
(Inherited from DocumentBase)
index_of(child) Returns the index of the specified child node in the child node array.
(Inherited from CompositeNode)
insert_after(new_child, ref_child) Inserts the specified node immediately after the specified reference node.
(Inherited from CompositeNode)
insert_before(new_child, ref_child) Inserts the specified node immediately before the specified reference node.
(Inherited from CompositeNode)
join_runs_with_same_formatting() Joins runs with same formatting in all paragraphs of the document.
next_pre_order(root_node) Gets next node according to the pre-order tree traversal algorithm.
(Inherited from Node)
node_type_to_string(node_type) A utility method that converts a node type enum value into a user friendly string.
(Inherited from Node)
normalize_field_types() Changes the field type values (FieldChar.field_type) of the FieldStart, FieldSeparator and FieldEnd nodes in the whole document so that they correspond to the field types contained in the field codes.
prepend_child(new_child) Adds the specified node to the beginning of the list of child nodes for this node.
(Inherited from CompositeNode)
previous_pre_order(root_node) Gets the previous node according to the pre-order tree traversal algorithm.
(Inherited from Node)
protect(type) Protects the document from changes without changing the existing password or assigns a random password.
protect(type, password) Protects the document from changes and optionally sets a protection password.
remove() Removes itself from the parent.
(Inherited from Node)
remove_all_children() Removes all the child nodes of the current node.
(Inherited from CompositeNode)
remove_child(old_child) Removes the specified child node.
(Inherited from CompositeNode)
remove_external_schema_references() Removes external XML schema references from this document.
remove_macros() Removes all macros (the VBA project) as well as toolbars and command customizations from the document.
remove_smart_tags() Removes all SmartTag descendant nodes of the current node.
(Inherited from CompositeNode)
save(file_name) Saves the document to a file. Automatically determines the save format from the extension.
save(file_name, save_format) Saves the document to a file in the specified format.
save(file_name, save_options) Saves the document to a file using the specified save options.
save(stream, save_format) Saves the document to a stream using the specified format.
save(stream, save_options) Saves the document to a stream using the specified save options.
select_nodes(xpath) Selects a list of nodes matching the XPath expression.
(Inherited from CompositeNode)
select_single_node(xpath) Selects the first Node that matches the XPath expression.
(Inherited from CompositeNode)
start_track_revisions(author, date_time) Starts automatically marking all further changes you make to the document programmatically as revision changes.
start_track_revisions(author) Starts automatically marking all further changes you make to the document programmatically as revision changes.
stop_track_revisions() Stops automatic marking of document changes as revisions.
to_string(save_format) Exports the content of the node into a string in the specified format.
(Inherited from Node)
to_string(save_options) Exports the content of the node into a string using the specified save options.
(Inherited from Node)
unlink_fields() Unlinks fields in the whole document.
unprotect() Removes protection from the document regardless of the password.
unprotect(password) Removes protection from the document if a correct password is specified.
update_fields() Updates the values of fields in the whole document.
update_list_labels() Updates list labels for all list items in the document.
update_page_layout() Rebuilds the page layout of the document.
update_table_layout() Implements an earlier approach to table column widths re-calculation that has known issues.
update_thumbnail(options) Updates BuiltInDocumentProperties.thumbnail of the document according to the specified options.
update_thumbnail() Updates BuiltInDocumentProperties.thumbnail of the document using default options.
update_word_count() Updates word count properties of the document.
update_word_count(update_lines_count) Updates word count properties of the document and optionally updates the BuiltInDocumentProperties.lines property.

Examples

Shows how to execute a mail merge with data from a DataTable.

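# Assumes `import aspose.words as aw`. DataTable, ARTIFACTS_DIR and the enclosing
# ExMailMerge test class are not defined here; they are presumably supplied by the
# surrounding test suite.
def test_execute_data_table(self):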

    table = DataTable("Test")
    table.columns.add("CustomerName")
    table.columns.add("Address")
    table.rows.add(["Thomas Hardy", "120 Hanover Sq., London"])
    table.rows.add(["Paolo Accorti", "Via Monte Bianco 34, Torino"])

    # Below are two ways of using a DataTable as the data source for a mail merge.
    # 1 -  Use the entire table for the mail merge to create one output mail merge document for every row in the table:
    doc = ExMailMerge.create_source_doc_execute_data_table()

    doc.mail_merge.execute(table)

    doc.save(ARTIFACTS_DIR + "MailMerge.execute_data_table.whole_table.docx")

    # 2 -  Use one row of the table to create one output mail merge document:
    doc = ExMailMerge.create_source_doc_execute_data_table()

    doc.mail_merge.execute(table.rows[1])

    doc.save(ARTIFACTS_DIR + "MailMerge.execute_data_table.one_row.docx")

@staticmethod
def create_source_doc_execute_data_table() -> aw.Document:
    """Creates a mail merge source document."""

    doc = aw.Document()
    builder = aw.DocumentBuilder(doc)

    builder.insert_field(" MERGEFIELD CustomerName ")
    builder.insert_paragraph()
    builder.insert_field(" MERGEFIELD Address ")

    return doc

See Also