Class DocSaveOptions

DocSaveOptions class

Save options for export to Doc format

public class DocSaveOptions : UnifiedSaveOptions, IPipelineOptions

Constructors

NameDescription
DocSaveOptions()The default constructor.

Properties

NameDescription
AddReturnToLineEnd { get; set; }Use paragraph or line breaks
BatchSize { get; set; }Defines batch size if batched conversion is applicable to source and destination formats pair.
CacheGlyphs { get; set; }Gets or sets boolean value which indicates if will font glyphs be cached while preparing aps pages. Improves performance of conversion pdf to other formats but increases memory consumption.
CloseResponse { get; set; }Gets or sets boolean value which indicates will Response object be closed after document saved into response.
ConvertType3Fonts { get; set; }Gets or sets conversion for Type3 fonts. In Type 3 fonts, glyphs shall be defined by streams of graphics operators. This means that in the DOC/DOCX output we see images instead of text. Set this flag to true to convert Type3 fonts to TTF and get text in the resulting file.
ExtractOcrSublayerOnly { get; set; }This atrribute turned on functionality for extracting image or text for PDF documents with OCR sublayer.
Format { get; set; }Output format
ImageResolutionX { get; set; }Converted images X resolution.
ImageResolutionY { get; set; }Converted images Y resolution.
MaxDistanceBetweenTextLines { get; set; }This parameter is used for grouping text lines into paragraphs. Determines how far apart can be two relative text lines. Specified in hundreds of percent of the text lines height.
MemorySaveModePath { get; set; }Defines the path (file name or directory name) to hold temporary data when converting in memory save mode.
Mode { get; set; }Recognition mode.
RecognizeBullets { get; set; }Switch on the recognition of bullets
RelativeHorizontalProximity { get; set; }In Pdf words may be innerly represented with operators that prints words by independently printing their letters or syllables. So, to detect words sometimes we need detect groups of independent chars that are in fact words. This setting defines width of space between text elements(letters, syllables) that must be treated as distance between words during recognition of words in source PDF. (presence of empty space at least with this width between letters means that textual elements pertain to different words). It’s normed to font size - 1.0 means 100% of supposed word’s font size. ATTENTION!It’s used only in cases when source PDF contains specific rarely used fonts for which optimal value cannot be calculated from font. So, in vast majority of cases this parameter changes nothing in result document.
ReSaveFonts { get; set; }Gets or sets the procedure for resaving fonts. If set to true, we reload fonts on every page to avoid the influence of previous font properties and load the newly created font from scratch. Set this option to false if you want to improve performance. The default value is true;
SaveFormat { get; }Format of data save.
WarningHandler { get; set; }Callback to handle any warnings generated. The WarningHandler returns ReturnAction enum item specifying either Continue or Abort. Continue is the default action and the Save operation continues, however the user may also return Abort in which case the Save operation should cease.

Fields

NameDescription
CustomProgressHandlerThis handler can be used to handle conversion progress events f.e. it can be used to show progress bar or messages about current amount of processed pages, example of handler’s code that shows progress on console is :
IsMultiThreadingProcess pages in few threads.
TryMergeAdjacentSameBackgroundImagesSometimes PDFs contain background images (of pages or table cells) constructed from several same tiling background images put one near other. In such case renderers of target formats (f.e MsWord for DOCS format) sometimes generates visible boundaries beetween parts of background images, cause their techniques of image edge smoothing (anti-aliasing) is different from Acrobat Reader. If it looks like exported document contains such visible boundaries between parts of same background images, please try use this setting to get rid of that unwanted effect. ATTENTION! This optimization of quality usually essentially slows down conversion, so, please, use this option only when it’s really necessary.

Examples

The following example shows how to convert PDF file to DOC or DOCX file

[C#]
	// The path to the documents directory.
	string dataDir = "YOUR_DATA_DIRECTORY";

	// The path to your PDF File.
	var pdfFile = Path.Combine(dataDir, "PDF-to-DOC.pdf");

	// The path to output DOC or DOCX File.
	var docFile = Path.Combine(dataDir, "PDF-to-DOC.doc");
		
	using (Document pdfDocument = new Document(pdfFile))
	{
		DocSaveOptions saveOptions = new DocSaveOptions
		{
			Format = DocSaveOptions.DocFormat.Doc,
			// Set the recognition mode as Flow
			Mode = DocSaveOptions.RecognitionMode.Flow,
			// Set the Horizontal proximity as 2.5
			RelativeHorizontalProximity = 2.5f,
			// Enable the value to recognize bullets during conversion process
			RecognizeBullets = true
		};
		pdfDocument.Save(docFile, saveOptions);
	}
[VB.NET]

    ' The path to the documents directory.
    Dim dataDir As String = "YOUR_DATA_DIRECTORY"
	
    ' The path to your PDF File.
    Dim pdfFile = Path.Combine(dataDir, "PDF-to-DOC.pdf")
    
	' The path to output DOC or DOCX File.
    Dim docFile = Path.Combine(dataDir, "PDF-to-DOC.doc")
 
    Using pdfDocument As Document = New Document(pdfFile)
        Dim saveOptions As DocSaveOptions = New DocSaveOptions With {
          .Format = DocSaveOptions.DocFormat.Doc,
            ' Set the recognition mode as Flow
            .Mode = DocSaveOptions.RecognitionMode.Flow,
            ' Set the Horizontal proximity as 2.5
            .RelativeHorizontalProximity = 2.5,
            ' Enable the value to recognize bullets during conversion process
            .RecognizeBullets = True
        }
        pdfDocument.Save(docFile, saveOptions)
    End Using

See Also