DocumentRecognitionSettings

Inheritance: java.lang.Object

public class DocumentRecognitionSettings

Settings for PDF recognition. Contains properties that allow customizing the recognition process.

Constructors

Constructor | Description
DocumentRecognitionSettings(int pagesNumber) | Initializes a new instance of the DocumentRecognitionSettings class with default properties.
DocumentRecognitionSettings(int startPage, int pagesNumber) | Initializes a new instance of the DocumentRecognitionSettings class with a short set of properties.
DocumentRecognitionSettings(int startPage, int pagesNumber, Language language, boolean detectAreas, boolean autoSkew, int threshold) | Initializes a new instance of the DocumentRecognitionSettings class with the full set of properties.

Methods

Method | Description
setDetectAreas(boolean detectAreas) | Enables or disables automatic text area detection.
setAutoSkew(boolean autoSkew) | Enables or disables automatic image skew correction.
setLanguage(Language language) | Sets the language used for OCR.
setThresholdValue(int thresholdValue) | Sets a custom image binarization threshold.
setIgnoredCharacters(String ignoredCharacters)
setLinesFiltration(boolean linesFiltration)
setStartPage(int startPage) | Sets the first page to recognize.
setPagesNumber(int pagesNumber) | Sets the number of pages to recognize.
setThreadsCount(int threadsCount) | Sets the number of threads used for processing.
setAutoContrast(boolean autoContrast) | Enables an additional contrast correction algorithm that is applied to the image before recognition.
setAutoDenoising(boolean autoDenoising) | Enables an additional neural network that improves the image by reducing noise.
setAllowedCharacters(CharactersAllowedType allowedCharacters) | Sets the allowed character set.
setAllowedCharacters(String allowedCharacters) | Sets the allowed character set.
setDetectAreasMode(DetectAreasMode detectAreasMode) | Determines the type of neural network used for area detection.
setSkew(double skew) | Sets the angle, in degrees, by which the image is rotated.
setUpscaleSmallFont(boolean upscaleSmallFont) | Enables additional algorithms designed specifically for small-font recognition.
getStartPage() | Gets the first page of the PDF file from which images are extracted.
getPagesNumber() | Gets the total number of pages, starting with startPage, from which images are extracted.

DocumentRecognitionSettings(int pagesNumber)

public DocumentRecognitionSettings(int pagesNumber)

Initializes a new instance of the DocumentRecognitionSettings class with default properties. The pagesNumber argument is required; set it to 0 to recognize all pages in the document.

Parameters:

Parameter | Type | Description
pagesNumber | int | Number of pages of a multipage PDF file to recognize.

DocumentRecognitionSettings(int startPage, int pagesNumber)

public DocumentRecognitionSettings(int startPage, int pagesNumber)

Initializes a new instance of the DocumentRecognitionSettings class with a short set of properties.

Parameters:

Parameter | Type | Description
startPage | int | First page to recognize.
pagesNumber | int | Number of pages of a multipage PDF file to recognize.
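The way startPage and pagesNumber select pages can be modeled with a small helper. This is a sketch only: PageRangeSketch and selectedPages are hypothetical names, not part of the library, and the helper assumes zero-based page indices (suggested by the documented default startPage of 0), that a range running past the end of the document is clipped, and that pagesNumber = 0 selects every page from startPage to the end.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Aspose.OCR) that models which pages a
 * startPage/pagesNumber pair selects. Assumes zero-based page indices and
 * clipping at the end of the document.
 */
public class PageRangeSketch {

    /** Returns the zero-based indices of the pages that would be recognized. */
    public static List<Integer> selectedPages(int startPage, int pagesNumber, int totalPages) {
        if (startPage < 0 || pagesNumber < 0 || totalPages < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // pagesNumber == 0 is documented as "recognize all pages"; this model
        // treats it as "every page from startPage to the end" (assumption).
        int end = pagesNumber == 0 ? totalPages : Math.min(totalPages, startPage + pagesNumber);
        List<Integer> pages = new ArrayList<>();
        for (int page = startPage; page < end; page++) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        // A 10-page document: start at page 2 and recognize 3 pages.
        System.out.println(selectedPages(2, 3, 10)); // [2, 3, 4]
        // pagesNumber = 0 selects every page of a 4-page document.
        System.out.println(selectedPages(0, 0, 4));  // [0, 1, 2, 3]
    }
}
```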

DocumentRecognitionSettings(int startPage, int pagesNumber, Language language, boolean detectAreas, boolean autoSkew, int threshold)

public DocumentRecognitionSettings(int startPage, int pagesNumber, Language language, boolean detectAreas, boolean autoSkew, int threshold)

Initializes a new instance of the DocumentRecognitionSettings class with the full set of properties.

Parameters:

Parameter | Type | Description
startPage | int | First page to recognize. 0 by default.
pagesNumber | int | Number of pages of a multipage PDF file to recognize.
language | Language | Language used for OCR.
detectAreas | boolean | Enables automatic text area detection.
autoSkew | boolean | Enables automatic image skew correction.
threshold | int | Custom image binarization threshold.

setDetectAreas(boolean detectAreas)

public void setDetectAreas(boolean detectAreas)

Parameters:

Parameter | Type | Description
detectAreas | boolean | true to enable automatic text area detection.

setAutoSkew(boolean autoSkew)

public void setAutoSkew(boolean autoSkew)

Parameters:

Parameter | Type | Description
autoSkew | boolean | true to enable automatic image skew correction.

setLanguage(Language language)

public void setLanguage(Language language)

Parameters:

Parameter | Type | Description
language | Language | Language used for OCR.

setThresholdValue(int thresholdValue)

public void setThresholdValue(int thresholdValue)

Parameters:

Parameter | Type | Description
thresholdValue | int | Custom image binarization threshold.
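The threshold is described only as a custom image binarization threshold. To show what binarization against a fixed threshold generally does, the sketch below applies textbook global thresholding to 8-bit grayscale values. It is not the library's implementation: the class name ThresholdSketch, the 0 to 255 value range, and the rule that values at or above the threshold become white are all assumptions.

```java
import java.util.Arrays;

/**
 * Illustrative global thresholding (not Aspose.OCR's algorithm): each 8-bit
 * grayscale pixel becomes black (0) or white (255) by comparing it with a
 * single fixed threshold.
 */
public class ThresholdSketch {

    public static int[][] binarize(int[][] gray, int threshold) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                // Assumption: values at or above the threshold are background (white).
                out[y][x] = gray[y][x] >= threshold ? 255 : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] gray = { {10, 120, 200}, {90, 150, 255} };
        System.out.println(Arrays.deepToString(binarize(gray, 128)));
        // [[0, 0, 255], [0, 255, 255]]
    }
}
```

Raising the threshold turns more mid-gray pixels black, which tends to thicken faint strokes but also keeps more background noise.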

setIgnoredCharacters(String ignoredCharacters)

public void setIgnoredCharacters(String ignoredCharacters)

Parameters:

Parameter | Type | Description
ignoredCharacters | java.lang.String

setLinesFiltration(boolean linesFiltration)

public void setLinesFiltration(boolean linesFiltration)

Parameters:

Parameter | Type | Description
linesFiltration | boolean

setStartPage(int startPage)

public void setStartPage(int startPage)

Parameters:

Parameter | Type | Description
startPage | int | First page to recognize.

setPagesNumber(int pagesNumber)

public void setPagesNumber(int pagesNumber)

Parameters:

Parameter | Type | Description
pagesNumber | int | Number of pages of a multipage PDF file to recognize.

setThreadsCount(int threadsCount)

public void setThreadsCount(int threadsCount)

Sets the number of threads used for processing. The default value, 0, means that the image is processed with as many threads as there are processors. A value of 1 means that the image is processed in the main thread.
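The documented threadsCount rule (0 means one thread per processor, 1 means the main thread) can be modeled as follows. This is a hypothetical sketch, not Aspose.OCR code: ThreadsCountSketch, effectiveThreads, and process are illustrative names, upper-casing strings stands in for recognizing image fragments, and it assumes that any value above 1 maps to a fixed-size pool of that many threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of the threadsCount setting; not part of Aspose.OCR. */
public class ThreadsCountSketch {

    /** 0 resolves to the processor count; any other non-negative value is used as-is. */
    public static int effectiveThreads(int threadsCount) {
        if (threadsCount < 0) {
            throw new IllegalArgumentException("threadsCount must be >= 0");
        }
        return threadsCount == 0 ? Runtime.getRuntime().availableProcessors() : threadsCount;
    }

    /** "Recognizes" fragments (here: upper-cases them) using the resolved thread count. */
    public static List<String> process(List<String> fragments, int threadsCount) {
        int threads = effectiveThreads(threadsCount);
        List<String> results = new ArrayList<>();
        if (threads == 1) {
            // One thread: all work runs in the calling (main) thread.
            for (String fragment : fragments) {
                results.add(fragment.toUpperCase(Locale.ROOT));
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String fragment : fragments) {
                Callable<String> task = () -> fragment.toUpperCase(Locale.ROOT);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the output order matches the input.
            for (Future<String> future : futures) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList("line one", "line two", "line three");
        System.out.println(process(fragments, 1)); // [LINE ONE, LINE TWO, LINE THREE]
        System.out.println(process(fragments, 4)); // same list, computed on a 4-thread pool
    }
}
```

Because results are collected in submission order, the output does not depend on how the pool schedules the fragments.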

Parameters:

Parameter | Type | Description
threadsCount | int | The number of threads created for parallel recognition of image fragments.

setAutoContrast(boolean autoContrast)

public void setAutoContrast(boolean autoContrast)

Allows using an additional contrast correction algorithm for the image before recognition.

Parameters:

Parameter | Type | Description
autoContrast | boolean | true to apply the contrast correction filter.

setAutoDenoising(boolean autoDenoising)

public void setAutoDenoising(boolean autoDenoising)

Enables an additional neural network that improves the image by reducing noise. Useful for images with scan artifacts, distortion, spots, flares, gradients, or foreign elements.

Parameters:

Parameter | Type | Description
autoDenoising | boolean | true to apply denoising.

setAllowedCharacters(CharactersAllowedType allowedCharacters)

public void setAllowedCharacters(CharactersAllowedType allowedCharacters)

Sets the allowed character set, which determines the type of characters allowed in the recognition result.

Parameters:

Parameter | Type | Description
allowedCharacters | CharactersAllowedType | A CharactersAllowedType enum value.

setAllowedCharacters(String allowedCharacters)

public void setAllowedCharacters(String allowedCharacters)

Sets the allowed character set, which determines the exact characters allowed in the recognition result.

Parameters:

Parameter | Type | Description
allowedCharacters | java.lang.String | String containing the allowed characters.
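When a string of allowed characters is passed to setAllowedCharacters(String), a caller may want to check that a recognition result contains only those characters. A minimal sketch of such a check follows; AllowedCharactersSketch, disallowed, and conforms are hypothetical helpers written for illustration, not library methods, and they compare Unicode code points exactly (no case folding).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical post-recognition checks; not part of Aspose.OCR. */
public class AllowedCharactersSketch {

    /** Returns the characters of text that are not in allowed, in first-seen order, without duplicates. */
    public static String disallowed(String text, String allowed) {
        Set<Integer> allowedSet = new HashSet<>();
        allowed.codePoints().forEach(allowedSet::add);
        Set<Integer> seen = new HashSet<>();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            // Report each offending code point only once.
            if (!allowedSet.contains(cp) && seen.add(cp)) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** True when every character of text belongs to the allowed set. */
    public static boolean conforms(String text, String allowed) {
        return disallowed(text, allowed).isEmpty();
    }

    public static void main(String[] args) {
        String digits = "0123456789.";
        System.out.println(conforms("12.50", digits));   // true
        System.out.println(disallowed("12,50", digits)); // ,
    }
}
```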

setDetectAreasMode(DetectAreasMode detectAreasMode)

public void setDetectAreasMode(DetectAreasMode detectAreasMode)

Determines the type of neural network used for area detection.

Parameters:

Parameter | Type | Description
detectAreasMode | DetectAreasMode | A DetectAreasMode enum value.

setSkew(double skew)

public void setSkew(double skew)

Sets the angle, in degrees, by which the image is rotated. Zero by default. Setting this value disables the setAutoSkew(boolean) property, so automatic skew correction is not applied.
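The interaction between setSkew and setAutoSkew can be modeled with a small stand-in class. SkewSettingsSketch and its getters isAutoSkew and getSkewDegrees are hypothetical and only mirror the rule stated for setSkew; the documentation does not say what happens if setAutoSkew(true) is called after setSkew, and this model simply lets the later call win. The autoSkew default of false is also an assumption.

```java
/**
 * Hypothetical model (not Aspose.OCR code) of the documented rule:
 * setting an explicit skew angle turns automatic skew correction off.
 */
public class SkewSettingsSketch {
    private boolean autoSkew;     // default false in this model (assumption)
    private double skewDegrees;   // zero by default, as documented

    public void setAutoSkew(boolean autoSkew) {
        this.autoSkew = autoSkew;
    }

    public void setSkew(double skewDegrees) {
        this.skewDegrees = skewDegrees;
        // Documented behavior: an explicit angle disables auto skew correction.
        this.autoSkew = false;
    }

    public boolean isAutoSkew() {
        return autoSkew;
    }

    public double getSkewDegrees() {
        return skewDegrees;
    }

    public static void main(String[] args) {
        SkewSettingsSketch settings = new SkewSettingsSketch();
        settings.setAutoSkew(true);
        settings.setSkew(3.5);
        System.out.println(settings.isAutoSkew() + " " + settings.getSkewDegrees()); // false 3.5
    }
}
```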

Parameters:

Parameter | Type | Description
skew | double | Angle, in degrees, by which to rotate the image.

setUpscaleSmallFont(boolean upscaleSmallFont)

public void setUpscaleSmallFont(boolean upscaleSmallFont)

Enables additional algorithms designed specifically for small-font recognition. Useful for images with small characters.

Parameters:

Parameter | Type | Description
upscaleSmallFont | boolean | true to enable the small-font recognition algorithms.

getStartPage()

public int getStartPage()

Gets the first page of the PDF file from which images are extracted.

Returns: int - start page

getPagesNumber()

public int getPagesNumber()

Gets the total number of pages of the PDF file from which images are extracted, starting with startPage.

Returns: int - pages amount for recognition