AppSuite:ReaderEngineConfig: Difference between revisions
Kai.ahrens (talk | contribs) (Adding section for handling of temporary files) |
Kai.ahrens (talk | contribs) |
Latest revision as of 13:47, 22 March 2019
A summary of all configuration items, together with each default value, is given below. Although the defaults have been carefully chosen for a real-life deployment, the admin should review each of them and adjust it if necessary.
- com.openexchange.documentconverter.installDir=/opt/readerengine
This item contains the directory of the libreaderengine installation. In general, the libreaderengine installation directory contains the ./program directory, which itself contains the engine executables.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/opt/readerengine"
- com.openexchange.documentconverter.cacheDir=/var/spool/open-xchange/documentconverter/readerengine.cache
This item contains the directory that will make up the cache for persistent job data. The directory itself does not need to exist at startup, but the parent directory needs to exist and needs to have write permissions for the user running the servlet, in order for the servlet to create this cache directory at runtime.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/var/spool/open-xchange/documentconverter/readerengine.cache"
- com.openexchange.documentconverter.scratchDir=/var/spool/open-xchange/documentconverter/readerengine.scratch
This item contains the directory that will make up the runtime environment for the readerengine. The directory itself does not need to exist at startup, but the parent directory needs to exist and needs to have write permissions for the user running the servlet, in order for the servlet to create this scratch directory at runtime.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/var/spool/open-xchange/documentconverter/readerengine.scratch"
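The requirement shared by cacheDir and scratchDir (the directory may be missing, but its parent must exist and be writable) can be checked before starting the service. The following is only a minimal sketch of such a pre-flight check; the function name can_create_dir is hypothetical and not part of the documentconverter, and the check has to be run as the user that runs the servlet.

```python
import os
import tempfile


def can_create_dir(path):
    """Return True if `path` is usable as a cacheDir/scratchDir candidate.

    Hypothetical helper: the directory either already exists and is
    writable, or its parent exists and is writable so that the directory
    can be created at runtime.
    """
    if os.path.isdir(path):
        return os.access(path, os.W_OK | os.X_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for candidate in (
        "/var/spool/open-xchange/documentconverter/readerengine.cache",
        "/var/spool/open-xchange/documentconverter/readerengine.scratch",
    ):
        print(candidate, "->", "ok" if can_create_dir(candidate) else "NOT usable")
```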
- com.openexchange.documentconverter.errorDir=
This item specifies a directory for files that could not be loaded due to an error condition or due to a timeout.
Note: The used disk space will grow with retained files. Files have to be removed manually.
Default value: n/a
- com.openexchange.documentconverter.blacklistFile=/opt/open-xchange/etc/readerengine.blacklist
The list of external document content URLs that are not allowed to be loaded
by the readerengine after loading a document.
The file itself contains a list of (newline separated) regular expressions.
Each external URL is first checked against the list of blacklist URL regular
expressions.
If the external URL matches one blacklist entry, the external URL is
then checked against the list of whitelist URL regular expressions.
The behavior in summary is as follows:
If the URL is not blacklisted and not whitelisted, it is resolved at runtime.
If the URL is blacklisted but not whitelisted, it is not resolved at runtime.
If the URL is not blacklisted but whitelisted, it is resolved at runtime.
If the URL is blacklisted and whitelisted, it is resolved at runtime.
In boolean notation: valid = (!blacklisted) || whitelisted
Please note that the regular expressions need to fully qualify the patterns that
the URL should be checked against.
Upper/Lower cases need to be handled by the regular expression as well.
The file itself needs to be UTF-8 encoded to be read appropriately.
Default value: "/opt/open-xchange/etc/readerengine.blacklist"
- com.openexchange.documentconverter.whitelistFile=/opt/open-xchange/etc/readerengine.whitelist
The list of external document content URLs that are allowed to be loaded
by the readerengine after an external URL matched a blacklist pattern.
The file itself contains a list of (newline separated) regular expressions.
Each external URL is only checked against the list of whitelist URL regular
expressions if it previously matched a pattern in the blacklist file.
The behavior in summary is as follows:
If the URL is not blacklisted and not whitelisted, it is resolved at runtime.
If the URL is blacklisted but not whitelisted, it is not resolved at runtime.
If the URL is not blacklisted but whitelisted, it is resolved at runtime.
If the URL is blacklisted and whitelisted, it is resolved at runtime.
In boolean notation: valid = (!blacklisted) || whitelisted
Please note that the regular expressions need to fully qualify the patterns that
the URL should be checked against.
Upper/Lower cases need to be handled by the regular expression as well.
The file itself needs to be UTF-8 encoded to be read appropriately.
Default value: "/opt/open-xchange/etc/readerengine.whitelist"
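The interplay of both files, valid = (!blacklisted) || whitelisted, can be illustrated with a short sketch. It is not the converter's implementation: the function names are hypothetical, the requirement that patterns "fully qualify" the URL is interpreted here as a full match, and Python regular expressions differ in details from the Java syntax the converter most likely uses.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-empty line (hypothetical helper)."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def is_url_allowed(url, blacklist, whitelist):
    """Apply valid = (!blacklisted) || whitelisted.

    Assumption: a pattern has to match the complete URL (full match),
    and matching is case sensitive unless the pattern says otherwise.
    """
    blacklisted = any(p.fullmatch(url) for p in blacklist)
    if not blacklisted:
        return True
    # The whitelist is only consulted for URLs that matched the blacklist.
    return any(p.fullmatch(url) for p in whitelist)


if __name__ == "__main__":
    blacklist = load_patterns(".*\n")  # block everything by default
    whitelist = load_patterns(r"https://intranet\.example\.com/.*")
    for url in ("https://intranet.example.com/logo.png",
                "https://other.example.org/logo.png"):
        print(url, "->", is_url_allowed(url, blacklist, whitelist))
```

In this example setup, the blacklist blocks every URL and the whitelist re-enables a single, made-up host. Since case is not folded, a pattern has to use a construct such as (?i) if upper and lower case spellings should both match.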
- com.openexchange.documentconverter.urlLinkLimit=200
The external URL link limit specifies the maximum number of
valid external internet URLs (after blacklist and whitelist filtering)
that the engine tries to resolve when loading a document.
When this limit is reached, no more external internet URLs are resolved
for the current document.
Important: Please note that one externally linked object within the document does not automatically correspond to one external URL call. In general, at least two URL calls are necessary to display one externally linked object. The additional calls are in most cases caused by a format detection that happens prior to resolving the object data itself.
Set to -1 for no upper limit or to 0 to disable the resolving of internet URLs completely.
Default value: 200
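The three cases of urlLinkLimit (-1, 0 and a positive limit) can be summarized in a small sketch. The function name and the idea of a per-document counter are assumptions made for illustration only.

```python
def may_resolve_next_url(resolved_so_far, url_link_limit):
    """Decide whether one more external URL may be resolved for a document.

    Hypothetical helper: `resolved_so_far` counts the URL calls already
    made for the current document.
    """
    if url_link_limit < 0:
        # -1: no upper limit
        return True
    # 0 disables resolving completely; otherwise stop once the limit is reached.
    return resolved_so_far < url_link_limit
```

With the default of 200 and at least two URL calls per externally linked object, at most 100 such objects can be resolved per document.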
- com.openexchange.documentconverter.urlLinkProxy =
The external URL link proxy entry specifies a proxy server that is used by the readerengine
to resolve external links contained within a document. Such links are e.g. external http://
graphic links that are resolved during the filtering process of a readerengine
instance.
Set this entry to the address of the proxy server: host:port
Recognized protocols are http://, https:// and ftp://
Leave empty if no proxy server should be used by the readerengine.
Default value: n/a
- com.openexchange.documentconverter.RemoteBaseUrl =
Use a remote document conversion webservice to do the actual conversion;
Set this entry to the base URL of the remote host http://host[:port]/documentconverterPath;
leave empty if conversion should happen on the local machine
Default value: n/a
From 7.8.2 on: The com.openexchange.documentconverter.RemoteBaseUrl entry is no longer valid in the documentconverter.properties file. The corresponding documentconverter server needs to be set on the Ox backend node where the documentconverter-client package has been installed. The name of the new entry is com.openexchange.documentconverter.client.remoteDocumentConverterUrl. The entry itself is located within the documentconverter-client.properties configuration file.
- com.openexchange.documentconverter.RemoteCacheUrls =
Use one or more remote converter caches to speed up the conversion. The first entry, if set, is treated as the remote master cache, receiving cache updates from the local cache. Additional entries are treated as remote slave caches for read purposes only.
Set the (whitespace separated) entries to the base URL(s) of the appropriate remote host(s): http://host[:port]/documentconverterCachePath
Leave empty if only the local filesystem cache should be used
Default value: n/a
- com.openexchange.documentconverter.RemoteSharePointUrl =
Use a remote SharePoint service to do MSO to PDF conversions.
Set this entry to the URL of the SharePoint host: http://host[:port]/_vti_bin/oxconvert.svc/mex?wsdl
If left empty, the corresponding conversion job always returns false.
Default value: n/a
- com.openexchange.documentconverter.RemoteSharePointUsername =
The login user name to be used for calls to the SharePoint service
Default value: n/a
- com.openexchange.documentconverter.RemoteSharePointPassword =
The password to be used for calls to the SharePoint service
Default value: n/a
- com.openexchange.documentconverter.jobProcessorCount=3
This item determines the number of engines working in parallel for job execution. The value needs to be greater than or equal to 1. Best performance is achieved with a value of about (n-1), where n is the number of available CPU cores of the machine the service is running on.
Default value: 3
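The (n-1) rule of thumb can be turned into a starting value for a given machine. This is only a sketch of that rule, with a hypothetical function name; the result is a suggestion and not a value the converter computes itself.

```python
import os


def suggested_job_processor_count(cpu_cores=None):
    """Return (n-1) for n CPU cores, but never less than 1.

    If `cpu_cores` is not given, the core count of the current machine
    is used; os.cpu_count() may return None, which is treated as 1.
    """
    n = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, n - 1)


if __name__ == "__main__":
    print("com.openexchange.documentconverter.jobProcessorCount="
          + str(suggested_job_processor_count()))
```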
- com.openexchange.documentconverter.jobRestartCount=50
This item determines the maximum number of executed jobs after which a single engine is automatically restarted in order to avoid memory fragmentation and possible memory leaks within one libreaderengine instance.
Default value: 50
- com.openexchange.documentconverter.jobExecutionTimeoutMilliseconds=60000
This item determines the timeout in milliseconds, after which the execution of a single job is terminated.
Default value: 60000
- com.openexchange.documentconverter.maxVMemMB=2048
This item determines the maximum size in megabytes (MB) of virtual memory that each started readerengine process is allowed to consume. If a job tries to consume more VMem than set via this config item, the processing of the current job for the appropriate readerengine process will be aborted and the underlying process is restarted to avoid memory corruption.
Set this value to -1 for no upper limit.
Default value: 2048
- com.openexchange.documentconverter.maxCacheSizeMB=-1
This item determines the maximum size in megabytes (MB) of all persistently cached converter job entries at runtime. A larger value may drastically reduce the time for conversion jobs, e.g. in case of a repeated creation of document previews.
Set this value to -1 for no upper limit.
Default value: -1
- com.openexchange.documentconverter.maxCacheEntries=-1
This item determines the maximum number of converter jobs cached at runtime. The value affects the amount of runtime job information to be cached as well as the number of file entries within the cache directory.
Set this value to -1 for no upper limit.
Default value: -1
- com.openexchange.documentconverter.cacheEntryTimeoutSeconds=2592000
This item determines the timeout in seconds, after which a cached job result is automatically removed from the cache.
Set this value to 0 to disable the timeout based removal of cached job results.
Default value: 2592000
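The three cache limits (maxCacheSizeMB, maxCacheEntries and cacheEntryTimeoutSeconds) can be pictured together in the following sketch. The default timeout of 2592000 seconds corresponds to 30 days. The order in which the converter removes entries is not documented here, so the oldest-first order below is purely an assumption, as are all names.

```python
from typing import NamedTuple


class CacheEntry(NamedTuple):
    key: str
    size_bytes: int
    created_at: float  # seconds, same clock as `now`


def select_evictions(entries, now, max_cache_size_mb=-1,
                     max_cache_entries=-1, entry_timeout_seconds=2592000):
    """Return the keys of the entries that violate the configured limits.

    Assumptions: -1 means "no limit" for size and count, 0 disables the
    timeout, and the oldest entries are removed first.
    """
    evict, keep = [], []
    for entry in sorted(entries, key=lambda e: e.created_at):
        expired = (entry_timeout_seconds > 0
                   and now - entry.created_at > entry_timeout_seconds)
        (evict if expired else keep).append(entry)

    total = sum(e.size_bytes for e in keep)
    limit_bytes = max_cache_size_mb * 1024 * 1024
    while keep and (
        (max_cache_entries >= 0 and len(keep) > max_cache_entries)
        or (max_cache_size_mb >= 0 and total > limit_bytes)
    ):
        oldest = keep.pop(0)  # `keep` is sorted oldest first
        total -= oldest.size_bytes
        evict.append(oldest)
    return [e.key for e in evict]
```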
- com.openexchange.documentconverter.enableCacheLookup=false
Setting this flag to true enables the caller of the RemoteInternalPreviewService#getCachedPreviewFor implementation (OfficePreviewService) to retrieve only the cached result of a previous conversion call. If no cache entry exists, no new job is scheduled, since such a job might run for a long period of time, up to the given job timeout.
Set to false to disable the cache lookup within the RemoteInternalPreviewService#getCachedPreviewFor implementation.
Default value: false
- com.openexchange.documentconverter.errorCacheTimeoutSeconds=600
This value determines how long an error, associated with a job hash value, is held within the error cache. As long as the timeout has not been reached, additional RemoteInternalPreviewService#getPreviewFor calls with the same job hash return instantly with the cached error code instead of processing the job again.
Set to 0 to disable the error cache handling.
Default value: 600
- com.openexchange.documentconverter.errorCacheMaxCycleCount=5
This value determines the maximum number of cycles for which a job, associated with a job hash value, is added to the error cache.
One cycle starts after adding a job to the error cache and ends after the errorCacheTimeoutSeconds timeout has been reached.
After reaching the given maximum cycle count, the job is not removed from the error cache anymore and will be held within the error cache for the rest of the runtime of the current backend instance.
Since the error cache is not persistent, the cycle counter for each job hash is reset after a restart of the backend instance.
Set to 0 to disable the error cache handling.
Default value: 5
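How errorCacheTimeoutSeconds and errorCacheMaxCycleCount might interact is sketched below. This is one possible reading of the description above and not the converter's code; the class, its methods and the injectable clock are hypothetical.

```python
import time


class ErrorCache:
    """Non-persistent error cache keyed by job hash (illustrative only)."""

    def __init__(self, timeout_seconds, max_cycle_count, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.max_cycles = max_cycle_count
        self.clock = clock
        self._entries = {}  # job_hash -> (error_code, added_at, cycles)

    def record_error(self, job_hash, error_code):
        # Assumption: a value of 0 for either setting disables the error cache.
        if self.timeout <= 0 or self.max_cycles <= 0:
            return
        previous = self._entries.get(job_hash)
        cycles = previous[2] + 1 if previous else 1
        self._entries[job_hash] = (error_code, self.clock(), cycles)

    def lookup(self, job_hash):
        """Return the cached error code, or None if the job should be processed."""
        entry = self._entries.get(job_hash)
        if entry is None:
            return None
        error_code, added_at, cycles = entry
        if cycles >= self.max_cycles:
            return error_code  # held for the rest of the runtime
        if self.clock() - added_at < self.timeout:
            return error_code  # current cycle has not ended yet
        return None  # cycle ended; the entry is kept only to count cycles
```

Passing the clock as a parameter keeps the sketch testable without waiting for real timeouts.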
- com.openexchange.documentconverter.servletLocalFileUrls=false
This item determines whether the documentconverter servlet is allowed to handle file URLs of the form file://... A file URL locates files that are locally accessible on the machine the documentconverter backend is running on.
Default value: false
- com.openexchange.capability.sharepointconversion=false
Capability to enable the usage of a SharePoint conversion server; capability is only
checked, if a valid SharePoint remote converter has been configured appropriately
Default value: false
Handling of temporary files
The DocumentConverter server needs to store files at runtime for different purposes at different volume locations:
- Persistent files (Cache) The files that should last longer than the runtime of one converter instance are stored at the configurable com.openexchange.documentconverter.cacheDir directory. As the name of the property implies, such files are result cache entries used by multiple converter instances. This directory is monitored at runtime and all files are managed by the converter. Constraints for this directory are set via the converter properties com.openexchange.documentconverter.minFreeVolumeSizeMB, com.openexchange.documentconverter.maxCacheSizeMB, com.openexchange.documentconverter.maxCacheEntries and com.openexchange.documentconverter.cacheEntryTimeoutSeconds.
- Medium lasting files These files are only valid for the runtime of one converter instance (e.g. ReaderEngine related runtime config files for each ReaderEngine instance). They are stored within the configurable com.openexchange.documentconverter.scratchDir directory. This directory is not constantly monitored at runtime, but all files contained in the ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp subdirectory are managed by the converter during the startup and shutdown phase of one converter server instance. The whole ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp directory gets cleaned up during converter server shutdown as well as during converter server startup. The initial cleanup during startup is necessary because the last converter instance might have aborted for unknown reasons, e.g. a power outage or a VM abort.
- Short lasting files These files are stored within the Java VM specific I/O temporary directory, whose location is configurable via the Java VM system property java.io.tmpdir. In most cases, this directory is used by the converter to temporarily store request attachments. The files stored within this directory have a lifetime equal to the duration of the request itself. When the request has finished, the appropriate files are cleaned up. For the converter, this means that e.g. source files to be converted and attached to the request are extracted from the request and stored on disk in order to prevent excessive memory consumption by source file buffers. When the conversion request is finished, the stored temporary file gets deleted.
From 7.10.2 on: The directory specified by the java.io.tmpdir Java system property is not used by the converter anymore. Instead, even short living temporary files are stored at the ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp location. Since this directory is managed by the converter, a server shutdown/start cleans up these files automatically. This change affects all files created by the converter implementation itself. Temporary files from other baseline bundles might still be stored within the configured java.io.tmpdir.
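The startup and shutdown handling of the oxdc.tmp directory can be sketched as follows. The function names are hypothetical, and recreating the empty directory at startup is an assumption; only the cleanup at both points in time is taken from the description above.

```python
import os
import shutil
import tempfile


def reset_tmp_dir(scratch_dir):
    """Startup: remove leftovers of an aborted instance, then start empty."""
    tmp_dir = os.path.join(scratch_dir, "oxdc.tmp")
    shutil.rmtree(tmp_dir, ignore_errors=True)
    os.makedirs(tmp_dir, exist_ok=True)  # assumption: directory is recreated
    return tmp_dir


def cleanup_tmp_dir(scratch_dir):
    """Shutdown: remove the whole oxdc.tmp directory."""
    shutil.rmtree(os.path.join(scratch_dir, "oxdc.tmp"), ignore_errors=True)


if __name__ == "__main__":
    # Demonstration inside a throwaway directory instead of the real scratchDir.
    with tempfile.TemporaryDirectory() as scratch:
        tmp = reset_tmp_dir(scratch)
        print("managed temp dir:", tmp)
        cleanup_tmp_dir(scratch)
```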