HTML export
The Toolbox web service with the transcribe operation allows the creation of an HTML export for a PDF document.
Particularly when mapping color gradients, patterns and shading, this can result in quality losses and minor inaccuracies in positioning. The translation of the various fonts is also generally error-prone, especially if a substitute font has to be used (e.g. for the 14 standard PDF fonts). The result should therefore be reviewed and checked for deviations whenever possible.
{
  "toolbox": [
    {
      "transcribe": {
        "errorReport": "file",
        "failureLevel": "error",
        "html": {
          "dpi": 72,
          "pages": "1"
        },
        "successReport": "none"
      }
    }
  ]
}
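The request body shown above can also be assembled programmatically. The following is a minimal sketch in Python that only builds and serializes the JSON structure. The helper name `build_transcribe_request` and its parameters are hypothetical and chosen for illustration; how the payload is submitted to the web service is deliberately left out.

```python
import json


def build_transcribe_request(pages="1", dpi=72):
    """Build a Toolbox request body for an HTML export.

    Hypothetical helper: it mirrors the JSON sample from this section
    and does not contact the web service.
    """
    return {
        "toolbox": [
            {
                "transcribe": {
                    "errorReport": "file",
                    "failureLevel": "error",
                    "html": {"dpi": dpi, "pages": pages},
                    "successReport": "none",
                }
            }
        ]
    }


if __name__ == "__main__":
    # Serialize the request body so it can be passed on to the service.
    print(json.dumps(build_transcribe_request(), indent=2))
```

Keeping the structure in one function makes it easier to vary only the "html" settings (pages and dpi) while the report settings stay unchanged.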
The transcribe operation can be used to convert one or more pages of the PDF into a coherent HTML document. Path drawing instructions are translated into corresponding SVG drawing paths, fonts and raster graphics are extracted if possible and embedded directly into the resulting document, and the frames and dimensions of the pages are mapped using appropriate containers.
Texts are rendered using the determined fonts whenever possible, so that textual content is selectable and preserved.
For the representation of more complex drawing operations such as color gradients, patterns and shading, raster graphics are always generated, which can lead to inaccuracies and quality losses.
The errorReport parameter can be used to create an XML report that provides more information about conversion problems.
- The transcribe operation currently does not support the export of fonts that are not Unicode-capable. Only fonts whose glyphs can be mapped to Unicode can be displayed correctly; text content without a Unicode mapping will be missing from the resulting document. This mainly concerns Type 1 and Type 3 PostScript fonts and some CFF fonts that do not have a corresponding mapping table.
- All specifications regarding the resolution of the result refer primarily to the raster graphics that are used or generated.
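To give a feel for what the resolution setting means for generated raster graphics, the sketch below converts a region measured in PDF points into pixel dimensions for a given dpi value. It assumes the default PDF user space unit of 1/72 inch. The function name `raster_size_px` is hypothetical, and the rounding the service actually applies may differ.

```python
def raster_size_px(width_pt, height_pt, dpi):
    """Estimate the pixel size of a rasterized region.

    Assumption: 1 point = 1/72 inch (default PDF user space unit).
    The rounding used here is illustrative only.
    """
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # A region of 595 x 842 points (roughly A4) at two resolutions.
    print(raster_size_px(595, 842, 72))
    print(raster_size_px(595, 842, 144))
```

At 72 dpi one point corresponds to one pixel, so doubling the dpi value doubles each pixel dimension and roughly quadruples the amount of image data.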