Trained on large, accurate datasets, our Auto Structuring tool is machine-learned and Harvard-, CMS-, APA-, and Cambridge-educated.
The Auto Structuring tool creates a tagged document from DOCX or TeX input. The paragraph-level KPI is in the high 90s. Auto Structuring is repeated throughout production, before and after copy editing. It works alongside production tools that check for Completeness and Usability. So a mark down document becomes richly tagged, down to character level. Our Auto Conversion tools then transform this tagset to a standard XML DTD or Schema.
Auto Structuring is not just another production tool that delivers a smart script. Its IP core can be repurposed in strategically important ways to transform STM Publishing. When deployed at Submission, for instance, the tool automates Pre Screening. The manuscript comes in clean. There is early XML. Authors do not have to wait for queries that take weeks to come. And Auto Structuring runs in the cloud and in your browser. See also Products > Ingest Central.
In TNQ Production, the XML is created using MLiFlow. A customer once called it the Rolls-Royce of conversion tools. From format-rich tags in DOCX to NLM, it is a smooth ride. To build the content XML, MLiFlow filters the format-rich Microsoft tags, assigns tags from the TNQ Universal DTD (TUD), and then uses an XSLT mapping framework to uplift the flat TUD XML to the appropriate structured publisher XML, such as NLM JATS XML.
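To make the "uplift" step concrete, here is a minimal sketch of turning a flat run of tagged paragraphs into nested, JATS-like structure. It is written in Python with the standard library rather than XSLT, and it is not MLiFlow's implementation. The flat tag names (doc, title, head1, para) are hypothetical stand-ins for TUD tags, which are not described here; the output element names (article, front, article-meta, title-group, article-title, body, sec, title, p) are taken from JATS, but the result is not validated against the JATS DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input: every block is a sibling with a style-derived tag.
# These tag names are illustrative assumptions, not the real TUD tagset.
FLAT_XML = """<doc>
  <title>On Widgets</title>
  <head1>Introduction</head1>
  <para>Widgets are useful.</para>
  <head1>Methods</head1>
  <para>We counted widgets.</para>
  <para>Then we counted again.</para>
</doc>"""


def uplift(flat_root):
    """Nest a flat sequence of blocks into a JATS-like article tree."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    body = ET.SubElement(article, "body")

    # Paragraphs attach to the most recently opened section,
    # or directly to <body> if no heading has been seen yet.
    container = body
    for node in flat_root:
        if node.tag == "title":
            ET.SubElement(title_group, "article-title").text = node.text
        elif node.tag == "head1":
            container = ET.SubElement(body, "sec")
            ET.SubElement(container, "title").text = node.text
        elif node.tag == "para":
            ET.SubElement(container, "p").text = node.text
    return article


article = uplift(ET.fromstring(FLAT_XML))
print(ET.tostring(article, encoding="unicode"))
```

The point of the sketch is the shape of the transformation: a heading opens a new container and the paragraphs that follow are moved inside it. An XSLT mapping would express the same grouping declaratively.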
The XML from MLiFlow carries the necessary instructions for auto-pagination in an XML-aware typesetting system such as Arbortext Advanced Print Publisher (3B2) or InDesign. Packaged as a dataset, the XML converts smoothly downstream to HTML. There is greater potential for marking up this XML for a richer reading environment. The scripts currently run by most host platforms do not do enough justice to the XML and its potential.
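As an illustration of the downstream XML-to-HTML step, the following is a minimal sketch that renders a small JATS-like article as a single-column HTML5 page. It is an assumption-laden example, not the script used by TNQ or by any host platform: the sample input and the function name to_html are hypothetical, and only titles, sections, and paragraphs are handled.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified JATS-like input.
JATS_LIKE = (
    "<article><front><article-meta><title-group>"
    "<article-title>On Widgets</article-title>"
    "</title-group></article-meta></front>"
    "<body><sec><title>Introduction</title>"
    "<p>Widgets are &lt;useful&gt;.</p></sec></body></article>"
)


def to_html(article):
    """Render article title, sections, and paragraphs as an HTML5 string."""
    title = escape(article.findtext(".//article-title", default=""))
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{title}</title>",
        "</head>",
        "<body>",
        "<article>",
        f"<h1>{title}</h1>",
    ]
    for sec in article.findall("./body/sec"):
        heading = escape(sec.findtext("title", default=""))
        parts.append("<section>")
        parts.append(f"<h2>{heading}</h2>")
        for p in sec.findall("p"):
            # itertext() flattens any inline markup to plain text;
            # escape() keeps characters such as < and > safe in HTML.
            text = escape("".join(p.itertext()))
            parts.append(f"<p>{text}</p>")
        parts.append("</section>")
    parts += ["</article>", "</body>", "</html>"]
    return "\n".join(parts)


html_out = to_html(ET.fromstring(JATS_LIKE))
print(html_out)
```

Flattening inline markup to plain text is the kind of shortcut the paragraph above warns about: a richer conversion would map inline elements such as emphasis, citations, and formulae to their HTML5 counterparts instead of discarding them.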
It should be fun to convert to HTML5, the best HTML version ever. But we are still converting to PDF, ePub, gadgety HTML, Apps and multiple content standards that increase production overheads. There is a case for decommoditising conversion and committing to the browser and HTML5 as the production delivery platform.
Single-column HTML output of a full-length research article from Page Central, which can be downloaded as a PDF.