Project Page: Polyglot
Repository: https://opensource.ncsa.illinois.edu/bitbucket/projects/POL/repos/polyglot
Docker Hub: https://hub.docker.com/u/ncsapolyglot
Downloads: Artifacts
Documentation: Manual | PDF | Javadoc
Bug Reporting: Jira
License: NCSA Open Source
Status: Active
Using a tool called a Software Server to program against functionality within arbitrary 3rd party software, Polyglot is a distributed service that carries out file format conversions using the open, save, import, and export capabilities of a dynamic and extensible collection of available software. Polyglot addresses the need to access content stored in the many formats available for digital data. It also addresses the information loss that inevitably occurs during conversion, providing a means of quantifying that loss and then minimizing it in future conversions.
Motivated by a need to identify 3D file formats best suited for long term digital preservation, and by the large number of available formats, NCSA Polyglot (n. one who speaks many languages) was created as a means of providing an extensible, scalable, and quantifiable conversion service. In our work we define a file format as well suited for long term preservation if it is open, widely supported, and incurs little information loss when files in many other formats are converted to it (an essential requirement given the existing collections of files across many formats). With suitable measures we can estimate information loss by comparing files before and after conversions. In order to actually identify an ideal format for long term preservation, however, we need to be able to evaluate conversions between potentially all available formats. In other words, we require a “universal” converter.
Building a “universal” converter by directly supporting each of the available formats requires either implementing a file loader for each format (to extract its content independent of the file type) or implementing transcoders which directly convert between pairs of formats. Both tasks are arduous, if not impossible, given the number of formats available. They are made even more difficult by the fact that many formats are proprietary with closed specifications. In order to build a practical “universal” converter we take a different approach. Vendors of proprietary formats support their format within their own software, and these applications generally also support importing from and exporting to some subset of other file formats. We utilize this built-in support across many software packages to build a conversion service. At the heart of Polyglot are software servers, which allow arbitrary software to be placed in the “cloud” under a uniform API.
The Polyglot service, focused on conversions, utilizes the “open”, “save”, “import”, and “export” operations provided by a collection of distributed software servers. From these operations an input/output graph is constructed, with formats at its vertices and, as its edges, conversions between an input and an output format using a particular piece of software. To perform a conversion between a given input and output format we search this graph for a shortest path between the formats, identifying the applications capable of performing the conversion, and then call the corresponding software server operations to carry it out.
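The graph search described above can be illustrated with a minimal sketch. This is not Polyglot's actual implementation: the application names (AppA, AppB, AppC), the sample conversions, and the function names are hypothetical, and it assumes every conversion counts equally, so "shortest" here simply means the fewest conversion steps, found with a breadth-first search.

```python
from collections import defaultdict, deque


def build_io_graph(conversions):
    """Build an input/output graph from (application, input, output) triples.

    Vertices are formats; each edge records the application that converts
    the input format into the output format.
    """
    graph = defaultdict(list)
    for app, src, dst in conversions:
        graph[src].append((dst, app))
    return graph


def shortest_conversion_path(graph, source, target):
    """Return the fewest-step conversion chain as (application, input, output)
    tuples, an empty list if no conversion is needed, or None if unreachable."""
    if source == target:
        return []
    previous = {source: None}  # format -> (preceding format, application used)
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for nxt, app in graph.get(fmt, ()):
            if nxt in previous:
                continue  # already reached by a path at least as short
            previous[nxt] = (fmt, app)
            if nxt == target:
                # Walk back from the target to recover the chain of steps.
                steps = []
                node = target
                while previous[node] is not None:
                    prev_fmt, prev_app = previous[node]
                    steps.append((prev_app, prev_fmt, node))
                    node = prev_fmt
                steps.reverse()
                return steps
            queue.append(nxt)
    return None


# Hypothetical applications and the conversions each one supports.
EXAMPLE_CONVERSIONS = [
    ("AppA", "stp", "obj"),
    ("AppB", "obj", "ply"),
    ("AppB", "obj", "x3d"),
    ("AppC", "ply", "x3d"),
]

example_graph = build_io_graph(EXAMPLE_CONVERSIONS)
example_path = shortest_conversion_path(example_graph, "stp", "x3d")
for app, src, dst in example_path:
    print(f"{app}: {src} -> {dst}")
```

Each step in the returned chain names the application to invoke and the formats it converts between. If edges carried weights, for example a measured information loss per conversion, a weighted search such as Dijkstra's algorithm would take the place of the breadth-first search.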
Source code is available from our git repository at https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git and can be checked out as follows:
git clone https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git
If you wish to contribute code to the project please contact kmchenry@ncsa.illinois.edu
Videos
An overview of the prototype Polyglot 3rd party software extensible conversion system. Video: Overview
Using GUI driven 3rd party software open/save capabilities to create an extensible conversion service. Videos: Converting Files, Previewing Files, Listening for Software, Monitoring Software