Project Page: Polyglot
Repository: https://opensource.ncsa.illinois.edu/bitbucket/projects/POL/repos/polyglot
Docker Hub: https://hub.docker.com/u/ncsapolyglot
Downloads: Artifacts
Documentation: Manual | PDF | Javadoc
Bug Reporting: Jira
License: NCSA Open Source
Status: Active

Polyglot is a distributed service that carries out file format conversions using the open, save, import, and export capabilities of a dynamic and extensible collection of available software. It relies on a tool called a Software Server to program against functionality within arbitrary 3rd party software. Polyglot addresses the need to access content stored in the many formats available for digital data. It also addresses the information loss that inevitably occurs through conversion, providing a means of quantifying that loss and then minimizing it in future conversions.

Motivated by a need to identify 3D file formats best suited for long term digital preservation and the large number of available formats, NCSA Polyglot (n. one who speaks many languages) was created as a means of providing an extensible, scalable, and quantifiable conversion service. In our work we define a file format as well suited for long term preservation if it is open, widely supported, and incurs little information loss when files in many other formats are converted to it (an essential requirement given the already existing collections of files across many formats). With suitable measures we can estimate information loss by comparing files before and after conversions. In order to actually identify an ideal format for long term preservation, however, we need to be able to evaluate conversions between potentially all available formats. In other words, we require a “universal” converter.

Building a “universal” converter by directly supporting each of the available formats requires either implementing a file loader for each format (to extract its content independent of the file type) or implementing transcoders which directly convert between pairs of formats. Both tasks are arduous if not impossible given the number of formats available. This is made even more difficult by the fact that many formats are proprietary with closed specifications. In order to build a practical “universal” converter we take a different approach. Vendors of proprietary formats support their format within their own software. It is also generally the case that these software applications support importing from and exporting to some subset of other file formats. We utilize this built-in support across many software packages to build a conversion service. At the heart of Polyglot are software servers which allow arbitrary software to be placed in the “cloud” under a uniform API.
The Polyglot service, focused on conversions, utilizes the “open”, “save”, “import”, and “export” operations provided by a collection of distributed software servers. From these operations an input/output graph is constructed, with formats at its vertices and, as its edges, conversions between an input and an output format using a particular piece of software. In order to perform a conversion between a given input and output format we search this graph for a shortest path between the formats, identifying applications capable of performing the conversion and then calling the corresponding software server operations to carry it out.
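
The graph search described above can be illustrated with a minimal sketch. It is not Polyglot's actual implementation: the application names (AppA, AppB, AppC), the list of conversions, and the function names are hypothetical, and "shortest" is assumed here to mean the fewest conversion steps, found with a breadth-first search.

```python
from collections import deque

def build_io_graph(conversions):
    """Map each input format to a list of (output_format, application) edges."""
    graph = {}
    for app, src, dst in conversions:
        graph.setdefault(src, []).append((dst, app))
    return graph

def shortest_conversion_path(graph, source, target):
    """Breadth-first search over formats.

    Returns a list of (application, input_format, output_format) steps,
    an empty list if source == target, or None if no path exists.
    """
    if source == target:
        return []
    parent = {source: None}  # format -> (previous format, application used)
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for nxt, app in graph.get(fmt, []):
            if nxt in parent:
                continue  # already reached by a path at least as short
            parent[nxt] = (fmt, app)
            if nxt == target:
                # Walk back to the source to recover the steps.
                steps = []
                while parent[nxt] is not None:
                    prev, via = parent[nxt]
                    steps.append((via, prev, nxt))
                    nxt = prev
                return steps[::-1]
            queue.append(nxt)
    return None

# Hypothetical applications and the conversions each one supports.
CONVERSIONS = [
    ("AppA", "stp", "obj"),
    ("AppA", "stp", "stl"),
    ("AppB", "obj", "lwo"),
    ("AppC", "stl", "ply"),
    ("AppC", "ply", "lwo"),
]

graph = build_io_graph(CONVERSIONS)
path = shortest_conversion_path(graph, "stp", "lwo")
for app, src, dst in path:
    print(f"{app}: {src} -> {dst}")
```

Each step in the returned path names the application whose software server would be called for that leg of the conversion. A different notion of "shortest", for example one weighted by measured information loss, would call for a weighted search such as Dijkstra's algorithm instead.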

Source code is available from our git repository at https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git and can be checked out as follows:

git clone https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git

If you wish to contribute code to the project please contact kmchenry@ncsa.illinois.edu

An I/O-Graph, with vertices representing file formats and edges representing conversions between a source and a target format. The highlighted edges indicate a conversion path between the *.stp and *.lwo file formats given the 3rd party applications represented within the graph.
The web interface to a Polyglot server. Users can drag and drop files to the top area, select the output format in the list, click “upload”, and download the resulting files when they are available.
Given a “universal converter”, one can consider things like a “universal viewer”. In this web interface to a Polyglot server a user can again drag and drop files to the top area. This time, however, there is no selection of the output format: it is hard coded to one that can be displayed in the browser. When the user presses the “upload” button the files are converted and displayed in the area below. In the example shown, the files are 3D files and are displayed using our included applet, which displays files of the type *.obj. The interactive applet allows users to manipulate the resulting 3D objects.


Overview: An overview of the prototype Polyglot 3rd party software extensible conversion system.
Converting Files: Using GUI driven 3rd party software open/save capabilities to create an extensible conversion service.
Previewing Files
Listening for Software
Monitoring Software