Doc2Learn

Repository: https://opensource.ncsa.illinois.edu/stash/scm/vs/doc2learn.git
Documentation: Web | PDF
Bug Reporting: Jira
License: NCSA Open Source
Status: Supported

Doc2Learn supports side-by-side visual comparison of documents so that their contents can be explored quickly in terms of word frequency (text, integers, and floating-point numbers), image color histograms (the frequency of colors in an image), and the frequency of encoded vector graphics. It can also group documents automatically based on the similarity of all components in the documents. The pairwise similarities can be displayed and explored interactively to investigate different grouping parameters. Once grouping parameters are selected, the software allows documents to be moved manually between groups and re-ordered within a group. By default, the order of documents within a group is established automatically from file timestamps. Finally, several attributes of the documents in each group are extracted and displayed in temporal order. These attributes are evaluated against a few rules to perform integrity checks on the set of related documents in a group.
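
The word-frequency comparison, pairwise similarity, and timestamp ordering described above can be illustrated with a minimal sketch. This is not Doc2Learn's implementation or API: the function names, the choice of cosine similarity, the greedy grouping rule, and the `threshold` parameter are all assumptions made for illustration, and only the text component is covered (no image histograms or vector graphics).

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens and integer/floating-point tokens."""
    return Counter(re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two frequency tables (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_documents(docs, threshold):
    """Group documents whose similarity reaches the threshold.

    docs maps a document name to a (timestamp, text) pair.
    Hypothetical rule: a document joins (and merges) every existing
    group that contains at least one sufficiently similar member.
    """
    freqs = {name: word_frequencies(text) for name, (_, text) in docs.items()}
    groups = []
    for name in docs:
        matches = [
            g for g in groups
            if any(cosine_similarity(freqs[name], freqs[m]) >= threshold for m in g)
        ]
        merged = [name]
        for g in matches:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    # Order each group by timestamp, oldest first.
    return [sorted(g, key=lambda n: docs[n][0]) for g in groups]


docs = {
    "a": (3, "the cat sat on the mat"),
    "b": (1, "the cat sat on a mat"),
    "c": (2, "stock prices rose 3.5 percent"),
}
groups = group_documents(docs, threshold=0.5)
```

Raising or lowering `threshold` changes which documents end up grouped together, which mirrors the kind of parameter exploration the description mentions; the manual re-shuffling step would amount to editing the returned lists.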