2017 Newsletter

SSA Highlights for 2017

The NCSA Scientific Software and Applications (SSA) Division aims to support, build upon, and advance software development efforts across supported research projects, and to use extensive technology and domain science expertise to support external researchers in their use of NCSA cyberinfrastructure resources. In 2017, SSA played a key role in leading and contributing to software, applications support, and training that applies to both. This work is being done under 12 newly funded projects this year, as well as 33 continuing projects.

In software, the Innovative Software and Data Analysis (ISDA) group has been developing a number of software tools and working with users to apply them to research areas. Four software systems we want to highlight are Clowder, the NDS Labs Workbench, Brown Dog, and IN-CORE.

Clowder is an open source framework that provides a “smart” drop-box-like web resource, allowing researchers not only to share research data with their communities, but also to control, curate, analyze, and publish data in a customizable manner specific to their communities’ needs. Initially developed with funding from NSF and the National Archives, Clowder is now being leveraged by many campus efforts to quickly support data sharing and analysis needs, for example in materials science data, the humanities, and industry/agricultural activities. In the past year, Clowder has garnered increased interest among several new communities (both nationally and internationally within Europe), been written into numerous proposals, and been recognized in two best papers:

  • A. Langmead, P. Rodriguez, S. Puthanveetil Satheesan, and A. Craig, “Extracting Meaningful Data from Decomposing Bodies,” Practice & Experience in Advanced Research Computing (PEARC), 2017, Accelerating Discovery in Scholarly Research Best Paper
  • P. Nguyen, S. Konstanty, T. Nicholson, T. O’Brien, A. Schwartz-Duval, T. Spila, K. Nahrstedt, R. Campbell, I. Gupta, M. Chan, K. McHenry, N. Paquin, “4CeeD: Real-time Acquisition and Analysis Framework for Materials-related Cyber-Physical Environments,” The 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2017, Best Paper.

The NDS Labs Workbench, developed at NCSA to fill a need identified by the National Data Service Consortium for finding, evaluating, and interconnecting data management and analysis tools, has increased the exposure and accessibility of these tools and offers the potential to launch them next to large (terabyte- to petabyte-range) datasets. The Workbench, which is an “App Store” for scientific data management/analysis tools, was leveraged in 2017 to drive campus-connected workshops such as ThinkChicago, Phenome 2017, the PI4 Bootcamp, and the 2017 North American Einstein Toolkit School and Workshop, as well as other national hackathons/communities such as EarthCube and the Research Data Alliance.

Brown Dog had its beta release in July, bringing together a number of NCSA technologies such as Clowder, Polyglot, DataWolf, Versus, and PEcAn to stand up an extensible and distributed Data Transformation Service (DTS) that supports data access and manipulation across scientific research activities.

IN-CORE (Interdependent Networked Community Resilience Modeling Environment) Version 1 beta was released in September, leveraging NCSA’s ERGO framework and expanding to support analyzing the effects of hazards beyond earthquakes, such as tornadoes, wildfires, and tsunamis.

In applications support, Michelle Gower from the Advanced Application Support (AAS) Group and the Dark Energy Survey Data Management (DESDM) played a key role in the LIGO discovery of gravitational waves from a neutron star merger.

The Science and Engineering Application Support (SEAS) group supports many research teams on Blue Waters, and won an Editor’s Choice Award from HPCWire for the Best Use of HPC in Physical Sciences for support of the ArcticDEM project, which also used the Swift workflow system, partially developed at NCSA.

SEAS members also participated in moving several major science project pipelines, such as LIGO, ATLAS, and LSST, to the Cray ecosystem of Blue Waters. They also determined, documented, and presented and/or published papers on best practices for provisioning productivity software (Python) and environments (Shifter) on supercomputing systems with older software.

SSA and AAS lead the cyberinfrastructure portion of the new nanomanufacturing (nanoMFG) node of the Network for Computational Nanotechnology (NCN), supporting tool developers and tool users in this field.

The SEAS group prepared the Sustained Petascale Performance (SPP) application benchmark suite that was used in part for NSF’s “Towards a Leadership-Class Computing Facility” solicitation, and played a key role in NCSA’s overall response to this solicitation.

In training, the AAS group ramped up Research IT training on campus as part of the Deputy CIO for Research IT Office. Externally, the group worked as part of the XSEDE project to organize the International HPC Summer School (IHPCSS2017), including acquiring funding for the school, developing Moodle support, and providing access to computing allocations. The group also participated in organizing XSEDE Broadening Participation outreach/training workshops, including evolving new user training material and access to resources with the additional complications of two-factor authentication. They also updated XSEDE training course content, maintained CI-Tutor and the badging Moodle, and led development of a training review process.

Finally, SSA works on promoting the concept of scientific software as a key part of research, including how such software is made more sustainable and how contributors to such software receive credit visible in academia for their work. For example, Dan Katz presented on these topics in 2017 with the keynote at the International Workshop on Science Gateways – Australia: “Software Citation: A solution with a problem”, and an invited plenary talk at eResearch Australasia: “Software in Research: Underappreciated and Underrewarded,” and Kenton McHenry presented “Enabling Open Science without Impeding Open Science” at Toward an Open Science Enterprise, at the National Academies.

Overall, the SSA Division participated in outreach and dissemination by presenting 15 invited talks; publishing 9 journal papers and 13 conference and workshop papers; presenting 6 additional conference talks and 7 conference posters; writing 5 technical reports; organizing or co-organizing 14 conferences and workshops; editing 1 journal special issue; writing 4 organizational blog posts; presenting 7 tutorials and short courses; presenting 2 webinars; attending 24 other conferences and workshops, 8 other collaborative meetings, and 2 advisory committee meetings; and finally, our work was discussed in 4 general articles.

New SSA projects in 2017:

  1. CDC Vector Borne Diseases: Upper Midwestern Center of Excellence in Vector Borne Diseases
  2. KICT/KISTI Risk Assessment of High-rise Mixed-use Buildings
  3. NCSA Faculty Fellowship – Computational Infrastructure for Collaborative Design of Semiconductor Nanocrystals
  4. NCSA Faculty Fellowship – Materials Modeling Optimization
  5. NCSA Faculty Fellowship – Modeling the Massive HathiTrust Corpus: Creating Concept-based Representations of 15 Million Volumes
  6. NIH: Towards a FAIR Digital Ecosystem in the Cloud
  7. NSF: Network for Computational Nanotechnology – Hierarchical nanoMFG Node
  8. NSF: SI2-S2I2 Conceptualization: Geospatial Software Institute
  9. NSF: SI2-S2I2 Conceptualization: Conceptualizing a US Research Software Sustainability Institute (URSSI)
  10. UIUC ACES Cover Crop
  11. UIUC ACES Farm Doc
  12. Industry Partnerships

Invited talks:

Journal papers:

  • S. Jha, D. S. Katz, A. Luckow, N. Chue Hong, O. Rana, Y. Simmhan, “Introducing Distributed Dynamic Data-Intensive (D3) Science: Understanding Applications and Infrastructure”, Concurrency Computat: Pract Exper, 2017.
    https://doi.org/10.1002/cpe.4032
  • P. Jiang, M. Elag, P. Kumar, S. D. Peckham, L. Marini, R. Liu, “A service-oriented architecture for coupling web service models using the Basic Model Interface (BMI)”, Environmental Modelling & Software, 2017.
    https://doi.org/10.1016/j.envsoft.2017.01.021
  • A. Marshall-Colon, S. P. Long, D. K. Allen, G. Allen, D. A. Beard, B. Benes, S. von Caemmerer, A. J. Christensen, D. J. Cox, J. C. Hart, P. M. Hirst, K. Kannan, D. S. Katz, J. P. Lynch, A. J. Millar, B. Panneerselvam, N. D. Price, P. Prusinkiewicz, D. Raila, R. G. Shekar, S. Shrivastava, D. Shukla, V. Srinivasan, M. Stitt, M. J. Turk, E. O. Voit, Y. Wang, X. Yin, X.-G. Zhu, “Crops In Silico: Generating Virtual Crops Using an Integrative and Multi-scale Modeling Platform,” Frontiers in Plant Science, v.8, page 786, 2017. https://doi.org/10.3389/fpls.2017.00786
  • M. Elag, P. Kumar, L. Marini, J. Myers, M. Hedstrom, and B. Plale, “Identification and characterization of information-networks in long-tail data collections,” Environmental Modelling & Software, 2017. https://doi.org/10.1016/j.envsoft.2017.03.032
  • C. Sophocleous, L. Marini, R. Georgiou, M. Elfarargy, K. McHenry, “Medici 2: A Scalable Content Management System for Cultural Heritage Datasets”, Code4Lib Journal, 2017. http://journal.code4lib.org/articles/12317
  • J. Kwack, G. H. Bauer, “HPCG and HPGMG benchmark tests on multiple program, multiple data (MPMD) mode on Blue Waters—A Cray XE6/XK7 hybrid system,” Concurrency Computat: Pract Exper. 2017;e4298. https://doi.org/10.1002/cpe.4298
  • A. Allen, C. Aragon, C. Becker, J. Carver, A. Chiş, B. Combemale, M. Croucher, K. Crowston, D. Garijo, A. Gehani, C. Goble, R. Haines, R. Hirschfeld, J. Howison, K. Huff, C. Jay, D. S. Katz, C. Kirchner, K. Kuksenok, R. Lämmel, O. Nierstrasz, M. Turk, R. van Nieuwpoort, M. Vaughn, J. J. Vinju, “Engineering Academic Software (Dagstuhl Perspectives Workshop 16252),” Dagstuhl Manifestos, v.6(1), 2017.
    https://doi.org/10.4230/DagMan.6.1.1
  • R. C. Jiménez, M. Kuzak, M. Alhamdoosh, M. Barker, B. Batut, M. Borg, S. Capella-Gutierrez, N. Chue Hong, M. Cook, M. Corpas, M. Flannery, L. Garcia, J. L. Gelpí, S. Gladman, C. Goble, M. González Ferreiro, A. Gonzalez-Beltran, P. C. Griffin, B. Grüning, J. Hagberg, P. Holub, R. Hooft, J. Ison, D. S. Katz, B. Leskošek, F. López Gómez, L. J. Oliveira, D. Mellor, R. Mosbergen, N. Mulder, Y. Perez-Riverol, R. Pergl, H. Pichler, B. Pope, F. Sanz, M. V. Schneider, V. Stodden, R. Suchecki, R. Svobodová Vařeková, H.-A. Talvik, I. Todorov, A. Treloar, S. Tyagi, M. van Gompel, D. Vaughan, A. Via, X. Wang, N. S. Watson-Haigh, S. Crouch, “Four simple recommendations to encourage best practices in research software,” F1000Research, v.6:876, 2017.
    https://doi.org/10.12688/f1000research.11407.1
  • J. P. Tennant, J. M. Dugan, D. Graziotin, D. C. Jacques, F. Waldner, D. Mietchen, Y. Elkhatib, L. B. Collister, C. K. Pikas, T. Crick, P. Masuzzo, A. Caravaggi, D. R. Berg, K. E. Niemeyer, T. Ross-Hellauer, S. Mannheimer, L. Rigling, D. S. Katz, B. Greshake, J. Pacheco-Mendoza, N. Fatima, M. Poblet, M. Isaakidis, D. E. Irawan, S. Renaut, C. R. Madan, L. Matthias, J. N. Kjær, D. P. O’Donnell, C. Neylon, S. Kearns, M. Selvaraju, and J. Colomb, “A multi-disciplinary perspective on emergent and future innovations in peer review” [version 3; referees: 2 approved with reservations], F1000Research, v.6:1151, 2017. https://doi.org/10.12688/f1000research.12037.3

Conference papers:

  • P. Nguyen, S. Konstanty, T. Nicholson, T. O’Brien, A. Schwartz-Duval, T. Spila, K. Nahrstedt, R. Campbell, I. Gupta, M. Chan, K. McHenry, and N. Paquin, “4CeeD: Real-time Acquisition and Analysis Framework for Materials-related Cyber-Physical Environments”, The 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2017. (Best Paper Award)
    https://doi.org/10.1109/CCGRID.2017.51
  • J. Kwack and G. Bauer, “HPCG and HPGMG benchmark tests in Multiple Program, Multiple Data (MPMD) mode on Blue Waters – a Cray XE6/XK7 hybrid system,” CUG, 2017. https://cug.org/proceedings/cug2017_proceedings/includes/files/pap118s2-file1.pdf
  • G. Bauer, V. Anisimov, G. Arnold, B. Bode, R. Brunner, T. Cortese, R. Haas, A. Kot, W. Kramer, J. Kwack, J. Li, C. Mendes, R. Mokos, C. Steffen, “Updating the SPP Benchmark Suite for Extreme-Scale Systems,” CUG, 2017.
  • C. MacLean, “Python Usage Metrics on Blue Waters,” CUG, 2017.
    https://cug.org/proceedings/cug2017_proceedings/includes/files/pap163s2-file1.pdf
  • P. Rodriguez, S. Puthanveetil Satheesan, J. Will, E. Wuerffel, and A. Craig, “Extracting, Assimilating, and Sharing the Results of Image Analysis on the FSA/OWI Photography Collection”, PEARC, 2017. https://doi.org/10.1145/3093338.3093365
  • A. Langmead, P. Rodriguez, S. Puthanveetil Satheesan, and A. Craig, “Extracting Meaningful Data from Decomposing Bodies”, PEARC, 2017. (Best Paper in the “Accelerating Discovery in Scholarly Research” Track)
    https://doi.org/10.1145/3093338.3093368
  • C. Willis, M. Lambert, K. McHenry, and C. Kirkpatrick, “Container-based Analysis Environments for Low-Barrier Access to Research Data,” PEARC, 2017. https://doi.org/10.1145/3093338.3104164
  • C. MacLean, H. Leong, J. Enos, “Improving the Start-Up Time of Python Applications on Large Scale HPC Systems,” HPCSYSPROS2017, SC17. https://doi.org/10.1145/3155105.3155107
  • U. Nangia and D. S. Katz, “Track 1 Paper: Surveying the U.S. National Postdoctoral Association Regarding Software Use and Training in Research,” WSSSPE5.1, 2017. https://doi.org/10.6084/m9.figshare.5328442
  • U. Nangia and D. S. Katz, “Understanding Software in Research: Initial Results from Examining Nature and a Call for Collaboration,” WSSSPE5.2, held with IEEE International Conference on eScience, 2017. https://doi.org/10.1109/eScience.2017.78
  • I. Gutierrez-Polo, Y. Zhao, S. Bradley, E. Roeder, M. Pitcel, K. TePas, P. Collingsworth, and L. Marini, “Monitoring Water Quality in the Great Lakes leveraging Geo-Temporal Cyberinfrastructure,” IEEE International Conference on eScience, 2017. https://doi.org/10.1109/eScience.2017.50
  • E. A. Huerta, R. Haas, E. Fajardo, D. S. Katz, S. Anderson, P. Couvares, J. Willis, T. Bouvet, J. Enos, W. T. C. Kramer, H. W. Leong, D. Wheeler, “BOSS-LDG: A Novel Computational Framework that Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery,” IEEE International Conference on eScience, 2017. https://doi.org/10.1109/eScience.2017.47
  • M. Turilli, Y. N. Babuji, A. Merzky, M. T. Ha, M. Wilde, D. S. Katz, S. Jha, “Evaluating Distributed Execution of Workloads,” IEEE International Conference on eScience, 2017. https://doi.org/10.1109/eScience.2017.41

Conference presentations:

  • L. J. Hwang, D. S. Katz, L. H. Kellogg, K. E. Niemeyer, FORCE11 Software Citation Working Group, “Software vs. Data: The FORCE11 Citation Principles,” Seismological Society of America Annual Meeting, 2017.
  • A. Langmead, P. Rodriguez, S. Puthanveetil Satheesan, and A. Craig, “Extracting Meaningful Data from Decomposing Bodies,” Workshop on Computer Vision in Digital Humanities, Digital Humanities, 2017.
  • K. Niemeyer, A. Smith, L. Barba, G. Githinji, M. Gymrek, K. Huff, D. Katz, C. Madan, A. Cabunoc Mayes, K. Moerman, P. Prins, K. Ram, A. Rokem, T. Teal, J. Vanderplas, “Introducing JOSS: The Journal of Open Source Software,” SciPy, 2017.
  • A. Gardella, B. Cowdery, M. De Kauwe, A. R. Desai, M. Duveneck, I. Fer, R. Fisher, R. Knox, R. Kooper, D. LeBauer, T. Mccabe, F. Minunno, A. Raiho, S. Serbin, A. N. Shiklomanov, A. Thomas, A. Walker, M. Dietze, “A multi-model assessment of terrestrial biosphere model data needs”, AGU, 2017.
  • S. Puthanveetil Satheesan, “Brown Dog: A Data Transformation Ecosystem for Research – Advancing from Beta to 1.0”, AGU, 2017.
  • D. Suh, J. S. Lee, S. Chai, S. Shin, C. Navarro, “Risk Assessment for Korean High-rise Mixed-use Buildings,” 16th European Conference on Earthquake Engineering, Thessaloniki, Greece. (Abstract accepted in 2017; the conference is in 2018.)

Conference posters:

  • N. Kenyon, M. Willman, D. Han, A. Rabassa, W. Diaz, R. Leeman, K. McHenry, D. Salomon, A. Bartholomew, N. Kenyon, D. Berman, “Timing of Mesenchymal Stem Cell Infusions Affects Rejection Free and Overall Islet Allograft Survival”, American Transplant Congress, 2017.
  • D. S. Katz, “Parsl: A Python-based Parallel Scripting Library,” Crops in silico (Cis) workshop, 2017.
  • U. Nangia and D. S. Katz, “Understand the Role of Software in Research,” iSchool Research Showcase, 2017.
  • D. S. Katz, “Parsl: A Python-based parallel scripting library”, iSchool Research Showcase, 2017.
  • T. Li, C. Steffen, R. Chui, R. Haas, L. S. Mainzer, “Benchmarking Parallelized File Aggregation Tools for Large Scale Data Management,” SC17.
  • S. Puthanveetil Satheesan, J. Alameda, S. Bradley, M. Dietze, G. Jansen, R. Kooper, P. Kumar, J. Lee, R. Marciano, L. Marini, B. S. Minsker, C. Navarro, E. Roeder, A. Schmidt, M. Slavenas, W. Sullivan, B. Zhang, Y. Zhao, I. Zharnitsky, K. McHenry, “Brown Dog: A Data Transformation Ecosystem for Research – Advancing from Beta to 1.0,” AGU, 2017.
  • R. Kooper, M. Burnette, J. Maloney, D. LeBauer, “Data Flow for the TERRA-REF Project,” AGU, 2017.

Tech reports and preprints:

Conference/Workshop Organization:

Journal editing:

Blogs:

Teaching:

  • The AAS team supported a Globus ECSS staff training event on Globus administration.
  • Galen Arnold, JaeHyuk Kwack, and Colin MacLean presented mini-tutorials at the Blue Waters Symposium.
  • SEAS staff provided on-site classroom support at NCSA and Beckman, as well as consulting services to other remote sites and to the Institute instructors from NCSA, ANL, NERSC, ORNL, and TACC, for the 5-day Scaling to Petascale Institute workshop.
  • AAS staff presented ECSS Staff Hands-on tutorial for XSTREAM, PEARC17.
  • AAS staff presented ECSS Staff Hands-on tutorial for KNL, PEARC17.
  • Dan Katz co-taught a 6-hour course on “Software Citation: Principles, Usage, Benefits, and Challenges,” FORCE11 Scholarly Communication Institute.
  • Dan co-taught a 3-hour course on computational reproducibility, FORCE11 Scholarly Communication Institute.

Webinars:

Conference/Workshop participation:

  • Dan Katz attended the PI meeting for new NSF CISE REU sites
  • Dan was invited to the NSF Beyond Reproducibility workshop to represent the Office of Advanced Cyberinfrastructure researcher view
  • Dan attended the Software Sustainability Institute’s Collaborations Workshop, where he presented a lightning talk, was on a panel, and led a mini session, all on software citation and software sustainability
  • Kenton McHenry presented DIBBs activity, as well as the NDS Labs Workbench as a tool to explore data management tools, at the Joint Big Data/Big Data Hub meeting
  • The NDS team participated in and organized the 7th Workshop of the National Data Service Consortium / GlobusWorld 2017
  • AAS team supported XSEDE@Southern University at Baton Rouge
  • AAS team supported XSEDE@Jackson State
  • Dan presented a position paper “Without fundamental advances that address heterogeneity and dynamism, we will remain condemned to point and non-extensible solutions for distributed applications and systems” at the DOE Future Online Analysis Platform workshop
  • Kenton attended Open Research Cloud Declaration Workshop
  • SEAS team members and others from SSA attended the Blue Waters Symposium
  • Jay Alameda attended the “Modeling Research in the Cloud” workshop, May 31-June 2, UCAR, Boulder, CO.
  • Shannon Bradley was sponsored to attend the 2017 Grace Hopper Celebration.
  • SEAS staff participated in virtual, summer Petascale Institute, along with XSEDE and DOE sites.
  • SEAS staff members Roland Haas and JaeHyuk Kwack attended the Joint Lab conference (JLESC)
  • Greg Bauer presented overview of container support on Blue Waters at the local Container Analysis Environments Workshop, NCSA
  • Jong Lee attended NSF-sponsored RCN workshop: Professionalization in Cyberinfrastructure
  • Dan attended Research Data Alliance 10th Plenary Meeting to discuss software citation, science gateways, and software preservation
  • The AAS team supported XSEDE@South Carolina State University
  • Dan attended NumFOCUS’s first sustainability summit, where he shared and learned about good practices in software sustainability.
  • Dan attended NumFOCUS’s Diversity and Inclusion in Scientific Computing (DISC) unconference that was held with PyData NYC 2017.
  • Kenton represented NDS at Open Research Cloud Declaration workshop in Boston
  • Kenton attended the 2017 EarthCube All Hands Meeting and presented an instance of the NDS Labs Workbench modified for EarthCube
  • Jong presented Ergo at Workshop: Data Science in Emergency Preparation and Response

Advisory Committees:

  • Dan Katz was invited to serve on the IEDA (Interdisciplinary Earth Data Alliance) Technical Advisory Committee for a 3-year term, and attended one advisory board meeting.
  • Dan attended the annual advisory board meeting for the UNICAMP Center for Computational Science.

Collaborative meetings:

  • Dan Katz visited the Netherlands eScience Center to meet with the director and staff
  • Dan met with staff at Elsevier to discuss software publications and citation
  • KISTI visitors at NCSA attended Ergo training sessions for project on “Risk Analysis for Mixed-use and High-rise Buildings”
  • Dan participated in a set of High Energy Physics S2I2 conceptualization workshops to plan a HEP software institute
  • Kenton McHenry, Jong Lee, and Rob Kooper conducted workshops at NCHC in Taiwan showcasing NCSA software: Brown Dog, Clowder, GeoDashboard, and the NDS Labs Workbench.
  • Jong attended meetings in South Korea: a project meeting with KICT (Korea Institute of Civil Engineering and Building Technology) and KISTI, and meetings with research teams at KISTI who are interested in data projects such as Big Data Hub, Brown Dog, and NDS.
  • Jong presented Great Lakes to Gulf Virtual Observatory Phase 3 work to the Walton/McKnight foundation
  • Jong demoed the data portal for the Illinois Nutrient Loss Reduction Strategy at the 2017 Inaugural Illinois NLRS Conference

SSA work mentioned in publications: