VisOnFire: Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance

Over the last few decades, many scientific fields such as biomedicine and climate research have been confronted with vast and continuously growing amounts of data. The grand challenge, however, is no longer gathering data but analyzing it. Both the sheer amount of data and its complexity pose significant problems. Local solutions are no longer feasible, and large-scale experiments are carried out on powerful server infrastructure as scientific workflows consisting of data transformation and analysis operations. Running such workflows can take hours, days, or even weeks. Misconfiguration, erroneous scripts, and non-converging operations are therefore highly problematic, as re-running a workflow is costly in both time and money. Moreover, these workflows are created, administered, and changed by potentially large and geographically distributed consortia of researchers. Due to this complexity, it becomes increasingly hard to gain an overview of all processing steps involved and to trace who changed what, where, and with which effect on (intermediate) results. In many contexts, reproducibility of results generated by complex scientific workflows is crucial. However, a recent study showed that it was not possible to confirm the findings of almost 90% of over 50 cancer genomics studies. Thus, developing novel approaches that realize traceability and reproducibility is of utmost importance.

The key to traceability and reproducibility lies in the collection of information about the processed data, the applied operations, and their parameters over time. Modern scientific workflow tools provide analytical provenance, but are mostly restricted to scenarios where a single static input dataset results in a single output dataset. When changes occur at the level of the input data, the workflow itself, and its parameterization, it is hard and tedious, if possible at all, to find out which changes actually caused variations in the output using current technology.
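As a rough illustration of what collecting this information could look like, the following Python sketch stores one record per workflow run and reports which of the three levels named above (input data, workflow, parameterization) differ between two runs. The names `RunRecord`, `fingerprint`, and `changed_levels` are hypothetical and chosen only for this example; they do not describe the tools developed in the project.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical provenance record for a single workflow run."""
    input_hash: str     # fingerprint of the input data
    workflow_hash: str  # fingerprint of the workflow definition
    params: dict        # parameterization of the applied operations
    output_hash: str    # fingerprint of the produced result


def fingerprint(obj) -> str:
    """Return a stable SHA-256 fingerprint of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_levels(old: RunRecord, new: RunRecord) -> list:
    """List the levels (input, workflow, parameters) that differ between two runs."""
    levels = []
    if old.input_hash != new.input_hash:
        levels.append("input")
    if old.workflow_hash != new.workflow_hash:
        levels.append("workflow")
    if old.params != new.params:
        levels.append("parameters")
    return levels


# Two runs that share input data and workflow but use a different parameter.
steps = ["normalize", "cluster"]  # assumed toy workflow definition
run_a = RunRecord(fingerprint([1, 2, 3]), fingerprint(steps), {"k": 3}, fingerprint("result-a"))
run_b = RunRecord(fingerprint([1, 2, 3]), fingerprint(steps), {"k": 5}, fingerprint("result-b"))

print(changed_levels(run_a, run_b))  # prints ['parameters']
```

Comparing fingerprints only reveals that a level changed, not how. Quantifying the change within complex data structures requires dedicated change metrics.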

The primary goal of our project is to realize provenance at all levels, allowing analysts to gain a deeper understanding of the workflow, changes applied to it, and how they influence the results. This will be achieved by developing a visual forensic tool for scientific workflows, which includes novel visual analysis methods that allow for a scalable visualization of the workflow and its changes, a visual comparison of complex data structures, and novel change metrics needed to quantify changes in complex data structures.

The methods we develop will help address the issue of reproducibility in published results, which has plagued many scientific communities. Investigators can use our methods to make all or parts of their workflows public, traceable, and reproducible. The provenance visualization and query tools will make it straightforward for scientists to offer a comprehensive description of the analyses performed to obtain their results.

Journal Publications

Christina Niederer, Holger Stitz, Reem Hourieh, Florian Grassinger, Wolfgang Aigner, and Marc Streit
TACO: Visualizing Changes in Tables Over Time
IEEE Transactions on Visualization and Computer Graphics (InfoVis '17), vol. 24, no. 1, pp. 677-686, 2018

Holger Stitz, Stefan Luger, Nils Gehlenborg, and Marc Streit
AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research
Computer Graphics Forum (EuroVis '16), vol. 35, no. 3, pp. 481-490, 2016

Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit
ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor
IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 12, pp. 2594-2607, 2016

Extended Abstracts and Posters

Holger Stitz, Samuel Gratzl, Harald Piringer, and Marc Streit
Provenance-Based Visualization Retrieval
IEEE Conference on Visual Analytics Science and Technology (VAST '17), Phoenix, AZ, USA, 2017
IEEE VAST 2017 Best Poster Award

Reem Hourieh, Holger Stitz, Nils Gehlenborg, and Marc Streit
TaCo: Comparative Visualization of Large Tabular Data
Poster Compendium of the Eurographics/IEEE Symposium on Visualization (EuroVis '16), Groningen, Netherlands, 2016

Stefan Luger, Holger Stitz, Nils Gehlenborg, and Marc Streit
Interactive Visualization of Provenance Graphs for Reproducible Biomedical Research
IEEE Conference on Information Visualization (InfoVis '15), Chicago, IL, USA, 2015
IEEE InfoVis 2015 Best Poster Award

Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit
ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor
IEEE Conference on Information Visualization (InfoVis '15), Chicago, IL, USA, 2015
IEEE InfoVis 2015 Honorable Mention Poster Award

FH St. Pölten
VisOnFire: Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance
European Researchers' Night, 2015

Theses

Michael Gillhofer | Master's Thesis
Provenance Graph Based Steering
Supervision: Prof. Marc Streit

Reem Hourieh | Master's Thesis
Comparative Visualization of Large Tabular Data
Supervision: Prof. Marc Streit

Stefan Luger | Master's Thesis
Interactive Visualization of Provenance Graphs for Reproducible Biomedical Research
Supervision: Prof. Marc Streit

You can contact us via marc.streit@jku or wolfgang.aigner@fhstp.ac.at.

The project (P 27975-NBL) is funded by the Austrian Science Fund (FWF).
