@@ -4,21 +4,21 @@
\chapter*{Preface}
\addcontentsline{toc}{chapter}{Preface}
This book is a direct result of the \emph{Conquaire} (Continuous Quality Control for Research Data to Ensure Reproducibility) project, which was by the DFG between 2016 and 2019.
The goal of the project was to understand in how far principles from continuous quality control and test-driven development as nowadays being state-of-the-art in software engineering can be applied to the management of research data to increase its quality and potential for reuse.
This book is a direct result of the \emph{Conquaire} (Continuous Quality Control for Research Data to Ensure Reproducibility) project, which was funded by the DFG between 2016 and 2019.
The goal of the project was to understand to what extent principles from continuous quality control and test-driven development, which are now state of the art in software engineering, can be applied to the management of research data to increase its quality and potential for re-use.
In order to arrive at such an understanding, we have worked closely with researchers from different disciplines at Bielefeld University, ranging from biology, chemistry, economics, linguistics, and psychology through to computer science and robotics.
All in all, we have been working with eight research groups and have defined a case study in reproducibility with each of these groups. Within these use cases we have aimed at reproducing one central part of a previously published research article.
In doing this we have limited ourselves to reproducing the computational analysis leading to the particular result, as reproducing the actual experiments would have been outside of the scope of the Conquaire project.
In doing this, we have limited ourselves to reproducing the computational analysis leading to the particular result, as reproducing the actual experiments would have been outside of the scope of the Conquaire project.
The book that lies in front of you documents these eight case studies and describes what we have done to reproduce the specific results. In most cases, reproducing the analytical result, in spite of data and scripts being available, has only been possible through close interaction with and guidance from the authors of the original publication, who in all cases are direct co-authors of the chapter describing our reproducibility experiments.
The work conducted in the case studies has provided us with a detailed understanding of the analytical workflows used by all the case study partners and has allowed us to get a deep understanding of barriers and challenges in reproducing published results. As a result, we can give a number of clear recommendations at the end of the book, representing the lessons learned from the practical attempt to reproduce a number of published results.
The work conducted in the case studies has provided us with a detailed understanding of the analytical workflows used by all the case study partners and has allowed us to get a deep understanding of barriers and challenges in reproducing published results. Thus, we can give a number of clear recommendations at the end of the book, representing the lessons learned from the practical attempt to reproduce a number of published results.
This exercise in understanding requirements and problems for analytical reproducibility would not have been possible without funding by the DFG. Most critically, it would not have been possible without the effort and dedication of the eight research groups we have worked with for the last three years. We would like to thank all of them for their patience and for bearing with us while walking on the sometimes stony path of achieving reproducibility. We thank all of the research groups for providing us with data and scripts and for describing their workflows. All of these groups have been engaged in this project because they were interested in finding out how to improve their workflows to make their results transparent, reproducible, and thus more accessible to the scientific community. We thank all of these groups for engaging in the project in spite of the risk that
comes with a higher level of transparency and exposure. By being transparent, one risks that others can discover some flaws in the way things have been done. This is quite a risk in science, a risk that we have to nevertheless take as progress in science should always be weighted higher than the consequences for particular individuals.
We would like to thank all the student researchers involved in Conquaire that have supported the activities of reproducing results. We would like to thank in particular Lukas Biermann and Fabian Herrmann as they have been central to the success of many case studies, having worked day-to-day with many of the above mentioned research groups and having developed central pieces of the Conquaire infrastructure for supporting continuous quality control of research data.
comes with a higher level of transparency and exposure. By being transparent, one risks that others may discover flaws in the way things have been done. This is quite a risk in science, but one that we nevertheless have to take, as progress in science should always be weighed more heavily than the consequences for particular individuals.
We would like to thank all the student researchers involved in Conquaire who have supported the activities of reproducing results. We would like to thank in particular Lukas Biermann and Fabian Herrmann as they have been central to the success of many case studies, having worked day-to-day with many of the above-mentioned research groups and having developed central pieces of the Conquaire infrastructure for supporting continuous quality control of research data.
We would also like to thank Vidya Ayer, who has been working on the project since its start. She has been key in pulling together the different chapters that this book consists of and provided a very early draft version of a manuscript for the book.
Finally, we would like to thank John P. McCrae for contributing to the Conquaire project proposal. Many of the key ideas of Conquaire go back to him.
It has been a pleasure and very rewarding to work with all these scientists and learning about their very specific research questions, goals and methods. We hope that you find this book as exciting to read as it was for us to edit it. \\
It has been a pleasure and very rewarding to work with all these scientists and to learn about their very specific research questions, goals, and methods. We hope that you find this book as exciting to read as it was for us to edit it. \\
\vspace{0.5cm} \\
Bielefeld, 21st of July, 2019 \\
Bielefeld, 29th September, 2019 \\
\vspace{0.5cm} \\
Philipp Cimiano, Christian Pietsch, Cord Wiljes
@@ -9,7 +9,7 @@
\chapterauthor[2]{Martin Egelhaaf}
\chapterauthor[1]{Philipp Cimiano}
\begin{affils}
\chapteraffil[1]{Semantic Computing Group, Faculty of Technology \& Cognitive Interaction Technology Excellence Center, Bielefeld University}
\chapteraffil[1]{Semantic Computing Group, Faculty of Technology \& Cognitive Interaction Technology Excellence Center (CITEC), Bielefeld University}
\chapteraffil[2]{Faculty of Biology, Bielefeld University}
\end{affils}
@@ -27,7 +27,7 @@ Insect spatial locomotion, bumblebee flights, analytical reproducibility, virtua
\section{Introduction} \label{intro}
Animals move in their environment in a quest for food, a mating partner, or a place to raise their offspring. The animals, therefore, need to solve spatial tasks, viz. orientating themselves, identifying and reaching a target (such as, a mating partner or, a food source), following habitual routes (for e.g., between their home and food sources). Even in cluttered environments, animals manage to solve these complex spatial tasks without collisions with obstacles in their path. These abilities are not only observed in vertebrates but also in insects with small brains. Indeed, flying insects can chase their partner \cite{Boeddeker2003}, learn the surroundings of their nest \cite{Robert2018,Lobecke2018}, cross cluttered environments \cite{Crall2015,Kern2012}, and follow routes \cite{Woodgate2016,Lihoreau2010}. Given the small number of nerve cells in insect brains and the limited reliability of neurons in general, extracting information required to solve navigational tasks needs to rely on extremely efficient neural mechanisms. As a consequence of millions of years of evolution, these mechanisms are tightly linked to the sophisticated locomotion and gaze strategies of insects.
Animals move in their environment in a quest for food, a mating partner, or a place to raise their offspring. The animals, therefore, need to solve spatial tasks, viz. orientating themselves, identifying and reaching a target (such as a mating partner or a food source), and following habitual routes (e.g., between their home and food sources). Even in cluttered environments, animals manage to solve these complex spatial tasks without collisions with obstacles in their path. These abilities are not only observed in vertebrates but also in insects with small brains. Indeed, flying insects can chase their partner \cite{Boeddeker2003}, learn the surroundings of their nest \cite{Robert2018,Lobecke2018}, cross cluttered environments \cite{Crall2015,Kern2012}, and follow routes \cite{Woodgate2016,Lihoreau2010}. Given the small number of nerve cells in insect brains and the limited reliability of neurons in general, extracting information required to solve navigational tasks needs to rely on extremely efficient neural mechanisms. As a consequence of millions of years of evolution, these mechanisms are tightly linked to the sophisticated locomotion and gaze strategies of insects.
The research focus of the Neurobiology group at Bielefeld University is to elucidate the computational principles, down to the level of neurons and neural networks, that generate and control visually guided behaviour in complex and cluttered environments.
Understanding the computational principles involved in visually guided behaviour requires, first, monitoring the behaviour of the animal over long periods, and second, reconstructing the visual perception of the environment from the animal's perspective.
@@ -36,9 +36,9 @@ The visual processing and behaviour of insects is extremely fast, and hence moni
Lobecke et al. \cite{Lobecke2018} recorded the behaviour of naive bumblebees exiting their nest for the first time. This behaviour can last for several minutes, and the monitoring of the animal behaviour resulted in the collection of several thousand images on which the bumblebees' positions were automatically tracked and manually reviewed. The orientations of the bumblebees during their learning flights were obtained from the recorded positions using the Camera Calibration Toolbox for MATLAB \cite{caltoolbox}.
In this chapter we discuss a case study in applying a combination of continuous integration principles, virtualization and Git to support reproducibility of one computational step in the experimental pipeline described by Lobecke et al. \cite{Lobecke2018}. Our main motivation for this case study is to develop best practices that support the execution of the original analytical workflow by third parties. For this reason, we explore how virtualization technology can be used to create a reproducible computational environment that can be directly executed without the need to install software. An approach based on virtualization prevents problems related to broken dependencies due to to later non-availability of the required version of software and packages. In addition to using virtualization, we make use of an integration server to specify and execute a number of integrity tests that ensure validity of the data.
In this chapter, we discuss a case study in applying a combination of continuous integration principles, virtualization and Git to support reproducibility of one computational step in the experimental pipeline described by Lobecke et al. \cite{Lobecke2018}. Our main motivation for this case study is to develop best practices that support the execution of the original analytical workflow by third parties. For this reason, we explore how virtualization technology can be used to create a reproducible computational environment that can be directly executed without the need to install software. An approach based on virtualization prevents problems related to broken dependencies due to later non-availability of the required version of software and packages. In addition to using virtualization, we make use of an integration server to specify and execute a number of integrity tests that ensure validity of the data.
The structure of this chapter is as follows: in the following section \ref{methods_egelhaaf} we describe how the data in the original study by Lobecke et al. was collected. In section \ref{analytical_architecture} we describe the technical environment we have set up to preserve the computational environment and thus ensure executability of the analytical workflow. We also describe how we have used continuous integration (CI) principles to implement a set of quality checks and integrity tests that ensure the validity of the data.
The structure of this chapter is as follows: in the following section \ref{methods_egelhaaf}, we describe how the data in the original study by Lobecke et al. was collected. In section \ref{analytical_architecture}, we describe the technical environment we have set up to preserve the computational environment and thus ensure executability of the analytical workflow. We also describe how we have used continuous integration (CI) principles to implement a set of quality checks and integrity tests that ensure the validity of the data.
%\subsection{Main publication result reproduced within Conquaire}
@@ -59,7 +59,7 @@ Lobecke et al. \cite{Lobecke2018} reported that the first learning flights of bu
All data files, MATLAB and Python scripts for analysis as listed in Fig. \ref{fig:original_workflow} were made available by the Neurobiology group.
%As a result, the data and scripts are made available \footnote{\url{https://GitLab.ub.uni-bielefeld.de/olivier.bertrand/tra3dpy}}.
The XML-file (Fig. \ref{fig:xml_format}) contains parameters of the camera that were used for recording the bee flight movement. They are used in the triangulation process to calculate trajectories using two \emph{tra format} files. The dataset had been partially already published by Lobecke et al. (2018) \cite{Lobecke2018}.
The XML-file (Fig. \ref{fig:xml_format}) contains parameters of the camera that were used for recording the bee flight movement. They are used in the triangulation process to calculate trajectories using two \emph{tra format} files. The dataset had been already partially published by Lobecke et al. (2018) \cite{Lobecke2018}.
The tra files contain the trajectory values in 2D format from two cameras, one located on top and the other located on the side of the bee (Fig. \ref{fig:tra_format}).
The MATLAB file format contains complete trajectory information in 3D format.
@@ -101,8 +101,8 @@ In this case study, we set up a computational environment that builds on three k
\begin{itemize}
\item Git Repository: The original data and the scripts to compute 3D trajectories from the 2D data of the two cameras were uploaded to a Git repository. The benefit of using Git is that data and scripts are stored in a versioned fashion so that particular versions of data and scripts can be referenced. Further, the data is backed up.
\item Virtualization: We rely on virtualization technology to create a virtual image of the computational environment that can be shared and executed on any machine that runs the same virtualization software. In our case we rely on VMWare.
\item Continuous Integration: We deploy a continuous integration server that pulls the data and scripts from the Git repository, builds the analytical pipeline and executes a number of integrity tests on the data.
\item Virtualization: We rely on virtualization technology to create a virtual image of the computational environment that can be shared and executed on any machine that runs the same virtualization software. In our case, we rely on VMware.
\item Continuous Integration: We deploy a continuous integration server that pulls the data and scripts from the Git repository, builds the analytical pipeline, and executes a number of integrity tests on the data.
\end{itemize}
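To illustrate the kind of integrity test a continuous integration server could run on the data, the following is a minimal Python sketch, not the checks actually used in this case study. It assumes a hypothetical file layout (one frame per line, whitespace-separated numeric values); the function name \texttt{check\_trajectory\_rows} and the column count are illustrative assumptions and do not describe the real \emph{tra format}.

```python
import math


def check_trajectory_rows(lines, expected_columns):
    """Return a list of (line_number, message) problems found in the rows.

    Assumed (hypothetical) layout: one frame per line, whitespace-separated
    numeric values, `expected_columns` values per line.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != expected_columns:
            problems.append(
                (number, f"expected {expected_columns} columns, found {len(fields)}")
            )
            continue
        for field in fields:
            try:
                value = float(field)
            except ValueError:
                problems.append((number, f"non-numeric value {field!r}"))
                break
            if not math.isfinite(value):
                # NaN or infinity usually signals a tracking gap
                problems.append((number, f"non-finite value {field!r}"))
                break
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration only; not data from the study.
    sample = ["0.0 12.5 40.1", "1.0 nan 40.3", "2.0 13.1"]
    for number, message in check_trajectory_rows(sample, expected_columns=3):
        print(f"line {number}: {message}")
```

In a continuous integration setting, a check of this kind would typically be run on every commit, with a non-empty problem list causing the build to fail so that malformed data is noticed before it is used in the analysis.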
In the following, we describe the virtualization and continuous integration approach in more detail. First, however, we briefly describe how the MATLAB code used in the original experiment was migrated to an open-source programming language, namely Python.