The APAeval hackathon was held during the 2021 RNA Society meeting (05/25/2021-06/05/2021). APAeval aims to benchmark 18 open-source, poly(A)-site-specific computational tools, giving users a basis for choosing the tools that best fit their needs. The challenge is designed as a collaborative effort of RNA biologists, bioinformaticians, and developers at various career stages (from students to PIs) from all over the world (participants currently come from 11 countries). Beyond evaluating alternative polyadenylation (APA) analysis methods, the organizers also aim to increase the involvement of computationally oriented researchers in RNA research, bridge the RNA Society and the International Society for Computational Biology, and create a framework for large-scale, reproducible, community-run methods evaluation.
We are recruiting more volunteers, as this community effort has been extended until the 2022 RNA Society meeting. (Click here to join and/or check out our GitHub repo: https://github.com/iRNA-COSI/APAeval)
What are we benchmarking?
The 18 chosen tools each perform one or more of three tasks: identification, quantification, and differential analysis of poly(A) sites. The APAeval community has defined the following preliminary metrics for evaluating the tools on each task and is continuing to develop further benchmark metrics:
- Sensitivity/FDR of identified sites
- Precision of identified sites with respect to public databases
- Assignment to genome features
- Correlation between RNA-seq and 3’end-seq quantification
- Differential analysis
  - Area under the ROC curve (AUC) from the ground truth
  - Reproducibility from replicate datasets
- Compute resources
  - Maximum memory consumption
  - CPU hours
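To make the first metric concrete, here is a minimal Python sketch of how sensitivity and FDR of identified sites could be computed against a ground-truth set. It is an illustration only, not the APAeval implementation: the site representation as (chromosome, strand, position) tuples, the function names, and the 50-nt matching window are all assumptions made for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(sites):
    """Group site positions by (chromosome, strand) and sort them."""
    index = defaultdict(list)
    for chrom, strand, pos in sites:
        index[(chrom, strand)].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_neighbor(site, index, window):
    """True if `index` holds a site within `window` nt on the same strand."""
    chrom, strand, pos = site
    positions = index.get((chrom, strand), [])
    # First indexed position that is >= pos - window.
    i = bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def sensitivity_and_fdr(predicted, truth, window=50):
    """Sensitivity = recovered truth sites / all truth sites.
    FDR = predicted sites without a nearby truth site / all predicted sites.
    The window size is an assumed, illustrative tolerance."""
    pred_index = build_index(predicted)
    truth_index = build_index(truth)
    recovered = sum(has_neighbor(s, pred_index, window) for s in truth)
    false_pos = sum(not has_neighbor(s, truth_index, window) for s in predicted)
    sensitivity = recovered / len(truth) if truth else 0.0
    fdr = false_pos / len(predicted) if predicted else 0.0
    return sensitivity, fdr


# Made-up toy data, for illustration only.
truth = [("chr1", "+", 1000), ("chr1", "+", 5000), ("chr2", "-", 300)]
predicted = [
    ("chr1", "+", 1010),  # close to a truth site
    ("chr1", "+", 9000),  # no truth site nearby
    ("chr2", "-", 320),   # close to a truth site
    ("chr2", "+", 300),   # right position, wrong strand
]
sens, fdr = sensitivity_and_fdr(predicted, truth)
print(f"sensitivity={sens:.2f} FDR={fdr:.2f}")
```

In this toy example, two of the three ground-truth sites are recovered and two of the four predictions have no nearby ground-truth site. A real benchmark would also need to decide how to treat several predictions that fall in the same window.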
How do we benchmark?
The APAeval setup consists of three steps: data preprocessing, execution workflows, and summary workflows.
- Data preprocessing
Using the open-source nf-core/rnaseq pipeline (https://github.com/nf-core/rnaseq), we preprocessed publicly available RNA-seq datasets for which matched orthogonal datasets generated by the same lab are available. These datasets include human and mouse simulation, cell-line, and primary-cell data with varying sequencing depth, which allows us to test a tool’s sensitivity to low versus high read coverage.
- Execution workflows
Our participants have developed tool-specific execution workflows in the workflow languages Snakemake and Nextflow, according to their preferences. Each of these workflows runs a specific tool on the preprocessed data and outputs one or more tab-separated files containing the information needed for benchmarking.
- Summary workflows
Each benchmark has its own summary workflow, which collects the tab-separated files from the tools and computes the benchmark-specific metrics. These summary workflows allow integrating the benchmark results on OpenEBench, an ELIXIR platform run by the Barcelona Supercomputing Center (BSC), to make the benchmark results accessible to the public.
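As a rough illustration of what a summary step does, the Python sketch below reads tab-separated quantification output from several tools and computes one metric per tool, here the Pearson correlation with a reference quantification. The column names (site_id, expression), the tool names, and the in-memory example files are hypothetical and were chosen only for this example; the actual APAeval file formats and the OpenEBench upload step are not shown.

```python
import csv
import io
import math


def read_quantification(handle):
    """Read a TSV with (assumed) columns site_id and expression into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["site_id"]: float(row["expression"]) for row in reader}


def pearson(xs, ys):
    """Pearson correlation; returns NaN when it is undefined."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def summarize(tool_outputs, reference):
    """Return {tool: correlation}, computed over sites shared with the reference."""
    summary = {}
    for tool, quant in tool_outputs.items():
        shared = sorted(set(quant) & set(reference))
        summary[tool] = pearson(
            [quant[s] for s in shared], [reference[s] for s in shared]
        )
    return summary


# In-memory stand-ins for the tab-separated files (made-up values).
reference_tsv = "site_id\texpression\nsiteA\t10\nsiteB\t20\nsiteC\t30\n"
tool_tsvs = {
    "tool_a": "site_id\texpression\nsiteA\t1\nsiteB\t2\nsiteC\t3\n",
    "tool_b": "site_id\texpression\nsiteA\t3\nsiteB\t2\nsiteC\t1\n",
}

reference = read_quantification(io.StringIO(reference_tsv))
tool_outputs = {
    tool: read_quantification(io.StringIO(text)) for tool, text in tool_tsvs.items()
}
summary = summarize(tool_outputs, reference)
for tool, r in summary.items():
    print(f"{tool}\t{r:.2f}")
```

Keeping the metric computation separate from file reading means the same summary function can be reused regardless of which execution workflow produced the files.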
APAeval is a work in progress
During the organizing period and the hackathon, the APAeval community made progress on all aspects of the challenge: we completed the preprocessing of the datasets, two execution workflows, and one summary workflow.
We thank AWS for sponsoring cloud-computing resources, Seqera Labs for providing technical and application support on Nextflow and Nextflow Tower (https://tower.nf/), and OpenEBench (https://openebench.bsc.es/dashboard) for helping us integrate our benchmark results.
Hoping to present the full benchmark results at RNA 2022 and publish two related papers, we have extended the APAeval challenge until the 2022 RNA Society meeting and are inviting interested volunteers to join us (click here to join; see our kick-off video here).
We hope to provide the scientific community with a well-analyzed benchmark of poly(A)-site-specific tools and a generalizable software-benchmarking framework that can also be applied in contexts other than poly(A) sites.
Here are some of the APAeval community members during a meeting:
Here is a list of affiliated institutions of current APAeval community members:
- Amsterdam UMC, Netherlands
- Barcelona Supercomputing Center, Spain
- Biozentrum, University of Basel, Switzerland
- Buchmann Institute for Molecular Life Sciences, Germany
- Centrum für Thrombose und Hämostase (CTH) Mainz, Germany
- ETH Zürich, Switzerland
- Genome Institute of Singapore, Singapore
- National Cancer Institute, United States
- National Taiwan University, Taiwan
- Sloan-Kettering Institute, MSKCC, United States
- Swiss Institute of Bioinformatics, Switzerland
- University of Amsterdam, Netherlands
- University of California, Irvine, United States
- University College London, United Kingdom
- University of Edinburgh, United Kingdom
- University of Leeds, United Kingdom
- University of Pennsylvania, United States
- University of Utah, United States
- University of Wuhan, China
- Weill Cornell Graduate School, United States
- Yale University, United States
Look out for our next APAeval presentation, with more progress updates, in the Integrative RNA Biology (iRNA) track at ISMB 2021 in July (see details here). We hope to see you there, and come join us!