Commands Overview

🚧 Experimental

RolyPoly is under active development - features may be incomplete or experimental.

RolyPoly provides several commands for different stages of viral analysis. For detailed help in the terminal, use:

rolypoly --help

Available Commands

Setup and Data

  • Get Data: Download and prepare necessary databases (rolypoly get-data)
  • Version: Display version and data information (rolypoly version)

Core Pipeline

  • End to End: Run the complete pipeline with default settings (rolypoly end2end)
  • Read Processing: Filter and process raw RNA-seq reads (rolypoly filter-reads)
  • Assembly: Perform assembly of filtered reads (rolypoly assemble)
  • Assembly Filtering: Remove potential host or contamination sequences (rolypoly filter-contigs)
  • Marker Gene Search: Search for RNA virus hallmark genes (RdRp and other markers) in assembled contigs (rolypoly marker-search)
  • Virus Search: Search for viral sequences in filtered assemblies (rolypoly virus-mapping)

Annotation

  • Genome Annotation: Combined RNA and protein annotation 🚧 (rolypoly annotate)
  • RNA Annotation: Predict RNA structures and elements 🚧 (rolypoly annotate-rna)
  • Protein Annotation: Identify and annotate protein-coding regions 🚧 (rolypoly annotate-prot)

Miscellaneous & Utilities

Analysis (Experimental)

Legend:

  • 🚧 Experimental command: implemented but under active development
  • ⚠️ Placeholder page: documented but not yet available

Common Options

Many commands share these common options:

  • -t, --threads: Number of threads to use (int, default: 1)
  • -M, --memory: Memory allocation in GB (str, default: "6g")
  • -o, --output or --output-dir: Output location (str, default: current directory + command-specific suffix)
  • --keep-tmp: Save temporary files and folders (optional flag, default: False)
  • -g, --log-file: Path to log file (str, default: command-specific log in current directory)
  • -i, --input: Input file or directory (str, required)
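As an illustration of how the common options above combine, here is a minimal sketch that builds a filter-reads invocation and prints it for review. The file names, thread count, and memory value are hypothetical, and it is an assumption that filter-reads accepts every option shown; confirm with rolypoly filter-reads --help before running.

```shell
# Hypothetical input and output locations; replace with your own.
INPUT="reads.fastq.gz"
OUTDIR="filtered_reads"
THREADS=8
MEMORY="16g"
LOG="filter_reads.log"

# Assemble the command line from the common options listed above.
# Assumption: filter-reads supports all of these shared options.
CMD="rolypoly filter-reads -i $INPUT -o $OUTDIR -t $THREADS -M $MEMORY -g $LOG --keep-tmp"

# Print the command so it can be checked before it is actually run.
echo "$CMD"
```

Printing the command first makes it easy to compare against the command's own --help output. Paths containing spaces would need additional quoting in this sketch.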

For detailed usage of each command, use the --help option:

rolypoly [COMMAND] --help

Memory Usage Note

The --memory argument sets RAM limits for external programs:

  • SPAdes: Used for genome assembly
  • bbmap: Used in read filtering
  • MEGAHIT: Alternative assembler
  • MMseqs2: Sequence clustering and searching
  • Diamond: Sequence alignment and annotation
  • HMMER/pyHMMER: Used for viral marker gene detection

Note: There is no guarantee that other rolypoly commands or external programs won't exceed the specified memory.

Tips and Tricks

  • Each of rolypoly's main commands can serve as an entry point using external inputs (e.g. you already have an assembly and want to search it for RdRps).
  • If you have many similar samples, some operations can be performed once instead of rerunning an entire command. For example, if you are working on the same host (or if you suspect the DNA contaminants in your samples to be consistent across multiple runs), you can mask the host genome once, externally, provide it to rolypoly's mask_dna, and then, when running the filter* commands, use the flag "dont_mask" to skip masking.
  • Offloading commands to different machines is a good idea if access to a big-memory compute node is not guaranteed. This holds generally (regardless of rolypoly): assembly tends to be memory hungry (at least with SPAdes), whereas marker search is more CPU heavy.
  • You can use a small subset of your input data for dry runs to get a sense of what to expect, sort of an "exploratory" investigation :)
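One simple way to prepare a small subset for a dry run is to keep only the first N reads of a FASTQ file. The sketch below uses standard Unix tools and is not part of rolypoly. The function name subsample_fastq and the file names are hypothetical, and it assumes the usual four-lines-per-record FASTQ layout.

```shell
# subsample_fastq FILE N: print the first N records of an uncompressed FASTQ file.
# Assumes 4 lines per record (no wrapped sequence or quality lines).
subsample_fastq() {
    head -n "$(( $2 * 4 ))" "$1"
}

# Example usage (hypothetical file names):
#   subsample_fastq reads.fastq 10000 > subset.fastq
#
# For gzipped input, decompress to a stream first (10000 reads = 40000 lines):
#   gzip -cd reads.fastq.gz | head -n 40000 | gzip > subset.fastq.gz
```

Taking the first N reads is not a random sample, so it may not reflect the full dataset, but it is usually enough to check that a command runs and to estimate runtime. For paired-end data, use the same N on both files so that mates stay in sync.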