Commands Overview
🚧 Experimental
RolyPoly is under active development - features may be incomplete or experimental.
RolyPoly provides several commands for different stages of viral analysis. For detailed help in the terminal, use:
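A minimal sketch of that help call, assuming the top-level `rolypoly` entry point accepts the same `--help` option that the individual commands do. The guard only avoids a confusing "command not found" when the tool is not on your PATH.

```shell
# Assumption: the top-level entry point accepts --help, as the subcommands do.
if command -v rolypoly >/dev/null 2>&1; then
    rolypoly --help
else
    echo "rolypoly is not on PATH; install it or activate its environment first" >&2
fi
```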
Available Commands
Setup and Data
- Get Data: Download and prepare necessary databases (rolypoly get-data)
- Version: Display version and data information (rolypoly version)
Core Pipeline
- End to End: Run the complete pipeline with default settings (rolypoly end2end)
- Read Processing: Filter and process raw RNA-seq reads (rolypoly filter-reads)
- Assembly: Perform assembly of filtered reads (rolypoly assemble)
- Assembly Filtering: Remove potential host or contamination sequences (rolypoly filter-contigs)
- Marker Gene Search: Search for RNA virus hallmark genes (RdRp and other markers) in assembled contigs (rolypoly marker-search)
- Virus Search: Search for viral sequences in filtered assemblies (rolypoly virus-mapping)
Annotation
- Genome Annotation: Combined RNA and protein annotation 🚧 (rolypoly annotate)
- RNA Annotation: Predict RNA structures and elements 🚧 (rolypoly annotate-rna)
- Protein Annotation: Identify and annotate protein-coding regions 🚧 (rolypoly annotate-prot)
Miscellaneous & Utilities
- Miscellaneous Commands: Quality of life utilities
  - Shrink Reads: Subsample FASTQ files (rolypoly shrink-reads)
  - Mask DNA: Mask viral-like sequences in reference genomes (rolypoly mask-dna)
  - FASTX Stats: Calculate sequence statistics (rolypoly fastx-stats)
  - Rename Sequences: Standardize sequence IDs (rolypoly rename-seqs)
  - Quick Taxonomy: Fast taxonomic assignment 🚧 (rolypoly quick-taxonomy)
  - Fetch SRA: Download SRA data from ENA (rolypoly fetch-sra)
Analysis (Experimental)
- Host Classification: Predict potential viral hosts ⚠️
- Binning:
  - Termini Analysis: Analyze contig termini (rolypoly termini) 🚧
  - Correlation Analysis: Analyze co-occurrence across samples (rolypoly correlate) 🚧
Legend:
- 🚧 Experimental command - implemented but under active development
- ⚠️ Placeholder page - documented but not yet available
Common Options
Many commands share these common options:
- -t, --threads: Number of threads to use (int, default: 1)
- -M, --memory: Memory allocation in GB (str, default: "6g")
- -o, --output or --output-dir: Output location (str, default: current directory + command-specific suffix)
- --keep-tmp: Save temporary files and folders (optional flag, default: False)
- -g, --log-file: Path to log file (str, default: command-specific log in current directory)
- -i, --input: Input file or directory (str, required)
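As an illustration of how these options combine, here is a hedged sketch of a single invocation. It assumes that `filter-reads` is among the commands that accept all of the shared options (check its `--help` output first). The file and directory names are placeholders, and `run` is a hypothetical helper, not part of rolypoly, that prints the call and executes it only if the program is installed.

```shell
# Placeholder input/output locations; replace with your own.
input="reads.fastq.gz"
outdir="filtered_reads"
log="filter_reads.log"

# Hypothetical helper (not part of rolypoly): echo the call, then run it
# only when the program named in the first argument is installed.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    fi
}

# 8 threads, 16 GB memory ("16g" follows the documented "6g" format),
# temporary files kept for inspection, log written to a chosen path.
run rolypoly filter-reads \
    -i "$input" \
    -o "$outdir" \
    -t 8 \
    -M 16g \
    -g "$log" \
    --keep-tmp
```

Printing the call before running it makes it easy to copy the exact command into a job script or a lab notebook.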
For detailed usage of each command, use the --help option:
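A minimal sketch of the per-command help call, using command names from the list of available commands. `show_help` is a hypothetical wrapper added here only so that a missing installation gives a clear message.

```shell
# Print the help text of one rolypoly command, or say why it could not.
show_help() {
    if command -v rolypoly >/dev/null 2>&1; then
        rolypoly "$1" --help
    else
        echo "rolypoly not found on PATH; cannot show help for '$1'" >&2
        return 127
    fi
}

# Command names taken from the Available Commands list.
for cmd in filter-reads assemble marker-search; do
    show_help "$cmd" || break
done
```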
Memory Usage Note
The --memory argument sets RAM limits for external programs:
- SPAdes: Used for genome assembly
- bbmap: Used in read filtering
- MEGAHIT: Alternative assembler
- MMseqs2: Sequence clustering and searching
- Diamond: Sequence alignment and annotation
- HMMER/pyHMMER: Used for viral marker gene detection
Note: There is no guarantee that other rolypoly commands or external programs won't exceed the specified memory.
Tips and Tricks
- Each of rolypoly's main commands can be run on its own with external inputs (e.g. you already have an assembly and want to search it for RdRps).
- If you have a lot of similar samples, some operations can be performed once instead of rerunning an entire command. For example, if you are working on the same host (or if you suspect the DNA contaminants in your samples to be consistent across multiple runs), you can mask the host genome once, outside the main run, with rolypoly's mask_dna, and then pass the "dont_mask" flag to the filter* commands to skip masking.
- Offloading commands to different machines is a good idea if access to a bigmem compute node is not guaranteed. This holds regardless of rolypoly: assembly tends to be memory hungry (at least with SPAdes), whereas marker search is more CPU heavy.
- You can use a small subset of your input data for dry runs to get a sense of what to expect, sort of an "exploratory" investigation :)
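For the dry-run tip, a minimal sketch that cuts a small subset out of a FASTQ file with standard tools rather than `rolypoly shrink-reads`, whose options are not listed on this page. It assumes an uncompressed FASTQ with exactly four lines per record. The file names and the `subset_fastq` helper are placeholders introduced for illustration. Note that the first N reads are not a random sample, so treat the result as a smoke test only.

```shell
# Placeholder file names; adjust to your data.
in_fastq="sample.fastq"
out_fastq="sample.subset.fastq"
n_reads=1000

# Keep the first N records of a FASTQ file (one record spans 4 lines).
# Usage: subset_fastq INPUT OUTPUT N
subset_fastq() {
    head -n "$(( $3 * 4 ))" "$1" > "$2"
}

if [ -f "$in_fastq" ]; then
    subset_fastq "$in_fastq" "$out_fastq" "$n_reads"
    echo "wrote $(( $(wc -l < "$out_fastq") / 4 )) reads to $out_fastq"
fi
```

The subset file can then be passed to any command through `-i` in place of the full data set.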