Miscellaneous & Quality of Life Commands
This page documents utility commands in RolyPoly that help with data handling, sequence processing, and other common tasks. These are available under the misc group or as standalone commands.
Shrink Reads (shrink-reads)
Subsample FASTQ files by number or proportion.
Usage:
- --subset-type: top_reads (default) or random
- --sample-size: Number of reads (int) or proportion (float <1)
flowchart TD
A["Input: FASTQ (.fq/.fastq, gz) "] --> B["Sample selection<br> (top_reads / random)"] --> C["Output: subsampled FASTQ<br> (.fq / .fq.gz)"]
classDef io fill:#f0f9ff,stroke:#0366d6,color:#03396c;
class A,C io
class B fill:#fffaf0,stroke:#b85c00,color:#7a3b00;
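The two selection modes can be illustrated with a minimal Python sketch. It is not RolyPoly's implementation: the helper names `parse_fastq` and `shrink_reads` and the `seed` parameter are hypothetical, and it assumes plain 4-line FASTQ records already read into memory. Only the meaning of `top_reads`, `random`, and the int-versus-proportion sample size follows the option list above.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, Union

# One FASTQ record: header, sequence, separator, quality
FastqRecord = Tuple[str, ...]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if len(record) < 4:  # end of input (or a truncated trailing record)
            return
        yield record


def shrink_reads(
    records: Iterable[FastqRecord],
    sample_size: Union[int, float],
    subset_type: str = "top_reads",
    seed: int = 0,  # hypothetical parameter, for reproducible random picks
) -> List[FastqRecord]:
    """Keep the first N reads, or a random subset of N reads."""
    records = list(records)
    # A float below 1 is read as a proportion, anything else as a read count.
    if isinstance(sample_size, float) and sample_size < 1:
        n = int(len(records) * sample_size)
    else:
        n = int(sample_size)
    n = min(n, len(records))

    if subset_type == "top_reads":
        return records[:n]
    if subset_type == "random":
        rng = random.Random(seed)
        keep = sorted(rng.sample(range(len(records)), n))  # keep input order
        return [records[i] for i in keep]
    raise ValueError(f"unknown subset type: {subset_type}")
```

Sorting the sampled indices keeps the surviving reads in their original order, which is convenient when paired files are subsampled with the same seed.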
Mask DNA (mask-dna)
Mask viral-like sequences in a reference genome using various aligners.
Usage:
- -a, --aligner: Aligner backend (minimap2, mmseqs2, diamond, bowtie1, bbmap)
- -r, --reference: Custom masking reference
- --tmpdir: Temporary directory
flowchart TD
M1["Input: Reference FASTA (.fa/.fasta)"] --> M2["Align to viral DB<br> (minimap2 / bbmap / mmseqs)"] --> M3["Mask regions (N / lowercase)"] --> M4["Output: masked FASTA"]
class M1,M4 io
class M2,M3 fill:#fffaf0,stroke:#b85c00,color:#7a3b00;
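The masking step in the flowchart (hard masking with N, or soft masking with lowercase) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: `mask_regions` is a hypothetical helper, and the regions are assumed to be 0-based, half-open `(start, end)` intervals already extracted from the aligner output.

```python
from typing import Iterable, Tuple


def mask_regions(
    sequence: str,
    regions: Iterable[Tuple[int, int]],
    soft: bool = False,
) -> str:
    """Mask 0-based, half-open (start, end) intervals of a sequence.

    Hard masking replaces bases with 'N'; soft masking lowercases them.
    Overlapping or out-of-range intervals are tolerated.
    """
    chars = list(sequence)
    for start, end in regions:
        # Clamp the interval to the sequence bounds.
        start = max(0, start)
        end = min(len(chars), end)
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    print(mask_regions("ACGTACGTAC", [(2, 5)]))             # hard mask
    print(mask_regions("ACGTACGTAC", [(2, 5)], soft=True))  # soft mask
```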
FASTX Stats (fastx-stats)
Calculate sequence statistics for FASTA/FASTQ files.
Usage:
- --fields: Choose which stats to report
- --format: Output format (tsv, csv, md)
- -c, --circular: Treat sequences as circular for analysis
flowchart TD
S1["Input: FASTA/FASTQ"] --> S2["Parse sequences (polars)"] --> S3["Compute stats<br> (length, GC, N-count, hash)"] --> S4["Output: stats.tsv/csv/md/parquet"]
class S1,S4 io
class S2,S3 fill:#fffaf0,stroke:#b85c00,color:#7a3b00;
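As a rough illustration of the statistics named in the flowchart (length, GC, N-count, hash), here is a standard-library Python sketch. The command itself is described as parsing with polars; this sketch does not. The field names, the choice of SHA-256 as the hash, and the helpers `sequence_stats` and `to_tsv` are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Iterable, List


def sequence_stats(seq_id: str, sequence: str) -> Dict[str, object]:
    """Compute basic per-sequence statistics (case-insensitive)."""
    seq = sequence.upper()
    length = len(seq)
    gc = seq.count("G") + seq.count("C")
    return {
        "id": seq_id,
        "length": length,
        "gc_fraction": gc / length if length else 0.0,
        "n_count": seq.count("N"),
        # Hash function is an assumption; the tool may use a different one.
        "hash": hashlib.sha256(seq.encode("ascii")).hexdigest(),
    }


def to_tsv(rows: Iterable[Dict[str, object]], fields: List[str]) -> str:
    """Render only the selected fields as tab-separated text."""
    lines = ["\t".join(fields)]
    for row in rows:
        lines.append("\t".join(str(row[field]) for field in fields))
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [sequence_stats("s1", "ACGTNN"), sequence_stats("s2", "ggcc")]
    print(to_tsv(rows, ["id", "length", "gc_fraction", "n_count"]))
```

Selecting columns at render time mirrors the idea behind `--fields`: every statistic is computed once and the report shows only the chosen subset.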
Rename Sequences (rename-seqs)
Standardize sequence IDs in a FASTA file and generate a mapping table.
Usage:
- --prefix: Prefix for new IDs
- --hash/--no-hash: Use hash instead of running number
- --stats/--no-stats: Include sequence stats in mapping
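The renaming logic described by these options can be sketched as follows. This is a hedged illustration only: `rename_sequences`, the mapping column names, the default prefix, and the truncated SHA-256 suffix are assumptions, and the real command's ID format may differ.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def rename_sequences(
    records: Iterable[Tuple[str, str]],
    prefix: str = "seq_",       # hypothetical default
    use_hash: bool = False,
    include_stats: bool = False,
) -> Tuple[List[Tuple[str, str]], List[Dict[str, object]]]:
    """Return renamed (id, sequence) records plus an old-to-new mapping."""
    renamed: List[Tuple[str, str]] = []
    mapping: List[Dict[str, object]] = []
    for number, (old_id, seq) in enumerate(records, start=1):
        if use_hash:
            # Content-derived suffix: identical sequences get identical IDs.
            suffix = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()[:12]
        else:
            suffix = str(number)  # running number, starting at 1
        new_id = f"{prefix}{suffix}"
        renamed.append((new_id, seq))
        row: Dict[str, object] = {"old_id": old_id, "new_id": new_id}
        if include_stats:
            row["length"] = len(seq)
        mapping.append(row)
    return renamed, mapping
```

A running number gives short, ordered IDs, while a hash keeps IDs stable across files at the cost of collisions for duplicate sequences.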
Quick Taxonomy (quick-taxonomy)
Fast taxonomic assignment for marker search results or contigs. (Experimental)
Usage:
- --marker_results: Optional marker-search results file
- --format: Output format (text, json, tsv)
- --min_score: Minimum score for taxonomy assignment
Fetch SRA (fetch-sra)
Download FASTQ files for SRA/ENA runs using the ENA API and aria2c or wget.
Usage:
- -i, --input: Run accession (e.g., SRR...) or file with accessions (one per line)
- --report: Also download XML report metadata
- Downloads all FASTQ files for a run ID
- Requires aria2c or wget installed
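To show how FASTQ links for a run can be looked up, the sketch below builds a query for the ENA Portal API `filereport` endpoint and parses its tab-separated response. Whether fetch-sra uses this exact endpoint and field list is an assumption. The helper names are hypothetical, and the sample response in the usage part is made up for illustration rather than taken from a real run.

```python
import csv
import io
from typing import List
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a filereport query asking for the FASTQ paths of one run."""
    query = urlencode(
        {
            "accession": accession,
            "result": "read_run",
            "fields": "run_accession,fastq_ftp",
            "format": "tsv",
        }
    )
    return f"{ENA_FILEREPORT}?{query}"


def fastq_urls(tsv_text: str) -> List[str]:
    """Extract download URLs from a filereport TSV response.

    The fastq_ftp column holds semicolon-separated paths without a scheme,
    so a scheme is prepended where it is missing.
    """
    urls: List[str] = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for path in (row.get("fastq_ftp") or "").split(";"):
            if path:
                urls.append(path if "://" in path else f"ftp://{path}")
    return urls


def read_accessions(text: str) -> List[str]:
    """Parse an accession list: one per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Illustrative response only; accession and paths are placeholders.
    sample = (
        "run_accession\tfastq_ftp\n"
        "RUN1\texample.org/RUN1_1.fastq.gz;example.org/RUN1_2.fastq.gz\n"
    )
    print(filereport_url("RUN1"))
    print(fastq_urls(sample))
```

The resulting URL list could then be written to a file and handed to a downloader such as aria2c or wget.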
For more details on each command, use rolypoly [COMMAND] --help.