
Contributing to RolyPoly

Contributions are welcome! Bug fixes, new features, documentation improvements, packaging, tests, and reports of your experience or resource usage with your own samples are all appreciated. Pull requests or forks are the preferred way to contribute and will be considered; you can also open issues for discussion or contact one of the developers directly.

Project Roadmap & TODO List

Check out our project roadmap and TODO list to see what features and improvements are planned.

Contribution guidelines

  • Primary Language: Python >=3.10
  • Secondary Languages: Some system calls to shell/Bash are allowed.
  • Dependency Management: via pixi (development)
  • Prefer using existing dependencies over adding new ones.
  • Avoid pandas; use polars instead.
  • Avoid biopython if possible. First check whether the feature is already implemented in src/rolypoly/utils/bio/*, or use polars-bio.
  • Lazy vs. eager loading: the CLI commands are lazily evaluated (see src/rolypoly/utils/lazy_group.py) and need to be explicitly added to the src/rolypoly/rolypoly.py file. This makes debugging and tracing slightly harder, but it also isolates the commands, so one of them can break without affecting the others.
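
The lazy-loading idea can be illustrated without click. In this minimal sketch, command names map to dotted import strings, and a module is imported only when its command is first requested. The LazyRegistry class is hypothetical and standard-library functions stand in for real command functions; the actual implementation in src/rolypoly/utils/lazy_group.py may differ.

```python
import importlib


class LazyRegistry:
    """Map command names to dotted import paths and import them on first use."""

    def __init__(self, lazy_subcommands):
        # Expected shape: {"command-name": "package.module.function"}
        self.lazy_subcommands = dict(lazy_subcommands)
        self._loaded = {}

    def list_commands(self):
        # Listing names never triggers an import.
        return sorted(self.lazy_subcommands)

    def get_command(self, name):
        # Import the target module only the first time the command is requested.
        if name not in self._loaded:
            module_path, _, attr = self.lazy_subcommands[name].rpartition(".")
            module = importlib.import_module(module_path)
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]


# Hypothetical registry: "broken" points at a module that does not exist.
registry = LazyRegistry({
    "dump-json": "json.dumps",
    "broken": "no_such_module_xyz.run",
})

print(registry.list_commands())
print(registry.get_command("dump-json")({"ok": True}))
```

The "broken" entry only raises an ImportError when that specific command is requested, which is the isolation property described above: listing commands and running "dump-json" still work.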

Code Organization

  1. File Structure:
    • src/rolypoly/utils/: Utility functions and helpers.
    • src/rolypoly/commands/: Command-line interface modules (using click).
    • rolypoly.utils.various: General-purpose functions that don't fit into other categories (e.g. dataframe operations).
    • rolypoly.utils.logging: Logging, configuration, output tracking, etc.

  2. Naming Conventions:
    • CLI arguments: No positional arguments unless absolutely necessary. Instead, prefer declared, explicitly named arguments. Must support both short and long options, e.g. -s and --skip-existing. Optionally, provide support for a JSON file (--config config.json) or a JSON string (--override-params '{"skip_existing": true}').
    • Functions and internal variables: snake_case (e.g., skip_existing). Try to reuse variable names from other commands for the same purpose. Long descriptive names are OK.
    • Classes: PascalCase (though use classes sparingly).
    • Environment or global variables: UPPERCASE or CamelCase.
    • Avoid the "_" prefix for "private" functions. If something is explicitly never meant to be used outside its scope, say so in a comment or docstring. In general, though, we want to avoid such functions, and there should be no "private" code that breaks when used elsewhere.
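
As an illustration of the optional --config / --override-params support, the sketch below merges command defaults with a JSON file and a JSON string. The resolve_params helper is hypothetical, and the precedence it applies (override string over config file over defaults) is an assumption of this sketch, not documented RolyPoly behavior.

```python
import json
from pathlib import Path


def resolve_params(defaults, config_path=None, override_params=None):
    """Merge defaults, a JSON config file, and a JSON override string.

    Assumed precedence: override_params > config file > defaults.
    """
    params = dict(defaults)  # copy so the caller's defaults are not mutated
    if config_path is not None:
        params.update(json.loads(Path(config_path).read_text()))
    if override_params:
        params.update(json.loads(override_params))
    return params


defaults = {"skip_existing": False, "threads": 1}
print(resolve_params(defaults, override_params='{"skip_existing": true}'))
```

Keys use the same snake_case names as the function parameters (skip_existing rather than skip-existing), matching the --override-params example above.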

2.1 Docstrings:

  • Add a docstring to all user-facing command functions (click entry points) and reusable utility functions.
  • Keep simple helpers concise (a one-line docstring is fine).
  • For non-trivial logic, use a multi-line docstring and include sections like Args, Returns, Raises, and Note when useful.
  • Prefer the same style already used in the codebase (plain-language summary first, then structured sections if needed).
  • Do not remove existing docstrings unless they are incorrect; update them when behavior or parameters change.
  • Module-level docstrings are recommended for larger utility modules, especially when they contain multiple related functions.
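
For orientation, here is what a multi-line and a one-line docstring could look like. Both functions are hypothetical examples written for this guide, and the exact section layout is an assumption; when in doubt, match the docstrings already present in the module you are editing.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    Args:
        sequence: Nucleotide sequence; case is ignored.

    Returns:
        GC fraction between 0.0 and 1.0.

    Raises:
        ValueError: If the sequence is empty.

    Note:
        Ambiguous bases (e.g. N) count towards the length but not towards GC.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def is_empty(sequence: str) -> bool:
    """Return True if the sequence has no bases."""
    return len(sequence) == 0
```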

  3. Temporary Files:
    • Optionally, create a temp directory (some commands have a hidden --temp-dir argument; if it is not specified, the temp directory is created within the user's output path).
    • When done, move only the final output files to the user's output path, or rename the temp directory if that is easier (for example, if it has the same parent path).
    • Try to clean up temporary files unless the --keep-tmp flag is used.

  4. Calling external tools:
    • Use rolypoly.utils.command_runner.run_command_comp() to run external commands.
    • If that is not possible, use subprocess.run().

  5. Shared Code:
    • Avoid creating intermediate helper modules in commands/; utilities belong in utils/.
    • Place reusable functions in appropriate utils/ subdirectories (e.g., utils/bio/ for biological sequence operations).
    • Check existing utilities before implementing new functionality.
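
The temporary-file and external-tool guidelines can be sketched together as follows. This is only an illustration using the subprocess.run() fallback, since the signature of run_command_comp() is not shown here. The run_in_temp_dir function, the result.txt file name, and the Python one-liner standing in for an external tool are all hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_temp_dir(output_dir, keep_tmp=False, temp_dir=None):
    """Run an external tool in a temp directory and keep only the final file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    if temp_dir is None:
        # Default: create the temp directory inside the user's output path.
        work_dir = Path(tempfile.mkdtemp(prefix="tmp_", dir=output_dir))
    else:
        work_dir = Path(temp_dir)
        work_dir.mkdir(parents=True, exist_ok=True)
    try:
        # Stand-in for a real external tool: a Python one-liner that writes a file.
        cmd = [
            sys.executable,
            "-c",
            "from pathlib import Path; Path('result.txt').write_text('done\\n')",
        ]
        subprocess.run(cmd, cwd=work_dir, check=True, capture_output=True, text=True)
        # Move only the final output into the user's output path.
        final_path = output_dir / "result.txt"
        shutil.move(str(work_dir / "result.txt"), str(final_path))
        return final_path
    finally:
        # Clean up intermediate files unless the caller asked to keep them.
        if not keep_tmp:
            shutil.rmtree(work_dir, ignore_errors=True)


with tempfile.TemporaryDirectory() as user_output:
    print(run_in_temp_dir(user_output).name)
```

Passing the command as a list and setting check=True avoids shell quoting problems and makes a failing tool raise CalledProcessError instead of failing silently.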

Testing & Benchmarking

  1. Testing:
    • Add tests under src/tests/*.
    • Prefer pytest for new tests, and keep command smoke tests in src/tests/test_cli_contracts.py with scenarios in src/tests/cli_scenarios.json.
    • For most (ideally all) click commands, include a hidden log-level option so tests can consistently enable debug logging:
      • @click.option("-ll", "--log-level", hidden=True, default="INFO", help="Log level")
    • Use small/local fixtures from testing_folder/ when possible.
    • You can also use /clusterfs/jgi/scratch/science/metagen/neri/tests/rp_tests/ (on dori), which contains larger example data for different commands.
    • Run standardized CLI tests: pixi run -e dev pytest -q src/tests/test_cli_contracts.py
    • Run fast help-only smoke tests (just --help for top-level + each command): pixi run -e dev pytest -q src/tests/test_cli_help_smoke.py
    • Run one command's scenarios: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-commands fetch-sra
    • Run multiple commands' scenarios: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-commands annotate,assemble,marker-search
    • Run specific scenario IDs: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-scenarios marker_search_runtime_genomad,assemble_megahit_runtime
    • Run by text match (id/description/command): pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-match fetch,identify
    • Environment-variable based selection (useful in CI/shell scripts):
      • RP_CLI_COMMANDS=fetch-sra,marker-search pixi run -e dev pytest -q src/tests/test_cli_contracts.py
      • RP_CLI_SCENARIOS=assemble_megahit_runtime pixi run -e dev pytest -q src/tests/test_cli_contracts.py
      • RP_CLI_MATCH=identify pixi run -e dev pytest -q src/tests/test_cli_contracts.py
    • Run all tests (all command scenarios + unit tests): pixi run -e dev pytest -q src/tests
    • Legacy ad-hoc scripts under testing_folder/*.sh are still useful for manual debugging, but new command validation should be added to the pytest flow above.

  2. Benchmarking:
    • Use /usr/bin/time for resource monitoring; hyperfine is a good alternative. Ideally, use SLURM and keep track of the job IDs for later analysis with seff/pyseff.
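
When neither /usr/bin/time nor SLURM accounting is at hand, a rough in-Python measurement is possible with the standard library. This is not what the guideline above prescribes, only a minimal sketch for quick local checks; measure_command is a hypothetical helper, and the resource module is available on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time


def measure_command(cmd):
    """Run cmd and report wall time, CPU time, and peak RSS of child processes."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    wall_seconds = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall_seconds,
        # CPU time consumed by children between the two snapshots.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        # Peak RSS of the largest child waited for so far by this process
        # (kilobytes on Linux, bytes on macOS), not only of this command.
        "max_rss": after.ru_maxrss,
    }


# Stand-in workload; replace with the command you want to profile.
stats = measure_command([sys.executable, "-c", "sum(range(1_000_000))"])
print(stats)
```

Because max_rss is a running maximum over all children, measure one command per Python process if you need a per-command peak.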

PyPI / TestPyPI release automation

Releases are automated via GitHub Actions using trusted publishing (OIDC), with this flow:

  1. Build wheel + sdist and run help-smoke tests (src/tests/test_cli_help_smoke.py)
  2. Validate package metadata with twine check
  3. Publish to TestPyPI
  4. Install from TestPyPI and run import + CLI smoke check
  5. Publish the same artifacts to PyPI

Version bumping is manual and happens before CI publishing.

Workflow file: .github/workflows/pypi-release.yml

The workflow uses concurrency cancellation per ref (pypi-release-${ref}), so newer runs automatically cancel older in-progress runs on the same branch/tag.

One-time setup (maintainers)

Configure trusted publishers in both PyPI and TestPyPI for project rolypoly-tk:

  • Owner/repo: UriNeri/rolypoly
  • Workflow name: pypi-release.yml
  • Environment names: testpypi and pypi

Use environment protection rules in GitHub for safer releases (recommended):

  • testpypi: optional reviewers
  • pypi: required reviewer(s)

Triggering releases

  • Primary path: push to deployment branch release (this triggers build/test/publish workflow)
  • Optional: create/publish a GitHub Release (also triggers workflow)
  • Optional: run the workflow manually (workflow_dispatch) for dry-runs/testing

One-command release prep (manual bump + commit + trigger CI)

Use the pixi task:

  • pixi run -e dev bump-commit-publish

This task runs src/setup/bump_commit_publish.sh and by default:

  • bumps the version in src/rolypoly/__init__.py (micro by default; or major/minor/explicit X.Y.Z)
  • refreshes src/setup/env_big.yaml from pixi workspace export conda-environment -e complete, with cleanup for micromamba compatibility
  • runs help-smoke tests locally
  • commits src/rolypoly/__init__.py and src/setup/env_big.yaml
  • pushes to origin/release to trigger the GitHub Actions publish flow

Common options:

  • pixi run -e dev bump-commit-publish -- --bump minor
  • pixi run -e dev bump-commit-publish -- --bump 0.7.0
  • pixi run -e dev bump-commit-publish -- --branch release --remote origin
  • pixi run -e dev bump-commit-publish -- --skip-smoke
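
The --bump semantics (micro, minor, major, or an explicit X.Y.Z) can be sketched in Python as below. The real logic lives in the shell script src/setup/bump_commit_publish.sh and may differ; bump_version is a hypothetical helper, the version 0.6.3 is made up for the example, and resetting the lower components to zero on a minor or major bump is an assumption based on common versioning practice.

```python
import re


def bump_version(current, bump="micro"):
    """Return the next version for a major/minor/micro bump or an explicit X.Y.Z."""
    if re.fullmatch(r"\d+\.\d+\.\d+", bump):
        return bump  # an explicit version was requested
    major, minor, micro = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "micro":
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


print(bump_version("0.6.3"))           # 0.6.4
print(bump_version("0.6.3", "minor"))  # 0.7.0
print(bump_version("0.6.3", "0.7.0"))  # 0.7.0
```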

Local fallback (manual upload)

If needed, manual upload with twine is still supported:

  • Build: pixi run -e dev python -m build --sdist --wheel --outdir dist
  • Check: pixi run -e dev twine check dist/*
  • Upload: pixi run -e dev twine upload dist/* --verbose

Example Workflow: Adding a New Command

Here's a high-level workflow for adding a new command to RolyPoly:

  1. Check for existing utilities: Search src/rolypoly/utils/ for existing functions that might help (especially utils/bio/ for sequence operations).

  2. Create the command file: Add your command in the appropriate subdirectory under src/rolypoly/commands/ (e.g., commands/misc/my_command.py).
    • Use the @click.command() decorator.
    • Follow naming conventions (short + long options, snake_case for parameters).
    • Import and reuse existing utilities from utils/ where possible.

  3. Add shared utilities if needed: If you create reusable functions, place them in src/rolypoly/utils/ (NOT in commands/).
    • Use existing modules when appropriate (e.g., utils/bio/polars_fastx.py for FASTA/FASTQ operations).

  4. Register the command: CRITICAL - Add your command to src/rolypoly/rolypoly.py in the appropriate lazy_subcommands group.
    • Format: "command-name": "rolypoly.commands.subdir.my_command.my_command_function"
    • The command won't appear in the CLI without this step!

  5. Test the command:
    • Run pixi run rolypoly <command-name> --help to verify it loads.
    • Test with actual data.
    • Add test cases to src/tests/ if appropriate.

  6. Document: Update help strings and consider adding examples to README or docs.
    • Add a markdown file in the appropriate location under docs/.
    • Update the mkdocs.yml configuration file to include your new documentation.
    • Add to the index or relevant navigation section if needed.
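
A new command file might start from a skeleton like the one below. This is a minimal sketch that follows the conventions in this guide (named options only, short and long forms, a hidden log-level option); the command name, option set, and function body are hypothetical, and a real command would call utilities from src/rolypoly/utils/ instead of echoing its parameters.

```python
import click


@click.command(name="my-command")
@click.option("-i", "--input", "input_path", required=True, help="Input file")
@click.option("-o", "--output", "output_path", default="my_command_output",
              show_default=True, help="Output path")
@click.option("-s", "--skip-existing", is_flag=True,
              help="Skip steps whose output already exists")
@click.option("-ll", "--log-level", hidden=True, default="INFO", help="Log level")
def my_command(input_path, output_path, skip_existing, log_level):
    """Hypothetical example command that echoes its parameters."""
    # A real command would set up logging and call shared utilities here.
    click.echo(
        f"input={input_path} output={output_path} "
        f"skip_existing={skip_existing} log_level={log_level}"
    )


if __name__ == "__main__":
    my_command()
```

If this file were saved as commands/misc/my_command.py, the matching registry entry would follow the format shown in step 4: "my-command": "rolypoly.commands.misc.my_command.my_command".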

Note

This project is governed under the LBNL IP office. By contributing, you agree that your contributions will be subject to the terms of the GPLv3 license.