Contributing to RolyPoly
Contributions are welcome! Bug fixes, new features, documentation improvements, packaging, tests, and reports of your experience or resource usage on your samples are all appreciated. Pull requests (or forks) are the preferred way to contribute and will be considered; you can also open an issue for discussion or contact one of the developers directly.
Project Roadmap & TODO List
Check out our project roadmap and TODO list to see what features and improvements are planned.
Contribution guidelines
- Primary Language: Python >=3.10
- Secondary Languages: Some system calls to shell/Bash are allowed.
- Dependency Management: via pixi (development)
- Prefer using existing dependencies over adding new ones.
- Avoid pandas; use polars instead.
- Avoid biopython if possible: check whether the feature is already implemented in src/rolypoly/utils/bio/*, or use polars-bio.
- Lazy vs. eager loading: CLI commands are lazily evaluated (see src/rolypoly/utils/lazy_group.py) and must be explicitly registered in the src/rolypoly/rolypoly.py file. This makes debugging and tracing slightly harder, but it also isolates the commands, so breaking one of them does not affect the others.
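The lazy-loading idea behind lazy_group.py can be illustrated with the standard library alone. Each command is registered as a name mapped to a dotted import path, and its module is imported only when that command is requested. The LazyRegistry class below is hypothetical and is not the project's implementation. The real one presumably subclasses click.Group, and click's documentation describes a similar LazyGroup pattern. In this demo, stdlib functions stand in for real command functions.

```python
import importlib
from typing import Callable


class LazyRegistry:
    """Map command names to dotted import paths; import each target on first use.

    Hypothetical sketch of the lazy-loading idea, not rolypoly.utils.lazy_group.
    """

    def __init__(self, lazy_subcommands: dict[str, str]):
        # e.g. {"command-name": "package.module.function_name"}
        self.lazy_subcommands = lazy_subcommands
        self.loaded: dict[str, Callable] = {}

    def list_commands(self) -> list[str]:
        # Listing command names never imports any command module.
        return sorted(self.lazy_subcommands)

    def get_command(self, name: str) -> Callable:
        if name not in self.loaded:
            module_path, attr_name = self.lazy_subcommands[name].rsplit(".", 1)
            module = importlib.import_module(module_path)  # the import happens only here
            self.loaded[name] = getattr(module, attr_name)
        return self.loaded[name]


# Stdlib functions stand in for real command functions in this demo.
registry = LazyRegistry({"to-json": "json.dumps", "b64": "base64.b64encode"})
print(registry.list_commands())  # ['b64', 'to-json']
print(registry.get_command("to-json")({"skip_existing": True}))  # {"skip_existing": true}
```

Nothing is imported until get_command runs, so an import error in one command module only surfaces when that command is invoked. This is the isolation trade-off described above: failures are contained, but they appear later and are harder to trace.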
Code Organization
- File Structure:
- src/rolypoly/utils/: utility functions and helpers
- src/rolypoly/commands/: command-line interface modules (using click)
- rolypoly.utils.various: general-purpose functions that don't fit into other categories (e.g. dataframe operations)
- rolypoly.utils.logging: logging, configuration, output tracking, etc.
- Naming Conventions:
- CLI arguments: no positional arguments unless absolutely necessary; prefer explicitly declared, named arguments. Options must support both a short and a long form, e.g. -s and --skip-existing. Optionally, support a JSON config file (--config config.json) or a JSON string (--override-params '{"skip_existing": true}').
- Functions and internal variables: snake_case (e.g., skip_existing). Reuse variable names from other commands when they serve the same purpose. Long descriptive names are OK.
- Classes: PascalCase (but use classes sparingly).
- Environment or Global variables: UPPERCASE or CamelCase.
- Avoid the "_" prefix for "private" functions. If something is explicitly not meant to be used outside its scope, say so in a comment or docstring. In general, though, avoid such functions: nothing should break when a supposedly "private" function is called from elsewhere.
- Docstrings:
- Add a docstring to all user-facing command functions (click entry points) and reusable utility functions.
- Keep simple helpers concise (one-line docstring is fine).
- For non-trivial logic, use a multi-line docstring and include sections like Args, Returns, Raises, and Note when useful.
- Prefer the same style already used in the codebase (plain-language summary first, then structured sections if needed).
- Do not remove existing docstrings unless they are incorrect; update them when behavior or parameters change.
- Module-level docstrings are recommended for larger utility modules, especially when they contain multiple related functions.
- Temporary Files:
- Optionally, create a temporary directory. Some commands have a hidden --temp-dir argument; if it is not specified, the temp directory is created within the user's output path.
- When done, move only the final output files to the user's output path, or rename the temp directory if that is easier (e.g. when it shares the same parent path).
- Try to clean up temporary files unless the --keep-tmp flag is used.
- Calling external tools:
- Use rolypoly.utils.command_runner.run_command_comp() to run external commands.
- If that is not possible, use subprocess.run().
- Shared Code:
- Avoid creating intermediate helper modules in commands/; utilities belong in utils/.
- Place reusable functions in the appropriate utils/ subdirectories (e.g., utils/bio/ for biological sequence operations).
- Check existing utilities before implementing new functionality.
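Commands that offer the optional --config and --override-params support need some way to combine those sources with the regular CLI options. One possible approach is sketched below. merge_params is a hypothetical helper, and its precedence (CLI values first, then the config file, then the override string) is an assumption, not documented behavior. The sketch lets the JSON sources win because a CLI value alone cannot tell an explicit choice from a default. Keys are snake_case because click passes an option such as --skip-existing to the command function as skip_existing.

```python
import json
from pathlib import Path
from typing import Any, Optional


def merge_params(
    cli_params: dict[str, Any],
    config_path: Optional[str] = None,
    override_params: Optional[str] = None,
) -> dict[str, Any]:
    """Merge CLI option values with an optional JSON config file and JSON override string.

    Assumed precedence (lowest to highest): CLI values, --config file, --override-params.
    """
    merged = dict(cli_params)
    if config_path:
        # --config config.json: a JSON object keyed by snake_case parameter names
        merged.update(json.loads(Path(config_path).read_text()))
    if override_params:
        # --override-params '{"skip_existing": true}'
        merged.update(json.loads(override_params))
    return merged


params = merge_params(
    {"skip_existing": False, "threads": 1},
    override_params='{"skip_existing": true}',
)
print(params)  # {'skip_existing': True, 'threads': 1}
```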
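Here is a pair of hypothetical utilities written to the docstring guidelines above. The simple helper gets a one-line docstring, and the function with validation logic gets Args, Returns, Raises and Note sections. The exact section layout is an assumption, so match the style of neighbouring modules. Neither function is claimed to exist in utils/bio/; check there before writing your own.

```python
def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Args:
        sequence: Nucleotide sequence using A, C, G, T or N (case-insensitive).

    Returns:
        The reverse complement, in upper case.

    Raises:
        ValueError: If the sequence contains characters other than A, C, G, T or N.

    Note:
        RNA input (U) is not handled; convert U to T first.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    upper = sequence.upper()
    invalid = set(upper) - set(complement)
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {sorted(invalid)}")
    # Walk the sequence backwards and complement each base.
    return "".join(complement[base] for base in reversed(upper))


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
```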
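The Temporary Files guidelines can be illustrated as follows. run_with_temp_dir and the placeholder file names are hypothetical. The temp_dir and keep_tmp parameters mirror the --temp-dir and --keep-tmp flags, and the "work" here is just writing two placeholder files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def run_with_temp_dir(output_dir: str, temp_dir: Optional[str] = None, keep_tmp: bool = False) -> Path:
    """Work in a temp dir, move only the final output to output_dir, clean up unless keep_tmp."""
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    # Default: create the temp dir inside the user's output path.
    base_tmp = Path(temp_dir) if temp_dir else output_path
    base_tmp.mkdir(parents=True, exist_ok=True)
    work_dir = Path(tempfile.mkdtemp(prefix="rp_tmp_", dir=base_tmp))
    try:
        # Placeholder work: one intermediate file and one final result.
        (work_dir / "intermediate.txt").write_text("scratch data\n")
        final = work_dir / "result.txt"
        final.write_text("final result\n")
        # Move only the final output file to the user's output path.
        destination = output_path / final.name
        shutil.move(str(final), str(destination))
        return destination
    finally:
        if not keep_tmp:
            shutil.rmtree(work_dir, ignore_errors=True)
```

The try/finally ensures the temp directory is removed even if the work fails, unless --keep-tmp is set, which is usually when you want to inspect it.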
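When run_command_comp() cannot be used, a subprocess.run() fallback might look like the sketch below. run_external is a hypothetical helper; run_command_comp's signature is not described here, so it is not imitated. Passing an argument list instead of a shell string avoids quoting problems with file paths. In the usage line, the current Python interpreter stands in for an external tool.

```python
import shlex
import subprocess
import sys


def run_external(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run an external command given as an argument list (no shell) and fail loudly on error."""
    print(f"Running: {shlex.join(cmd)}", file=sys.stderr)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        # Include the command and its stderr so failures are easy to trace.
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {shlex.join(cmd)}\n{result.stderr}"
        )
    return result


result = run_external([sys.executable, "-c", "print('hello from a subprocess')"])
print(result.stdout.strip())  # hello from a subprocess
```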
Testing & Benchmarking
- Testing:
- Add tests under src/tests/*.
- Prefer pytest for new tests, and keep command smoke tests in src/tests/test_cli_contracts.py with scenarios in src/tests/cli_scenarios.json.
- For most (ideally all) click commands, include a hidden log-level option so tests can consistently enable debug logging:
@click.option("-ll", "--log-level", hidden=True, default="INFO", help="Log level")
- Use small/local fixtures from testing_folder/ when possible.
- You can also use /clusterfs/jgi/scratch/science/metagen/neri/tests/rp_tests/ (on dori), which contains larger example data for different commands.
- Run standardized CLI tests: pixi run -e dev pytest -q src/tests/test_cli_contracts.py
- Run fast help-only smoke tests (just --help for the top level and each command): pixi run -e dev pytest -q src/tests/test_cli_help_smoke.py
- Run one command's scenarios: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-commands fetch-sra
- Run multiple commands' scenarios: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-commands annotate,assemble,marker-search
- Run specific scenario IDs: pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-scenarios marker_search_runtime_genomad,assemble_megahit_runtime
- Run by text match (id/description/command): pixi run -e dev pytest -q src/tests/test_cli_contracts.py --cli-match fetch,identify
- Environment-variable based selection (useful in CI/shell scripts):
- RP_CLI_COMMANDS=fetch-sra,marker-search pixi run -e dev pytest -q src/tests/test_cli_contracts.py
- RP_CLI_SCENARIOS=assemble_megahit_runtime pixi run -e dev pytest -q src/tests/test_cli_contracts.py
- RP_CLI_MATCH=identify pixi run -e dev pytest -q src/tests/test_cli_contracts.py
- Run all tests (all command scenarios plus unit tests): pixi run -e dev pytest -q src/tests
- Legacy ad-hoc scripts under testing_folder/*.sh are still useful for manual debugging, but new command validation should be added to the pytest flow above.
- Benchmarking:
- Use /usr/bin/time for resource monitoring. Alternatively, hyperfine works well too. Ideally, run benchmarks through SLURM and keep track of the job IDs for later analysis with seff/pyseff.
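Here is a minimal pytest-style unit test, as it might look in a hypothetical src/tests/test_fasta_helpers.py. count_fasta_records is a hypothetical stand-in; a real test would import the function under test from rolypoly.utils. tmp_path is pytest's built-in fixture that gives each test a fresh temporary directory.

```python
from pathlib import Path


def count_fasta_records(fasta_path: Path) -> int:
    """Count records in a FASTA file by counting header lines (hypothetical helper)."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def test_count_fasta_records(tmp_path: Path) -> None:
    # pytest injects tmp_path; small inline fixtures keep the test self-contained.
    fasta = tmp_path / "small.fasta"
    fasta.write_text(">seq1\nACGT\n>seq2\nGGCC\nTTAA\n")
    assert count_fasta_records(fasta) == 2


def test_count_fasta_records_empty(tmp_path: Path) -> None:
    empty = tmp_path / "empty.fasta"
    empty.write_text("")
    assert count_fasta_records(empty) == 0
```

Run it on its own with pixi run -e dev pytest -q src/tests/test_fasta_helpers.py.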
PyPI / TestPyPI release automation
Releases are automated via GitHub Actions using trusted publishing (OIDC), with this flow:
1. Build wheel + sdist and run help-smoke tests (src/tests/test_cli_help_smoke.py)
2. Validate package metadata with twine check
3. Publish to TestPyPI
4. Install from TestPyPI and run import + CLI smoke check
5. Publish the same artifacts to PyPI
Version bumping is manual and happens before CI publishing.
Workflow file: .github/workflows/pypi-release.yml
The workflow uses concurrency cancellation per ref (pypi-release-${ref}), so newer runs automatically cancel older in-progress runs on the same branch/tag.
One-time setup (maintainers)
Configure trusted publishers in both PyPI and TestPyPI for project rolypoly-tk:
- Owner/repo: UriNeri/rolypoly
- Workflow name: pypi-release.yml
- Environment names: testpypi and pypi
Use environment protection rules in GitHub for safer releases (recommended):
- testpypi: optional reviewers
- pypi: required reviewer(s)
Triggering releases
- Primary path: push to the deployment branch release (this triggers the build/test/publish workflow).
- Optional: create/publish a GitHub Release (this also triggers the workflow).
- Optional: run the workflow manually (workflow_dispatch) for dry runs/testing.
One-command release prep (manual bump + commit + trigger CI)
Use the pixi task:
- pixi run -e dev bump-commit-publish
This task runs src/setup/bump_commit_publish.sh and by default:
- bumps version in src/rolypoly/__init__.py (micro by default; or major/minor/explicit X.Y.Z)
- refreshes src/setup/env_big.yaml from pixi workspace export conda-environment -e complete, with cleanup for micromamba compatibility
- runs help-smoke tests locally
- commits src/rolypoly/__init__.py and src/setup/env_big.yaml
- pushes to origin/release to trigger GitHub Actions publish flow
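bump_commit_publish.sh is a shell script, but the version arithmetic it performs can be sketched in Python. The sketch assumes __init__.py holds a line of the form __version__ = "X.Y.Z" with a plain three-part version. bump_version and bump_init_file are hypothetical names and are not part of the project.

```python
import re
from pathlib import Path

# Assumed format of the version line in src/rolypoly/__init__.py.
VERSION_PATTERN = re.compile(r'^__version__\s*=\s*["\'](\d+)\.(\d+)\.(\d+)["\']', re.MULTILINE)


def bump_version(current: str, bump: str = "micro") -> str:
    """Return the next version; bump is "major", "minor", "micro" or an explicit X.Y.Z."""
    if re.fullmatch(r"\d+\.\d+\.\d+", bump):
        return bump  # explicit version, e.g. --bump 0.7.0
    major, minor, micro = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "micro":
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"Unknown bump type: {bump!r}")


def bump_init_file(init_path: Path, bump: str = "micro") -> str:
    """Rewrite the __version__ line of an __init__.py file and return the new version."""
    text = init_path.read_text()
    match = VERSION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"No __version__ = 'X.Y.Z' line found in {init_path}")
    new_version = bump_version(".".join(match.groups()), bump)
    new_text = text[: match.start()] + f'__version__ = "{new_version}"' + text[match.end():]
    init_path.write_text(new_text)
    return new_version
```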
Common options:
- pixi run -e dev bump-commit-publish -- --bump minor
- pixi run -e dev bump-commit-publish -- --bump 0.7.0
- pixi run -e dev bump-commit-publish -- --branch release --remote origin
- pixi run -e dev bump-commit-publish -- --skip-smoke
Local fallback (manual upload)
If needed, manual upload with twine is still supported:
- Build: pixi run -e dev python -m build --sdist --wheel --outdir dist
- Check: pixi run -e dev twine check dist/*
- Upload: pixi run -e dev twine upload dist/* --verbose
Example Workflow: Adding a New Command
Here's a high-level workflow for adding a new command to RolyPoly:
1. Check for existing utilities: search src/rolypoly/utils/ for existing functions that might help (especially utils/bio/ for sequence operations).
2. Create the command file: add your command in the appropriate subdirectory under src/rolypoly/commands/ (e.g., commands/misc/my_command.py).
- Use the @click.command() decorator.
- Follow the naming conventions (short + long options, snake_case for parameters).
- Import and reuse existing utilities from utils/ where possible.
3. Add shared utilities if needed: if you create reusable functions, place them in src/rolypoly/utils/ (NOT in commands/).
- Use existing modules when appropriate (e.g., utils/bio/polars_fastx.py for FASTA/FASTQ operations).
4. Register the command: CRITICAL. Add your command to src/rolypoly/rolypoly.py in the appropriate lazy_subcommands group.
- Format: "command-name": "rolypoly.commands.subdir.my_command.my_command_function"
- The command won't appear in the CLI without this step!
5. Test the command:
- Run pixi run rolypoly <command-name> --help to verify it loads.
- Test with actual data.
- Add test cases to src/tests/ if appropriate.
6. Document: update help strings and consider adding examples to the README or docs.
- Add a markdown file in the appropriate location under docs/.
- Update the mkdocs.yml configuration file to include your new documentation.
- Add it to the index or relevant navigation section if needed.
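A missing or mistyped registration only shows up when the CLI is run. A small check over a lazy_subcommands-style mapping can catch these problems earlier. check_lazy_subcommands is a hypothetical helper. Its kebab-case rule assumes new command names follow existing ones such as fetch-sra and marker-search. In the example, stdlib targets stand in for real command functions.

```python
import importlib
import re


def check_lazy_subcommands(lazy_subcommands: dict[str, str]) -> list[str]:
    """Return problems found in a {"command-name": "module.path.function"} mapping."""
    problems = []
    for name, target in lazy_subcommands.items():
        if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
            problems.append(f"{name}: command names should be lower-case kebab-case")
        module_path, _, attr_name = target.rpartition(".")
        if not module_path:
            problems.append(f"{name}: {target!r} is not a dotted 'module.function' path")
            continue
        try:
            # Importing here is eager on purpose: this is a check, not the CLI itself.
            module = importlib.import_module(module_path)
        except ImportError as error:
            problems.append(f"{name}: cannot import {module_path} ({error})")
            continue
        if not callable(getattr(module, attr_name, None)):
            problems.append(f"{name}: {module_path} has no callable {attr_name!r}")
    return problems


print(check_lazy_subcommands({"to-json": "json.dumps"}))  # []
```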
Note
This project is governed under the LBNL IP office. By contributing, you agree that your contributions will be subject to the terms of the GPLv3 license.