New to the team? Start here.
Name | Role | GitHub |
---|---|---|
Jeff Korte | Product Owner | @JeffKorte |
Quazi Hoque | Software Engineer | @quazi-broad |
Drew Herbst | Tech Lead | @aherbst-broad |
- DSP Monsters - Team for repositories under the broadinstitute org
- Monster - Team for repositories under the DataBiosphere org
Linked Data definitions for the Terra Core Data Model, with extensions for unmodeled datasets.
- TerraCore Data Model - Data Model definitions and examples
Pipelines for moving data into the Jade Data Repository.
- ClinVar - ETL pipeline for the ClinVar dataset
- ENCODE - ETL pipeline for the ENCODE dataset
- Dog Aging - ETL pipeline for the Dog Aging Project dataset
- HCA - ETL pipeline for Human Cell Atlas (HCA) data
Tools and libraries used to support the top-level ingest pipelines.
- Base utilities - Common utilities shared across our batch ETL projects
- XML-to-JSON-list - Command-line tool for mechanical conversion of XML into Beam-friendly JSON
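To illustrate what a "mechanical" XML-to-JSON conversion can look like, here is a minimal Python sketch. It streams an XML document and emits one JSON object per line (newline-delimited JSON), which line-oriented readers such as Beam's text-file sources can consume one record per line. This is not the XML-to-JSON-list tool's implementation. The `@attr`/`$` key convention, the rule that child elements always become lists, and the names `element_to_dict` and `xml_to_json_lines` are hypothetical choices made for this example.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import Any, BinaryIO, Dict, Iterator


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Map one XML element to a JSON-compatible dict (hypothetical convention).

    Attributes become "@name" keys, non-blank text becomes "$", and every
    child tag maps to a list so repeated tags don't change the record's shape.
    Mixed-content tail text is ignored in this sketch.
    """
    out: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        out["$"] = text
    for child in elem:
        out.setdefault(child.tag, []).append(element_to_dict(child))
    return out


def xml_to_json_lines(source: BinaryIO, record_tag: str) -> Iterator[str]:
    """Yield one JSON string per <record_tag> element, streaming the input."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield json.dumps(element_to_dict(elem), sort_keys=True)
            # Drop the converted record's children so large files stay cheap.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<records>'
        b'<record id="1"><name>alpha</name></record>'
        b'<record id="2"><name>beta</name><name>gamma</name></record>'
        b'</records>'
    )
    for line in xml_to_json_lines(sample, "record"):
        print(line)
```

Wrapping every child element in a list keeps each output record's structure uniform no matter how often a tag repeats, which is convenient when downstream code expects a fixed schema.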
Infrastructure, configuration, and shared code used to manage developing and deploying our services.
- Helm charts - Custom Helm charts for pieces of Monster infrastructure
- Core deployments - Terraform modules, Helm releases, and deploy scripts for Monster's GCP environments
- setup-chart-releaser - GitHub Action to install Chart Releaser
The repositories in this section are still in use, but we're trying to move away from them.
Our first attempts at data ingest envisioned a framework of dataset-agnostic services. We moved away from that pattern because it added significant overhead compared with custom pipelines built on common command-line tools.
- Transporter - Bulk file-transfer system
- Storage Libs - Utility libraries for I/O against external storage systems