NCBI ALFA Tutorial (ASHG 2020): Aggregating Allele Frequencies for Better Variant Interpretation

📅 Published on Wednesday, September 17, 2025

Source: National Library of Medicine / NCBI • ASHG 2020 CoLab (video tutorial)

ALFA (Allele Frequency Aggregator) is an NCBI initiative to improve our understanding of human sequence variation by combining aggregate data from multiple dbGaP studies into a comprehensive catalogue of common and clinical variants. The project applies FAIR principles (findable, accessible, interoperable, reusable), integrates with dbSNP, and is accessible via web, API, and FTP.

In its early releases, ALFA processed tens of thousands of subjects and hundreds of billions of genotypes, yielding hundreds of millions of unique variants. That scale enables consistency checks across datasets (e.g., 1000 Genomes, gnomAD, TOPMed) and expands coverage of both common and rare variants. Subsequent releases continue to grow and include data from underrepresented populations.

What You’ll Learn from the Tutorial

Searching dbSNP with ALFA data: Example with BRCA1 (rsIDs, genomic location, clinical significance, MAF, functional annotations), and using preset filters (e.g., pathogenic in ClinVar + ALFA frequency) to focus on rare/pathogenic variants.
RefSNP frequency tab: View allele counts/frequencies by project and subpopulation; sort, filter, and compare across datasets (1000 Genomes, gnomAD, etc.).
Programmatic access: Use NCBI APIs & Jupyter notebooks (Binder) to retrieve frequencies by gene (e.g., TP53), by rsID, or by interval; query ALFA VCF on NCBI FTP (tabix) and download tab-delimited tables.
FAIR & interoperability: Normalized, standard formats; integration with dbSNP releases so existing pipelines can reuse ALFA seamlessly.
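
To make the programmatic-access item concrete, here is a minimal Python sketch of what you might do with a single line pulled from the ALFA VCF (for example, via a tabix query). It rests on assumptions that are not confirmed by the tutorial summary above: that each sample column represents one population, and that the FORMAT field carries AN (total allele count) and AC (comma-separated ALT allele counts). The function name parse_alfa_vcf_line, the population labels POP_A/POP_B/POP_C, and the record itself are hypothetical and made up for illustration; check the header of the actual file before relying on this layout.

```python
from typing import Dict, List


def parse_alfa_vcf_line(header: str, record: str) -> Dict[str, List[float]]:
    """Return ALT allele frequencies per population column for one VCF record.

    Assumption (verify against the real file): FORMAT lists AN (total allele
    count) and AC (comma-separated ALT allele counts), and every sample
    column corresponds to one population.
    """
    columns = header.lstrip("#").rstrip("\n").split("\t")
    fields = record.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # 9th VCF column is FORMAT

    frequencies: Dict[str, List[float]] = {}
    for population, cell in zip(columns[9:], fields[9:]):
        values = dict(zip(format_keys, cell.split(":")))
        total = int(values["AN"])
        alt_counts = [int(count) for count in values["AC"].split(",")]
        # Skip populations with no observed alleles to avoid dividing by zero.
        if total == 0:
            continue
        frequencies[population] = [count / total for count in alt_counts]
    return frequencies


# Made-up header and record for illustration only; this is not real ALFA data.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPOP_A\tPOP_B\tPOP_C"
RECORD = "chrN\t100\trs0\tG\tA,T\t.\t.\t.\tAN:AC\t1000:250,0\t8:2,1\t0:0,0"

if __name__ == "__main__":
    for population, freqs in parse_alfa_vcf_line(HEADER, RECORD).items():
        print(population, freqs)
```

Returning one frequency per ALT allele keeps multi-allelic sites intact, and dropping populations with a zero allele count mirrors how an empty cell would carry no frequency information. The same AC/AN arithmetic underlies the per-population frequencies shown in the RefSNP frequency tab.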

💡 Tip: Start at the project page to access documentation, handouts (factsheets), example queries, and the API notebooks.

👉 Explore the NCBI ALFA project and tutorials