MSigDB v2026.1.Hs (Jan 2026)

Important Notices

Updates to Human Collections (MSigDB v2026.1.Hs)

C1: positional gene sets

Updated human gene annotations to Ensembl 115.

C2:CGP

Gene sets from 10 publications have been added to C2:CGP. These gene sets consist of:

C2:CP:Reactome

C2:CP:WikiPathways

WikiPathways gene sets have been updated to the January 10, 2026 release (+40 gene sets).

C5:GO (Gene Ontology)

Gene sets in these subcollections are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet 2000). Each gene set is named by its GO term and contains the genes annotated with that term. This collection has been updated to the most recent GO annotations available at the time of release production, using the GO-basic OBO file released on 2025-10-10 and NCBI gene2go annotations downloaded on 2025-12-14.

This collection is divided into three subcollections: GO:BP (Biological Process), GO:CC (Cellular Component), and GO:MF (Molecular Function).

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
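The core idea above (one gene set per GO term, containing the genes annotated with that term, built from NCBI gene2go) can be illustrated with a minimal sketch. It is not the MSigDB pipeline: it assumes the standard tab-separated gene2go columns, keeps only human rows (tax_id 9606), and drops annotations whose qualifier starts with "NOT". It does not propagate annotations to ancestor terms using the GO-basic OBO file, map GeneIDs to gene symbols, or apply any of the additional processing described in the MSigDB 7.0 GO release notes. The function name `gene_sets_from_gene2go` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumed NCBI gene2go columns:
# tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category
CATEGORY_TO_SUBCOLLECTION = {"Process": "GO:BP", "Component": "GO:CC", "Function": "GO:MF"}


def gene_sets_from_gene2go(lines, tax_id="9606"):
    """Group direct annotations into {(subcollection, GO_ID, GO_term): {GeneID, ...}}."""
    sets = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        tax, gene_id, go_id, _evidence, qualifier, term, _pubmed, category = fields[:8]
        if tax != tax_id or qualifier.startswith("NOT"):
            continue  # keep only the requested species; drop negated annotations
        sets[(CATEGORY_TO_SUBCOLLECTION[category], go_id, term)].add(gene_id)
    return dict(sets)


# Illustrative rows only (not copied from the real gene2go file).
rows = [
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory",
    "9606\t7157\tGO:0006915\tIDA\tinvolved_in\tapoptotic process\t-\tProcess",
    "9606\t7157\tGO:0005634\tIDA\tlocated_in\tnucleus\t-\tComponent",
    "10090\t22059\tGO:0006915\tIDA\tinvolved_in\tapoptotic process\t-\tProcess",
]
for key, genes in gene_sets_from_gene2go(rows).items():
    print(key, sorted(genes))
```

The non-human row is filtered out, so each printed set contains only human GeneIDs. A real build would also need the ontology graph, since a gene annotated to a child term is usually treated as annotated to its ancestors as well.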

C5:HPO (Human Phenotype Ontology)

Gene sets in this subcollection have been updated to reflect the January 8, 2026 release of the Human Phenotype Ontology database (+45 gene sets). This subcollection has been redundancy-filtered using a procedure comparable to that used for the GO and Reactome subcollections.

C9: Computational Perturbation Signature Gene Sets (New Addition)

The initial release of this collection consists of 62 gene sets that represent the coordinated transcriptional responses to cellular dependency on a specific oncogene, as determined by the following methodology:

We began by generating a CRISPR sensitivity vector for the gene of interest from the negative relative cell viability readouts of the DepMap CRISPR screen (release 25Q3, Broad Institute). We aligned this dependency data with DepMap global mRNA expression data (CCLE mRNA, release 25Q3), retaining only the 1,106 cell lines present in both datasets. The overlapping samples were randomly partitioned into a Training Set (66%) and a Test Set (34%). Using an information-theoretic association metric, we quantified the association between the CRISPR dependency target and the mRNA expression of every gene in the training set, which ranked genes by how closely their expression tracked the specific gene dependency. We then performed a gene set optimization procedure: we constructed candidate gene signatures from varying numbers of the top-ranked genes, computed the association between each candidate's ssGSEA-derived enrichment scores and the CRISPR target, and identified the gene set size that maximized the association score in the training data. This procedure defined the final genetic perturbation-based gene set for the gene of interest. Finally, we validated the gene set against the target in the test set to ensure that its association exceeded a minimum threshold (0.20).

The initial cohort of genes selected for this procedure consisted of oncogenes annotated with at least 4 sources of evidence in the OncoKB database.
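The signature-building steps described above can be sketched end to end on synthetic data. The release notes do not name the association metric or publish code, so this is a minimal sketch under stated assumptions: the "information-theoretic association metric" is approximated by a hypothetical information-coefficient-style score (binned mutual information mapped to [-1, 1] via sqrt(1 - exp(-2·MI)) and signed by Pearson correlation), and ssGSEA is a simplified, unnormalized version of the single-sample GSEA running sum. The function names (`info_coefficient`, `ssgsea_score`, `build_signature`), the candidate sizes, the bin count, and the data are all hypothetical; only the 66/34 split, the training-set ranking, the size optimization, and the 0.20 test-set threshold come from the text.

```python
import math
import random
from statistics import mean


def _quantile_bins(values, n_bins):
    """Assign each value to an equal-frequency bin (0..n_bins-1) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def _pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def info_coefficient(x, y, n_bins=5):
    """Hypothetical IC-style score: binned mutual information mapped to [-1, 1]."""
    n = len(x)
    bx, by = _quantile_bins(x, n_bins), _quantile_bins(y, n_bins)
    joint = {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = [bx.count(k) / n for k in range(n_bins)]
    py = [by.count(k) / n for k in range(n_bins)]
    mi = sum((c / n) * math.log((c / n) / (px[a] * py[b])) for (a, b), c in joint.items())
    # Gaussian-equivalent |correlation| for this MI, signed by the linear trend.
    return math.copysign(math.sqrt(1.0 - math.exp(-2.0 * max(mi, 0.0))), _pearson(x, y))


def ssgsea_score(sample_expr, gene_set, alpha=0.25):
    """Simplified, unnormalized ssGSEA enrichment score for one sample."""
    n = len(sample_expr)
    order = sorted(range(n), key=lambda g: sample_expr[g], reverse=True)  # high to low
    in_set = set(gene_set)
    # Gene at position pos gets rank n - pos (top gene = n), weighted by rank**alpha.
    hit_norm = sum((n - pos) ** alpha for pos, g in enumerate(order) if g in in_set)
    miss_norm = n - len(in_set)
    hit = miss = score = 0.0
    for pos, g in enumerate(order):
        if g in in_set:
            hit += (n - pos) ** alpha / hit_norm
        else:
            miss += 1 / miss_norm
        score += hit - miss  # sum of (in-set ECDF - out-of-set ECDF) over positions
    return score


def build_signature(expr, target, sizes=(3, 5, 10, 20), min_ic=0.20, seed=0):
    """expr[c][g]: expression of gene g in cell line c; target[c]: dependency score."""
    lines = list(range(len(target)))
    random.Random(seed).shuffle(lines)
    cut = round(0.66 * len(lines))  # 66% training, 34% test
    train, test = lines[:cut], lines[cut:]
    n_genes = len(expr[0])
    # Rank genes by signed association on the training set only (strongest positive first).
    y_train = [target[c] for c in train]
    ic = [info_coefficient([expr[c][g] for c in train], y_train) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: ic[g], reverse=True)

    def set_ic(genes, cells):
        scores = [ssgsea_score(expr[c], genes) for c in cells]
        return info_coefficient(scores, [target[c] for c in cells])

    # Choose the signature size that maximizes association on the training data.
    best_k = max(sizes, key=lambda k: set_ic(ranked[:k], train))
    genes = ranked[:best_k]
    test_ic = set_ic(genes, test)  # held-out validation against the threshold
    return {"genes": genes, "size": best_k, "test_ic": test_ic, "passes": test_ic >= min_ic}


# Synthetic, hypothetical data: 90 cell lines x 30 genes; genes 0-4 track the target.
rng = random.Random(42)
target = [rng.gauss(0, 1) for _ in range(90)]  # stands in for negated relative viability
expr = [[t + rng.gauss(0, 0.2) if g < 5 else rng.gauss(0, 1) for g in range(30)] for t in target]
result = build_signature(expr, target)
print(result["size"], sorted(result["genes"]), round(result["test_ic"], 2), result["passes"])
```

Ranking and size selection use only the training cell lines, so the test-set score is an honest check of the chosen signature. A production implementation would more likely use a kernel-density mutual information estimate and a normalized ssGSEA, both of which would change the exact scores.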

CHIP file updates