MSigDB 2026.1.Hs - GSEA-MSigDB Documentation

MSigDB v2026.1.Hs (Jan 2026)

Important Notices

This page describes updates made to the Molecular Signatures Database Human Collections for release 2026.1 (MSigDB 2026.1.Hs).
MSigDB v2026.1 is based on gene annotation data from Ensembl Release 115 (September 2025).
To access the MSigDB mouse collections through the GSEA UI, GSEA 4.3.0 or newer is required.

Updates to Human Collections (MSigDB v2026.1.Hs)

C1: positional gene sets

Updated human gene annotations to Ensembl 115.

C2:CGP

Gene sets from 10 publications have been added to C2:CGP. These gene sets consist of:

Sets describing the cardioprotective effect of OPA1 transcriptional programs from Fong-McMaster et al.
Sets describing Treg-associated prognostic signatures in breast cancer from Ma et al.
Sets describing antibody-dependent cellular phagocytosis in stomach adenocarcinoma from Li et al.
A necroptosis-associated signature in colorectal cancer from Tan et al.
A prognostic signature associated with dendritic cells in pancreatic cancer from Liang et al.
A signature of cancer-associated fibroblast tumor microenvironment remodeling in lung adenocarcinoma from Gengqiu et al.
Gene sets derived from the BRCAGenie PRS model from Lee et al.
A telomere-related prognostic signature in head and neck squamous cell carcinoma from Wei et al.
A signature predictive of prehypertension and cardiometabolic risk from Perchard et al.
A signature of post-activated B cells that maintain IL4-STAT6 CD23 signaling and are diverted from plasma cell differentiation, submitted by Pignarre et al.

C2:CP:Reactome

Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome 95 (+52 gene sets).
As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome subcollection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
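
To illustrate the overlap measure mentioned above, the Jaccard coefficient of two gene sets is the size of their intersection divided by the size of their union. The following is a minimal sketch only: the set names, the 0.75 threshold, and the greedy keep-first rule are hypothetical, and it does not reproduce the actual MSigDB procedure, which also takes distance from the top of the Reactome event hierarchy into account.

```python
def jaccard(a, b):
    """Jaccard coefficient: |A & B| / |A | B| (0.0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_redundant(gene_sets, threshold):
    """Greedy filter: keep a set only if it is not too similar to any kept set.

    gene_sets is an ordered mapping of name -> iterable of gene symbols.
    Earlier entries take priority; both this ordering rule and the threshold
    are assumptions made for illustration.
    """
    kept = {}
    for name, genes in gene_sets.items():
        if all(jaccard(genes, other) <= threshold for other in kept.values()):
            kept[name] = set(genes)
    return kept


# Hypothetical toy gene sets (not real Reactome content).
toy_sets = {
    "PARENT_PATHWAY": {"A", "B", "C", "D"},
    "NEAR_DUPLICATE": {"A", "B", "C", "D", "E"},
    "DISTINCT_PATHWAY": {"X", "Y", "Z"},
}
kept_sets = filter_redundant(toy_sets, threshold=0.75)
```

In this toy input, NEAR_DUPLICATE shares four of five genes with PARENT_PATHWAY, so its Jaccard coefficient of 0.8 exceeds the illustrative threshold and it is dropped.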

C2:CP:WikiPathways

WikiPathways gene sets have been updated to the January 10, 2026 release (+40 gene sets).

C5:GO (Gene Ontology)

Gene sets in these subcollections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations at the time of release production using annotations in the GO-basic obo file released on 2025-10-10 and NCBI gene2go annotations downloaded on 2025-12-14.

This collection is divided into three subcollections:

BP: GO Biological process (-44 gene sets). Gene sets derived from the Biological Process Ontology; set names are prefixed with GOBP_.
CC: GO Cellular component (+38 gene sets). Gene sets derived from the Cellular Component Ontology; set names are prefixed with GOCC_.
MF: GO Molecular function (+17 gene sets). Gene sets derived from the Molecular Function Ontology; set names are prefixed with GOMF_.

These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
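
Because the three subcollections are distinguished by name prefix, a combined GO gene set file can be split by prefix alone. The sketch below assumes the standard tab-delimited GMT layout (set name, description, then gene symbols); the function names, set names, and gene symbols are placeholders rather than actual MSigDB entries.

```python
import io
from collections import defaultdict

PREFIX_TO_SUBCOLLECTION = {"GOBP_": "BP", "GOCC_": "CC", "GOMF_": "MF"}


def read_gmt(handle):
    """Parse GMT lines: set name, description, then one gene symbol per field."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = fields[2:]
    return gene_sets


def group_by_subcollection(gene_sets):
    """Bucket gene set names by GO prefix; unknown prefixes go to 'other'."""
    groups = defaultdict(list)
    for name in gene_sets:
        for prefix, label in PREFIX_TO_SUBCOLLECTION.items():
            if name.startswith(prefix):
                groups[label].append(name)
                break
        else:
            groups["other"].append(name)
    return dict(groups)


# Hypothetical GMT content with placeholder set names and gene symbols.
toy_gmt = io.StringIO(
    "GOBP_EXAMPLE_PROCESS\tplaceholder\tGENE1\tGENE2\n"
    "GOCC_EXAMPLE_COMPONENT\tplaceholder\tGENE2\tGENE3\n"
    "GOMF_EXAMPLE_FUNCTION\tplaceholder\tGENE4\n"
)
go_groups = group_by_subcollection(read_gmt(toy_gmt))
```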

C5:HPO (Human Phenotype Ontology)

Gene sets in this subcollection have been updated to reflect the January 8, 2026 release of the Human Phenotype Ontology database (+45 gene sets). This subcollection has been redundancy filtered through a procedure comparable to that of the GO and Reactome subcollections.

C9: Computational Perturbation Signature Gene Sets (New Addition)

The initial release of this collection consists of 62 gene sets that represent the coordinate transcriptional responses to cellular dependency on a specific oncogene, as determined by the following methodology:

We began by generating a CRISPR sensitivity vector for the gene of interest using the negative relative cell viability readouts from the DepMap CRISPR screen (release 25Q3, Broad Institute). We aligned this dependency data with DepMap global mRNA expression data (CCLE mRNA, release 25Q3), retaining only the 1,106 cell lines present in both datasets. The overlapping samples were randomly partitioned into a Training Set (66%) and a Test Set (34%). Using an information-theoretic association metric, we quantified the association between the CRISPR dependency target and the mRNA expression of every gene in the training set. This process ranked genes by how closely their expression aligned with the specific gene dependency. Subsequently, we performed a gene set optimization procedure. Using varying numbers of the top-ranked genes, we constructed candidate gene signatures. We then computed the association of their ssGSEA-derived enrichment scores with the CRISPR target and identified the gene set size that maximized the association score in the training data. This procedure defined the final genetic perturbation-based gene set for the gene of interest. Finally, we validated the gene set against the target in the test set to ensure that it exceeded a minimum association threshold (0.20).
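
The overall flow of this procedure (align, split, rank, optimize the set size, validate) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the pipeline used to build the collection: Pearson correlation stands in for the information-theoretic association metric, the mean expression of the signature genes stands in for the ssGSEA enrichment score, and all function names, gene names, and data below are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; a simple stand-in for the unspecified metric."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def signature_score(expression, genes, lines):
    """Mean expression of the signature per cell line (stand-in for ssGSEA)."""
    return [sum(expression[g][c] for g in genes) / len(genes) for c in lines]


def build_signature(viability, expression, sizes, min_assoc=0.20, seed=0):
    """viability: {line: relative viability}; expression: {gene: {line: value}}."""
    # Sensitivity is the negative viability; keep lines present in both datasets.
    shared = sorted(c for c in viability if all(c in v for v in expression.values()))
    sensitivity = {c: -viability[c] for c in shared}

    # Random 66% / 34% partition into training and test sets.
    rng = random.Random(seed)
    rng.shuffle(shared)
    cut = round(0.66 * len(shared))
    train, test = shared[:cut], shared[cut:]
    y_train = [sensitivity[c] for c in train]

    # Rank every gene by its association with the dependency in the training set.
    ranked = sorted(
        expression,
        key=lambda g: pearson([expression[g][c] for c in train], y_train),
        reverse=True,
    )

    # Choose the signature size that maximizes the training association.
    best = max(
        sizes,
        key=lambda n: pearson(signature_score(expression, ranked[:n], train), y_train),
    )
    genes = ranked[:best]

    # Validate on the held-out test set against the minimum threshold.
    y_test = [sensitivity[c] for c in test]
    test_assoc = pearson(signature_score(expression, genes, test), y_test)
    return genes, test_assoc, test_assoc >= min_assoc


# Hypothetical toy data: three informative genes plus pure-noise genes.
rng = random.Random(1)
lines = [f"LINE{i}" for i in range(60)]
viability = {c: rng.uniform(0.0, 1.0) for c in lines}
expression = {
    f"UP{k}": {c: -viability[c] + rng.gauss(0.0, 0.1) for c in lines}
    for k in (1, 2, 3)
}
expression.update(
    {f"NOISE{k}": {c: rng.gauss(0.0, 1.0) for c in lines} for k in range(5)}
)
genes, test_assoc, passed = build_signature(viability, expression, sizes=[1, 2, 3, 5])
```

Selecting the signature size on the training partition only, and checking the threshold on the held-out partition, keeps the validation step independent of the optimization step.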

The initial cohort of genes selected for this procedure consisted of oncogenes annotated with at least 4 sources of evidence in the OncoKB database.

CHIP file updates

MSigDB 2026.1.Hs gene annotations and gene mapping CHIP files have been updated to data from Ensembl 115.
Gene orthology annotations for mapping mouse and rat genes to their best match human orthologs have been updated to Alliance of Genome Resources orthology database release 8.1.0 (2026-04-18).
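
As a rough illustration of how a CHIP mapping is applied, the sketch below reads a small tab-delimited table and translates source identifiers into target gene symbols. It assumes the usual CHIP column headers (Probe Set ID, Gene Symbol, Gene Title); the function names and the two-row table are illustrative stand-ins, not excerpts from the released files.

```python
import csv
import io


def read_chip(handle):
    """Return {source identifier: target gene symbol} from a CHIP-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Probe Set ID"]: row["Gene Symbol"]
        for row in reader
        if row["Gene Symbol"]  # skip identifiers with no mapped symbol
    }


def map_genes(genes, chip):
    """Translate identifiers, dropping unmapped ones and collapsing duplicates."""
    mapped = []
    for gene in genes:
        symbol = chip.get(gene)
        if symbol and symbol not in mapped:
            mapped.append(symbol)
    return mapped


# Toy table for illustration only; titles are placeholders.
toy_chip = io.StringIO(
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "Trp53\tTP53\tplaceholder title\n"
    "Cdkn1a\tCDKN1A\tplaceholder title\n"
)
human_symbols = map_genes(["Trp53", "Cdkn1a", "UnmappedGene"], read_chip(toy_chip))
```

Identifiers without an entry in the table are dropped rather than passed through, which avoids mixing source and target namespaces in the mapped list.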