MSigDB v2023.2.Hs (Oct 2023)
Important Notices#
- This page describes updates made to the Molecular Signatures Database Human Collections for release 2023.2 (MSigDB 2023.2.Hs).
- MSigDB v2023.2 is based on gene annotation data from Ensembl Release 110 (July 2023).
- In order to access the MSigDB mouse collections through the GSEA UI, GSEA 4.3.0 or newer is required.
New Subcollections in MSigDB 2023.2.Hs#
C2:CP:KEGG_MEDICUS#
MSigDB 2023.2 contains a new subcollection of gene sets curated from the KEGG MEDICUS database. KEGG MEDICUS contains annotated biological networks of relevance for research into human disease, including "reference" pathways, as well as annotated "variant", "pathogen" and "env factor" (environmental factor) pathways. These gene sets are named with the prefixes KEGG_MEDICUS_REFERENCE_, KEGG_MEDICUS_VARIANT_, KEGG_MEDICUS_PATHOGEN_, and KEGG_MEDICUS_ENV_FACTOR_, respectively. Note that while this collection was filtered to remove identical sets, this initial release did not undergo the same similarity-based redundancy filtering that is done for the GO and Reactome collections. This may result in highly similar sets contributing to larger-than-expected FDRs when used with the GSEA competitive enrichment test. We will continue to evaluate the performance of this collection and, if necessary, adopt more stringent redundancy filtering in future releases. This subcollection initially contains 619 gene sets derived from the KEGG MEDICUS Network database retrieved on 2023-09-26, and is released under special licensing terms; see the MSigDB license page for details.
Note that as part of the addition of the KEGG_MEDICUS subcollection, the older C2:CP:KEGG subcollection has been renamed to C2:CP:KEGG_LEGACY as we recommend using the newer KEGG_MEDICUS sets.
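The prefix convention makes it straightforward to split the subcollection by pathway type. The following is a minimal sketch, assuming the gene sets are available as GMT text (one tab-separated line per set: name, description, then gene symbols). The function names `parse_gmt` and `group_by_prefix` and the example set names are hypothetical and only serve as illustration; they are not part of MSigDB or GSEA.

```python
from collections import defaultdict

# Prefixes as listed in these release notes.
KEGG_MEDICUS_PREFIXES = (
    "KEGG_MEDICUS_REFERENCE_",
    "KEGG_MEDICUS_VARIANT_",
    "KEGG_MEDICUS_PATHOGEN_",
    "KEGG_MEDICUS_ENV_FACTOR_",
)


def parse_gmt(text):
    """Parse GMT text into {set_name: set_of_genes}.

    Each line is expected to be: name<TAB>description<TAB>gene1<TAB>gene2...
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def group_by_prefix(gene_sets, prefixes=KEGG_MEDICUS_PREFIXES):
    """Group gene set names by the first matching prefix; unmatched names are ignored."""
    groups = defaultdict(list)
    for name in gene_sets:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    return dict(groups)


# Hypothetical placeholder data, not real MSigDB gene sets.
example_gmt = "\n".join([
    "KEGG_MEDICUS_REFERENCE_EXAMPLE_A\tplaceholder\tGENE1\tGENE2",
    "KEGG_MEDICUS_VARIANT_EXAMPLE_B\tplaceholder\tGENE2\tGENE3",
    "KEGG_MEDICUS_ENV_FACTOR_EXAMPLE_C\tplaceholder\tGENE4",
    "SOME_OTHER_SET\tplaceholder\tGENE5",
])

gene_sets = parse_gmt(example_gmt)
groups = group_by_prefix(gene_sets)
for prefix, names in groups.items():
    print(prefix, len(names))
```

Restricting an analysis to, for example, only the "reference" pathways is one possible way to reduce the number of highly similar sets tested together, though whether that is appropriate depends on the analysis.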
C4:3CA#
MSigDB 2023.2 contains a new subcollection of gene sets curated from the Curated Cancer Cell Atlas (3CA) project. This collection consists of NMF-derived metaprograms annotated for various tumor and non-tumor specific biological processes. Gene sets are prefixed with either GAVISH_3CA_METAPROGRAM_ or GAVISH_3CA_MALIGNANT_METAPROGRAM_ to distinguish whether the gene set describes a molecular state identified in tumor (malignant) or non-tumor cells.
This initial release contains 149 gene sets. See the Curated Cancer Cell Atlas website for more details on the source studies, dataset curation, or metaprogram computational methodology.
Updates to Human Collections (MSigDB v2023.2.Hs)#
C1: positional gene sets#
Updated human gene annotations to Ensembl 110 (+1 gene set).
C2:CGP#
33 gene sets have been added to C2:CGP. These gene sets consist of:
- The NABA_ gene sets that previously resided in the C2:CP subcollection have been moved into C2:CGP. This makes them more accessible for inclusion with other CGP sets for GSEA analyses.
- New contributions of NABA_MATRISOME_, DI_MARTINO_MATRISOME_, and HEBERT_MATRISOME_ prefixed gene sets describing the extracellular matrix composition of various tumor microenvironments.
- 15 gene sets from the EstroGene Database (prefixed LI_ESTROGENE_); see also Li et al. 2023.
- 3 gene sets describing contextual SOX10 target genes from Purwin et al. 2023.
C2:CP:Reactome#
- Reactome gene sets have been updated to reflect the state of the Reactome pathway architecture as of Reactome v86 (+38 gene sets).
- As previously described in the Reactome release notes for MSigDB 7.0, in order to limit redundancy between gene sets within the Reactome subcollection we applied a filtering procedure based on Jaccard coefficients and distance from the top level of the Reactome event hierarchy.
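To illustrate the Jaccard coefficient that this kind of filtering relies on, here is a minimal sketch of a greedy redundancy filter. It is not the MSigDB procedure: the hierarchy-distance criterion is not modeled (the input order simply stands in for priority), and the threshold value, function names, and example gene sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # define the coefficient of two empty sets as 0
    return len(a & b) / len(union)


def filter_redundant(gene_sets, threshold):
    """Greedy filter: visit sets in input order and keep a set only if its
    Jaccard coefficient with every already-kept set is below `threshold`.

    `threshold` is an illustrative parameter, not a value taken from MSigDB.
    """
    kept = {}
    for name, genes in gene_sets.items():
        if all(jaccard(genes, other) < threshold for other in kept.values()):
            kept[name] = genes
    return kept


# Hypothetical placeholder gene sets.
example_sets = {
    "SET_A": {"G1", "G2", "G3", "G4"},
    "SET_B": {"G1", "G2", "G3", "G4", "G5"},  # shares 4 of 5 genes with SET_A
    "SET_C": {"G6", "G7"},
}

similarity_ab = jaccard(example_sets["SET_A"], example_sets["SET_B"])
kept_sets = filter_redundant(example_sets, threshold=0.75)
print(similarity_ab, sorted(kept_sets))
```

In this toy input, SET_A and SET_B share four of five distinct genes, so SET_B is dropped at a threshold of 0.75 while the unrelated SET_C is retained.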
C2:CP:WikiPathways#
WikiPathways gene sets have been updated to the October 10, 2023 release (+58 gene sets).
C5:GO (Gene Ontology)#
Gene sets in these subcollections are derived from the controlled vocabulary of the Gene Ontology (GO) project: The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology (Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2023-07-27 and NCBI gene2go annotations downloaded on 2023-09-01.
This collection is divided into three subcollections:
- BP: GO Biological process (-104 gene sets). Gene sets derived from the Biological Process Ontology; set names are prefixed with GOBP_.
- CC: GO Cellular component (+6 gene sets). Gene sets derived from the Cellular Component Ontology; set names are prefixed with GOCC_.
- MF: GO Molecular function (+27 gene sets). Gene sets derived from the Molecular Function Ontology; set names are prefixed with GOMF_.
These updates were generated in accordance with the procedure described in the GO release notes for MSigDB 7.0.
C5:HPO (Human Phenotype Ontology)#
Gene sets in this subcollection have been updated to reflect the 2023-09-01 release of the Human Phenotype Ontology database (+142 gene sets). This subcollection has been redundancy filtered through a procedure comparable to that of the GO and Reactome subcollections.
CHIP file updates#
- MSigDB 2023.2.Hs gene annotations and gene mapping CHIP files have been updated to data from Ensembl 110.
- Gene orthology annotations for mapping mouse and rat genes to their best-match human orthologs have been updated to Alliance of Genome Resources orthology database release 5.4 (2023-04-24).