MSigDB v7.3 (Mar 2021)

This release includes a reorganization of C7 to accommodate the addition of vaccination response gene sets provided by the Human Immunology Project Consortium, along with other minor updates and additions.

Note: Due to substantial changes introduced in MSigDB 7.0, using GSEA 4.0.0+ is recommended when utilizing MSigDB 7.0+ resources.

Advisory: It is strongly recommended that users of MSigDB 7.3 always use the GSEA "Collapse/Remap to gene symbols" feature with the provided Symbol Remapping chip file if their dataset was generated with a transcriptome other than Ensembl v103/GENCODE v37.

New Additions and Changes to Collection Organization

C2:CGP

Gene sets describing the molecular effect of overexpression of S1PR3 in leukemia (PMID: 33458693), signatures describing the effects of anti-TNF therapy on inflammatory bowel disease (PMID: 33429950), and gene sets contributed by the following individuals have been added to C2:CGP

C7: Immunologic Signature Gene Sets

C8: Cell Type Signature Gene Sets

333 gene sets of cell identity signatures derived from single-cell sequencing have been added to C8. These consist of:

"Filtered by Similarity" Annotations

Gene set sub-collections that were updated in this release and have undergone redundancy filtering for inclusion in MSigDB now have an additional field on the gene set page, "Filtered by similarity". This field contains the source database IDs of other candidate gene sets that clustered with the selected set by Jaccard similarity coefficient and exhibited Jaccard coefficients >0.85 with the selected set, but were filtered out of the collection on the basis of tree distance or set size. These database IDs link to the source resource's page for that identifier, as in the EXTERNAL_DETAILS_URL field.
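As a rough illustration of the similarity measure mentioned above, the following Python sketch computes the Jaccard coefficient between a selected gene set and a few candidates and reports those exceeding 0.85. The function names, the toy gene symbols, and the "SRC:" identifiers are hypothetical, and the sketch does not attempt to reproduce the clustering, tree distance, or set size criteria used by the actual MSigDB filtering procedure.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    union = a | b
    if not union:
        # Two empty sets: treat as no similarity (an assumption of this sketch).
        return 0.0
    return len(a & b) / len(union)


def similar_sets(
    selected: Set[str],
    candidates: Dict[str, Set[str]],
    threshold: float = 0.85,
) -> List[str]:
    """Return IDs of candidates whose Jaccard coefficient with `selected`
    is strictly greater than `threshold`, sorted by ID."""
    return sorted(
        cid for cid, genes in candidates.items()
        if jaccard(selected, genes) > threshold
    )


# Toy data: hypothetical gene symbols and source database IDs.
selected = {f"GENE{i}" for i in range(1, 11)}      # 10 genes
candidates = {
    "SRC:0001": selected - {"GENE10"},             # 9 shared of 10 total
    "SRC:0002": {"GENE1", "GENE2", "GENE3"},       # 3 shared of 10 total
    "SRC:0003": selected | {"GENE11"},             # 10 shared of 11 total
}

print(similar_sets(selected, candidates))
```

With this toy data, the first and third candidates exceed the 0.85 threshold and the second does not.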

Updates to Existing Gene Sets by Collection

C1 (Positional Gene Sets)

C1 has been updated to reflect the primary assembly of the current release of the human genome (GRCh38) as present in Ensembl 103 and GENCODE 37. Gene annotations for this collection are derived from the Chromosome and Karyotype band tracks from the Ensembl BioMart (version 103) and reflect the gene architecture as represented on the primary assembly.

C2:CP:Reactome

C2:CP:WikiPathways

WikiPathways gene sets have been updated to reflect the state of WikiPathways Release 20210310 (+28 gene sets).

C3 Regulatory Target Gene Sets

C3:GTRD has been updated to GTRD v20.06 (+175 gene sets). This update also corrects an error in which data from certain transcription factors with short promoter regions may have been omitted.

C5:GO (Gene Ontology)

Gene sets in these sub-collections are derived from the controlled vocabulary of the Gene Ontology (GO) project (The Gene Ontology Consortium, "Gene Ontology: tool for the unification of biology", Nature Genet 2000). The gene sets are named by GO term and contain genes annotated by that term. This collection has been updated to the most recent GO annotations as present in the GO-basic obo file released on 2021-02-01 and the NCBI gene2go annotations downloaded on 2021-02-16.

This collection is divided into three sub-collections:

Gene sets in the GO sub-collections previously shared the universal prefix "GO_"; this prefix is now sub-collection specific. Gene set names in GO:BP now begin with "GOBP_", those in GO:CC with "GOCC_", and those in GO:MF with "GOMF_". This change should make it easier to determine at a glance which GO sub-collection a specific gene set hit came from in analysis pipelines.
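For pipelines that need to handle the new prefixes programmatically, a minimal Python sketch follows. The helper names and example gene set names are hypothetical, chosen only for illustration. The renaming helper assumes that only the prefix differs between the old and new names, which may not hold for every gene set, and it requires the caller to already know which sub-collection a legacy "GO_" set belonged to.

```python
from typing import Optional

# Prefix-to-sub-collection mapping as described in these release notes.
GO_PREFIXES = {
    "GOBP_": "GO:BP",
    "GOCC_": "GO:CC",
    "GOMF_": "GO:MF",
}


def go_subcollection(name: str) -> Optional[str]:
    """Return the GO sub-collection implied by a 7.3-style gene set name,
    or None if the name carries none of the three prefixes."""
    for prefix, subcollection in GO_PREFIXES.items():
        if name.startswith(prefix):
            return subcollection
    return None


def rename_legacy(name: str, subcollection: str) -> str:
    """Swap a legacy 'GO_' prefix for the sub-collection specific one.

    Assumption: the remainder of the name is unchanged between releases.
    """
    if not name.startswith("GO_"):
        raise ValueError(f"not a legacy GO gene set name: {name!r}")
    for prefix, sub in GO_PREFIXES.items():
        if sub == subcollection:
            return prefix + name[len("GO_"):]
    raise ValueError(f"unknown GO sub-collection: {subcollection!r}")


# Illustrative names only.
print(go_subcollection("GOBP_APOPTOTIC_PROCESS"))
print(go_subcollection("HALLMARK_APOPTOSIS"))
print(rename_legacy("GO_APOPTOTIC_PROCESS", "GO:BP"))
```

Keeping the mapping in a single dictionary means both directions of the lookup stay consistent if the table ever needs to change.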

These updates were generated in accordance with the procedure described in the MSigDB v7.0 Release Notes.

C5:HPO (Human Phenotype Ontology)

Gene sets in this sub-collection have been updated to reflect the 2021-02-09 release of the Human Phenotype Ontology database (+319 gene sets). This sub-collection has been redundancy filtered through a procedure comparable to that of the GO and Reactome sub-collections.

CHIP File Updates

All CHIP files previously provided in the standard MSigDB 7.2 release have been updated for MSigDB 7.3 in accordance with previously described procedures.

Gene orthology annotations for mapping mouse and rat genes to their best-match human orthologs have been updated to Alliance of Genome Resources orthology database release 3.2. Genes with no ortholog listed in the Alliance orthology file and genes with multiple orthologs are now processed using additional information from Ensembl's orthology table, following a procedure adapted from the one used in MSigDB 7.0, to enable the selection of additional best-match orthologs.