MSigDB FAQ

MSigDB Gene Sets

1. How do I use MSigDB gene sets with different species?

MSigDB is divided into two components: a Human Database and a Mouse Database. Each database uses the approved gene symbols of its respective species, and mapping (chip) files are provided to translate various forms of gene identifiers to those approved gene symbols using the GSEA application's "Collapse/Remap Dataset" function.

To use data from a species other than the native species of a database, you can instead supply the "Collapse/Remap Dataset" function with a chip file that maps genes from the organism of interest to their orthologs, expressed as gene symbols appropriate to that database. We provide chip files that contain these ortholog mappings for Mouse and Rat for the Human Database, and for Human and Rat for the Mouse Database. For MSigDB Webtools, this function is built into the species selector.

For species other than Human, Mouse, and Rat, we recommend using data from the Alliance of Genome Resources or Ensembl's orthology tables to construct custom chip files for mapping your dataset. Each MSigDB version has a corresponding Ensembl version from which the specific gene symbols in MSigDB were obtained. This version information is available on the MSigDB release notes page. When constructing a custom chip file, the gene symbols that the chip file maps to should match the gene symbols in this targeted Ensembl version.
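
As a rough illustration of building a custom chip file, the Python sketch below writes a tab-delimited file from a list of ortholog pairs. It rests on several assumptions: that the chip file uses the three columns "Probe Set ID", "Gene Symbol", and "Gene Title"; that you have already exported ortholog pairs from your chosen source; and that you have a set of the gene symbols present in the targeted MSigDB version. The function name write_chip, the identifiers src_gene_a and src_gene_b, and the symbol NOT_IN_TARGET are hypothetical placeholders, not real data.

```python
import csv
import io

# Assumed chip column names; check them against the GSEA data formats documentation.
CHIP_HEADER = ["Probe Set ID", "Gene Symbol", "Gene Title"]


def write_chip(ortholog_pairs, valid_symbols, out):
    """Write a tab-delimited chip table to the file-like object `out`.

    ortholog_pairs: iterable of (source_id, target_symbol, title) tuples.
    valid_symbols:  set of gene symbols used by the targeted MSigDB version.
    Returns the number of mapping rows written (header excluded).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(CHIP_HEADER)
    seen = set()
    written = 0
    for source_id, target_symbol, title in ortholog_pairs:
        # Skip symbols that do not exist in the targeted version.
        if target_symbol not in valid_symbols:
            continue
        # Simplification: keep only the first ortholog for each source identifier.
        if source_id in seen:
            continue
        seen.add(source_id)
        writer.writerow([source_id, target_symbol, title])
        written += 1
    return written


# Placeholder input; real pairs would come from an orthology table export.
pairs = [
    ("src_gene_a", "TP53", "placeholder title"),
    ("src_gene_b", "NOT_IN_TARGET", "placeholder title"),
    ("src_gene_a", "TP63", "placeholder title"),
]
valid = {"TP53", "TP63"}

buffer = io.StringIO()
rows_written = write_chip(pairs, valid, buffer)
chip_text = buffer.getvalue()
```

Keeping only the first ortholog per source identifier is a choice made for brevity in this sketch. One-to-many orthologs may need a more deliberate rule, such as preferring high-confidence pairs.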

2. How do I find out more information about a particular MSigDB gene set?

Each gene set in MSigDB (the Molecular Signatures Database) is fully described by a gene set page. These pages can be navigated using the tools on the MSigDB website. Additionally, the gene set names on the GSEA results pages link directly to the applicable gene set page in MSigDB. Alternatively, a Google search on the gene set name displays a link to the gene set page.

3. How can I view/access gene sets from previous releases of MSigDB?

The data files from previous releases of MSigDB are archived and are still available for download on the Downloads page. Scroll down to the bottom of the page to the 'Archived releases' section.

4. Should I run GSEA on one or multiple MSigDB collections?

We recommend running GSEA on individual collections, or even sub-collections, of gene sets rather than on the entire MSigDB. Using individual collections saves time and produces more favorable (lower) FDR q-values because the multiple-testing correction covers fewer gene sets. In addition, the results will be easier to interpret because they focus your attention on a particular kind of gene set. We have grouped gene sets into collections according to their derivation precisely for these reasons, and provide general suggestions for their usage here: MSigDB collections.
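
The effect of testing fewer gene sets can be illustrated with a small Python sketch. Note the substitution: GSEA estimates its FDR from permutation-derived null distributions of enrichment scores, whereas this example uses the simpler Benjamini-Hochberg procedure purely to show the general multiple-testing effect. The p-values are invented for illustration and the function name bh_qvalues is hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented p-values for five gene sets in a single collection.
collection_p = [0.001, 0.01, 0.02, 0.3, 0.6]

# Invented, evenly spread p-values standing in for 45 unrelated gene sets
# that would also be tested if the whole database were used.
background_p = [k / 46 for k in range(1, 46)]

q_collection = bh_qvalues(collection_p)
q_whole = bh_qvalues(collection_p + background_p)[:len(collection_p)]

for p, q_small, q_large in zip(collection_p, q_collection, q_whole):
    print(f"p={p:<6} q (collection only)={q_small:.3f}  q (with background)={q_large:.3f}")
```

With these made-up inputs, each of the five gene sets receives a larger q-value when it is tested alongside the 45 background sets than when the collection is tested alone.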

5. How can I submit a gene set to MSigDB?

The GSEA team welcomes gene set contributions and encourages users to submit novel gene sets. Generally, to be considered for inclusion in MSigDB, a gene set submission should be associated with a peer-reviewed publication. To request additional information on submitting a gene set to MSigDB, please email genesets@broadinstitute.org.