GSEA Desktop v3.0 (Jul 2017)
Notes#
- This release is open-source on GitHub under a BSD-style license.
- See the Using GSEA v3.0 Features section below for a brief introduction and guide.
- The algorithm in this version of GSEA is unchanged from the former code line and produces results equivalent to the previous version (v2.x).
- GSEA Desktop v3.0 requires Java 8.
New Features#
- Updated for the MSigDB v6.0 open-access release.
- The Enrichment Map integration has been updated to work with current versions of Cytoscape (v3.3.0 and newer).
- Plots and other results can now be saved in the SVG format, which provides improved resolution suitable for publication, better post-analysis editing options, etc. Use the ‘Create SVG plot images’ parameter in the Advanced Fields.
- MSigDB XML Browser Beta 1 has been released as a separate application to streamline the GSEA Desktop application. See the Downloads page. Another option for investigating gene sets is the online MSigDB browser.
- There is a new option for saving the datasets backing the report heatmaps for downstream use & visualization. Use the ‘Create GCT Files’ parameter in the Advanced Fields. These files can be used with e.g. R, GenePattern, Morpheus, or GENE-E among other options.
- Long-running file transfer operations are now faster thanks to I/O buffering and compression during transfer, and they now display a progress bar.
- Analyses run with a timestamp random seed now record the timestamp value in the HTML Report ‘Other’ section. The same seed can be reused at any time to reproduce the analysis.
- Implemented a feature to inform the user when a new version is available.
Bug Fixes and Other Improvements#
- Selection of multiple CHIP files for analysis is no longer allowed. This is a simplification that will permit further back-end improvements in the future.
- Collapse Dataset is no longer allowed with a GSEA Preranked analysis. This was a rarely-used feature that caused a lot of confusion.
- Fixed an HTML report launcher bug for users with non-English locale settings.
- Fixed an issue with running the Enrichment Map integration on Mac.
- Fixed an issue where ‘NaN’ (division by zero) results were given a value of zero in the reports.
- Fixed issues handling C3 MIR Gene Sets and other gene sets with names containing comma characters. Set the ‘Alternate Delimiter’ parameter and use a semicolon to separate gene set names.
- Improved control layout on the lower Tool Command panel.
- Improved layout of the result chooser panels on the Leading Edge Analysis and Enrichment Map Visualization screens.
- Updated the Browser Launcher component to fix issues with launching the web browser.
- Improved the error reporting when there are too few samples in the dataset.
- Fixed some basic aspects of integration with the Mac OS X menu bar.
- Removed Thread Controls from the UI as these rarely-used functions led to instability in the application.
- Improved memory resource use and clean-up during GSEA report generation.
- Fixed a failure in the Chip Chooser component when not connected to a network.
- Cleaned up clutter in the application menus and preferences.
- Better handling for the special internally-used GENE_SYMBOL and SEQ_ACCESSION CHIP files. Please do not use these as inputs to your analyses.
- In preparation for the open-source release, we made numerous changes toward cleaning up the code base, removing / replacing a number of third-party libraries, and removing unused resources from the GSEA jar file.
Using GSEA v3.0 Features#
This is a brief introduction and guide to using some of the new features in the GSEA v3.0 series. The information will eventually go into our official documentation, but will live here until then.
Re-running Analyses to Reproduce GSEA v3.0 Results#
- When an analysis is run with a timestamp as the random seed, GSEA v3.0 now saves that timestamp value in the 'Comments' section of the main page of the HTML Report. By setting this value as the random seed in a new analysis (under 'Advanced fields'), you can reproduce your results, so long as you keep the other computational parameters the same. You can vary certain reporting parameters, for example to create SVG plots or export heatmap GCTs (see below), after you are satisfied with the results.
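The principle behind seed reuse can be illustrated with a minimal Python sketch. This is not GSEA's code or its enrichment statistic: the function `permutation_p_value`, the toy difference-of-means statistic, and the seed value are all hypothetical, chosen only to show that a permutation-based result is fully determined by the seed when the other inputs are unchanged.

```python
import random


def permutation_p_value(scores, labels, n_perm, seed):
    """Toy permutation test on a difference of class means (not GSEA's statistic)."""
    rng = random.Random(seed)  # seeded generator: same seed gives the same shuffles

    def mean_diff(lbls):
        class_a = [s for s, l in zip(scores, lbls) if l == 0]
        class_b = [s for s, l in zip(scores, lbls) if l == 1]
        return sum(class_a) / len(class_a) - sum(class_b) / len(class_b)

    observed = abs(mean_diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the phenotype labels
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    return hits / n_perm


scores = [2.1, 1.9, 2.4, 0.3, 0.5, 0.2]
labels = [0, 0, 0, 1, 1, 1]
seed = 1234567890123  # hypothetical value standing in for a recorded timestamp seed

first = permutation_p_value(scores, labels, 200, seed)
second = permutation_p_value(scores, labels, 200, seed)
print(first == second)  # identical inputs and seed reproduce the same estimate
```

Changing the seed, the number of permutations, or the data would generally change the estimate, which is why the other computational parameters need to stay the same when re-running.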
Generating SVG Plots#
- We occasionally get requests for a feature to generate plots at a higher resolution than the PNG format allows. To meet this need, GSEA v3.0 offers a new Create SVG plot images analysis setting in the 'Advanced fields' section.
- This setting is turned off by default as it is somewhat CPU-intensive and because it creates substantially larger files, e.g. ~150x the size of our Enrichment Plot PNGs. The SVGs are GZ compressed for the same reason. They compress quite well but can still be up to ~5x larger than the PNGs. They can be decompressed using 'gunzip' on Mac or Linux and 7-Zip on Windows.
- The SVG plots should be viewable in most modern web browsers and editable in a variety of software such as Inkscape or Adobe Illustrator; we have no particular recommendations.
- Note that the SVG plots may closely match the PNGs but will not necessarily be identical. The fonts in particular may be slightly different.
- We recommend running your analyses with this setting disabled and using the PNGs to review your results. When you have a satisfactory analysis run, you can reproduce the results by re-running the analysis with the same random seed setting as described above. We recognize that this is not a convenient workflow, but changing the report generator to allow on-demand SVG generation was not feasible. We may revisit this in the future if circumstances permit, but in the meantime this feature at least provides a means of producing higher-resolution images.
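As an alternative to 'gunzip' or 7-Zip, the compressed plots can be unpacked with a short script. The sketch below uses only the Python standard library. It assumes the compressed plots carry a `.svg.gz` file extension (an assumption, so check the names in your own report folder), and the function name `decompress_svgs` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_svgs(report_dir):
    """Write an uncompressed .svg next to every .svg.gz found under report_dir."""
    written = []
    for gz_path in sorted(Path(report_dir).rglob("*.svg.gz")):
        svg_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(svg_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the data; originals are kept
        written.append(svg_path)
    return written
```

Calling `decompress_svgs("path/to/report_folder")` returns the list of files it wrote, leaving the compressed originals in place.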
Better Handling of Special, Internally Used CHIP Files#
- The GSEA Desktop uses two special files — GENE_SYMBOL.chip and SEQ_ACCESSION.chip — behind the scenes. In GSEA v2.2.x, these files were re-fetched from the Broad FTP site in each session and only kept until program exit. For GSEA v3.0, we now cache these files locally on the user's computer so that the program only needs to download them once.
- These files can be found within the gsea_home sub-directory of the user's home directory (generally C:\users\ \gsea_home on Windows or /Users/ /gsea_home on a Mac). You can open this from within GSEA using Help > Show GSEA home folder. The CHIP file caching location is gsea_home/file_cache/chip. If GSEA reports any errors trying to read these files, clearing out this location might help resolve the problem.
- Note that both GENE_SYMBOL and SEQ_ACCESSION serve special internal purposes within the GSEA code and should not be used directly as inputs to your analyses.
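Clearing the cache can be done by hand in a file manager, or with a small script such as the following Python sketch. It assumes gsea_home sits directly under the current user's home directory, as described above, and that GSEA is not running at the time (a precaution, not a documented requirement). The function name `clear_chip_cache` is hypothetical.

```python
from pathlib import Path


def clear_chip_cache(gsea_home=None):
    """Delete cached files in gsea_home/file_cache/chip and return their names."""
    # Default location is an assumption: gsea_home under the user's home directory.
    base = Path(gsea_home) if gsea_home else Path.home() / "gsea_home"
    cache_dir = base / "file_cache" / "chip"
    removed = []
    if cache_dir.is_dir():
        for entry in sorted(cache_dir.iterdir()):
            if entry.is_file():  # leave any sub-directories untouched
                entry.unlink()
                removed.append(entry.name)
    return removed
```

If the cache folder does not exist the function simply returns an empty list. As described above, GSEA downloads these files when it needs them and they are not already cached.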
Extracting the Heatmap Datasets#
- We occasionally get requests for more control over heatmap images: reordering/clustering rows & columns, changing color scheme, adding a scale, etc. Some users have also asked for access to the underlying dataset represented by these heatmaps.
- Restructuring the existing code to do this isn't possible at this time. Instead, GSEA v3.0 has a new feature that lets users do these things on their own: we've added a new Create GCT Files setting under 'Advanced fields' which will save the datasets backing all the heatmaps in the report for use in external visualizers or analysis tools.
- For each heatmap plot, it creates a GCT file with the same file name (except for the '.gct' extension). This is GSEA's standard data matrix format and it can be readily used in R, GenePattern, Morpheus, or GENE-E among other options. See our Data formats page for details.
- You may wish to use the corresponding CLS file to identify phenotype classes to external software. This file is saved with the report in the 'edb' subdirectory.
- To answer the natural question, "What is the source of this data?", it comes directly from the input dataset (GCT, RES, PCL, etc.). It is your original expression data, just reordered to match the limited set of genes represented in the given heatmap (and possibly "collapsed" to map probe-level data to genes). You may find these GCTs useful for further downstream analysis of a subset of your data in the context of an individual Gene Set, for example.
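For users who want to load one of these exported files in their own scripts, here is a minimal Python sketch of a GCT reader. It assumes the GCT v1.2 layout (a `#1.2` version line, a line with the row and column counts, a header line starting with the NAME and Description columns, then one tab-separated row per gene) and that every data cell is numeric. The function name `read_gct` is hypothetical, and the Data formats page remains the authoritative description of the format.

```python
import csv


def read_gct(path):
    """Read a GCT v1.2 file into (samples, names, descriptions, values)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        version = next(reader)[0].strip()
        if version != "#1.2":
            raise ValueError(f"unexpected GCT version line: {version!r}")
        n_rows, n_cols = (int(x) for x in next(reader)[:2])
        header = next(reader)  # NAME, Description, then the sample names
        samples = header[2:2 + n_cols]
        names, descriptions, values = [], [], []
        for row in reader:
            if not row:  # skip blank lines
                continue
            names.append(row[0])
            descriptions.append(row[1])
            # Assumes numeric cells; a missing value would raise ValueError here.
            values.append([float(v) for v in row[2:2 + n_cols]])
    if len(names) != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {len(names)}")
    return samples, names, descriptions, values
```

The returned lists can be passed on to whatever plotting or clustering library you prefer; `values[i][j]` holds the expression of gene `names[i]` in sample `samples[j]`.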
Working with older versions of C3 MIR datasets using the alternate delimiter setting#
- In general we recommend that Gene Set and input file names stick to alphanumeric characters, plus the underscore as a separator. Use of other 'special' characters can cause issues on some operating systems and in some programming languages, and those issues may vary across Mac, Linux, and Windows.
- The C3 MIR sub-collection in former versions of MSigDB (v5.2 and earlier) did not keep to this advice. It used the comma character as part of the Gene Set name, which conflicted with the separator character GSEA v2.x used for the Gene Set selector fields and caused failures, as GSEA could not distinguish between the commas within the names and the commas separating the names. These Gene Sets have been renamed in MSigDB v6.0 to avoid these issues.
- GSEA v3.0 introduces a new alternate delimiter setting in 'Advanced fields' to allow you to override the default separator for cases like this. We recommend using the semicolon ';' instead.
- We encourage you to avoid these characters in your own Gene Set and file names. It's safest to stick to alphanumeric characters, possibly with an underscore in place of spaces or other special characters.
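The ambiguity described above, and the naming advice, can be illustrated with a short Python sketch. It is not GSEA's parsing code: the helper names `split_gene_set_names` and `is_safe_name` are hypothetical, and the gene set names are made-up examples in the style of comma-containing names.

```python
import re

# Alphanumeric characters and underscores only, per the naming advice above.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def split_gene_set_names(field, delimiter=","):
    """Split a selector field into gene set names on the given delimiter."""
    return [name.strip() for name in field.split(delimiter) if name.strip()]


def is_safe_name(name):
    """Return True if the name uses only alphanumerics and underscores."""
    return SAFE_NAME.fullmatch(name) is not None


# Two illustrative gene set names, the first containing commas.
field = "GENESET_A,MIR-1,MIR-2;GENESET_B"

print(split_gene_set_names(field, ","))  # commas inside the name split it apart
print(split_gene_set_names(field, ";"))  # semicolon keeps both names intact
print(is_safe_name("MY_GENE_SET_1"), is_safe_name("GENESET_A,MIR-1"))
```

With the comma delimiter the first name is broken into pieces and fused with the separator, while the semicolon yields exactly the two intended names.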