FAQ

Please cite use of MARRVEL by:

Julia Wang, Rami Al-Ouran, Yanhui Hu, Seon-Young Kim, Ying-Wooi Wan, Michael F. Wangler, Shinya Yamamoto, Hsiao-Tuan Chao, Aram Comjean, Stephanie E. Mohr, UDN, Norbert Perrimon, Zhandong Liu, Hugo J. Bellen. “MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.” The American Journal of Human Genetics 100, 1–11, 2017 June 1. doi:10.1016/j.ajhg.2017.04.010

MARRVEL stands for Model organism Aggregated Resources for Rare Variant ExpLoration.

You can enter either a human variant (multiple formats allowed) together with a gene, or search for a single gene. You can select from five databases (OMIM, ExAC, Geno2MP, DGV, and DECIPHER).

Currently, the search field will accept the following nomenclature:

  1. Genomic Coordinates
    • 17-29490284-C-G
    • Chr17: 29490284C>G
    • 17: 29490284C>G
  2. HGVS (Human Genome Variation Society)
    • NM_012160.3:c.541A>G
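
As an illustration of the formats listed above, the following Python sketch recognizes each one with regular expressions. It is not MARRVEL's own parser: the patterns, the GenomicVariant type, and the function names are assumptions made for this example, and only simple single-nucleotide or short substitutions are covered.

```python
import re
from typing import NamedTuple, Optional


class GenomicVariant(NamedTuple):
    """Hypothetical container for a parsed genomic-coordinate variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Matches "17-29490284-C-G" (optional "chr" prefix tolerated).
_DASHED = re.compile(
    r"^(?:chr)?(MT|[0-9]{1,2}|[XYM])-(\d+)-([ACGT]+)-([ACGT]+)$", re.IGNORECASE
)
# Matches "Chr17: 29490284C>G" and "17: 29490284C>G".
_COLON = re.compile(
    r"^(?:chr)?(MT|[0-9]{1,2}|[XYM]):\s*(\d+)\s*([ACGT]+)>([ACGT]+)$", re.IGNORECASE
)
# Matches a RefSeq transcript followed by a coding-DNA change,
# e.g. "NM_012160.3:c.541A>G". The change itself is not validated here.
_HGVS = re.compile(r"^(NM_\d+(?:\.\d+)?):(c\.\S+)$")


def parse_genomic(text: str) -> Optional[GenomicVariant]:
    """Return a GenomicVariant for the two coordinate styles, else None."""
    text = text.strip()
    for pattern in (_DASHED, _COLON):
        match = pattern.match(text)
        if match:
            chrom, pos, ref, alt = match.groups()
            return GenomicVariant(chrom.upper(), int(pos), ref.upper(), alt.upper())
    return None


def classify(text: str) -> str:
    """Label the input as 'genomic', 'hgvs', or 'unknown'."""
    if parse_genomic(text) is not None:
        return "genomic"
    if _HGVS.match(text.strip()):
        return "hgvs"
    return "unknown"
```

All three genomic-coordinate examples above parse to the same chromosome, position, reference, and alternate allele, which is a quick way to check that a variant was typed consistently.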

If you see no results across multiple databases, we recommend that you first check that your variant coordinates, gene name, and reference nucleotide are correct by using http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19 or https://mutalyzer.nl/. If this information is correct, please submit a feedback report for our team to review.

OMIM

We used the OMIM API to retrieve OMIM data after obtaining the license from OMIM.

Ensembl ID by gene symbol from:

Ensembl GRCh37 REST API (http://grch37.rest.ensembl.org/)

HGNC ID and Entrez ID by gene symbol or Ensembl ID from:

HGNC REST API (http://www.genenames.org/help/rest-web-service-help)
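
For readers who want to reproduce this identifier mapping, here is a minimal Python sketch. It assumes the Ensembl REST endpoint xrefs/symbol and the HGNC REST endpoint fetch/<field>/<value>; the HGNC service root used below, the response field names, and all function names are assumptions for illustration, not a description of MARRVEL's code. The sketch only builds request URLs and parses JSON text, so it runs without network access.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote

ENSEMBL_GRCH37 = "http://grch37.rest.ensembl.org"  # service named in the text above
HGNC_REST = "https://rest.genenames.org"           # assumed HGNC REST service root


def ensembl_xref_url(symbol: str) -> str:
    """URL that asks Ensembl (GRCh37) for objects matching a gene symbol."""
    return (
        f"{ENSEMBL_GRCH37}/xrefs/symbol/homo_sapiens/{quote(symbol)}"
        "?content-type=application/json"
    )


def hgnc_fetch_url(field: str, value: str) -> str:
    """URL for an HGNC fetch request, e.g. field='symbol' or 'ensembl_gene_id'.

    The request should be sent with an 'Accept: application/json' header
    (assumption: the service otherwise answers in XML).
    """
    return f"{HGNC_REST}/fetch/{quote(field)}/{quote(value)}"


def ensembl_gene_ids(payload: str) -> List[str]:
    """Extract gene-level IDs from an Ensembl xrefs/symbol JSON response."""
    return [entry["id"] for entry in json.loads(payload) if entry.get("type") == "gene"]


def hgnc_ids(payload: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract the HGNC ID and Entrez ID from an HGNC fetch JSON response."""
    docs = json.loads(payload)["response"]["docs"]
    if not docs:
        return None
    first = docs[0]
    return {"hgnc_id": first.get("hgnc_id"), "entrez_id": first.get("entrez_id")}
```

Keeping URL construction separate from response parsing makes each step easy to check on its own before any request is sent.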

ClinVar

We search ClinVar by the HGNC identifier of the human gene and provide Variant, Location, Condition(s), Frequency, Clinical Significance, and Review Status. These data are updated every 2 weeks.

ExAC

We downloaded the variant sites file in VCF format for release 0.3.1.
File available at: ExAC FTP (release 0.3.1)
ExAC database: http://exac.broadinstitute.org/
Assembly: GRCh37 (hg19)
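
To show how a variant can be matched against a sites VCF like this one, here is a minimal Python sketch. It relies only on the standard VCF layout (eight tab-separated columns, comma-separated ALT alleles, AC and AN keys in INFO). Whether MARRVEL uses these keys or ExAC's adjusted counts is not stated here, so treat the key choice, the function names, and the returned dictionary as assumptions. A real pipeline would more likely query an indexed, compressed file than scan every line.

```python
from typing import Dict, Iterable, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    tags = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        tags[key] = value
    return tags


def find_variant(
    lines: Iterable[str], chrom: str, pos: int, ref: str, alt: str
) -> Optional[dict]:
    """Return allele count, allele number, and frequency for one variant.

    Matching is exact on CHROM, POS, REF, and ALT; no normalization of
    indel representations is attempted in this sketch.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        v_chrom, v_pos, _id, v_ref, v_alts, _qual, _filter, info = fields[:8]
        if v_chrom != chrom or int(v_pos) != pos or v_ref != ref:
            continue
        alts = v_alts.split(",")
        if alt not in alts:
            continue
        index = alts.index(alt)  # AC holds one count per ALT allele
        tags = parse_info(info)
        allele_count = int(tags["AC"].split(",")[index])
        allele_number = int(tags["AN"])
        frequency = allele_count / allele_number if allele_number else None
        return {
            "allele_count": allele_count,
            "allele_number": allele_number,
            "frequency": frequency,
        }
    return None
```

Because multi-allelic sites list several alternate alleles on one line, the sketch picks the AC entry at the same index as the requested allele rather than assuming a single value.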

Geno2MP

We downloaded the variant sites VCF file: “Geno2MP.variants.vcf”. The VCF file was last updated on October 10, 2016.
VCF file available at: http://geno2mp.gs.washington.edu/Geno2MP/#/terms
Geno2MP database: http://geno2mp.gs.washington.edu/
Assembly: GRCh37 (hg19)

Geno2MP Phenotype information:

Directly from Geno2MP website
Last update: Oct 15 15:46 UTC (MARRVEL crawler)

DGV

We downloaded the DGV variants file available as a tab delimited text file.
File available at: http://dgv.tcag.ca/dgv/docs/GRCh37_hg19_variants_2016-05-15.txt
Release date: 2016-05-15
Database: http://dgv.tcag.ca/dgv/app/home
Assembly: GRCh37 (hg19)
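
A tab-delimited file of this kind can be searched for structural variants that cover a given position. The Python sketch below is one way to do that. The column names (variantaccession, chr, start, end) are assumed from the DGV file header and should be checked against the downloaded file; the function names and the sample rows are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def read_dgv(handle: TextIO) -> Iterable[Dict[str, str]]:
    """Yield one dict per row of a tab-delimited file with a header line."""
    return csv.DictReader(handle, delimiter="\t")


def overlapping_variants(
    rows: Iterable[Dict[str, str]], chrom: str, pos: int
) -> List[Dict[str, str]]:
    """Keep rows whose [start, end] interval on `chrom` contains `pos`."""
    hits = []
    for row in rows:
        if row["chr"] != chrom:
            continue
        if int(row["start"]) <= pos <= int(row["end"]):
            hits.append(row)
    return hits


# Invented rows, used only to demonstrate the call.
SAMPLE = (
    "variantaccession\tchr\tstart\tend\tvarianttype\tvariantsubtype\n"
    "example1\t17\t29000000\t29500000\tCNV\tloss\n"
    "example2\t17\t30000000\t30100000\tCNV\tgain\n"
    "example3\t1\t29000000\t29500000\tCNV\tloss\n"
)

hits = overlapping_variants(read_dgv(io.StringIO(SAMPLE)), "17", 29490284)
print([row["variantaccession"] for row in hits])
```

A linear scan is adequate for a sketch; for repeated lookups an interval index per chromosome would avoid rereading the whole file.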

DECIPHER control data

We downloaded the “Population Copy-Number Variation Frequencies” file on Sep-7-2016.
File available at: https://decipher.sanger.ac.uk/about#downloads/data
Database: https://decipher.sanger.ac.uk/
Assembly: GRCh37 (hg19)

GO terms

Human:

EMBL-EBI QuickGO (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gaf.gz)
Date modified: 11/28/16, 9:35:00 AM (the file)

Rat:

FTP on RGD (ftp://ftp.rgd.mcw.edu/pub/ontology/annotated_rgd_objects_by_ontology/rattus_genes_go)
Last update: Mar 3, 2017

Mouse:

MGI result page -> All GO Annotations -> GO Graph: the table below the graph (http://www.informatics.jax.org/marker/gograph/{{MGI ID}})
Last update: Nov 19 00:01 UTC (MARRVEL crawler)

Zebrafish:

ZFIN result page -> GENE ONTOLOGY -> "GO Terms (all ?)" (http://zfin.org/action/marker/marker-go-view/{{ZFIN ID}})
Last update: Nov 18 21:56 UTC (MARRVEL crawler)

Fly:

Directly from the FlyBase result page (http://flybase.org/reports/{{flybase id}}.html)
Only "Terms Based on Experimental Evidence."
Last update: Oct 26 20:01 UTC (MARRVEL crawler)

Worm:

WormBase REST API (http://www.wormbase.org/about/userguide/for_developers/API-REST/Go_term#0--10)
Last update: Nov 9 22:17 UTC (MARRVEL crawler)

Budding Yeast:

Directly from the SGD result page (http://www.yeastgenome.org/locus/{{SGD ID}}/overview)
Last update: Nov 18 22:11 UTC (MARRVEL crawler)

Fission Yeast:

Directly from the PomBase result page (http://www.pombase.org/spombe/result/{{pombase id}})
Last update: Nov 18 23:04 UTC (MARRVEL crawler)
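
Several of the URLs in this section contain {{...}} placeholders for a database-specific identifier. As a rough illustration of how such a template can be filled in, here is a small Python sketch. The helper name, the case-insensitive key handling, and the placeholder IDs in the usage lines are assumptions for this example; how the MARRVEL crawler builds its URLs is not described here.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches "{{ name }}" and captures the name without surrounding spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Replace each {{name}} with values[name.lower()], URL-escaped.

    Raises KeyError if a placeholder has no matching value.
    """
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1).lower()
        # Keep ':' unescaped so IDs of the form 'PREFIX:number' stay readable.
        return quote(str(values[key]), safe=":")

    return _PLACEHOLDER.sub(substitute, template)


# Placeholder identifiers, not real database entries.
print(fill_template(
    "http://www.informatics.jax.org/marker/gograph/{{MGI ID}}",
    {"mgi id": "MGI:0000000"},
))
print(fill_template(
    "http://flybase.org/reports/{{flybase id}}.html",
    {"flybase id": "FBgn0000000"},
))
```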

Organ/Tissue expression data

Human

GTEx

http://www.gtexportal.org/static/datasets/gtex_analysis_v6p/rna_seq_data/GTEx_Analysis_v6p_RNA-seq_RNA-SeQCv1.1.8_gene_median_rpkm.gct.gz
V6
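
The GTEx file above is in GCT format: a version line, a dimensions line, a header row (Name, Description, then one column per tissue), and one row per gene. The Python sketch below reads such a file and returns the per-tissue values for one gene. It assumes the Description column holds the gene symbol; the function name and the sample rows in the test data are made up for illustration. The real file is gzip-compressed, so it would be opened with gzip.open(path, "rt") before being passed in.

```python
from typing import Dict, Iterable, Optional


def median_expression(lines: Iterable[str], gene_symbol: str) -> Optional[Dict[str, float]]:
    """Return {tissue: value} for the first row whose Description matches.

    `lines` is any iterable of text lines from a GCT v1.2 file.
    """
    iterator = iter(lines)
    version = next(iterator).strip()
    if version != "#1.2":
        raise ValueError(f"unexpected GCT version line: {version!r}")
    next(iterator)  # dimensions line (row count, column count); not needed here
    header = next(iterator).rstrip("\n").split("\t")
    tissues = header[2:]  # first two columns are Name and Description
    for line in iterator:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == gene_symbol:
            return dict(zip(tissues, (float(value) for value in fields[2:])))
    return None
```

Returning a plain dictionary keyed by tissue name keeps the result easy to sort or plot without committing to any particular data-frame library.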

Protein Atlas

http://www.proteinatlas.org/{{ensemblID}}.xml
Updated: Dec 14 17:18 UTC, 2016

Rat

Rat BodyMap: Yu, Y., Fuscoe, J.C., Zhao, C., Guo, C., Jia, M., Qing, T., Bannon, D.I., Lancashire, L., Bao, W., Du, T., et al. (2014). A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat. Commun. 5, 3230.

Mouse

MGI result page, "Mutations, Alleles, and Phenotypes" section, "Phenotype Overview".
Last update: Nov 8 22:17 UTC

Zebrafish

ZFIN downloadable data (http://zfin.org/downloads), "Expression data for wildtype fish"
Last update: 2016.11.07 (the file)

Fly

FlyBase result page -> FlyAtlas Anatomy Microarray -> download data (TSV)

Only when "Back-to-back Scales" or "Heatmap" is available.
Last update: Nov 7 18:52 UTC (MARRVEL crawler)

Protein Domain

Directly from DIOPT Best score page
Last update: Mar 6, 2017 (MARRVEL crawler)

The Undiagnosed Diseases Network's Model Organism Screening Center, in collaboration with Dr. Zhandong Liu's and Dr. Norbert Perrimon's bioinformatics teams, developed MARRVEL as a public tool to efficiently analyze potentially pathogenic variants in patients with rare or undiagnosed diseases.

Online Mendelian Inheritance in Man (OMIM)

The three main pieces of information that we draw from OMIM are gene function, associated phenotypes, and reported alleles. It is helpful to know whether a gene is associated with a known Mendelian phenotype whose molecular basis is known (# entries). Genes without such an association are candidates for novel gene discovery. For genes that do have one, if the patient's phenotype does not match the reported disease and phenotype, or those of the patients in the literature, this increases the opportunity to provide a phenotypic expansion.

ExAC

ExAC contains more than 60,000 exomes and is, other than gnomAD (http://gnomad.broadinstitute.org/), the largest public collection of exomes that have been selected against individuals with severe early-onset Mendelian phenotypes. For MARRVEL’s purposes, ExAC serves as the best source of minor allele frequencies in a control population. We are interested in two sets of outputs from ExAC. The first is the gene-centric overview of the expected versus observed number of missense and loss-of-function alleles. A metric called pLI (probability of loss-of-function intolerance) ranges from 0 to 1 and is likely related to how essential both copies of a gene are before reproductive age. A pLI score of 1 means that the gene is very intolerant of loss-of-function variants and is under selective constraint. The second output is the ExAC data that pertain to the specific variant.

ClinVar

ClinVar is a public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. Variants with interpretations reported by researchers and clinicians are valuable for assessing how likely a variant is to be pathogenic.

Geno2MP

Geno2MP is a collection of samples from the University of Washington Center for Mendelian Genetics. It contains about 5,000 exomes of affected individuals and unaffected relatives. This database links phenotypic and mode-of-inheritance information to specific alleles. For phenotype, we focus on comparing the affected organ system of the patient to that of the affected individuals in Geno2MP. A match in allele, mode of inheritance, and phenotype increases the probability that the variant is pathogenic. However, because of the small sample size, a negative association does not necessarily decrease a variant’s pathogenic priority.

Database of Genomic Variants (DGV)

To our knowledge, DGV is the largest public-access collection of structural variants, drawn from more than 54,000 individuals. The database includes samples of individuals who were reportedly healthy at the time of ascertainment, from up to 72 different studies. Possible limitations of these data include variation in the source and method of data acquisition, the lack of information regarding incomplete penetrance of pathogenic CNVs, and uncertainty about whether individuals will develop associated diseases subsequent to data collection.

DECIPHER (controls)

The data displayed on MARRVEL (without user registration) include common variants from the control population. The data displayed include structural variants that cover the genomic location of the input variant. DECIPHER also contains variant and phenotypic information for affected individuals (accessible in the next version update via user registration).

DIOPT

DIOPT provides a multiple protein sequence alignment of the best predicted orthologs in six model organisms against the protein sequence of the human gene of interest. The alignment provides information on the conservation of specific amino acids as well as functional protein domains.

SGD, PomBase, WormBase, FlyBase, ZFIN, MGI, and RGD

We collected data from multiple model organism databases and provide a summary of the biological and genetic functions of the predicted orthologs derived by DIOPT.

One of many ways that users can use the data provided by MARRVEL is by following the user-driven analysis as depicted in the Figure below.

You can report any errors and provide feedback by clicking on the ‘Feedback’ button which is displayed on the right side of our portal.

In the next version we are planning to add:

  • AJHG abstracts
  • GeneReviews
  • And others

We will also be working on incorporating pathogenicity prediction algorithms, among other features.

We welcome recommendations from our users regarding future databases to include in our MARRVEL analysis pipeline. Please submit any suggestions through the ‘Feedback’ button displayed on the right.