Wang J, Al-Ouran R, Hu Y, Kim SY, Wan YW, Wangler MF, Yamamoto S, Chao HT, UDN Consortium, Comjean A, Mohr SE, Perrimon N, Liu Z, Bellen HJ (2017) MARRVEL: integration of human and model organism genetic resources to facilitate functional annotation of the human genome. American Journal of Human Genetics. doi:10.1016/j.ajhg.2017.04.010 PMID:28502612


MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) aims to facilitate the use of public genetic resources to prioritize rare human gene variants for study in model organisms. To streamline the search and present all the data in a single display, we extract data from human databases (OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER) for efficient variant prioritization. The protein sequences for eight organisms (S. cerevisiae, S. pombe, C. elegans, D. melanogaster, D. rerio, M. musculus, R. norvegicus, and H. sapiens) are aligned, with protein domain information highlighted, through a collaboration with DIOPT. Key biological and genetic features are then extracted from existing model organism databases (SGD, PomBase, WormBase, FlyBase, ZFIN, MGI, and RGD).

Background and Significance

As whole exome and genome sequencing are incorporated into personal health care, we are faced with an abundance of rare variants of unknown function. The lack of in vivo functional studies of variants makes sequencing data even harder to interpret and contributes to an average genetic diagnostic rate of 30%. To understand the impact of these human variants and to increase the rate of diagnosis, it is critical to gather knowledge about the gene and the variant that helps us determine the significance of the findings. This information can be found in human genetic data sets as well as in molecular, biological, and phenotypic data generated in a variety of genetic model organisms. Once this information is gathered and analyzed, it sets the stage for diagnostic interpretation and in-depth studies of novel pathogenic mechanisms.

Overview of the System flow

Patients with undiagnosed diseases that may have an underlying genetic etiology are increasingly being referred for Whole Exome Sequencing or Whole Genome Sequencing. Sequencing can produce a long list of candidate variants. Starting from a candidate human variant that may be disease causing, MARRVEL collects data from multiple sources simultaneously; these data are used to determine how likely the variant is to cause a rare genetic disease. We also aim to guide variant analysis further by transitioning to model organisms. In collaboration with Norbert Perrimon’s lab, we examine the conservation of the specific variant of interest in homologs/orthologs across model organisms and provide a concise summary of what is known about these genes. This is an important step in selecting the appropriate model organism in which to study candidate genes and variants.

To make the most efficient use of limited time and resources, MARRVEL aggregates several key public resources used to assess how likely a variant is to be pathogenic. These resources are especially valuable for identifying rare genetic disease genes, and together they comprise a rich source of information for new disease gene discovery. The next stage of MARRVEL’s analysis is to curate the information available for candidate genes and variants across multiple model organisms, in order to evaluate conservation and assess what is already known about the homologous genes in those organisms.

For MARRVEL’s first set of data describing variants of interest, we selected the following six core public human genetics databases: OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. These databases are useful for determining the allele frequency of the variant of interest and whether individuals carrying the variant exhibit phenotypes similar to those of the patient of interest. Additional databases will be added as they become available. In our interface, we collect and curate the critical information used for variant prioritization.
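As an illustration of how allele frequency can feed into prioritization, the following is a minimal sketch rather than MARRVEL's actual implementation. The `CandidateVariant` record, the `prioritize_rare` function, the placeholder gene names, and the 0.001 frequency cutoff are all assumptions made for this example; a variant that is absent from the reference population is represented here as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateVariant:
    """Hypothetical record for one candidate variant (not a MARRVEL data type)."""
    gene: str
    change: str
    # Allele frequency in a reference population such as ExAC;
    # None means the variant was not observed there.
    population_af: Optional[float]


def prioritize_rare(variants: List[CandidateVariant],
                    max_af: float = 1e-3) -> List[CandidateVariant]:
    """Keep variants at or below max_af (an assumed cutoff), rarest first."""
    rare = [v for v in variants
            if v.population_af is None or v.population_af <= max_af]
    # Unobserved variants are treated as frequency 0 for sorting purposes.
    return sorted(rare, key=lambda v: v.population_af or 0.0)


candidates = [
    CandidateVariant("GENE_A", "p.Arg10Trp", 0.02),    # common: filtered out
    CandidateVariant("GENE_B", "p.Gly5Ser", None),     # never observed: kept
    CandidateVariant("GENE_C", "p.Leu7Pro", 0.0002),   # rare: kept
]
shortlist = prioritize_rare(candidates)
```

In practice the cutoff depends on the suspected inheritance pattern and disease prevalence, and frequency is only one of several lines of evidence alongside the phenotype information in the databases listed above.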

MARRVEL’s second set of data facilitates variant analysis in model organisms by providing known functional data and supporting further annotation of gene function. In collaboration with Norbert Perrimon’s team, we have expanded the tools to: (1) identify potential orthologs in 7 model organisms (budding yeast, fission yeast, worm, fly, zebrafish, mouse, and rat) via DIOPT (DRSC Integrative Ortholog Prediction Tool), (2) align model organism protein sequences and annotate protein domains and the amino acid change of interest for conservation analysis, and (3) provide gene ontology terms supported by experimental evidence, together with tissue expression patterns.
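The conservation analysis in step (2) can be pictured with a small sketch. It assumes the protein sequences have already been aligned (gaps written as `-`) and simply reads off the residue each organism has at the alignment column holding the human amino acid of interest. The function names and the toy sequences are hypothetical and are not taken from MARRVEL or DIOPT.

```python
from typing import Dict


def alignment_column(aligned_reference: str, residue_pos: int) -> int:
    """Map a 1-based residue position in the ungapped reference sequence
    to the 0-based column of that residue in the gapped alignment."""
    if residue_pos < 1:
        raise ValueError("residue positions are 1-based")
    seen = 0
    for col, ch in enumerate(aligned_reference):
        if ch != "-":
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError("residue position is beyond the end of the sequence")


def residues_at_position(alignment: Dict[str, str], residue_pos: int,
                         reference: str = "human") -> Dict[str, str]:
    """Return the residue (or '-' for a gap) each organism has at the
    column of the reference residue of interest."""
    col = alignment_column(alignment[reference], residue_pos)
    return {organism: seq[col] for organism, seq in alignment.items()}


# Toy pre-aligned sequences of equal length (illustrative only).
toy_alignment = {
    "human": "MK-RLT",
    "mouse": "MK-RLT",
    "fly":   "MKARIT",
    "yeast": "M--KLS",
}
column_residues = residues_at_position(toy_alignment, residue_pos=3)
human_residue = column_residues["human"]
conserved_in = [org for org, aa in column_residues.items()
                if org != "human" and aa == human_residue]
```

In this toy case the third human residue is identical in mouse and fly but differs in yeast, which is the kind of observation that can inform the choice of model organism.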

Team members and Collaborators


Julia Wang
Rami Al-Ouran
Yanhui (Claire) Hu
Seon-Young Kim
Ying-Wooi Wan
Michael Wangler
Shinya Yamamoto
Hsiao-Tuan Chao
Aram Comjean
Stephanie Mohr
Norbert Perrimon
Hugo Bellen
Zhandong Liu




Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), October 2016. World Wide Web URL: https://omim.org/


Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T., … Consortium, E. A. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), 285–291.
The authors would like to thank the Exome Aggregation Consortium and the groups that provided exome variant data for comparison. A full list of contributing groups can be found at http://exac.broadinstitute.org/about.


Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2015 Nov 17. PubMed PMID: 26582918.


Geno2MP, NHGRI/NHLBI University of Washington-Center for Mendelian Genomics (UW-CMG), Seattle, WA.
The authors would like to thank the University of Washington Center for Mendelian Genomics and all contributors to Geno2MP for use of data included in Geno2MP.


MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW. The database of genomic variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2013 Oct 29. PubMed PMID: 24174537


DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources. Firth, H.V. et al. (2009). Am. J. Hum. Genet. 84, 524-533 (DOI: 10.1016/j.ajhg.2009.03.010)

This study makes use of data generated by the DECIPHER community. A full list of centres who contributed to the generation of the data is available from http://decipher.sanger.ac.uk and via email from decipher@sanger.ac.uk. Funding for the project was provided by the Wellcome Trust.

We thank the following for their expert advice and feedback:
Undiagnosed Diseases Network Model Organism Working Group and Coordinating Center
Jim Lupski, Richard Gibbs, Zeynep Akdemir, John Seavitt, George Eisenhoffer, Swathi Arur, and Grzegorz Ira


Hu Y, Flockhart I, Vinayagam A, et al. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics. 2011;12(1):357.


Wildeman M et al. (2008). Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 29, 6-13

Model Organism Databases


Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Karra K, Krieger CJ, Miyasato SR, Nash RS, Park J, Skrzypek MS, Simison M, Weng S, Wong ED (2012) Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. Jan;40(Database issue):D700-5. [PMID: 22110037]


McDowall MD, Harris MA, Lock A, Rutherford K, Staines DM, Bähler J, Kersey PJ, Oliver SG, Wood V. (2015) PomBase 2015: updates to the fission yeast database.
Nucleic Acids Res. 43:D656-61.
PMID:22039153 DOI: 10.1093/nar/gkr853


Kevin L. Howe, Bruce J. Bolt, Scott Cain, Juancarlos Chan, Wen J. Chen, Paul Davis, James Done, Thomas Down, Sibyl Gao, Christian Grove, Todd W. Harris, Ranjana Kishore, Raymond Lee, Jane Lomax, Yuling Li, Hans-Michael Muller, Cecilia Nakamura, Paulo Nuin, Michael Paulini, Daniela Raciti, Gary Schindelman, Eleanor Stanley, Mary Ann Tuli, Kimberly Van Auken, Daniel Wang, Xiaodong Wang, Gary Williams, Adam Wright, Karen Yook, Matthew Berriman, Paul Kersey, Tim Schedl, Lincoln Stein, Paul W. Sternberg (2016) Nucleic Acids Res, 44, D774-80.


Attrill H, Falls K, Goodman JL, Millburn GH, Antonazzo G, Rey AJ, Marygold SJ; the FlyBase Consortium. (2016) FlyBase: establishing a Gene Group resource for Drosophila melanogaster.
Nucleic Acids Res. 44(D1):D786-D792


Ruzicka et al., ZFIN, The zebrafish model organism database: Updates and new directions.
Genesis. 2015 53(8):498-509.


Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE; The Mouse Genome Database Group. 2015. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease. Nucleic Acids Res. 2015 Jan 28;43(Database issue):D726-36.


Shimoyama M, De Pons J, Hayman GT, Laulederkind SJ, Liu W, Nigam R, Petri V, Smith JR, Tutaj M, Wang SJ, Worthey E, Dwinell M, Jacob H.
Nucleic Acids Res. 2015 Jan 28;43(Database issue):D743-50.


Undiagnosed Diseases Network

What's new

  • v1.0.1: Added IMPC link for mouse homologs
  • v1.0.0: Added the other model organisms
  • Beta v0.3.2: Added model organism (mouse and fly) search
  • Beta v0.3.1: Rat, GTEx, HGVS nomenclature by Mutalyzer
  • Beta v0.3.0: ClinVar, human GO term, multiple protein alignment, DIOPT v6
  • Beta v0.2.0: Model organisms (yeast, fission yeast, worm, fly, zebrafish, mouse)
  • Beta v0.1.0: ExAC, Geno2MP, DGV, DECIPHER