NIEHS
Eric Torskey
(et@gs.washington.edu)

Page last updated:
March 7, 2008


Local Prettybase Input File:
EGP Finished Gene Prettybase Input File:
Rare Allele Percentage (integer, 0 to 50):
Cluster and/or Draw Trees For:
Linkage Disequilibrium Plot:    LD Shader Mode:
LD Min.:    LD Max.:

Displaying Genotype Data: Visual Genotypes

The display and interpretation of large genotype data sets can be simplified by using a graphical display. We have found it useful to present complete raw datasets of individuals' genotype data using a display format called a visual genotype (VG) (see Nickerson et al., Nature Genetics, 19:233-240, 1998, and Rieder et al., Nature Genetics, 22:59-60, 1999). This format presents all data in an array of samples (rows) × polymorphic sites (columns) and encodes each diallelic polymorphism according to the following general color scheme:

blue - homozygous genotype for the common allele
red - heterozygous genotype (both common and rare allele)
yellow - homozygous genotype for the rare allele
gray - missing data (N)

This array format makes it possible to compare the data visually across both individuals' diplotypes (rows) and polymorphic sites (columns).
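The color scheme above can be sketched in a few lines of Python. This is an illustration only, not the site's own code: the function name `classify_site` and the sample `IND4` are hypothetical, and the sketch assumes that the common allele is simply the most frequent allele among the samples supplied.

```python
from collections import Counter


def classify_site(genotypes):
    """Map each sample's (allele1, allele2) pair at one site to a display color."""
    # Count alleles across all samples, ignoring missing data ("N").
    counts = Counter(a for pair in genotypes.values() for a in pair if a != "N")
    # Assumption: the common allele is the most frequent one in this data set.
    common = counts.most_common(1)[0][0] if counts else None
    colors = {}
    for sample, (a1, a2) in genotypes.items():
        if "N" in (a1, a2):
            colors[sample] = "gray"      # missing data
        elif a1 != a2:
            colors[sample] = "red"       # heterozygous
        elif a1 == common:
            colors[sample] = "blue"      # homozygous, common allele
        else:
            colors[sample] = "yellow"    # homozygous, rare allele
    return colors


# Hypothetical genotypes at one site (IND4 is added so that A is clearly common).
site_200 = {
    "IND1": ("A", "A"),
    "IND2": ("A", "G"),
    "IND3": ("G", "G"),
    "IND4": ("A", "A"),
}
site_colors = classify_site(site_200)
```

Applying such a function to every site, column by column, would yield the samples x sites color array described above.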

We have established a specific format for uploading genotype data; see the Formatting Guidelines section below for details.

Displaying Your Own Data

From this page you can upload a file (use the Browse button to locate it on your own computer) and have a visual genotype generated from it. The image appears in a separate browser window.

Displaying EGP Finished Data

You can also view data from our Finished Genes list directly and display these visual genotypes interactively. Use the drop-down list titled "EGP Finished Gene Prettybase Input File" to select a gene.

Display Options

The visual genotype can be adjusted by filtering out polymorphisms whose rare allele frequency falls below a minimum cutoff. To do so, enter the cutoff as an integer percentage from 0 to 50 in the Rare Allele Percentage box.
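As a rough illustration of this filter, the following Python sketch computes the rare allele percentage at each site and drops sites below the cutoff. The function names are hypothetical, and two details are assumptions rather than documented behavior: missing genotypes ("N") are excluded from the frequency calculation, and a site exactly at the cutoff is kept.

```python
from collections import Counter


def rare_allele_percentage(pairs):
    """Return the rare allele frequency at one diallelic site, as a percentage."""
    # Assumption: missing alleles ("N") do not count toward the total.
    counts = Counter(a for pair in pairs for a in pair if a != "N")
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # entirely missing, or only one allele observed
    return 100.0 * min(counts.values()) / total


def filter_sites(sites, cutoff):
    """Keep sites whose rare allele percentage is at least `cutoff` (0 to 50)."""
    if not 0 <= cutoff <= 50:
        raise ValueError("cutoff must be between 0 and 50")
    return {
        pos: pairs
        for pos, pairs in sites.items()
        if rare_allele_percentage(pairs) >= cutoff  # assumption: cutoff is inclusive
    }


# Genotypes per site, in sample order IND1, IND2, IND3.
example_sites = {
    200: [("A", "A"), ("A", "G"), ("G", "G")],
    300: [("N", "N"), ("C", "C"), ("C", "T")],
}
kept = filter_sites(example_sites, 30)
```

With a cutoff of 30, site 300 (one T among four called alleles) is dropped and site 200 is kept.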

To cluster sites, we calculate r² for all pairs of sites in a file and then order the sites so that those with similar genotype patterns (high r²) are shown near one another. We use the unweighted average linkage algorithm (UPGMA) to generate a hierarchical tree of cluster relationships (see more on VG2 site clustering).
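The clustering idea can be sketched as follows. This is a minimal, unoptimized illustration and not the VG2 implementation. It rests on several assumptions: genotypes are coded as the number of copies of one allele (0, 1, or 2, with `None` for missing data), r² is taken as the squared Pearson correlation of those codes, and the distance between two sites is 1 - r². The function names are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a site with no variation carries no correlation signal
    return sxy * sxy / (sxx * syy)


def upgma_order(sites):
    """Return site names in the leaf order of a naive UPGMA tree."""
    names = list(sites)
    # Assumed distance between two sites: 1 - r2.
    dist = {
        frozenset(p): 1.0 - r_squared(sites[p[0]], sites[p[1]])
        for p in combinations(names, 2)
    }
    clusters = [(name,) for name in names]  # each cluster is a tuple of leaves

    def average_distance(c1, c2):
        # UPGMA: mean of all leaf-to-leaf distances between two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return list(clusters[0]) if clusters else []


# Hypothetical genotype codes for four sites across four samples.
coded_sites = {
    "s1": [0, 1, 2, 0],
    "s3": [2, 0, 1, 1],
    "s2": [0, 1, 2, 0],
    "s4": [0, 1, 2, 1],
}
order = upgma_order(coded_sites)
```

Because `s1` and `s2` have identical genotype patterns, they are merged first and end up next to each other in `order`, even though they are not adjacent in the input.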

How to Save A Visual Genotype Image

  1. Move the mouse pointer over the image you wish to save.
  2. Press the right mouse button.
  3. Choose "Save Image As..." (Netscape) or "Save Picture As..." (Internet Explorer).
  4. Choose a target directory and type in the desired filename.
  5. Click the "OK" button.

Formatting Guidelines

Genotype Data Format:

The input file must be tab-delimited, with four fields on each line, in the format:

<Site Position><tab><Sample ID><tab><Allele 1><tab><Allele 2>

e.g.

200 IND1 A A

This line represents a polymorphism at position 200 in a sample named IND1 with a homozygous genotype for allele A (both Allele 1 and Allele 2 are A). Every site must have one such line for *all* samples listed in your input file. If a sample was not genotyped at a specific site, enter its genotype as "N N".

Multiple Site Example:

200 IND1 A A
200 IND2 A G
200 IND3 G G
300 IND1 N N
300 IND2 C C
300 IND3 C T
(etc. for more sites)
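Before uploading, a file in this format can be checked with a short script. The sketch below is one possible approach, not a tool provided by this site: the function names are hypothetical, and it assumes that site positions are integers and that fields are separated by literal tab characters.

```python
import csv
import io
from collections import defaultdict


def parse_genotypes(text):
    """Parse tab-delimited lines of: position, sample ID, allele 1, allele 2."""
    sites = defaultdict(dict)
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, 1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 tab-separated fields, got {len(row)}"
            )
        position, sample, a1, a2 = (field.strip() for field in row)
        # Assumption: site positions are integers.
        sites[int(position)][sample] = (a1, a2)
    return dict(sites)


def missing_samples(sites):
    """Report, per site, any samples that appear elsewhere but not at that site."""
    all_samples = set()
    for samples in sites.values():
        all_samples.update(samples)
    return {
        pos: sorted(all_samples - set(samples))
        for pos, samples in sites.items()
        if all_samples - set(samples)
    }


example_text = (
    "200\tIND1\tA\tA\n"
    "200\tIND2\tA\tG\n"
    "200\tIND3\tG\tG\n"
    "300\tIND1\tN\tN\n"
    "300\tIND2\tC\tC\n"
    "300\tIND3\tC\tT\n"
)
parsed = parse_genotypes(example_text)
gaps = missing_samples(parsed)
```

For the example above, `gaps` is empty because every sample has a line at every site; a sample left out at some site would be listed under that site's position, signaling that an "N N" line should be added.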

If you have questions about the data format, please see our Finished Genes page, select a gene, and view its Individual Genotypes file.

 
National Institute of Environmental Health Sciences, Environmental Genome Project, UW NIEHS