NIEHS
Webmaster
Eric Torskey
(et@gs.washington.edu)

Page last updated:
March 7, 2008


Local Phasebase Input File:
Haplotype Sorting:
EGP Finished Gene Phasebase Input File:
Rare Allele Percentage (integer, 0 to 50):
Cluster and/or Draw Trees For:
Generate Representative Phasebase:

Displaying Estimated Haplotype Data: Visual Haplotypes

The display and interpretation of large sets of DNA polymorphism data can be simplified by using a graphical display. We have found it useful to present complete raw datasets of individuals' genotype data using a display format called a visual genotype (VG) (see Nickerson et al., Nature Genetics, 19:233-240, 1998, and Rieder et al., Nature Genetics, 22:59-60, 1999). We have adapted this same format to display haplotype data that are computationally inferred from genotype data. As with visual genotypes, the data are presented as an array of sample haplotypes (rows) by polymorphic sites (columns), and each polymorphism found on a chromosome is encoded according to a general color scheme:

blue = common allele
yellow = rare allele (entered as a homozygous genotype for the rare allele, because each haplotype allele is listed twice)
gray = missing data (N)

This array format allows one to inspect the data visually across both individuals' haplotypes and polymorphic sites to make comparisons. In many cases, presenting the data in this visual haplotype format reveals the results of recombination, which has transferred blocks of chromosomal segments between haplotypes.
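The color scheme above can be illustrated with a minimal Python sketch. It assumes haplotypes are held as equal-length strings (one character per polymorphic site, "N" for missing data) and that the "common" allele at a site is simply the most frequent non-missing allele, with ties broken alphabetically. These are assumptions for illustration; the function names `common_alleles` and `color_matrix` are hypothetical and are not part of this tool.

```python
from collections import Counter

# Hypothetical color labels following the scheme described above.
BLUE, YELLOW, GRAY = "blue", "yellow", "gray"


def common_alleles(haplotypes):
    """Return the most frequent non-missing allele at each site (column).

    haplotypes: list of equal-length strings, one per haplotype (row),
    with one allele per polymorphic site and "N" for missing data.
    Ties are broken alphabetically (an assumption, not the tool's rule).
    """
    result = []
    for j in range(len(haplotypes[0])):
        counts = Counter(h[j] for h in haplotypes if h[j] != "N")
        if not counts:
            result.append(None)  # every haplotype is missing at this site
            continue
        # Highest count first; alphabetical order breaks ties deterministically.
        allele, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append(allele)
    return result


def color_matrix(haplotypes):
    """Encode each haplotype/site cell as blue (common), yellow (rare) or gray (N)."""
    common = common_alleles(haplotypes)
    matrix = []
    for h in haplotypes:
        row = []
        for allele, ref in zip(h, common):
            if allele == "N":
                row.append(GRAY)
            elif allele == ref:
                row.append(BLUE)
            else:
                row.append(YELLOW)  # any non-common allele is treated as rare
        matrix.append(row)
    return matrix


for row in color_matrix(["AC", "GC", "AT", "AN"]):
    print(row)
```

For these four haplotypes, "A" and "C" are the common alleles, so the rows print as `['blue', 'blue']`, `['yellow', 'blue']`, `['blue', 'yellow']` and `['blue', 'gray']`.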

We have established a specific format for uploading haplotype data; complete formatting guidelines are given below.

Displaying Your Own Data

Using this page, you can upload a file (use the browse button to find it on your own computer) and have a visual haplotype produced. The image appears in a separate browser window.

Displaying EGP Finished Data

You can also view data from our Finished Genes list directly and display them interactively as visual haplotypes. Select the "EGP Finished Gene Phasebase Input File" button and use the drop-down list to select a gene.

Display Options

The visual haplotype display can be adjusted by filtering out polymorphic sites whose rare allele frequency falls below a minimum cutoff. To do this, enter the cutoff as an integer percentage (0 to 50) in the "Rare Allele Percentage" box. Haplotypes can also be sorted (by individual sample or by haplotype frequency) or clustered based on sample or site similarity.
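The site filter and frequency sort described above can be approximated with the Python sketch below. It assumes the rare allele percentage at a site is the share of non-missing haplotypes that do not carry the most common allele, and that sites at or above the cutoff are kept. The tool's exact definitions are not documented here, so treat this as an illustration; `rare_allele_percent`, `filter_sites` and `sort_by_frequency` are hypothetical names.

```python
from collections import Counter


def rare_allele_percent(column):
    """Percentage of non-missing haplotypes not carrying the most common allele."""
    counts = Counter(a for a in column if a != "N")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # site with only missing data
    return 100.0 * (total - max(counts.values())) / total


def filter_sites(haplotypes, cutoff_percent):
    """Drop sites whose rare allele percentage is below cutoff_percent (0 to 50)."""
    if not 0 <= cutoff_percent <= 50:
        raise ValueError("cutoff_percent must be between 0 and 50")
    keep = [
        j
        for j in range(len(haplotypes[0]))
        if rare_allele_percent([h[j] for h in haplotypes]) >= cutoff_percent
    ]
    return ["".join(h[j] for j in keep) for h in haplotypes]


def sort_by_frequency(haplotypes):
    """Order haplotypes so the most frequent come first (ties keep input order)."""
    counts = Counter(haplotypes)
    return sorted(haplotypes, key=lambda h: -counts[h])  # sorted() is stable


haps = ["ATG", "ACG", "GCG", "ACG"]
filtered = filter_sites(haps, 10)  # third site is monomorphic (0%), so it is dropped
print(filtered)
print(sort_by_frequency(filtered))
```

Here the first two sites each have a 25% rare allele frequency and survive a 10% cutoff, giving `['AT', 'AC', 'GC', 'AC']`; sorting by frequency then yields `['AC', 'AC', 'AT', 'GC']`. Clustering by sample or site similarity would need a distance measure (for example, counting mismatched sites), which is beyond this sketch.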

Formatting Guidelines

Haplotype Data Format:

This format has been adapted from the genotype data format. To accommodate haplotype data, each line lists the haplotype allele twice, so that a single allele is represented in the two allele fields of a genotype entry.

This file must be a tab-delimited file with four fields on each line, in the format:

<Site Position><tab><Sample ID><tab><Haplotype 1 Allele><tab><Haplotype 1 Allele>
<Site Position><tab><Sample ID><tab><Haplotype 2 Allele><tab><Haplotype 2 Allele>

e.g.

200 IND1 A A
200 IND1 G G

These lines represent two haplotypes from a sample named IND1: one carries an "A" allele and the other a "G" allele at position 200. This pattern must be repeated in your input file for *all* samples at every site (i.e., two lines per sample per site). If a sample has no haplotype allele at a given site, enter it as "N N".
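To make the line layout concrete, here is a minimal Python sketch that writes phased haplotypes in this four-field format. It assumes each sample's two haplotypes are available as strings with one allele per site (using "N" where the allele is missing) and writes all samples for one site before moving to the next, as in the examples on this page. The function name `phasebase_lines` is hypothetical.

```python
def phasebase_lines(sites, samples):
    """Yield tab-delimited input lines in the four-field haplotype format.

    sites:   list of site positions, e.g. [200, 350]
    samples: dict mapping sample ID -> (haplotype1, haplotype2); each
             haplotype is a string with one allele per site, "N" if missing.
    """
    for sample_id, haps in samples.items():
        if len(haps) != 2 or any(len(h) != len(sites) for h in haps):
            raise ValueError(f"{sample_id}: expected two haplotypes of {len(sites)} alleles")
    for i, pos in enumerate(sites):  # all samples for one site, then the next site
        for sample_id, (hap1, hap2) in samples.items():
            for hap in (hap1, hap2):  # haplotype 1 line, then haplotype 2 line
                allele = hap[i]
                # The allele is written twice, mimicking a homozygous genotype.
                yield "\t".join([str(pos), sample_id, allele, allele])


samples = {"IND1": ("AC", "GT"), "IND2": ("GC", "CN")}
for line in phasebase_lines([200, 350], samples):
    print(line)
```

The first four lines printed correspond to the IND1 and IND2 entries at position 200 shown in the examples here (with tabs between fields), followed by four lines for position 350, the last of which is the missing-data entry `350	IND2	N	N`.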

Multiple Sample Example:

200 IND1 A A
200 IND1 G G
200 IND2 G G
200 IND2 C C
200 IND3 C C
200 IND3 T T
(etc. for more sites)
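Before uploading, a file in this format can be checked with a small parser such as the Python sketch below. It assumes, following the format description above, that the first line for a sample at a site is haplotype 1 and the second is haplotype 2, and it accepts only tab-separated fields (the examples on this page are displayed with spaces, but the file itself must be tab-delimited). `parse_phasebase` is a hypothetical helper, not part of the tool.

```python
from collections import defaultdict


def parse_phasebase(lines):
    """Parse four-field haplotype lines into (positions, {sample: [hap1, hap2]}).

    The first line seen for a sample at a site is taken as haplotype 1 and the
    second as haplotype 2. Fields must be separated by tabs.
    """
    alleles = defaultdict(list)  # (sample, position) -> alleles in file order
    positions = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 tab-delimited fields")
        pos, sample, a1, a2 = fields
        if a1 != a2:
            raise ValueError(f"line {lineno}: haplotype allele must be entered twice")
        positions.add(int(pos))
        alleles[(sample, int(pos))].append(a1)

    ordered = sorted(positions)
    samples = list(dict.fromkeys(s for s, _ in alleles))  # first-seen order
    haplotypes = {}
    for s in samples:
        hap1, hap2 = [], []
        for p in ordered:
            pair = alleles.get((s, p), [])
            if len(pair) != 2:  # every sample needs two lines at every site
                raise ValueError(f"{s} at {p}: expected 2 lines, found {len(pair)}")
            hap1.append(pair[0])
            hap2.append(pair[1])
        haplotypes[s] = ["".join(hap1), "".join(hap2)]
    return ordered, haplotypes


text = "200\tIND1\tA\tA\n200\tIND1\tG\tG\n200\tIND2\tG\tG\n200\tIND2\tC\tC\n"
print(parse_phasebase(text.splitlines()))
```

For the first four lines of the example above, this prints `([200], {'IND1': ['A', 'G'], 'IND2': ['G', 'C']})`. A space-separated line, an allele that is not duplicated, or a sample with only one line at a site raises a `ValueError` naming the problem.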

If you have questions about the data format, please see our Finished Genes page, select a gene, and view the Individual Genotypes file.

 
National Institute of Environmental Health Sciences Environmental Genome Project, UW NIEHS