[Genome] liftOVER input format
Ann Zweig
ann at soe.ucsc.edu
Wed Oct 11 11:59:09 PDT 2006
Hello Ellen-
Unfortunately we do not yet host this type of data. One of our developers
found a website that may provide the kind of information you need.
http://genewindow.nci.nih.gov
Quoting from the main page:
"Genewindow is the primary tool for pre- and post-genetic bioinformatics and
analytical work at the Core Genotyping Facility (CGF) at the National Cancer
Institute, which is currently analyzing approximately 75,000 samples at a rate
of four million SNP genotypes per year. While Genewindow is implemented for the
human genome and integrated with the CGF laboratory data, it stands as a useful
tool to assist investigators in the selection of variants for study in vitro, or
in novel genetic association studies."
Paper on Genewindow in Nature Genetics Vol. 37 No. 2 Feb 2005 pages 109-110.
They get their data from http://snp500cancer.nci.nih.gov which has files
available for download.
As to the training for using the UCSC Genome Browser, we do have some resources
available. The company Open Helix provides training materials for the Genome
Browser at http://www.openhelix.com/ucscmaterials.shtml. They also do regional
seminars, and have come to Boston in the past. Check their home page to see
their seminar schedule. I know that there are several users of the Genome
Browser at DFCI -- Open Helix also provides on-site training.
In the future, please direct your questions to the genome mailing list at
genome at soe.ucsc.edu -- our moderated forum for user questions and discussion.
You will likely get a quicker response to your question.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Rachel
Thanks for your prompt reply. I think it is no longer necessary for me to use
the liftOver - I have solved the problem by putting sequence of gene(s) in the
region into BLAT, specifying the July 2003 build. This way I was able to view
my data more closely in relation to specific sequence from the genes of
interest.
However, I do have some other questions, although I am not sure you are the
right person to ask. . . I am looking at a region on Chromosome 14q - (at about
19,000,000 to 23,000,000 in the 2003 build) at the TRAC gene or T cell receptor
alpha common region. There are about 12 or so transcripts overlapping in that
region, most of which code for TRAC, but 3 of which code for TRA@ - which is
also the locus name. All of the transcripts that say "TRAC" list the same
protein (P01848), while the transcripts that say "TRA@" each code for different
proteins (Q8WUD0, Q6P4I7, Q6NSA6) - but all 4 of the proteins have the same aa
sequence in the C terminus and only differ at the N-terminus. So there are
alternative 5 prime exons for these variants, which is also apparent from the
genome browser. So my questions are:
1.) Is there a simple way to use the UCSC system or other databases to check if
these proteins have been validated at the protein level? - or if they are
hypothetical proteins only? It would be useful to know something about them or
their function.
2.) Because this is a T cell receptor gene, it is a region of lots of
rearrangement - both in encoding the normal T-cell receptor repertoire, as well
as in some abnormal rearrangements that have been seen in some types of
leukemia. Is it possible to view these rearrangement breakpoints in the browser
- are they databased? Or do I need to find them in the literature and put them
in myself?
Many thanks for your help - Because of where my work has taken me recently I
have had to become more familiar with the genome browser and associated
informatics tools, but I have little real training - so it is a process of just
figuring things out. This is a good thing, but I sometimes think that I need
some kind of course, or that that might save me time in the long run.
Many thanks,
Ellen Freed
Ellen Freed, PhD
Research Scientist
Department of Medical Oncology
Dana-Farber Cancer Institute
Dana 714A
44 Binney Street
Boston, MA 02115
(617) 632-5957
Ellen_Freed at dfci.harvard.edu
-----Original Message-----
From: Rachel Harte [mailto:hartera at soe.ucsc.edu]
Sent: Tuesday, October 10, 2006 5:36 PM
To: Freed, Ellen
Subject: Re: [Genome] liftOVER input format
Ellen,
Would you please clarify your question further? Are you saying that
you are looking at Known Genes II on hg16, that it was updated in July
2005, and that the gene positions therefore no longer match the annotations
on an Affy SNP chip that was based on the July 2003 human build (hg16)? Are
your genes in a particular location on hg16 (July 2003 build)?
Please give me details of how you are using liftOver e.g. from which
assembly and to which assembly are you lifting. Which parameters have you
chosen or are you just using the defaults? Please give me an example of
the position that you pasted into the liftOver text box. This information
will help me to sort out your problem.
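For reference, the liftOver text box takes either BED lines or positions written as chrN:start-end. The following is only a minimal sketch, assuming the tab-delimited file is a custom track whose first lines are "browser" and "track" header lines followed by chrom, start and end columns. That layout, the function name to_liftover_lines, and the example coordinates are illustrative assumptions, not taken from the actual file.

```python
def to_liftover_lines(text, fmt="position"):
    """Turn custom-track style rows into lines for the liftOver text box.

    Assumed input: optional 'browser'/'track' header lines, then rows of
    chrom, start, end (extra columns are ignored).
    """
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and custom-track header or comment lines.
        if not line or line.startswith(("browser", "track", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if fmt == "bed":
            # BED: keep the 0-based start as it is.
            out.append(f"{chrom}\t{start}\t{end}")
        else:
            # Position strings are 1-based, BED starts are 0-based.
            out.append(f"{chrom}:{start + 1}-{end}")
    return out


# Made-up example row, for illustration only.
example = (
    "browser position chr14:19000000-23000000\n"
    "track name=gains\n"
    "chr14\t21000000\t21050000\tgain1\n"
)
print("\n".join(to_liftover_lines(example)))
```

If the rows are pasted as BED instead, calling to_liftover_lines(example, fmt="bed") keeps the original start and end values and only drops the header lines and extra columns.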
Thank you.
Rachel
On Tue, 10 Oct 2006, Freed, Ellen wrote:
>
> Hi genome browser help,
>
> I have been working with the browser to map positions of amplification and LOH
> detected in SNP analysis onto the genome. We used Affymetrix 100K SNP arrays,
> which are based on the July 2003 genome build. I have one tricky location where
> I am having trouble establishing which genes are included in a small
> amplification present in several samples. Using the July 2003 genome build and
> clicking on the links to specific known genes in the region of interest - I have
> found that the files were updated in 2005, and no longer match the positions in
> July 2003 build.
>
> I am trying to use the liftOVER utility to determine how the old SNP data maps
> onto more current info provided for the genes. When I use the same
> tab-delimited text file I have used to map the positions of gain into the
> browser in the window provided in liftOVER and choose the "position" format, I
> get an error message that is format-related, but am confused about what aspect
> of the format is off. I think probably something in the first 2 lines needs
> changing but am not sure about this - although I have read the instructions.
>
> But, also, I am not sure this is the correct approach. There are only a few
> genes involved and it might be easier if I could just access the exact
> coordinates of the genes as of the July 2003 build. Maybe I can do that using
> BLAT - am trying that next?
>
> I would be very grateful for any advice on how best to accomplish this goal.
>
> Many thanks,
>
> Ellen Freed
>
> Ellen Freed, PhD
> Research Scientist
> Department of Medical Oncology
> Dana-Farber Cancer Institute
> Dana 714A
> 44 Binney Street
> Boston, MA 02115
>
> (617) 632-5957
> Ellen_Freed at dfci.harvard.edu
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>