[Genome] Batch retrieval using gene symbols
Archana Thakkapallayil
archanat at soe.ucsc.edu
Fri Dec 8 09:29:50 PST 2006
Hello Jacob,
The geneSymbol field is located in the kgXref table under the Known
Genes track. Unfortunately, you cannot paste/upload gene symbols as
identifiers to search the kgXref table. You can only search for
identifiers in the primary field of the table, which in this case is the
kgID (the accession). Also, you can't get the sequence information
directly from this table.
However, you could get the information that you are looking for by
retrieving information from the kgXref and the knownGene tables using
the Table Browser. This is going to be a two-step process. First, you
need to find the known gene IDs corresponding to your gene symbols
using the kgXref table, and then use your list of known gene IDs to
extract the promoter/upstream regions that you are interested in, using
the knownGene table.
To do this, make the following selections in the Table Browser:
clade: vertebrate
genome: human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: Known Genes
table: kgXref
Click on the "filter: create" button, paste a whitespace-separated
list of your gene symbols into the geneSymbol text box, and then click
"submit".
Then choose "output format: selected fields from primary and related
tables" and hit "get output".
On the Select Fields page, check the kgID field from the kgXref table
and then hit "get output".
This gives you the list of known gene IDs corresponding to your gene
symbols.
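If you also check the geneSymbol field on the Select Fields page, the
output can be checked offline to see which symbols matched and which did
not. The following is only a minimal sketch: it assumes tab-separated
output with the kgID in the first column and the geneSymbol in the
second, and header lines starting with "#". The function name and the
sample IDs and symbols are made up for illustration.

```python
from collections import defaultdict


def symbols_to_kg_ids(lines, wanted_symbols):
    """Group known gene IDs by gene symbol and report unmatched symbols.

    Assumes each data line is "kgID<TAB>geneSymbol"; lines starting
    with "#" are treated as headers and skipped.
    """
    wanted = set(wanted_symbols)
    hits = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a two-column data line
        kg_id, symbol = fields[0], fields[1]
        if symbol in wanted:
            hits[symbol].append(kg_id)
    missing = sorted(wanted - set(hits))
    return dict(hits), missing


# Made-up example rows; real output comes from the Table Browser.
example = [
    "#kgID\tgeneSymbol\n",
    "acc1\tABC\n",
    "acc2\tABC\n",
    "acc3\tXYZ\n",
]
hits, missing = symbols_to_kg_ids(example, ["ABC", "QQQ"])
```

One symbol can map to more than one known gene ID, so the IDs are kept
as a list per symbol. The symbols returned in "missing" are the ones to
double-check for spelling or alias differences.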
Now back on the Table Browser page, choose "table: knownGene" and
"region: genome". Then paste your list of known gene IDs in the
paste/upload list box and hit the "submit" button.
Then choose "output format: sequence" and hit "get output".
Select "genomic" and hit "submit".
Under "Sequence Retrieval Region Options" check only the box
"Promoter/Upstream by -- bases". You can specify the number of bases you
are interested in using the text box there, and then hit "get sequence".
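To make clear what "upstream" means for genes on each strand, here is a
small sketch of the coordinate arithmetic. It is not the Table Browser's
own code, just an illustration that assumes the knownGene fields
txStart, txEnd and strand, with 0-based, half-open coordinates. The
function name is made up.

```python
def upstream_interval(tx_start, tx_end, strand, n_bases):
    """Return (start, end) of the n_bases region upstream of a transcript.

    Coordinates are assumed 0-based and half-open. On the "+" strand
    the upstream region ends at txStart; on the "-" strand transcription
    begins at txEnd, so the upstream region starts there.
    """
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    if strand == "+":
        # Clamp at 0 so the interval never runs off the chromosome start.
        return max(0, tx_start - n_bases), tx_start
    if strand == "-":
        # Not clamped to chromosome length; that would need the chrom size.
        return tx_end, tx_end + n_bases
    raise ValueError("strand must be '+' or '-'")


plus_region = upstream_interval(1000, 5000, "+", 300)
minus_region = upstream_interval(1000, 5000, "-", 300)
```

For a minus-strand gene the sequence of this interval would also need
to be reverse-complemented to read in the direction of transcription.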
In case you are looking for the actual promoter regions for your
genes, we do have a couple of tracks on hg18 that may help: the 'FirstEF'
track, the TFBS Conserved track and the 'Reg Potential 7 species' track.
The FirstEF track predicts exons, promoters and CpG windows. The 'Reg
Potential 7 species' track predicts regulatory regions. The TFBS track
contains the location and score of transcription factor binding sites
conserved in the human/mouse/rat alignment.
I hope that this helps you. Please let us know if you have further
questions.
Regards,
Archana
UCSC Genome Bioinformatics Group
Jacob Brown wrote:
> Hello,
>
> I am just trying to retrieve promoter regions for ~300 genes using
> the table browser. I am pasting a list of gene symbols in the
> identifier box and using the known genes track. When I ask for
> sequence in output it does not find any hits. Suggestions?
>
> Many thanks,
>
> Jacob Brown
> National Eye Institute
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>