[Genome] using the knownGene table

Kayla Smith kayla at soe.ucsc.edu
Thu Nov 8 17:14:59 PST 2007


Hello Donna,

One quick way to get the positions of all the exons in a track is to use 
the Table Browser as follows:

clade: Vertebrate, genome: Human, assembly: Mar. 2006
group: Genes and Gene Prediction Tracks, track: UCSC Genes
table:  knownGene
region: genome
output format:  custom track

click "get output"

On the next page select "Create one BED record per:"  "Exons"  and click 
"get custom track in file".

This will give you a file with the positions of all the exons. The name 
of each item contains the kgId (the knownGene transcript ID) and the 
exon number, counting from 0.

chr1	1736	2090	uc001aaa.1_exon_0_0_chr1_1737_f	0	+
chr1	2475	2584	uc001aaa.1_exon_1_0_chr1_2476_f	0	+
chr1	3083	4121	uc001aaa.1_exon_2_0_chr1_3084_f	0	+
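If you want to work with this file in a script, the item names can be 
split back into the kgId and the exon number. The following is only a 
rough sketch, not an official UCSC tool: the function name 
parse_exon_bed is made up for illustration, and it assumes every name 
follows the "<kgId>_exon_<n>_..." pattern seen in the three lines above.

```python
from collections import defaultdict


def parse_exon_bed(lines):
    """Group exon records from a BED file by transcript ID (kgId).

    Assumption: each name field looks like '<kgId>_exon_<n>_...'.
    """
    exons = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, track/browser header lines, and short records.
        if len(fields) < 4 or fields[0] in ("track", "browser"):
            continue
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        kg_id, sep, rest = name.partition("_exon_")
        if not sep:
            continue  # name does not follow the assumed pattern
        exon_number = int(rest.split("_")[0])
        exons[kg_id].append((exon_number, chrom, start, end))
    # Sort each transcript's exons by exon number.
    return {kg_id: sorted(recs) for kg_id, recs in exons.items()}


# The three example lines from this message.
sample = [
    "chr1\t1736\t2090\tuc001aaa.1_exon_0_0_chr1_1737_f\t0\t+",
    "chr1\t2475\t2584\tuc001aaa.1_exon_1_0_chr1_2476_f\t0\t+",
    "chr1\t3083\t4121\tuc001aaa.1_exon_2_0_chr1_3084_f\t0\t+",
]

by_transcript = parse_exon_bed(sample)
for kg_id, recs in by_transcript.items():
    print(kg_id, len(recs), "exons")
    for exon_number, chrom, start, end in recs:
        print(f"  exon {exon_number}: {chrom}:{start}-{end}")
```

Grouping by kgId this way gives you the exon count per transcript 
without depending on the order of the lines in the file.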

To filter the results to only show the items you're interested in, go 
back and click on "identifiers: paste list" and paste in your names.

Another way to go about this task is to use the Table Browser as above, 
but for the output format select "all fields from selected tables" 
instead of "custom track", then click "get output".  The "exonCount" 
column gives the number of exons in each transcript, and the 
"exonStarts" and "exonEnds" columns are comma-separated lists giving 
the chromosomal start and end position of each exon.
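If you save that output to a file, something like the following could 
pair up the exonStarts and exonEnds values and report the transcripts in 
the order of your own list. Please treat it as a sketch under some 
assumptions: the function names are made up, the first line is assumed 
to be a tab-separated header beginning with "#name", the sample row is 
a trimmed one assembled from the three BED lines above rather than 
copied from real output, and "uc999zzz.1" is a made-up ID included only 
to show the "not found" case.

```python
def exon_intervals(exon_starts, exon_ends):
    """Turn '1736,2475,3083,' and '2090,2584,4121,' into (start, end) pairs."""
    # The lists end with a trailing comma, so drop empty pieces.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return list(zip(starts, ends))


def read_known_gene(lines):
    """Map each transcript name to its list of exon intervals.

    Assumption: the first non-blank line is a header such as
    '#name<TAB>chrom<TAB>...' naming the columns.
    """
    header = None
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if header is None:
            header = line.lstrip("#").split("\t")
            continue
        row = dict(zip(header, line.split("\t")))
        table[row["name"]] = exon_intervals(row["exonStarts"], row["exonEnds"])
    return table


# Trimmed, illustrative sample (not real Table Browser output).
sample = [
    "#name\tchrom\tstrand\texonCount\texonStarts\texonEnds",
    "uc001aaa.1\tchr1\t+\t3\t1736,2475,3083,\t2090,2584,4121,",
]

table = read_known_gene(sample)

# Report in the order of your own list rather than the file's order.
my_ids = ["uc001aaa.1", "uc999zzz.1"]  # second ID is hypothetical
for name in my_ids:
    exons = table.get(name)
    if exons is None:
        print(name, "not found")
    else:
        print(name, len(exons), "exons:", exons)
```

Looking each ID up in a dictionary keeps your original list order, so it 
does not matter that the Table Browser sorts by chromosome and position.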

I'm not sure what your question about NCBI is, but if you can rephrase 
it I can give it a shot.


I hope this information is helpful to you.  Please don't hesitate to 
contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Donna Toleno wrote:
> 
> Hello mailing list. 
> 
> I am new to using the UCSC resources and only slightly more experienced with NCBI. I am using the knownGene table to obtain the exon start and end sites for transcripts of a list of genes. The output from the table is sorted by chromosome number and position, so the order of the output is different from my list order. Is there a way to do this query in a batch or scripted way so that I know which gene corresponds to each of the transcripts? I considered using the proteinID field in this table to merge together the appropriate information about my list of genes. However, the FAQ for UCSC Genome Bioinformatics seems to advise against using this protein ID. I am not sure how I can translate the protein ID to the UniProt accession number.
> 
> The other piece of information I have on my list of interesting genes is an accession number that begins with NM. I am equally confused by the display of RefSeqs in transcripts and products in the NCBI database. There is some text on this topic at the following link:
> 
> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genefaq.html
> 
> I realize that you probably cannot comment on the NCBI resources. 
> My main goal is to obtain the number of exons and the exon boundaries for a list of genes. I would appreciate any suggestions or advice. Perhaps one of the other tables will get me the information I need.
> 
> 
> Thank you,
> 
> Donna 
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome


