[Genome] [Fwd: Re: find 5' and 3' regions. (fwd)]

Brooke Rhead rhead at soe.ucsc.edu
Wed Oct 25 18:59:21 PDT 2006


Hello William,

Rachel forwarded your question to me.  I think I understand what you 
want to retrieve from the Genome Browser:  the start and end positions 
of the region that is 2000 bases upstream and 2000 bases downstream of 
each gene.  If this is not what you are trying to do, please let me know.

The transcription start and end positions of each gene can be retrieved 
using the Table Browser tool.  Then you can use your own tools to add 
2000 to or subtract 2000 from each value.

To get to the Table Browser, click on the "Tables" link in the blue bar 
at the top of the page.  Then select the clade, genome and assembly you 
wish to use (probably vertebrate, human, Mar. 2006).  Select the "Genes 
and Gene Prediction Tracks" group.  Now you have a choice to make.  The 
track you select depends on which gene set you wish to use.  It sounds 
like you might already be using the RefSeq gene set.  If so, choose 
"RefSeq Genes".  Choose the table at the top of the list in the 
drop-down menu (in the case of RefSeq Genes, it is called "refGene"). To 
get an idea of the type of data contained in the refGene table, hit the 
"describe table schema" button.

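Note that the transcription start positions in our tables are one base 
less than the start positions displayed in the Genome Browser, because 
the tables store start coordinates zero-based while the browser displays 
them one-based.  The end positions are the same in both.  See an 
explanation here:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1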
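
To make the offset concrete, here is a minimal sketch in Python.  The 
function names and the sample value are made up for illustration; they 
are not part of any UCSC tool.

```python
def table_to_browser_start(tx_start):
    """Convert a zero-based table start to the one-based start shown in the browser."""
    return tx_start + 1


def browser_to_table_start(display_start):
    """Convert a one-based displayed start back to the zero-based table value."""
    return display_start - 1


# Hypothetical value: a txStart of 999 in the table is displayed as 1000.
# The txEnd value needs no adjustment.
print(table_to_browser_start(999))
```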

For the "region" option choose "genome" if you want to retrieve all of 
the information at once, or choose "position" and enter a genomic range, 
such as "chr1".  Then choose "selected fields from primary and related 
tables" as the output format, and enter a name for the file you will 
download.

When you hit "get output" you will get a page where you can select the 
fields that will be retrieved from the table.  Select the fields you 
wish to retrieve (probably name, chrom, strand, txStart, txEnd) and hit 
"get output".

Once you have downloaded the file, you can subtract 2000 from each 
txStart position and add 2000 to each txEnd position to get the values 
you are looking for.  Note that for genes on the '+' strand, the txStart 
value is the 5' end of the gene and the txEnd is the 3' end of the gene. 
For genes on the '-' strand, the opposite is true: the txStart value 
is the 3' end, and the txEnd value is the 5' end.  So for a '-' strand 
gene, the upstream (promoter) side is the 2000 bases after txEnd, and 
the downstream side is the 2000 bases before txStart.
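
If you do not already have a tool for this step, the arithmetic could be 
sketched in Python roughly as follows.  This is only an illustration under 
a few assumptions: the downloaded file is tab-separated, comment or header 
lines start with '#', and the columns are in the order name, chrom, 
strand, txStart, txEnd.  The function names are hypothetical.  The sketch 
keeps the table's coordinates as they are, stops the left flank at 0, and 
does not trim the right flank at the chromosome end, because chromosome 
sizes are not in this file.

```python
FLANK = 2000  # number of bases to add on each side


def flanking_regions(tx_start, tx_end, strand, flank=FLANK):
    """Return (upstream, downstream) as (start, end) pairs, relative to the gene's strand."""
    left = (max(0, tx_start - flank), tx_start)   # lower-coordinate side
    right = (tx_end, tx_end + flank)              # higher-coordinate side
    if strand == "+":
        return left, right
    # On the '-' strand the 5' end is txEnd, so the sides swap.
    return right, left


def extend_rows(lines, flank=FLANK):
    """Yield (name, chrom, strand, upstream, downstream) for each data line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the assumed '#' header
        name, chrom, strand, tx_start, tx_end = line.split("\t")[:5]
        up, down = flanking_regions(int(tx_start), int(tx_end), strand, flank)
        yield name, chrom, strand, up, down


# William's ABCC3 numbers, assuming for illustration that the gene is on '+':
print(flanking_regions(46067227, 46124062, "+"))
# ((46065227, 46067227), (46124062, 46126062))
```

To process a whole download, you could pass an open file to extend_rows, 
for example "for row in extend_rows(open(filename)): print(row)", where 
filename is whatever name you entered in the Table Browser.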

I hope this procedure helps you get the information you need.

-- 
Brooke Rhead
UCSC Genome Bioinformatics Group



---------- Forwarded message ----------
Date: Wed, 25 Oct 2006 06:48:35 -0700
From: LIANHE SHAO <lshao2 at jhmi.edu>
To: Rachel Harte <hartera at soe.ucsc.edu>
Subject: Re: [Genome] find 5' and 3' regions.

Rachel,
Thanks for your reply.
We have thousands of genes to look at.
I want to find the start and stop positions of 5' and 3' region of each 
gene outside of the transcription start and end site.
From your refFlat table, I can locate the transcription start and end 
positions, but I cannot find the 5' promoter's starting position or the 
3' UTR ending position.
Some people said it will be difficult to locate them, because many genes 
have no such info yet. They recommended using 2000 bp as the length of 
the promoter region and the UTR region. Say gene ABCC3: transcription 
starts at 46067227 and ends at 46124062, so its 5' region starts at 
46065227 and its 3' region ends at 46126062.
I know it is not accurate, because each gene has a different length of 5' 
and 3' regions, but I just cannot find such info on your site.

BTW, I am not a biologist :-)

Hope it is clear this time.

Thanks a lot.
William

----- Original Message -----
From: Rachel Harte <hartera at soe.ucsc.edu>
Date: Tuesday, October 24, 2006 4:14 pm
Subject: Re: [Genome] find 5' and 3' regions.
To: LIANHE SHAO <lshao2 at jhmi.edu>


> William,
>
> Please would you clarify exactly what you are looking for. Do you want
> the regions that are the 5' and 3' UTR regions of the mRNA transcript
> that is created from a gene? Or do you want the upstream (5') and
> downstream (3') regions outside of the transcription start site for
> genes? If this is the case, what are you trying to locate: is it the
> promoter region or a specified length of region 5' or 3' to the
> transcription start site of a gene? This information will help me to
> help you.
>
> Thanks.
>
> Rachel
>
> Tue, 24 Oct 2006, LIANHE SHAO wrote:
>
>> Hello,
>> I have a question:
>> I want to find out the 5' and 3' regions of all the genes on, say,
>> chromosome 1. How can I do it?
>> When I use the Genomic Sequence Retrieval tool, as displayed below, it
>> asks me for promoter/upstream by 1000 bases,
>> Downstream by 1000 bases.
>> I think it is too arbitrary.
>> Do you have a way to locate these regions precisely?
>> Do you have a tool to do batch jobs? Because there are so many genes
>> on a chromosome, it is almost impossible to do it one by one manually.
>>
>>
>> Sequence Retrieval Region Options:
>> Promoter/Upstream by 1000 bases
>> 5' UTR Exons
>> CDS Exons
>> 3' UTR Exons
>> Introns
>> Downstream by 1000 bases
>> One FASTA record per gene.
>> One FASTA record per region (exon, intron, etc.) with extra bases
>> upstream (5') and extra downstream (3')
>>    Split UTR and CDS parts of an exon into separate FASTA records
>> Note: if a feature is close to the beginning or end of a chromosome
>> and upstream/downstream bases are added, they may be truncated in
>> order to avoid extending past the edge of the chromosome.
>>
>>
>>
>> Regards,
>> William
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>>
>>
>
> --
> Rachel Harte
> UCSC Genome Bioinformatics Group
>

