[Genome] [Fwd: Re: find 5' and 3' regions. (fwd)]
Brooke Rhead
rhead at soe.ucsc.edu
Wed Oct 25 18:59:21 PDT 2006
Hello William,
Rachel forwarded your question to me. I think I understand what you
want to retrieve from the Genome Browser: the start and end positions
of the region that is 2000 bases upstream and 2000 bases downstream of
each gene. If this is not what you are trying to do, please let me know.
The transcription start and end positions of each gene can be retrieved
using the Table Browser tool. Then you can use your own tools to add
2000 to or subtract 2000 from each value.
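The arithmetic is simple enough to script for thousands of genes. Below is a minimal Python sketch. The names `extend_region` and `FLANK` are illustrative and are not part of any UCSC tool. Coordinates are used exactly as they come from the table, with no strand handling and no clamping at chromosome ends. Run on the ABCC3 coordinates William quotes below, it reproduces his numbers.

```python
# Minimal sketch: widen each transcript by a fixed flank on both sides.
# FLANK and extend_region are illustrative names, not a UCSC API.
FLANK = 2000

def extend_region(tx_start: int, tx_end: int, flank: int = FLANK) -> tuple[int, int]:
    """Return (start, end) of the transcript plus `flank` bases on each side."""
    return tx_start - flank, tx_end + flank

# Hypothetical batch input: (name, txStart, txEnd) tuples.
genes = [("ABCC3", 46067227, 46124062)]
for name, tx_start, tx_end in genes:
    start, end = extend_region(tx_start, tx_end)
    print(name, start, end)
```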
To get to the Table Browser, click on the "Tables" link in the blue bar
at the top of the page. Then select the clade, genome and assembly you
wish to use (probably vertebrate, human, Mar. 2006). Select the "Gene
and Gene Prediction Tracks" group. Now you have a choice to make. The
track you select depends on which gene set you wish to use. It sounds
like you might already be using the RefSeq gene set. If so, choose
"RefSeq Genes". Choose the table at the top of the list in the
drop-down menu (in the case of RefSeq Genes, it is called "refGene"). To
get an idea of the type of data contained in the refGene table, hit the
"describe table schema" button.
Note that the transcription start positions in our tables are one base
less than the start positions displayed in the Genome Browser. See an
explanation here:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
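Here is a minimal sketch of that off-by-one conversion. It assumes the usual UCSC convention: table starts are 0-based, and the end coordinate is the same in both the tables and the browser display. The helper names are illustrative.

```python
def table_to_browser(tx_start: int, tx_end: int) -> tuple[int, int]:
    """Convert a table interval (0-based start, half-open) to browser display (1-based, inclusive)."""
    # Only the start shifts by one; the end coordinate is identical in both systems.
    return tx_start + 1, tx_end

def browser_to_table(start: int, end: int) -> tuple[int, int]:
    """Inverse of table_to_browser."""
    return start - 1, end

# The first 100 bases of a chromosome, as stored in a table and as displayed.
print(table_to_browser(0, 100))
```

Because only the start differs, adding or subtracting 2000 gives consistent results in either system, as long as the two are not mixed.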
For the "region" option, choose "genome" if you want to retrieve all of
the information at once, or choose "position" and enter a chromosome or
genomic range, such as "chr1". Then choose "selected fields from primary and related
tables" as the output format, and enter a name for the file you will
download.
When you hit "get output" you will get a page where you can select the
fields that will be retrieved from the table. Select the fields you
wish to retrieve (probably name, chrom, strand, txStart, txEnd) and hit
"get output".
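If you process the downloaded file with a script, a sketch like the following could read it. It assumes the file is tab-separated with a single header line beginning with "#". It also assumes the header names may carry a database/table prefix (for example `hg18.refGene.txStart`), so check the first line of your own file, since that header format is an assumption here. `Transcript` and `read_transcripts` are illustrative names.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple

class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int  # 0-based, as stored in the table
    tx_end: int

def read_transcripts(lines: Iterable[str]) -> Iterator[Transcript]:
    """Parse tab-separated Table Browser output whose first line is a '#' header."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty file
        return
    # Assumption: header cells look like "#name" or "#hg18.refGene.name";
    # keep only the part after the last '.' so both forms map to "name".
    cols = [cell.lstrip("#").rsplit(".", 1)[-1] for cell in header]
    idx = {col: i for i, col in enumerate(cols)}
    for row in reader:
        if not row:  # skip blank lines
            continue
        yield Transcript(
            name=row[idx["name"]],
            chrom=row[idx["chrom"]],
            strand=row[idx["strand"]],
            tx_start=int(row[idx["txStart"]]),
            tx_end=int(row[idx["txEnd"]]),
        )

# Made-up row for illustration; in practice pass an open file handle.
demo = io.StringIO("#name\tchrom\tstrand\ttxStart\ttxEnd\n"
                   "GENE_A\tchr1\t-\t1000\t5000\n")
for t in read_transcripts(demo):
    print(t)
```

Looking columns up by header name, not by position, keeps the parser working if the selected fields come back in a different order.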
Once you have downloaded the file, you can subtract 2000 from each
txStart position and add 2000 to each txEnd position to get the values
you are looking for. Note that for genes on the '+' strand, the txStart
value is the 5' end of the gene and the txEnd is the 3' end of the gene.
For genes on the '-' strand, the opposite is true: the txStart value
is the 3' end, and the txEnd value is the 5' end.
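The strand rule above can be written as a small function. This sketch, using hypothetical names, returns the 5' and 3' flanks as separate 0-based, half-open intervals. It clamps the left flank at 0. It can also clamp the right flank at a chromosome length you supply. That length is not among the downloaded fields, so `chrom_size` is an assumed input. The clamping mirrors the chromosome-edge truncation note in the Sequence Retrieval options quoted further down.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # 0-based start, half-open end (same as txStart/txEnd)

def gene_flanks(strand: str, tx_start: int, tx_end: int, size: int = 2000,
                chrom_size: Optional[int] = None) -> Dict[str, Interval]:
    """Return the 5' (upstream) and 3' (downstream) flanks of one transcript."""
    # Flank to the left of txStart, clamped so it never goes below 0.
    left = (max(0, tx_start - size), tx_start)
    # Flank to the right of txEnd, clamped to the chromosome length if known.
    right_end = tx_end + size
    if chrom_size is not None:
        right_end = min(right_end, chrom_size)
    right = (tx_end, right_end)
    if strand == "+":
        # '+' strand: txStart is the 5' end, txEnd is the 3' end.
        return {"5'": left, "3'": right}
    if strand == "-":
        # '-' strand: txEnd is the 5' end, txStart is the 3' end.
        return {"5'": right, "3'": left}
    raise ValueError(f"unexpected strand value: {strand!r}")

print(gene_flanks("+", 46067227, 46124062))
```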
I hope this procedure helps you get the information you need.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
---------- Forwarded message ----------
Date: Wed, 25 Oct 2006 06:48:35 -0700
From: LIANHE SHAO <lshao2 at jhmi.edu>
To: Rachel Harte <hartera at soe.ucsc.edu>
Subject: Re: [Genome] find 5' and 3' regions.
Rachel,
Thanks for your reply.
We have thousands of genes to look at.
I want to find the start and stop positions of the 5' and 3' regions of
each gene, outside of the transcription start and end sites.
From your refFlat table, I can locate the transcription start and end positions, but I cannot find the 5' promoter's starting position or the 3' UTR's ending position.
Some people said it would be difficult to locate them, because many genes
have no such information yet. They recommended using 2000 bp as the
length of the promoter region and the UTR region. Say gene ABCC3's
transcription starts at 46067227 and ends at 46124062; then its 5' region
starts at 46065227 and its 3' region ends at 46126062.
I know this is not accurate, because each gene has 5' and 3' regions of
different lengths, but I just cannot find such information on your site.
BTW, I am not a biologist :-)
Hope it is clear this time.
Thanks a lot.
William
----- Original Message -----
From: Rachel Harte <hartera at soe.ucsc.edu>
Date: Tuesday, October 24, 2006 4:14 pm
Subject: Re: [Genome] find 5' and 3' regions.
To: LIANHE SHAO <lshao2 at jhmi.edu>
> William,
>
> Could you please clarify exactly what you are looking for? Do you want
> the 5' and 3' UTR regions of the mRNA transcript that is created from
> a gene? Or do you want the upstream (5') and downstream (3') regions
> outside of the transcription start and end sites of genes? If so, what
> are you trying to locate: the promoter region, or a region of a
> specified length 5' or 3' to the transcription start site of a gene?
> This information will help me to help you.
>
> Thanks.
>
> Rachel
>
> Tue, 24 Oct 2006, LIANHE SHAO wrote:
>
>> Hello,
>> I have a question:
>> I want to find the 5' and 3' regions of all the genes on, say,
>> chromosome 1. How can I do this?
>> When I use the Genomic Sequence Retrieval tool, as displayed below, it
>> asks me for Promoter/Upstream by 1000 bases and Downstream by 1000
>> bases.
>> I think that is too arbitrary.
>> Do you have a way to locate these regions precisely?
>> Do you have a tool for batch jobs? There are so many genes on a
>> chromosome that it is almost impossible to do them one by one manually.
>>
>>
>> Sequence Retrieval Region Options:
>> Promoter/Upstream by 1000 bases
>> 5' UTR Exons
>> CDS Exons
>> 3' UTR Exons
>> Introns
>> Downstream by 1000 bases
>> One FASTA record per gene.
>> One FASTA record per region (exon, intron, etc.) with extra bases
>> upstream (5') and extra downstream (3')
>> Split UTR and CDS parts of an exon into separate FASTA records
>> Note: if a feature is close to the beginning or end of a chromosome
>> and upstream/downstream bases are added, they may be truncated in
>> order to avoid extending past the edge of the chromosome.
>>
>> Regards,
>> William
>
> --
> Rachel Harte
> UCSC Genome Bioinformatics Group
>