[Genome] downloads for Variation and Repeats tracks
Archana Thakkapallayil
archanat at soe.ucsc.edu
Mon Apr 9 13:21:36 PDT 2007
Hello Dmitry,
The Simple Repeats track (generated by the TRF program) includes repeats
with period from 1 to 2000 -- repeats with small period are not excluded.
When we soft-mask (lower-case) repetitive portions of the assembly
sequence for alignment, we filter Simple Repeats to keep the repeats
with period of 12 or less, to avoid masking repetitive protein coding
domains of real genes. The assembly sequence is soft-masked with the
output of RepeatMasker and the filtered TRF output. For more
information about our sequence masking procedure, please see this
previously answered mail list question:
http://www.soe.ucsc.edu/pipermail/genome/2006-August/011426.html
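
A minimal Python sketch of the masking idea described above, not the
actual UCSC pipeline code. The function names, the (start, end, period)
tuple layout, and the coordinates are hypothetical toy values chosen
only to exercise the filter; intervals are assumed to be 0-based and
half-open.

```python
def filter_trf_by_period(trf_hits, max_period=12):
    """Keep only TRF hits whose period is max_period or less.

    Each hit is a hypothetical (start, end, period) tuple.
    """
    return [(start, end) for start, end, period in trf_hits
            if period <= max_period]


def soft_mask(sequence, intervals):
    """Lower-case the bases inside each (start, end) interval."""
    bases = list(sequence)
    for start, end in intervals:
        bases[start:end] = [base.lower() for base in bases[start:end]]
    return "".join(bases)


# Toy sequence: six flanking bases, five AC copies, six flanking bases.
sequence = "GTTGAA" + "AC" * 5 + "GTGATA"

# Made-up hits: one RepeatMasker interval and two TRF hits.
repeatmasker_hits = [(0, 3)]
trf_hits = [(6, 16, 2), (16, 22, 30)]  # the period-30 hit is filtered out

masked = soft_mask(sequence,
                   repeatmasker_hits + filter_trf_by_period(trf_hits))
print(masked)
```

Only the period-2 hit and the RepeatMasker interval are lower-cased;
the period-30 hit is left in upper case because it fails the filter.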
You can retrieve the repeats with a small period using the Table
Browser. Make the following selections in the Table Browser:
1. clade: vertebrate
2. genome: human
3. assembly: Mar. 2006
4. group: Variation and Repeats
5. track: Simple Repeats
6. table: simpleRepeat
7. region: genome
8. click the "create" button next to "filter", choose "<=" for the
"period" field, enter "15" in the text box, and then click "submit".
9. Click on "get output".
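
The same filter can be sketched in Python for a tab-separated dump of
the simpleRepeat table downloaded from the download server. This is an
illustration under assumptions, not an official tool: the position of
the period column (index 5, after bin, chrom, chromStart, chromEnd and
name) is assumed and should be checked against the table's schema, and
the sample rows below are made up and truncated.

```python
import csv
import io

# Assumed 0-based index of the "period" column in the table dump;
# verify this against the table schema before relying on it.
PERIOD_COLUMN = 5


def filter_by_period(lines, max_period=15, period_column=PERIOD_COLUMN):
    """Return the rows whose period is max_period or less."""
    reader = csv.reader(lines, delimiter="\t")
    return [row for row in reader
            if int(row[period_column]) <= max_period]


# Made-up, truncated rows standing in for a downloaded table file.
sample = io.StringIO(
    "585\tchr4\t100\t144\ttrf\t2\t22.0\n"
    "585\tchr4\t500\t900\ttrf\t40\t10.0\n"
)

kept = filter_by_period(sample)
for row in kept:
    print("\t".join(row))
```

For a real file, an open file handle can be passed in place of the
in-memory sample.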
I hope this information helps you. Please let us know if you have
further questions.
Regards,
Archana
UCSC Genome Bioinformatics Group
Dmitry wrote:
> Hi All,
>
> I figured out the CXCL2 part.
> This gene is on the negative strand, so the 22xCA became 22xGT, which
> is there.
> But the second question stands.
> Why are tandem repeats with fewer than 15 elements not reported, and
> how can I retrieve them?
>
> Thank you
>
> Dmitry
>
> -----Original Message-----
> From: Dmitry [mailto:dgrigor1 at jhmi.edu]
> Sent: Monday, April 09, 2007 12:03 PM
> To: 'Ann Zweig'
> Cc: 'genome at soe.ucsc.edu'
> Subject: RE: [Genome] downloads for Variation and Repeats tracks
>
> Hi All,
>
> The provided Variation and Repeats datasets are very useful, thank you.
> But unfortunately I noticed that there are a lot of tandem repeats
> that are not reported.
>
> For example in the 2000 bases upstream region of CXCL2 gene there is
> 22xCA
> Chr4 75180980
> GTTGAAACACACACACACACACACACACACACACACACACACACACACACACGTGATA ,
>
> which I couldn't find in either the TRF or Microsatellite datasets.
>
> Are those repeats skipped or masked purposely?
> If so where can I find unmasked datasets?
> Also, you are not reporting tandem repeats that have fewer than 12
> components.
> Is there a way to retrieve those as well?
>
> Thank you for your help
>
> Dmitry
>
> -----Original Message-----
> From: Ann Zweig [mailto:ann at soe.ucsc.edu]
> Sent: Thursday, April 05, 2007 7:35 PM
> To: Dmitry
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] downloads for Variation and Repeats tracks
>
> Hello Dmitry,
>
> You can download the MySQL tables underlying the data from our
> download server. Follow the 'Downloads' link from the blue navigation
> bar on the left side of the home page. From there, press Human, then
> under hg18 (or the assembly you are interested in) press Annotation
> Database. This directory contains a file for each table in the
> database.
>
> You will need to determine the name of the table that supports the
> track you are interested in. To do this in the genome browser, simply
> press the hyperlink for the name of the track in the track controls
> under the display. In the URL on this page, you will see (usually at
> the very end), "g=abc". The 'abc' is the name of the table that
> underlies this track. This is the table you will want to download from
> the Download server.
>
> For example, for the SNP track in the hg18 browser, the table name is
> snp126.
>
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> Please feel free to search the Genome mailing list archives by visiting
> our home page, clicking on "Contact Us", then typing a word or phrase
> into the search box. On that same page
> (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome
> mailing list.
>
>
>
>
> Dmitry wrote:
>
>> Hi,
>>
>> Where can I find and download psl files for the Variation and Repeats
>> tracks?
>>
>> Thank you
>>
>> Dmitry
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>