[Genome] repeats in bosTau2.0 vs bosTau3.0
João Fadista
Joao.Fadista at agrsci.dk
Mon Jun 4 13:24:42 PDT 2007
Dear Heather Trumbower,
Thank you very much for taking care of this "problem". I think it is best if I forward you the file that the people at Ensembl sent me. I have therefore attached the repeats file for chr1 of bosTau3.0, and if you want I can also send you the file for chr1 of bosTau2.0.
Kind regards,
Joao Fadista
-----Original Message-----
From: Heather Trumbower [mailto:heather at soe.ucsc.edu]
Sent: Monday, June 04, 2007 7:38 PM
To: João Fadista
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] repeats in bosTau2.0 vs bosTau3.0
Joao:
My colleague Rachel has forwarded this inquiry to me, and I'll be working with you from here on. Thanks very much for your report.
We always use the -species and the -s (sensitive) parameters to
RepeatMasker. We don't use the -nolow option.
I obtained the Ensembl bosTau3 repeats from ftp.ensembl.org in the file pub/release-44/bos_taurus_44_3a/data/mysql/bos_taurus_core_44_3a/repeat_feature.txt.table.gz.
I used the bos_taurus_core_44_3a.sql file from the same directory to get the description of the repeat_feature table:
`repeat_feature_id` int(10) unsigned NOT NULL auto_increment,
`seq_region_id` int(10) unsigned NOT NULL default '0',
`seq_region_start` int(10) unsigned NOT NULL default '0',
`seq_region_end` int(10) unsigned NOT NULL default '0',
`seq_region_strand` tinyint(1) NOT NULL default '1',
`repeat_start` int(10) NOT NULL default '0',
`repeat_end` int(10) NOT NULL default '0',
`repeat_consensus_id` int(10) unsigned NOT NULL default '0',
`analysis_id` smallint(5) unsigned NOT NULL default '0',
`score` double default NULL,
Next, I obtained the seq_region.txt.table.gz file, and used that to translate from contigs to chromosomes, using the content of our gold table.
I loaded this into a track called EnsRep, in my development browser at http://hgwdev-heather.cse.ucsc.edu.
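The contig-to-chromosome translation described above can be sketched roughly as follows. This is only an illustration, not the script actually used. It assumes the Ensembl repeat coordinates are 1-based and inclusive on the contig, and that the gold-style rows are 0-based and half-open. The GoldRow type, the contig_to_chrom function and the example contig "ctgA" are hypothetical names introduced for the sketch.

```python
from typing import NamedTuple


class GoldRow(NamedTuple):
    """One gold-style (AGP-like) placement; coordinates are 0-based, half-open."""
    chrom: str
    chrom_start: int  # where the placed fragment begins on the chromosome
    frag: str         # contig name (hypothetical in this example)
    frag_start: int   # first contig base that was placed
    frag_end: int     # one past the last contig base that was placed
    strand: str       # '+' or '-'


def contig_to_chrom(row, start_1based, end_1based):
    """Map a 1-based inclusive contig interval to a 0-based half-open chromosome interval."""
    # Assumption: input coordinates are 1-based inclusive, so only the start shifts.
    start0, end0 = start_1based - 1, end_1based
    if not (row.frag_start <= start0 < end0 <= row.frag_end):
        raise ValueError("interval lies outside the placed part of the contig")
    if row.strand == "+":
        offset = row.chrom_start - row.frag_start
        return row.chrom, start0 + offset, end0 + offset
    # Reverse-strand contig: positions are counted back from the fragment end.
    return (row.chrom,
            row.chrom_start + (row.frag_end - end0),
            row.chrom_start + (row.frag_end - start0))


# Hypothetical placement: contig "ctgA" covers chr1:1000-1500 on the forward strand.
row = GoldRow("chr1", 1000, "ctgA", 0, 500, "+")
result = contig_to_chrom(row, 11, 20)
print(result)
```

A repeat on a reverse-strand contig would also need its strand flipped, which the sketch leaves out.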
To review it, I've been walking through chr1 and chr29, one contig at a
time. What I see is definitely a concern. For the first few contigs
(chr1 first 3, chr29 first 6), the two tracks are generally concordant.
However, past that, they become wildly different. The fact that the
first few look okay gives me confidence that I've translated the Ensembl repeats correctly.
Could you take a look at the EnsRep track in the bosTau3 browser at http://hgwdev-heather.cse.ucsc.edu and confirm that it matches the data
that you are working with? Please let me know; that will help me
decide on the best next step. One thing I'm strongly considering is
rerunning RepeatMasker with -nolow to see if I get results that match Ensembl much more closely.
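One way to put a number on "concordant" versus "wildly different" for a contig is to compare the bases covered by each track. The sketch below is an assumption about how such a check could be done, not a description of how the tracks were actually reviewed. It computes a Jaccard index (shared bases divided by bases covered by either track) on half-open intervals; the function names and the example intervals are made up.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Number of distinct bases covered by an interval set."""
    return sum(end - start for start, end in merge(intervals))


def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Shared bases divided by bases covered by either set (1.0 if both are empty)."""
    both = shared_bases(a, b)
    either = covered(a) + covered(b) - both
    return both / either if either else 1.0


# Made-up repeat intervals for one contig in each track.
ucsc = [(100, 200), (300, 400)]
ensembl = [(150, 200), (300, 450)]
score = jaccard(ucsc, ensembl)
print(f"Jaccard index: {score:.2f}")
```

Values near 1 would correspond to contigs where the two tracks agree, and values near 0 to contigs where they barely overlap.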
Heather Trumbower
UCSC Genome Bioinformatics Group
On Thu, 31 May 2007, João Fadista wrote:
> Dear Rachel Harte,
>
> Thanks for the useful reply. I have now compared the number of
> repetitive events in chr1 from Ensembl against the ones from UCSC genome browser:
>
> - For bosTau2, UCSC has 164666 repetitive events while ENSEMBL has 209523.
> - For bosTau3, UCSC has 272914 repetitive events while ENSEMBL has 374749.
>
> My question now is: why are the numbers from ENSEMBL and UCSC so different?
> I put the same question to the people at Ensembl, and they said the following:
>
> "The only thing I can say is that Ensembl ran Repeatmasker with the
> parameters -nolow -species cow using the repbase from 20050129. For
> btau3 also the parameter -s (0-5% more sensitive) was used. I don't
> know what UCSC exactly does."
>
>
>
> Kind regards,
> João Fadista
>
>
> ________________________________
>
> From: Rachel Harte [mailto:hartera at soe.ucsc.edu]
> Sent: Thu 31-05-2007 18:17
> To: João Fadista
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] repeats in bosTau2.0 vs bosTau3.0
>
>
>
> Hello Joao,
>
> I took a look at the repeats in chr1 for bosTau2 and bosTau3 and I get
> these results when using one of our programs:
>
> For bosTau2:
> faSize chr1.fa
> 102834029 bases (20863006 N's 81971023 real 45985884 upper 35985139
> lower) in 1 sequences in 1 files
> %34.99 masked total, %43.90 masked real
>
> For bosTau3:
> faSize chr1.fa
> 146199855 bases (9578504 N's 136621351 real 71742580 upper 64878771
> lower) in 1 sequences in 1 files
> %44.38 masked total, %47.49 masked real
>
> Note also that there is a difference in size between chr1 of bosTau2
> and bosTau3 which could contribute to differences in the amount of
> repeats in chr1 between the two assemblies.
>
> So when you look at the percentage of all bases that are masked for
> chr1 in each assembly (bosTau2 and bosTau3), the amount of masked
> sequence is quite different. However, if you look only at the
> percentage of non-N sequence (that is, excluding all the gaps in the
> sequence), then the percentage of repeats is very similar. So the
> difference in results that you are finding is due to the larger
> number of gaps (Ns) in the sequence for chr1 of bosTau2
> (20,863,006 Ns) compared to chr1 of bosTau3 (9,578,504 Ns).
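The two percentages in the faSize output above can be reproduced from the base counts in the same output. The sketch below treats lower-case bases as the masked ones and divides them once by the total length and once by the non-N ("real") length. The function name is hypothetical; the numbers are the ones quoted above.

```python
def masked_percentages(total, n_bases, lower):
    """Return (percent masked of all bases, percent masked of non-N bases)."""
    real = total - n_bases  # bases that are not N
    return 100.0 * lower / total, 100.0 * lower / real


# Counts taken from the faSize output quoted above.
bostau2 = masked_percentages(102834029, 20863006, 35985139)
bostau3 = masked_percentages(146199855, 9578504, 64878771)

for name, (pct_total, pct_real) in (("bosTau2", bostau2), ("bosTau3", bostau3)):
    print(f"{name}: {pct_total:.2f}% masked total, {pct_real:.2f}% masked real")
```

Measured against all bases, the two assemblies differ by about 9.4 percentage points (34.99 vs 44.38). Measured against non-N bases, the difference is about 3.6 points (43.90 vs 47.49).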
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu <http://genome.ucsc.edu/>
>
>
> On Thu, 31 May 2007, João Fadista wrote:
>
>> Hello,
>>
>> I would like to ask why I get such a difference in the length of repeats in chr1 of the cow genome between versions 2.0 and 3.0:
>>
>> - In bosTau2.0 the length of chr1 is 102,834,029 bp with 36,085,051
>> bp of repeats (35.1%)
>> - In bosTau3.0 the length of chr1 is 146,199,855 bp with 65,482,014
>> bp of repeats (44.8%)
>>
>> To retrieve the position of the repeats I used the group "Variation and Repeats" and the track "Repeat Masker".
>>
>>
>>
>> Best regards
>>
>> João Fadista
>> Ph.d. student
>>
>>
>>
>> UNIVERSITY OF AARHUS
>> Faculty of Agricultural Sciences
>> Dept. of Genetics and Biotechnology
>> Blichers Allé 20, P.O. BOX 50
>> DK-8830 Tjele
>>
>> Phone: +45 8999 1900
>> Direct: +45 8999 1900
>> E-mail: Joao.Fadista at agrsci.dk <mailto:Joao.Fadista at agrsci.dk>
>> Web: www.agrsci.org <http://www.agrsci.org/>
>> ________________________________
>>
>> News and news media <http://www.agrsci.org/navigation/nyheder_og_presse> .
>>
>> This email may contain information that is confidential. Any use or publication of this email without written permission from Faculty of Agricultural Sciences is not allowed. If you are not the intended recipient, please notify Faculty of Agricultural Sciences immediately and delete this email.
>>
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>
>
>
>