[Genome] repeats in bosTau2.0 vs bosTau3.0

Heather Trumbower heather at soe.ucsc.edu
Mon Jun 4 10:38:10 PDT 2007


Joao:

My colleague Rachel has forwarded this inquiry to me, and I'll be working 
with you going forward.  Thanks very much for your report.

We always use the -species and the -s (sensitive) parameters to 
RepeatMasker.   We don't use the -nolow option.

I obtained the Ensembl bosTau3 repeats from ftp.ensembl.org in the file 
pub/release-44/bos_taurus_44_3a/data/mysql/bos_taurus_core_44_3a/repeat_feature.txt.table.gz.
I used the bos_taurus_core_44_3a.sql file from the same directory to get 
the description of the repeat_feature table:

    `repeat_feature_id` int(10) unsigned NOT NULL auto_increment,
    `seq_region_id` int(10) unsigned NOT NULL default '0',
    `seq_region_start` int(10) unsigned NOT NULL default '0',
    `seq_region_end` int(10) unsigned NOT NULL default '0',
    `seq_region_strand` tinyint(1) NOT NULL default '1',
    `repeat_start` int(10) NOT NULL default '0',
    `repeat_end` int(10) NOT NULL default '0',
    `repeat_consensus_id` int(10) unsigned NOT NULL default '0',
    `analysis_id` smallint(5) unsigned NOT NULL default '0',
    `score` double default NULL,
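
For anyone repeating this step, here is a minimal sketch of reading that 
dump in Python.  It assumes the file is a headerless, tab-separated dump 
with exactly the ten columns listed above, in that order, and with NULL 
written as \N.  The names RepeatFeature, parse_repeat_features and 
read_repeat_features are illustrative, not part of any Ensembl or UCSC tool.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, Optional


class RepeatFeature(NamedTuple):
    """One row of repeat_feature, in the column order shown above."""
    repeat_feature_id: int
    seq_region_id: int
    seq_region_start: int
    seq_region_end: int
    seq_region_strand: int
    repeat_start: int
    repeat_end: int
    repeat_consensus_id: int
    analysis_id: int
    score: Optional[float]


def parse_repeat_features(lines: Iterable[str]) -> Iterator[RepeatFeature]:
    """Yield one RepeatFeature per non-empty tab-separated line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        ints = [int(value) for value in fields[:9]]
        # Assumption: NULL scores are dumped as \N (or left empty).
        raw_score = fields[9]
        score = None if raw_score in ("\\N", "NULL", "") else float(raw_score)
        yield RepeatFeature(*ints, score)


def read_repeat_features(path: str) -> Iterator[RepeatFeature]:
    """Stream features from a gzipped dump such as repeat_feature.txt.table.gz."""
    with gzip.open(path, "rt") as handle:
        yield from parse_repeat_features(handle)
```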

Next, I obtained the seq_region.txt.table.gz file, and used that to 
translate from contigs to chromosomes, using the content of our gold 
table.
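
The contig-to-chromosome translation can be sketched roughly as follows.  
This is an illustration under stated assumptions, not the exact procedure 
used here: it assumes each contig appears in a single gold row, that gold 
coordinates are 0-based half-open, that the Ensembl coordinates are 
1-based inclusive, and that seq_region_id has already been mapped to a 
contig name via the seq_region table.  GoldRow and lift_to_chrom are 
hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class GoldRow(NamedTuple):
    """The gold-table fields needed to place a contig on a chromosome."""
    chrom: str
    chrom_start: int  # 0-based start of the contig on the chromosome
    chrom_end: int    # exclusive end on the chromosome
    frag: str         # contig name
    frag_start: int   # 0-based start within the contig
    frag_end: int     # exclusive end within the contig
    strand: str       # "+" or "-"


def lift_to_chrom(
    gold_by_frag: Dict[str, GoldRow],
    contig: str,
    start: int,
    end: int,
    strand: int,
) -> Optional[Tuple[str, int, int, int]]:
    """Map a 1-based inclusive contig feature to 0-based half-open chrom coords.

    Returns None when the contig is unknown or the feature falls outside
    the part of the contig that the gold row places on the chromosome.
    """
    gold = gold_by_frag.get(contig)
    if gold is None:
        return None
    start0, end0 = start - 1, end  # convert to 0-based half-open
    if start0 < gold.frag_start or end0 > gold.frag_end:
        return None
    if gold.strand == "+":
        chrom_start = gold.chrom_start + (start0 - gold.frag_start)
        chrom_end = gold.chrom_start + (end0 - gold.frag_start)
        return gold.chrom, chrom_start, chrom_end, strand
    # Contig is reverse-complemented in the assembly: flip coords and strand.
    chrom_start = gold.chrom_start + (gold.frag_end - end0)
    chrom_end = gold.chrom_start + (gold.frag_end - start0)
    return gold.chrom, chrom_start, chrom_end, -strand
```

The strand flip for "-" gold rows is the easy part to get wrong; a 
mistake there would still give plausible-looking coordinates on forward 
contigs.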

I loaded this into a track called EnsRep, in my development browser at 
http://hgwdev-heather.cse.ucsc.edu.

To review it, I've been walking through chr1 and chr29, one contig at a 
time.   What I see is definitely a concern.  For the first few contigs 
(chr1 first 3, chr29 first 6), the two tracks are generally concordant. 
However, past that, they become wildly different.   The fact that the 
first few look okay gives me confidence that I've translated the Ensembl 
repeats correctly.
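
The review above was done by eye in the browser.  As a rough numerical 
complement, one could measure base-level agreement between the two 
tracks over a region.  The sketch below assumes each track is a list of 
(start, end) pairs in 0-based half-open coordinates on the same 
chromosome; the function names and the choice of shared bases divided by 
the union of covered bases are illustrative assumptions.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[List[int]]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(merged: List[List[int]]) -> int:
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for start, end in merged)


def shared_bases(a: List[List[int]], b: List[List[int]]) -> int:
    """Bases covered by both merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def concordance(track_a: Iterable[Interval], track_b: Iterable[Interval]) -> float:
    """Shared bases divided by bases covered by either track (1.0 if both empty)."""
    a = merge_intervals(track_a)
    b = merge_intervals(track_b)
    shared = shared_bases(a, b)
    union = covered_bases(a) + covered_bases(b) - shared
    return shared / union if union else 1.0
```

Computed one contig at a time, a value near 1 would correspond to the 
concordant contigs and a sharp drop to the ones that differ.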

Could you take a look at the EnsRep track in the bosTau3 browser at 
http://hgwdev-heather.cse.ucsc.edu, and confirm that it matches the data 
that you are working with?   Please let me know; that will help me 
decide on the best next step.   One thing I'm strongly considering is 
rerunning RepeatMasker with -nolow to see if I get results that match 
Ensembl much more closely.

Heather Trumbower
UCSC Genome Bioinformatics Group



On Thu, 31 May 2007, João Fadista wrote:

> Dear Rachel Harte,
>
> Thanks for the useful reply. I have now compared the number of repetitive events in chr1
> from Ensembl against the ones from UCSC genome browser:
>
> - For bosTau2, UCSC has 164666 repetitive events while ENSEMBL has 209523
> - For bosTau3, UCSC has 272914 repetitive events while ENSEMBL has 374749.
>
> My question now is: why are the numbers between ENSEMBL and UCSC so different?
> I asked the same question to the people at Ensembl and they said the following:
>
> "The only thing I can say is that Ensembl ran Repeatmasker with the parameters -nolow
> -species cow using the repbase from 20050129. For btau3 also the 
> parameter -s (0-5% more sensitive) was used. I don't know what UCSC 
> exactly does."
>
>
>
> Kind regards,
> João Fadista
>
>
> ________________________________
>
> From: Rachel Harte [mailto:hartera at soe.ucsc.edu]
> Sent: Thu 31-05-2007 18:17
> To: João Fadista
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] repeats in bosTau2.0 vs bosTau3.0
>
>
>
> Hello Joao,
>
> I took a look at the repeats in chr1 for bosTau2 and bosTau3 and I get
> these results when using one of our programs:
>
> For bosTau2:
> faSize chr1.fa
> 102834029 bases (20863006 N's 81971023 real 45985884 upper 35985139 lower)
> in 1 sequences in 1 files
> %34.99 masked total, %43.90 masked real
>
> For bosTau3:
> faSize chr1.fa
> 146199855 bases (9578504 N's 136621351 real 71742580 upper 64878771 lower)
> in 1 sequences in 1 files
> %44.38 masked total, %47.49 masked real
>
> Note also that there is a difference in size between chr1 of bosTau2
> and bosTau3 which could contribute to differences in the amount of repeats
> in chr1 between the two assemblies.
>
> So when you look at the percentage of bases overall that are masked for
> chr1 of each of the assemblies (bosTau2 and bosTau3), the amount of masked
> sequence is quite different. However, if you just look at the percentage
> of non-N sequence (so this is excluding all the gaps in the sequence),
> then the percentage of repeats is very similar. So the difference in
> results that you are finding is due to there being more gap bases
> (Ns) in the sequence for chr1 of bosTau2 (20,863,006 Ns) compared to chr1
> of bosTau3 (9,578,504 Ns).
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On Thu, 31 May 2007, João Fadista wrote:
>
>> Hello,
>>
>> I would like to ask you why do I get such a difference between the length of repeats in chr1 of the cow genome for the versions 2.0 and 3.0:
>>
>> - In bosTau2.0 the length of chr1 is 102,834,029 bp with 36,085,051 bp
>> of repeats (35.1%)
>> - In bosTau3.0 the length of chr1 is 146,199,855 bp with 65,482,014 bp of
>> repeats (44.8%)
>>
>> To retrieve the position of the repeats I used the group "Variation and Repeats" and the track "Repeat Masker".
>>
>>
>>
>> Best regards
>>
>> João Fadista
>> Ph.D. student
>>
>>
>>
>>        UNIVERSITY OF AARHUS
>> Faculty of Agricultural Sciences
>> Dept. of Genetics and Biotechnology
>> Blichers Allé 20, P.O. BOX 50
>> DK-8830 Tjele
>>
>> Phone:         +45 8999 1900
>> Direct:        +45 8999 1900
>> E-mail:        Joao.Fadista at agrsci.dk
>> Web:   www.agrsci.org
>>
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>

