[Genome] possible reasons for sequence masking

Kayla Smith kayla at soe.ucsc.edu
Fri Feb 9 15:14:36 PST 2007


Vanessa,

In the assembly sequence that we make available for download, we mask
repeats only, by putting them in lower case.  However, if you click to 
get sequence for a gene/transcript, you have the option of having coding 
regions in upper case and introns in lower case.  In that case it's not 
masking, it's just the use of case for a different purpose.  If there 
are different splice forms of a gene, the sequences returned will have 
upper and lower case in different places.
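
To make the two uses of case concrete, here is a minimal Python sketch. 
It is only an illustration, not code from our site: the function names, 
the toy sequence, and the exon coordinates are all made up, and 
coordinates are 0-based, half-open positions in the string.  The first 
function lists the lower-case runs in a sequence (what you would treat 
as masked in a repeat-masked download), and the second applies case by 
exon/intron the way a gene/transcript sequence request can.

```python
import re


def lowercase_runs(seq):
    """Return half-open (start, end) intervals of lower-case letters in seq."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def case_by_exons(seq, exons):
    """Upper-case bases inside the given exons, lower-case everything else.

    exons is a list of hypothetical (start, end) half-open intervals.
    """
    out = list(seq.lower())
    for start, end in exons:
        out[start:end] = seq[start:end].upper()
    return "".join(out)


# Toy sequence and two made-up splice forms of the same locus.
locus = "acgtacgtac"
splice_form_a = case_by_exons(locus, [(0, 3), (6, 8)])
splice_form_b = case_by_exons(locus, [(0, 5)])

print(splice_form_a)                  # ACGtacGTac
print(splice_form_b)                  # ACGTAcgtac
print(lowercase_runs(splice_form_a))  # [(3, 6), (8, 10)]
```

The same bases come back in different case for the two splice forms, so 
lower case in a gene/transcript sequence says nothing about repeats.  
Note that in a gapped alignment row, a "-" character would split a 
lower-case run in this sketch.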

I hope that helps to clear up the "masking" you might be seeing.  Please 
don't hesitate to contact us again if you require more assistance.

Kayla Smith
UCSC Genome Bioinformatics Group


Vanessa Bauer wrote:
> Hello,
> 
> Sorry to bother you but I was unsuccessful answering the following 
> question from browsing your web site.  In short, I am curious if 
> there are various reasons for sequences to be masked in an alignment. 
> We have downloaded introns for a specific set of loci (roughly 8500) 
> for Drosophila genomes from the Comparative Genomics "group" 
> (multiz15way alignments).  We are now attempting to get this data in 
> the format that we want (i.e., each alignment block linked to its 
> corresponding transcript and to mask any part of an intron that is 
> also, at times, coding sequence) using the dm2 annotation.  We have 
> noticed upper and lower case letters in the alignments.  While I did 
> notice that repeats are masked on the web site, I was also wondering 
> if there is any other reason for masking.  More specifically, have 
> intron sequences that are also coding (due to alternative splicing or 
> coding regions within introns of other coding regions) been masked?
> 
> thanks, Vanessa
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome


