[Genome] Hello all.

Maximilian Haeussler maximilianh at gmail.com
Sat Apr 21 11:11:16 PDT 2007


I am not from UCSC, but there are a couple of tools that can search
for a pattern while allowing mismatches. A well-known one is patmatch,
a Perl wrapper around nrgrep, available from arabidopsis.org. You can
easily convert its output to BED format with awk.
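
If you would rather not install anything, the idea can also be sketched in
a few lines of Python. This is only an illustration under my own
assumptions, not how patmatch or nrgrep work internally: it does a naive
sliding-window comparison that allows substitutions only (no insertions or
deletions), and the names find_approx, to_bed, mask_lower and query1 are
made up for the example.

```python
def find_approx(seq, motif, max_mismatches=1):
    """Return 0-based, half-open (start, end) spans where motif occurs in
    seq with at most max_mismatches substitutions (no indels)."""
    seq_u, motif_u = seq.upper(), motif.upper()
    m = len(motif_u)
    hits = []
    for start in range(len(seq_u) - m + 1):
        window = seq_u[start:start + m]
        # Hamming distance between the window and the motif
        mismatches = sum(1 for a, b in zip(window, motif_u) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, start + m))
    return hits


def to_bed(name, hits):
    """Format spans as three-column BED lines (chrom, start, end).
    BED coordinates are 0-based and half-open, like the spans above."""
    return ["%s\t%d\t%d" % (name, s, e) for s, e in hits]


def mask_lower(seq, hits):
    """Lower-case the given spans so that they count as masked when the
    query is run through blat with -qMask=lower."""
    chars = list(seq)
    for s, e in hits:
        chars[s:e] = seq[s:e].lower()
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical upper-case query with one exact and one inexact repeat copy
    query = "ATTGAGGGACCGGATTGACGGTT"
    spans = find_approx(query, "ATTGAGGG", max_mismatches=1)
    print("\n".join(to_bed("query1", spans)))
    print(mask_lower(query, spans))
```

The number of allowed mismatches plays the role of the similarity
threshold: raise max_mismatches to catch more divergent repeat copies, at
the cost of more chance hits. For genome-sized input a dedicated tool will
be much faster than this loop.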

cheers,
max

On 13/04/07, Wang, Xin <wang60 at iupui.edu> wrote:
> Thanks a lot for your help.
> But I have run into a problem: the repeat sequences are not exactly the same; 1 to 3 bp differ between copies. Can I set a similarity threshold, 90% for example, to filter out the similar repeat sequences using findMotif and maskOutFa?
>
>
> -----Original Message-----
> From: Rachel Harte [mailto:hartera at soe.ucsc.edu]
> Sent: Thu 4/12/2007 6:28 PM
> To: Wang, Xin
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] Hello all.
>
> Hello Xin Wang,
>
> The -qMask option just tells BLAT whether sequence in lower or upper case
> should be treated as masked. To mask out the repeat sequences, you
> need to mark them as masked sequence yourself. To do
> this, change these repeats in your query sequence to
> lower case letters (if your query is in upper case, as in your example).
> You could use perl or sed to change the repeats to lower case in your
> query sequence file, e.g.:
>
> sed -e 's/ATTGAGGG/attgaggg/g' query.fa > query.masked.fa
>
> Then you can use the -qMask=lower option to tell BLAT to mask out the
> sequence that is in lower case in your query sequence.
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On Thu, 12 Apr 2007, Wang, Xin wrote:
>
> > Hello all,
> >
> > I'm using BLAT to match a large number of sequences against the human genome,
> > which turns out to be very fast. But now I have a problem that I don't
> > know how to solve by myself.
> >
> > There is an optional parameter called -qMask. As far as I know, it is for masking
> > out repeat query sequences. My question is how I can mask out some
> > specific sequences. For example, if I want to mask out "ATTGAGGG" from
> > the query sequence
> > "ATTGAGGGACCGGATTNCCGGGGGGGGAAAACCCTCCACCCCCGGGCCCCCGGGACCACGGGACAGGATTGACAGATTGATAGCT",
> > how can I do it using BLAT? Thank you!
> >
> > _______________________________________________
> > Genome maillist  -  Genome at soe.ucsc.edu
> > http://www.soe.ucsc.edu/mailman/listinfo/genome
> >
>
>


-- 
Maximilian Haeussler,
skype: maximilianhaeussler

