[Genome] how to replicate human mRNA tracks
Pufeng Du
dpf05 at mails.tsinghua.edu.cn
Wed Apr 9 18:11:22 PDT 2008
Thank you. I have found the key problem: I forgot to sort the psl file before
running pslCDnaFilter. Now the results are very similar, even though I kept the
other parameters unchanged.
Pufeng
---
Pufeng Du, Phd. Candidate
Bioinformatics Division
Tsinghua National Laboratory of Informatics Science Technology
Department of Automation
Tsinghua University
Beijing 100084, China
Tel: (86-10)62794294#813 (Office)
Email: dpf05 at mails.tsinghua.edu.cn
-----Original Message-----
From: Brooke Rhead [mailto:rhead at soe.ucsc.edu]
Sent: Wednesday, April 09, 2008 6:44 AM
To: Pufeng Du
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] how to replicate human mRNA tracks
Hello Pufeng,
The blat options that we use when making the mRNA tracks for human are:
-q=rna -fine -noHead -repeats=lower -ooc=11.ooc
(-t=dna is a default setting, so there is no difference between using
the option and not using the option in this case.) The main difference
here is that we use the 11.ooc file (more info here:
http://genome.ucsc.edu/FAQ/FAQblat#blat6), but this difference does not
explain the doubled psl size that you are getting.
The pslCDnaFilter options that we use for human mRNA tracks are:
-minQSize=20 -minNonRepSize=16 -ignoreNs -bestOverlap -polyASizes
-globalNearBest=0.0025 -minId=0.95 -minCover=0.25
The biggest difference here is our use of -globalNearBest instead of
your -localNearBest. However, since you are only using the mRNA
sequences that we have already aligned to chrY, this might not make a
huge difference.
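As a rough sketch, the two option sets listed above can be assembled into command lines as follows. The option strings are copied from this message; the file names (chrY.2bit, chrY.fa, chrY.polya, chrY.filter.psl) come from the quoted question below, chrY.sorted.psl is a hypothetical name for the psl after it has been sorted by query name, and the -polyASizes=FILE form is assumed from the quoted command. The helper names blat_cmd and filter_cmd are invented for illustration.

```python
# Options as listed in the message; nothing here runs the tools,
# it only builds the argument lists.
BLAT_OPTS = ["-q=rna", "-fine", "-noHead", "-repeats=lower", "-ooc=11.ooc"]
FILTER_OPTS = [
    "-minQSize=20", "-minNonRepSize=16", "-ignoreNs", "-bestOverlap",
    "-globalNearBest=0.0025", "-minId=0.95", "-minCover=0.25",
]


def blat_cmd(database, query, out_psl):
    """Build a blat argument list: database, query, options, output."""
    return ["blat", database, query, *BLAT_OPTS, out_psl]


def filter_cmd(in_psl, out_psl, polya_sizes):
    """Build a pslCDnaFilter argument list (input is assumed to be sorted)."""
    return ["pslCDnaFilter", *FILTER_OPTS,
            f"-polyASizes={polya_sizes}", in_psl, out_psl]


if __name__ == "__main__":
    # File names are placeholders taken from the quoted question.
    print(" ".join(blat_cmd("chrY.2bit", "chrY.fa", "chrY.local.psl")))
    print(" ".join(filter_cmd("chrY.sorted.psl", "chrY.filter.psl", "chrY.polya")))
```

The lists could be passed to subprocess.run once the tools are on the PATH; that step is left out here because it depends on the local installation.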
One thing that would explain a large number of psl results in your
filtered output is the lack of a sorting step between running blat and
running pslCDnaFilter. From the pslCDnaFilter usage statement:
WARNING: comparative filters requires that the input is sorted by
query name. The command: 'sort -k 10,10' will do the trick.
If your blat results were not sorted prior to filtering, there would be
a large number of alignments left in the filtered output.
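For illustration, here is a minimal Python sketch of the same idea as 'sort -k 10,10': order headerless psl lines by the query name in column 10, and check whether a file already has each query's alignments in one contiguous run. It assumes tab-separated psl lines with no header (as produced with -noHead); the function names and the toy rows are made up for this example, and Python's string ordering may differ from the ordering your locale gives to sort.

```python
def query_name(psl_line):
    """Return qName, the 10th tab-separated column of a psl line."""
    return psl_line.rstrip("\n").split("\t")[9]


def sort_psl_by_query(lines):
    """Return the psl lines ordered by query name (stable sort)."""
    return sorted(lines, key=query_name)


def is_grouped_by_query(lines):
    """True if every query's alignments are adjacent to each other."""
    seen = set()
    previous = None
    for line in lines:
        name = query_name(line)
        if name != previous:
            if name in seen:
                return False  # this query already appeared in an earlier run
            seen.add(name)
            previous = name
    return True


def toy_row(qname):
    """Build a fake 21-column psl line; only column 10 is meaningful here."""
    cols = ["0"] * 21
    cols[9] = qname
    cols[13] = "chrY"
    return "\t".join(cols)


if __name__ == "__main__":
    rows = [toy_row("mrnaB"), toy_row("mrnaA"), toy_row("mrnaB")]
    print(is_grouped_by_query(rows))
    print(is_grouped_by_query(sort_psl_by_query(rows)))
```

A check like is_grouped_by_query is a cheap way to confirm that the sorting step was not skipped before running the filter.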
If you have further questions, please feel free to write back to the
mailing list address (genome at soe.ucsc.edu).
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Pufeng Du wrote:
> Hi,
>
> I have downloaded the mRNA sequences and the genome sequences from UCSC. I
> also have the programs blat and pslCDnaFilter on my local machine.
>
> I tried to replicate the human mRNA track on chrY by using the following
> steps.
>
> blat chrY.2bit chrY.fa -q=rna -t=dna -fine -repeats=lower chrY.local.psl
>
> faPolyASizes chrY.fa chrY.polya
>
> pslCDnaFilter -minId=0.96 -minCover=0.25 -localNearBest=0.001 -minQSize=20
> -minNonRepSize=16 -ignoreNs -bestOverlap -polyASizes=chrY.polya
> chrY.local.psl chrY.filter.psl
>
> where chrY.fa contains all the mRNA sequences that appear in the
> chrY.psl downloaded from the Table Browser. chrY.2bit is generated from the
> chromosome file by faToTwoBit.
>
> The chrY.filter.psl is about twice the size of the chrY.psl downloaded
> from the Table Browser. Are there any more filtering steps needed to create
> the psl file that can be downloaded from the Table Browser?
>
> Pufeng Du
>
> ---
>
> Pufeng Du, Phd. Candidate
> Bioinformatics Division
> Tsinghua National Laboratory of Informatics Science Technology
> Department of Automation
> Tsinghua University
> Beijing 100084, China
> Tel: (86-10)62794294#813 (Office)
>
> Email: dpf05 at mails.tsinghua.edu.cn
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome