[Genome] gfClient maxIntron

Galt Barber galt at soe.ucsc.edu
Thu Jul 12 14:38:04 PDT 2007


Summary: you can do this with stand-alone BLAT using -fine.

 -fine  For high quality mRNAs look harder for small initial and
        terminal exons.  Not recommended for ESTs

Your case clearly shows BLAT does NOT use
only canonical splice junctions. Thanks for that.
I know that BLAT does in fact model the junctions to find
the optimal exon boundaries.
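Galt's earlier reply, quoted further down, states the junction rule as "XXgt agXX": an intron normally starts with "gt" and ends with "ag". The sketch below is a minimal illustration of that rule. It assumes the intron sequence is given on the transcript's strand. `classify_intron` is a hypothetical helper that only inspects the two terminal dinucleotides, so it is far simpler than BLAT's own junction modelling.

```python
# Hypothetical helper illustrating the GT..AG rule; NOT BLAT's junction model.
KNOWN_DONOR_ACCEPTOR = {
    ("GT", "AG"): "canonical GT-AG",
    ("GC", "AG"): "GC-AG variant",
    ("AT", "AC"): "AT-AC variant",
}


def classify_intron(intron: str) -> str:
    """Label an intron by its first two (donor) and last two (acceptor) bases."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 bases long")
    donor, acceptor = seq[:2], seq[-2:]
    return KNOWN_DONOR_ACCEPTOR.get(
        (donor, acceptor), f"non-canonical {donor}-{acceptor}"
    )


if __name__ == "__main__":
    print(classify_intron("gtaagtcttttcag"))  # canonical GT-AG
    # Same ends as the "AGag ... agAG" and "CAag ... agAA" examples in the thread.
    print(classify_intron("agcctttttag"))     # non-canonical AG-AG
```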

I tried it with gfServer and gfClient and failed.
I tried it with standalone blat and failed.

Then I used standalone blat with -fine and succeeded.

Here are the details for anyone interested:

prompt> cat Julien.seq
>test_seq
AGTCAACCCTTCTGTAAATCACCTGCTGTGGTTATGATGCTCTGAGTTCA
ATACGTCTGAACCTTTGCTGTCTATGGATCTGCTCTAAACCTTATAGCCT
GCTTATGGGGGAAGAGAAATGGAAAAGAAAATGAAATAAATCAGCAGTTA
TGAGGCAGAGCCTAAGAGAACTATGGCAACATCAGGTGACTGTCCCAGAA
GTGAATCGCAG
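As a quick sanity check on the query, this minimal sketch loads the FASTA record and confirms that its length matches the qSize column (211) in the PSL output. `read_fasta` is a hypothetical reader, not one of the UCSC tools.

```python
from typing import Dict, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: joins the sequence lines under each '>' header."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the name
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


JULIEN_SEQ = """>test_seq
AGTCAACCCTTCTGTAAATCACCTGCTGTGGTTATGATGCTCTGAGTTCA
ATACGTCTGAACCTTTGCTGTCTATGGATCTGCTCTAAACCTTATAGCCT
GCTTATGGGGGAAGAGAAATGGAAAAGAAAATGAAATAAATCAGCAGTTA
TGAGGCAGAGCCTAAGAGAACTATGGCAACATCAGGTGACTGTCCCAGAA
GTGAATCGCAG
"""

if __name__ == "__main__":
    seqs = read_fasta(JULIEN_SEQ)
    print(len(seqs["test_seq"]))  # 211
```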


twoBitToFa hg18.2bit:chr21 Julien.fa
faToTwoBit Julien.fa Julien.2bit -noMask

gfServer start localhost 3000 Julien.2bit -canStop&

gfClient -minScore=0 -minIdentity=0 -nohead -maxIntron=1000000 localhost \
3000 . Julien.seq Julien.psl

prompt> cat Julien.psl
114	0	0	0	0	0	0	0	+	test_seq	211	0	114	chr21	46944323	46094323	46094437	1	114,	0,	46094323,
97	0	0	0	0	0	0	0	+	test_seq	211	114	211	chr21	46944323	46881233	46881330	1	97,	114,	46881233,
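PSL output like this can also be checked programmatically. The sketch below is a hedged illustration: `PslHit` and `parse_psl` are hypothetical names, not part of the Kent tools. It reads header-less PSL as whitespace-separated tokens, 21 per record, so records wrapped by a mail client still parse. It then computes the target-side gap between the two hits.

```python
from dataclasses import dataclass
from typing import List

PSL_FIELDS = 21  # a PSL record has 21 columns


@dataclass
class PslHit:
    """A few PSL columns; coordinates are 0-based, half-open."""
    q_name: str
    q_start: int
    q_end: int
    t_name: str
    t_start: int
    t_end: int
    block_count: int


def parse_psl(text: str) -> List[PslHit]:
    """Parse header-less PSL by whitespace tokens, 21 per record."""
    tokens = text.split()
    if len(tokens) % PSL_FIELDS:
        raise ValueError("token count is not a multiple of 21")
    hits = []
    for i in range(0, len(tokens), PSL_FIELDS):
        f = tokens[i:i + PSL_FIELDS]
        # Column order: matches .. strand (0-8), qName 9, qSize 10, qStart 11,
        # qEnd 12, tName 13, tSize 14, tStart 15, tEnd 16, blockCount 17, ...
        hits.append(PslHit(q_name=f[9], q_start=int(f[11]), q_end=int(f[12]),
                           t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
                           block_count=int(f[17])))
    return hits


GF_CLIENT_OUTPUT = """
114 0 0 0 0 0 0 0 + test_seq 211 0 114 chr21 46944323 46094323 46094437 1 114, 0, 46094323,
97 0 0 0 0 0 0 0 + test_seq 211 114 211 chr21 46944323 46881233 46881330 1 97, 114, 46881233,
"""

if __name__ == "__main__":
    hits = parse_psl(GF_CLIENT_OUTPUT)
    for h in hits:
        print(h.q_name, h.q_start, h.q_end, h.t_name, h.t_start, h.t_end, h.block_count)
    # Bases of chr21 between the end of hit 1 and the start of hit 2.
    print(hits[1].t_start - hits[0].t_end)  # 786796
```

Two records with one block each is what "not chained" looks like in PSL. A chained result would be a single record with blockCount 2.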


This fails: the output looks just like yours. The two halves are not
chained together the way we want them to be.
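On simple geometric grounds the two hits do look chainable. They share the same query, strand and chromosome, and they abut at query base 114. The gap between them is 46881233 - 46094437 = 786,796 bases of chr21, which is under the 1,000,000 passed to -maxIntron. (blat's usage message lists a default of 750000, which this gap would exceed.) The sketch below implements that naive check. `Hit` and `could_chain` are hypothetical names, and the rule ignores splice-site scoring entirely, so it is not BLAT's chaining algorithm.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical minimal record: a few PSL columns, 0-based half-open."""
    q_name: str
    strand: str
    q_start: int
    q_end: int
    t_name: str
    t_start: int
    t_end: int


def could_chain(a: Hit, b: Hit, max_intron: int, max_q_gap: int = 0) -> bool:
    """Naive check that b could follow a as the next exon (plus strand only).

    Only tests same query/strand/chromosome, query adjacency and intron size;
    splice junctions are not considered.
    """
    if (a.q_name, a.strand, a.t_name) != (b.q_name, b.strand, b.t_name):
        return False
    q_gap = b.q_start - a.q_end
    intron = b.t_start - a.t_end
    return 0 <= q_gap <= max_q_gap and 0 < intron <= max_intron


first = Hit("test_seq", "+", 0, 114, "chr21", 46094323, 46094437)
second = Hit("test_seq", "+", 114, 211, "chr21", 46881233, 46881330)

if __name__ == "__main__":
    print(could_chain(first, second, max_intron=1_000_000))  # True
    print(could_chain(first, second, max_intron=750_000))    # False
```

Because the size test passes at -maxIntron=1000000, a gap that is too large does not explain why the halves stayed separate in these runs.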

gfServer stop localhost 3000

Now trying stand-alone BLAT:

blat -minScore=0 -minIdentity=0 -noHead -maxIntron=1000000 Julien.2bit \
Julien.seq Julien.psl

This also fails: the PSL has two alignments that are not chained
together, just as before.

I then tried -fine:

blat -fine -minScore=0 -minIdentity=0 -noHead -maxIntron=10000000 \
 Julien.2bit Julien.seq Julien.psl
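When rerunning this comparison from a script, it can help to build the argument list once and toggle only -fine. This is a minimal sketch assuming Python 3.8+ (for `shlex.join`). `blat_command` is a hypothetical helper, and it uses only the flags that appear in the commands above.

```python
import shlex
from typing import List


def blat_command(db: str, query: str, out: str, max_intron: int,
                 fine: bool = True) -> List[str]:
    """Build a stand-alone blat argument list with the flags used in this thread."""
    cmd = ["blat"]
    if fine:
        cmd.append("-fine")  # look harder for small initial/terminal exons
    cmd += ["-minScore=0", "-minIdentity=0", "-noHead",
            f"-maxIntron={max_intron}", db, query, out]
    return cmd


if __name__ == "__main__":
    cmd = blat_command("Julien.2bit", "Julien.seq", "Julien.psl", 10_000_000)
    print(shlex.join(cmd))
    # blat -fine -minScore=0 -minIdentity=0 -noHead -maxIntron=10000000 Julien.2bit Julien.seq Julien.psl
```

The list can be passed to `subprocess.run(cmd, check=True)` without shell quoting. The two stand-alone runs above differ in -maxIntron (1000000 versus 10000000) as well as in -fine. A strict comparison would hold -maxIntron fixed.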

This works!
prompt> cat Julien.psl
114	0	0	0	0	0	0	0	+	test_seq	211	0	114	chr21	46944323	46094323	46094437	1	114,	0,	46094323,

I hope that helps!

-Galt


On Thu, 12 Jul 2007, Julien Lagarde wrote:

> Hi Galt,
> Thanks for your answer.
> I'm a bit confused now. If I understand you correctly, blat would simply
> forbid non-canonical introns?
>
> Then how come this synthetic cDNA seq:
>
> CAGAGCTCTGCCTGAGATTCTTGGGGCCTGCACAGGAGGGCAGCCCCA
> GGATGCCCGCAGGGAATGCCAGCTCTGTGGGGCTGACCTCCAACAGGC
> CGTGCACAGATGGTGATGGGGGAGGGAGTGGCTCTCCGGCCCTGACGT
> GGCCCCCCGCCTGCCTTTCTGGACACCCCTGCTCTGCCGGCTCAGCAG
> CCCCAGCACCCGCTTCTGAGCTCTGGGTTCAAACAGAGATTAATCAGA
> GCGTCGCTCAACTGCACCCCAGCCCCAGGACCCGGCCACCATGTTGCA
> TCACCGCAGAGCCAAACATCAGGATGCTGCTGGAGGTGACTGTCCCGG
> TGCCGGTTCCCCTGTGGAGACCCCCCTGGCTGCTCTCAGCCTCACCCA
> ATGCCCTTCTTGGCCCTGCTGGACACAGAGCTGCAGACAGAGATGAAG
> GAGACAGCGGTAGCCGCAGGCATTACCCAATGCAACCCCCAGCCCCCG
> CCCCTGCCCACGTTGCATCACAGCCGACCCGAGGAGCCGCAGCTCCCA
> GAGGAAGACACGGTGGGGCCGATGCAGATGCAGCTGGAGATACAGCCT
> GGTCTCCCTCCTCTGGGTCAGGTCTCCGGCGCCCTCCCGCCGGCCTGC
> AGGGCTGCATTAGGATGGGGAGTTTGAGCTCAGTTAGAGACCAGCCCC
> AGAAACGCAGAGAGAGGGTGCAGCGCCG
>
> is correctly spliced "CAag.(6kb).agAA" by blat?
> cheers,
> julien
>
>
> Galt Barber wrote:
> > People are using -maxIntron= successfully.
> >
> > Looking at the ends of your "exons", they don't look
> > like proper intron ends.
> >
> > AGag agAG
> >
> > should be
> > XXgt agXX
> >
> > The chainer is trying to chain exons together
> > to make a gene model.  It follows splice-junction
> > rules.
> >
> > You could artificially construct one by
> > taking an exon from one gene and an exon
> > from another gene far enough away, and
> > that way the chainer would see proper
> > intron ends.
> >
> > -Galt
> >
> >
> > On Thu, 12 Jul 2007, Julien Lagarde wrote:
> >
> >
> >> Hi genome,
> >>
> >> I'm trying to map exons separated by long introns with gfServer/gfClient
> >> (v.33) on hg17.
> >>
> >>
> >> Unfortunately, gfClient seems to ignore whatever maxIntron value I use.
> >> Here's an example:
> >>
> >> gfClient -minScore=0 -minIdentity=0 -nohead -maxIntron=1000000 localhost \
> >> 3500 / test.seq test.psl
> >>
> >> test.seq is:
> >> AGTCAACCCTTCTGTAAATCACCTGCTGTGGTTATGATGCTCTGAGTTCA
> >> ATACGTCTGAACCTTTGCTGTCTATGGATCTGCTCTAAACCTTATAGCCT
> >> GCTTATGGGGGAAGAGAAATGGAAAAGAAAATGAAATAAATCAGCAGTTA
> >> TGAGGCAGAGCCTAAGAGAACTATGGCAACATCAGGTGACTGTCCCAGAA
> >> GTGAATCGCAG
> >>
> >> and corresponds to two "fake" (i.e. I made them up) exons separated by a
> >> 786kb intron.
> >>
> >> The resulting psl output is:
> >> 114	0	0	0	0	0	0	0	+	test_long_intron	211	0	114	chr21	46944323	46094323	46094437	1	114,	0,	46094323,
> >> 97	0	0	0	0	0	0	0	+	test_long_intron	211	114	211	chr21	46944323	46881233	46881330	1	97,	114,	46881233,
> >>
> >> So basically, blat finds the two blocks but does not join them together.
> >>
> >> What do you think?
> >>
> >> thanks a lot,
> >> julien
> >>
> >> --
> >> -----------------------------------------------------
> >> Julien Lagarde
> >> Genome Bioinformatics Research Group
> >> Centre de Regulacio Genomica
> >> Grup de Recerca en Informatica Biomedica (IMIM)
> >> Dr. Aiguader, 88 				(+34) 93 3160166 ph
> >> E-08003 Barcelona				(+34) 93 3160099 fax
> >> http://genome.imim.es
> >> --------------------------------
> >>
> >>
> >>
> >> --------
> >> The information included in this e-mail and any attached files are
> >> confidential and private. If you are not the intended recipient,
> >> please notify the error to the sender and delete this message
> >> immediately. Dissemination, forwarding or copying of this e-mail and
> >> its associated attachments is strictly prohibited according with
> >> current legislation.
> >> --------
> >>
> >> _______________________________________________
> >> Genome maillist  -  Genome at soe.ucsc.edu
> >> http://www.soe.ucsc.edu/mailman/listinfo/genome
> >>
> >>
> >
> >
>


