[Genome] GenScan UCSC track for the ENCODE regions

Ann Zweig ann at soe.ucsc.edu
Fri Oct 6 11:22:27 PDT 2006


Hello Axel,

	In addition to Hiram's information, I suggest contacting the authors of the 
paper you cited.  You may also find useful information on the EGASP workshop 
website:

http://genome.imim.es/gencode/workshop2005.html

	Hope this is helpful to you.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


Hiram Clawson wrote:
> Good Morning Axel:
> 
> Our run of genscan is done on:
> 
> 1. hard masked contigs
> 
> 2. our own gsBig front end to genscan
>    (in the kent source tree: http://genome.ucsc.edu/admin/cvs.html
>    http://genome.ucsc.edu/admin/jk-install.html)
>    with arguments, for example:
>    /cluster/bin/x86_64/gsBig /cluster/data/hg17/1/NT_004321/NT_004321.fa.masked \
>        gtf/NT_004321.fa.gtf -trans=pep/NT_004321.fa.pep \
>        -subopt=subopt/NT_004321.fa.bed -exe=hg3rdParty/genscanlinux/genscan \
>        -par=hg3rdParty/genscanlinux/HumanIso.smat -tmp=/tmp -window=2400000
> 
> 3. results lifted to chrom coordinates
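
Step 3 above (lifting contig results to chromosome coordinates) can be
illustrated with a minimal sketch. It assumes the contig lies on the forward
strand of the chromosome and that its 0-based start offset is known; the
function name, the example record, and the offset value are hypothetical and
are not taken from the kent source tree.

```python
def lift_gtf_line(line, chrom, offset):
    """Shift one tab-separated GTF record from contig to chromosome coordinates.

    offset is the 0-based chromosome position of the contig's first base.
    Assumption: the contig is on the forward strand, so only the sequence
    name (column 1) and the start/end columns (4 and 5) change.
    """
    fields = line.rstrip("\n").split("\t")
    fields[0] = chrom
    fields[3] = str(int(fields[3]) + offset)
    fields[4] = str(int(fields[4]) + offset)
    return "\t".join(fields)


# Hypothetical record and offset, for illustration only.
example = "\t".join([
    "NT_004321", "genscan", "CDS", "1001", "1200", ".", "+", "0",
    'gene_id "NT_004321.1"; transcript_id "NT_004321.1";',
])
print(lift_gtf_line(example, "chr1", 500000))
```

Predictions compared against an annotation in chromosome coordinates need this
shift applied first; otherwise the overlap statistics will be wrong.
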
> 
> The usage message of gsBig is:
> gsBig - Run Genscan on big input and produce GTF files and other parsed output
> usage:
>     gsBig file.fa output.gtf
> options:
>     -subopt=output.bed - Produce suboptimal exons.
>     -trans=output.fa - where to put translated proteins.
>     -prerun=input.genscan - Assume genscan run already with this output.
>     -window=size    Set window to pass to genscan specific size (default 1200000)
>                     You want ~400 bytes memory for each base in window.
>     -exe=/bin/genscan-linux/genscan - where genscan executable is.
>     -par=/bin/genscan-linux/HumanIso.smat - where parameter file is.
>     -tmp=/tmp - where temporary files go to.
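
The "~400 bytes memory for each base in window" note in the usage message
gives a rough memory estimate for a given -window setting. A small sketch of
that arithmetic, treating the 400-byte figure as the approximation it is (the
function name is illustrative):

```python
BYTES_PER_BASE = 400  # approximate figure quoted in the gsBig usage message


def genscan_window_memory_bytes(window_size, bytes_per_base=BYTES_PER_BASE):
    """Estimate the memory needed for one genscan window, in bytes."""
    return window_size * bytes_per_base


# The default window and the window used in the example command above.
for window in (1_200_000, 2_400_000):
    mem = genscan_window_memory_bytes(window)
    print(f"-window={window}: ~{mem / 1e9:.2f} GB")
```

By this estimate the -window=2400000 setting in the example command needs
roughly 0.96 GB, twice the default.
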
> 
> You can read our entire processing sequence in the *.txt files in the source tree:
> 	src/hg/makeDb/doc/*.txt
> 
> --Hiram
> 
> Axel E. Bernal wrote:
>> Hi,
>>
>> I am trying to reproduce the predictions of the GenScan track for the 
>> ENCODE region.
>> The nucleotide-level numbers, as they appeared in the last EGASP 
>> '05 ENCODE publication (Genome Biology, Volume 7, Supplement 1), are as 
>> follows (they appear on page S2.10 under "USCS genscan track"):
>> Sn:84.17%
>> Sp:60.60%
>>
>> Whereas the ones that I obtained with a local copy of GenScan are around:
>> Sn:48.92%
>> Sp:59.40%
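
For reference, nucleotide-level sensitivity and specificity in gene-prediction
evaluations are conventionally Sn = TP/(TP+FN) and Sp = TP/(TP+FP), counted
over coding bases. A minimal sketch, assuming predicted and annotated coding
regions are given as half-open (start, end) intervals on a single sequence;
the function names and toy intervals are illustrative and do not come from
the EGASP evaluation code.

```python
def covered_bases(intervals):
    """Return the set of positions covered by half-open (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def nucleotide_sn_sp(predicted, annotated):
    """Return (Sn, Sp) at the nucleotide level.

    Sn = TP / (TP + FN): fraction of annotated coding bases that were predicted.
    Sp = TP / (TP + FP): fraction of predicted coding bases that are annotated.
    """
    pred = covered_bases(predicted)
    true = covered_bases(annotated)
    tp = len(pred & true)
    sn = tp / len(true) if true else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


# Toy data: 200 annotated bases, 200 predicted bases, 100 shared.
annotated = [(100, 200), (300, 400)]
predicted = [(150, 200), (300, 350), (500, 600)]
sn, sp = nucleotide_sn_sp(predicted, annotated)
print(f"Sn: {sn:.2%}  Sp: {sp:.2%}")
```

The set-based approach is only practical for small regions. Both interval
lists must be in the same coordinate system before they are compared.
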
>>
>> I made sure no errors were made while reading the output of the program. 
>> I am also using masked sequences (masked with RepeatMasker); the results 
>> with unmasked sequences are even worse.
>>
>> I'd appreciate it if you could help me work out why I am obtaining these 
>> bad results on ENCODE; on all other test sets I don't have this problem. 
>> Did you use any special parameters for running the program, or any 
>> pre/post-processing of the sequences?
>>
>> Thanks a lot in advance,
>>
>> Sincerely,
>>
>> Axel E Bernal
>> University of Pennsylvania

