[Genome] nib utilities?

Galt Barber galt at soe.ucsc.edu
Mon Oct 16 10:37:38 PDT 2006


And with twoBitToFa you can also give a list of ranges:

   -seqList=file - file containing list of sequence names
                    to output, of the form seqSpec[:start-end]

Because all the ranges are extracted in a single run, this will
be very fast and get all 100,000 sequences in no time!
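
To make the -seqList suggestion concrete, here is a minimal Python sketch
(not part of the kent tools) that writes such a list and builds the
twoBitToFa command line. The helper names and the file names ranges.txt and
genome.2bit are hypothetical, and it assumes the ranges use the same
zero-based, end-exclusive coordinates as the -start/-end options, so check
that against your own setup.

```python
def format_seq_spec(name, start=None, end=None):
    """Return 'name' or 'name:start-end' for one -seqList line.

    Assumption: start is zero-based and end is non-inclusive, matching
    the -start/-end options in the twoBitToFa usage message.
    """
    if start is None and end is None:
        return name
    if start is None or end is None:
        raise ValueError("start and end must be given together")
    if not 0 <= start < end:
        raise ValueError(f"bad range {start}-{end} for {name}")
    return f"{name}:{start}-{end}"


def seq_list_text(ranges):
    """Turn (name, start, end) tuples into seqList contents, one spec per line."""
    return "".join(format_seq_spec(*r) + "\n" for r in ranges)


def write_seq_list(ranges, path):
    """Write the seqList file that is passed to twoBitToFa via -seqList=."""
    with open(path, "w") as handle:
        handle.write(seq_list_text(ranges))


def two_bit_to_fa_command(two_bit, seq_list, out_fa="stdout"):
    """Build the argument list; 'stdout' is the special output name."""
    return ["twoBitToFa", f"-seqList={seq_list}", two_bit, out_fa]
```

A script could call write_seq_list(ranges, "ranges.txt") once and then run
the list returned by two_bit_to_fa_command("genome.2bit", "ranges.txt",
"out.fa") with subprocess.run, so the 2bit file is opened a single time for
all ranges.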

-Galt


On Mon, 16 Oct 2006, Hiram Clawson wrote:

> Good Morning Davide:
>
> I'm not certain exactly what you are asking about here.
>
> You can get nibFrag to write its output to stdout by
> using the special name "stdout" in the command in place
> of the out.fa argument.  All the kent utilities recognize
> the special words: stdin stdout stderr in place of any
> filename.
>
> You could also use the single 2bit file for the genome
> and use the twoBitToFa command to extract sequences.
> See usage message attached below.
>
> --Hiram
>
> On 2006 Oct 16, at 1:52 AM, Davide Cittaro wrote:
>
> > Hi all, in order to get sequences from genomes I found that nibFrag
> > does the job I need (-> get a sequence in a specified range).
> > Nevertheless, every time I launch it, it reads the wanted nib files
> > and writes the output to a fasta file. The repeated reads slow the
> > process down (if I have to get 100000 sequences it takes some time),
> > and it seems that the output can be neither appended nor sent to
> > stdout.
> > I wonder if there is a way to get fragments from a gfServer that is
> > already running (so that nib files have been loaded previously) and
> > which other utilities can handle nib files.
> >
>
>
> > nibFrag - Extract part of a nib file as .fa (all bases/gaps lower case
> > by default)
> > usage:
> >    nibFrag [options] file.nib start end strand out.fa
> > where strand is + (plus) or m (minus)
> > options:
> >    -masked - use lower case characters for bases meant to be masked out
> >    -hardMasked - use upper case for not masked-out and 'N' characters
> > for masked-out bases
> >    -upper - use upper case characters for all bases
> >    -name=name Use given name after '>' in output sequence
> >    -dbHeader=db Add full database info to the header, with or without
> > -name option
> >    -tbaHeader=db Format header for compatibility with tba, takes
> > database name as argument
>
> twoBitToFa - Convert all or part of .2bit file to fasta
> usage:
>     twoBitToFa input.2bit output.fa
> options:
>     -seq=name - restrict this to just one sequence
>     -start=X  - start at given position in sequence (zero-based)
>     -end=X - end at given position in sequence (non-inclusive)
>     -seqList=file - file containing list of sequence names
>                      to output, of the form seqSpec[:start-end]
>     -noMask - convert sequence to all upper case
>
> Sequence and range may also be specified as part of the input
> file name using the syntax:
>        /path/input.2bit:name
>     or
>        /path/input.2bit:name:start-end
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
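
The input-file-name syntax and the special name "stdout" from the messages
above can be combined to pull a single range straight into a script. The
following Python sketch is only an illustration: the function names and the
path genome.2bit are hypothetical, twoBitToFa is assumed to be on the PATH,
and the coordinate helper assumes your ranges are one-based and inclusive
while twoBitToFa expects a zero-based start and a non-inclusive end.

```python
import subprocess


def one_based_to_half_open(start, end):
    """Convert a one-based inclusive range to zero-based, end-exclusive."""
    if not 1 <= start <= end:
        raise ValueError(f"bad one-based range {start}-{end}")
    return start - 1, end


def two_bit_spec(two_bit, name, start=None, end=None):
    """Build '/path/input.2bit:name' or '/path/input.2bit:name:start-end'."""
    if start is None and end is None:
        return f"{two_bit}:{name}"
    if start is None or end is None:
        raise ValueError("start and end must be given together")
    return f"{two_bit}:{name}:{start}-{end}"


def fetch_fasta(two_bit, name, start, end):
    """Run twoBitToFa on one range and return the FASTA text it prints.

    The output argument is the special name 'stdout', so nothing is
    written to disk; check=True raises if the tool exits with an error.
    """
    spec = two_bit_spec(two_bit, name, start, end)
    result = subprocess.run(
        ["twoBitToFa", spec, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For many ranges this still launches one process per range, so the -seqList
route is likely the better fit for the 100,000-sequence case; this form is
handier for one-off lookups.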

