[Genome] nib utilities?
Galt Barber
galt at soe.ucsc.edu
Mon Oct 16 10:37:38 PDT 2006
And with twoBitToFa you can also give a list of ranges:
   -seqList=file - file containing list of sequence names
                   to output of form the seqSpec[:start-end]
This will run very fast and get all 100,000 sequences
in no time!
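
For anyone scripting this, here is a minimal Python sketch of preparing a -seqList file and the matching command line. The helper names, the file names (ranges.txt, hg18.2bit) and the coordinates are made up for illustration. The ranges are assumed to follow the same zero-based, end-exclusive convention that the usage message gives for -start and -end.

```python
from pathlib import Path


def write_seq_list(ranges, path):
    """Write one seqSpec[:start-end] entry per line; return the lines written.

    `ranges` is an iterable of (name, start, end) tuples. A start or end of
    None means "the whole sequence", so only the name is written.
    """
    lines = []
    for name, start, end in ranges:
        if start is None or end is None:
            lines.append(name)                      # whole sequence
        else:
            lines.append(f"{name}:{start}-{end}")   # sub-range of the sequence
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def two_bit_to_fa_command(two_bit, seq_list, out_fa="stdout"):
    """Build the argument list for a single twoBitToFa run over the list.

    "stdout" is the special file name the kent utilities accept in place
    of an output file.
    """
    return ["twoBitToFa", f"-seqList={seq_list}", two_bit, out_fa]


# Hypothetical example values, for illustration only.
example_ranges = [("chr1", 1000, 2000), ("chr2", 500, 750), ("chrM", None, None)]
```

Calling write_seq_list(example_ranges, "ranges.txt") and then handing two_bit_to_fa_command("hg18.2bit", "ranges.txt") to subprocess.run would extract every range in one invocation, rather than starting the tool once per sequence.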
-Galt
On Mon, 16 Oct 2006, Hiram Clawson wrote:
> Good Morning Davide:
>
> I'm not certain exactly what you are asking about here.
>
> You can get nibFrag to write its output to stdout by
> using the special name "stdout" in the command in place
> of the out.fa argument. All the kent utilities recognize
> the special words: stdin stdout stderr in place of any
> filename.
>
> You could also use the single 2bit file for the genome
> and use the twoBitToFa command to extract sequences.
> See usage message attached below.
>
> --Hiram
>
> On 2006 Oct 16, at 1:52 AM, Davide Cittaro wrote:
>
> > Hi all, to get sequences from genomes I found that nibFrag does the
> > job I need (it gets a sequence in a specified range). However, every
> > time I launch it, it reads the wanted nib file and writes the output
> > to a fasta file. Re-reading the file on every call slows the process
> > down (getting 100000 sequences takes some time), and it seems that
> > the output can neither be appended nor sent to stdout.
> > I wonder whether there is a way to get fragments from a gfServer that
> > is already running (so that the nib files have already been loaded),
> > and which other utilities can handle nib files.
> >
>
>
> > nibFrag - Extract part of a nib file as .fa (all bases/gaps lower case
> >           by default)
> > usage:
> >    nibFrag [options] file.nib start end strand out.fa
> > where strand is + (plus) or m (minus)
> > options:
> >    -masked - use lower case characters for bases meant to be masked out
> >    -hardMasked - use upper case for not masked-out and 'N' characters
> >                  for masked-out bases
> >    -upper - use upper case characters for all bases
> >    -name=name Use given name after '>' in output sequence
> >    -dbHeader=db Add full database info to the header, with or without
> >                 -name option
> >    -tbaHeader=db Format header for compatibility with tba, takes
> >                  database name as argument
>
> twoBitToFa - Convert all or part of .2bit file to fasta
> usage:
>    twoBitToFa input.2bit output.fa
> options:
>    -seq=name - restrict this to just one sequence
>    -start=X - start at given position in sequence (zero-based)
>    -end=X - end at given position in sequence (non-inclusive)
>    -seqList=file - file containing list of sequence names
>                    to output of form the seqSpec[:start-end]
>    -noMask - convert sequence to all upper case
>
> Sequence and range may also be specified as part of the input
> file name using the syntax:
> /path/input.2bit:name
> or
> /path/input.2bit:name:start-end
>