[Genome] formula to locate sequence in the fasta files

Kayla Smith kayla at soe.ucsc.edu
Mon Oct 8 13:13:50 PDT 2007


Hello Zhou,

Information about the 0-based / 1-based coordinates is available in our FAQ:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1

The formula you've provided may work on FASTA files that have exactly
50 characters on each line, but we cannot guarantee that all of our
FASTA files are formatted that way.
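To illustrate why the line width matters, here is a minimal Python sketch of
the same offset arithmetic with the width as a parameter instead of a
hard-coded 50. The function names (fasta_offset, fetch) are made up for this
example. It assumes a single-record FASTA file, Unix newlines, sequence lines
of equal width (measured from the first sequence line), and 0-based,
half-open coordinates. If any of those assumptions fails, the result is
silently wrong.

```python
def fasta_offset(header_len, line_width, pos0):
    """Byte offset of the 0-based base pos0 in a single-record FASTA file.

    header_len is the length of the whole header line, including '>' and
    its trailing newline. Assumes every full sequence line holds exactly
    line_width bases followed by one newline byte.
    """
    full_lines = pos0 // line_width      # complete lines before the base
    column = pos0 % line_width           # position within its own line
    return header_len + full_lines * (line_width + 1) + column


def fetch(path, start0, end0):
    """Return bases [start0, end0) from a one-record FASTA file.

    The line width is measured from the first sequence line rather than
    assumed to be 50 (hypothetical helper, not part of the kent tools).
    """
    n_bases = end0 - start0
    with open(path, "rb") as fh:
        header = fh.readline()
        first_line = fh.readline()
        line_width = len(first_line.rstrip(b"\n"))
        fh.seek(fasta_offset(len(header), line_width, start0))
        # Read enough bytes to cover the bases plus any newlines crossed.
        raw = fh.read(n_bases + n_bases // line_width + 2)
    return raw.replace(b"\n", b"").decode("ascii")[:n_bases]
```

With a 6-byte header (">chr1" plus newline) and 50 bases per line,
fasta_offset(6, 50, txStart) reduces to the formula in the original
question. With half-open coordinates the number of bases is end0 - start0,
with no "+1".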

The recommended method is to use the FASTA-reading utilities in the
kent source tree, which can extract specific sequences: for example,
faFrag on a .fa file, or twoBitToFa on the 2bit file.
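As a rough sketch of how twoBitToFa might be driven from a script, the Python
below builds the command line for one region and runs it. The helper names
and the example file names are assumptions for illustration. The -seq,
-start and -end options are taken from the tool's usage message as I
understand it, with a zero-based start and a non-inclusive end; run
twoBitToFa with no arguments to confirm against your installed version.

```python
import subprocess


def twobit_command(two_bit, chrom, start0, end0, out_fa):
    """Build the argument list for extracting one region with twoBitToFa.

    start0 is 0-based and end0 is non-inclusive, which matches the
    txStart/txEnd convention used in the database tables.
    """
    return [
        "twoBitToFa",
        f"-seq={chrom}",
        f"-start={start0}",
        f"-end={end0}",
        two_bit,
        out_fa,
    ]


def extract_region(two_bit, chrom, start0, end0, out_fa):
    """Run twoBitToFa; raises CalledProcessError if the tool fails.

    Assumes the twoBitToFa binary is on PATH.
    """
    subprocess.run(twobit_command(two_bit, chrom, start0, end0, out_fa),
                   check=True)
```

For example, extract_region("hg18.2bit", "chr1", 1000, 2000, "region.fa")
would write 1000 bases to region.fa, where both file names are placeholders.
Because the tool reads the 2bit index, no assumption about line width is
needed.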

Here is a FAQ on downloading the Genome Browser utilities:
http://genome.ucsc.edu/FAQ/FAQlicense#license3

I hope this information is helpful to you.  Please don't hesitate to
contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Mon, 8 Oct 2007, ZHOU Jiangtao wrote:

> Hi,
>
>
>
> To get the genome sequence for a given gene location, let's say (chr1,
> +, txStart, txEnd), I downloaded the FASTA files
>
> from goldenPath/hg18/chromosomes/
>
> and used this formula:
>
>
>
> Starting position of file chr1.fa:
>
> strlen("chr1")+2+($txStart/50)*51+$txStart%50;
>
> length:$txEnd-$txStart+1;
>
>
>
> Are these formulas correct? I found that sometimes the result is 1
> position earlier than the one I get from the Genome Browser. Is txStart
> 0-based or 1-based?
>
>
>
> Regards,
>
>
>
> Zhou Jiangtao
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

