[Genome] location of .tpa files
Jonathan Miller
jnthnmllr at gmail.com
Mon Nov 6 13:25:15 PST 2006
Hi Ewan,
thanks for the detailed instructions; I'm working on it now and will report
back to you if I experience any problems.
best wishes
jm
On 11/6/06, Ewan Birney <birney at ebi.ac.uk> wrote:
>
>
> On 6 Nov 2006, at 20:13, Donna Karolchik wrote:
>
> > hi Jonathan,
> >
> > I suspect you are looking for TPF (i.e. tiling path format) files...we
> > don't know of any TPA files. If so, you can most likely get those from
> > the NCBI site. We do have some tables with clone IDs that might contain
> > the info you're looking for, e.g. chr*_gold or ctgPos, depending on the
> > type of accession/level of assembly structure you want. You can download
> > these from our downloads server at http://hgdownload.cse.ucsc.edu/.
> >
> > -Donna
>
>
> Jonathan - I am sorry you are going all around the houses here, but let
> me suggest something simpler - you need the list of accession numbers
> in each chromosome, and currently _all_ of those accessions in human are
> finished and the vast majority of those in mouse are also finished.
> If you want to check, pull out the accessions from EMBL/GenBank and
> look at the HTG_ tag in the comment lines.
>
> To get accession numbers, you can either do a join on the ensembl
> mysql database like:
>
> mysql -e 'select c.name from seq_region c, seq_region chr, assembly am
>   where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id
>   and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 4' \
>   -h ensembldb.ensembl.org -u anonymous homo_sapiens_core_41_36c |
>   perl -ne '($acc) = /^(\w+)\./; print $acc,"\n"'
>
>
> (the perl pipe is to convert text like:
>
> AC000055.1.1.93578 to
>
> AC000055
>
> )
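
For anyone who would rather not use perl for the trimming step, here is a rough sketch of the same idea with sed. The function name strip_to_accession is made up for illustration, and the sample input simply reuses the AC000055.1.1.93578 example above, plus a "name" line standing in for the column header that the mysql client normally prints in batch mode. As far as I can tell, the perl one-liner prints an empty line for input it cannot match, whereas this version drops such lines.

```shell
# Hypothetical helper (not part of any Ensembl or UCSC tooling):
# reduce names like AC000055.1.1.93578 to the bare accession AC000055.
strip_to_accession() {
  # -n suppresses default output; the trailing "p" prints only lines
  # where the substitution succeeded, so header or junk lines are dropped.
  sed -n -E 's/^([A-Za-z0-9_]+)\..*$/\1/p'
}

# Sample input: a header line followed by the contig name from the email.
printf '%s\n' name AC000055.1.1.93578 | strip_to_accession
```

In the real pipeline this would sit where the perl command is, reading the output of mysql on standard input.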
>
>
> Or you can (I think) get this list out of the Table Browser at UCSC. I
> am not quite sure of the exact steps, but it will be something like the
> accession track in the assembly group tables.
>
>
> For mouse, the corresponding SQL is
>
> mysql -e 'select c.name from seq_region c, seq_region chr, assembly am
>   where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id
>   and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 3' \
>   -h ensembldb.ensembl.org -u anonymous mus_musculus_core_41_36b |
>   perl -ne '/CAA/ && next; ($acc) = /(\w+)\./; print $acc,"\n"'
>
>
> I have rather cheekily added a /CAA/ && next in the perl loop, skipping
> the accessions starting with CAA. This is because I happen to know that
> these are WGS contigs.
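
A small sketch of the mouse filtering step, using grep and sed in place of perl, may make the /CAA/ test easier to see in isolation. The function name mouse_accessions and the CAAA01000001.1.1.5000 input line are invented purely for illustration; only AC000055.1.1.93578 comes from the email, and it is reused here as a stand-in for a finished clone. Note that, like the unanchored /CAA/ in the perl, this drops any line containing CAA anywhere, not only at the start.

```shell
# Hypothetical helper: drop WGS contig lines, then trim to the accession.
mouse_accessions() {
  # grep -v removes every line that contains "CAA" (same as /CAA/ && next);
  # sed then keeps the part before the first dot, as in the perl regex.
  grep -v 'CAA' | sed -n -E 's/^([A-Za-z0-9_]+)\..*$/\1/p'
}

# Made-up sample: one WGS-style name (skipped) and one clone-style name (kept).
printf '%s\n' CAAA01000001.1.1.5000 AC000055.1.1.93578 | mouse_accessions
```

To match only names that really start with CAA, the grep pattern could be anchored as '^CAA' instead.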
>
>
> As I've done this, I've thrown these up on my web site at
>
> http://www.ebi.ac.uk/~birney/human_X.txt
> http://www.ebi.ac.uk/~birney/mouse_X.txt
>
>
>
> Feel free to play around with the above SQL and/or hand it over to your
> local/favourite geek to help explain what's going on here.
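
As a starting point for playing around, here is one way the same query might be written with explicit JOINs, which tends to make the logic easier to follow: start from the chromosome row in seq_region, follow the assembly table from the assembled region (asm_seq_region_id) to its component regions (cmp_seq_region_id), and keep the components that belong to the requested coordinate system. The function name build_contig_query is hypothetical, and the values 4 (human) and 3 (mouse) are simply taken from the commands above; which coordinate system each id denotes should be checked in the coord_system table of the database in question.

```shell
# Hypothetical helper: print the contig-name query for one chromosome.
#   $1 = chromosome name (e.g. X)
#   $2 = coord_system_id of the component level (4 for human and 3 for
#        mouse in the commands quoted in this thread)
build_contig_query() {
  chrom=$1
  coord_system_id=$2
  cat <<EOF
SELECT c.name
FROM seq_region chr
JOIN assembly am ON am.asm_seq_region_id = chr.seq_region_id
JOIN seq_region c ON c.seq_region_id = am.cmp_seq_region_id
WHERE chr.name = '${chrom}'
  AND c.coord_system_id = ${coord_system_id}
EOF
}

# Print the query text only; nothing is sent to a server here.
build_contig_query X 4

# To run it for real (needs network access), something like:
#   mysql -h ensembldb.ensembl.org -u anonymous homo_sapiens_core_41_36c \
#     -e "$(build_contig_query X 4)"
```

Changing the first argument gives the list for another chromosome, assuming the chromosome names in seq_region follow the same convention as "X".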
>