[Genome] location of .tpa files

Ewan Birney birney at ebi.ac.uk
Mon Nov 6 13:12:26 PST 2006


On 6 Nov 2006, at 20:13, Donna Karolchik wrote:

> hi Jonathan,
>
> I suspect you are looking for TPF (i.e. tiling path format) files...we
> don't know of any TPA files. If so, you can most likely get those from
> the NCBI site. We do have some tables with clone IDs that might contain
> the info you're looking for, e.g. chr*_gold or ctgPos, depending on the
> type of accession/level of assembly structure you want. You can download
> these from our downloads server at http://hgdownload.cse.ucsc.edu/.
>
> -Donna


Jonathan - I am sorry you are going all around the houses here, but let
me suggest something simpler - you need the list of accession numbers
in each chromosome, and currently _all_ of those accessions in human are
finished and the vast majority of those in mouse are also finished.
If you want to check, pull out the accessions from EMBL/GenBank and
look at the HTG_ tag in the comment lines.

To get accession numbers, you can either do a join on the ensembl
mysql database like:

mysql -e 'select c.name from seq_region c, seq_region chr, assembly am
  where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id
  and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 4' \
  -h ensembldb.ensembl.org -u anonymous homo_sapiens_core_41_36c \
  | perl -ne '($acc) = /^(\w+)\./; print $acc,"\n"'


(the perl pipe is to convert text like:

   AC000055.1.1.93578 to

   AC000055

)
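
The same stripping step can be sketched without perl, assuming a POSIX
shell and a sed that accepts -E. The function name strip_accession is made
up for illustration and is not part of any Ensembl or UCSC tooling.

```shell
#!/bin/sh
# Hypothetical helper: reduce an Ensembl seq_region name such as
# AC000055.1.1.93578 (read from stdin) to its accession prefix.
strip_accession() {
    # -n plus the trailing p prints only lines where the substitution
    # succeeded, so lines with no "word characters then a dot" at the
    # start (such as a "name" column header) are silently dropped.
    sed -n -E 's/^([A-Za-z0-9_]+)\..*$/\1/p'
}

printf 'name\nAC000055.1.1.93578\n' | strip_accession
```

This could replace the perl stage in the pipe above, e.g. `mysql ... | strip_accession`.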


Or you can (I think) get this list out of the Table Browser at UCSC. I am
not quite sure what to do, but it will be something like the accession
track in the assembly group tables.


For mouse, the corresponding SQL is

mysql -e 'select c.name from seq_region c, seq_region chr, assembly am
  where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id
  and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 3' \
  -h ensembldb.ensembl.org -u anonymous mus_musculus_core_41_36b \
  | perl -ne '/CAA/ && next; ($acc) = /(\w+)\./; print $acc,"\n"'


I have rather cheekily added a /CAA/ && next in the perl loop, skipping
the accessions starting with CAA. This is because I happen to know that
these are WGS contigs.
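
A rough shell equivalent of that filter, assuming a sed that accepts -E.
The function name skip_wgs_and_strip and the CAA-prefixed sample name are
invented for illustration. Note one deliberate difference: the /CAA/ test
above matches CAA anywhere in the name, whereas this sketch only skips
names that start with CAA, as the description says.

```shell
#!/bin/sh
# Hypothetical helper: drop seq_region names beginning with CAA (assumed
# here to be WGS contigs), then keep the accession prefix of the rest.
skip_wgs_and_strip() {
    # First expression deletes CAA-prefixed lines; second prints the
    # text before the first dot for every remaining line that has one.
    sed -n -E -e '/^CAA/d' -e 's/^([A-Za-z0-9_]+)\..*$/\1/p'
}

# AC000055.1.1.93578 is the example from this mail; the CAA name is made up.
printf 'AC000055.1.1.93578\nCAAA01000001.1.1.1000\n' | skip_wgs_and_strip
```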


As I've done this, I've thrown these up on my web site at

http://www.ebi.ac.uk/~birney/human_X.txt
http://www.ebi.ac.uk/~birney/mouse_X.txt



Feel free to play around with the above SQL and/or hand it over to your
local/favourite geek to help explain what's going on here.







