[Genome] chimp proteins in fasta
Brooke Rhead
rhead at soe.ucsc.edu
Tue Apr 22 20:08:10 PDT 2008
Hello Vesko,
You can find a bulk download of protein sequences for all species here:
http://hgdownload.cse.ucsc.edu/goldenPath/uniProt/database/protein.txt.gz
This is a large file (615 MB gzipped), containing the fasta records for
nearly 4 million proteins.
To get only the chimp proteins from this table (which will be a lot less
data -- only about 1600 proteins), you can use the Table Browser. Go to
the "Tables" link at the top of our page and make the following selections:
group: all tables
database: uniProt
table: uniProt.accToTaxon
Hit the "filter: create" button and create a filter requiring "taxon=9598".
9598 is the NCBI taxon ID for chimpanzee (Pan troglodytes):
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Pan+troglodytes&lvl=0&srchmode=1
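If you would rather apply the same taxon filter to a local copy of the table, here is a minimal sketch in Python. It assumes a tab-separated dump with the accession in the first column and the taxon ID in the second; that layout and the function name are assumptions for illustration, not something the Table Browser produces for you.

```python
from typing import Iterable, Iterator

CHIMP_TAXON = "9598"  # NCBI taxon ID for chimpanzee, as used in the filter


def accessions_for_taxon(lines: Iterable[str], taxon: str = CHIMP_TAXON) -> Iterator[str]:
    """Yield the accession of every row whose taxon column equals `taxon`.

    Assumed row layout (hypothetical): acc<TAB>taxon
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        acc, row_taxon = fields[0], fields[1].strip()
        if row_taxon == taxon:
            yield acc
```

Any iterable of text lines works as input, for example a file opened with gzip.open(path, "rt"), so the whole table never has to be held in memory.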
Back on the main Table Browser page, choose the output format option
"selected fields from primary and related tables" and enter a filename
in the "output file" field. Hit "get output".
On the next page, select the box next to the "uniProt.protein" table and
hit "allow selection from checked tables". At the top of this page,
check the two boxes in the "uniProt.protein fields" section and hit "get
output". You should get a file containing only the chimpanzee protein
sequences from the uniProt.protein table.
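The Table Browser output is tab-separated text rather than fasta, so a small conversion step may be needed. The sketch below assumes each data row holds the accession in the first column and the sequence in the second, and that header lines start with "#". Please check your own output file against those assumptions; the function name is made up for this example.

```python
from typing import Iterable, Iterator


def rows_to_fasta(lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Convert tab-separated accession/sequence rows into fasta lines.

    Assumed row layout (hypothetical): accession<TAB>sequence
    Lines starting with "#" are treated as headers and skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip rows without a sequence column
        acc, seq = fields[0], fields[1]
        yield ">" + acc
        # Wrap the sequence at a fixed width, as is usual for fasta.
        for start in range(0, len(seq), width):
            yield seq[start:start + width]
```

To use it, open the file you saved from the Table Browser, pass the file object to rows_to_fasta, and write each yielded line followed by a newline to a new file.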
I hope this information is helpful.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Vesselin Baev wrote:
> Dear all,
> where I can find fasta of all chimp proteins? I looked in
> ftp://hgdownload.cse.ucsc.edu/goldenPath/panTro2/database/
> but there is only cds files with coordinates?
>
> Vesko
>