[Genome] GeneSorter References
Fan Hsu
fanhsu at soe.ucsc.edu
Fri Jan 11 15:39:35 PST 2008
Hi Eddie,
Attached below please find the detailed processing steps we used to
build our knownToHprd table. Hope this helps.
Fan.
BUILD HPRD DATA FOR KNOWN GENE DETAILS PAGE LINKS (DONE 9/11/06)
# Download HPRD_XML_060106.tar.gz from www.hprd.org
gzip -d HPRD_XML_060106.tar.gz
# gzip -d leaves the uncompressed archive HPRD_XML_060106.tar
tar -xvf HPRD_XML_060106.tar
# This will create 18838 xxxx.xml files under HPRD_XML_060106
# Create hprdToCdna table
echo 'grep -H entry_cdna HPRD_XML_060106/$1.xml' >do1Cdna
ls HPRD_XML_060106 >j
cat j |sed -e 's/.xml/\tdo1Cdna/g' >jj
cut -f 1 jj >j.2
cut -f 2 jj >j.1
paste j.1 j.2 >doAllCdna
chmod +x do*
./doAllCdna >j.cdna
cat j.cdna| sed -e 's/\//\t/' | sed -e 's/.xml/\t/' |\
sed -e 's/<entry_cdna>/\t/' | sed -e 's/<\//\t/'| sed -e 's/\./\t/'|\
cut -f 2,4| grep -v None >hprdToCdna.tab
hgsql hg18 -e 'drop table hprdToCdna'
hgsql hg18 <~/src/hg/lib/hprdToCdna.sql
hgsql hg18 -e 'load data local infile "hprdToCdna.tab" into table hprdToCdna'
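
The sed chain above can be traced on a single line. This is only a sketch: the sample `grep -H` line and the accession in it are hypothetical, with the tag layout inferred from the sed patterns in the steps above. A literal tab is passed to sed through a shell variable so the example does not depend on GNU sed's `\t` handling, and the dot in `.xml` is escaped here.

```shell
# Hypothetical grep -H output line for one HPRD entry (format assumed).
tab=$(printf '\t')
line='HPRD_XML_060106/00001.xml:    <entry_cdna>NM_000680.2</entry_cdna>'

# Turn the first "/", ".xml", "<entry_cdna>", "</" and "." into tabs,
# then keep field 2 (HPRD ID) and field 4 (accession without version).
result=$(printf '%s\n' "$line" |
  sed -e "s/\//$tab/" -e "s/\.xml/$tab/" -e "s/<entry_cdna>/$tab/" \
      -e "s/<\//$tab/" -e "s/\./$tab/" |
  cut -f 2,4)

printf '%s\n' "$result"   # prints 00001<TAB>NM_000680
```

The last substitution splits the accession from its version number, which is why the version suffix does not appear in hprdToCdna.tab.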
# Create hprdToUniProt table
echo 'fgrep -H Swiss HPRD_XML_060106/$1.xml' >do1
ls HPRD_XML_060106 >j
cat j |sed -e 's/.xml/\tdo1/g' >jj
cut -f 1 jj >j.2
cut -f 2 jj >j.1
paste j.1 j.2 >doall
chmod +x do*
./doall >j.out
cat j.out|grep SwissProt | sed -e 's/\//\t/' | sed -e 's/.xml/\t/' | \
sed -e 's/Prot>/\t/' | sed -e 's/<\//\t/'| cut -f 2,4|grep -v None \
>hprdToUniProt.tab
hgsql hg18 -e 'drop table hprdToUniProt'
hgsql hg18 <~/src/hg/lib/hprdToUniProt.sql
hgsql hg18 -e 'load data local infile "hprdToUniProt.tab" into table hprdToUniProt'
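
As an alternative to generating the do1/doall helper scripts, the same extraction could be sketched with a single find/grep pass. This is not the procedure we actually ran: the miniature directory, the XML tag layout, and the accessions below are assumptions made up for illustration.

```shell
# Build a tiny, hypothetical stand-in for the HPRD_XML_060106 directory.
work=$(mktemp -d)
mkdir "$work/HPRD_XML_060106"
printf '<entry>\n  <SwissProt>P12345</SwissProt>\n</entry>\n' > "$work/HPRD_XML_060106/00001.xml"
printf '<entry>\n  <SwissProt>None</SwissProt>\n</entry>\n'   > "$work/HPRD_XML_060106/00002.xml"
printf '<entry>\n  <SwissProt>Q67890</SwissProt>\n</entry>\n' > "$work/HPRD_XML_060106/00003.xml"

tab=$(printf '\t')
# Run from inside $work so each path contains exactly one "/" before the
# file name, as the first sed substitution expects.
result=$(
  cd "$work" &&
  find HPRD_XML_060106 -name '*.xml' -exec grep -H SwissProt {} + |
  sed -e "s/\//$tab/" -e "s/\.xml/$tab/" -e "s/Prot>/$tab/" -e "s/<\//$tab/" |
  cut -f 2,4 | grep -v None | sort
)

printf '%s\n' "$result"
rm -r "$work"
```

With the three sample files this yields one hprdId/uniProtId pair each for 00001 and 00003; the 00002 entry is dropped by `grep -v None`.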
# build knownToHprd table
hgsql hg18 -N -e 'select kgId,hprdId from hprdToCdna, kgXref where cdnaId=kgId' >j.kg1
hgsql hg18 -N -e 'select kgId,hprdId from hprdToUniProt, kgXref where uniProtId=spId' >j.kg2
cat j.kg1 j.kg2 |sort -u >knownToHprd.tab
wc knownToHprd.tab
hgsql hg18 -e 'drop table knownToHprd'
hgsql hg18 <~/src/hg/lib/knownToHprd.sql
hgsql hg18 -e 'load data local infile "knownToHprd.tab" into table knownToHprd'
hgsql hg18 -e 'select count(*) from knownToHprd'
# 19,646 records created.
# remove temporary files.
rm j*
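
The knownToHprd step is a union of the two join results with duplicates removed. A minimal sketch of that merge, using made-up kgId/hprdId pairs in place of the real j.kg1 and j.kg2 files:

```shell
work=$(mktemp -d)
# Hypothetical stand-ins for j.kg1 (cDNA matches) and j.kg2 (UniProt matches).
printf 'NM_000680\t00001\nNM_001234\t00007\n' > "$work/j.kg1"
printf 'NM_000680\t00001\nNM_005678\t00042\n' > "$work/j.kg2"

# A pair found by both routes is kept only once.
cat "$work/j.kg1" "$work/j.kg2" | sort -u > "$work/knownToHprd.tab"
count=$(wc -l < "$work/knownToHprd.tab" | tr -d ' ')

cat "$work/knownToHprd.tab"
echo "$count"   # prints 3
rm -r "$work"
```

A gene can therefore pick up an HPRD link through either its cDNA accession or its UniProt ID without being listed twice.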
-----Original Message-----
From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] On
Behalf Of Edward Herman
Sent: Friday, 11 January, 2008 3:22 PM
To: genome at soe.ucsc.edu
Cc: Brett Abrahams
Subject: [Genome] GeneSorter References
Hi,
We have recently used the Gene Sorter tool on our gene of interest, and
found a striking overlap between the list that this tool generated and our
own data. We would like to know more about what publications or results the
Gene Sorter is based on and how we can access them. In particular, we are
interested in the HPRD Protein-Protein interaction database. We tried their
website (www.hprd.org) searching on our gene and clicking the "Interactions"
tab, but only a subset (8 out of 71) appear here with links to publications.
Can you tell us more about what data was used to generate the HPRD
Protein-Protein interaction database on the Gene Sorter tool?
Thanks,
Eddie
Geschwind Lab, UCLA
_______________________________________________
Genome maillist - Genome at soe.ucsc.edu
http://www.soe.ucsc.edu/mailman/listinfo/genome