[Genome] GeneSorter References

Fan Hsu fanhsu at soe.ucsc.edu
Fri Jan 11 15:39:35 PST 2008


Hi Eddie,

Attached below please find the detailed processing steps we used to 
build our knownToHprd table.  Hope this helps.

Fan.

BUILD HPRD DATA FOR KNOWN GENE DETAILS PAGE LINKS (DONE 9/11/06)

# Download HPRD_XML_060106.tar.gz from www.hprd.org

    gzip -d HPRD_XML_060106.tar.gz
    tar -xvf HPRD_XML_060106.tar.gz

# This will create 18838 xxxx.xml files under HPRD_XML_060106
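The gzip -d and tar -xvf pair above can also be run as one step with tar's -z option, which also leaves the compressed archive in place. The minimal sketch below assumes GNU or BSD tar. It builds a small hypothetical archive (HPRD_XML_DEMO.tar.gz) instead of using the real HPRD download, then counts the extracted files, as the note above does.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz in one step and count the XML files it contains.
# HPRD_XML_DEMO.tar.gz is a hypothetical stand-in for HPRD_XML_060106.tar.gz.
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Build a tiny demo archive holding two placeholder XML files.
mkdir HPRD_XML_DEMO
echo '<entry/>' > HPRD_XML_DEMO/00001.xml
echo '<entry/>' > HPRD_XML_DEMO/00002.xml
tar -czf HPRD_XML_DEMO.tar.gz HPRD_XML_DEMO
rm -r HPRD_XML_DEMO

# -z filters the archive through gzip, so no separate "gzip -d" is needed
# and the compressed tarball is kept.
tar -xzf HPRD_XML_DEMO.tar.gz

# Count the extracted files; the original run reported 18838.
xml_count=$(find HPRD_XML_DEMO -name '*.xml' | wc -l)
echo "extracted $xml_count XML files"
```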

# Create hprdToCdna table

    echo 'grep -H entry_cdna  HPRD_XML_060106/$1.xml' >do1Cdna

    ls  HPRD_XML_060106 >j
    cat j | sed -e 's/\.xml$/\t.\/do1Cdna/' >jj
    cut -f 1 jj >j.2
    cut -f 2 jj >j.1
    paste j.1 j.2 >doAllCdna
    chmod +x do*

    ./doAllCdna >j.cdna
    cat j.cdna | sed -e 's/\//\t/' | sed -e 's/.xml/\t/' | \
    sed -e 's/<entry_cdna>/\t/' | sed -e 's/<\//\t/' | sed -e 's/\./\t/' | \
    cut -f 2,4 | grep -v None >hprdToCdna.tab

    hgsql hg18 -e 'drop table hprdToCdna'
    hgsql hg18 <~/src/hg/lib/hprdToCdna.sql
    hgsql hg18 -e 'load data local infile "hprdToCdna.tab" into table hprdToCdna'
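The do1Cdna/doAllCdna scripts above run grep once per file and then cut the output apart with a chain of sed commands. The same extraction can be sketched as a single find/grep/sed pipeline. This sketch assumes that each file is named <hprdId>.xml and holds the accession on a line such as <entry_cdna>NM_000789.2</entry_cdna>, inferred only from what the sed chain above implies. The function name hprd_to_cdna and the sample files are hypothetical, and the \t escape needs GNU sed, as the original pipeline does.

```shell
#!/usr/bin/env bash
# Sketch: emit "hprdId<TAB>cdnaAccession" rows in the shape of hprdToCdna.tab
# without generating one script call per file. Assumes GNU sed (for \t) and
# files named <hprdId>.xml; hprd_to_cdna and the sample files are hypothetical.
hprd_to_cdna() {
    dir=$1
    # grep -H prefixes every match with its path, e.g.
    #   DIR/00001.xml:  <entry_cdna>NM_000789.2</entry_cdna>
    # sed keeps the file's base name plus the accession up to the first "."
    # or "<" (dropping any version suffix); grep -v removes "None" entries.
    find "$dir" -name '*.xml' -exec grep -H '<entry_cdna>' {} + |
        sed -e 's#^.*/\([^/]*\)\.xml:.*<entry_cdna>\([^.<]*\)[.<].*$#\1\t\2#' |
        grep -v 'None' |
        sort
}

# Demo on three made-up files: one accession, one "None", one with no cDNA line.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
printf '  <entry_cdna>NM_000789.2</entry_cdna>\n' > "$demo/00001.xml"
printf '  <entry_cdna>None</entry_cdna>\n'        > "$demo/00002.xml"
printf '  <name>no cDNA line</name>\n'            > "$demo/00003.xml"

hprd_to_cdna "$demo"
# expected output (tab-separated): 00001  NM_000789
```

In this sketch, redirecting the function's output to hprdToCdna.tab would stand in for the j, jj, j.1, j.2, doAllCdna and j.cdna intermediates.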

# Create hprdToUniProt table

    echo 'fgrep -H Swiss  HPRD_XML_060106/$1.xml' >do1

    ls HPRD_XML_060106 >j
    cat j | sed -e 's/\.xml$/\t.\/do1/' >jj
    cut -f 1 jj >j.2
    cut -f 2 jj >j.1
    paste j.1 j.2 >doall
    chmod +x do*

    ./doall >j.out
    cat j.out | grep SwissProt | sed -e 's/\//\t/' | sed -e 's/.xml/\t/' | \
    sed -e 's/Prot>/\t/' | sed -e 's/<\//\t/' | cut -f 2,4 | grep -v None \
    >hprdToUniProt.tab

    hgsql hg18 -e 'drop table hprdToUniProt'
    hgsql hg18 <~/src/hg/lib/hprdToUniProt.sql
    hgsql hg18 -e 'load data local infile "hprdToUniProt.tab" into table hprdToUniProt'
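A typo in a file name, or a command split by mail line wrapping, can leave the file handed to load data missing, empty, or malformed. A small pre-load check can catch this. The helper name check_tab and its rule (exactly two non-empty tab-separated fields per line, matching the tables built here) are assumptions, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a two-column, tab-separated file before loading it.
# check_tab is a hypothetical helper; it fails if the file is missing, empty,
# or has any line without exactly two non-empty tab-separated fields.
check_tab() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "check_tab: $f is missing or empty" >&2
        return 1
    fi
    awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" {
                    printf "check_tab: bad line %d: %s\n", NR, $0 > "/dev/stderr"
                    bad = 1
                }
                END { exit (bad ? 1 : 0) }' "$f"
}

# Demo with made-up files: one valid, one with a one-column row, one empty.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
printf '00001\tP12345\n00002\tQ67890\n' > "$demo/good.tab"
printf '00001\tP12345\n00002\n'         > "$demo/bad.tab"
: > "$demo/empty.tab"

check_tab "$demo/good.tab"  && echo "good.tab: ok"
check_tab "$demo/bad.tab"   || echo "bad.tab: rejected"
check_tab "$demo/empty.tab" || echo "empty.tab: rejected"
```

In the steps above it could be called as check_tab hprdToUniProt.tab just before the load data command.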

# build knownToHprd table

    hgsql hg18 -N -e 'select kgId,hprdId from hprdToCdna, kgXref where cdnaId=kgId' >j.kg1
    hgsql hg18 -N -e 'select kgId,hprdId from hprdToUniProt, kgXref where uniProtId=spId' >j.kg2

    cat j.kg1 j.kg2 |sort -u >knownToHprd.tab
    wc knownToHprd.tab

    hgsql hg18 -e 'drop table knownToHprd'
    hgsql hg18 <~/src/hg/lib/knownToHprd.sql

    hgsql hg18 -e 'load data local infile "knownToHprd.tab" into table knownToHprd'
    hgsql hg18 -e 'select count(*) from knownToHprd'

# 19,646 records created.
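The two queries map HPRD IDs to Known Gene IDs through kgXref, once by cDNA accession and once by UniProt ID. cat j.kg1 j.kg2 | sort -u then merges the two result sets and drops pairs found by both. The same logic can be sketched on flat files with join(1), which is useful for checking results without a database. The file names, the kgXref dump, and the KG1/P11111-style IDs are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the knownToHprd join and union on flat files with join(1).
# All inputs are hypothetical, tab-separated, two-column files:
#   cdna.tab     hprdId  cdnaId      (shaped like hprdToCdna.tab)
#   uniprot.tab  hprdId  uniProtId   (shaped like hprdToUniProt.tab)
#   kgxref.tab   kgId    spId        (assumed dump of two kgXref columns)
export LC_ALL=C        # sort and join must agree on collation order
tab=$(printf '\t')
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '00001\tKG1\n00003\tKG9\n'                > cdna.tab
printf '00001\tP11111\n00002\tP22222\n'          > uniprot.tab
printf 'KG1\tP11111\nKG2\tP22222\nKG3\tP33333\n' > kgxref.tab

# join requires each input sorted on its join field.
sort -t "$tab" -k2,2 cdna.tab    > cdna.sorted
sort -t "$tab" -k1,1 kgxref.tab  > kg.by_kgid
sort -t "$tab" -k2,2 uniprot.tab > uniprot.sorted
sort -t "$tab" -k2,2 kgxref.tab  > kg.by_spid

# Query 1 (cdnaId = kgId) and query 2 (uniProtId = spId); -o prints kgId, hprdId.
join -t "$tab" -1 2 -2 1 -o 2.1,1.1 cdna.sorted    kg.by_kgid > kg1
join -t "$tab" -1 2 -2 2 -o 2.1,1.1 uniprot.sorted kg.by_spid > kg2

# Union with duplicate pairs removed, as "cat j.kg1 j.kg2 | sort -u" does.
sort -u kg1 kg2 > knownToHprd.tab
cat knownToHprd.tab
```

Here KG1/00001 is found by both queries but appears once in the result, which is the role sort -u plays in the original step.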

# remove temporary files.

    rm j*
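rm j* deletes every file in the current directory whose name starts with j. It is therefore only safe in a directory reserved for this build, and it leaves the do* helper scripts behind. One alternative, sketched here rather than taken from the original procedure, keeps intermediates in a private directory created by mktemp -d and removed by an EXIT trap. merge_known_to_hprd and its sample rows are hypothetical stand-ins for the merge step.

```shell
#!/usr/bin/env bash
# Sketch: run a build step with its intermediates in a private scratch
# directory that is deleted automatically, instead of "rm j*".
# merge_known_to_hprd and the sample rows are hypothetical.
merge_known_to_hprd() {
    workdir=$1   # directory that receives knownToHprd.tab
    (
        scratch=$(mktemp -d "$workdir/scratch.XXXXXX") || exit 1
        # The EXIT trap runs when this subshell ends, even if a step fails.
        trap 'rm -rf "$scratch"' EXIT
        # Stand-ins for the two hgsql query results (j.kg1, j.kg2).
        printf 'KG1\t00001\n'             > "$scratch/j.kg1"
        printf 'KG1\t00001\nKG2\t00002\n' > "$scratch/j.kg2"
        # Only the final product is written outside the scratch directory.
        sort -u "$scratch/j.kg1" "$scratch/j.kg2" > "$workdir/knownToHprd.tab"
    )
}

demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
merge_known_to_hprd "$demo"
ls "$demo"    # only knownToHprd.tab is left; the scratch directory is gone
```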

-----Original Message-----
From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] On
Behalf Of Edward Herman
Sent: Friday, 11 January, 2008 3:22 PM
To: genome at soe.ucsc.edu
Cc: Brett Abrahams
Subject: [Genome] GeneSorter References

Hi,
We have recently used the Gene Sorter tool on our gene of interest, and
found a striking overlap between the list that this tool generated and our
own data.  We would like to know more about what publications or results the
Gene Sorter is based on and how we can access them.  In particular, we are
interested in the HPRD protein-protein interaction database.  We tried their
website (www.hprd.org), searching for our gene and clicking the "Interactions"
tab, but only a subset (8 out of 71) appears there with links to publications.

Can you tell us more about what data was used to generate the HPRD
protein-protein interaction database in the Gene Sorter tool?

Thanks,
Eddie
Geschwind Lab, UCLA
_______________________________________________
Genome maillist  -  Genome at soe.ucsc.edu
http://www.soe.ucsc.edu/mailman/listinfo/genome