From johnsonrb at gis.a-star.edu.sg Mon Jul 2 01:21:42 2007 From: johnsonrb at gis.a-star.edu.sg (Rory JOHNSON) Date: Mon, 2 Jul 2007 16:21:42 +0800 Subject: [Genome] bed file request Message-ID: <606B10C3E38D7F43B823E5132D32ED3D1DB058@gisexch.gis.a-star.edu.sg> Hi Please could you send me the BED files for the recent dataset from Kapranov et al: Affy Tx sRNA Reg Track Affy Tx sRNA Sig For both strands and cell lines. Thanks! The browser is fantastic, use it all the time. Best Rory Johnson ========================================= Rory JOHNSON Postdoctoral Fellow Genome Institute of Singapore 60 Biopolis Street #02-01 Genome Singapore 138672 Email: johnsonrb at gis.a-star.edu.sg Phone: (+65) 64788149 Fax: (+65) 64789005 ========================================= This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. From pauhsi at nhri.org.tw Mon Jul 2 01:17:11 2007 From: pauhsi at nhri.org.tw (pauhsi) Date: Mon, 02 Jul 2007 16:17:11 +0800 Subject: [Genome] about GO (gene ontology) Message-ID: <000801c7bc81$6646d670$790a450a@nhri.local> Hi How do I map knownGene.txt (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/knownGene.txt.gz) to GO number? From s.davidson at abdn.ac.uk Mon Jul 2 05:43:58 2007 From: s.davidson at abdn.ac.uk (Davidson, S.) Date: Mon, 2 Jul 2007 13:43:58 +0100 Subject: [Genome] SNP Message-ID: Hi UCSC, I wonder if you can clear something up for me. I'm looking at Human Mar. 2006 Assembly and I'm interested in the SNP track. I was looking at the source data used to create this track, in particular the b126_SNPContigLoc_36_1.bcp.gz file from dbSNP. I noticed that the coordinates given for the SNPs don't match the coordinates displayed on the browser. 
For example (the first SNPs in the file): snp dbsnp_coordinate UCSC_coordinate rs3 31344841 31344842 rs4 31345221 31345222 However when I looked within UCSC text files from the table browser I saw that these SNPs are noted as having a chromstart and chromend positions chromstart chromend rs3 31344841 31344842 rs4 31345221 31345222 This suggests that dbSNP uses a different coordinate convention to the zero based start one based end that UCSC employs, am I right in thinking this? Thanks in advance, Scott From heather at soe.ucsc.edu Mon Jul 2 08:45:54 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Mon, 2 Jul 2007 08:45:54 -0700 (PDT) Subject: [Genome] SNP In-Reply-To: References: Message-ID: Scott: You are correct, we adjust the dbSNP coordinates to match our zero-based start, one-based end convention. Heather Trumbower UCSC Genome Bioinformatics Group On Mon, 2 Jul 2007, Davidson, S. wrote: > Hi UCSC, > I wonder if you can clear something up for me. > I'm looking at Human Mar. 2006 Assembly and I'm interested in the SNP > track. > > I was looking at the source data used to create this track, in > particular the b126_SNPContigLoc_36_1.bcp.gz file from dbSNP. > I noticed that the coordinates given for the SNPs don't match the > coordinates displayed on the browser. > For example (the first SNPs in the file): > > snp dbsnp_coordinate UCSC_coordinate > rs3 31344841 31344842 > rs4 31345221 31345222 > > However when I looked within UCSC text files from the table browser I > saw that these SNPs are noted as having a chromstart and chromend > positions > > chromstart chromend > rs3 31344841 31344842 > rs4 31345221 31345222 > > This suggests that dbSNP uses a different coordinate convention to the > zero based start one based end that UCSC employs, am I right in thinking > this? 
> > Thanks in advance, > Scott > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From archanat at soe.ucsc.edu Mon Jul 2 09:14:27 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Mon, 02 Jul 2007 09:14:27 -0700 Subject: [Genome] rsync server not right? In-Reply-To: <1695.69.111.194.130.1183162557.squirrel@webmail.soe.ucsc.edu> References: <46856F42.2050701@gs.washington.edu> <1695.69.111.194.130.1183162557.squirrel@webmail.soe.ucsc.edu> Message-ID: <46892463.90309@soe.ucsc.edu> Hello Ginger, One of our admins ran the command that you used and it works fine for him. Could you please try again and let us know if this still does not work, or if you have further questions. Regards, Archana UCSC Genome Bioinformatics Group >> Hello, >> >> when I tried to rsync the tables of genome databases using sth like >> rsync -avzn >> rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hg18/database/ /tmp/ > >> /tmp/t.out >> >> here is what I got. It had been like this for a few days. >> >> @ERROR: max connections (10) reached -- try again later >> rsync: connection unexpectedly closed (68 bytes read so far) >> rsync error: error in rsync protocol data stream (code 12) at io.c(165) >> >> >> Sth went wrong with rsync server of the databases? >> >> thank you for helps >> >> -- >> Ze(Ginger) Cheng >> Bioinformatics Specialist >> Howard Hughes Medical Institute >> >> Department of Genome Sciences >> Foege Building S 433-D Box 355065 >> 1705 NE Pacific St. 
>> Seattle WA 98195-5065 >> >> Office: 206-543-9530 >> Fax: 206-221-5795 >> Email: ginger at gs.washington.edu >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome >> >> > > From ann at soe.ucsc.edu Mon Jul 2 09:45:53 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Jul 2007 09:45:53 -0700 Subject: [Genome] bed file request In-Reply-To: <606B10C3E38D7F43B823E5132D32ED3D1DB058@gisexch.gis.a-star.edu.sg> References: <606B10C3E38D7F43B823E5132D32ED3D1DB058@gisexch.gis.a-star.edu.sg> Message-ID: <46892BC1.20001@cse.ucsc.edu> Hello Rory, Thanks for the compliments on the browser. All underlying data files are available for download from our download server (press the "Downloads" link from the home page). Then press 'Human', navigate to 'hg17', and click on 'Annotation Database'. The database tables that hold the information for the two tracks you are interested in have the following names: Affymetrix Transcriptome Phase 3 Short RNA Fragments: affyTxnPhase3FragsHeLaTopStrand affyTxnPhase3FragsHeLaBottomStrand affyTxnPhase3FragsHepG2TopStrand affyTxnPhase3FragsHepG2BottomStrand Affymetrix Transcriptome Phase 3 Short RNA Signal: affyTxnPhase3HeLatop_strand affyTxnPhase3HeLabottom_strand affyTxnPhase3HepG2top_strand affyTxnPhase3HepG2bottom_strand Be sure to let us know if you have other questions. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Rory JOHNSON wrote: > Hi > > > > Please could you send me the BED files for the recent dataset from > Kapranov et al: > > > > Affy Tx sRNA Reg Track > > Affy Tx sRNA Sig > > > > For both strands and cell lines. > > > > Thanks! 
The browser is fantastic, use it all the time. > > > > Best > > Rory Johnson > > > > > > > > > ========================================= > > > > Rory JOHNSON > > > > Postdoctoral Fellow > > > > Genome Institute of Singapore > > 60 Biopolis Street > > #02-01 Genome > > Singapore 138672 > > > > Email: johnsonrb at gis.a-star.edu.sg > > Phone: (+65) 64788149 > > Fax: (+65) 64789005 > > > > ========================================= > > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its contents to any > other person as it may be an offence under the Official Secrets Act. > Thank you. > > > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Mon Jul 2 10:04:34 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Jul 2007 10:04:34 -0700 Subject: [Genome] about GO (gene ontology) In-Reply-To: <000801c7bc81$6646d670$790a450a@nhri.local> References: <000801c7bc81$6646d670$790a450a@nhri.local> Message-ID: <46893022.1080309@cse.ucsc.edu> Hello Pauhsi, To map from the knownGene table to the goId, you will be starting in the assembly database (hg18) then moving to the go database. Here is a list of the relationships:
hg18.knownGene.id == hg18.kgXref.kgID
hg18.kgXref.spID == go.goaPart.dbObjectId
The GO id is then contained in that table here: go.goaPart.goId
Please be sure to write back if this is not clear. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.
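The relationships listed in the reply above amount to a join from kgXref to goaPart on the protein ID. Below is a minimal Python sketch of that join, assuming an in-memory SQLite database instead of the real UCSC MySQL server. The toy table layouts, the helper names `build_toy_db` and `go_ids_for_gene`, and the gene and protein IDs are all made up for illustration; only the column names kgID, spID, dbObjectId and goId come from the message.

```python
import sqlite3


def build_toy_db():
    """Create tiny stand-ins for hg18.kgXref and go.goaPart (made-up rows)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE kgXref (kgID TEXT, spID TEXT);
        CREATE TABLE goaPart (dbObjectId TEXT, goId TEXT);
        -- hypothetical known gene IDs and protein accessions
        INSERT INTO kgXref VALUES ('uc000aaa.1', 'P00001');
        INSERT INTO kgXref VALUES ('uc000bbb.1', 'P00002');
        INSERT INTO goaPart VALUES ('P00001', 'GO:0005764');
        INSERT INTO goaPart VALUES ('P00001', 'GO:0006629');
        INSERT INTO goaPart VALUES ('P00003', 'GO:0004222');
        """
    )
    return con


def go_ids_for_gene(con, known_gene_id):
    """Follow kgXref.kgID -> kgXref.spID -> goaPart.dbObjectId -> goaPart.goId."""
    rows = con.execute(
        """
        SELECT g.goId
        FROM kgXref AS x
        JOIN goaPart AS g ON x.spID = g.dbObjectId
        WHERE x.kgID = ?
        ORDER BY g.goId
        """,
        (known_gene_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A gene whose protein ID has no row in goaPart simply yields an empty list, which is worth checking for before assuming every known gene has a GO annotation.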
pauhsi wrote: > Hi > > How do I map knownGene.txt (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/knownGene.txt.gz) to GO number? > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From George.Liu at ARS.USDA.GOV Mon Jul 2 10:45:35 2007 From: George.Liu at ARS.USDA.GOV (Liu, George) Date: Mon, 2 Jul 2007 13:45:35 -0400 Subject: [Genome] refGene table for bosTau3 Message-ID: I wonder if refGene table for bosTau3 is available. On test server? The below is for bosTau2: http://hgdownload.cse.ucsc.edu/goldenPath/bosTau2/database/refGene.txt.gz Thanks again, George Dr. George Liu Research Biologist (Bioinformatics) Bovine Functional Genomics Lab. ANRI, ARS, USDA Building 200, Rm 4B, BARC-East 10300 Baltimore Ave, Beltsville, MD 20705-2350 George.Liu at ars.usda.gov Tel: 301-504-9843 (office) FAX: 301-504-8414 Tel: 301-504-6936 (lab) From ann at soe.ucsc.edu Mon Jul 2 13:15:44 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Jul 2007 13:15:44 -0700 Subject: [Genome] refGene table for bosTau3 In-Reply-To: References: Message-ID: <46895CF0.3070100@cse.ucsc.edu> Hello George, The bosTau3 assembly has been created but has not been through our QA process. The initial creation of the assembly includes all of the genBank tables, including the one you are interested in: refGene. If you are interested in getting a copy of this un-reviewed table, please reply to me individually (no need to copy the entire list) and I'll let you know how to get it. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Liu, George wrote: > I wonder if refGene table for bosTau3 is available. On test server? > > The below is for bosTau2: > > http://hgdownload.cse.ucsc.edu/goldenPath/bosTau2/database/refGene.txt.gz > > > > Thanks again, > > > > George > > > > Dr. George Liu > > Research Biologist (Bioinformatics) > Bovine Functional Genomics Lab.
> ANRI, ARS, USDA > Building 200, Rm 4B, BARC-East > 10300 Baltimore Ave, > Beltsville, MD 20705-2350 > George.Liu at ars.usda.gov > Tel: 301-504-9843 (office) > FAX: 301-504-8414 > Tel: 301-504-6936 (lab) > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Mon Jul 2 14:34:25 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 02 Jul 2007 14:34:25 -0700 Subject: [Genome] multiple identifiers using table browser intersection Message-ID: <46896F61.1070601@soe.ucsc.edu> Hi Archie, The problem is with column 9 of the BED file, itemRgb. The BED parser had trouble with the comma-separated "r,g,b" format. We have fixed this problem with the BED parser (in the source file hg/lib/bed.c). If you update your source tree to the latest source and recompile your libraries and the overlapSelect program, your files in their current state should work with overlapSelect. Alternatively, a quick work-around for this problem (with the old source) is to turn the r,g,b values in column 9 into zeros before processing. This awk command will do the trick: awk 'BEGIN{OFS=FS="\t"} {$9=0; print $0}' file1.bed > file1.fixed.bed Let us know if you have further problems or questions. -- Brooke Rhead UCSC Genome Bioinformatics Group -------- Original Message -------- Subject: RE: [Genome] multiple identifiers using table browser intersection Date: Fri, 29 Jun 2007 16:18:37 -0700 From: Russell, Archie To: Brooke Rhead Thanks a lot Brooke Here are snippets of the two files -----Original Message----- From: Brooke Rhead [mailto:rhead at soe.ucsc.edu] Sent: Friday, June 29, 2007 12:32 PM To: Russell, Archie Cc: genome at soe.ucsc.edu Subject: Re: [Genome] multiple identifiers using table browser intersection Hi Archie, It sounds like there might be a format issue with the BED file you are using. For instance, does it by any chance have a 'bin' column (it should not)? 
Alternatively, is it space-separated rather than tab-separated? The overlapSelect program only works with tab-separated BEDs. If you'd like, you can send me your file, or a sample of it. The developer who wrote overlapSelect has offered to take a look at it. (No need to copy the list if you send an attachment -- the mail list program strips attachments.) -- Brooke Rhead UCSC Genome Bioinformatics Group Russell, Archie wrote: > > Hey Brooke > > Thanks a lot for the pointers. It looks like overlapSelect with > mergeOutput (or maybe idOutput) is probably what I need. > > but I am having a problem with overlapSelect: > > > % overlapSelect -selectFmt=bed -inFmt=bed > /info/genome/Projects/649/dog/ucsc_browser/boundary.bed > /info/genome/Projects/649/dog/ucsc_browser/ncrna.bed out.bed > > gives the error > > invalid unsigned number: "2,148,141" (2,148,141 are block sizes) > When i specify -selectCoordCols=0,1,2 -inCoordCols=0,1,2 things seem to > work but then I don't think I'm getting exon-level overlaps. > > Can you tell me what I should change? > > Thanks, > Archie > > > *Brooke Rhead* rhead at soe.ucsc.edu > > /Thu Jun 28 12:47:24 PDT 2007/ > > Hello Archie, > > We have not added any new functionality to the Table Browser that will > join two tables on an identifier field. However, there is a Galaxy tool > that will do this (at http://main.g2.bx.psu.edu/). On the Galaxy page, > under the heading "Filter, Sort, Join, Compare, Subtract", there is a > tool to "Join two Queries side by side on a specified field". > > There is also a user-developed script on genomewiki with a similar > function, called bedOverlapName (which in turn calls the kent source > tool overlapSelect). It is located here: > > http://genomewiki.cse.ucsc.edu/index.php/BedOverlapName > > I hope this information is helpful. 
> > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Russell, Archie wrote: > >/ Hi, > />/ > />/ I want to do an intersection of two bed tracks and get a file that gives > />/ me pairs of identifiers (e.g. accessions) that overlap. Is it possible > />/ to do this in the table browser? I know this question has come up > />/ before and I think the answer was that this wasn't possible at the time, > />/ but maybe things have changed since then. > />/ > />/ Thanks, > />/ Archie > />/ > />/ Archie Russell > />/ Rosetta Inpharmatics > />/ 206-802-6312 > />/ / > > // > > Archie Russell > Rosetta Inpharmatics > 206-802-6312 > > > > ------------------------------------------------------------------------ ------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------ ------ > ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. 
(One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. ------------------------------------------------------------------------------ From ginger at gs.washington.edu Mon Jul 2 16:37:09 2007 From: ginger at gs.washington.edu (Ginger Cheng) Date: Mon, 02 Jul 2007 16:37:09 -0700 Subject: [Genome] custom coloring a bedGraph track? Message-ID: <46898C25.2050509@gs.washington.edu> Hello, Browser Gurus, Just wondering if there is any quick way to color a bedGraph track based on a field of the data table. So it would be sth like chrom | chromStart | chromEnd | dataValue | color | chr1 | 16190712 | 16190763 | -0.548 | red | chr1 | 16190782 | 16190803 | -0.001 | gray THank you for any help in advance -- Ze(Ginger) Cheng Bioinformatics Specialist Howard Hughes Medical Institute Department of Genome Sciences Foege Building S 433-D Box 355065 1705 NE Pacific St. Seattle WA 98195-5065 Office: 206-543-9530 Fax: 206-221-5795 Email: ginger at gs.washington.edu From yang21 at llnl.gov Mon Jul 2 17:38:38 2007 From: yang21 at llnl.gov (Shan Yang) Date: Mon, 02 Jul 2007 17:38:38 -0700 Subject: [Genome] A question about regulatory potential scores (regPotential7X.txt for mouse) Message-ID: <7.0.0.16.2.20070702172600.02625368@llnl.gov> Hi, I just downloaded the "regPotential7X.txt" file for mm8. Here is a question I have about the data. 
For example, the 1st line in the file says: 608 chr1 3050242 3050650 chr1.0 1 408 0 /gbdb/mm8/wib/regPotential7X.wib 0 0.144514 408 1.67677 0.187318 To my understanding, this basically means that the 408 bp long region has a min score of 0 and max score of 0.144514 and the sum of score in this region is 1.67677, which makes the average to be 0.00408028. According to the description of this score, anything over 0.001 may have a big potential to be a regulatory element, which means this regions is highly possible to be a regulatory element. But if I look at the browser, I can see that the main contributor of this average score in this 408bp region is a subregion of chr1:3050626-3050650 (25bp). So does this mean that this subregion has more potential than any other nt in this region or every nt in this entire 408 bp region has the same potential? I think the first should be right. If this is true, then why does the file record the whole 408 region rather than just the subregion? Thanks a lot! Shan Shan Yang, PhD Genome Biology Division, L-452 Chemistry, Materials & Life Sciences Directorate (CMLS) Lawrence Livermore National Laboratory 7000 East Ave, Livermore, CA, 94550 Ph: 925-422-7389 Fax: 925-422-2099 From jayuan2007 at yahoo.com Mon Jul 2 23:58:03 2007 From: jayuan2007 at yahoo.com (Jay an) Date: Mon, 2 Jul 2007 23:58:03 -0700 (PDT) Subject: [Genome] GO annotation In-Reply-To: <468558D7.20905@soe.ucsc.edu> Message-ID: <376543.73855.qm@web63201.mail.re1.yahoo.com> hello Brooke, your instruction is very helpful. thanks GO can be categorized different processes, such as physiological process, metabolism, cellular physiological process. how can get this information from UCSC ? regards Jay Brooke Rhead wrote: Hi Jay, The go.dbObjectId is the UniProt accession number. (But in the case of A0A000, the species is not human, but Streptomyces ghanaensis [TaxID: 35758]). The go database contains dbObjectId's for all assemblies, not just hg18. 
However, it is possible to distinguish species in the goaPart table, as the species name is included as part of the dbObjectSymbol field. Here are some examples where "HUMAN" is included:
mysql> select * from goaPart where dbObjectSymbol like '%_HUMAN' limit 5;
+------------+----------------+-------+------------+--------+
| dbObjectId | dbObjectSymbol | notId | goId       | aspect |
+------------+----------------+-------+------------+--------+
| A0A184     | A0A184_HUMAN   |       | GO:0005764 | C      |
| A0A184     | A0A184_HUMAN   |       | GO:0006629 | P      |
| A0A184     | A0A184_HUMAN   |       | GO:0006665 | P      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0004222 | F      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0006508 | P      |
+------------+----------------+-------+------------+--------+
5 rows in set (0.00 sec)
You can use the Table Browser to filter the goaPart table so that only human UniProt IDs are shown. To do this, hit the filter "create" button. In the free-form query box enter the text: dbObjectSymbol like '%HUMAN' and hit "submit". The output should be limited to only the UniProt symbols with "HUMAN" in the name. The filtered goaPart table may be all you need to map GO accessions to genes. But, as you have noticed, the goaPart table is linked to the hg18 kgXref table, too. If you would like to get the gene names corresponding to GO accessions from kgXref, you can do that, too, with the Table Browser:
1. You will likely want to leave the filter on the goaPart table in place (dbObjectSymbol like '%HUMAN').
2. Select the option for "output format: selected fields from primary and related tables" and hit "get output".
3. On the next screen, under the "Linked Tables" heading, select the box for the hg18 kgXref table. Scroll to the bottom of the page and hit "Allow selection from checked tables".
4. You should now see a section at the top of the page called "hg18.kgXref fields", where you can select any of the identifiers from the kgXref table (like gene symbol).
5. Hit "get output".
You should get a list of GO identifiers with associated gene names from kgXref. Keep in mind that not every GO ID will be associated with a gene in the kgXref table. I hope this information helps. -- Brooke Rhead UCSC Genome Bioinformatics Group Jay an wrote: > thanks Brooke, > > I followed you instruction. but I got below: > > #dbObjectId dbObjectSymbol notId goId aspect > A0A000 A0A000_9ACTO GO:0003870 F > A0A000 A0A000_9ACTO GO:0006783 P > A0A000 A0A000_9ACTO GO:0009058 P > > there is not proteinID. > I found "hg18.kgXref > .spID > (via goaPart.dbObjectId", > how can I "via goaPart.dbObjectId"? > > thank you > Jay > > > > */Brooke Rhead /* wrote: > > Hello Jay, > > The GO accessions are linked to genes (that is, protein IDs) in the > table 'goaPart', which resides in our 'go' database. > > You can get to this table in the Table Browser by selecting "group: all > tables" and "database: go", then selecting "table: go.goaPart". > > I hope this information helps. If you have further questions, please > feel free to write back to this list. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Jay an wrote: > > hello, > > > > every GO:XXXXX has related genes. can you tell me how to a matrix > > (GO:XXXXX and genes)? > > > > > > thanks > > > > > > > > --------------------------------- > > Get your own web address. > > Have a HUGE year through Yahoo! Small Business. > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome > > > ------------------------------------------------------------------------ > Get the Yahoo! toolbar and be alerted to new email > wherever > you're surfing. --------------------------------- The fish are biting. Get more visitors on your site using Yahoo! Search Marketing. 
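The species filter on dbObjectSymbol quoted in the message above can be tried on a toy table. The Python sketch below uses an in-memory SQLite database, not the UCSC MySQL server. The first two rows are taken from the thread, the third row (including its GO ID) is made up, and the function name `human_go_rows` is hypothetical. One detail it illustrates: in SQL `LIKE` patterns the underscore is itself a single-character wildcard, so `'%_HUMAN'` also matches symbols that end in "HUMAN" without a literal underscore, unless the underscore is escaped.

```python
import sqlite3

ROWS = [
    ("A0A184", "A0A184_HUMAN", "GO:0005764"),  # human row shown in the thread
    ("A0A000", "A0A000_9ACTO", "GO:0003870"),  # non-human row shown in the thread
    ("X00001", "X00001XHUMAN", "GO:0000000"),  # made-up row with no underscore
]


def human_go_rows(strict=True):
    """Return dbObjectIds whose symbol ends in HUMAN.

    strict=True  escapes the underscore, so only a literal '_HUMAN' suffix matches.
    strict=False uses the pattern as written in the thread, where '_' is a wildcard.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE goaPart (dbObjectId TEXT, dbObjectSymbol TEXT, goId TEXT)"
    )
    con.executemany("INSERT INTO goaPart VALUES (?, ?, ?)", ROWS)
    if strict:
        sql = (
            "SELECT dbObjectId FROM goaPart "
            "WHERE dbObjectSymbol LIKE '%!_HUMAN' ESCAPE '!' "
            "ORDER BY dbObjectId"
        )
    else:
        sql = (
            "SELECT dbObjectId FROM goaPart "
            "WHERE dbObjectSymbol LIKE '%_HUMAN' "
            "ORDER BY dbObjectId"
        )
    return [row[0] for row in con.execute(sql)]
```

For real UniProt symbols the two patterns will probably select the same rows, so the unescaped form used in the thread is likely fine in practice; the escaped form is only the more literal reading of "ends in _HUMAN".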
From peter.robinson at charite.de Tue Jul 3 04:25:37 2007 From: peter.robinson at charite.de (Peter Robinson) Date: Tue, 03 Jul 2007 13:25:37 +0200 Subject: [Genome] Question about MAF format Message-ID: <468A3231.8060600@charite.de> Dear UCSC, we are looking at the upstream1000.maf alignments. I am not sure what to make of alignments such as the following: s NM_152486 0 1000 + 1000 GATTTCCAGCCGCTGCTGGACAACGGCGAGCCGTGCATCGAGGTGGAGTGCGGCGCCAACCGCGCGCTGCTCTACGTGCGCAAACTCTGCCAGGGCAGCAA GGGCCCGTCCATCCGCCACCGCGGCGAGTGGCTCACGCCCAACGAGTTCCAGTTCGTCAGCGGCCGCGAGACGGCCAAGGACTGGAAGCGCAGCATCCGCCACAAAGGT--GCCG--CCGC------ --------------------CCCTCCCTTC-GCTGCC--GGGA--CCCGCGGGCCCCGA-CCCCACC---CCCTACCCGACTC-----GG--ACACCCGG---GAGCC------------------T CCGGCTCGGCC--GAGGGGGCGCTGC--AGCTCCAGGGCTG-CGCGGG--GACACCCC------------CGCCGCGCGCGGAGGCCTCGGTGA------ACACGGACAGATCGCCCCCCG-CTGCA CC-TCCCCCCAGCTTGGGCCACA---GCGCTTG---------------------------GGGCTCGCGGGCCGCTCCCTCCG-CTC--------GGAAGGTCTCT----------GCGAGGCTCCT GGGCCTTAAGGCCCGAAGGAAGTTTACGGGGACTCGAGA--G-AGCGGGC--AGGAGGCGGGTTGGGAGGG----------------------CGCGGAGCCC-CGGGTTCGGGGGAGACTGGAGGG GCGCACGTGCGGCC-GGGTGCGAg------------------------cgcgcggcgg-----------------------g----------------ggaggctgcgggg--------cggcgcgg gggcgcgcgcggagcccgagcggcggcgccAGGTCAC---ACAACCTGTTTTGGCGCCTGCGGGCGCCTGGGCCCAAGGGT-GCGACGCGGGGGCGCCTGAGCCGGGACAC-----AGGGGGTGCGG TGAGCGCCAGGC-----GCCGC----GGGGAGTTAAAAAGTTCGGGACCTGA--------GCGGTGCGTGGTTCCGCGGTGGCCGCCTCT---------------TCCTGCCGCG----CAGGC--- --CGAGGGTCCCGACGGCGCCGCTCACC-GCTCCGGGACTCAGCC---TTTCTGGGCCCGGCCTGCGGTTCCCTCG---GGGCCGG---GGAGAGGGTGGAGCGCGGGAG-------GAGGGGCGCC GGG----TGGGGACGCCC---------AGGCCCTTCGTCGGGGGAGGGCGCTCCACCCGGGCTGGAGTTGC----AGAGCCCA---------- e panTro1 0 0 + 0 e rheMac2 0 0 + 0 e rn4 0 0 + 0 e mm8 0 0 + 0 What exactly do the "e" lines mean? 
The FAQ states "an "e" line containing information about the size of the gap between the alignments that span the current block" However, I am not sure what that could mean in the current context. If it is the case that no alignable sequence could be found, why not show "......." (dots), as is the case in many other alignments? Thanks, Peter -- Dr. med. Peter N. Robinson, MSc. Institut für Medizinische Genetik Universitätsklinikum Charite Humboldt-Universität Augustenburger Platz 1 13353 Berlin Germany voice: 49-30-450569124 fax: 49-30-450569915 email: peter.robinson at charite.de http://www.charite.de/ch/medgen/robinson From achen at gw.med.sc.edu Tue Jul 3 08:06:30 2007 From: achen at gw.med.sc.edu (Aishe Chen) Date: Tue, 03 Jul 2007 11:06:30 -0400 Subject: [Genome] mouse 16 chromosome Message-ID: Hi, Dear Sir/Madam: I inserted a DNA fragment in the 16th chromosome of mouse. Would you email me the genomic DNA sequence of this range (between 82,500,000 and 86,878,000 bp) in the 16th chromosome of mouse? Thanks. Aishe Chen From lining2005 at mail.bnu.edu.cn Tue Jul 3 06:56:52 2007 From: lining2005 at mail.bnu.edu.cn (=?gb2312?B?wO7E/g==?=) Date: Tue, 03 Jul 2007 21:56:52 +0800 Subject: [Genome] phastConsElements table Message-ID: <383471012.03057@mail.bnu.edu.cn> Hello, I have downloaded two tables from the Table Browser on your website, phastConsElements28way and phastConsElements28wayPlacMammal. I just found that one table has more records than the other. I think more conserved elements have been found among more species, besides mammals, in the phastConsElements28way table. I do not know whether that is right or not! So again, the question is: what are the differences between the two tables? Thanks in advance for your help.
NingLi From robert.castelo at upf.edu Tue Jul 3 09:47:27 2007 From: robert.castelo at upf.edu (Robert Castelo) Date: Tue, 03 Jul 2007 18:47:27 +0200 Subject: [Genome] use of the shortmatch track through an url Message-ID: <1183481247.21945.32.camel@llull.imim.es> dear people at ucsc, i'd like to ask you if it is possible to use the Short Match track under the Mapping and Sequencing Tracks directly through an url such that i can build a link which, when i click on it, opens the genome browser on a particular region of the genome and showing the perfect matches to some short sequence of my interest (i.e., the short sequence would have to go written on the url somehow..) i know this could be worked out through a custom track but i was wondering whether the Short Match would do this job also for me. thanks!! robert. -- Robert Castelo, PhD RyC Fellow Researcher Dept. Experimental and Health Sciences Pompeu Fabra University Barcelona Biomedical Research Park tel +34 933 160 514 fax +34 933 160 550 robert.castelo at upf.edu From archie_russell at merck.com Tue Jul 3 10:09:30 2007 From: archie_russell at merck.com (Russell, Archie) Date: Tue, 3 Jul 2007 10:09:30 -0700 Subject: [Genome] Converting coordinates on a mRNA to coordinates on the genome In-Reply-To: <1183481247.21945.32.camel@llull.imim.es> References: <1183481247.21945.32.camel@llull.imim.es> Message-ID: <23B0A4FBD181A44D9B89C4FB3E96D594809C85@ussemx1100.merck.com> Hi, I have some coordinates relative to mRNAs (transcript features) that I'd like to turn into coordinates on the genome. I can blat the mRNA to the genome, but doing the math for the translation is a little tricky, in particular on the reverse strand. Do you have any code or recommendations for how to do this? Thanks, Archie ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. 
(One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. ------------------------------------------------------------------------------ From kayla at soe.ucsc.edu Tue Jul 3 10:42:13 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 3 Jul 2007 10:42:13 -0700 (PDT) Subject: [Genome] mouse 16 chromosome In-Reply-To: References: Message-ID: Aishe, I can show you how to get this (or any) sequence yourself. First you go to the gateway page: http://genome.ucsc.edu/cgi-bin/hgGateway And select "Mouse" under the genome pulldown menu. Then you can type in the coordinates you mentioned in your email below in the following format and click "submit": chr16:82,500,000-86,878,000 In the Genome Browser click on the "DNA" button on the top of the page to get the genomic sequence. I hope this is helpful to you. Please don't hesitate to contact again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group On Tue, 3 Jul 2007, Aishe Chen wrote: > Hi, Dear Sir/Madam: > > I inserted a DNA frangment in 16th chromosome of mouse. Would you email > me the genomic DNA sequence of this range ( range between 82,500,000 and > 86,878,000 bp) in 16th chromosome of mouse? Thanks. 
> > Aishe Chen > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Tue Jul 3 11:48:10 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Tue, 03 Jul 2007 11:48:10 -0700 Subject: [Genome] GO annotation Message-ID: <468A99EA.2040909@soe.ucsc.edu> Hi Jay, The GO processes are listed in the table 'term' in the field 'term_type'. The 'acc' field in the 'term' table is the same as the 'goId' field in the 'goaPart' table. You can use the Table Browser as described in my earlier message to get a list of GO IDs and their corresponding processes from these tables. Also, since you are interested in GO annotations, may I suggest the Gene Sorter as a tool to view and sort genes based on GO similarity? If you click the "Gene Sorter" link at the top of the page, you will be taken to the Gene Sorter gateway page. Enter any gene name into the search box to get started. Click the "configure" button and scroll down to the Gene Ontology (GO) checkbox -- here you can turn on the display of GO information. The Gene Sorter has many options for sorting and filtering data. Complete instructions on using the Gene Sorter are here: http://genome.ucsc.edu/goldenPath/help/hgNearHelp.html I hope this helps. -- Brooke Rhead UCSC Genome Bioinformatics Group Date: Mon, 2 Jul 2007 23:58:03 -0700 (PDT) From: Jay an To: Brooke Rhead Cc: "'genome at soe.ucsc.edu'" Subject: Re: [Genome] GO annotation hello Brooke, your instruction is very helpful. thanks GO can be categorized different processes, such as physiological process, metabolism, cellular physiological process. how can get this information from UCSC ? regards Jay Brooke Rhead wrote: Hi Jay, The go.dbObjectId is the UniProt accession number. (But in the case of A0A000, the species is not human, but Streptomyces ghanaensis [TaxID: 35758]). The go database contains dbObjectId's for all assemblies, not just hg18. 
However, it is possible to distinguish species in the goaPart table, as the species name is included as part of the dbObjectSymbol field. Here are some examples where "HUMAN" is included:

mysql> select * from goaPart where dbObjectSymbol like '%_HUMAN' limit 5;
+------------+----------------+-------+------------+--------+
| dbObjectId | dbObjectSymbol | notId | goId       | aspect |
+------------+----------------+-------+------------+--------+
| A0A184     | A0A184_HUMAN   |       | GO:0005764 | C      |
| A0A184     | A0A184_HUMAN   |       | GO:0006629 | P      |
| A0A184     | A0A184_HUMAN   |       | GO:0006665 | P      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0004222 | F      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0006508 | P      |
+------------+----------------+-------+------------+--------+
5 rows in set (0.00 sec)

You can use the Table Browser to filter the goaPart table so that only human UniProt IDs are shown. To do this, hit the filter "create" button. In the free-form query box enter the text:

dbObjectSymbol like '%HUMAN'

and hit "submit". The output should be limited to only the UniProt symbols with "HUMAN" in the name.

The filtered goaPart table may be all you need to map GO accessions to genes. But, as you have noticed, the goaPart table is linked to the hg18 kgXref table, too. If you would like to get the gene names corresponding to GO accessions from kgXref, you can do that, too, with the Table Browser:

1. You will likely want to leave the filter on the goaPart table in place (dbObjectSymbol like '%HUMAN').
2. Select the option for "output format: selected fields from primary and related tables" and hit "get output".
3. On the next screen, under the "Linked Tables" heading, select the box for the hg18 kgXref table. Scroll to the bottom of the page and hit "Allow selection from checked tables".
4. You should now see a section at the top of the page called "hg18.kgXref fields", where you can select any of the identifiers from the kgXref table (like gene symbol).
5. Hit "get output".
You should get a list of GO identifiers with associated gene names from kgXref. Keep in mind that not every GO ID will be associated with a gene in the kgXref table. I hope this information helps. -- Brooke Rhead UCSC Genome Bioinformatics Group Jay an wrote: > thanks Brooke, > > I followed you instruction. but I got below: > > #dbObjectId dbObjectSymbol notId goId aspect > A0A000 A0A000_9ACTO GO:0003870 F > A0A000 A0A000_9ACTO GO:0006783 P > A0A000 A0A000_9ACTO GO:0009058 P > > there is not proteinID. > I found "hg18.kgXref > .spID > (via goaPart.dbObjectId", > how can I "via goaPart.dbObjectId"? > > thank you > Jay > > > > */Brooke Rhead /* wrote: > > Hello Jay, > > The GO accessions are linked to genes (that is, protein IDs) in the > table 'goaPart', which resides in our 'go' database. > > You can get to this table in the Table Browser by selecting "group: all > tables" and "database: go", then selecting "table: go.goaPart". > > I hope this information helps. If you have further questions, please > feel free to write back to this list. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Jay an wrote: > > hello, > > > > every GO:XXXXX has related genes. can you tell me how to a matrix > > (GO:XXXXX and genes)? > > > > > > thanks > > > > > > > > --------------------------------- > > Get your own web address. > > Have a HUGE year through Yahoo! Small Business. > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome > > > ------------------------------------------------------------------------ > Get the Yahoo! toolbar and be alerted to new email > wherever > you're surfing. --------------------------------- The fish are biting. Get more visitors on your site using Yahoo! Search Marketing. 
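[Editorial sketch] The goaPart-to-kgXref join described in the thread above can also be done offline once both tables have been downloaded. The Python below is only a minimal sketch under stated assumptions: goaPart rows are taken to be 5-tuples in the column order shown in the mysql output (dbObjectId, dbObjectSymbol, notId, goId, aspect), and the kgXref information is assumed to be already reduced to a dictionary from spID to gene symbol. The function names and the gene symbol "GENE_A" are hypothetical and used purely for illustration.

```python
from collections import defaultdict


def human_go_rows(goa_rows):
    """Keep goaPart rows whose dbObjectSymbol ends in '_HUMAN'.

    This plays the role of the Table Browser filter on dbObjectSymbol.
    """
    return [row for row in goa_rows if row[1].endswith("_HUMAN")]


def go_to_genes(goa_rows, sp_to_symbol):
    """Map each GO ID to the set of gene symbols linked through spID.

    goa_rows     : iterable of (dbObjectId, dbObjectSymbol, notId, goId, aspect)
    sp_to_symbol : dict of kgXref spID -> gene symbol (assumed pre-extracted)
    GO IDs whose protein has no kgXref entry are left out, as the thread
    notes that not every GO ID is associated with a gene in kgXref.
    """
    go_genes = defaultdict(set)
    for db_object_id, _symbol, _not_id, go_id, _aspect in human_go_rows(goa_rows):
        gene = sp_to_symbol.get(db_object_id)
        if gene is not None:
            go_genes[go_id].add(gene)
    return dict(go_genes)


# Rows copied from the examples in the thread; the mapping is hypothetical.
goa = [
    ("A0A184", "A0A184_HUMAN", "", "GO:0005764", "C"),
    ("A0A184", "A0A184_HUMAN", "", "GO:0006629", "P"),
    ("A0A1K6", "A0A1K6_HUMAN", "", "GO:0004222", "F"),
    ("A0A000", "A0A000_9ACTO", "", "GO:0003870", "F"),
]
sp_to_symbol = {"A0A184": "GENE_A"}  # hypothetical spID -> gene symbol

result = go_to_genes(goa, sp_to_symbol)
for go_id in sorted(result):
    print(go_id, ",".join(sorted(result[go_id])))
```

The result is the "GO ID versus genes" matrix asked for at the start of the thread, in sparse form: one entry per GO ID, holding the set of associated gene symbols.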
From ann at soe.ucsc.edu Tue Jul 3 14:01:37 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 03 Jul 2007 14:01:37 -0700 Subject: [Genome] phastConsElements table In-Reply-To: <383471012.03057@mail.bnu.edu.cn> References: <383471012.03057@mail.bnu.edu.cn> Message-ID: <468AB931.8060005@soe.ucsc.edu> Hello NingLi, When we create the Most Conserved track, we aim for a certain coverage percentage. For the 28-way Most Conserved track, we tweaked the parameters of the phastCons program until we got approximately 5% coverage genome-wide, and approximately 70% CDS coverage. So, the size of the tables are dependent on the parameters used for the phastCons program. You can read more details about how we created the tables by pressing the small blue or gray button to the far left of the track in the track display, or by clicking on the "Most Conserved" track name in the track controls below the track display. For even more background, here is a link to the paper describing the phastCons program: http://www.genome.org/cgi/content/abstract/15/8/1034 Please be sure to write back to the list if this is still not clear. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. ?? wrote: > Hello, > I have been down load two tables from the Table Browser in your > website,phastConsElements28way and phastConsElements28wayPlacMammal.I just found > that one table has more records than the other one.I think more conserved elements > have been found among more species,besides mammal in phastConsElements28way table > .I do not know whether it is right or not! > > So again,the question is: > What are the more differences between the two tables? > > Thanks in advance for your help. 
> > NingLi > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Jul 3 16:39:06 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 3 Jul 2007 16:39:06 -0700 (PDT) Subject: [Genome] A question about regulatory potential scores (regPotential7X.txt for mouse) In-Reply-To: <7.0.0.16.2.20070702172600.02625368@llnl.gov> References: <7.0.0.16.2.20070702172600.02625368@llnl.gov> Message-ID: Shan, I suggest contacting the scientists who created this data, listed on the details page for this track (http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=regPotential7X) to learn more about exactly how the items in the table are partitioned up into regions. Since you are looking for precise scores, one thing to keep in mind when looking at this data is that it is in compressed form and so may appear incorrect in some cases. Here is a link to the uncompressed wiggle values for this data: http://www.bx.psu.edu/~james/esperr_rp_7way_scores/genome_scores_mm8/chr1.scores.truncated.bz2 I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Kayla Smith UCSC Genome Bioinformatics Group On Mon, 2 Jul 2007, Shan Yang wrote: > Hi, > > I just downloaded the "regPotential7X.txt" file for mm8. Here is a > question I have about the data. > For example, the 1st line in the file says: > > 608 chr1 3050242 3050650 > chr1.0 1 408 0 /gbdb/mm8/wib/regPotential7X.wib > 0 0.144514 408 1.67677 0.187318 > > To my understanding, this basically means that the 408 bp long region > has a min score of 0 and max score of 0.144514 and the sum of score > in this region is 1.67677, which makes the average to be 0.00408028. 
> According to the description of this score, anything over 0.001 may > have a big potential to be a regulatory element, which means this > regions is highly possible to be a regulatory element. But if I look > at the browser, I can see that the main contributor of this average > score in this 408bp region is a subregion of chr1:3050626-3050650 > (25bp). So does this mean that this subregion has more potential than > any other nt in this region or every nt in this entire 408 bp region > has the same potential? I think the first should be right. If this is > true, then why does the file record the whole 408 region rather than > just the subregion? > > Thanks a lot! > > Shan > > > Shan Yang, PhD > Genome Biology Division, L-452 > Chemistry, Materials & Life Sciences Directorate (CMLS) > Lawrence Livermore National Laboratory > 7000 East Ave, Livermore, CA, 94550 > > Ph: 925-422-7389 > Fax: 925-422-2099 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From heather at soe.ucsc.edu Wed Jul 4 05:28:35 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Wed, 4 Jul 2007 05:28:35 -0700 (PDT) Subject: [Genome] use of the shortmatch track through an url In-Reply-To: <1183481247.21945.32.camel@llull.imim.es> References: <1183481247.21945.32.camel@llull.imim.es> Message-ID: Hi Robert: The genome browser URL recognizes a good number of key/value pairs. To open to a particular database and region, you use db= and position=. For example: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg17&position=chr22:15000000-16000000 You can also control the visibility of any track. The value is simply 'full', 'pack', 'squish', 'dense' or 'hide'. The key can be found by looking at the track label (it is a link to the hgTrackUi configuration page) or by doing http://genome.ucsc.edu/cgi-bin/cartDump. The key for the Short Match track is oligoMatch. 
So now your URL looks like: http://genome.ucsc.edu/cgi-bin/hgTracks?oligoMatch=full&db=hg17&position=chr22:15000000-16000000 The last thing you would want to set is the search string. We keep that in the key hgt.oligoMatch: http://genome.ucsc.edu/cgi-bin/hgTracks?hgt.oligoMatch=ccccc&oligoMatch=full&db=hg17&position=chr22:15000000-16000000 I hope this is what you are looking for, let us know if you have further questions. Heather Trumbower UCSC Genome Bioinformatics Group On Tue, 3 Jul 2007, Robert Castelo wrote: > dear people at ucsc, > > i'd like to ask you if it is possible to use the Short Match track under > the Mapping and Sequencing Tracks directly through an url such that i > can build a link which, when i click on it, opens the genome browser on > a particular region of the genome and showing the perfect matches to > some short sequence of my interest (i.e., the short sequence would have > to go written on the url somehow..) > > i know this could be worked out through a custom track but i was > wondering whether the Short Match would do this job also for me. > > > thanks!! > > robert. > > From Johanne.Duhaime at ircm.qc.ca Wed Jul 4 07:29:11 2007 From: Johanne.Duhaime at ircm.qc.ca (Duhaime Johanne) Date: Wed, 4 Jul 2007 10:29:11 -0400 Subject: [Genome] pslRep pslSort Message-ID: <96C071542ED58D49BC08210D3456D580861100@pandore.ircm.priv> Hello I would like to extract the best match (only one) from a blat output. I have read an old email from Thu 01/03/2007 14:41 (BLAT match, score,percentage - how to select best alignment) where some utilities are suggested (pslRep etc). But I cannot find where I can download these programs. I went around the differents links suggested but could not find it. 
Thank you in advance Johanne Duhaime From heather at soe.ucsc.edu Wed Jul 4 10:30:27 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Wed, 4 Jul 2007 10:30:27 -0700 (PDT) Subject: [Genome] pslRep pslSort In-Reply-To: <96C071542ED58D49BC08210D3456D580861100@pandore.ircm.priv> References: <96C071542ED58D49BC08210D3456D580861100@pandore.ircm.priv> Message-ID: Duhaime: You can obtain our source code from http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, or via CVS as described at http://genome.cse.ucsc.edu/admin/cvs.html. We have instructions on how to build the source at http://genome.cse.ucsc.edu/admin/jk-install.html. A previous message that answered this question is: http://www.soe.ucsc.edu/pipermail/genome/2006-September/011707.html Heather Trumbower UCSC Genome Bioinformatics Group On Wed, 4 Jul 2007, Duhaime Johanne wrote: > Hello > > I would like to extract the best match (only one) from a blat output. I > have read an old email from Thu 01/03/2007 14:41 (BLAT match, > score,percentage - how to select best alignment) where some utilities > are suggested (pslRep etc). But I cannot find where I can download these > programs. I went around the differents links suggested but could not > find it. 
> > Thank you in advance > > > Johanne Duhaime > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From hartera at soe.ucsc.edu Wed Jul 4 17:31:56 2007 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Wed, 4 Jul 2007 17:31:56 -0700 (PDT) Subject: [Genome] Converting coordinates on a mRNA to coordinates on the genome In-Reply-To: <23B0A4FBD181A44D9B89C4FB3E96D594809C85@ussemx1100.merck.com> References: <1183481247.21945.32.camel@llull.imim.es> <23B0A4FBD181A44D9B89C4FB3E96D594809C85@ussemx1100.merck.com> Message-ID: Hello Archie, We have a tool called pslMap which is in the directory, src/hg/pslMap/, in the Genome Browser source tree that will help you do this calculation. If you type pslMap at the command line, you will get help for using the program. The input files should be in PSL format - see http://genome.ucsc.edu/FAQ/FAQformat#format2 You should create a PSL file of coordinates for your mRNAs as the inPsl and then the mapPsl should be a PSL file of mRNA alignments to a genome downloaded from our downloads server: http://hgdownload.cse.ucsc.edu Click on the organism of interest, find the correct assembly, and then click on the "Annotation database" link. all_mrna.txt.gz is the PSL file of mRNA alignments and refSeqAli.txt.gz is the PSL file for RefSeq mRNA alignments. I hope that this helps you. Please let us know if you have further questions. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu On Tue, 3 Jul 2007, Russell, Archie wrote: > > Hi, > > I have some coordinates relative to mRNAs (transcript features) that I'd > like to turn into coordinates on the genome. I can blat the mRNA to > the genome, but doing the math for the translation is a little tricky, > in particular on the reverse strand. Do you have any code or > recommendations for how to do this? 
> > Thanks, > Archie > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From robert.castelo at upf.edu Thu Jul 5 01:21:13 2007 From: robert.castelo at upf.edu (Robert Castelo) Date: Thu, 05 Jul 2007 10:21:13 +0200 Subject: [Genome] use of the shortmatch track through an url In-Reply-To: References: <1183481247.21945.32.camel@llull.imim.es> Message-ID: <1183623673.27748.5.camel@llull.imim.es> Heather, thanks a lot, this is exactly what i was looking for. best regards, robert. On Wed, 2007-07-04 at 05:28 -0700, Heather Trumbower wrote: > Hi Robert: > > The genome browser URL recognizes a good number of key/value pairs. > To open to a particular database and region, you use db= and position=. > For example: > > http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg17&position=chr22:15000000-16000000 > > You can also control the visibility of any track. The value is simply > 'full', 'pack', 'squish', 'dense' or 'hide'. 
The key can be found by > looking at the track label (it is a link to the hgTrackUi configuration > page) or by doing http://genome.ucsc.edu/cgi-bin/cartDump. > > The key for the Short Match track is oligoMatch. > > So now your URL looks like: > > http://genome.ucsc.edu/cgi-bin/hgTracks?oligoMatch=full&db=hg17&position=chr22:15000000-16000000 > > The last thing you would want to set is the search string. We keep that > in the key hgt.oligoMatch: > > http://genome.ucsc.edu/cgi-bin/hgTracks?hgt.oligoMatch=ccccc&oligoMatch=full&db=hg17&position=chr22:15000000-16000000 > > I hope this is what you are looking for, let us know if you have further > questions. > > Heather Trumbower > UCSC Genome Bioinformatics Group > > > On Tue, 3 Jul 2007, Robert Castelo wrote: > > > dear people at ucsc, > > > > i'd like to ask you if it is possible to use the Short Match track under > > the Mapping and Sequencing Tracks directly through an url such that i > > can build a link which, when i click on it, opens the genome browser on > > a particular region of the genome and showing the perfect matches to > > some short sequence of my interest (i.e., the short sequence would have > > to go written on the url somehow..) > > > > i know this could be worked out through a custom track but i was > > wondering whether the Short Match would do this job also for me. > > > > > > thanks!! > > > > robert. > > > > > From Joao.Fadista at agrsci.dk Thu Jul 5 01:50:23 2007 From: Joao.Fadista at agrsci.dk (=?iso-8859-1?Q?Jo=E3o_Fadista?=) Date: Thu, 5 Jul 2007 10:50:23 +0200 Subject: [Genome] cow lift Over Message-ID: Hi, I would like to know if it is possible to use your LiftOver tool to convert genome coordinates from the cow assembly August 2005 to the cow assembly August 2006. Best regards Jo?o Fadista Ph.d. student UNIVERSITY OF AARHUS Faculty of Agricultural Sciences Dept. of Genetics and Biotechnology Blichers All? 20, P.O. 
BOX 50 DK-8830 Tjele Phone: +45 8999 1900 Direct: +45 8999 1900 E-mail: Joao.Fadista at agrsci.dk Web: www.agrsci.org ________________________________ News and news media . This email may contain information that is confidential. Any use or publication of this email without written permission from Faculty of Agricultural Sciences is not allowed. If you are not the intended recipient, please notify Faculty of Agricultural Sciences immediately and delete this email. From kayla at soe.ucsc.edu Thu Jul 5 08:13:41 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 5 Jul 2007 08:13:41 -0700 (PDT) Subject: [Genome] cow lift Over In-Reply-To: References: Message-ID: Joao, The bosTau2 to bosTau3 liftOver should be ready next week. I will email you when it is available. Thanks, Kayla Smith UCSC Genome Bioinformatics Group On Thu, 5 Jul 2007, João Fadista wrote: > Hi, > > I would like to know if it is possible to use your LiftOver tool to convert genome coordinates from the cow assembly August 2005 to the cow assembly August 2006. > > > > Best regards > > João Fadista > Ph.d. student > > > > UNIVERSITY OF AARHUS > Faculty of Agricultural Sciences > Dept. of Genetics and Biotechnology > Blichers Allé 20, P.O. BOX 50 > DK-8830 Tjele > > Phone: +45 8999 1900 > Direct: +45 8999 1900 > E-mail: Joao.Fadista at agrsci.dk > Web: www.agrsci.org > ________________________________ > > News and news media . > > This email may contain information that is confidential. Any use or publication of this email without written permission from Faculty of Agricultural Sciences is not allowed. If you are not the intended recipient, please notify Faculty of Agricultural Sciences immediately and delete this email.
> > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From cooper303 at btinternet.com Thu Jul 5 07:22:48 2007 From: cooper303 at btinternet.com (JONATHAN COOPER) Date: Thu, 5 Jul 2007 15:22:48 +0100 (BST) Subject: [Genome] identifying genes... Message-ID: <855957.9268.qm@web86403.mail.ird.yahoo.com> Hi There, I am trying to identify genes associated with particular genomic intervals, but not having much luck. I have uploaded a file of RE1 transcription factor binding site coordinates into the table browser, and am trying to identify any genes that surround these sites (using "gene and gene prediction tracks") and intersecting it with my custom track. However, from the 60 odd sites in my track, only 4 genes have been identified. Is there something i am doing wrong? I was hoping to get a list of genes identified near each site that i could then cross-reference to find out functional information about. If you have any suggestions i'd be very grateful. best regards, jonathan cooper From shaharal at post.tau.ac.il Thu Jul 5 04:57:30 2007 From: shaharal at post.tau.ac.il (shahar alon) Date: Thu, 5 Jul 2007 13:57:30 +0200 Subject: [Genome] mRNA\genome side by side alignment Message-ID: <20070705105751.3AFDCBCC085@post.tau.ac.il> I'm trying to find mRNA\genome side by side alignment for all the RefSeq genes in the zebrafish. Is there a file with that data on your servers? (The file should contain the actual alignment of the mRNA to the genome and not the statistics of the alignment) If there isn't such a file, how can I create one using the Table browser? Thank's Shahar alon, Israel From stephenf at ebi.ac.uk Thu Jul 5 05:16:19 2007 From: stephenf at ebi.ac.uk (Stephen Fitzgerald) Date: Thu, 5 Jul 2007 13:16:19 +0100 (BST) Subject: [Genome] ftp server down ? Message-ID: Hi guys, your ftp server seems to be out of action. Cheers, Stephen. 
From rosenfel at cshl.edu Thu Jul 5 08:38:15 2007
From: rosenfel at cshl.edu (Jeffrey Rosenfeld)
Date: Thu, 05 Jul 2007 11:38:15 -0400
Subject: [Genome] Subtracting BED lists
Message-ID: <468D1067.6050202@cshl.edu>

Do you have a downloadable program that will subtract two lists of BED points. I have found bedIntersect in the Kent source files and it has a similar function, but I could not find a program that does the reverse. For two given BED files, I want a list of genomic locations that are only in one of them.

Thank You,

Jeffrey Rosenfeld
Cold Spring Harbor Lab

From rhead at soe.ucsc.edu Thu Jul 5 14:17:19 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Thu, 05 Jul 2007 14:17:19 -0700
Subject: [Genome] Subtracting BED lists
In-Reply-To: <468D1067.6050202@cshl.edu>
References: <468D1067.6050202@cshl.edu>
Message-ID: <468D5FDF.4090104@soe.ucsc.edu>

Hi Jeffrey,

There is another tool in the Kent source called 'featureBits' that can do this. There are many options for featureBits -- here is the usage statement:

-----
featureBits - Correlate tables via bitmap projections.
usage:
   featureBits database table(s)
This will return the number of bits in all the tables anded together
Pipe warning: output goes to stderr.
Options:
   -bed=output.bed   Put intersection into bed format. Can use stdout.
   -fa=output.fa     Put sequence in intersection into .fa file
   -faMerge          For fa output merge overlapping features.
   -minSize=N        Minimum size to output (default 1)
   -chrom=chrN       Restrict to one chromosome
   -chromSize=sizefile   read chrom sizes from file instead of database.
   -or               Or tables together instead of anding them
   -not              Output negation of resulting bit set.
   -countGaps        Count gaps in denominator
   -noRandom         Don't include _random (or Un) chromosomes
   -noHap            Don't include _hap chromosomes
   -dots=N           Output dot every N chroms (scaffolds) processed
   -minFeatureSize=n Don't include bits of the track that are smaller than
                     minFeatureSize, useful for differentiating between
                     alignment gaps and introns.
   -bin=output.bin   Put bin counts in output file
   -binSize=N        Bin size for generating counts in bin file (default 500000)
   -binOverlap=N     Bin overlap for generating counts in bin file (default 250000)
   -bedRegionIn=input.bed    Read in a bed file for bin counts in specific
                     regions and write to bedRegionsOut
   -bedRegionOut=output.bed  Write a bed file of bin counts in specific
                     regions from bedRegionIn
   -enrichment       Calculates coverage and enrichment assuming first table
                     is reference gene track and second track something else
   '-where=some sql pattern'  restrict to features matching some sql pattern
You can include a '!' before a table name to negate it.
Some table names can be followed by modifiers such as:
    :exon:N          Break into exons and add N to each end of each exon
    :cds             Break into coding exons
    :intron:N        Break into introns, remove N from each end
    :utr5, :utr3     Break into 5' or 3' UTRs
    :upstream:N      Consider the region of N bases before region
    :end:N           Consider the region of N bases after region
    :score:N         Consider records with score >= N
    :upstreamAll:N   Like upstream, but doesn't filter out genes that
                     have txStart==cdsStart or txEnd==cdsEnd
    :endAll:N        Like end, but doesn't filter out genes that
                     have txStart==cdsStart or txEnd==cdsEnd
The tables can be bed, psl, or chain files, or a directory full of
such files as well as actual database tables. To count the bits
used in dir/chrN_something*.bed you'd do:
   featureBits database dir/_something.bed
-----

Note that the -not option will negate the result of an intersection.

Alternatively, you can use the online Table Browser tool to get the list of locations that belong only to the first list. To do this, first upload your two BEDs as custom tracks. Then go to the Table Browser and select one of the BED tracks. From here you can proceed in a couple of different ways. One way to do it is to intersect the two lists, then intersect the first list with the complement of the intersection from the first step.
For instance, with the first BED selected, hit the "intersection: create" button and choose the second BED track. Save this intersection as a third custom track. Then go back to the Table Browser and select the first BED again, and hit "intersection: create" again. This time, choose your new custom track. Also check the box to "Complement [your custom track] before intersection/union". This intersection should contain the regions from the first list that are not in the intersection of the two lists. I hope this information helps. Please let us know if we can clarify any of the above. -- Brooke Rhead UCSC Genome Bioinformatics Group Jeffrey Rosenfeld wrote: > Do you have a downloadable program that will subtract two lists of BED > points. I have found bedIntersect in the Kent source files and it has a > similar function, but I could not find a program that does the reverse. > For two given BED files, I want a list of genomic locations that are > only in one of them. > > Thank You, > > Jeffrey Rosenfeld > Cold Spring Harbor Lab > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Thu Jul 5 12:57:26 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 5 Jul 2007 12:57:26 -0700 (PDT) Subject: [Genome] identifying genes... In-Reply-To: <855957.9268.qm@web86403.mail.ird.yahoo.com> References: <855957.9268.qm@web86403.mail.ird.yahoo.com> Message-ID: Johnathan, It sounds like you have the correct strategy when you describe using the Table Browser to intersect a gene track with your custom track. However, I notice that you say you're using "genes and gene prediction tracks". That is actually subsection of the tracks we have available, and an individual track/table is used in the intersection. Once you've chosen "genes and gene prediction tracks" you have to also choose which track/table it is that you will be using. 
I recommend using track "UCSC genes" and table: "knownGene". Also check that you have the "region" option set to "genome" as this could be a reason your results are limited. I hope this is helpful to you. Please don't hesitate to write back if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group On Thu, 5 Jul 2007, JONATHAN COOPER wrote: > Hi There, > > I am trying to identify genes associated with particular genomic intervals, but not having much luck. I have uploaded a file of RE1 transcription factor binding site coordinates into the table browser, and am trying to identify any genes that surround these sites (using "gene and gene prediction tracks") and intersecting it with my custom track. However, from the 60 odd sites in my track, only 4 genes have been identified. Is there something i am doing wrong? I was hoping to get a list of genes identified near each site that i could then cross-reference to find out functional information about. > > If you have any suggestions i'd be very grateful. > > best regards, > jonathan cooper > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From nhansen at mail.nih.gov Thu Jul 5 12:35:27 2007 From: nhansen at mail.nih.gov (Nancy Hansen) Date: Thu, 05 Jul 2007 15:35:27 -0400 Subject: [Genome] Source accession numbers for assemblies? Message-ID: <468D47FF.80601@mail.nih.gov> Hello, browser team! The NISC sequencing center is preparing to submit medical sequencing traces to the NCBI trace archive, and one field we need to include is the NCBI chromosome accession from which we designed our PCR primers. The only thing we're currently tracking in our database is the UCSC assembly code (e.g., "hg17") and the chromosome. Do you have a way for us to download accession numbers/versions for each chromosome of each of your assemblies (e.g., hg17 chr1 => NC_00001.9, hg17 chr2 => NC_00002.10)? 
I couldn't find this info in the table browser, but I'm guessing you have it somewhere. Thanks for all your great work! --Nancy ************************************* Nancy F. Hansen, PhD nhansen at nhgri.nih.gov Comparative Genomics Unit, NHGRI 5625 Fishers Lane Rockville, MD 20852 Phone: (301) 435-1560 Fax: (301) 435-6170 From rhead at soe.ucsc.edu Thu Jul 5 11:52:17 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 05 Jul 2007 11:52:17 -0700 Subject: [Genome] mRNA\genome side by side alignment In-Reply-To: <20070705105751.3AFDCBCC085@post.tau.ac.il> References: <20070705105751.3AFDCBCC085@post.tau.ac.il> Message-ID: <468D3DE1.3080607@soe.ucsc.edu> Hello Shahar, We do not keep the side-by-side alignments in files. The alignment displays in the Genome Browser are generated on the fly from the psl table associated with the RefSeq Genes track, called 'refSeqAli'. (psl format is described here: http://genome.ucsc.edu/FAQ/FAQformat#format2) There is not a way to generate these alignment displays via the Table Browser. However, we do have a utility in our source tree called 'pslPretty' that can generate the alignment displays from a psl and the corresponding sequence files. Information about downloading the Genome Browser source (which is free for academic, noncommercial, and personal use) is located here: http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads Once the source is installed and the utilities made, you can run the 'pslPretty' command without any arguments to get a usage statement. The zebrafish sequence files are available for download here, as well as the zebrafish refSeq mRNA sequences: http://hgdownload.cse.ucsc.edu/goldenPath/danRer4/bigZips/ I hope this information helps. If you have any further questions, pleas do not hesitate to write back to this list. -- Brooke Rhead UCSC Genome Bioinformatics Group shahar alon wrote: > I'm trying to find mRNA\genome side by side alignment for all the RefSeq > genes in the zebrafish. 
> > Is there a file with that data on your servers? > > (The file should contain the actual alignment of the mRNA to the genome and > not the statistics of the alignment) > > If there isn't such a file, how can I create one using the Table browser? > > Thank's > > Shahar alon, Israel > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Thu Jul 5 10:16:55 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 5 Jul 2007 10:16:55 -0700 (PDT) Subject: [Genome] ftp server down ? In-Reply-To: References: Message-ID: Steven, Our public FTP server: ftp://hgdownload.cse.ucsc.edu/ appears to be working fine for us this morning. Could you try again and see if it works for you? If not please send me the name of the file(s) you are trying to access and we can look into it further. Kayla Smith UCSC Genome Bioinformatics Group On Thu, 5 Jul 2007, Stephen Fitzgerald wrote: > Hi guys, your ftp server seems to be out of action. > Cheers, Stephen. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From arhan at ucla.edu Thu Jul 5 11:58:42 2007 From: arhan at ucla.edu (Areum Han) Date: Thu, 5 Jul 2007 11:58:42 -0700 Subject: [Genome] size of the whole files. Message-ID: <000801c7bf36$8197fff0$a92743a4@variome> Hello. I would like to download uscs mouse data. What is the total size of whole files in mm8/* folders? Sincerely yours, Areum Han. ----------------------------------------------------------- Areum Han Graduate student, Biomedical Engineering Dept., UCLA 310-775-1606 / arhan at ucla.edu ----------------------------------------------------------- From Andrew_Yee at dfci.harvard.edu Thu Jul 5 15:17:08 2007 From: Andrew_Yee at dfci.harvard.edu (Yee, Andrew J.,M.D.) 
Date: Thu, 5 Jul 2007 18:17:08 -0400
Subject: [Genome] table browser for IMAGE clone information
Message-ID:

Do you have any tables that incorporate I.M.A.G.E clone information (e.g. http://image.llnl.gov/)?

Thanks,
Andrew

Andrew J. Yee, MD
Instructor in Medicine, Harvard Medical School
Assistant in Medicine, Mass. General Hospital Cancer Center

The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.

From rhead at soe.ucsc.edu Thu Jul 5 18:19:39 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Thu, 05 Jul 2007 18:19:39 -0700
Subject: [Genome] size of the whole files.
In-Reply-To: <000801c7bf36$8197fff0$a92743a4@variome>
References: <000801c7bf36$8197fff0$a92743a4@variome>
Message-ID: <468D98AB.9000605@soe.ucsc.edu>

Hello Areum,

Here are the approximate total sizes for each of the mm8 directories on our downloads server (here: http://hgdownload.cse.ucsc.edu/goldenPath/mm8/):

2.3G    bigZips/
804M    chromosomes/
8.0K    database/
927M    liftOver/
4.8G    multiz17way/
1.5G    phastCons17Scores/
3.7G    regPotential7X/
145M    vsAnoCar1/
789M    vsBosTau2/
888M    vsCanFam2/
65M     vsDanRer3
70M     vsDanRer4
1.1G    vsEquCab1/
540M    vsFelCat3/
59M     vsFr1
83M     vsGalGal2
96M     vsGalGal3
67M     vsGasAcu1/
1.1G    vsHg17
1.1G    vsHg18
409M    vsMonDom4/
223M    vsOrnAna1/
951M    vsPanTro1
1.1G    vsPanTro2
909M    vsRheMac2/
2.6G    vsRn4/
60M     vsTetNig1/
139M    vsXenTro1
176M    vsXenTro2

This is a grand total of ~26G.

If you have any further questions, please do not hesitate to contact us again.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Areum Han wrote:
> Hello.
> I would like to download UCSC mouse data.
> What is the total size of whole files in mm8/* folders?
>
> Sincerely yours,
> Areum Han.
> -----------------------------------------------------------
> Areum Han
> Graduate student,
> Biomedical Engineering Dept., UCLA
> 310-775-1606 / arhan at ucla.edu
> -----------------------------------------------------------
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From kayla at soe.ucsc.edu Thu Jul 5 22:02:27 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 5 Jul 2007 22:02:27 -0700 (PDT)
Subject: [Genome] table browser for IMAGE clone information
In-Reply-To:
References:
Message-ID:

Andrew,

We have a table for most assemblies called imageClone. You can download the imageClone table for the hg18 database here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/imageClone.txt.gz

If you want more information than that table provides alone, you can go to the Table Browser, and select the following options:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: All Tables
table: imageClone

You can click on the "describe table schema" button to see how this table is connected to others. The LLNL link you provide has a lot of information, so I'm not sure exactly what you're looking for. If you can be more specific about what you're looking for, I can help to direct you to the appropriate information.

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Thu, 5 Jul 2007, Yee, Andrew J.,M.D. wrote:
> Do you have any tables that incorporate I.M.A.G.E clone information (e.g.
> http://image.llnl.gov/)?
>
> Thanks,
> Andrew
>
> Andrew J. Yee, MD
> Instructor in Medicine, Harvard Medical School
> Assistant in Medicine, Mass.
General Hospital Cancer Center
>
>
> The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From daren76 at hotmail.com Thu Jul 5 20:48:29 2007
From: daren76 at hotmail.com (daren76 daren76)
Date: Fri, 06 Jul 2007 03:48:29 +0000
Subject: [Genome] How to check for homologies in other species ?
Message-ID:

Hi friends,

I have downloaded the pairwise and multiple alignment information from http://hgdownload.cse.ucsc.edu/goldenPath/hg18. I have a set of human genes, each about 100 nts in length, that I want to check for homologies in other species. I hope someone can point me to the steps.

Thanks
Daren

_________________________________________________________________
Get the new Windows Live Messenger!
http://get.live.com/messenger/overview

From manrai at gmail.com Fri Jul 6 09:22:18 2007
From: manrai at gmail.com (Arjun Kumar Manrai)
Date: Fri, 6 Jul 2007 12:22:18 -0400
Subject: [Genome] heterochromatin
Message-ID: <1037ea710707060922j29246f7au57eb96824a8c79b2@mail.gmail.com>

Hello,

Is it possible to find out which parts of the sequence for a given species are annotated as heterochromatin and which are annotated as euchromatin? And does UCSC have summary statistics available on what proportion of the predicted amount of heterochromatin is currently sequenced for a given species?

Thanks,
Arjun (Raj) K.
Manrai
Harvard University

From stephenf at ebi.ac.uk Fri Jul 6 01:18:44 2007
From: stephenf at ebi.ac.uk (Stephen Fitzgerald)
Date: Fri, 6 Jul 2007 09:18:44 +0100 (BST)
Subject: [Genome] ftp server down ?
In-Reply-To:
References:
Message-ID:

Hi Kayla, it came back up at about 16:00 BST, so I downloaded the files I needed.
Cheers, Stephen.

On Thu, 5 Jul 2007, Kayla Smith wrote:
>
> Stephen,
>
> Our public FTP server: ftp://hgdownload.cse.ucsc.edu/
>
> appears to be working fine for us this morning. Could you try again and
> see if it works for you? If not, please send me the name of the file(s)
> you are trying to access and we can look into it further.
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
> On Thu, 5 Jul 2007, Stephen Fitzgerald wrote:
>
>> Hi guys, your ftp server seems to be out of action.
>> Cheers, Stephen.
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>

From Andrew_Yee at dfci.harvard.edu Fri Jul 6 04:39:02 2007
From: Andrew_Yee at dfci.harvard.edu (Yee, Andrew J.,M.D.)
Date: Fri, 6 Jul 2007 07:39:02 -0400
Subject: [Genome] table browser for IMAGE clone information
In-Reply-To:
Message-ID:

Thanks for the information, it was very helpful (I should have explored the "all tables" option under "group" a little bit more thoroughly!).

Andrew

-----Original Message-----
From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
Sent: Friday, July 06, 2007 1:02 AM
To: Yee, Andrew J.,M.D.
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] table browser for IMAGE clone information

Andrew,

We have a table for most assemblies called imageClone. You can download the imageClone table for the hg18 database here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/imageClone.txt.gz

If you want more information than that table provides alone, you can go to the Table Browser, and select the following options:

clade: Vertebrate
genome: Human
assembly: Mar.
2006
group: All Tables
table: imageClone

You can click on the "describe table schema" button to see how this table is connected to others. The LLNL link you provide has a lot of information, so I'm not sure exactly what you're looking for. If you can be more specific about what you're looking for, I can help to direct you to the appropriate information.

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Thu, 5 Jul 2007, Yee, Andrew J.,M.D. wrote:
> Do you have any tables that incorporate I.M.A.G.E clone information (e.g.
> http://image.llnl.gov/)?
>
> Thanks,
> Andrew
>
> Andrew J. Yee, MD
> Instructor in Medicine, Harvard Medical School Assistant in Medicine,
> Mass. General Hospital Cancer Center
>
>
> The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
From Andrew_Yee at dfci.harvard.edu Fri Jul 6 05:47:46 2007
From: Andrew_Yee at dfci.harvard.edu (Yee, Andrew J.,M.D.)
Date: Fri, 6 Jul 2007 08:47:46 -0400
Subject: [Genome] table browser for IMAGE clone information
In-Reply-To:
Message-ID:

As a follow-up, in terms of using the Table Browser, is there a way to browse the table by entering the IMAGE clone number instead of the GenBank accession number?

Thanks,
Andrew

-----Original Message-----
From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
Sent: Friday, July 06, 2007 1:02 AM
To: Yee, Andrew J.,M.D.
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] table browser for IMAGE clone information

Andrew,

We have a table for most assemblies called imageClone. You can download the imageClone table for the hg18 database here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/imageClone.txt.gz

If you want more information than that table provides alone, you can go to the Table Browser, and select the following options:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: All Tables
table: imageClone

You can click on the "describe table schema" button to see how this table is connected to others. The LLNL link you provide has a lot of information, so I'm not sure exactly what you're looking for. If you can be more specific about what you're looking for, I can help to direct you to the appropriate information.

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Thu, 5 Jul 2007, Yee, Andrew J.,M.D. wrote:
> Do you have any tables that incorporate I.M.A.G.E clone information (e.g.
> http://image.llnl.gov/)?
>
> Thanks,
> Andrew
>
> Andrew J. Yee, MD
> Instructor in Medicine, Harvard Medical School Assistant in Medicine,
> Mass.
General Hospital Cancer Center
>
>
> The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.

From rhead at soe.ucsc.edu Fri Jul 6 09:51:17 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Fri, 06 Jul 2007 09:51:17 -0700
Subject: [Genome] Source accession numbers for assemblies?
In-Reply-To: <468D47FF.80601@mail.nih.gov>
References: <468D47FF.80601@mail.nih.gov>
Message-ID: <468E7305.6000300@soe.ucsc.edu>

Hello Nancy,

I have spoken to our developers to see if we maintain a list of chromosome accession numbers here, and we do not. We suggest you contact the NCBI helpdesk at info at ncbi.nlm.nih.gov.

Sorry we can't be of more assistance.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Nancy Hansen wrote:
> Hello, browser team!
>
> The NISC sequencing center is preparing to submit medical sequencing
> traces to the NCBI trace archive, and one field we need to include is
> the NCBI chromosome accession from which we designed our PCR primers.
> The only thing we're currently tracking in our database is the UCSC
> assembly code (e.g., "hg17") and the chromosome. Do you have a way for
> us to download accession numbers/versions for each chromosome of each of
> your assemblies (e.g., hg17 chr1 => NC_00001.9, hg17 chr2 =>
> NC_00002.10)? I couldn't find this info in the table browser, but I'm
> guessing you have it somewhere.
>
> Thanks for all your great work!
> --Nancy
>
> *************************************
> Nancy F. Hansen, PhD nhansen at nhgri.nih.gov
> Comparative Genomics Unit, NHGRI
> 5625 Fishers Lane
> Rockville, MD 20852
> Phone: (301) 435-1560 Fax: (301) 435-6170
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From ann at soe.ucsc.edu Fri Jul 6 10:02:00 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 06 Jul 2007 10:02:00 -0700
Subject: [Genome] About fetching bulk results of microRNA target prediction
In-Reply-To:
References:
Message-ID: <468E7588.1030904@soe.ucsc.edu>

Hello JW,

The track you're interested in is available on the previous human assembly (May 2004 -- we call it hg17). Choose that assembly from the website gateway, then navigate down to the track called "picTar miRNA" in the "Expression and Regulation" section of the track controls.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.
> Hi there,
>
> I've been seeking the track about pictar microRNA target prediction from
> UCSC genome browser and trying to download the bulk results from it as
> the pictar website guides. However I can't find the related track from
> genome browser, could you please show me how to find it ? Thanks,
>
> Savant JW
>

From ann at soe.ucsc.edu Fri Jul 6 10:06:53 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 06 Jul 2007 10:06:53 -0700
Subject: [Genome] About fetching bulk results of microRNA target prediction
In-Reply-To: <468E7588.1030904@soe.ucsc.edu>
References: <468E7588.1030904@soe.ucsc.edu>
Message-ID: <468E76AD.2000102@soe.ucsc.edu>

Hello again, JW,

I realize that I didn't quite answer all of your question. You can use the Table Browser ("Tables" in the top blue navigation bar) to download the table containing the picTar data. Configure the Table Browser like so:

clade: Vertebrate
genome: Human
assembly: May 2004
group: Expression and Regulation
track: PicTar miRNA
table: PicTar 4 [or 5] Species
region: genome

Choose your output type for download. Be sure to let us know if you need more detailed instructions.

Regards,
Ann Zweig.

Ann Zweig wrote:
> Hello JW,
>
> The track you're interested in is available on the previous human
> assembly (May 2004 -- we call it hg17). Choose that assembly from the
> website gateway, then navigate down to the track called "picTar miRNA"
> in the "Expression and Regulation" section of the track controls.
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
> Please feel free to search the Genome mailing list archives by visiting
> our home page, clicking on "Contact Us", then typing a word or phrase
> into the search box. On that same page
> (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome
> mailing list.
>
>
>> Hi there,
>>
>> I've been seeking the track about pictar microRNA target prediction
>> from UCSC genome browser and trying to download the bulk results from
>> it as the pictar website guides. However I can't find the related
>> track from genome browser, could you please show me how to find it ?
>> Thanks,
>>
>> Savant JW
>>
>

From ann at soe.ucsc.edu Fri Jul 6 13:46:17 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 06 Jul 2007 13:46:17 -0700
Subject: [Genome] heterochromatin
In-Reply-To: <1037ea710707060922j29246f7au57eb96824a8c79b2@mail.gmail.com>
References: <1037ea710707060922j29246f7au57eb96824a8c79b2@mail.gmail.com>
Message-ID: <468EAA19.8060802@cse.ucsc.edu>

Hello Raj,

It's possible to get a general sense of this by looking at the gap track. Each individual gap includes a label as to the type of gap (heterochromatin is one type). You can use the Table Browser to view only those gaps that are of type heterochromatin. Read more about using the Table Browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

For the human assemblies, we have generated statistics about the entire assembly, on a per-chromosome basis. These stats include information about the amount of non-euchromatic gaps. You can find the stats for the latest human assembly (hg18 (NCBI build 36.1)) here:
http://genome.ucsc.edu/goldenPath/stats.html#hg18

NCBI also has statistics about this build here:
http://www.ncbi.nlm.nih.gov/mapview/stats/BuildStats.cgi?taxid=9606&build=36&ver=1

I hope this helps you get started in your search. Please feel free to write back to the list if you need more information.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.
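
[Editor's sketch] The gap-track approach in the reply above could be summarized offline along these lines. This is only a minimal sketch: it assumes the gap table has been exported from the Table Browser with chrom, chromStart, chromEnd, and type columns, and that coordinates are zero-based, half-open. The rows and the function name `bases_by_gap_type` are invented for illustration and are not real hg18 values.

```python
from collections import defaultdict

# Hypothetical rows in the shape (chrom, chromStart, chromEnd, type).
# The coordinates are made up; real rows would come from a Table Browser export.
gap_rows = [
    ("chr1", 0, 10_000, "telomere"),
    ("chr1", 50_000, 250_000, "heterochromatin"),
    ("chr1", 400_000, 450_000, "contig"),
    ("chr2", 90_000, 390_000, "heterochromatin"),
]


def bases_by_gap_type(rows):
    """Sum gap lengths per gap type, assuming zero-based, half-open coordinates."""
    totals = defaultdict(int)
    for _chrom, start, end, gap_type in rows:
        totals[gap_type] += end - start
    return dict(totals)


totals = bases_by_gap_type(gap_rows)
print(totals["heterochromatin"])  # total bases in gaps labeled heterochromatin
```

Note that this only counts unsequenced gaps labeled heterochromatin; it says nothing about heterochromatin that has been sequenced.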
Arjun Kumar Manrai wrote:
> Hello,
>
> Is it possible to find out which parts of the sequence for a given species
> are annotated as heterochromatin and which are annotated as euchromatin? And
> does UCSC have summary statistics available on what proportion of the
> predicted amount of heterochromatin is currently sequenced for a given
> species?
>
> Thanks,
> Arjun (Raj) K. Manrai
> Harvard University
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From ann at soe.ucsc.edu Fri Jul 6 14:15:02 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 06 Jul 2007 14:15:02 -0700
Subject: [Genome] table browser for IMAGE clone information
In-Reply-To:
References:
Message-ID: <468EB0D6.5080603@cse.ucsc.edu>

Hello Andrew,

Yes, you can do this using the Table Browser. Set it up as my colleague, Kayla, suggests. Then press the filter 'create' button. On this page, enter your IMAGE clone number in the "Image ID =" box. Then press 'submit'. Choose 'all fields from selected table' as the output format, then press "get output". You will see a list of all rows of the table that match your IMAGE ID.

For example, if you search for the IMAGE ID of 40370, the output will be:

#filter: imageClone.imageId = 40370
#imageId  acc       type  direction
40370     AF520794  mRNA  0
40370     R55267    EST   5
40370     R55268    EST   3

I hope this is helpful to you.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.

Yee, Andrew J.,M.D. wrote:
> As a follow up, in terms of using the Table Browser, is there a way you can
> browse the table by entering the IMAGE clone number, instead of the GenBank
> accession number?
>
> Thanks,
> Andrew
>
> -----Original Message-----
> From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
> Sent: Friday, July 06, 2007 1:02 AM
> To: Yee, Andrew J.,M.D.
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] table browser for IMAGE clone information
>
>
> Andrew,
>
> We have a table for most assemblies called imageClone. You can download the
> imageClone table for the hg18 database here:
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/imageClone.txt.gz
>
> If you want more information than that table provides alone, you can go to the
> Table Browser, and select the following options:
>
> clade: Vertebrate
> genome: Human
> assembly: Mar. 2006
> group: All Tables
> table: imageClone
>
> You can click on the "describe table schema" button to see how this table is
> connected to others. The LLNL link you provide has a lot of information, so I'm
> not sure exactly what you're looking for. If you can be more specific about
> what you're looking for, I can help to direct you to the appropriate
> information.
>
> I hope this is helpful to you. Please don't hesitate to contact us again if you
> require further assistance.
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
> On Thu, 5 Jul 2007, Yee, Andrew J.,M.D. wrote:
>
>> Do you have any tables that incorporate I.M.A.G.E clone information (e.g.
>> http://image.llnl.gov/)?
>>
>> Thanks,
>> Andrew
>>
>> Andrew J. Yee, MD
>> Instructor in Medicine, Harvard Medical School Assistant in Medicine,
>> Mass. General Hospital Cancer Center
>>
>>
>> The information transmitted in this electronic communication is intended only
> for the person or entity to whom it is addressed and may contain confidential
> and/or privileged material. Any review, retransmission, dissemination or other
> use of or taking of any action in reliance upon this information by persons or
> entities other than the intended recipient is prohibited.
If you received this
> information in error, please contact the Compliance HelpLine at 800-856-1983 and
> properly dispose of this information.
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>
>
> The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information.
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From jayuan2007 at yahoo.com Fri Jul 6 19:20:33 2007
From: jayuan2007 at yahoo.com (Jay an)
Date: Fri, 6 Jul 2007 19:20:33 -0700 (PDT)
Subject: [Genome] GO annotation
In-Reply-To: <468558D7.20905@soe.ucsc.edu>
Message-ID: <352890.85248.qm@web63202.mail.re1.yahoo.com>

Hi Brooke,

I got a table consisting of GO:XXX and gene symbols, but I found that most of the gene symbols are n/a. Does that mean there are no gene names?

Jay

Brooke Rhead wrote:

Hi Jay,

The go.dbObjectId is the UniProt accession number. (But in the case of A0A000, the species is not human, but Streptomyces ghanaensis [TaxID: 35758]).

The go database contains dbObjectId's for all assemblies, not just hg18. However, it is possible to distinguish species in the goaPart table, as the species name is included as part of the dbObjectSymbol field.
Here are some examples where "HUMAN" is included:

mysql> select * from goaPart where dbObjectSymbol like '%_HUMAN' limit 5;
+------------+----------------+-------+------------+--------+
| dbObjectId | dbObjectSymbol | notId | goId       | aspect |
+------------+----------------+-------+------------+--------+
| A0A184     | A0A184_HUMAN   |       | GO:0005764 | C      |
| A0A184     | A0A184_HUMAN   |       | GO:0006629 | P      |
| A0A184     | A0A184_HUMAN   |       | GO:0006665 | P      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0004222 | F      |
| A0A1K6     | A0A1K6_HUMAN   |       | GO:0006508 | P      |
+------------+----------------+-------+------------+--------+
5 rows in set (0.00 sec)

You can use the Table Browser to filter the goaPart table so that only human UniProt IDs are shown. To do this, hit the filter "create" button. In the free-form query box enter the text:

dbObjectSymbol like '%HUMAN'

and hit "submit". The output should be limited to only the UniProt symbols with "HUMAN" in the name.

The filtered goaPart table may be all you need to map GO accessions to genes. But, as you have noticed, the goaPart table is linked to the hg18 kgXref table, too. If you would like to get the gene names corresponding to GO accessions from kgXref, you can do that, too, with the Table Browser:

1. You will likely want to leave the filter on the goaPart table in place (dbObjectSymbol like '%HUMAN').
2. Select the option for "output format: selected fields from primary and related tables" and hit "get output".
3. On the next screen, under the "Linked Tables" heading, select the box for the hg18 kgXref table. Scroll to the bottom of the page and hit "Allow selection from checked tables".
4. You should now see a section at the top of the page called "hg18.kgXref fields", where you can select any of the identifiers from the kgXref table (like gene symbol).
5. Hit "get output".

You should get a list of GO identifiers with associated gene names from kgXref. Keep in mind that not every GO ID will be associated with a gene in the kgXref table.
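
[Editor's sketch] The filter-then-link procedure above can be pictured offline roughly as follows. This is a minimal sketch, not the Table Browser's actual implementation: the goaPart rows are taken from the examples quoted in this thread, while the `kg_xref` mapping (UniProt accession from kgXref.spID to a gene symbol), the symbol "GENE1", and the function name `go_to_gene` are hypothetical. Rows with no match in kgXref come out as "n/a", which is one plausible reason many symbols appear as n/a in the linked output.

```python
# (dbObjectId, dbObjectSymbol, goId) rows in the shape of the goaPart table.
goa_part = [
    ("A0A000", "A0A000_9ACTO", "GO:0003870"),  # non-human entry, filtered out
    ("A0A184", "A0A184_HUMAN", "GO:0005764"),
    ("A0A184", "A0A184_HUMAN", "GO:0006629"),
    ("A0A1K6", "A0A1K6_HUMAN", "GO:0004222"),
]

# Hypothetical kgXref lookup: spID -> geneSymbol. "GENE1" is a made-up symbol.
kg_xref = {"A0A184": "GENE1"}


def go_to_gene(goa_rows, xref):
    """Keep human rows (symbol ends in HUMAN) and attach a gene symbol via dbObjectId."""
    pairs = []
    for db_object_id, symbol, go_id in goa_rows:
        if not symbol.endswith("HUMAN"):  # mirrors: dbObjectSymbol like '%HUMAN'
            continue
        pairs.append((go_id, xref.get(db_object_id, "n/a")))
    return pairs


for go_id, gene in go_to_gene(goa_part, kg_xref):
    print(go_id, gene)
```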
I hope this information helps.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Jay an wrote:
> thanks Brooke,
>
> I followed your instructions, but I got the output below:
>
> #dbObjectId dbObjectSymbol notId goId aspect
> A0A000 A0A000_9ACTO GO:0003870 F
> A0A000 A0A000_9ACTO GO:0006783 P
> A0A000 A0A000_9ACTO GO:0009058 P
>
> there is no proteinID.
> I found "hg18.kgXref
> .spID
> (via goaPart.dbObjectId",
> how can I "via goaPart.dbObjectId"?
>
> thank you
> Jay
>
>
>
> */Brooke Rhead /* wrote:
>
> Hello Jay,
>
> The GO accessions are linked to genes (that is, protein IDs) in the
> table 'goaPart', which resides in our 'go' database.
>
> You can get to this table in the Table Browser by selecting "group: all
> tables" and "database: go", then selecting "table: go.goaPart".
>
> I hope this information helps. If you have further questions, please
> feel free to write back to this list.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> Jay an wrote:
> > hello,
> >
> > every GO:XXXXX has related genes. can you tell me how to get a matrix
> > (GO:XXXXX and genes)?
> >
> >
> > thanks
> >
> >
> >
> > ---------------------------------
> > Get your own web address.
> > Have a HUGE year through Yahoo! Small Business.
> > _______________________________________________
> > Genome maillist - Genome at soe.ucsc.edu
> > http://www.soe.ucsc.edu/mailman/listinfo/genome
>
>
> ------------------------------------------------------------------------
> Get the Yahoo! toolbar and be alerted to new email
> wherever
> you're surfing.

---------------------------------
Park yourself in front of a world of choices in alternative vehicles.
Visit the Yahoo! Auto Green Center.

From jayuan2007 at yahoo.com Sun Jul 8 04:54:03 2007
From: jayuan2007 at yahoo.com (Jay an)
Date: Sun, 8 Jul 2007 04:54:03 -0700 (PDT)
Subject: [Genome] GO annotation
In-Reply-To: <468A99EA.2040909@soe.ucsc.edu>
Message-ID: <801050.93849.qm@web63212.mail.re1.yahoo.com>

Hi Brooke,

GO term has different levels.
could you tell me how to get level information for each GO code? Jay Brooke Rhead wrote: Hi Jay, The GO processes are listed in the table 'term' in the field 'term_type'. The 'acc' field in the 'term' table is the same as the 'goId' field in the 'goaPart' table. You can use the Table Browser as described in my earlier message to get a list of GO IDs and their corresponding processes from these tables. Also, since you are interested in GO annotations, may I suggest the Gene Sorter as a tool to view and sort genes based on GO similarity? If you click the "Gene Sorter" link at the top of the page, you will be taken to the Gene Sorter gateway page. Enter any gene name into the search box to get started. Click the "configure" button and scroll down to the Gene Ontology (GO) checkbox -- here you can turn on the display of GO information. The Gene Sorter has many options for sorting and filtering data. Complete instructions on using the Gene Sorter are here: http://genome.ucsc.edu/goldenPath/help/hgNearHelp.html I hope this helps. -- Brooke Rhead UCSC Genome Bioinformatics Group Date: Mon, 2 Jul 2007 23:58:03 -0700 (PDT) From: Jay an To: Brooke Rhead Cc: "'genome at soe.ucsc.edu'" Subject: Re: [Genome] GO annotation hello Brooke, your instruction is very helpful. thanks GO can be categorized different processes, such as physiological process, metabolism, cellular physiological process. how can get this information from UCSC ? regards Jay Brooke Rhead wrote: Hi Jay, The go.dbObjectId is the UniProt accession number. (But in the case of A0A000, the species is not human, but Streptomyces ghanaensis [TaxID: 35758]). The go database contains dbObjectId's for all assemblies, not just hg18. However, it is possible to distingush species in the goaPart table, as the species name is included as part of the dbObjectSymbol field. 
Here are some examples where "HUMAN" is included: mysql> select * from goaPart where dbObjectSymbol like '%_HUMAN' limit 5; +------------+----------------+-------+------------+--------+ | dbObjectId | dbObjectSymbol | notId | goId | aspect | +------------+----------------+-------+------------+--------+ | A0A184 | A0A184_HUMAN | | GO:0005764 | C | | A0A184 | A0A184_HUMAN | | GO:0006629 | P | | A0A184 | A0A184_HUMAN | | GO:0006665 | P | | A0A1K6 | A0A1K6_HUMAN | | GO:0004222 | F | | A0A1K6 | A0A1K6_HUMAN | | GO:0006508 | P | +------------+----------------+-------+------------+--------+ 5 rows in set (0.00 sec) You can use the Table Browser to filter the goaPart table so that only human UniProt IDs are shown. To do this, hit the filter "create" button. In the free-form query box enter the text: dbObjectSymbol like '%HUMAN' and hit "submit". The output should be limited to only the UniProt symbols with "HUMAN" in the name. The filtered goaPart table may be all you need to map GO accessions to genes. But, as you have noticed, the goaPart table is linked to the hg18 kgXref table, too. If you would like to get the gene names corresponding to GO accessions from kgXref, you can do that, too, with the Table Browser: 1. You will likely want to leave the filter on the goaPart table in place (dbObjectSymbol like '%HUMAN'). 2. Select the option for "output format: selected fields from primary and related tables" and hit "get output". 3. On the next screen, under the "Linked Tables" heading, select the box for the hg18 kgXref table. Scroll to the bottom of the page and hit "Allow selection from checked tables". 4. You should now see a section called at the top of the page called "hg18.kgXref fields", where you can select any of the identifiers from the kgXref table (like gene symbol). 5. Hit "get output". You should get a list of GO identifiers with associated gene names from kgXref. Keep in mind that not every GO ID will be associated with a gene in the kgXref table. 
I hope this information helps. -- Brooke Rhead UCSC Genome Bioinformatics Group Jay an wrote: > thanks Brooke, > > I followed you instruction. but I got below: > > #dbObjectId dbObjectSymbol notId goId aspect > A0A000 A0A000_9ACTO GO:0003870 F > A0A000 A0A000_9ACTO GO:0006783 P > A0A000 A0A000_9ACTO GO:0009058 P > > there is not proteinID. > I found "hg18.kgXref > .spID > (via goaPart.dbObjectId", > how can I "via goaPart.dbObjectId"? > > thank you > Jay > > > > */Brooke Rhead /* wrote: > > Hello Jay, > > The GO accessions are linked to genes (that is, protein IDs) in the > table 'goaPart', which resides in our 'go' database. > > You can get to this table in the Table Browser by selecting "group: all > tables" and "database: go", then selecting "table: go.goaPart". > > I hope this information helps. If you have further questions, please > feel free to write back to this list. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Jay an wrote: > > hello, > > > > every GO:XXXXX has related genes. can you tell me how to a matrix > > (GO:XXXXX and genes)? > > > > > > thanks > > > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome
From james at ryley.com Sat Jul 7 19:28:47 2007
From: james at ryley.com (James)
Date: Sat, 7 Jul 2007 22:28:47 -0400
Subject: [Genome] Suggestion for http://genome.ucsc.edu/ENCODE/terms.html
Message-ID: <200707080228.l682Sl6d016152@smtp.ryley.com>

Hi,

I saw that you have some patent-related information at http://genome.ucsc.edu/ENCODE/terms.html. Are you familiar with http://www.FreePatentsOnline.com? FreePatentsOnline has more data and more features than any other free patent site, plus free PDF downloading - thought it might be useful to you. If you have an appropriate place on your web site, a link would be great.

Sincerely,
James

From wxzheng_tju at hotmail.com Mon Jul 9 02:17:34 2007
From: wxzheng_tju at hotmail.com (ZhengWenXin)
Date: Mon, 9 Jul 2007 17:17:34 +0800
Subject: [Genome] Help!
Message-ID:

Dear Prof.

I query the sub-sequence from 38,117,568-38,117,968 on the chromosome 15 of the human genome (hg18, Mar 2006) through the Genome Browser. It can be seen that there's a gene named SRP14 aligning there in the RefSeq Genes track. I want to make it sure that if the region from about 38,117,768-38,117,890 bp, represented by a piece of thick line, is an exon of the gene named SRP14. If it is an exon of the gene SRP14, please see the conservation track of the region chr15: 38,117,568-38,117,968. There's a large peak in the region of this exon in the conservation track. Does it mean that this exon is more conserved than the introns beside it throughout evolution?

Thanks a lot for your patience. I am looking forward to hearing from you. Your help to my study would be greatly appreciated.

Best wishes,
Wen-Xin

Wen-Xin Zheng, PhD candidate
Bioinformatics Center
Tianjin University
Tianjin 300072
China
Fax: +86-22-27402697
Website: http://tubic.tju.edu.cn
From Ian.Donaldson at manchester.ac.uk Mon Jul 9 08:57:21 2007
From: Ian.Donaldson at manchester.ac.uk (Ian Donaldson)
Date: Mon, 9 Jul 2007 16:57:21 +0100
Subject: [Genome] RefSeq table hg18
Message-ID: <20070709165721.nybpktivi8kck444@webmail.manchester.ac.uk>

Hello,

I have recently downloaded RefGene data using the hg18 table browser to obtain coordinates to complement an online tool I am using. Some of the long (presumably newer) NM references, e.g. NM_001001998, cannot be found in the data I downloaded from UCSC. Please can you tell me if this is just likely to be due to slightly older data stored in UCSC? Also, is it possible to tell me where you sourced the RefSeq data?

Many thanks,
Ian

From hiram at soe.ucsc.edu Mon Jul 9 09:48:20 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Mon, 09 Jul 2007 09:48:20 -0700
Subject: [Genome] RefSeq table hg18
In-Reply-To: <20070709165721.nybpktivi8kck444@webmail.manchester.ac.uk>
References: <20070709165721.nybpktivi8kck444@webmail.manchester.ac.uk>
Message-ID: <469266D4.8060601@soe.ucsc.edu>

Good Morning Ian:

The sequence you mention, NM_001001998, is in the genome browser at least as of 08 July and available in the download file ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/refMrna.fa.gz

The refSeq data is updated once per week, usually on Sunday, as mentioned in the README: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/README.txt

The sequence you mentioned is in genbank as of 26 June 2007. Perhaps you downloaded the data before 01 July? Or our automatic process missed it for 01 July.

--Hiram

Ian Donaldson wrote:
> Hello,
>
> I have recently downloaded RefGene data using the hg18 table browser to obtain
> coordinates to complement an online tool I am using. Some of the long
> (presumably newer) NM references e.g. NM_001001998 cannot be found in the data
> I downloaded from UCSC.
Please can you tell me if this is just likely to be > due to slightly older data stored in UCSC? Also, is it poosible to tell me > where you sourced the RefSeq data? > > Many thanks, > Ian From provencb at iro.umontreal.ca Mon Jul 9 09:56:21 2007 From: provencb at iro.umontreal.ca (Benjamin Provencher) Date: Mon, 09 Jul 2007 12:56:21 -0400 Subject: [Genome] GO tree representation Message-ID: <469268B5.1020308@iro.umontreal.ca> Hello, I would like to get all GO term related to regulation. From AmiGO, I get a tree representation having the term [GO:0065007 biological regulation] as root. This node as three child: GO:0065009, GO:0050789, GO:005008 wich also have childs. I would like to know if there is a way to extract the list of child from your go database. thanks From sstrome at indiana.edu Mon Jul 9 10:25:17 2007 From: sstrome at indiana.edu (Susan Strome) Date: Mon, 9 Jul 2007 13:25:17 -0400 Subject: [Genome] WS170 on UCSC Gen Browser? Message-ID: Dear UCSC Genome Browser: I am a biology prof moving from Indiana Univ. to join MCD Biology at UCSC next week ... I'm also a member of an NIH ENCODE group. There are several ENCODE groups that will be working with Nimblegen on ChIP-chip experiments and will want to view results via the UCSC Genome Browser. Nimblegen data use WS170, and so we are grappling with how to interface with the Gen Browser, which currently uses WS120. Wormbase people think that perhaps a script can be used to convert the data from one sequence version to another. I am writing to inquire about the alternative possibility of having the Genome Browser group add the WS170 sequence version to the site. Thanks for any info on this, Susan -- Susan Strome Chancellor's Professor Department of Biology 1001 E. 3rd St. 
Indiana University Bloomington, IN 47405-7005 phone: 812-855-5450 fax: 812-855-6705 email: sstrome at indiana.edu From hiram at soe.ucsc.edu Mon Jul 9 11:45:04 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Mon, 09 Jul 2007 11:45:04 -0700 Subject: [Genome] WS170 on UCSC Gen Browser? In-Reply-To: References: Message-ID: <46928230.9020201@soe.ucsc.edu> Good Morning Susan: The WS170 C. elegans browser is currently undergoing QA processing here in preparation for release to the public WEB site. You can view this copy under development at: http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=ce4 --Hiram Susan Strome wrote: > Dear UCSC Genome Browser: I am a biology prof moving from Indiana > Univ. to join MCD Biology at UCSC next week ... I'm also a member of > an NIH ENCODE group. There are several ENCODE groups that will be > working with Nimblegen on ChIP-chip experiments and will want to view > results via the UCSC Genome Browser. Nimblegen data use WS170, and > so we are grappling with how to interface with the Gen Browser, which > currently uses WS120. Wormbase people think that perhaps a script > can be used to convert the data from one sequence version to another. > I am writing to inquire about the alternative possibility of having > the Genome Browser group add the WS170 sequence version to the site. > > Thanks for any info on this, > Susan From kayla at soe.ucsc.edu Mon Jul 9 14:24:54 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 09 Jul 2007 14:24:54 -0700 Subject: [Genome] Help! In-Reply-To: References: Message-ID: <4692A7A6.6050607@cse.ucsc.edu> Wen-Xin, Yes, you are correct. The region you mention, chr15:38,117,768-38,117,890 is within an exon of the SRP14 gene. The large peak in the Conservation track at chr15: 38,117,568-38,117,968 implies that the sequence there is conserved. 
You can click into the details page for the conservation track and check the boxes to turn on information for any/all of the aligning species in order to determine more specifically which species have sequence in common.

Good luck with your research. I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

ZhengWenXin wrote:
> Dear Prof.
> I query the sub-sequence from 38,117,568-38,117,968 on the chromosome 15 of the human genome (hg18, Mar 2006) through the Genome Browser. It can be seen that there's a gene named SRP14 aligning there in the RefSeq Genes track. I want to make it sure that if the region from about 38,117,768-38,117,890 bp, represented by a piece of thick line, is an exon of the gene named SRP14. If it is an exon of the gene SRP14, please see the conservation track of the region chr15: 38,117,568-38,117,968. There's a large peak in the region of this exon in the conservation track. Does it mean that this exon is more conserved than the introns beside it throughout evolution?
> Thanks a lot for your patience. I am looking forward to hearing from you. Your help to my study would be greatly appreciated.
>
> Best wishes,
> Wen-Xin
>
> Wen-Xin Zheng, PhD candidate
> Bioinformatics Center
> Tianjin University
> Tianjin 300072
> China
> Fax: +86-22-27402697
> Website: http://tubic.tju.edu.cn
From ikhrebtukova at illumina.com Mon Jul 9 16:14:28 2007
From: ikhrebtukova at illumina.com (Khrebtukova, Irina)
Date: Mon, 9 Jul 2007 16:14:28 -0700
Subject: [Genome] custom track wig file with + and - numbers for the same position
Message-ID:

Hi,

I'm trying to upload a custom track wig file with chr start and some positive or negative number and two alt colors. So far it was working perfectly well for me until I started using data where the same chr start could have both positive and negative numbers. You could easily imagine the situation when some short fragments map to the same start on both + and - strand.

However, it seems like I CAN NOT upload any track that has the same chr position but different values, for example:

variableStep chrom=chr1 span=32
555560 -113
555560 5

Your browser immediately throws an error like:

track load error:
chrom positions not in numerical order at line 24. previous: 555560 > 555560 <-current

Could you please advise any solution around this?
Thanks!

Irina Khrebtukova, PhD
Sr. Staff Bioinformatics Scientist
Illumina Inc.
25861 Industrial Blvd.,
Hayward, CA 94545
ph: 510-723-9219
ikhrebtukova at illumina.com

From hiram at soe.ucsc.edu Mon Jul 9 16:32:42 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Mon, 09 Jul 2007 16:32:42 -0700
Subject: [Genome] custom track wig file with + and - numbers for the same position
In-Reply-To:
References:
Message-ID: <4692C59A.9070007@soe.ucsc.edu>

Good Afternoon Irina:

The wiggle track will not allow multiple values in the same location, as you have discovered. If you want two values in the same location, create two data sets and load them into two separate tracks of data.
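One way to follow this advice is to split the signed values by strand before upload. The following is a minimal sketch, assuming positive values come from the + strand and negative values from the - strand (as in Irina's example). The function name split_wiggle_by_sign and the track names are hypothetical; this is not a UCSC tool.

```python
def split_wiggle_by_sign(chrom, span, records):
    """Split (position, value) pairs into two variableStep wiggle tracks.

    Values >= 0 go to the first track, values < 0 to the second, and each
    track is sorted by position so that a given position appears at most
    once per track (provided the input has one value per strand).
    Returns a (plus_text, minus_text) pair of custom-track strings.
    """
    def render(track_name, rows):
        lines = [
            'track type=wiggle_0 name="%s"' % track_name,
            "variableStep chrom=%s span=%d" % (chrom, span),
        ]
        lines.extend("%d %s" % (pos, val) for pos, val in sorted(rows))
        return "\n".join(lines) + "\n"

    plus_rows = [(pos, val) for pos, val in records if val >= 0]
    minus_rows = [(pos, val) for pos, val in records if val < 0]
    return render("plus_strand", plus_rows), render("minus_strand", minus_rows)
```

For the example above, split_wiggle_by_sign("chr1", 32, [(555560, -113), (555560, 5)]) yields one track containing "555560 5" and another containing "555560 -113", which can then be loaded as two separate custom tracks. If the input still has two values of the same sign at one position, the duplicate remains and would need to be merged first.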
--Hiram Khrebtukova, Irina wrote: > Hi, > > I'm trying to upload a custom track wig file with chr start and some > positive or negative number and two alt colors. So far it was working > perfectly well for me until I start using data where the same chr start > could have both positive and negative numbers. You could easily imagine > the situation when some short fragments map to the the same start on > both + and - strand. > > However, seems like I CAN NOT upload any track that has the same chr > position but different values, for example: > > variableStep chrom=chr1 span=32 > 555560 -113 > 555560 5 > > your browser immediately throw an error like: > track load error: > chrom positions not in numerical order at line 24. previous: 555560 > > 555560 <-current > > could you please advise any solution around this? > thanks! > > Irina Khrebtukova, PhD > Sr. Staff Bioinformatics Scientist > Illumina Inc. > 25861 Industrial Blvd., > Hayward, CA 94545 > ph: 510-723-9219 > ikhrebtukova at illumina.com From jayuan2007 at yahoo.com Tue Jul 10 00:42:46 2007 From: jayuan2007 at yahoo.com (Jay an) Date: Tue, 10 Jul 2007 00:42:46 -0700 (PDT) Subject: [Genome] GO id Message-ID: <818162.98005.qm@web63214.mail.re1.yahoo.com> hello, can you tell me how to get a table about "is-a" relationship for GO id. GO_id1 GO_id1's parent distance GO_id2 GO_id2's parent distance GO_id3 GO_id3's parent distance .... --------------------------------- Choose the right car based on your needs. Check out Yahoo! Autos new Car Finder tool. From simleo at crs4.it Tue Jul 10 08:22:30 2007 From: simleo at crs4.it (Simone Leo) Date: Tue, 10 Jul 2007 17:22:30 +0200 Subject: [Genome] minus strand DNA from DAS server Message-ID: <4693A436.8000802@crs4.it> Greetings, I'm using DAS queries to get DNA sequences, for instance: http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=19:50100878,50104489 This is from strand plus. Is there a way to get minus strand DNA as well? 
Regards,

Simone Leo

--
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simleo at crs4.it
http://www.crs4.it

From min.he at duke.edu Tue Jul 10 09:46:14 2007
From: min.he at duke.edu (Min He)
Date: Tue, 10 Jul 2007 12:46:14 -0400
Subject: [Genome] Request for the ENCODE phased data
Message-ID: <4693B7D6.8000700@duke.edu>

To Whom It May Concern:

Would you please tell me where I can download the phased data of the ENCODE 44 regions?

Thank you so much.

Min He

From archanat at soe.ucsc.edu Tue Jul 10 10:43:02 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 10 Jul 2007 10:43:02 -0700
Subject: [Genome] minus strand DNA from DAS server
In-Reply-To: <4693A436.8000802@crs4.it>
References: <4693A436.8000802@crs4.it>
Message-ID: <4693C526.5090601@soe.ucsc.edu>

Hello Simone,

You will have to reverse-complement the sequence to get the minus strand DNA. There is a tool in our source tree called 'faRc' which can reverse complement sequence.

faRc - Reverse complement a FA file
usage:
   faRc in.fa out.fa
In.fa and out.fa may be the same file.
options:
   -keepName - keep name identical (don't prepend RC)
   -keepCase - works well for ACGTUN in either case. bizarre for other letters.
               without it bases are turned to lower, all else to n's
   -justReverse - prepends R unless asked to keep name
   -justComplement - prepends C unless asked to keep name
                     (cannot appear together with -justReverse)

Info on downloading the source is here: http://genome.ucsc.edu/FAQ/FAQlicense.html#license3

Many other popular packages (perhaps including the one you are using to send DAS requests) also include this function.

I hope this information helps. If you have any further questions, please do not hesitate to write back to this list.
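If faRc is not at hand, the reverse-complement step described above is easy to sketch in Python. This is a stand-in, not faRc itself: the function name reverse_complement is chosen for this example, it keeps letter case, and it leaves any character other than A, C, G, T and N untouched, which differs from the default faRc behaviour described in the usage text.

```python
# Translation table for DNA bases; characters not listed pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the minus-strand reading of a plus-strand DNA string."""
    # Complement every base, then reverse the order.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applied to the plus-strand sequence returned by a DAS dna request (with whitespace and line breaks removed first), this gives the minus-strand sequence for the same segment.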
Regards, Archana UCSC Genome Bioinformatics Group Simone Leo wrote: > Greetings, > > I'm using DAS queries to get DNA sequences, for instance: > > http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=19:50100878,50104489 > > This is from strand plus. Is there a way to get minus strand DNA as well? > > Regards, > > Simone Leo > From kayla at soe.ucsc.edu Tue Jul 10 10:57:17 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 10 Jul 2007 10:57:17 -0700 Subject: [Genome] GO tree representation In-Reply-To: <469268B5.1020308@iro.umontreal.ca> References: <469268B5.1020308@iro.umontreal.ca> Message-ID: <4693C87D.4050204@cse.ucsc.edu> Benjamin, While we do import the GO tables so we can support searches on the annotations, we don't provide tools for navigating the GO database structure. I recommend using the GO sites for this: AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi QuickGO: http://www.ebi.ac.uk/ego/ Please don't hesitate to contact us again if you have any other questions about the Genome Browser. Kayla Smith UCSC Bioinformatics Group Benjamin Provencher wrote: > Hello, > I would like to get all GO term related to regulation. From AmiGO, I get > a tree representation having the term [GO:0065007 biological > regulation] as root. This node as three child: GO:0065009, GO:0050789, > GO:005008 wich also have childs. I would like to know if there is a way > to extract the list of child from your go database. 
> thanks > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Tue Jul 10 11:03:50 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 10 Jul 2007 11:03:50 -0700 Subject: [Genome] Request for the ENCODE phased data In-Reply-To: <4693B7D6.8000700@duke.edu> References: <4693B7D6.8000700@duke.edu> Message-ID: <4693CA06.6050603@soe.ucsc.edu> Hello Min, I assume you are talking about the phased HapMap genotypes for the ENCODE regions. It has been pointed out to me that these data aren't available separately from the genome-wide phased genotype files, which are available here: http://www.hapmap.org/downloads/phasing/2006-07_phaseII/ I hope this information helps you. Please let us know if you have further questions. Regards, Archana UCSC Genome Bioinformatics Group Min He wrote: > To Whom It May Concern: > > Would you please tell me where I can download the phased data of the > ENCODE 44 regions? > > Thank you so much. > > Min He > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From wzxiao at stanford.edu Tue Jul 10 12:29:52 2007 From: wzxiao at stanford.edu (wenzhong) Date: Tue, 10 Jul 2007 12:29:52 -0700 Subject: [Genome] quick question about the version/release date of the annotations Message-ID: <006601c7c328$b022d750$106885f0$@edu> Hello, Can I please ask a quick question about the versions of the annotations in UCSC browser? For example, if I download the "Genes and Gene Prediction Tracks" using the Table Browser now, what would the corresponding versions or release dates of RefSeq (updated daily?), Ensemble (build 37 as stated on the website or the current build 45?), and UCSC Known Genes (the latest version by Jim?)? Thanks for the help! 
-Wenzhong

From maberman at mail.med.upenn.edu Tue Jul 10 13:12:18 2007
From: maberman at mail.med.upenn.edu (Micah Berman)
Date: Tue, 10 Jul 2007 16:12:18 -0400
Subject: [Genome] Annotation tracks and genome builds
Message-ID: <4693E822.6040106@mail.med.upenn.edu>

I am currently using custom annotation tracks on the UCSC browser, mostly simple BED formatted lists. I was wondering if there is a way in the track or browser data of these annotation tracks to define which build of the genome (e.g. hg17) these tracks belong to. I understand that at the top of the page where I upload these files I am able to define the build from a drop-down list, but is there a way to define the build intrinsic to the annotation file (like a track or browser line definition)? I would be more comfortable knowing that my files will not erroneously align with the wrong build.

Once the files are uploaded, are these BED definitions linked permanently to the build with which they were uploaded? Do they change if I switch builds on the browser?

I'd appreciate your help in learning more about this issue.

Thanks,
Micah Berman

--
Micah Berman
MS4
University of Pennsylvania School of Medicine
maberman at mail.med.upenn.edu

From archanat at soe.ucsc.edu Tue Jul 10 15:22:44 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 10 Jul 2007 15:22:44 -0700
Subject: [Genome] quick question about the version/release date of the annotations
In-Reply-To: <006601c7c328$b022d750$106885f0$@edu>
References: <006601c7c328$b022d750$106885f0$@edu>
Message-ID: <469406B4.1020201@soe.ucsc.edu>

Hello Wenzhong,

Assembly-specific track releases, updates, and release dates are noted in our Release Log at: http://genome.ucsc.edu/goldenPath/releaseLog.html

The exact Ensembl release information is not given in the Release Log for hg18.
However, you could get this information by clicking into the track settings page for the Ensembl Gene track: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=ensGene The refSeq data is updated once per week, usually on Sunday. As mentioned in the README: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/README.txt For the hg18 assembly: UCSC Genes: release date ( 6 April 2007 ) Ensembl Genes: Build 43 RefSeq Genes: last modified date ( 8 Jul 2007 ) I hope this information helps. If you have any further questions, please do not hesitate to write back to this list. Regards, Archana UCSC Genome Bioinformatics Group wenzhong wrote: > Hello, > > > > Can I please ask a quick question about the versions of the annotations in > UCSC browser? For example, if I download the "Genes and Gene Prediction > Tracks" using the Table Browser now, what would the corresponding versions > or release dates of RefSeq (updated daily?), Ensemble (build 37 as stated on > the website or the current build 45?), and UCSC Known Genes (the latest > version by Jim?)? > > > > Thanks for the help! > > > > -Wenzhong > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From wzxiao at stanford.edu Tue Jul 10 16:04:59 2007 From: wzxiao at stanford.edu (wenzhong) Date: Tue, 10 Jul 2007 16:04:59 -0700 Subject: [Genome] quick question about the version/release date of the annotations In-Reply-To: <469406B4.1020201@soe.ucsc.edu> References: <006601c7c328$b022d750$106885f0$@edu> <469406B4.1020201@soe.ucsc.edu> Message-ID: <003301c7c346$bd8ab660$38a02320$@edu> Hello Archana, Thanks very much for the help! It is exactly what I was looking for. 
Best regards, Wenzhong -----Original Message----- From: Archana Thakkapallayil [mailto:archanat at soe.ucsc.edu] Sent: Tuesday, July 10, 2007 3:23 PM To: wenzhong Cc: genome at soe.ucsc.edu Subject: Re: [Genome] quick question about the version/release date of the annotations Hello Wenzhong, Assembly-specific track releases, updates, and release dates are noted in our Release Log at: http://genome.ucsc.edu/goldenPath/releaseLog.html The exact Ensembl release information is not given in the Release Log for hg18. However, you could get this information by clicking into the tr