From mirrian at iis.sinica.edu.tw Wed Nov 1 02:40:28 2006
From: mirrian at iis.sinica.edu.tw (mirrian@iis.sinica.edu.tw)
Date: Wed, 01 Nov 2006 18:40:28 +0800
Subject: [Genome] questions for knowngene to refseq
Message-ID: <20061101184028.s89526cnk80w44oo@webmail.iis.sinica.edu.tw>

Dear Sir,

I am trying to link two kinds of data: RefSeq, downloaded from NCBI, and
Known Genes, from UCSC. I have downloaded two tables, kgXref and
knownToRefSeq. Both contain Known Gene and RefSeq information, but they
differ. For example, kgXref contains 32750 records related to RefSeq from
NCBI Homo sapiens build 36.1, while knownToRefSeq contains 33961 such
records. Which of the two is more accurate, and what causes the difference?

Furthermore, according to "The UCSC Known Genes" (published Feb. 24, 2006,
Genome analysis), if an mRNA has multiple proteins, the best one is chosen
in the order PDB, Swiss-Prot, TrEMBL. Conversely, if one protein has
multiple mRNAs, the best one is chosen, favoring longer and newer mRNAs
with fewer mismatches. Does that mean the Known Genes DB contains only
one-to-one relationships between protein and mRNA?

However, in the knownToRefSeq table the relationship between Known Gene
and RefSeq is not one-to-one. About 22747 records show one RefSeq with
multiple Known Genes. Could you tell me what causes this?

Thanks for your help. I look forward to your response.

Best Regards,
MengRu
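The one-to-many count mentioned in the question above can be reproduced with a short script. This is only a minimal sketch, assuming the table has been exported as rows of (Known Gene ID, RefSeq accession); that column order, the function names, and the identifiers in the example are all made up for illustration.

```python
from collections import defaultdict


def refseq_to_known_genes(rows):
    """Group Known Gene IDs by RefSeq accession.

    rows: iterable of (known_gene_id, refseq_id) pairs. The column order is
    an assumption about the exported table; swap the fields if yours differs.
    """
    mapping = defaultdict(set)
    for known_gene_id, refseq_id in rows:
        mapping[refseq_id].add(known_gene_id)
    return mapping


def shared_refseqs(mapping):
    """Return only the RefSeq accessions linked to more than one Known Gene."""
    return {refseq: genes for refseq, genes in mapping.items() if len(genes) > 1}


# Made-up identifiers, for illustration only.
example_rows = [
    ("kgA", "NM_000001"),
    ("kgB", "NM_000001"),
    ("kgC", "NM_000002"),
]
shared = shared_refseqs(refseq_to_known_genes(example_rows))
print(sorted(shared))
```

Running the same grouping on both kgXref and knownToRefSeq would show whether the extra records in one table come from accessions that map to several Known Genes.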

From fanhsu at soe.ucsc.edu Wed Nov 1 08:57:37 2006
From: fanhsu at soe.ucsc.edu (Fan Hsu)
Date: Wed, 1 Nov 2006 08:57:37 -0800
Subject: [Genome] questions for knowngene to refseq
In-Reply-To: <20061101184028.s89526cnk80w44oo@webmail.iis.sinica.edu.tw>
Message-ID:

Hi MengRu,

Although the majority of UCSC Known Genes (KG) are identical to RefSeq
genes, there are some significant differences:

1. Not every RefSeq is a KG. Some RefSeqs were filtered out because they
   did not pass our gene-check processing step (e.g. RefSeqs with no start
   or stop codons, or with bad reading frames, are filtered out).

2. If there is a UniProt protein which maps well to a GenBank mRNA, and it
   passes the gene-check filter and there is no equal or better
   corresponding RefSeq, the mRNA/UniProt pair will be added to the KG
   data set.

3. UCSC KG is updated once every few months. Our RefSeq track is updated
   nightly. So the refGene table may contain some recent RefSeq updates
   that came after the last KG build.

The most accurate cross-reference between KG and RefSeq can be found in
the kgXref table.

The knownToRefSeq table is constructed to support the UCSC Gene Sorter,
which is based on a canonical set of gene clusters. Each gene cluster may
consist of several overlapping KGs, with a single representative KG (which
could be a RefSeq or an mRNA). If a RefSeq overlaps a gene cluster region,
it is added to the knownToRefSeq table. So there will be situations in
which a RefSeq did not make it into UCSC KG and yet shows up in the
knownToRefSeq table.

The paper, The UCSC Known Genes. Bioinformatics 22(9), 1036-46 (2006),
Hsu, F., Kent, W.J., Clawson, H., Kuhn, R.M., Diekhans, M., and
Haussler, D., describes our KG I process. We have substantially revised
and improved our KG build process, and the new process is called KG II.
A description of the KG II process (attached below for your convenience)
can be found at:

http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1&hgg_type=knownGene

For KG II, there is no longer a strict one-to-one relationship between the
representative mRNA and protein. We noticed that this has created some
problems in certain situations, and we are considering going back to this
one-to-one relationship for our future KG III process.

If you have any further questions about KG, feel free to ask.

Fan.

BTW, I was born and grew up in Taiwan. It is nice to hear from someone
from my homeland. :-)

Methods

This release of UCSC Known Genes was built by a new process, KG II, as
described below.

UniProt protein sequences (including alternative splicing isoforms) and
mRNA sequences from RefSeq and GenBank were aligned against the base
genome using BLAT. RefSeq alignments having a base identity level within
0.1% of the best and at least 96% base identity with the genomic sequence
were kept. GenBank mRNA alignments having a base identity level within
0.2% of the best and at least 97% base identity with the genomic sequence
were kept. Protein alignments having a base identity level within 0.2% of
the best and at least 80% base identity with the genomic sequence were
kept. Then the genomic mRNA and protein alignments were compared, and
protein-mRNA pairings were determined from their overlaps.

mRNA CDS data were obtained from RefSeq and GenBank data and supplemented
by CDS structures derived from UCSC protein-mRNA BLAT alignments. The
initial set of UCSC Known Genes candidates consists of all protein-mRNA
pairs with valid mRNA CDS structures.

A gene-check program (similar to the one used for the Consensus CDS (CCDS)
project) is used to remove questionable candidates, such as those with
in-frame stop codons, missing start or stop codons, etc.
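
The identity cutoffs quoted in the Methods text above can be read as a two-part filter applied to each kind of alignment. The following is a minimal sketch, not the actual KG II code: it assumes that "within 0.1% of the best" means within 0.1 percentage points of the best alignment for the same query sequence (the text does not spell this out), and the function name, record layout, and example accessions are made up.

```python
# (max gap to the best alignment, minimum identity), both in percent,
# as quoted in the Methods text.
THRESHOLDS = {
    "refseq": (0.1, 96.0),
    "genbank_mrna": (0.2, 97.0),
    "protein": (0.2, 80.0),
}


def keep_alignments(alignments, kind):
    """Filter (query_id, percent_identity) tuples for one sequence type.

    An alignment is kept if it meets the minimum identity and lies within
    the allowed gap of the best alignment of the same query (assumption).
    """
    max_gap, min_identity = THRESHOLDS[kind]
    best = {}
    for query_id, identity in alignments:
        best[query_id] = max(identity, best.get(query_id, 0.0))
    return [
        (query_id, identity)
        for query_id, identity in alignments
        if identity >= min_identity and best[query_id] - identity <= max_gap
    ]


# Made-up alignments, for illustration only.
example = [
    ("NM_A", 99.8),   # best for NM_A: kept
    ("NM_A", 99.75),  # within 0.1 of the best: kept
    ("NM_A", 98.0),   # too far from the best: dropped
    ("NM_B", 95.0),   # best for NM_B, but below 96%: dropped
]
kept = keep_alignments(example, "refseq")
print(kept)
```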

From each group of gene candidates that share the same CDS structure, the
protein-mRNA pair having the best ranking and protein-mRNA alignment score
is selected as a UCSC Known Gene. The ranking of a gene candidate depends
on its gene-check quality measures. When all else is equal, a preference
is given to RefSeq mRNAs and next to MGC mRNAs. Similarly a preference is
given to gene candidates represented by Swiss-Prot proteins. The
protein-mRNA alignment score is calculated based on protein to mRNA
alignment using TBLASTN, plus weighted sub-scores according to the date
and length of the mRNA.

From acox at jax.org Wed Nov 1 08:59:14 2006
From: acox at jax.org (Allison Cox)
Date: Wed, 1 Nov 2006 11:59:14 -0500
Subject: [Genome] genome browser question
Message-ID: <20061101115914436.00000003716@pouka>

Hello,

I'm working on a custom annotation for the genome browser.

I have a question about my track line. I don't want a description above
the line; I just want a name along the side. How do I do this? If I don't
enter a description (e.g. description="human QTLs"), the description value
defaults to my track name, and then the track name is placed both at the
side (where I want it) and above the line (where I don't want it).

Thanks,
Allison Cox

Research Assistant
Beverly Paigen Lab
The Jackson Laboratory
600 Main Street
Bar Harbor, ME 04609
207-288-6000 x1742

From archanat at soe.ucsc.edu Wed Nov 1 10:14:13 2006
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Wed, 01 Nov 2006 10:14:13 -0800
Subject: [Genome] genome browser question
In-Reply-To: <20061101115914436.00000003716@pouka>
References: <20061101115914436.00000003716@pouka>
Message-ID: <4548E3F5.3030701@soe.ucsc.edu>

Hello Allison,

Unfortunately there is no way to remove the 'description' attribute from
the track line.
If you leave the 'description' attribute off your track line, the browser
displays either the track name, if there is one, or the default value,
'User Supplied Tracks'.

However, you could turn off the description of your custom track on the
browser by using the 'Configure' page. To do this, choose "configure" from
the browser display page, and then uncheck 'Display description above each
track'. This turns off all the track description lines on the browser.

I hope that this answers your question. Please let us know if you have
further questions.

Regards,

Archana
UCSC Genome Bioinformatics Group

From hiram at soe.ucsc.edu Wed Nov 1 10:22:06 2006
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Wed, 01 Nov 2006 10:22:06 -0800
Subject: [Genome] genome browser question
In-Reply-To: <4548E3F5.3030701@soe.ucsc.edu>
References: <20061101115914436.00000003716@pouka> <4548E3F5.3030701@soe.ucsc.edu>
Message-ID: <4548E5CE.4050009@soe.ucsc.edu>

Allison:

Also, you could set description=" " on your track definition line.
There would be a blank line above your track, but no description.
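
The suggestion above amounts to a track definition line whose description is a single space. A minimal sketch of generating such a custom track file follows; the track name "humanQTLs", the function name, and the BED record are made-up illustration values, not taken from Allison's data.

```python
def custom_track_header(name, description=" "):
    """Build a custom track definition line.

    The description defaults to a single space, as suggested above, so the
    label above the track is blank while the name is still set.
    """
    return f'track name="{name}" description="{description}"'


# "humanQTLs" and the BED record below are made-up illustration values.
lines = [
    custom_track_header("humanQTLs"),
    "chr1\t1000\t2000\tqtl1",
]
print("\n".join(lines))
```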

--Hiram

From adnan_derti at hms.harvard.edu Wed Nov 1 10:45:26 2006
From: adnan_derti at hms.harvard.edu (Adnan Derti)
Date: Wed, 01 Nov 2006 13:45:26 -0500
Subject: [Genome] finding breaks in human-other synteny
Message-ID: <4548EB46.1050302@hms.harvard.edu>

Hello.

I'd like to infer breakpoints in synteny between two genomes. Could you
recommend a simple method or a set of rules of thumb based on your
alignments of orthologous regions? The results don't need to be precise.

As always, thank you for the data and answers you provide.

aDNAn Derti

From bina at purdue.edu Wed Nov 1 11:30:07 2006
From: bina at purdue.edu (bina@purdue.edu)
Date: Wed, 1 Nov 2006 14:30:07 -0500
Subject: [Genome] accessing the encode region for the HOX locus
Message-ID: <1162409407.4548f5bf4433f@webmail.purdue.edu>

We are having trouble accessing the ENCODE region for ENm010

We are getting the following error message

bedGraphLoadItems: table encodeRegulomeProbCACO2 only has 4 data columns,
specified graph column 5 does not exist

Minou Bina

From kate at soe.ucsc.edu Wed Nov 1 11:32:35 2006
From: kate at soe.ucsc.edu (Kate Rosenbloom)
Date: Wed, 1 Nov 2006 11:32:35 -0800
Subject: [Genome] accessing the encode region for the HOX locus
In-Reply-To: <1162409407.4548f5bf4433f@webmail.purdue.edu>
References: <1162409407.4548f5bf4433f@webmail.purdue.edu>
Message-ID: <3b69b66b6644e75b557c1a850813a651@soe.ucsc.edu>

Hello Minou,

We have a configuration problem on our public server
for this track -- we are in the process of repairing
it and will notify you when finished.

Cheers,
Kate
---
Kate Rosenbloom
UCSC Genome Bioinformatics

From kuhn at soe.ucsc.edu Wed Nov 1 11:58:14 2006
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Wed, 1 Nov 2006 11:58:14 -0800
Subject: [Genome] genome browser question
Message-ID: <200611011958.LAA14789@sundance.cse.ucsc.edu>

Allison,

You can get the description to disappear from above your custom track
by defining it as a space:

description=" "

It will still occupy the space normally taken up by the description text.
The text will be visible on the left side if you set your track
visibility to 2 (dense).

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

From ann at soe.ucsc.edu Wed Nov 1 12:02:42 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Wed, 01 Nov 2006 12:02:42 -0800
Subject: [Genome] finding breaks in human-other synteny
In-Reply-To: <4548EB46.1050302@hms.harvard.edu>
References: <4548EB46.1050302@hms.harvard.edu>
Message-ID: <4548FD62.5020205@soe.ucsc.edu>

Hello aDNAn,

We have a couple of tracks on most of our browsers that will give you the
information you need. If you are interested in getting a list of
positional information instead of simply visualizing the information in
the browser, let me know and I will give you instructions for that as
well.

To visualize the syntenic breakpoints in the browser, follow these steps.
I will explain how to do this for the latest human assembly (hg18) and the
latest cow assembly (bosTau2), but you can of course use any pair of
organisms.

1. Open the hg18 browser.
2. Hide all of the tracks except the Conservation track (set visibility
   to full), the Cow Chain (set visibility to dense), and the Cow Net (set
   visibility to full).
3. From the configuration page for the Conservation track (press the
   'conservation' link from the track controls), un-check all organisms
   except the cow.

The Net track will give you the information you want. This track shows the
best cow/human chain for every part of the human genome. It is useful for
finding orthologous regions and for studying genome rearrangement.
You can read more about how to interpret the Net track on the details page
(click on the "Cow Net" link from the track controls).

The best chains are displayed on "Level 1" in the Net track. The gaps from
Level 1 are filled in with other chains that are displayed on "Level 2"
(and so on). If the chain on a lower level is on the same chromosome as
the chain on the level above it, it is syntenic. Other possibilities are
inversions, or non-syntenic chains.

A particularly good example is the following region in human:
chrX:152,048,252-152,052,811. If you view the Cow Net track for this
region, you can see that there are three distinct orthologous regions in
the bosTau2 genome:

scaffold9234
scaffold14102
scaffold10843

This should get you started. As I said, if you would like to generate a
list of syntenic regions using the Table Browser, feel free to write back
to this list.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

From zhengdong.zhang at yale.edu Wed Nov 1 12:26:58 2006
From: zhengdong.zhang at yale.edu (zhengdong.zhang@yale.edu)
Date: Wed, 01 Nov 2006 15:26:58 -0500
Subject: [Genome] Two ENCODE tracks to be added to the Browser
Message-ID: <20061101152658.qdw27pmxc8gwgcwk@www.mail.yale.edu>

Hello,

I have two ENCODE tracks, 'RFBR clusters' and 'RFBR deserts', to submit to
the UCSC genome browser. They can be included in the 'ENCODE Chromatin
Immunoprecipitation' section.

I uploaded a zip file, encode-tf-clusters-deserts.zip, to
genome-encodedev.cse.ucsc.edu.
The zip file contains the two corresponding BED files and a TXT
description file:

encode-tf-clusters.bed
encode-tf-deserts.bed
encode-tf-clusters-deserts-description.txt

The coordinates in both BED files are based on the NCBI Build 35 assembly
(May 2004, hg17). There is also a 'link-out' web address for each element
specified in the BED files.

Thank you.

Zhengdong Zhang
Gerstein Group
Yale University

From ann at soe.ucsc.edu Wed Nov 1 14:57:01 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Wed, 01 Nov 2006 14:57:01 -0800
Subject: [Genome] accessing the encode region for the HOX locus
In-Reply-To: <3b69b66b6644e75b557c1a850813a651@soe.ucsc.edu>
References: <1162409407.4548f5bf4433f@webmail.purdue.edu> <3b69b66b6644e75b557c1a850813a651@soe.ucsc.edu>
Message-ID: <4549263D.6030404@soe.ucsc.edu>

Hello again Minou,

We have fixed this problem. Sorry for the inconvenience.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

From kate at soe.ucsc.edu Wed Nov 1 15:15:53 2006
From: kate at soe.ucsc.edu (Kate Rosenbloom)
Date: Wed, 1 Nov 2006 15:15:53 -0800
Subject: [Genome] Two ENCODE tracks to be added to the Browser
In-Reply-To: <20061101152658.qdw27pmxc8gwgcwk@www.mail.yale.edu>
References: <20061101152658.qdw27pmxc8gwgcwk@www.mail.yale.edu>
Message-ID:

Hi Zhengdong,

Thanks for your ENCODE data submission! I am cc'ing our engineer who will
be loading your new ChIP/chip data into the browser. You can follow the
progress of your data through our track development and quality assurance
process via the ENCODE data status page:

http://genome.ucsc.edu/ENCODE/trackStatus.html

Cheers,
Kate

From hwei at wicell.org Wed Nov 1 14:57:32 2006
From: hwei at wicell.org (Hairong Wei)
Date: Wed, 01 Nov 2006 17:57:32 -0500
Subject: [Genome] What is difference?
Message-ID:

To Whom it may concern:

I downloaded the human chr1 assembly sequence from your ftp site
( ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes ) and from
NCBI's site, ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01, and then
looked at the file sizes; they are different.

wireless-162:~/wicell/human_genome_assembly hwei$ wc chr1.fa
 4944996 4944996 252194720 chr1.fa
wireless-162:~/wicell/human_genome_assembly hwei$ wc hs_ref_chr1.fa
 3532140 3532148 250781962 hs_ref_chr1.fa
wireless-162:~/wicell/human_genome_assembly hwei$

Why are they different? Are the assembled chromosome sequences provided at
your website different from NCBI's?

Hairong Wei
Wicell Research Institute
608-890-1533

From hiram at soe.ucsc.edu Wed Nov 1 16:42:20 2006
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Wed, 01 Nov 2006 16:42:20 -0800
Subject: [Genome] What is difference?
In-Reply-To:
References:
Message-ID: <45493EEC.5050008@soe.ucsc.edu>

Good Afternoon Hairong:

Please note, you were looking at the NCBI fasta file that contains the
contigs for chr1. To see the assembled chr1, use this file:

ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/hs_ref_chr1.fa.gz

It is identical to the UCSC chr1 sequence. The UCSC sequence includes
masking from RepeatMasker and simple repeats:

$ faSize chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 115286666 upper 109713053 lower)
$ faSize hs_ref_chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 224999719 upper 0 lower)

The line length in the files is different.
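
Because line length and masking case differ between the two files, comparing file sizes or raw bytes says little. A minimal sketch of a comparison on the sequence itself follows, assuming each file holds a single FASTA record and fits in memory; the function names are made up here, and this is not how faSize works.

```python
import gzip


def sequence_from_lines(lines):
    """Join the sequence lines of one FASTA record, skipping '>' headers."""
    return "".join(
        line.strip() for line in lines if line.strip() and not line.startswith(">")
    )


def read_fasta_sequence(path):
    """Read a plain or gzip-compressed FASTA file into one sequence string."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sequence_from_lines(handle)


def same_sequence(seq_a, seq_b):
    """Compare two sequences, ignoring soft-masking (lower case)."""
    return seq_a.upper() == seq_b.upper()


# Made-up records: same bases, different wrapping and masking.
masked = sequence_from_lines([">chr1\n", "ACgtn\n", "NAC\n"])
unmasked = sequence_from_lines([">ref|chr1\n", "ACGT\n", "NNAC\n"])
print(same_sequence(masked, unmasked))
```

With real files this would be called as same_sequence(read_fasta_sequence("chr1.fa.gz"), read_fasta_sequence("hs_ref_chr1.fa.gz")), using the file names mentioned in this thread.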
You need to compare the actual sequence, not the text fasta files.

--Hiram

From agherma1 at jhem.jhmi.edu Thu Nov 2 07:15:23 2006
From: agherma1 at jhem.jhmi.edu (Adi Gherman)
Date: Thu, 2 Nov 2006 10:15:23 -0500
Subject: [Genome] cDNA Microarrays data
Message-ID: <000801c6fe91$b7c0b290$2069100a@dalab02>

Hi!

I was wondering if it's possible to get the actual values for the Normal
Human Tissue cDNA Microarrays data, instead of the ratio colors (in the
gene description and page index). I have tried finding them myself, but
it's easy to get lost among so many tables :(

Thank you!

Sincerely,
Adrian Gherman

From inus at nbn.ac.za Thu Nov 2 06:31:08 2006
From: inus at nbn.ac.za (Inus Scheepers)
Date: Thu, 02 Nov 2006 16:31:08 +0200
Subject: [Genome] UCSC Genome browser data on portable disk?
Message-ID: <454A012C.7080208@nbn.ac.za>

Dear Sir/Madam

We are interested in mirroring the UCSC browser, but the size of the data
download is an obstacle. If we send a portable disk, and someone over
there can copy the data onto it and ship it back to us here in South
Africa, it would be feasible.

Who would be the best person to speak to to arrange something like this?

Regards,
--
Inus Scheepers : National Bioinformatics Network : P/Bag X17 :
UWC Campus: Modderdam Rd, Belville : 7535 : System Admin :
Tel +27 21 959 2991 : Fax 959 3573 : Cell +27 82 658 1961 :
IM inus.sc at gmail.com

From lironle4 at post.tau.ac.il Thu Nov 2 05:11:06 2006
From: lironle4 at post.tau.ac.il (Liron Levkovitz)
Date: Thu, 2 Nov 2006 15:11:06 +0200
Subject: [Genome] Human Promoters
Message-ID: <20061102131109.88431BCC195@post.tau.ac.il>

Hello,

I've downloaded the file upstream1000.zip for human (hg18) and noticed
that some promoters in the file have different RefSeq IDs but the same
locations, and therefore the same sequences (see example below).
What is the meaning of the different IDs?

Thanks

Liron Levkovitz
M.Sc Student
Tel-Aviv University

>NM_058167_up_1000_chr1_1199098_r
ttcccacgggctgaggctgcgtggacactcctcctggcttctgccccacggcctgcagcaacgtctcagacccacc
aaggtgtcccagagggccagagtcagggctgagcatggacacagcctgccctctgcccgtagccacagcgggactg
accctcaatattcagccaccttgctgacctgtggttgttcctgagaggcagtcagggttggggcaaaggagagggc
cacaggcagccaggtggcgaggtccacctcctgatcccccaccaggggaccaccctcagctccccgctgtcctcag
cggcctctcagggaggcggtgggctgcgcgcctgcccccagcgtcccaccccagcgccccgccacctaccgctctc
tcccaatgtgtctggggtcagagagaggtggagaggacgcagcgccccgcgctttctgcagtctgtccaaggggag
gtcaggggcatcccggggctgggcaggcacctgcaggggagggtgctgcggcgcggacgtgggttgcaaagagatg
ccacgcgtgtcactctcgggtccctgaacgagagagctctgagctccctggagcccaaccctccactgagcctcca
ggggcccgggacgctccatgcccgggccgcgccgcgcctcccctaagggccgggtggggcgtgggagcggaggaca
aggcgggagccgggaggcgggagggtcggggttctttctccaccggggtcgccctcagccgggcctgccgccctgg
gccgtggcacatggggagggaagcgcgcggcttcggggtctggggctctcgcgcccactaacggtgcccggagccg
cccgccagtgcgcaagcgccgccccgccccccgccccatcccccaccccggggtcgacggcgacagagagtcgtgg
gcgcggtccgccagtctgcctagagcggccagccctccccctccctcctcaccgccccggaccgcgcaccggaagc
agaagctaggct
>NM_194315_up_1000_chr1_1199098_r
ttcccacgggctgaggctgcgtggacactcctcctggcttctgccccacggcctgcagcaacgtctcagacccacc
aaggtgtcccagagggccagagtcagggctgagcatggacacagcctgccctctgcccgtagccacagcgggactg
accctcaatattcagccaccttgctgacctgtggttgttcctgagaggcagtcagggttggggcaaaggagagggc
cacaggcagccaggtggcgaggtccacctcctgatcccccaccaggggaccaccctcagctccccgctgtcctcag
cggcctctcagggaggcggtgggctgcgcgcctgcccccagcgtcccaccccagcgccccgccacctaccgctctc
tcccaatgtgtctggggtcagagagaggtggagaggacgcagcgccccgcgctttctgcagtctgtccaaggggag
gtcaggggcatcccggggctgggcaggcacctgcaggggagggtgctgcggcgcggacgtgggttgcaaagagatg
ccacgcgtgtcactctcgggtccctgaacgagagagctctgagctccctggagcccaaccctccactgagcctcca
ggggcccgggacgctccatgcccgggccgcgccgcgcctcccctaagggccgggtggggcgtgggagcggaggaca
aggcgggagccgggaggcgggagggtcggggttctttctccaccggggtcgccctcagccgggcctgccgccctgg
gccgtggcacatggggagggaagcgcgcggcttcggggtctggggctctcgcgcccactaacggtgcccggagccg
cccgccagtgcgcaagcgccgccccgccccccgccccatcccccaccccggggtcgacggcgacagagagtcgtgg
gcgcggtccgccagtctgcctagagcggccagccctccccctccctcctcaccgccccggaccgcgcaccggaagc
agaagctaggct
>NM_194457_up_1000_chr1_1199098_r
ttcccacgggctgaggctgcgtggacactcctcctggcttctgccccacggcctgcagcaacgtctcagacccacc
aaggtgtcccagagggccagagtcagggctgagcatggacacagcctgccctctgcccgtagccacagcgggactg
accctcaatattcagccaccttgctgacctgtggttgttcctgagaggcagtcagggttggggcaaaggagagggc
cacaggcagccaggtggcgaggtccacctcctgatcccccaccaggggaccaccctcagctccccgctgtcctcag
cggcctctcagggaggcggtgggctgcgcgcctgcccccagcgtcccaccccagcgccccgccacctaccgctctc
tcccaatgtgtctggggtcagagagaggtggagaggacgcagcgccccgcgctttctgcagtctgtccaaggggag
gtcaggggcatcccggggctgggcaggcacctgcaggggagggtgctgcggcgcggacgtgggttgcaaagagatg
ccacgcgtgtcactctcgggtccctgaacgagagagctctgagctccctggagcccaaccctccactgagcctcca
ggggcccgggacgctccatgcccgggccgcgccgcgcctcccctaagggccgggtggggcgtgggagcggaggaca
aggcgggagccgggaggcgggagggtcggggttctttctccaccggggtcgccctcagccgggcctgccgccctgg
gccgtggcacatggggagggaagcgcgcggcttcggggtctggggctctcgcgcccactaacggtgcccggagccg
cccgccagtgcgcaagcgccgccccgccccccgccccatcccccaccccggggtcgacggcgacagagagtcgtgg
gcgcggtccgccagtctgcctagagcggccagccctccccctccctcctcaccgccccggaccgcgcaccggaagc
agaagctaggct
>NM_194458_up_1000_chr1_1199098_r ttcccacgggctgaggctgcgtggacactcctcctggcttctgccccacggcctgcagcaacgtctcagacccacc aaggtgtcccagagggccagagtcagggctgagcatggacacagcctgccctctgcccgtagccacagcgggactg accctcaatattcagccaccttgctgacctgtggttgttcctgagaggcagtcagggttggggcaaaggagagggc cacaggcagccaggtggcgaggtccacctcctgatcccccaccaggggaccaccctcagctccccgctgtcctcag cggcctctcagggaggcggtgggctgcgcgcctgcccccagcgtcccaccccagcgccccgccacctaccgctctc tcccaatgtgtctggggtcagagagaggtggagaggacgcagcgccccgcgctttctgcagtctgtccaaggggag gtcaggggcatcccggggctgggcaggcacctgcaggggagggtgctgcggcgcggacgtgggttgcaaagagatg ccacgcgtgtcactctcgggtccctgaacgagagagctctgagctccctggagcccaaccctccactgagcctcca ggggcccgggacgctccatgcccgggccgcgccgcgcctcccctaagggccgggtggggcgtgggagcggaggaca aggcgggagccgggaggcgggagggtcggggttctttctccaccggggtcgccctcagccgggcctgccgccctgg gccgtggcacatggggagggaagcgcgcggcttcggggtctggggctctcgcgcccactaacggtgcccggagccg cccgccagtgcgcaagcgccgccccgccccccgccccatcccccaccccggggtcgacggcgacagagagtcgtgg gcgcggtccgccagtctgcctagagcggccagccctccccctccctcctcaccgccccggaccgcgcaccggaagc agaagctaggct From tjkwon at cmmt.ubc.ca Wed Nov 1 20:52:04 2006 From: tjkwon at cmmt.ubc.ca (Andrew Kwon) Date: Wed, 1 Nov 2006 20:52:04 -0800 Subject: [Genome] C. elegans refGene and sangerToRefSeq tables Message-ID: <000901c6fe3a$a466b2f0$79325289@nt.cmmt.ubc.ca> To whom it may concern: I downloaded the C. elegans refGene, sangerGene and sangerToRefSeq tables from UCSC ftp site. While going through the records, I ran into trouble with some of the records. More specifically, when you view the C. elegans genome browser, ZK686.2 is given the refseq id NM_066289, and ZK686.5 is given NM_001027859. However, from the tables I downloaded, ZK686.5 is associated with NM_066289 in sangerToRefSeq table. NM_001027859 is present in refGene table, but not in sangerToRefSeq table. Is this a faulty annotation, or am I making a mistake somewhere? Andrew T. 
Kwon Wasserman Lab CMMT UBC From tjkwon at cmmt.ubc.ca Wed Nov 1 20:58:24 2006 From: tjkwon at cmmt.ubc.ca (Andrew Kwon) Date: Wed, 1 Nov 2006 20:58:24 -0800 Subject: [Genome] C. elegans refGene table and sangerGene table Message-ID: <000a01c6fe3b$86f952d0$79325289@nt.cmmt.ubc.ca> From the C. elegans refGene table, searching for rows with name = 'NM_061506' returns 5 rows, 2 of which are from chrII, 2 from chrIV, and one from chrV. Searching for 'NM_061506' in the sangerToRefSeq table returns 4 rows, including K02E7.3, K02B7.1, W03G1.4, R09E12.6. I don't understand why this is happening. How can the same refseq id refer to genes from different chromosomes? Andrew T. Kwon Wasserman Lab CMMT UBC From Mike.Mitchell at cancer.org.uk Thu Nov 2 08:17:02 2006 From: Mike.Mitchell at cancer.org.uk (Mike Mitchell) Date: Thu, 02 Nov 2006 16:17:02 +0000 Subject: [Genome] Human Promoters In-Reply-To: <20061102131109.88431BCC195@post.tau.ac.il> Message-ID: This is an example of known gene isoforms (splice variants). The gene UBE2J2 has 4 known isoforms, each of which has its own RefSeq identifier, and in this case they all share the same first exon. If you have a look at hg18 at chr1:1,179,157-1,199,097 you will see the case that you highlighted. -- Mike Mitchell Bioinformatics & Biostatistics Service Cancer Research UK +44 (0) 207 269 3115 From hiram at soe.ucsc.edu Thu Nov 2 09:05:51 2006 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 02 Nov 2006 09:05:51 -0800 Subject: [Genome] UCSC Genome browser data on portable disk? In-Reply-To: <454A012C.7080208@nbn.ac.za> References: <454A012C.7080208@nbn.ac.za> Message-ID: <454A256F.9070003@soe.ucsc.edu> Good Morning Inus: Please note, if you do not need all of the assemblies we host, it is possible to create a browser with a subset of data. The minimum set of tables needed for a browser is listed at: http://genomewiki.cse.ucsc.edu/index.php/Browser_installation Your data transfer problem wouldn't be over after an initial load.
You would need to keep your mirror data updated, which requires continuing bandwidth. I'm curious, what is the difficulty with the data transfer at your location? Is it because of a limited bandwidth connection, or is there a fee charged for the amount of data transferred? --Hiram Inus Scheepers wrote: > Dear Sir/Madam > > We are interested in mirroring the UCSC browser, but the size of the > data download is an obstacle. > > We can send a portable disk and if someone over there can copy the data > onto the disk and ship it back to > us here in South Africa, it would be feasible. > > Who would be the best person to speak to to arrange something like this? > > > Regards,-- > > Inus Scheepers : National Bioinformatics Network : P/Bag X17 : UWC Campus: Modderdam Rd, Belville : 7535 : > System Admin : Tel +27 21 959 2991 : Fax 959 3573 : Cell +27 82 658 1961 : IM inus.sc at gmail.com : : From archanat at soe.ucsc.edu Thu Nov 2 11:43:14 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Thu, 02 Nov 2006 11:43:14 -0800 Subject: [Genome] cDNA Microarrays data In-Reply-To: <000801c6fe91$b7c0b290$2069100a@dalab02> References: <000801c6fe91$b7c0b290$2069100a@dalab02> Message-ID: <454A4A52.3000903@soe.ucsc.edu> Hello Adrian, The expression values for the Normal Human Tissue cDNA Microarray data are stored in the 'hgFixed' database in the tables humanNormal, humanNormalMedian, humanNormalRatio, and humanNormalMedianRatio. You can access them via the Table Browser by setting your group to "All Tables" and database to "hgFixed", and then looking in the table list. Here is a summary of the contents of each table, so you can choose the one you want: humanNormal - absolute expression values for all experiments. humanNormalMedian - median values for replicate experiments. humanNormalRatio - log2 ratio of the expression value for a probeset divided by the median expression value for all experiments for that probeset.
humanNormalMedianRatio - median values for replicate experiments for the data in humanNormalRatio. humanNormal is the original expression data. The field expScores is the comma-separated list of expression values. You might also want to look at the Exps tables to find out the tissue that corresponds to each expression value. The tables are humanNormalExps and humanNormalMedianExps. humanNormalExps is for the humanNormal and humanNormalRatio tables. humanNormalMedianExps is for the humanNormalMedian and humanNormalMedianRatio tables. The Exps tables list tissues with IDs running from 0 to n-1, where n is the number of experiments. Each ID is an index into the expScores list, giving the position of the data value for that tissue. I hope this helps to answer your question. If you require more assistance, please don't hesitate to contact us again. Regards, Archana UCSC Genome Bioinformatics Group Adi Gherman wrote: > Hi! > I was wondering if it's possible to get actual values for the Normal Human Tissue cDNA Microarrays data, instead of the ratio colors (in the gene description and page index). I have tried finding that myself, but it's easy to get lost among so many tables:( > > Thank you! > > Sincerely, > Adrian Gherman > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From hiram at soe.ucsc.edu Thu Nov 2 12:08:08 2006 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 02 Nov 2006 12:08:08 -0800 Subject: [Genome] C. elegans refGene table and sangerGene table In-Reply-To: <000a01c6fe3b$86f952d0$79325289@nt.cmmt.ubc.ca> References: <000a01c6fe3b$86f952d0$79325289@nt.cmmt.ubc.ca> Message-ID: <454A5028.6070202@soe.ucsc.edu> Good Morning Andrew: The Genbank refSeq alignments to the C. elegans genome are simply alignments of the sequence to the genome.
If you take the sequence for NM_061506 you will find that blat matches it almost 100% exactly to locations on chrII, chrIV and chrV. Our daily Genbank alignments are simply alignments via blat to the genome. If they pass our criteria for alignment, they are marked on the genome. As for your query about the discrepancy between the Genbank refGene alignment tables and the sangerToRefSeq tables, this is a problem of coordination between updates in Genbank and our original build of the C. elegans browser. Our sangerToRefSeq table was created 07 June 2004 with the refSeq data available at that time. The newer refGene tables are rebuilt daily with current refSeq data. Over time the contents of these two tables will diverge as Genbank incorporates new data. --Hiram Andrew Kwon wrote: > From the C. elegans refGene table, searching for rows with name = > 'NM_061506' returns 5 rows, 2 of which are from chrII, 2 from chrIV, and one > from chrV. Searching for 'NM_061506' in sangerToRefSeq table returns 4 > rows, including K02E7.3, K02B7.1, W03G1.4, R09E12.6. > > I don't understand why this is happening. How can the same refseq id refer > to genes from different chromosomes? > > Andrew T. Kwon > Wasserman Lab > CMMT UBC > I downloaded the C. elegans refGene, sangerGene and sangerToRefSeq tables > from UCSC ftp site. While going through the records, I ran into trouble with > some of the records. More specifically, when you view the C. elegans genome > browser, ZK686.2 is given the refseq id NM_066289, and ZK686.5 is given > NM_001027859. However, from the tables I downloaded, ZK686.5 is associated > with NM_066289 in sangerToRefSeq table. NM_001027859 is present in refGene > table, but not in sangerToRefSeq table. Is this a faulty annotation, or am > I making a mistake somewhere? From fungazid at yahoo.com Thu Nov 2 14:14:16 2006 From: fungazid at yahoo.com (fungazid fungazid) Date: Thu, 2 Nov 2006 14:14:16 -0800 (PST) Subject: [Genome] How do I reply to messages in this mailing list ?
Message-ID: <20061102221416.89875.qmail@web58502.mail.re3.yahoo.com> Hello Sorry about this newbie question. I created a new mailing list message by sending it to genome at soe.ucsc.edu, but how do I reply to messages sent by others? Avi From rhead at soe.ucsc.edu Thu Nov 2 15:57:18 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 02 Nov 2006 15:57:18 -0800 Subject: [Genome] How do I reply to messages in this mailing list ? In-Reply-To: <20061102221416.89875.qmail@web58502.mail.re3.yahoo.com> References: <20061102221416.89875.qmail@web58502.mail.re3.yahoo.com> Message-ID: <454A85DE.8090107@soe.ucsc.edu> Hello Avi, The mailing list is moderated by Genome Browser staff, meaning that all messages sent to genome at soe.ucsc.edu are approved by us before being distributed to all list subscribers. This keeps spam messages from being distributed to everyone. If you wish to reply to a message posted to this list, simply reply to the original sender and "cc" genome at soe.ucsc.edu. Your reply will go directly to the user who initiated the question, and then to all mailing list subscribers after we see your message in our database and approve it for distribution. -- Brooke Rhead UCSC Genome Bioinformatics Group fungazid fungazid wrote: > Hello > > Sorry about this newbie question. I created a new mailing list message by sending it to genome at soe.ucsc.edu, but how do I reply to messages sent by others?
> > Avi > > > --------------------------------- > Get your email and see which of your friends are online - Right on the new Yahoo.com > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From fungazid at yahoo.com Fri Nov 3 04:57:33 2006 From: fungazid at yahoo.com (fungazid fungazid) Date: Fri, 3 Nov 2006 04:57:33 -0800 (PST) Subject: [Genome] How do I reply to messages in this mailing list ? In-Reply-To: <454A85DE.8090107@soe.ucsc.edu> Message-ID: <20061103125733.87477.qmail@web58508.mail.re3.yahoo.com> Hi Brooke, Thanks :) Avi Brooke Rhead wrote: Hello Avi, The mailing list is moderated by Genome Browser staff, meaning that all messages sent to genome at soe.ucsc.edu are approved by us before being distributed to all list subscribers. This keeps spam messages from being distributed to everyone. If you wish to reply to a message posted to this list, simply reply to the original sender and "cc" genome at soe.ucsc.edu. Your reply will go directly to the user who initiated the question, and then to all mailing list subscribers after we see your message in our database and approve it for distribution. -- Brooke Rhead UCSC Genome Bioinformatics Group fungazid fungazid wrote: > Hello > > Sorry about this newbie question. I created a new mailing list message by sending it to genome at soe.ucsc.edu, but how do I reply to a messages sent by others ? > > Avi > > > --------------------------------- > Get your email and see which of your friends are online - Right on the new Yahoo.com > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome --------------------------------- Want to start your own business? Learn how on Yahoo! Small Business. 
From fungazid at yahoo.com Fri Nov 3 05:12:33 2006 From: fungazid at yahoo.com (fungazid fungazid) Date: Fri, 3 Nov 2006 05:12:33 -0800 (PST) Subject: [Genome] problems in loading repeat masker to mysql In-Reply-To: <4547D233.9010304@soe.ucsc.edu> Message-ID: <20061103131233.6452.qmail@web58509.mail.re3.yahoo.com> Archana Hi, The rmsk tables from http://genome-test.cse.ucsc.edu/ were loaded successfully into my mysql, as suggested by your engineers. So now everything is OK. Thank you :), Avi Archana Thakkapallayil wrote: Hello Avi, Here is a response from one of our engineers: The .out files are in a slightly different format from our database tables and dump .txt files. If you are going to work directly with mysql, you should use the .sql/.txt files. We provide .out for those who are used to working directly with RepeatMasker output files. Unfortunately, there is no database dump for the rabbit database. The .sql file for rmsk can be found in the Annotation database dump for a different assembly e.g., http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ You could download the chr1_rmsk.sql from here and then rename this for use with the rabbit genome. You can then obtain the data directly from the 'Table Browser' on our test server at: http://genome-test.cse.ucsc.edu/ The table that contains this information is called 'rmsk', which comes under group : Variation and Repeats, and track: Repeatmasker. Information on using the Table Browser is here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html Another suggestion is that you need to add LOCAL to your MySQL command: "LOAD DATA LOCAL INFILE" I hope this information helps. Let us know if you run into any problems. Regards, Archana UCSC Genome Bioinformatics Group fungazid fungazid wrote: > Hello world, > > Please help me, I have a problem in loading rmsk.out file for oryCun1 and loxAfr1 builds into mysql. I tried to load it with LOAD DATA INFILE command with no success. 
It seems that LOAD DATA INFILE can't handle the spacing between fields. > I got the files from FTP hgdownload.cse.ucsc.edu > > many thanks > Avi From jesus.lopez at bsc.es Fri Nov 3 07:28:53 2006 From: jesus.lopez at bsc.es (Jesus Maria Lopez) Date: Fri, 3 Nov 2006 16:28:53 +0100 Subject: [Genome] Human Genome Build Message-ID: <200611031628.53056.jesus.lopez@bsc.es> Hi, I have been looking at the human UCSC genome documentation and I have seen that the genome assembly is Build 36.1 (finished human genome assembly (hg18, Mar. 2006)). I thought that the last genome build was published in October 2005. Are there differences between your build and the last build published in Nature, or is the date only a reference for UCSC? Thank you in advance... Jesus. From rd67 at leicester.ac.uk Fri Nov 3 02:01:18 2006 From: rd67 at leicester.ac.uk (Dixon, Dr R.) Date: Fri, 3 Nov 2006 10:01:18 -0000 Subject: [Genome] Rat Ensembl genes and QTLs Message-ID: Dear Genome team, I have a list of Ensembl Gene ids (hundreds) (ENSG....) - I would like to obtain the QTLs that overlap with these genes... How do I do this with the UCSC browser table structure? Thank you, Rick From ann at soe.ucsc.edu Fri Nov 3 08:29:21 2006 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 03 Nov 2006 08:29:21 -0800 Subject: [Genome] Human Genome Build In-Reply-To: <200611031628.53056.jesus.lopez@bsc.es> References: <200611031628.53056.jesus.lopez@bsc.es> Message-ID: <454B6E61.1060602@soe.ucsc.edu> Hello Jesus, The most recent human assembly hosted on the UCSC genome browser (we call it hg18, Mar. 2006) is based on NCBI Build 36.1 (released Mar. 2006).
You can read more about the NCBI release on their website: http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html#b36 If you send the reference for the Nature paper to which you are referring, I can help you determine which assembly it is based on. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Jesus Maria Lopez wrote: > Hi, > I have been looking at human UCSC genome documentation and I have seen that > the genome assembly is Build 36.1 ( finished human genome assembly (hg18, > Mar. 2006)). I thought that the last genome build was published in october > 2005. Are there some differences between your build and the last build > published in Nature or date is only a reference for UCSC? > > > Thank you in advance... > > > Jesus. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Fri Nov 3 10:17:58 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Fri, 03 Nov 2006 10:17:58 -0800 Subject: [Genome] Rat Ensembl genes and QTLs In-Reply-To: References: Message-ID: <454B87D6.5030608@soe.ucsc.edu> Hello Rick, This task can be accomplished with the Table Browser intersection tool. Click the "Tables" link at the top of the page and follow these steps: 1. Select the genome and assembly of interest and then choose 'group: Genes and Gene Prediction Tracks', 'track: Ensembl Genes', and 'table: ensGene'. 2. Select "genome" as the region. 3. Click on the "paste list" or "upload list" button to add the list of Ensembl gene ids. You need to have a list of your Ensembl gene ids in a consistent format that the Table Browser can use (one ID per line). 4. Select custom track as the output and press the "get output" button. Then hit "get custom track in table browser" button. This creates a track in the Table Browser that represents the Ensembl genes that you have uploaded. 5. 
Then you can create an intersection of these genes with the RGD QTL track. To do this, make the following selections: group: Mapping and Sequencing Tracks; track: RGD QTL; table: rgdQtl; region: genome. Press the "intersection: create" button and, on the intersection page, choose 'Custom Tracks' as the group name, and then your custom track and table name. Then select the radio button 'All RGD QTL records that have any overlap with tb_ensGene' and hit "submit". 6. Output the intersection as BED or custom track. This gives you the list of all QTLs that overlap with your Ensembl gene list. I hope that this helps you. Please let us know if you have further questions. Regards, Archana UCSC Genome Bioinformatics Group Dixon, Dr R. wrote: > Dear Genome team, > > I have a list of Ensembl Gene ids (hundreds) (ENSG....) - I would like > to obtain the QTLs that overlap with these genes... How do I do this > with the UCSC browser table structure? > > Thank you, > > Rick From bberman at usc.edu Fri Nov 3 12:45:18 2006 From: bberman at usc.edu (Benjamin Berman) Date: Fri, 03 Nov 2006 12:45:18 -0800 Subject: [Genome] ESPERR data in UCSC Message-ID: <2B061BB9-C2D5-4982-9F8B-FE90C66BD6D6@usc.edu> Hey James, I just wanted to double check that the track UCSC is using for its hg17 "7 species regulatory potential" is indeed the ESPERR data which corresponds to your new (October 2006) Genome Research publication. This is what it seems to indicate on your website, but the new reference isn't listed in the track description at UCSC and their FTP site has files that are from June. Thanks, ben. 
----- Ben Berman Postdoctoral Fellow, Preventive Medicine Keck School of Medicine of USC 1441 Eastlake Ave., Rm NOR 4423 Los Angeles, CA 90033 From bina at purdue.edu Fri Nov 3 14:00:02 2006 From: bina at purdue.edu (Minou Bina) Date: Fri, 3 Nov 2006 17:00:02 -0500 Subject: [Genome] Grouping custom tracks In-Reply-To: Message-ID: <005a01c6ff93$69af2840$57bbd280@chem.purdue.edu> Hi I have three custom tracks that I would like to group so that they will appear together as a unit. How do I do that? I hope that the question makes sense to you! Minou Bina From ann at soe.ucsc.edu Fri Nov 3 14:25:01 2006 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 03 Nov 2006 14:25:01 -0800 Subject: [Genome] Grouping custom tracks In-Reply-To: <005a01c6ff93$69af2840$57bbd280@chem.purdue.edu> References: <005a01c6ff93$69af2840$57bbd280@chem.purdue.edu> Message-ID: <454BC1BD.4070804@soe.ucsc.edu> Hello Minou, I think I understand your question. If this doesn't answer it, please don't hesitate to write back with more details. I think you are trying to create a custom track that contains sub-tracks. We call these composite tracks. An example of a composite track on the website would be on the hg17 browser -- the Affy Sites track. If you visit the description page for that track, you will see that it has several sub-tracks. Unfortunately, it is not possible to create composite tracks using the Custom Track tool. If you were to build a mirror site of the browser, you would be able to create composite tracks. If you are just trying to save some screen real estate, let me know and I can give you some tips. If I have missed the point completely, let me know that as well. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Minou Bina wrote: > > Hi > > I have three custom tracks that I would like to group so that they will > appear together as a unit. > > How do I do that? > > I hope that the question makes sense to you! 
> > Minou Bina > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Fri Nov 3 15:35:21 2006 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 03 Nov 2006 15:35:21 -0800 Subject: [Genome] ESPERR data in UCSC In-Reply-To: <2B061BB9-C2D5-4982-9F8B-FE90C66BD6D6@usc.edu> References: <2B061BB9-C2D5-4982-9F8B-FE90C66BD6D6@usc.edu> Message-ID: <454BD239.20502@soe.ucsc.edu> Hello Ben, Hopefully James will also reply to you, but here's how it looks from our perspective. James Taylor and his team provided us with the data in June 2006. From that data, we created the ESPERR track that you see on the hg17 browser. From what I can tell from the paper in Genome Research, it is based on the June 2006 data. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Benjamin Berman wrote: > Hey James, > > I just wanted to double check that the track UCSC is using for its > hg17 "7 species regulatory potential" is indeed the ESPERR data which > corresponds to your new (October 2006) Genome Research publication. > This is what it seems to indicate on your website, but the new > reference isn't listed in the track description at UCSC and their FTP > site has files that are from june. > > Thanks, > ben. 
> > ----- > Ben Berman > Postdoctoral Fellow, Preventive Medicine > Keck School of Medicine of USC > 1441 Eastlake Ave., Rm NOR 4423 > Los Angeles, CA 90033 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From bberman at usc.edu Fri Nov 3 15:38:05 2006 From: bberman at usc.edu (Benjamin Berman) Date: Fri, 03 Nov 2006 15:38:05 -0800 Subject: [Genome] ESPERR data in UCSC In-Reply-To: <454BD239.20502@soe.ucsc.edu> References: <2B061BB9-C2D5-4982-9F8B-FE90C66BD6D6@usc.edu> <454BD239.20502@soe.ucsc.edu> Message-ID: <5FD70D2C-8763-40C2-BE32-92F928AB308B@usc.edu> Yes, James confirmed that the data in UCSC corresponds to the Oct 2006 Genome Research paper. Thanks, ben. On Nov 3, 2006, at 3:35 PM, Ann Zweig wrote: > Hello Ben, > > Hopefully James will also reply to you, but here's how it looks > from our perspective. James Taylor and his team provided us with > the data in June 2006. From that data, we created the ESPERR > track that you see on the hg17 browser. From what I can tell from > the paper in Genome Research, it is based on the June 2006 data. > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > > > Benjamin Berman wrote: >> Hey James, >> I just wanted to double check that the track UCSC is using for >> its hg17 "7 species regulatory potential" is indeed the ESPERR >> data which corresponds to your new (October 2006) Genome Research >> publication. This is what it seems to indicate on your website, >> but the new reference isn't listed in the track description at >> UCSC and their FTP site has files that are from june. >> Thanks, >> ben. 
>> ----- >> Ben Berman >> Postdoctoral Fellow, Preventive Medicine >> Keck School of Medicine of USC >> 1441 Eastlake Ave., Rm NOR 4423 >> Los Angeles, CA 90033 ----- Ben Berman Postdoctoral Fellow, Preventive Medicine Keck School of Medicine of USC 1441 Eastlake Ave., Rm NOR 4423 Los Angeles, CA 90033 From adnan_derti at hms.harvard.edu Fri Nov 3 18:25:19 2006 From: adnan_derti at hms.harvard.edu (Adnan Derti) Date: Fri, 03 Nov 2006 21:25:19 -0500 Subject: [Genome] Genome Browser question: collapsing regions not of interest Message-ID: <454BFA0F.90403@hms.harvard.edu> Hi. I'd like to know if there's an option in the Genome Browser that would allow uninformative regions to be collapsed so that informative regions are highlighted, where "uninformative" would mean "a vertical area not containing an object in any track". For example, let's say you're looking at a gene. Most of the space is taken by introns, so it's difficult to see alternative splicing events. If the introns were shrunk/minimized (i.e., not drawn to scale), the entire exonic structure of the gene could be seen at once. Thanks! Adnan Derti From david.lomelin at ucsf.edu Fri Nov 3 20:57:52 2006 From: david.lomelin at ucsf.edu (Lomelin, David) Date: Fri, 3 Nov 2006 20:57:52 -0800 Subject: [Genome] mRNA to genomic sequence Message-ID: Hi, I'm a student at UCSF working in Neil Risch's lab. I'm interested in retrieving the genomic sequence for a given mRNA refseq sequence so that I can take a look at the intronic regions. I saw that your Table Browser offers exactly the option I need; however, I'm interested in obtaining this information programmatically so that I can look at multiple regions in a quick and automated way.
I tried to retrieve the data by copying and pasting the URL and having a program retrieve the results, but apparently the URL does not contain a refseq parameter that would let me fetch sequences dynamically. Rather, it seems (I'm guessing) that the URL contains a process id that refers to the refseq stored somewhere locally on the UCSC site, which prevents me from fetching sequences on the fly. Is there any way for me to access your data in a more programmatic fashion? Thank you. --David From jnthnmllr at gmail.com Mon Nov 6 08:44:52 2006 From: jnthnmllr at gmail.com (Jonathan Miller) Date: Mon, 6 Nov 2006 10:44:52 -0600 Subject: [Genome] location of .tpa files Message-ID: <7c73a94c0611060844k21cc5267oc33cbabbb26e3977@mail.gmail.com> Re: location of .tpa files Hi, The Sanger Center referred me to files they say I should be able to find at the UCSC ftp site; I can't find them - can you help? Here is the message from the Sanger Center; my original query to them is below that (see >); could you also take a quick look at my query and indicate whether you think there may be better alternatives... many thanks jm ----------------------------------------------------------------------------------------------------------------------- Sorry for the delay in getting back to you - I needed to check this with one of our mouse experts. *Apparently you need to get the tiling path for your chromosome, with the ids of all the clones used in the assembly. You can get this from the UCSC ftp site (listed as TPA files). *For each one, you can then check its status. Finished and unfinished clones (with HTG phase statuses) should be in the EMBL database. Sections without an HTG status are whole genome shotgun contigs assembled directly from reads, rather than actual clones - they should also be available from EMBL, though we're guessing you're not so interested in those. I hope this is the information you need. If not, do get back to us and we'll try to help further.
Regards Anne Parker Ensembl Web Team ------------------------------------------------------------------------------------------------------------------------ > > Hi, > > I am trying to identify the set of "finished" sequence that was > assembled into the mouse and human genomes. > > In particular, I'd like to find all sequence that, for example, was > used to assemble the mouse X chromosome. > > So that for every location in the assembled mouse X chromosome, I can > identify a "finished" contig(s) (or possibly a set of reads) that > corresponds to this location. > > It appeared to me that the appropriate set of sequences should be at: > > (e.g. for human) > > ftp://ftp.sanger.ac.uk/pub/sequences/human/Chr_X/ > > however, it appears that there are sequences in the assembled genome > (any recent version) that are not represented in this directory. > > My goal is to investigate whether certain observations that I have > made about the assembled genome sequences, could in fact be > artifacts of the process of assembly. > > So I would like to look at sets of sequences that cover the full mouse > genome at least once (in its current assembly), but are still at > pre-assembly stages. The full set of traces is probably too crude > for my purposes (I'd want at the very least only high quality > reads). > > If you are unable to advise, I'd be grateful if you could direct me > towards someone who could (preferably a name and email address). > > many thanks and best wishes > > jm From donnak at soe.ucsc.edu Mon Nov 6 12:13:23 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Mon, 6 Nov 2006 12:13:23 -0800 Subject: [Genome] location of .tpa files References: <7c73a94c0611060844k21cc5267oc33cbabbb26e3977@mail.gmail.com> Message-ID: <009601c701e0$068cc7a0$310aa8c0@donnakLT> Hi Jonathan, I suspect you are looking for TPF (i.e. tiling path format) files; we don't know of any TPA files. If so, you can most likely get those from the NCBI site.
We do have some tables with clone IDs that might contain the info you're looking for, e.g. chr*_gold or ctgPos, depending on the type of accession/level of assembly structure you want. You can download these from our downloads server at http://hgdownload.cse.ucsc.edu/. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu ----- Original Message ----- From: "Jonathan Miller" To: Sent: Monday, November 06, 2006 8:44 AM Subject: [Genome] location of .tpa files > Re: location of .tpa files > > Hi, > > The Sanger Center referred me to files they say I should be able to find at > UCSC ftp site; I can't find them - can you help? > Here is the message from the Sanger Center; my orginal query to them is > below that (see >); could you also take a quick look at my query and > indicate whether you think there may be better alternatives... > > many thanks > > jm > ----------------------------------------------------------------------------------------------------------------------- > Sorry for the delay in getting back to you - I needed to check this with > one of our mouse experts. > > *Apparently you need to get the tiling path for your chromosome, with the > ids of all the clones used in the assembly. You can get this from the > UCSC ftp site (listed as TPA files). > > *For each one, you can then check its status. Finished and unfinished > clones (with HTG phase statuses) should be in the EMBL database. > Sections without an HTG status are whole genome shotgun contigs > assembled directly from reads, rather than actual clones - they should > also be available from EMBL, though we're guessing you're not so > interested in those. > > I hope this is the information you need. If not, do get back to us and > we'll try to help further. 
> > Regards > > Anne Parker > Ensembl Web Team > ------------------------------------------------------------------------------------------------------------------------ >> >> Hi, >> >> I am trying to identify the set of "finished" sequence that was >> assembled into the mouse and human genomes. >> >> In particular, I'd like to find all sequence that, for example, was >> used to assemble the mouse X chromosome. >> >> So that for every location in the assembled mouse X chromosome, I can >> identify a "finished" contig(s) (or possibly a set of reads) that >> corresponds to this location. >> >> It appeared to me that the appropriate set of sequences should be at: >> >> (e.g. for human) >> >> ftp://ftp.sanger.ac.uk/pub/sequences/human/Chr_X/ >> >> however, it appears that there are sequences in the assembled genome >> (any recent version) that are not represented in this directoy. >> > *> My goal is to investigate whether certain observations that I have >> made about the assembled genome sequences, could in fact be >> artifacts of the process of assembly. >> >> So I would like to look at sets of sequences that cover the full mouse >> genome at least once (in its current assembly), but are still at >> pre-assembly stages. The full set of traces is probably too crude >> for my purposes (I'd want at the very least only high quality >> reads). > *> > *> If you are unable to advise, I'd be grateful if you could direct me >> towards someone who could (preferably a name and email address). 
>> >> many thanks and best wishes >> >> jm > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From jfars at unm.edu Mon Nov 6 11:59:01 2006 From: jfars at unm.edu (James C Farslow) Date: Mon, 6 Nov 2006 12:59:01 -0700 (MST) Subject: [Genome] Human/Chimp Annotation Files Message-ID: <2060362.1162843141088.JavaMail.jfars@unm.edu> Hello, We are trying to identify duplicated ORFs in the human and chimp genomes. I tried to find annotation files on your website, but have not had any luck looking for what we need. Do you have any files that provide: An ORF name; the chromosome; the ORF start and stop positions; the strand (+/- or W/C). We are specifically looking for known or predicted ORFs, not transcript or microarray data. Positions of known exons/introns would be useful, but not required. Your time and assistance in this matter would be greatly appreciated. Thank you. James Farslow jfars at unm.edu Bergthorsson Lab University of New Mexico, Biology Dept. From benb at fruitfly.org Mon Nov 6 12:48:38 2006 From: benb at fruitfly.org (Benjamin Berman) Date: Mon, 06 Nov 2006 12:48:38 -0800 Subject: [Genome] Table browser / FTP site discrepancy Message-ID: <6E8E3B84-5698-4E39-90C0-86FA87161352@fruitfly.org> I am trying to use the regulatory potential 7 species track for some analysis. I am using hg17. I first downloaded some regions using the table browser.
For chrom 19, position 52426843, I get a value of 0.0458949 (see the table browser output .txt file attached to this email): teaview2:~/genome_data/build35_7x_reg_pot.subsets benb$ grep 52426843 build35_7x_reg_pot_vega_nonbinding.txt 52426843 0.0458949 But when i download the entire chromosome wiggle track from the FTP server (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/regPotential7X/ chr19.regPotential7X.hg17.gz), I get a different score of 0.049367: teaview2:~/genome_data/build35_7x_reg_pot.subsets benb$ grep 52426843 ../build35_7x_reg_pot/chr19.regPotential7X.hg17 52426843 0.049367 Is this because of the compression alluded to in the table browser output? It says that worst case loss in resolution is 1.7e-05, but this is a difference of more than 1e-3. Do you know why these two don't match up more closely? Thanks for your help, Ben Berman -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: build35_7x_reg_pot_vega_nonbinding.txt Url: http://www.soe.ucsc.edu/pipermail/genome/attachments/20061106/a463e566/build35_7x_reg_pot_vega_nonbinding.txt -------------- next part -------------- From archanat at soe.ucsc.edu Mon Nov 6 16:47:41 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Mon, 06 Nov 2006 16:47:41 -0800 Subject: [Genome] Genome Browser question: collapsing regions not of interest In-Reply-To: <454BFA0F.90403@hms.harvard.edu> References: <454BFA0F.90403@hms.harvard.edu> Message-ID: <454FD7AD.1010506@soe.ucsc.edu> Hello Adnan, Unfortunately we don't have this feature available on the browser. This is a very good suggestion and we would also like to see this on our browser. We have this in our feature implement list, but due to the number of project commitments, there will most likely be a long delay before we can provide this feature. In the meantime, you could use the 'Alt-Splicing' track on the Human browser, which may be of some use to you. 
If you click on a gene in this track, you will see a page where the gene is drawn such that the exons are exaggerated, and the introns are minimized. This comes under the 'mRNA and EST track'. I hope this information is helpful to you. Thanks for the suggestion. Regards, Archana UCSC Genome Bioinformatics Group Adnan Derti wrote: > Hi. > > I'd like to know if there's an option in the Genome Browser that would > allow uninformative regions to be collapsed so that informative regions > are highlighed, where "uninformative" would mean "a vertical area not > containing an object in any track". For example, let's say you're > looking at a gene. Most of the space is taken by introns, so it's > difficult to see alternative splicing events. If the introns were > shrunk/minimized (i.e., not drawn to scale), the entire exonic structure > of the gene could be seen at once. > > Thanks! > > Adnan Derti > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From birney at ebi.ac.uk Mon Nov 6 13:12:26 2006 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 6 Nov 2006 21:12:26 +0000 Subject: [Genome] location of .tpa files In-Reply-To: <009601c701e0$068cc7a0$310aa8c0@donnakLT> References: <7c73a94c0611060844k21cc5267oc33cbabbb26e3977@mail.gmail.com> <009601c701e0$068cc7a0$310aa8c0@donnakLT> Message-ID: <0FB6FC62-BCB9-42C3-B005-5089CF24A2BB@ebi.ac.uk> On 6 Nov 2006, at 20:13, Donna Karolchik wrote: > hi Jonathan, > > I suspect you are looking for TPF (i.e. tiling path format) > files...we don't > know of any TPA files. If so, you can most likely get those from > the NCBI site. > We do have some tables with clone IDs that might contain the info > you're looking > for, e.g. chr*_gold or ctgPos, depending on the type of accession/ > level of > assembly structure you want. You can download these from our > downloads server at > http://hgdownload.cse.ucsc.edu/. 
> >
> > -Donna

Jonathan - I am sorry you are going all around the houses here, but let me suggest something simpler - you need the list of accession numbers in each chromosome, and currently _all_ of those accessions in human are finished and the vast majority of those in mouse are also finished. If you want to check, pull out the accessions from EMBL/GenBank and look at the HTG_ tag in the comment lines.

To get accession numbers, you can either do a join on the ensembl mysql database like:

mysql -e 'select c.name from seq_region c,seq_region chr,assembly am where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 4' -h ensembldb.ensembl.org -u anonymous homo_sapiens_core_41_36c | perl -ne '($acc) = /^(\w+)\./; print $acc,"\n"'

(the perl pipe is to convert text like: AC000055.1.1.93578 to AC000055)

Or you can (I think) get out this list from the Table Browser at UCSC - not quite sure what to do but it will be something like the accession track in the assembly group tables.

For mouse, the corresponding SQL is

mysql -e 'select c.name from seq_region c,seq_region chr,assembly am where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id and am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 3' -h ensembldb.ensembl.org -u anonymous mus_musculus_core_41_36b | perl -ne '/CAA/ && next; ($acc) = /(\w+)\./; print $acc,"\n"'

I have rather cheekily added a /CAA/ && next in the perl loop, skipping the accessions starting with CAA. This is because I happen to know that these are WGS contigs.

As I've done this, I've thrown these up on my web site at

http://www.ebi.ac.uk/~birney/human_X.txt
http://www.ebi.ac.uk/~birney/mouse_X.txt

Feel free to play around with the above SQL and/or hand it over to your local/favourite geek to help explain what's going on here.
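For anyone more comfortable in Python than perl, the clean-up step at the end of the pipelines above can be approximated as follows. This is only a minimal sketch and not part of the original message: the function name extract_accessions and the skip_prefix parameter are hypothetical, and it assumes the mysql output has been saved as one seq_region name per line. Unlike the perl one-liner, lines without a dot (such as a column header) are dropped rather than printed as blank lines.

```python
import re

# Matches the leading accession in names such as "AC000055.1.1.93578".
ACCESSION_RE = re.compile(r"^(\w+)\.")


def extract_accessions(lines, skip_prefix=None):
    """Return bare accessions from seq_region names, one name per line.

    skip_prefix is a hypothetical option that plays the role of the
    '/CAA/ && next' filter used for mouse WGS contigs.
    """
    accessions = []
    for line in lines:
        name = line.strip()
        # Skip names that start with the unwanted prefix (e.g. "CAA").
        if skip_prefix and name.startswith(skip_prefix):
            continue
        match = ACCESSION_RE.match(name)
        if match:  # lines with no dot, such as a header row, are dropped
            accessions.append(match.group(1))
    return accessions
```

With the saved mouse output, a call such as extract_accessions(open("mouse_X_raw.txt"), skip_prefix="CAA") would give the list of clone accessions; the file name here is made up for illustration.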
From jnthnmllr at gmail.com Mon Nov 6 13:25:15 2006 From: jnthnmllr at gmail.com (Jonathan Miller) Date: Mon, 6 Nov 2006 15:25:15 -0600 Subject: [Genome] location of .tpa files In-Reply-To: <0FB6FC62-BCB9-42C3-B005-5089CF24A2BB@ebi.ac.uk> References: <7c73a94c0611060844k21cc5267oc33cbabbb26e3977@mail.gmail.com> <009601c701e0$068cc7a0$310aa8c0@donnakLT> <0FB6FC62-BCB9-42C3-B005-5089CF24A2BB@ebi.ac.uk> Message-ID: <7c73a94c0611061325k677e8275v5b4ec7fe1cc0f3a5@mail.gmail.com> Hi Ewan, thanks for the detailed instructions; I'm working on it now and will report back to you if I experience any problems. best wishes jm On 11/6/06, Ewan Birney wrote: > > > On 6 Nov 2006, at 20:13, Donna Karolchik wrote: > > > hi Jonathan, > > > > I suspect you are looking for TPF (i.e. tiling path format) > > files...we don't > > know of any TPA files. If so, you can most likely get those from > > the NCBI site. > > We do have some tables with clone IDs that might contain the info > > you're looking > > for, e.g. chr*_gold or ctgPos, depending on the type of accession/ > > level of > > assembly structure you want. You can download these from our > > downloads server at > > http://hgdownload.cse.ucsc.edu/. > > > > -Donna > > > Jonathan - I am sorry you are going all around the houses here, but let > me suggest something simpler - you need the list of accession numbers > in each chromosome, and currently _all_ of those accessions in human are > finished and the vast majority of those in mouse are also fininshed. > If you want to check, pull out the accessions from EMBL/GenBank and > look at the HTG_ tag in the comment lines. 
> > To get accession numbers, you can either do a join on the ensembl > mysql database like: > > mysql -e 'select c.name from seq_region c,seq_region chr,assembly am > where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id and > am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 4' -h > ensembldb.ensembl.org -u anonymous homo_sapiens_core_41_36c | perl - > ne '($acc) = /^(\w+)\./; print $acc,"\n"' > > > (the perl pipe is to convert text like: > > AC000055.1.1.93578 to > > AC000055 > > ) > > > Or you can (I think) get out this list from the Table Browser at UCSC - > not quite sure what to do but it will be something like the accession > track > in the assembly group tables. > > > For mouse, the corresponding SQL is > > mysql -e 'select c.name from seq_region c,seq_region chr,assembly am > where chr.name = "X" and am.asm_seq_region_id = chr.seq_region_id and > am.cmp_seq_region_id = c.seq_region_id and c.coord_system_id = 3' -h > ensembldb.ensembl.org -u anonymous mus_musculus_core_41_36b | perl - > ne '/CAA/ && next; ($acc) = /(\w+)\./; print $acc,"\n"' > > > I have rather cheekily added a /CAA/ && next in the perl loop, > skipping the > accessions starting with CAA. This is becuase I happen to know that > these are > WGS contigs. > > > As I've done this, I've thrown these up on my web site at > > http://www.ebi.ac.uk/~birney/human_X.txt > http://www.ebi.ac.uk/~birney/mouse_X.txt > > > > Feel to play around with the above SQL and/or hand it over to your > local/favourite geek > to help explain what's going on here. > > > > > > > From chenzhzh at genomics.org.cn Mon Nov 6 21:52:55 2006 From: chenzhzh at genomics.org.cn (zhongzhongchen) Date: Tue, 07 Nov 2006 13:52:55 +0800 Subject: [Genome] How to link TreeFam ? Message-ID: <1162878775.12007.5.camel@localhost> Dear UCSC genome browser, Please allow me to send a e-mail. I am working for TreeFam(http://www.treefam.org/), Beijing Genomics Institute & Chinese Academy of Sciences, in China. 
TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that represents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. We are now considering ways to display the data and would like to know if it is possible to submit our data so it can be seen as tracks on the UCSC public genome browser and more research community will share it. What we have expected is to display in this way: display an Ensembl gene on the human genome, and go to TreeFam if someone clicks that gene. Possible examples would be: ENST00000383548 TreeFam http://www.treefam.org/cgi-bin/TFinfo.pl?ac=ENST00000383548 That is when someone click TreeFam, it will go to http://www.treefam.org/cgi-bin/TFinfo.pl?ac=ENST00000383548. We will be very pleased if you approve the linkage and tell me how to do it. Thanks in advance! Chen Zhongzhong ^_^ ______________________________________ Beijing Genomics Institute Chinese Academy of Sciences Address: Beijing Airport Industrial Zone B-6 Beijing, 101300, China Tel: 86-10-8048-1197,1197 (http://www.treefam.org/) From palermor at u.washington.edu Mon Nov 6 23:26:21 2006 From: palermor at u.washington.edu (Robert Palermo) Date: Mon, 6 Nov 2006 23:26:21 -0800 Subject: [Genome] Batch BLATs Message-ID: <082EA8D7-9A68-4B42-B9C7-5D7D09D51DD0@u.washington.edu> I have been running a large number of BLAT jobs in interactive mode, aligning about 500 sequence elements against the Rhesus Genome. I have been getting notices that this amount of traffic is now getting penalized. This is a chore I am running in interactive mode tonight, doing in batches of 25 sequences. If you can give me a faster, tractable route for this one-time job, please let me know. 
Bob Palermo From hwanseok at yumc.yonsei.ac.kr Tue Nov 7 00:59:13 2006 From: hwanseok at yumc.yonsei.ac.kr (=?ks_c_5601-1987?B?wMzIr7yu?=) Date: Tue, 7 Nov 2006 17:59:13 +0900 Subject: [Genome] [Q] How to retrieve cytoband data of hundreds of SNPs in batch Message-ID: <34F26283AF70184CB25B672BFEA655EC42DAC8@yuse0yumc01> Dear UCSC Genome Browser Team, I found that the following URL shows detailed information about a SNP (rs966102) in the graphic browser. http://genome.ucsc.edu/cgi-bin/hgc?hgsid=80191490&g=snp126&i=rs966102 But I would like to retrieve cytoband data for hundreds of SNPs. I tried to fetch the URL with a command-line utility like url2file.exe, but it failed, probably due to hgsid (session id?). Could you help me retrieve them? Sincerely, Hwanseok Rhee, Ph.D Dept. Clinical Genetics Yonsei Univ School of Medicine, Seoul, Korea From donnak at soe.ucsc.edu Tue Nov 7 10:19:34 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Tue, 7 Nov 2006 10:19:34 -0800 Subject: [Genome] Batch BLATs References: <082EA8D7-9A68-4B42-B9C7-5D7D09D51DD0@u.washington.edu> Message-ID: <002c01c70299$4efbb8c0$6401a8c0@donnakLT> hi Bob, Program-driven use of web-based blat is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day. As you have discovered, we have automated processes in place to slow down or halt access that exceeds these limits. This prevents one user from monopolizing our server or (in extreme cases) bringing the server down. If it's feasible, I suggest you install a local copy of blat, which will allow you efficient unlimited access. For more information on downloading and installing blat, see our FAQ: http://genome.ucsc.edu/FAQ/FAQblat#blat3. A lot of individuals at your institution use our software, so I suspect there are local blat servers available. If you ask around, you may be able to find one, if you'd prefer to not install it yourself.
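If a script does have to go through the web server, the limits quoted above (one hit every 15 seconds, no more than 5,000 hits per day) can be respected with a small pacing helper. What follows is a minimal sketch under stated assumptions, not UCSC code: the class name PoliteThrottle and its parameters are hypothetical, the daily cap is approximated as a rolling 24-hour window, and the actual request is left to the caller. At this pace a one-time job of 500 sequences needs roughly two hours.

```python
import time


class PoliteThrottle:
    """Space out requests to respect a per-hit interval and a daily cap."""

    def __init__(self, min_interval=15.0, max_per_day=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits
        self.max_per_day = max_per_day    # hits allowed per 24-hour window
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last_hit = None
        self._window_start = None
        self._hits_in_window = 0

    def wait(self):
        """Block until the next request is allowed, then record it."""
        now = self._clock()
        # Open a fresh 24-hour window if none exists or the old one expired.
        if self._window_start is None or now - self._window_start >= 86400:
            self._window_start = now
            self._hits_in_window = 0
        if self._hits_in_window >= self.max_per_day:
            raise RuntimeError("daily request budget used up")
        if self._last_hit is not None:
            remaining = self.min_interval - (now - self._last_hit)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_hit = now
        self._hits_in_window += 1
```

A caller would create one throttle and call throttle.wait() immediately before each request. A local blat installation, as suggested above, avoids the need for any of this.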
-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

----- Original Message -----
From: "Robert Palermo"
To:
Cc:
Sent: Monday, November 06, 2006 11:26 PM
Subject: [Genome] Batch BLATs

> I have been running a large number of BLAT jobs in interactive mode,
> aligning about 500 sequence elements against the Rhesus Genome.
>
> I have been getting notices that this amount of traffic is now
> getting penalized.
>
> This is a chore I am running in interactive mode tonight, doing in
> batches of 25 sequences. If you can give me a faster, tractable
> route for this one-time job, please let me know.
>
> Bob Palermo
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From donnak at soe.ucsc.edu Tue Nov 7 13:58:04 2006
From: donnak at soe.ucsc.edu (Donna Karolchik)
Date: Tue, 7 Nov 2006 13:58:04 -0800
Subject: [Genome] mRNA to genomic sequence
References:
Message-ID: <018901c702b8$00a01e40$6401a8c0@donnakLT>

hi David,

Are you familiar with the Table Browser's batch upload feature? This may give you what you want. To retrieve the genomic sequence for several RefSeq mRNAs:

1. List all the mRNA accession IDs in a file, one per line.
2. In the Table Browser, set up your clade, genome, assembly, group, etc. options.
3. Set region to "genome".
4. Click the "upload list" button and upload your list.
5. Select the "sequence" output option, then configure your fasta output as desired.

You could then write a script to parse out the information you want from this output.

Alternatively, you should be able to retrieve your information by removing the hgsid parameter and adding "position=chrN:start-end" to your URL, where position corresponds to your mRNA position. You may have to tinker with some of the other parameter settings in the URL to fine-tune your query.
You can get a complete list of your parameter settings by examining the contents of your "cart" via the URL http://genome.ucsc.edu/cgi-bin/cartDump. If you use a programmatic method to retrieve the data, keep in mind that program-driven use of the Genome Browser is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu ----- Original Message ----- From: "Lomelin, David" To: Sent: Friday, November 03, 2006 8:57 PM Subject: [Genome] mRNA to genomic sequence > Hi, I'm a student at UCSF working in Neil Risch's lab. I'm interested in > retrieving the genomic sequence for a given mRNA refseq sequence so that I > could take a look at the intronic regions. I saw that your Table Browser > allows me this option exactly as I need; however, I'm interested in obtaining > this information programatically so that I could look at multiple regions in a > quick and automated way. I tried to retrieve the data by copying and pasting > the url and having a program retrieve the results, but apparently, the url > does not contain a refseq parameter that allows me to fetch sequences > dynamically. Rather, it seems (I'm guessing) that the url contains a process > id that contains the refseq somewhere locally on the UCSC site which prevents > me from fetching sequences on the fly. Is there any way for me to access your > data in a more programmatic fashion? Thank you. 
> > --David > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From donnak at soe.ucsc.edu Tue Nov 7 16:20:15 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Tue, 7 Nov 2006 16:20:15 -0800 Subject: [Genome] Table browser / FTP site discrepancy References: <6E8E3B84-5698-4E39-90C0-86FA87161352@fruitfly.org> Message-ID: <025901c702cb$aecb3c80$6401a8c0@donnakLT> hi Ben, Looks like you've uncovered a bug in the code that calculates the worst case resolution loss number. We've fixed this and it will go out with one of our next releases. In the interim, the bug fix is available on our test server at http://genome-test.cse.ucsc.edu. Thanks for bringing this to our attention, and sorry for any confusion this may have caused. Needless to say, if you need the data to be accurate, you should use the raw data from the file you downloaded, rather than using the compressed data available through the Table Browser. I'd like to mention a few things about the resolution loss numbers we include in the Table Browser query results. In most cases, the number for a value range is calculated for the entire bin(s) of 128 values, even though the values for only part of the bin(s) may be present in your query result. The absolute worst case is displayed -- not all numbers in the set will differ that much from the raw values. You should pay particular attention to tiny numbers near zero. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu ----- Original Message ----- From: "Benjamin Berman" To: "UCSC genome browser" Sent: Monday, November 06, 2006 12:48 PM Subject: [Genome] Table browser / FTP site discrepancy > > I am trying to use the regulatory potential 7 species track for some > analysis. I am using hg17. I first downloaded some regions using > the table browser. 
For chrom 19, position 52426843, I get a value of > 0.0458949 (see the table browser output .txt file attached to this > email): > > teaview2:~/genome_data/build35_7x_reg_pot.subsets benb$ grep 52426843 > build35_7x_reg_pot_vega_nonbinding.txt > 52426843 0.0458949 > > But when i download the entire chromosome wiggle track from the FTP > server (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/regPotential7X/ > chr19.regPotential7X.hg17.gz), I get a different score of 0.049367: > > teaview2:~/genome_data/build35_7x_reg_pot.subsets benb$ grep > 52426843 ../build35_7x_reg_pot/chr19.regPotential7X.hg17 > 52426843 0.049367 > > Is this because of the compression alluded to in the table browser > output? It says that worst case loss in resolution is 1.7e-05, but > this is a difference of more than 1e-3. Do you know why these two > don't match up more closely? > > Thanks for your help, > Ben Berman > > > -------------------------------------------------------------------------------- > > -------------------------------------------------------------------------------- > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From heather at soe.ucsc.edu Tue Nov 7 17:17:28 2006 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Tue, 7 Nov 2006 17:17:28 -0800 (PST) Subject: [Genome] [Q] How to retrieve cytoband data of hundreds of SNPs in batch In-Reply-To: <34F26283AF70184CB25B672BFEA655EC42DAC8@yuse0yumc01> References: <34F26283AF70184CB25B672BFEA655EC42DAC8@yuse0yumc01> Message-ID: Hwanseok: One approach is to use our Table Browser. That is, given a list of SNP identifiers, you can create a custom track that has the coordinates for those SNPs. You can then intersect those coordinates with the cytoBand track. I include detailed instructions for this below. A limitation of this approach is that it loses the specific SNP identifiers. 
If you wish to retain those, you could use our utilities hgMapToGene.c or hgGeneBands.c (with minor edits to use SNPs rather than genes). Our source code is available at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, or via CVS as documented at http://genome.ucsc.edu/admin/cvs.html. Instructions for compiling the code are available at http://genome.ucsc.edu/admin/jk-install.html.

Let us know if you have further questions.

Heather Trumbower
UCSC Genome Bioinformatics Group

Here are the detailed instructions for using the table browser:

Click the "Tables" link at the top of the page and follow these steps:

1. Select the genome and assembly of interest and then choose 'group: Variation and Repeats', 'track: SNPs', and 'table: snp126'.
2. Select "genome" as the region.
3. Click on the "paste list" or "upload list" button to add the list of SNP ids. You need to have a list of your ids in a consistent format that the Table Browser can use (one ID per line).
4. Select custom track as the output and press the "get output" button. Then hit "get custom track in table browser" button. This creates a track in the Table Browser that represents the SNPs that you have uploaded.
5. Then you can create an intersection of these SNPs with the 'Chromosome Band' track. To do this make the following selections:
   group: Mapping and Sequencing Tracks
   track: Chromosome Band
   table: cytoBand
   region: genome
   Press the "intersection: create" button and on the intersection page choose 'Custom Tracks' as the group name, and the custom track and table name. Then select the radio button 'All Chromosome Band records that have any overlap with tb_snp126' and hit "submit".
6. Output the intersection as BED or custom track. This gives you the list of all chromosome bands that overlap with your SNPs.

> Dear UCSC Genome Browser Team,
>
> I found following brief url shows me detail information about SNP (rs966102) in graphic browser.
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=80191490&g=snp126&i=rs966102 > But I would like to retrive cytoband data of hundreds of SNPs. > I tried to fetch the url with command line utility like url2file.exe but it failed probably due to hgsid (session id ?) . > Could you help me how to retrive them? > > > > Sincerely, > Hwanseok Rhee, Ph.D > Dept. Clinical Genetics > Yonsei Univ School of Medicine, Seoul, Korea > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From yang21 at llnl.gov Tue Nov 7 17:31:45 2006 From: yang21 at llnl.gov (Shan Yang) Date: Tue, 07 Nov 2006 17:31:45 -0800 Subject: [Genome] Program to calculate AL CpG Islands available? Message-ID: <7.0.0.16.2.20061107172833.0259a208@llnl.gov> Hi, I have some DNA sequences and want to know Andy Law CpG islands in them. The CpG software on the Genome Browser only gives me the conventional CpG island, not AL CpG island. Is the program detecting AL CpG island available any where? Thanks a lot! Shan Yang, PhD Genome Biology Division, L-441 Biosciences Directorate Lawrence Livermore National Laboratory 7000 East Ave, Livermore, CA, 94550 Ph: 925-422-7389 Fax: 925-422-2099 From donnak at soe.ucsc.edu Tue Nov 7 19:31:19 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Tue, 7 Nov 2006 19:31:19 -0800 Subject: [Genome] How to link TreeFam ? References: <1162878775.12007.5.camel@localhost> Message-ID: <033701c702e6$762972f0$6401a8c0@donnakLT> hi Chen, Thanks for your email. We are interested in this data set, and are discussing how best to incorporate it into the Genome Browser. One of our engineers will contact you in the next day or so about this. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu ----- Original Message ----- From: "zhongzhongchen" To: Sent: Monday, November 06, 2006 9:52 PM Subject: [Genome] How to link TreeFam ? 
> Dear UCSC genome browser, > > Please allow me to send a e-mail. > I am working for TreeFam(http://www.treefam.org/), Beijing Genomics > Institute & Chinese Academy of Sciences, in China. > > TreeFam is a database of phylogenetic trees of gene families found in > animals. It aims to develop a curated resource > that represents the accurate evolutionary history of all animal gene > families, as well as reliable ortholog and paralog > assignments. > > We are now considering ways to display the data and would like to know > if it is possible to submit our data so it can be > seen as tracks on the UCSC public genome browser and more research > community will share it. > What we have expected is to display in this way: display an Ensembl gene > on the human genome, and go to TreeFam if someone > clicks that gene. > > Possible examples would be: > ENST00000383548 TreeFam > http://www.treefam.org/cgi-bin/TFinfo.pl?ac=ENST00000383548 > That is when someone click TreeFam, it will go to > http://www.treefam.org/cgi-bin/TFinfo.pl?ac=ENST00000383548. > > > We will be very pleased if you approve the linkage and tell me how to do > it. > > Thanks in advance! > > Chen Zhongzhong ^_^ > ______________________________________ > Beijing Genomics Institute > Chinese Academy of Sciences > Address: > Beijing Airport Industrial Zone B-6 > Beijing, 101300, China > Tel: 86-10-8048-1197,1197 > (http://www.treefam.org/) > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From Mike.Mitchell at cancer.org.uk Wed Nov 8 07:23:53 2006 From: Mike.Mitchell at cancer.org.uk (Mike Mitchell) Date: Wed, 08 Nov 2006 15:23:53 +0000 Subject: [Genome] Human Promoters In-Reply-To: <20061108150223.29368BCC0FD@post.tau.ac.il> Message-ID: On 8/11/06 15:02, "Liron Levkovitz" wrote: > Thanks for your answer. 
I just want to make sure I understand: In case the > promoters have different RefSeq IDs but the same locations- they are > promoters of splice variants (as a results from alternative splicing), and > in the case they have the same RefSeq IDs but different locations- they are > alternative promoters of the same gene (see example below). > >> NM_000039_up_1000_chr11_116213549_r >> NM_000039_up_1000_chr22_random_12498_f Each RefSeq ID refers to one unique transcript, thus a gene will have multiple associated RefSeq IDs if there are multiple known transcripts. So if you get several RefSeq IDs at the same locations *and strand orientation* then those should be different transcripts of the same gene. Each gene should also uniquely localise to the genome; in your example above you have APOA1 localised to chromosome 11 (as it should do) and to chr22_random. This second localisation is worrying. The "random" chromosome assignments indicate that the assembly is believed to be part of a specific chromosome but has yet to be accurately mapped. Perhaps there is something wrong with the chr22_random assembly - which you will need to clarify with the originators of the data and assembly, the Sanger Centre and the NCBI respectively. In summary: different RefSeq IDs, same location and strand = different transcripts of a gene same RefSeq ID, different location = trouble -- Mike Mitchell Bioinformatics & Biostatistics Service Cancer Research UK +44 (0) 207 269 3115 From lironle4 at post.tau.ac.il Wed Nov 8 07:02:20 2006 From: lironle4 at post.tau.ac.il (Liron Levkovitz) Date: Wed, 8 Nov 2006 17:02:20 +0200 Subject: [Genome] Human Promoters In-Reply-To: Message-ID: <20061108150223.29368BCC0FD@post.tau.ac.il> Hello Mike, Thanks for your answer.
I just want to make sure I understand: In case the promoters have different RefSeq IDs but the same locations, they are promoters of splice variants (as a result of alternative splicing), and in the case they have the same RefSeq IDs but different locations, they are alternative promoters of the same gene (see example below). >NM_000039_up_1000_chr11_116213549_r >NM_000039_up_1000_chr22_random_12498_f Is this correct? Thanks Liron Levkovitz M.Sc student Tel Aviv University -----Original Message----- From: Mike Mitchell [mailto:Mike.Mitchell at cancer.org.uk] Sent: Thursday, November 02, 2006 6:17 PM To: Liron Levkovitz; Subject: Re: [Genome] Human Promoters This is an example of known gene isoforms (splice variants). The gene UBE2J2 has 4 known isoforms; each one of these has its own RefSeq identifier and in this case they all share the same first exon. If you have a look at hg18 at chr1:1,179,157-1,199,097 you will see the case that you highlighted. -- Mike Mitchell Bioinformatics & Biostatistics Service Cancer Research UK +44 (0) 207 269 3115 From Anil.Jegga at cchmc.org Wed Nov 8 08:59:25 2006 From: Anil.Jegga at cchmc.org (Anil Jegga) Date: Wed, 08 Nov 2006 11:59:25 -0500 Subject: [Genome] Genesorter table export - Errors in column headers Message-ID: Hi When I download the gene list (using the "text" option) after querying from the GeneSorter, the resultant table has the column headers wrongly placed (there are extra tabs in between some column headers). I am attaching a sample file for your reference. This happens especially when I add additional columns to the table (using the "configure" option). If I download with the default options the column headers are fine.
Thanks Anil Anil Jegga Assistant Professor Department of Pediatrics and Division of Biomedical Informatics Cincinnati Children's Hospital Medical Center and University of Cincinnati Tel: (513)-636-0261 Fax: (513)-636-2056 http://anil.cchmc.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: genesorter_output_columnHeaderErrors.txt Url: http://www.soe.ucsc.edu/pipermail/genome/attachments/20061108/3874e10d/genesorter_output_columnHeaderErrors-0001.txt From yoyoq at yahoo.com Wed Nov 8 10:48:18 2006 From: yoyoq at yahoo.com (jp d) Date: Wed, 8 Nov 2006 10:48:18 -0800 (PST) Subject: [Genome] Custom track question In-Reply-To: <454A4A52.3000903@soe.ucsc.edu> Message-ID: <20061108184818.10997.qmail@web50409.mail.yahoo.com> Dear genome browser, I have several websites where users click links that load a custom track to the browser. This is fantastically useful, but the custom tracks tend to build up after a while. Is there an option I can include that would delete all previous custom tracks and only load the current one? Thanks John Paul Donohue From bina at purdue.edu Wed Nov 8 11:53:18 2006 From: bina at purdue.edu (Minou Bina) Date: Wed, 8 Nov 2006 14:53:18 -0500 Subject: [Genome] error limit in HHS Message-ID: <002901c7036f$895d8850$57bbd280@chem.purdue.edu> Do you know what are the error bars in the locations of DNase hypersensitive sites that are listed in the ENCODE region? Minou Bina From rhead at soe.ucsc.edu Wed Nov 8 11:45:58 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 08 Nov 2006 11:45:58 -0800 Subject: [Genome] Human Promoters In-Reply-To: References: Message-ID: <455233F6.7050304@soe.ucsc.edu> Hello Liron, I would like to add to Mike's excellent answer. It is not unheard of for mRNAs to align to multiple places in the genome. From the RefSeq gene details page: "RefSeq mRNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. 
When a single mRNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept." There is also an answer to a very similar, previously-asked question located here: http://www.cse.ucsc.edu/pipermail/genome/2005-September/008553.html This FAQ (also referenced in the above link) contains some tips about determining whether an mRNA that aligns to multiple locations is an artifact of the assembly build process or is an actual gene duplication: http://genome.ucsc.edu/FAQ/FAQtracks#tracks9 Also, thank you, Mike, for your answers. -- Brooke Rhead UCSC Genome Bioinformatics Group Mike Mitchell wrote: > > > On 8/11/06 15:02, "Liron Levkovitz" wrote: > >> Thanks for your answer. I just want to make sure I understand: In case the >> promoters have different RefSeq IDs but the same locations- they are >> promoters of splice variants (as a results from alternative splicing), and >> in the case they have the same RefSeq IDs but different locations- they are >> alternative promoters of the same gene (see example below). >> >>> NM_000039_up_1000_chr11_116213549_r >>> NM_000039_up_1000_chr22_random_12498_f > > Each RefSeq ID refers to one unique transcript, thus a gene will multiple > associated RefSeq IDs if there are multiple known transcripts. So if you get > several RefSeq IDs at the same locations *and strand orientation* then those > should be different transcripts of the same gene. > > Each gene should also uniquely localise to the genome, in your example above > you have APO1 localised to chromosome 11 (as it should do) and to > chr22_random. This second localisation is worrying. The "random" chromosome > assignments indicate that the assembly is believed to be part of a specific > chromosome but has yet to be accurately mapped. 
Perhaps there is something > wrong with the chr22_random assembly - which you will need to clarify with > the originators of the data and assembly, the Sanger Centre and the NCBI > respectively. > > In summary: > > different RefSeq IDs, same location and strand = different transcripts of a > gene > > same RefSeq ID, different location = trouble > From gng5 at email.med.yale.edu Wed Nov 8 11:43:38 2006 From: gng5 at email.med.yale.edu (Grace Gathungu) Date: Wed, 08 Nov 2006 14:43:38 -0500 Subject: [Genome] finding information Message-ID: <1163015018.4552336a10451@webmail.med.yale.edu> Hello, Is there a systematic method, or can you guide me on how to extract expression values for two distinct genes from the same source in a microarray data set that was previously published? Grace Gathungu From kate at soe.ucsc.edu Wed Nov 8 12:57:53 2006 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Wed, 8 Nov 2006 12:57:53 -0800 Subject: [Genome] error limit in HHS In-Reply-To: <002901c7036f$895d8850$57bbd280@chem.purdue.edu> References: <002901c7036f$895d8850$57bbd280@chem.purdue.edu> Message-ID: Hello Minou, I can't tell from your email whether you are referring to the NHGRI DNAseI HS data or that from University of Washington/Regulome. In either case, we do not have error bar calculations at UCSC. You could contact the submitting labs as described on the track details pages to see if this is available. Cheers, Kate --- Kate Rosenbloom UCSC Genome Bioinformatics On Nov 8, 2006, at 11:53 AM, Minou Bina wrote: > Do you know what are the error bars in the locations of DNase > hypersensitive > sites that are listed in the ENCODE region? 
> > > > Minou Bina > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kate at soe.ucsc.edu Wed Nov 8 15:23:50 2006 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Wed, 8 Nov 2006 15:23:50 -0800 Subject: [Genome] Custom track question In-Reply-To: <20061108184818.10997.qmail@web50409.mail.yahoo.com> References: <20061108184818.10997.qmail@web50409.mail.yahoo.com> Message-ID: Hello John Paul, You can remove existing custom tracks from the display when loading others via URL by clearing the custom track file variable for the assembly you are using. To do this, add the following to the URL that loads the new custom tracks: &ctfile_db= where 'db' is replaced with the genome assembly. For example, for data displayed on the March 2006 (hg18) browser, your URL would look similar to: http://genome.ucsc.edu/cgi-bin/hgTracks? db=hg18&hgt.customText=your_URL&ctfile_hg18= If this construct doesn't work for you, please contact me directly. Cheers, Kate --- Kate Rosenbloom UCSC Genome Bioinformatics On Nov 8, 2006, at 10:48 AM, jp d wrote: > Dear genome browser, > I have several websites where users click links that > load > a custom track to the browser. This is fantastically > useful, > but the custom tracks tend to build up after a while. > Is there an option I can include that would delete > all previous custom tracks and only load the current > one? 
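Kate's suggestion above (appending an empty `ctfile_<db>=` parameter to the URL that loads the custom track) can be sketched as a small link builder. This is only an illustration: the function name and the `example.org` track location are hypothetical, and percent-encoding the track URL is an assumption about what the browser's CGI accepts rather than something stated in the message.

```python
from urllib.parse import urlencode

# Base URL taken from the example in the message above.
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def custom_track_link(db: str, track_url: str) -> str:
    """Build an hgTracks link that loads one custom track and clears older ones."""
    params = [
        ("db", db),                     # genome assembly, e.g. "hg18"
        ("hgt.customText", track_url),  # where the browser fetches the new track
        (f"ctfile_{db}", ""),           # empty value clears the stored custom tracks
    ]
    # urlencode percent-encodes the values (an assumption, see the note above).
    return f"{HGTRACKS}?{urlencode(params)}"


if __name__ == "__main__":
    # example.org is a placeholder, not a real track location.
    print(custom_track_link("hg18", "http://example.org/track.bed"))
```

Keeping the parameters in the same order as in the message (db, then hgt.customText, then ctfile_hg18) avoids relying on any assumption about whether order matters.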
> Thanks > John Paul Donohue > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Wed Nov 8 15:37:10 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Wed, 08 Nov 2006 15:37:10 -0800 Subject: [Genome] Human/Chimp Annotation Files In-Reply-To: <2060362.1162843141088.JavaMail.jfars@unm.edu> References: <2060362.1162843141088.JavaMail.jfars@unm.edu> Message-ID: <45526A26.1020602@soe.ucsc.edu> Hello James, You could get this information using any of the gene prediction tracks in either assembly, such as Known genes, RefSeq genes, N-SCAN , Genscan genes etc. You can either use the Table Browser to get the selected fields from the respective tables or download the tables in bulk from our download server at: http://hgdownload.cse.ucsc.edu/downloads.html From here choose the organism you are interested in and then choose 'Annotation database'. The corresponding files are: knownGene.txt.gz refGene.txt.gz nscanGene.txt.gz genscan.txt.gz More information on using the Table Browser is here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html I hope this information is helpful to you. If you have further questions, please do not hesitate to contact us again. Regards, Archana UCSC Genome Bioinformatics Group James C Farslow wrote: > Hello, > We are trying to identify duplicated ORFs in the human ad chimp > genomes. I tried to find annotation files on your website, but have > not had any luck looking for what we need. Do you have any files that > provide: > > An ORF name; > the chromosome; > the ORF start and stop positions; > the strand (+/- or W/C). > > We are specifically looking for known or predicted ORFs, not transcript > or microarray data. Positions of known exons/introns would be useful, > but not required. > > Your time and assistance in this matter would be greatly appreciated. > > Thank you. 
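The fields James asks for (a name, the chromosome, the coding start and stop, and the strand) can be read from the downloaded table files Archana lists. The sketch below assumes the files are tab-separated rows in genePred column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); some tables carry an extra leading `bin` column, which the hypothetical `has_bin` flag skips. Check the table schema for each file before relying on this layout. The sample row is made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_genepred_row(line: str, has_bin: bool = False) -> Dict[str, object]:
    """Extract name, chromosome, strand, coding region and exons from one row.

    Assumes genePred column order; set has_bin=True if the table starts
    with an extra 'bin' column.
    """
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]  # drop the leading bin column
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts: List[int] = [int(x) for x in fields[8].split(",") if x]
    ends: List[int] = [int(x) for x in fields[9].split(",") if x]
    exons: List[Tuple[int, int]] = list(zip(starts, ends))
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "cds_start": int(fields[5]),  # coding region start
        "cds_end": int(fields[6]),    # coding region end
        "exons": exons,
    }


if __name__ == "__main__":
    # Illustrative row, not real data.
    row = "exampleGene\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,"
    print(parse_genepred_row(row))
```

To process a whole download, the same function could be applied to each line of a file opened with `gzip.open(path, "rt")`.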
> > James Farslow > jfars at unm.edu > Bergthorsson Lab > University of New Mexico, Biology Dept. > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Wed Nov 8 16:19:23 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 08 Nov 2006 16:19:23 -0800 Subject: [Genome] finding information In-Reply-To: <1163015018.4552336a10451@webmail.med.yale.edu> References: <1163015018.4552336a10451@webmail.med.yale.edu> Message-ID: <4552740B.4080507@soe.ucsc.edu> Hello Grace, There are two tools you could use for extracting expression data from the Genome Browser: the Table Browser and the Gene Sorter. The Table Browser can be used to extract data from a particular track in a particular location. Before you start, however, you will need to figure out if the expression data in which you are interested are currently contained in one of our annotation tracks. Look in the "Expression and Regulation" section of tracks, and click on the blue track names to read descriptions of the tracks. When you find the expression data you wish to use, click on the "View table schema" link on the description page to get the name of the table underlying the track. Now hit the "Tables" link in the blue bar at the top of the page. This will take you to the Table Browser, where you can select the clade, genome, assembly, group (expression and regulation), track, and table to use. For region, choose the position range of your gene of interest. Make any other selections you desire (such as entering an output file name), then hit "get output" to get the data in the selected table from the selected region. The other tool that can extract expression data from the Genome Browser is the Gene Sorter. (See the Gene Sorter User's Guide here: http://genome.ucsc.edu/goldenPath/help/hgNearHelp.html .) 
To get to the Gene Sorter, click the "Gene Sorter" link in the blue bar at the top of the page. Select your genome and assembly of interest and look up your gene(s) of interest using the search box. Hit the "configure" button to turn on the display of data columns containing the expression data you want. If you wish to limit the data displayed to only your two genes of interest, hit the "filter" button and paste the two gene names in the "Name - Gene Name/Select Gene" section (or, if using accession numbers, paste the identifiers in the appropriate filter box). When you have all of the information displayed in the Gene Sorter the way you want, hit the "text" button to get the expression data in numerical format. I hope one of these methods works for you. If you need more information, please feel free to write back to this mailing list. -- Brooke Rhead UCSC Genome Bioinformatics Group Grace Gathungu wrote: > Hello, > > Is there a systematic method, or can you guide me on how to extract expression > values for two distinct genes from the same source in a microarray data set > that was previously published? > > Grace Gathungu > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From donnak at soe.ucsc.edu Wed Nov 8 16:24:49 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Wed, 8 Nov 2006 16:24:49 -0800 Subject: [Genome] Program to calculate AL CpG Islands available? References: <7.0.0.16.2.20061107172833.0259a208@llnl.gov> Message-ID: <0a3e01c70395$b1ec43d0$6401a8c0@donnakLT> hi Shan, The software used for computing the Andy Law CpG Islands annotation is actually a combination of two programs. The first is a program in kent/src/oneShot/preProcGgfAndy/ -- this must be compiled in the Genome Browser source tree. See the FAQ (http://genome.ucsc.edu/FAQ/FAQdownloads#download27) for more information on downloading and building the source tree. 
The second is Andy Law's perl script (slightly modified by UCSC), which I will send you in a separate off-list email message. To translate the output of Andy's script into UCSC's particular table format, a perl inline command is tacked on at the end. Here's an example of how you would run these programs on a fasta file $f:

    ~/bin/$MACHTYPE/preProcGgfAndy $f \
    | ggf-andy-cpg-island.pl \
    | perl -wpe 'chomp; ($s,$e,$cpg,$n,$c,$g,$oE) = split("\t"); $s--; \
        $gc = $c + $g; $pCpG = (100.0 * 2 * $cpg / $n); \
        $pGc = (100.0 * $gc / $n); \
        $_ = "'$chr'\t$s\t$e\tCpG: $cpg\t$n\t$cpg\t$gc\t" . \
             "$pCpG\t$pGc\t$oE\n";' \
    >> cpgIslandGgfAndy.bed

UCSC runs this on masked sequence for mammals so that Alus (especially in human) are not tagged as CpG islands. However, in chicken we use unmasked sequence.

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

----- Original Message -----
From: "Shan Yang"
To:
Sent: Tuesday, November 07, 2006 5:31 PM
Subject: [Genome] Program to calculate AL CpG Islands available?

> Hi,
>
> I have some DNA sequences and want to know Andy Law CpG islands in
> them. The CpG software on the Genome Browser only gives me the
> conventional CpG island, not AL CpG island. Is the program detecting
> AL CpG island available anywhere?
>
> Thanks a lot!
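For readers who find the inline perl in Donna's pipeline hard to follow, here is a Python sketch of the same per-line conversion. It is a restatement for illustration, not UCSC's code: the function name is hypothetical, the field meanings (start, end, CpG count, length, C count, G count, observed/expected ratio) are inferred from the perl variable names, the counts are assumed to be integers, and floating-point values may print with different formatting than perl's. The sample numbers are made up.

```python
def cpg_row_to_bed(line: str, chrom: str) -> list:
    """Convert one tab-separated line of the CpG script output to table fields.

    Mirrors the perl one-liner: decrement the start, add C and G counts,
    and compute percent CpG and percent GC over the island length.
    """
    s, e, cpg, n, c, g, o_e = line.rstrip("\n").split("\t")
    start = int(s) - 1                 # perl: $s--
    cpg_n = int(cpg)
    length = int(n)
    gc = int(c) + int(g)               # perl: $gc = $c + $g
    per_cpg = 100.0 * 2 * cpg_n / length
    per_gc = 100.0 * gc / length
    return [chrom, start, int(e), f"CpG: {cpg_n}", length, cpg_n, gc,
            per_cpg, per_gc, o_e]


if __name__ == "__main__":
    # Made-up input line: start, end, cpg, length, c, g, obs/exp.
    fields = cpg_row_to_bed("101\t300\t20\t200\t60\t70\t0.85", "chr1")
    print("\t".join(str(x) for x in fields))
```

The chromosome name is passed in explicitly because the perl command takes it from a shell variable (`$chr`) set outside the snippet.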
> > Shan Yang, PhD > Genome Biology Division, L-441 > Biosciences Directorate > Lawrence Livermore National Laboratory > 7000 East Ave, Livermore, CA, 94550 > > Ph: 925-422-7389 > Fax: 925-422-2099 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Wed Nov 8 16:54:35 2006 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Wed, 08 Nov 2006 16:54:35 -0800 Subject: [Genome] Genesorter table export - Errors in column headers In-Reply-To: References: Message-ID: <45527C4B.4050304@cse.ucsc.edu> Hi Anil, Thanks for pointing this out --It's fixed on our test site now: http://genome-test.cse.ucsc.edu/ and the fix will eventually percolate up to our main site, though it may take a week or so. Please try it out on genome-test in the meantime, and let us know if you encounter any more problems. Keep in mind that this is our test server, and data here has not gone through our rigorous QA process. But you are welcome to use it if it will be of use to you in the interim. Kayla Smith UCSC Genome Bioinformatics Group Anil Jegga wrote: > Hi > > When I download the gene list (using the "text") after querying from > the GeneSorter, the resultant table has the column headers wrongly > placed (there are extra tabs in between seome column headers). I am > attaching a sample file for your reference. This happens especially when > I add additional columns to the table (using the "configure" option. If > I download with the default options the column headers are fine. 
> > Thanks > Anil > > Anil Jegga > Assistant Professor > Department of Pediatrics and Division of Biomedical Informatics > Cincinnati Children's Hospital Medical Center and University of > Cincinnati > Tel: (513)-636-0261 > Fax: (513)-636-2056 > http://anil.cchmc.org From ebr26 at student.canterbury.ac.nz Wed Nov 8 21:35:37 2006 From: ebr26 at student.canterbury.ac.nz (Emmanuel Buschiazzo) Date: Thu, 09 Nov 2006 18:35:37 +1300 Subject: [Genome] Net data Message-ID: <4552BE29.80108@student.canterbury.ac.nz> Hello, I would be interested to use and visualize data from Human Net obtained from regions of mammalian genomes homologous to a 2 Mb region of the human genome. Do you know of any program available to analize and especially visualize chromosomal rearrangements from data obtained in Tables? Best regards, Emmanuel Buschiazzo. From GILADI at hadassah.org.il Thu Nov 9 00:35:37 2006 From: GILADI at hadassah.org.il (????? ???? - Giladi Hilla) Date: Thu, 9 Nov 2006 10:35:37 +0200 Subject: [Genome] Transcription Factor Binding Site Message-ID: <9382027D01ABA04C92708FF63D4EFB210168F180@EXCHANGE.DOM.HADASSAH.ORG.IL> Dear Sir, I am interested in the human miR-122 microRNA region on chromosome 18. When I searched this region with the UCSC Browser dated May 2004 for transcription factor binding sites, I found a consensus for the binding of the FoxO1, FoxO3 and FoxO4 transcription factors, located approximately 920 bp upstream of miR-122. When I repeated the search with the March 2006 version of the UCSC Browser, these putative TFB sites no longer appear. I did compare the published consensus binding site (there are several which differ at one position) to the putative one upstream of miR-122 and the match seems good. I would appreciate it if you explain to me what are the guidelines that determine when these sites appear in the browser. 
Thank you very much in advance

Hilla Giladi

Hilla Giladi Ph.D
Goldyne Savad Institute of Gene Therapy
Hadassah University Hospital
P.O.Box 12000, Jerusalem, Israel
Tel: 972-2-6777998
Fax: 972-2-6430982

From ann at soe.ucsc.edu Thu Nov 9 11:34:02 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Thu, 09 Nov 2006 11:34:02 -0800
Subject: [Genome] Net data
In-Reply-To: <4552BE29.80108@student.canterbury.ac.nz>
References: <4552BE29.80108@student.canterbury.ac.nz>
Message-ID: <455382AA.9090403@cse.ucsc.edu>

Hello Emmanuel,

The best way to visualize the Net track is to use the UCSC Genome Browser. Most of the tables in the Table Browser are displayed in the Browser. Follow these steps to see the Net track in your 2 Mb region of interest in the human genome:

1. Open the Genome Browser to the latest Human assembly:
   http://genome.ucsc.edu/cgi-bin/hgGateway
   From this gateway page, choose: vertebrate, human, Mar 2006, then enter your region of interest into the position box. Press 'submit'.

2. Press the 'hide all' button to hide all of the annotation tracks on the human assembly.

3. From the track controls, find the mammalian Net tracks of interest to you (e.g. chimp, mouse, rat, cow, etc.). Under each Net track you are interested in visualizing, change the visibility from 'hide' to 'full'. When you have changed the visibility of all of the Net tracks you are interested in, press the "refresh" button in the Genome Browser just under the main track display (not the 'refresh' button in your Internet Browser).

All of your Net tracks should now be open in your area of interest on the human assembly.
Another track that may be of interest to you is the Conservation track (located near the Net tracks in the controls). I hope this has answered your question. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Emmanuel Buschiazzo wrote: > Hello, > > I would be interested to use and visualize data from Human Net obtained > from regions of mammalian genomes homologous to a 2 Mb region of the > human genome. > > Do you know of any program available to analize and especially visualize > chromosomal rearrangements from data obtained in Tables? > > Best regards, > > Emmanuel Buschiazzo. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From liujin at email.chop.edu Thu Nov 9 11:16:12 2006 From: liujin at email.chop.edu (Jinglan Liu) Date: Thu, 09 Nov 2006 14:16:12 -0500 Subject: [Genome] fosmid clones Message-ID: Where can I purchase the fosmid clones listed on your browser? CHORI seems not to provide them. Thanks! Jinglan Liu, Ph.D. Postdoctoral Fellow in the laboratory of Dr. Ian Krantz Division of Human and Molecular Genetics The Children's Hospital of Philadelphia 1012 Abramson Research Center 3615 Civic Center Boulevard Philadelphia, PA 19104-4318 Phone (267) 426-0086 FAX (215) 590-3850 From ortend at boystown.org Thu Nov 9 12:36:54 2006 From: ortend at boystown.org (Orten, Dana J) Date: Thu, 9 Nov 2006 14:36:54 -0600 Subject: [Genome] pdf file Message-ID: How do I show the chromosome and position in the pdf or ps file? -- Dana Jo Orten, Ph.D. ortend at boystown.org Center for Hereditary Communication Disorders Boys Town National Research Hospital 555 N. 30th Street, Omaha, NE 68131 402-498-6698 Fax: 402-498-6331 I praise you because I am fearfully and wonderfully made. 
(Psalms 139:14)

From Pierre.Paradis at ircm.qc.ca Thu Nov 9 12:14:46 2006
From: Pierre.Paradis at ircm.qc.ca (Paradis Pierre)
Date: Thu, 9 Nov 2006 15:14:46 -0500
Subject: [Genome] exon annotation
Message-ID: <7377A2B0C8A794469B4DFAD80FEA8D3B3BE883@pandore.ircm.priv>

We have found a possible annotation error for the human and mouse GATA4 gene. In Ensembl, we have observed 7 exons while we observe only 6 in UCSC. However, if we blat the cDNA in UCSC, then the extra exon appears.

Sincerely,
Pierre

Pierre Paradis, Ph.D.,
Chercheur associé chevronné,
Laboratoire de Développement et Différenciation Cardiaques,
Institut de Recherches Cliniques de Montréal (IRCM),
110 des Pins ouest, Montréal, Québec
Canada, H2W 1R7
Tel: 514-987-5658 / Fax: 514-987-5575
pierre.paradis at ircm.qc.ca

From ann at soe.ucsc.edu Thu Nov 9 13:52:48 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Thu, 09 Nov 2006 13:52:48 -0800
Subject: [Genome] Transcription Factor Binding Site
In-Reply-To: <9382027D01ABA04C92708FF63D4EFB210168F180@EXCHANGE.DOM.HADASSAH.ORG.IL>
References: <9382027D01ABA04C92708FF63D4EFB210168F180@EXCHANGE.DOM.HADASSAH.ORG.IL>
Message-ID: <4553A330.1050602@cse.ucsc.edu>

Hello Hilla,

I see what you are looking at: the three binding sites in the TFBS track on hg17 did not make it into the track in the hg18 (March 2006) assembly. I checked with the developer of the TFBS track and you have found an error in the way the data were processed. We apologize for any inconvenience. We will be recreating this track and will make it available on the public server as soon as possible. Rest assured that those binding sites that are currently listed in the hg18 TFBS track are all correct.
There are, however, some binding sites that are missing from the track (three of which you found). When we recreate the track, your missing binding sites will reappear. The TFBS track on hg17 is correct and complete. Thanks for bringing this to our attention. We will send you an email when the track has been updated. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu ????? ???? - Giladi Hilla wrote: > > > Dear Sir, > > I am interested in the human miR-122 microRNA region on chromosome 18. > > > > When I searched this region with the UCSC Browser dated May 2004 for transcription factor binding sites, I found a consensus for the binding of the FoxO1, FoxO3 and FoxO4 transcription factors, located approximately 920 bp upstream of miR-122. > > > > When I repeated the search with the March 2006 version of the UCSC Browser, these putative TFB sites no longer appear. > > > > I did compare the published consensus binding site (there are several which differ at one position) to the putative one upstream of miR-122 and the match seems good. > > > > I would appreciate it if you explain to me what are the guidelines that determine when these sites appear in the browser. > > > > Thank You very much in advance > > Hilla Giladi > > > > Hilla Giladi Ph.D > Goldyne Savad Institute of Gene Therapy > Hadassah University Hospital > P.O.Box 12000, Jerusalem, Israel > Tel: 972-2-6777998 > Fax: 972-2-6430982 > > > > > > > ************************************************************************************ > This footnote confirms that this email message has been scanned by > PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses. 
> ************************************************************************************

From fanhsu at soe.ucsc.edu Thu Nov 9 14:00:15 2006
From: fanhsu at soe.ucsc.edu (Fan Hsu)
Date: Thu, 9 Nov 2006 14:00:15 -0800
Subject: [Genome] exon annotation
In-Reply-To: <7377A2B0C8A794469B4DFAD80FEA8D3B3BE883@pandore.ircm.priv>
Message-ID:

Hi Pierre,

Which genome release(s) were you using? I looked at the latest release, hg18 (March 2006), and the previous release, hg17 (May 2004); both seem to have the 7-exon GATA4 in our UCSC Known Genes track.

Fan.

-----Original Message-----
From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] On Behalf Of Paradis Pierre
Sent: Thursday, November 09, 2006 12:15 PM
To: genome at soe.ucsc.edu
Subject: [Genome] exon annotation

We have found a possible annotation error for the human and mouse GATA4 gene. In Ensembl, we have observed 7 exons while we observe only 6 in UCSC. However, if we blat the cDNA in UCSC, then the extra exon appears.

Sincerely,
Pierre

Pierre Paradis, Ph.D.,
Chercheur associé chevronné,
Laboratoire de Développement et Différenciation Cardiaques,
Institut de Recherches Cliniques de Montréal (IRCM),
110 des Pins ouest, Montréal, Québec
Canada, H2W 1R7
Tel: 514-987-5658 / Fax: 514-987-5575
pierre.paradis at ircm.qc.ca

From ann at soe.ucsc.edu Thu Nov 9 14:07:40 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Thu, 09 Nov 2006 14:07:40 -0800
Subject: [Genome] pdf file
In-Reply-To:
References:
Message-ID: <4553A6AC.6090400@cse.ucsc.edu>

Hello Dana,

First you must display the chromosome and position in the Genome Browser; they will then appear automatically in the PDF/PS file.

To display the chromosome and position in the Browser, edit the configuration of the Base Position track. This is the track that displays at the very top of the set of tracks in the browser image. Click on the "Base Position" track control to go to the configuration page. On the configuration page, check the box labeled "Display: position". Press the 'submit' button to return to the browser. This will display the chromosomal position at the top of the browser image.

Now, when you choose PDF/PS, the chromosome and position will print to the file.

Hope this is helpful. Be sure to let us know if you have other questions.

Regards,
------------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Orten, Dana J wrote:
> How do I show the chromosome and position in the pdf or ps file?
>
> --
> Dana Jo Orten, Ph.D. ortend at boystown.org
> Center for Hereditary Communication Disorders
> Boys Town National Research Hospital
> 555 N. 30th Street, Omaha, NE 68131
> 402-498-6698 Fax: 402-498-6331
>
> I praise you because I am fearfully and wonderfully made.
> (Psalms 139:14) > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From hiram at soe.ucsc.edu Thu Nov 9 14:08:29 2006 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 09 Nov 2006 14:08:29 -0800 Subject: [Genome] pdf file In-Reply-To: References: Message-ID: <4553A6DD.5030700@soe.ucsc.edu> Good Afternoon Dana: You can use the options on the base track to get these items displayed on the pdf image. Use the (grey) control button to the very left of the base position track display, or the label on the base position track control to get into the options for the base position track. --Hiram Orten, Dana J wrote: > How do I show the chromosome and position in the pdf or ps file? > > -- > Dana Jo Orten, Ph.D. ortend at boystown.org > Center for Hereditary Communication Disorders > Boys Town National Research Hospital > 555 N. 30th Street, Omaha, NE 68131 > 402-498-6698 Fax: 402-498-6331 From ann at soe.ucsc.edu Thu Nov 9 14:59:05 2006 From: ann at soe.ucsc.edu (Ann Zweig) Date: Thu, 09 Nov 2006 14:59:05 -0800 Subject: [Genome] fosmid clones In-Reply-To: References: Message-ID: <4553B2B9.4070306@cse.ucsc.edu> Hello Jinglan Liu, We don't maintain a clone library here, however I searched the NCBI websi