From gulban at sickkids.ca Fri Dec 1 12:18:16 2006
From: gulban at sickkids.ca (omid gulban)
Date: Fri, 1 Dec 2006 12:18:16 -0800
Subject: [Genome] help with a gene list
Message-ID: <000601c71585$d924cdd0$1ecf148e@ViewSonic>

Hello All,

I am a new user of the UCSC genome browser system.

I would like to obtain a tab-delimited file containing the following
information from the most recent Human genome.

gene name
other aliases
chromosome
start
end
strand

Thank You
Omid

From kate at soe.ucsc.edu Fri Dec 1 11:43:00 2006
From: kate at soe.ucsc.edu (Kate Rosenbloom)
Date: Fri, 1 Dec 2006 11:43:00 -0800
Subject: [Genome] D.melanogaster browser not loading?
In-Reply-To: <456F091B.10904@cbio.mskcc.org>
References: <456F091B.10904@cbio.mskcc.org>
Message-ID:

Hello John,

The problem you are encountering is due to corruption of your custom
track file on our file server. We are investigating the cause.
It would be helpful to know if you were at one time able to view
this track (chr2R_alignFrags). Did it successfully load initially,
and did the problem occur only later? Also, how did you upload the
data -- via file upload or from a URL? Please respond to me directly,
off the list.

To clear the problem with your browser, you will need to reset it
(use the "Click here to reset" link on the DM Gateway page).
Then reload your custom tracks.

Cheers,

Kate

---
Kate Rosenbloom
UCSC Genome Bioinformatics

On Nov 30, 2006, at 8:38 AM, John Major wrote:

> Hello, I am trying to load the DM genome browser, but have been getting
> this error message for the last day or so:
>
> Expecting at least 6 words line 645157 of
> ../trash/ct/ct_genome_7b4a_dca7f0.bed got 3
>
> Is this an internal problem, or something problematic with my browser?
>
> Thanks,
> John
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From ann at soe.ucsc.edu Fri Dec 1 11:54:37 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Dec 2006 11:54:37 -0800
Subject: [Genome] help with a gene list
In-Reply-To: <000601c71585$d924cdd0$1ecf148e@ViewSonic>
References: <000601c71585$d924cdd0$1ecf148e@ViewSonic>
Message-ID: <4570887D.9020908@soe.ucsc.edu>

Hello Omid,

You can obtain such a tab-delimited file by using the Table Browser on
our website. Press the "Tables" link in the blue navigation bar across
the top of the browser window.

I will give you step-by-step instructions for the first 500,000 bases of
chrX -- you can extrapolate from there. We have several gene annotation
tracks on the browser. I will explain how to use the Known Genes track,
but if you want to use another track, you can change the 'track' and
'table' selections in the instructions.

Configure the Table Browser like so:

genome: Human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: Known Genes
table: knownGene
position: chrX:1-500000
output format: selected fields from primary and related tables

Press the "get output" button.

On the resulting page, choose the fields from the knownGene table that
you would like to view. In your case: name, chrom, strand, either or
both of txStart/cdsStart, and either or both of txEnd/cdsEnd.

This will provide you with all of the information you asked for except
"other aliases". To include other aliases, scroll down this page and
check the box for the table named "kgXref". This table includes other
gene names such as SWISS-PROT, RefSeq, etc. After checking the kgXref
box, scroll to the bottom of the page and press the "Allow Selection
From Checked Tables" button.

Now, in the hg18.kgXref section, select any other names you would like
to see in your output.
Depending on your selections, your output will look something like this:

#hg18.knownGene.name	hg18.knownGene.chrom	hg18.knownGene.strand	hg18.knownGene.txStart	hg18.knownGene.txEnd	hg18.kgXref.mRNA	hg18.kgXref.spID	hg18.kgXref.spDisplayID	hg18.kgXref.geneSymbol	hg18.kgXref.refseq
NM_018390	chrX	+	132991	160020	NM_018390	Q9NUJ7	Q9NUJ7_HUMAN	PLCXD1	NM_018390
NM_199326	chrX	-	214971	222590	NM_199326	Q96H01	Q96H01_HUMAN	PPP2R3B	NM_199326
NM_013239	chrX	-	214971	267627	BC063429,NM_013239,	Q9Y5P8,Q9Y5P8,	2ACC_HUMAN,2ACC_HUMAN,	PPP2R3B,PPP2R3B,	NM_013239,NM_013239,
BC063429	chrX	-	214975	267445	BC063429	Q96FD8	Q96FD8_HUMAN	PPP2R3B	NM_013239
BC063429	chrX	-	214975	267445	BC063429,NM_013239,	Q9Y5P8,Q9Y5P8,	2ACC_HUMAN,2ACC_HUMAN,	PPP2R3B,PPP2R3B,	NM_013239,NM_013239,

I hope this helps you get started using the UCSC Genome Browser. Please
don't hesitate to write back if you need more guidance.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

omid gulban wrote:
> Hello All,
>
> I am a new user of the UCSC genome browser system.
>
> I would like to obtain a tab-delimited file containing the following
> information from the most recent Human genome.
>
> gene name
> other aliases
> chromosome
> start
> end
> strand
>
> Thank You
> Omid

From rhead at soe.ucsc.edu Fri Dec 1 15:11:56 2006
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Fri, 01 Dec 2006 15:11:56 -0800
Subject: [Genome] question about BLAT
In-Reply-To:
References: <456F8AA8.2090801@soe.ucsc.edu>
Message-ID: <4570B6BC.9060706@soe.ucsc.edu>

Hi Bill,

I think I understand now. You can see the thick sections of your BLAT
alignment on the website (which correspond to exons, in your case), but
you would like to get the actual genomic coordinates of the starts and
ends of these sections.
One way to get these coordinates is to choose "psl" as the output type
on the BLAT page. Instead of hyperlinks to the Genome Browser, this
option will return results in psl format, which you can read about here:
http://genome.ucsc.edu/FAQ/FAQformat#format2 . You can determine the
genomic coordinates from the "tStarts" and "blockSizes" fields. Note
that some fields in psl format are treated differently for alignments on
negative strands (explained in the link above). Also note that the
coordinates in psl files have a zero-based start and a one-based end.
This means that you need to add one to the tStart coordinate to get the
same number that is in the Genome Browser display (see further
explanation on this here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 ).
For example, a block with tStart 999 and blockSize 100 is displayed in
the browser as bases 1000-1099.

You can also find the start and end coordinates for alignments by
looking at the BLAT alignment details page (although this may not be
very useful to you if you have a lot of results to examine). If you do
a BLAT search with the "hyperlink" output type and then click on the
BLATed item in the display, you will see the list of successful
alignments. Click on the first link to see the details of the displayed
alignment. Note that the coordinates of each block are displayed in the
Side by Side Alignment section. (Unlike the psl format coordinates,
these coordinates are regular Genome Browser display coordinates, and
have a 1-based start and a 1-based end.)

I hope this information helps. If you have further questions, please do
not hesitate to ask.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

William Hastings wrote:
> Hi Brooke,
>
> I think I partially figured out the answer to my question. Basically I
> am looking at possible splice variants of a gene, performed RT PCR with
> primers in the 3' and 5' UTR to see if different products would result.
> This was the case; we sequenced the bands and used BLAT (from your
> website) to align.
> They do align with different sequences of the target
> gene, but we weren't sure how to determine which exons are contained in
> the splice variants. However, we realized the exons are shown in the
> alignment results. The problem we still have is figuring out where on
> the mRNA sequence the exons are, i.e. at which nucleotide does exon 1
> start and end, etc.
>
> I'm fairly new to these types of analyses, so any info you could give
> would be great.
>
> Thanks again,
> Bill
>
>> Hi Bill,
>>
>> Can you give me some more information about what you would like to do?
>> Do you mean that you are using BLAT to align genomic sequences to an
>> assembly, and you wish to know where the exons occur in your BLATed
>> sequence? If so, which assembly are you using, and which gene track
>> in that assembly would you be interested in using to determine exon
>> locations (e.g., Known Genes, RefSeq Genes, etc.)? Also, are you using
>> the web-based BLAT from our website, or have you downloaded
>> command-line BLAT?
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> William Hastings wrote:
>>> Hello,
>>>
>>> In our alignment we would like to determine the sequences of the
>>> exons. Is there an easy way to do this?
>>>
>>> Thanks very much,
>>>
>>> Bill Hastings
>
>

From archie_russell at merck.com Fri Dec 1 15:43:24 2006
From: archie_russell at merck.com (Russell, Archie)
Date: Fri, 1 Dec 2006 15:43:24 -0800
Subject: [Genome] Per-species conservation
Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7D8@ussemx1100.merck.com>

Hi,

Could you tell me how to get the data used to generate the
"conservation" tracks that relate human to each other species? I can
use hgWiggle to get the phastcons17way data (0 - 1 range), which seems
to be equivalent to the large portion of the "Vertebrate Multiz
Alignment and Conservation (17 species)" track, but I can't figure out
how to access the data that makes the little subtracks below this that
correspond to individual species.

Thanks,
Archie

Archie Russell
Rosetta Inpharmatics
206-802-6312

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates (which may be known
outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
and in Japan, as Banyu - direct contact information for affiliates is
available at http://www.merck.com/contact/contacts.html) that may be
confidential, proprietary copyrighted and/or legally privileged. It is
intended solely for the use of the individual or entity named on this
message. If you are not the intended recipient, and have received this
message in error, please notify us immediately by reply e-mail and then
delete it from your system.
------------------------------------------------------------------------------

From ann at soe.ucsc.edu Fri Dec 1 16:02:39 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Dec 2006 16:02:39 -0800
Subject: [Genome] Per-species conservation
In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7D8@ussemx1100.merck.com>
References: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7D8@ussemx1100.merck.com>
Message-ID: <4570C29F.30802@soe.ucsc.edu>

Hi Archie,

What you are looking for are the multiz files. You can find them on the
download server here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/

Read about the maf file type here:
http://genome.ucsc.edu/goldenPath/help/maf.html

Let us know if you need more details.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Russell, Archie wrote:
> Hi,
>
> Could you tell me how to get the data used to generate the
> "conservation" tracks that relate human to each other species?
> I can use hgWiggle to get the phastcons17way data (0 - 1 range), which
> seems to be equivalent to the large portion of the "Vertebrate Multiz
> Alignment and Conservation (17 species)" track, but I can't figure out
> how to access the data that makes the little subtracks below this that
> correspond to individual species.
>
> Thanks,
> Archie
>
> Archie Russell
> Rosetta Inpharmatics
> 206-802-6312

From miranda at medicine.tamhsc.edu Sat Dec 2 09:10:08 2006
From: miranda at medicine.tamhsc.edu (Rajesh Miranda)
Date: Sat, 02 Dec 2006 11:10:08 -0600
Subject: [Genome] permission to use a figure in a paper
Message-ID:

Hello,
I used the UCSC genome browser to scan for microRNA targets in a gene,
and exported the image in pdf format. I'd like to adapt the image for a
manuscript, and wanted to know if that was permitted, and if so, how I
should cite the Genome browser site?
Thanks in advance
Rajesh Miranda

Rajesh C. Miranda, Ph.D.
Associate Professor
Texas A&M University System HSC
Dept. Neuroscience and Experimental Therapeutics
142C Reynolds Medical Bldg
College Station, TX 77843-1114
phone: 979-862-3418
fax: 979-845-0790
http://recovery.tamu.edu/

From dominik.margraf at gmail.com Sat Dec 2 16:29:00 2006
From: dominik.margraf at gmail.com (Dominik Margraf)
Date: Sun, 3 Dec 2006 13:29:00 +1300
Subject: [Genome] the meanings of upper and lower cases alphabets in 3'UTR regions of results from the galaxy browser
Message-ID: <409337ba0612021629m1ee3aba0y22f877e91956fb42@mail.gmail.com>

I got data from the galaxy browser from the hg17 track 3' untranslated
region exons. However the sequence I get contains bases in both upper
and lower cases.

What do the upper case and lower case mean?

Thanks!

Dominik

From hartera at soe.ucsc.edu Sat Dec 2 18:02:09 2006
From: hartera at soe.ucsc.edu (Rachel Harte)
Date: Sat, 2 Dec 2006 18:02:09 -0800 (PST)
Subject: [Genome] the meanings of upper and lower cases alphabets in 3'UTR regions of results from the galaxy browser
In-Reply-To: <409337ba0612021629m1ee3aba0y22f877e91956fb42@mail.gmail.com>
References: <409337ba0612021629m1ee3aba0y22f877e91956fb42@mail.gmail.com>
Message-ID:

Hello Dominik,

We mask repeats in the genome sequence and those bases that are masked
are in lower case. The sequence in upper case is unmasked sequence (no
repeats identified). Repeats are found with the RepeatMasker program
using repeat libraries from RepBase, and also with the Tandem Repeat
Finder (TRF) program. For the repeats found by TRF, we mask out only
repeats of period 12 or less.

I hope that this helps you. Please let us know if you have further
questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Sun, 3 Dec 2006, Dominik Margraf wrote:

> I got data from the galaxy browser from the hg17 track 3' untranslated
> region exons. However the sequence I get contains bases in both upper
> and lower cases.
>
> What do the upper case and lower case mean?
>
> Thanks!
>
> Dominik
>

From dominik.margraf at gmail.com Sat Dec 2 18:46:00 2006
From: dominik.margraf at gmail.com (Dominik Margraf)
Date: Sun, 3 Dec 2006 15:46:00 +1300
Subject: [Genome] the meanings of upper and lower cases alphabets in 3'UTR regions of results from the galaxy browser
In-Reply-To:
References: <409337ba0612021629m1ee3aba0y22f877e91956fb42@mail.gmail.com>
Message-ID: <409337ba0612021846k1d545cc4j3f92d93f40f91332@mail.gmail.com>

Hi!

What do you mean by "repeat" here? Do you mean a sequence that occurs
(is conserved) in more than one place, or a sequence in one entry that
overlaps with another entry?

I am investigating conserved motifs in non-coding RNA and downloaded
all the 3'UTRs. Some of these sequences contain lowercase sections,
e.g.

GACAGAAATCAGTAatatttatatAGT....

Should I ignore the lowercase parts, or treat them the same as the
uppercase parts?

Thanks!

Dominik


2006/12/3, Rachel Harte :
> Hello Dominik,
>
> We mask repeats in the genome sequence and those bases that are masked are
> in lower case. The sequence in upper case is unmasked sequence (no
> repeats identified). Repeats are found with the RepeatMasker program
> using repeat libraries from RepBase, and also with the Tandem Repeat
> Finder (TRF) program. For the repeats found by TRF, we mask out only repeats
> of period 12 or less.
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On Sun, 3 Dec 2006, Dominik Margraf wrote:
>
> > I got data from the galaxy browser from the hg17 track 3' untranslated
> > region exons. However the sequence I get contains bases in both upper
> > and lower cases.
> >
> > What do the upper case and lower case mean?
> >
> > Thanks!
> >
> > Dominik
> >
>

From hiram at soe.ucsc.edu Sat Dec 2 19:17:38 2006
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Sat, 2 Dec 2006 19:17:38 -0800
Subject: [Genome] the meanings of upper and lower cases alphabets in 3'UTR regions of results from the galaxy browser
In-Reply-To: <409337ba0612021846k1d545cc4j3f92d93f40f91332@mail.gmail.com>
References: <409337ba0612021629m1ee3aba0y22f877e91956fb42@mail.gmail.com> <409337ba0612021846k1d545cc4j3f92d93f40f91332@mail.gmail.com>
Message-ID: <180ee4cf5b1a39533de32d9fb0179848@soe.ucsc.edu>

Good Evening Dominik:

You can read about Repeat Masker and what it is marking at:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=rmsk
http://www.repeatmasker.org/
which uses RepBase:
http://www.girinst.org/repbase/update/index.html

Depending on the purpose of your investigation, you can ignore the
repeat masking.

--Hiram

On 2006 Dec 02, at 6:46 PM, Dominik Margraf wrote:

> Hi!
>
> What do you mean by "repeat" here? Do you mean a sequence that occurs
> (is conserved) in more than one place, or a sequence in one entry that
> overlaps with another entry?
>
> I am investigating conserved motifs in non-coding RNA and downloaded
> all the 3'UTRs. Some of these sequences contain lowercase sections,
> e.g.
>
> GACAGAAATCAGTAatatttatatAGT....
>
> Should I ignore the lowercase parts, or treat them the same as the
> uppercase parts?
>
> Thanks!
>
> Dominik
>
>
> 2006/12/3, Rachel Harte :
>> Hello Dominik,
>>
>> We mask repeats in the genome sequence and those bases that are
>> masked are
>> in lower case. The sequence in upper case is unmasked sequence (no
>> repeats identified).
>> Repeats are found with the RepeatMasker program
>> using repeat libraries from RepBase, and also with the Tandem Repeat
>> Finder (TRF) program. For the repeats found by TRF, we mask out only
>> repeats
>> of period 12 or less.
>>
>> I hope that this helps you. Please let us know if you have further
>> questions.
>>
>> Rachel
>>
>> Rachel Harte
>> UCSC Genome Bioinformatics Group
>> http://genome.ucsc.edu
>>
>>
>> On Sun, 3 Dec 2006, Dominik Margraf wrote:
>>
>>> I got data from the galaxy browser from the hg17 track 3'
>>> untranslated
>>> region exons. However the sequence I get contains bases in both
>>> upper
>>> and lower cases.
>>>
>>> What do the upper case and lower case mean?
>>>
>>> Thanks!
>>>
>>> Dominik
>>>
>>
>

From donnak at soe.ucsc.edu Sat Dec 2 22:01:44 2006
From: donnak at soe.ucsc.edu (Donna Karolchik)
Date: Sat, 2 Dec 2006 22:01:44 -0800
Subject: [Genome] permission to use a figure in a paper
References:
Message-ID: <014b01c716a0$aef3f330$6401a8c0@donnakLT>

hi Rajesh,

You are quite welcome to use Genome Browser images in your manuscript.
You'll find citation information on our website at
http://genome.ucsc.edu/cite.html (see the section entitled "Screen
shots").

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

----- Original Message -----
From: "Rajesh Miranda"
To:
Sent: Saturday, December 02, 2006 9:10 AM
Subject: Re: [Genome] permission to use a figure in a paper

> Hello,
> I used the UCSC genome browser to scan for microRNA targets in a gene, and
> exported the image in pdf format.
> I'd like to adapt the image for a
> manuscript, and wanted to know if that was permitted, and if so, how I should
> cite the Genome browser site?
> Thanks in advance
> Rajesh Miranda
>
> Rajesh C. Miranda, Ph.D.
> Associate Professor
> Texas A&M University System HSC
> Dept. Neuroscience and Experimental Therapeutics
> 142C Reynolds Medical Bldg
> College Station, TX 77843-1114
> phone: 979-862-3418
> fax: 979-845-0790
> http://recovery.tamu.edu/
>

From donnak at soe.ucsc.edu Sun Dec 3 09:00:02 2006
From: donnak at soe.ucsc.edu (Donna Karolchik)
Date: Sun, 3 Dec 2006 09:00:02 -0800
Subject: [Genome] possible errors in web pages.
References: <20061129185551.98169.qmail@web33201.mail.mud.yahoo.com>
Message-ID: <001501c716fc$7cfddc90$6401a8c0@donnakLT>

Hello again, Yutao,

I discussed this issue with the person who generated this annotation
track. It looks like he inadvertently deleted a portion of the
description text when he last edited the page, which explains why the
text no longer makes sense. We should have a corrected description up on
our public site sometime in the next few days.

Thanks for finding and reporting this problem!

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

----- Original Message -----
From: "Yutao Fu"
To:
Sent: Wednesday, November 29, 2006 10:55 AM
Subject: [Genome] possible errors in web pages.

> Hi,
> I was browsing this page:
> http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=80785264&c=chrX&g=tfbsConsSites
> It seems several formulae in italic fonts are misplaced or misrepresented. For
> example the 2nd formula summed over j while it should be i as specified in the
> text. Same problem exist for the 3rd one. The last formula is not for Z-score
> calculation.
> Could you please clarify on these?
> Thanks
> Yutao Fu
> Bioinformatics Program, Boston University
>

From archie_russell at merck.com Sun Dec 3 10:25:06 2006
From: archie_russell at merck.com (Russell, Archie)
Date: Sun, 3 Dec 2006 10:25:06 -0800
Subject: [Genome] Conservation scores in an mRNA coordinate system
In-Reply-To: <001501c716fc$7cfddc90$6401a8c0@donnakLT>
Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7E6@ussemx1100.merck.com>

Hi,

I have some mRNA sequences for which I want the "conservation" scores
that are used in the Conservation wiggle track. I want the conservation
scores for each base of the mRNA sequence, so it seems like this would
necessitate translating from genomic coordinates into mRNA coordinates.
Many of these mRNAs are not in the browser currently, so they would
probably need to be aligned using BLAT as a first step.

What would be the best way to go about this?

Thanks,
Archie

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates (which may be known
outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
and in Japan, as Banyu - direct contact information for affiliates is
available at http://www.merck.com/contact/contacts.html) that may be
confidential, proprietary copyrighted and/or legally privileged. It is
intended solely for the use of the individual or entity named on this
message.
If you are not the intended recipient, and have received this
message in error, please notify us immediately by reply e-mail and then
delete it from your system.
------------------------------------------------------------------------------

From hiram at soe.ucsc.edu Sun Dec 3 18:45:06 2006
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Sun, 3 Dec 2006 18:45:06 -0800
Subject: [Genome] Conservation scores in an mRNA coordinate system
In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7E6@ussemx1100.merck.com>
References: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DE7E6@ussemx1100.merck.com>
Message-ID: <9ffe61cd9cd85e285616eb4126124897@soe.ucsc.edu>

On 2006 Dec 03, at 10:25 AM, Russell, Archie wrote:
> I have some mRNA sequences for which I want the "conservation" scores
> that are used in the Conservation wiggle track. I want the
> conservation
> scores for each base of the mRNA sequence, so it seems like this would
> necessitate translating from genomic coordinates into mRNA coordinates.
> Many of these mRNAs are not in the browser currently so they would
> probably need to be aligned using BLAT as a first step.
>
> What would be the best way to go about this?

Good Evening Archie:

From our genbank mRNA alignment procedure, the following sequence of
alignments and filters will produce a .psl file you can load as a track.

    blat -noHead -repeats=lower -ooc=hg18.ooc -q=rna -fine \
        hg18.2bit mrna.fa mrna.rawPsl
    faPolyASizes mrna.fa mrna.polya
    sort -k 10,10 mrna.rawPsl | pslCDnaFilter -minId=0.96 \
        -minCover=0.25 -localNearBest=0.005 -minQSize=20 -minNonRepSize=16 \
        -ignoreNs -bestOverlap -polyASizes=mrna.polya stdin mrna.psl

We are curious. What's the story with mRNAs that aren't in genbank
and already on our browser?
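
Once a filtered psl file such as mrna.psl exists, one way to carry
per-base genomic scores over to mRNA coordinates is to walk the psl
blocks. What follows is only a rough Python sketch, not a UCSC tool: the
function names and the genome_score dictionary (0-based genomic position
mapped to a score, which would have to be built separately, e.g. from
hgWiggle output for the aligned chromosome) are hypothetical, and it
assumes a headerless nucleotide psl line with a one-character strand.

```python
def psl_blocks(psl_line):
    """Parse one headerless PSL line into (strand, q_size, blocks).

    Each block is (q_start, t_start, size), with 0-based starts taken
    directly from the qStarts, tStarts and blockSizes columns.
    """
    f = psl_line.rstrip("\n").split("\t")
    strand = f[8]            # column 9: strand of the query
    q_size = int(f[10])      # column 11: full length of the mRNA
    sizes = [int(x) for x in f[18].rstrip(",").split(",")]
    q_starts = [int(x) for x in f[19].rstrip(",").split(",")]
    t_starts = [int(x) for x in f[20].rstrip(",").split(",")]
    return strand, q_size, list(zip(q_starts, t_starts, sizes))


def mrna_scores(psl_line, genome_score):
    """Return one score per mRNA base (None where nothing is available).

    genome_score is a hypothetical dict: 0-based genomic position on the
    aligned chromosome -> score. Unaligned mRNA bases stay None.
    """
    strand, q_size, blocks = psl_blocks(psl_line)
    scores = [None] * q_size
    for q_start, t_start, size in blocks:
        for i in range(size):
            q = q_start + i
            if strand[0] == "-":
                # On the minus strand, qStarts count from the end of the
                # reverse-complemented query, so flip back to mRNA order.
                q = q_size - 1 - q
            scores[q] = genome_score.get(t_start + i)
    return scores
```

The result is indexed by 0-based mRNA position; add one to an index to
get the base number as usually written. If an mRNA has several
alignments left after filtering, each psl line would need to be handled
separately.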

--Hiram

From yaelal at md.huji.ac.il Sun Dec 3 23:00:19 2006
From: yaelal at md.huji.ac.il (Yael Altuvia)
Date: Mon, 04 Dec 2006 09:00:19 +0200
Subject: [Genome] retrieving genomic information by NP_ accessions
Message-ID: <4573C783.90709@md.huji.ac.il>

Hi,

What would be the best way to get genomic sequences (e.g. 5' UTRs)
for a batch query where the query entries are protAcc accessions,
either NP_ or XP_?

thanks
yael

From hefh at genomics.org.cn Mon Dec 4 04:10:11 2006
From: hefh at genomics.org.cn (hefh)
Date: Mon, 4 Dec 2006 20:10:11 +0800
Subject: [Genome] UCSC database design
Message-ID: <20061204120826.35A288E850@mailgw.soe.ucsc.edu>

Dear Prof.,

The UCSC database is a very helpful and useful database for biologists
researching the genome, transcriptome and proteome.

The UCSC genome browser is also a wonderful and very convenient tool.
But it is very difficult for us to use data from the UCSC database
because there are so many tables.

Understanding the relations among the tables gives us a headache. Would
you mind sending us a diagram of the database design or of the
relations among the tables?

Any help will be very much appreciated!



Best regards!


Yours sincerely,
hefh


Fuhong He
*****************************************************
Beijing Genomics Institute
Chinese Academy of Sciences

Airport Industrial Zone B10
Shunyi, Beijing 101300, P.R. China

Tel: 86-10-80485492 (Office)
Email: hefh at genomics.org.cn
*****************************************************

From k.kok at medgen.umcg.nl Mon Dec 4 01:52:19 2006
From: k.kok at medgen.umcg.nl (Kok, K)
Date: Mon, 4 Dec 2006 10:52:19 +0100
Subject: [Genome] (no subject)
Message-ID:

Dear reader,

As a (BAC) arrayCGH "specialist", I use UCSC a lot, and also have made
my own tracks. Now I have to make a decision whether or not to adjust
my home-made tracks to also display the CNVs as recently published in
Nature. Does the UCSC have any plans to add the CNVs to their list of
default tracks (very much as has been done in Ensembl Cytoview)?

With kind regards,

Klaas Kok


K. Kok

Postal Address:            Visitors Address:
UMCG                       UMCG
Department of Genetics     Oostersingel; entrance 47
P.O. box 30.001            2nd floor, room E2.046
9700RB Groningen
The Netherlands

tel.: +31-50-3617108 (7100)
Fax: +31-50-3617230
Email: k.kok at medgen.umcg.nl
       klakok at hotmail.com



The contents of this message are confidential and only intended for the
eyes of the addressee(s). Others than the addressee(s) are not allowed
to use this message, to make it public or to distribute or multiply this
message in any way. The UMCG cannot be held responsible for incomplete
reception or delay of this transferred message.

From schmidtc at udel.edu Mon Dec 4 07:52:53 2006
From: schmidtc at udel.edu (Carl Schmidt)
Date: Mon, 4 Dec 2006 10:52:53 -0500
Subject: [Genome] chicken chromosome lengths
Message-ID:

For the chicken, why are there differences in the total size of
chromosomes when I compare between UCSC and Ensembl? I suppose it
has something to do with the build, but I am concerned about trying
to use coordinates from one browser to access information in the
other.
For example:
CH1 UCSC 200,994,015 ENSEMBL 188,239,860
CH 27 UCSC:4,841,970 ENSEMBL: 2,668,888



Carl J.
Schmidt
Associate Professor
Department of Animal & Food Sciences
051 Townsend Hall
University of Delaware
Newark, DE 19716-2150

schmidtc at udel.edu
302-831-1334
Fax: 302-831-2822
http://udel.edu/~schmidtc

From kayla at soe.ucsc.edu Mon Dec 4 10:13:25 2006
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 04 Dec 2006 10:13:25 -0800
Subject: [Genome] chicken chromosome lengths
In-Reply-To:
References:
Message-ID: <45746545.6020709@cse.ucsc.edu>

Carl,

You are correct in assuming that the discrepancy you have observed is
related to which build you are looking at. We actually offer two
different assemblies of the Chicken Browser: galGal2 (Feb 2004) and
galGal3 (May 2006). From the data you have pasted below, it looks like
you were viewing our galGal3 browser (this is the default) and comparing
it to Ensembl's equivalent of the galGal2 assembly.

On the gateway page, http://genome.ucsc.edu/cgi-bin/hgGateway, you can
toggle between galGal2 and galGal3. Each assembly has a "sequences" link
which will give you information on the length of each chromosome.

GalGal3:
chr1: 200,994,015
chr27: 4,841,970

GalGal2:
chr1: 188,239,860
chr27: 2,668,888

I hope this helps to answer your question. Please don't hesitate to
contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Carl Schmidt wrote:
> For the chicken, why are there differences in the total size of
> chromosomes when I compare between UCSC and Ensembl? I suppose it
> has something to do with the build, but I am concerned about trying
> to use coordinates in from one browser to access information in the
> other.
> For example:
> CH1 UCSC 200,994,015 ENSEMBL 188,239,860
> CH 27 UCSC:4,841,970 ENSEMBL: 2,668,888
>
>
>
> Carl J.
> Schmidt
> Associate Professor
> Department of Animal & Food Sciences
> 051 Townsend Hall
> University of Delaware
> Newark, DE 19716-2150
>
> schmidtc at udel.edu
> 302-831-1334
> Fax: 302-831-2822
> http://udel.edu/~schmidtc
>

From kayla at soe.ucsc.edu Mon Dec 4 10:31:43 2006
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 04 Dec 2006 10:31:43 -0800
Subject: [Genome] UCSC database design
In-Reply-To: <20061204120826.35A288E850@mailgw.soe.ucsc.edu>
References: <20061204120826.35A288E850@mailgw.soe.ucsc.edu>
Message-ID: <4574698F.3020904@cse.ucsc.edu>

Dear Hefh,

Thank you for your compliments on our genome browser!

Please see a previously answered mailing list question on the schema:
http://www.cse.ucsc.edu/pipermail/genome/2002-July/001131.html

On the details page for any track, there is a "view table schema" link.
This takes you to a page which shows, under the section "Connected
Tables and Joining Fields", which other tables are related to the one
currently in use and via which identifier.

In the Table Browser ("Tables" on the blue bar on the top of the main
page) you can navigate through assemblies and tables to find the one(s)
whose relationships you are interested in. Clicking on the "describe
table schema" button takes you to a page that indicates which
relationships that table has.

These relationships are described in the all.joiner file, which you can
download here:
ftp://hgdownload.cse.ucsc.edu/apache/cgi-bin/all.joiner

I hope this helps to answer your question. Please don't hesitate to
contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

hefh wrote:
> Dear Prof.,
>
> The UCSC database is a very helpful and useful database for biologists
> researching the genome, transcriptome and proteome.
> > UCSC genome browser is also a very wonderful and very convenient tool. But it is very difficult for us to use data from UCSC database due to so many tables. > > Understanding relation among tables made us headache. Would you mind sending us a visible graph of database designing or relation among tables? > > Any help will be very appreciated! > > > > Best regards! > > > Yours sincerely, > hefh > > > Fuhong He > ***************************************************** > Beijing Genomics Institute > Chinese Academy of Sciences > > Airport Industrial Zone B10 > Shunyi, Beijing 101300, P.R. China > > Tel: 86-10-80485492 (Office) > Email: hefh at genomics.org.cn > ***************************************************** > > > ------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Mon Dec 4 10:43:31 2006 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 04 Dec 2006 10:43:31 -0800 Subject: [Genome] (no subject) In-Reply-To: References: Message-ID: <45746C53.8070508@cse.ucsc.edu> Dear Klaas, There is a Structural Variants track on the human hg17 assembly which has 7 subtracks of copy number polymorphisms determined by various methods. I believe this is the track you are looking for. Here is a link to the details page for the Structural Variants track: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg17&g=cnp I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group Kok, K wrote: > Dear reader, > > As a (BAC) arrayCGH "specialist", I use UCSC a lot, and also have made > my own tracks. Now I have to make a decision whether or not to adjust > my home-made tracks to also display the CNV's as recently published in > Nature.
Does the UCSC have any plans to add the CNV's to their list of > default tracks (very much as has been done in Ensembl Cytoview). > > With kind regards, > > Klaas Kok > > > K. Kok > > Postal Address: Visitors Address: > UMCG UMCG > Department of Genetics Oostersingel; entrance 47 > P.O. box 30.001 2nd floor, room E2.046 > 9700RB Groningen > The Netherlands > > tel.: +31-50-3617108 (7100) > Fax: +31-50-3617230 > Email: k.kok at medgen.umcg.nl > klakok at hotmail.com > > > > De inhoud van dit bericht is vertrouwelijk en alleen bestemd voor de geadresseerde(n). Anderen dan de geadresseerde mogen geen gebruik maken van dit bericht, het openbaar maken of op enige wijze verspreiden of vermenigvuldigen. Het UMCG kan niet aansprakelijk gesteld worden voor een incomplete aankomst of vertraging van dit verzonden bericht. > > The contents of this message are confidential and only intended for the eyes of the addressee(s). Others than the addressee(s) are not allowed to use this message, to make it public or to distribute or multiply this message in any way. The UMCG cannot be held responsible for incomplete reception or delay of this transferred message. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Mon Dec 4 12:37:35 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Mon, 04 Dec 2006 12:37:35 -0800 Subject: [Genome] retrieving genomic information by NP_ accessions In-Reply-To: <4573C783.90709@md.huji.ac.il> References: <4573C783.90709@md.huji.ac.il> Message-ID: <4574870F.7080006@soe.ucsc.edu> Hello Yael, This task can be accomplished using the Table Browser. The 'protAcc' field is located in the 'refLink' table, but you can't get the 5' UTR sequence directly using this table. 
To obtain this information, first you need to find out the refGene ID's corresponding to your protein accessions and then get the 5' UTR sequence for your refGenes. To do this press the "Tables" link in the blue navigation bar across the top of the browser window. Then make the following selections in the Table Browser: clade: vertebrate genome: human assembly: Mar. 2006 group: Genes and Gene Prediction Tracks track: RefSeq Genes table: refLink click on "filter: create" button and then paste a white-space separated list of your protAcc into the textbox "protAcc does match" and then click "submit". Back on the main page, set "output format: selected fields from primary and related tables" and hit "get output" button. On this page check the box for "mrnaAcc" from the refLink table and then hit "get output". This gives you the list of refGene id's corresponding to your protein accessions. Now back on the Table Browser, select table "refGene" and region: "genome" and then paste the list of your mrna accessions using the "paste/upload" list buttons. choose "output format: sequence" and hit "get output". select "genomic" and click "submit". On this page under 'Sequence Retrieval Region Option', check the box for "5' UTR Exons" and then click "get sequence". This gives you the 5' UTR sequence for the refGene ID's corresponding to your protein accessions. I hope this information is helpful to you. Please be sure to write back if you need further instruction. Regards, Archana UCSC Genome Bioinformatics Group Yael Altuvia wrote: > Hi, > > What would be the best way to get genomic sequences (e.g. 
5' utrs) for a > batch query > Where the query entries are accessions of protAcc either NP_ or XP_ > > thanks > yael > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kiran at ccmail.uchicago.edu Mon Dec 4 12:02:21 2006 From: kiran at ccmail.uchicago.edu (Kiran Annaiah) Date: Mon, 4 Dec 2006 14:02:21 -0600 Subject: [Genome] Mapping SNP positions and IDs Message-ID: <25C9CAF629EECB41B98AB0F758AC55C517E4E3@ucccmail.uchicago.cc> Hello there, I am trying to map few of my SNP's and extract 100-200bp upstream and downstream of it. I was able to figure out how do those using your browser. I was also able to map other SNP's in the upstream and downstream regions. I am doing this so I can avoid using those regions when designing my primer. But the one drawback I saw was that (specific to my problem), in the fasta sequence file is that, it does not tell me what the rsID's of the mapped SNPs. I would also like to know the alleles (example: [a/t]) at that particular SNP location. Is there a way I can do this using the browser or atleast submit a batch query with my rsID's and obtain all the SNPs which fall within the upstream and downstream region of the sequence I need to use for designing my primers. The reason I needed the allele info [A/t] at the SNP locations is to submit it to a specific machine we use here. Any suggestions and ideas would be great. We have hundreds of SNPS to query and would be a pain to do it manually. I AM trying to see if I could write a script to automate this process. Thank you Regards Kiran Dept of Human Genetics University of Chicago 773-391-1208 From dominik.margraf at gmail.com Mon Dec 4 13:46:36 2006 From: dominik.margraf at gmail.com (Dominik Margraf) Date: Tue, 5 Dec 2006 10:46:36 +1300 Subject: [Genome] are the "minus" strand in the "gete data" results from the galaxy browser reverse-complimented? 
Message-ID: <409337ba0612041346q216bdb89m1a5a4190277f3c1d@mail.gmail.com> I am performing searches using the "get data" function from the Galaxy browser (the known gene track) and the results are returned in BED format, which I subsequently used to obtain the actual sequences in FASTA format. However for the sequences in the "minus" strand, are these sequences (in FASTA file) already reverse-complimented to the "plus strand"? If so, then would it be okay to directly use the BED file to intersect with other tracks (e.g. the phastcons17way track / BED) which do not have strand information? Thanks! Dominik From dominik.margraf at gmail.com Mon Dec 4 14:26:10 2006 From: dominik.margraf at gmail.com (Dominik Margraf) Date: Tue, 5 Dec 2006 11:26:10 +1300 Subject: [Genome] are the "minus" strand in the "gete data" results from the galaxy browser reverse-complimented? In-Reply-To: <409337ba0612041346q216bdb89m1a5a4190277f3c1d@mail.gmail.com> References: <409337ba0612041346q216bdb89m1a5a4190277f3c1d@mail.gmail.com> Message-ID: <409337ba0612041426pe61a88br68e6aa4d15a705a3@mail.gmail.com> I have done a BLAST search on one of the FASTA sequences in the "minus strand" and I found that it is NOT automatically reverse-complimented (it actually matches the minus strand of the DNA rather than the plus strand). However as mentioned in the last email, before I extract the FASTA sequences from the BED interval format, I intend to intersect that data with some other tasks (e.g. phastcon17way) which does not have strand info. Then in this case will the intersection automatically deal with the plus vs minus strand and is it okay just feed my BED file for processing, then convert the resultant BED to fasta and reverse compliment the negative strands of the FASTA? Thanks! 
Dominik 2006/12/5, Dominik Margraf : > I am performing searches using the "get data" function from the Galaxy > browser (the known gene track) and the results are returned in BED > format, which I subsequently used to obtain the actual sequences in > FASTA format. > > However for the sequences in the "minus" strand, are these sequences > (in FASTA file) already reverse-complimented to the "plus strand"? If > so, then would it be okay to directly use the BED file to intersect > with other tracks (e.g. the phastcons17way track / BED) which do not > have strand information? > > Thanks! > > Dominik > From rhead at soe.ucsc.edu Mon Dec 4 14:27:18 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 04 Dec 2006 14:27:18 -0800 Subject: [Genome] custom scripting for probe design In-Reply-To: <20061130122316.AIO93598@mh1.ucr.edu> References: <20061130122316.AIO93598@mh1.ucr.edu> Message-ID: <4574A0C6.1020900@soe.ucsc.edu> Hello Richard, Sorry for taking so long to respond. I queried our staff and others associated with the Genome Browser, and I received one suggestion that may be of use to you: --- MultAlin will output a simple consensus sequence. However, it doesn't stub in a value when an arbitrary representative is needed at a chaotic position. http://bioinfo.genopole-toulouse.prd.fr/multalin/ --- I hope this site is useful to you. Good luck in your endeavors. -- Brooke Rhead UCSC Genome Bioinformatics Group rbelc001 at student.ucr.edu wrote: > I'm in the process of making 25-mer probes for biodegradation gene identification, but I first need a custom script that can identify ortholog clusters based on several FASTA formatted sequences that are aligned; identifying conserved areas and generating a single sequence to represent the many aligned and conserved sequences. Is anyone aware of a custom script that can perform such a function? It is important to note that this is for probe design, and NOT for primer design.
> Richard > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Mon Dec 4 16:40:08 2006 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 04 Dec 2006 16:40:08 -0800 Subject: [Genome] are the "minus" strand in the "gete data" results from the galaxy browser reverse-complimented? In-Reply-To: <409337ba0612041426pe61a88br68e6aa4d15a705a3@mail.gmail.com> References: <409337ba0612041346q216bdb89m1a5a4190277f3c1d@mail.gmail.com> <409337ba0612041426pe61a88br68e6aa4d15a705a3@mail.gmail.com> Message-ID: <4574BFE8.3050509@cse.ucsc.edu> Dominik, First, there are two ways to get sequence information from our browser. The first is putting in genomic co-ordinates and clicking the "get DNA" button. This will always give you genomic DNA, which is positive strand. The second way is through the table browser, via the position information from a positional table (in this case the knownGenes table). In this case, the data has the strand information attached to the position, and therefore, in the case of a gene on the negative strand, the sequence returned will be relative to that gene, or, in other words, the reverse complement of the genomic DNA. It appears that you have proved this to yourself from your second email. As for your second question, let me see if I understand you clearly: You have two bed files, one which has strand information and one which does not. You would like to intersect these two bed files, and you are concerned that the result of the intersection may not have strand information in it. Intersections in the table browser do not take strand information into consideration. You would get the same output whether there is strand information or not. If you are converting a BED file to FASTA format, you will get genomic sequence if the BED file does not have strand information in it. 
Likewise, if your BED file has strand information, you will get reverse-complemented-from-the-genomic sequence, for items marked negative. I'm not sure exactly what you will consider proper output for your data analysis, but I hope I have explained the functionality of our tools in a way that makes sense to you. Please don't hesitate to contact us again if you require more assistance with the Genome Browser. Also, if you have any Galaxy-specific questions, they have their own mailing list at galaxy-user at bx.psu.edu Kayla Smith UCSC Genome Bioinformatics Group. Dominik Margraf wrote: > I have done a BLAST search on one of the FASTA sequences in the "minus > strand" and I found that it is NOT automatically reverse-complimented > (it actually matches the minus strand of the DNA rather than the plus > strand). > > However as mentioned in the last email, before I extract the FASTA > sequences from the BED interval format, I intend to intersect that > data with some other tasks (e.g. phastcon17way) which does not have > strand info. Then in this case will the intersection automatically > deal with the plus vs minus strand and is it okay just feed my BED > file for processing, then convert the resultant BED to fasta and > reverse compliment the negative strands of the FASTA? > > Thanks! > > Dominik > > 2006/12/5, Dominik Margraf : >> I am performing searches using the "get data" function from the Galaxy >> browser (the known gene track) and the results are returned in BED >> format, which I subsequently used to obtain the actual sequences in >> FASTA format. >> >> However for the sequences in the "minus" strand, are these sequences >> (in FASTA file) already reverse-complimented to the "plus strand"? If >> so, then would it be okay to directly use the BED file to intersect >> with other tracks (e.g. the phastcons17way track / BED) which do not >> have strand information? >> >> Thanks! 
>> >> Dominik >> > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Mon Dec 4 16:51:42 2006 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Mon, 04 Dec 2006 16:51:42 -0800 Subject: [Genome] Mapping SNP positions and IDs In-Reply-To: <25C9CAF629EECB41B98AB0F758AC55C517E4E3@ucccmail.uchicago.cc> References: <25C9CAF629EECB41B98AB0F758AC55C517E4E3@ucccmail.uchicago.cc> Message-ID: <4574C29E.4000901@soe.ucsc.edu> Hello Kiran, You can extract this information using the Table Browser. The field "observed" in the 'snp126' table for hg18 assembly ( table 'snp125' for hg17 assembly ), contains information on the sequences of the observed alleles from rs-fasta files. To do this make the following selections in the Table Browser: clade: vertebrate genome: human assembly: Mar. 2006 (or choose May 2004 if that's the database you are using) group: variation and repeats track: SNPs table: snp126 (or snp125 for the May 2004 database) region: select the "genome" radio button identifiers (names/accessions): paste the list of rs numbers here, then hit "submit" output format: choose "selected fields from primary and related tables" output file: enter a file name here if you wish to save the information directly to a file, or leave this blank to view the results in the browser window Hit "get output" Now you can select the fields in which you are interested; in this case, choose "chrom", "chromStart", "chromEnd", "name" and "observed", and hit the "get output" button. This gives you the position and allele information for all your snp ID's. Be patient, this database table is quite large. You can then take this output and make BED4 custom track to get the upstream and downstream sequences. After loading your custom track, choose "Custom Tracks" as the group and "sequence" as output format in the Table Browser. Then click "get output". 
On this page under 'Sequence Retrieval Region Options' you can specify the number of bases upstream and downstream and then hit "get sequence". Information about creating custom tracks can be obtained here: http://genome.cse.ucsc.edu/goldenPath/help/customTrack.html More information on BED format is here: http://genome.cse.ucsc.edu/goldenPath/help/customTrack.html I hope this information is helpful to you. If this doesn't answer your question completely, please feel free to write back Regards, Archana UCSC Genome Bioinformatics Group Kiran Annaiah wrote: > Hello there, > > > > > > I am trying to map few of my SNP's and extract 100-200bp upstream and > downstream of it. > > I was able to figure out how do those using your browser. I was also > able to map other SNP's in the upstream and downstream regions. > > > > I am doing this so I can avoid using those regions when designing my > primer. > > > > > > But the one drawback I saw was that (specific to my problem), in the > fasta sequence file is that, it does not tell me what the rsID's of the > mapped SNPs. I would also like to know the alleles (example: [a/t]) at > that particular SNP location. > > > > Is there a way I can do this using the browser or atleast submit a batch > query with my rsID's and obtain all the SNPs which fall within the > upstream and downstream region of the sequence I need to use for > designing my primers. > > > > The reason I needed the allele info [A/t] at the SNP locations is to > submit it to a specific machine we use here. > > > > Any suggestions and ideas would be great. We have hundreds of SNPS to > query and would be a pain to do it manually. I AM trying to see if I > could write a script to automate this process. 
> > > > Thank you > > > > Regards > > Kiran > > > > Dept of Human Genetics > > University of Chicago > > > > 773-391-1208 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From flicek at ebi.ac.uk Tue Dec 5 03:13:22 2006 From: flicek at ebi.ac.uk (Paul Flicek) Date: Tue, 5 Dec 2006 11:13:22 +0000 Subject: [Genome] chicken chromosome lengths In-Reply-To: <45746545.6020709@cse.ucsc.edu> References: <45746545.6020709@cse.ucsc.edu> Message-ID: <866B0E1A-BDE5-4039-A309-69413BD3645F@ebi.ac.uk> Hi Carl, As an additional point, Ensembl will be providing the new Chicken assembly in the December release. This release will be based on the WASHUC 2.1 assembly (aka galGal3) and include a new genebuild and a new variation database with dbSNP125 data. The December Ensembl release is planned for either next Wednesday or Monday 18 December. I hope this helps and please let us know if you have additional questions, Paul __ Paul Flicek, D.Sc. Ensembl Functional Genomics Project Leader EMBL-European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom Tel: +44 (0) 1223 492581 Fax: +44 (0) 1223 494468 http://www.ensembl.org On 4 Dec 2006, at 18:13, Kayla Smith wrote: > Carl, > > You are correct in assuming that the discrepancy you have observed is > related to which build you are looking at. We actually offer two > different assemblies of the Chicken Browser: galGal2 (Feb 2004) and > galGal3 (May 2006). From the data you have pasted below, it looks > like > you were viewing our galGal3 browser (this is the default) and > comparing > it to Ensembl's equivalent of the galGal2 assembly. > > On the gateway page, http://genome.ucsc.edu/cgi-bin/hgGateway, you can > toggle between galGal2 and galGal3. Each assembly has a "sequences" > link which will give you information on the length of each chromosome. 
> > GalGal3: > chr1: 200,994,015 > chr27: 4,841,970 > > GalGal2: > chr1: 188,239,860 > chr27: 2,668,888 > > > I hope this helps to answer your question. Please don't hesitate to > contact us again if you require further assistance. > > Kayla Smith > UCSC Genome Bioinformatics Group > > > Carl Schmidt wrote: >> For the chicken, why are there differences in the total size of >> chromosomes when I compare between UCSC and Ensembl? I suppose it >> has something to do with the build, but I am concerned about trying >> to use coordinates in from one browser to access information in the >> other. >> For example: >> CH1 UCSC 200,994,015 ENSEMBL 188,239,860 >> CH 27 UCSC:4,841,970 ENSEMBL: 2,668,888 >> >> >> >> Carl J. Schmidt >> Associate Professor >> Department of Animal & Food Sciences >> 051 Townsend Hall >> University of Delaware >> Newark, DE 19716-2150 >> >> schmidtc at udel.edu >> 302-831-1334 >> Fax: 302-831-2822 >> http://udel.edu/~schmidtc >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Ying.Sheng at ki.se Tue Dec 5 02:15:42 2006 From: Ying.Sheng at ki.se (Ying Sheng) Date: Tue, 05 Dec 2006 11:15:42 +0100 Subject: [Genome] question about ortholog genes Message-ID: Dear Sir/Madam, I have a set of mouse genes. I plan to find the human orthologs of these genes. Is there any table in the UCSC database can help me? I notice that in each mouse "Gene Details", there is a block about "Homologous Genes in Other Species", is that the information I want and which is the related table? 
thanks in advance, Ying Sheng From kwong at bcgsc.ca Tue Dec 5 16:23:36 2006 From: kwong at bcgsc.ca (Kim Wong) Date: Tue, 5 Dec 2006 16:23:36 -0800 Subject: [Genome] Custom tracks on UCSC genome browser Message-ID: Hello, I was wondering if it is possible to create a custom track with blocks on the same track colored differently. Eg: different exons of a gene. Thanks, Kim Computational biologist Genome Sciences Centre, BC Cancer Agency Vancouver, BC. Canada 604.877.6000 x3265 kwong at bcgsc.bc.ca From hartera at soe.ucsc.edu Tue Dec 5 16:45:25 2006 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Tue, 5 Dec 2006 16:45:25 -0800 (PST) Subject: [Genome] question about ortholog genes In-Reply-To: References: Message-ID: Dear Ying Sheng, The section of "Homologous Genes in Other Species" would be helpful in finding orthologs. The tables for each of these consist of the assembly name without the number e.g. mm for mouse, followed by BlastTab so mmBlastTab is the table of best BlastP hits for the protein sequences for mouse known genes (query) against the protein sequences for human known genes (target). The tables can be obtained from our downloads server ("Downloads" link on the blue side bar of the home page) or the Table Browser ("Tables" link on the top blue bar of many of the Genome Browser pages. The kgXref table may also be useful to you because this contains the human gene names for the Known Genes as well as other identifiers relating to each known gene. For more information, please read this answer to a similar question: http://www.cse.ucsc.edu/pipermail/genome/2006-March/010178.html I hope that this helps you. Please let us know if you have further questions. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu On Tue, 5 Dec 2006, Ying Sheng wrote: > Dear Sir/Madam, > > I have a set of mouse genes. I plan to find the human orthologs of these > genes. Is there any table in the UCSC database can help me? 
I notice > that in each mouse "Gene Details", there is a block about "Homologous > Genes in Other Species", is that the information I want and which is the > related table? > > thanks in advance, > > Ying Sheng > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From hartera at soe.ucsc.edu Tue Dec 5 17:52:11 2006 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Tue, 5 Dec 2006 17:52:11 -0800 (PST) Subject: [Genome] Custom tracks on UCSC genome browser In-Reply-To: References: Message-ID: Hello Kim, If you create a BED format file for the custom track, you can use the itemRgb field to specify a color for each item in the track. The track line itemRgb attribute must be set to "On". The itemRgb value then determines the color of the item on that line of the BED file. You will need to add values for the 8 fields preceding this one. The BED format is described here: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED Example 5 at the bottom of the BED lines section shows how this is done. More information about creating custom tracks, along with examples, is here: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED If you are specifying one gene per BED file line, then the color will apply to that whole gene. In order to get exons colored differently, you will need to create a separate BED format line for each exon and specify an itemRgb value in each line. This type of display would not show gene structure; you would just see blocks of exons, with no line representing the introns to join the exon blocks together to form a gene. You could add extra lines to the BED format file to specify the intron positions and set their color to gray or black. This way, the gene structure would still be preserved. I hope that this helps you. Please let us know if you have further questions.
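As an illustration of the per-exon approach described above, here is a minimal Python sketch that writes one BED line per exon, each with its own itemRgb value, under a track line with itemRgb set to "On". The function names, chromosome, coordinates, colors, and track name are all hypothetical placeholders, not taken from any real gene; substitute your own values.

```python
# Minimal sketch: build a BED custom track with one differently colored
# line per exon. All names, coordinates, and colors are hypothetical.

def exon_bed_lines(chrom, strand, exons, colors, prefix="exon"):
    """Return one 9-field BED line per exon, each with its own itemRgb."""
    if len(exons) != len(colors):
        raise ValueError("need exactly one color per exon")
    lines = []
    for number, ((start, end), rgb) in enumerate(zip(exons, colors), start=1):
        fields = [
            chrom,                  # chrom
            str(start),             # chromStart (0-based)
            str(end),               # chromEnd
            f"{prefix}{number}",    # name
            "0",                    # score
            strand,                 # strand
            str(start),             # thickStart
            str(end),               # thickEnd
            ",".join(str(v) for v in rgb),  # itemRgb as "R,G,B"
        ]
        lines.append("\t".join(fields))
    return lines


def custom_track(name, description, bed_lines):
    """Prepend a track line with itemRgb turned on to the BED lines."""
    header = f'track name="{name}" description="{description}" itemRgb="On"'
    return "\n".join([header] + list(bed_lines)) + "\n"


# Made-up example: three exons on the plus strand, each a different color.
example_exons = [(1000, 1200), (1500, 1650), (2000, 2300)]
example_colors = [(255, 0, 0), (0, 128, 0), (0, 0, 255)]
track_text = custom_track(
    "coloredExons",
    "Exons colored individually",
    exon_bed_lines("chr7", "+", example_exons, example_colors),
)
print(track_text)
```

The printed text can be pasted into the custom track submission box. Intron lines in gray or black could be added the same way by passing the intron intervals with a single repeated color.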
Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu On Tue, 5 Dec 2006, Kim Wong wrote: > Hello, > > I was wondering if it is possible to create a custom track with blocks > on the same track colored differently. Eg: different exons of a gene. > > Thanks, > Kim > > > Computational biologist > Genome Sciences Centre, BC Cancer Agency > Vancouver, BC. Canada > 604.877.6000 x3265 > kwong at bcgsc.bc.ca > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From sandmann at embl.de Wed Dec 6 02:43:43 2006 From: sandmann at embl.de (sandmann@embl.de) Date: Wed, 6 Dec 2006 11:43:43 +0100 Subject: [Genome] Drosophila Affymetrix timecourse track ? Message-ID: <20061206114343.vzrkunk3oziocwss@webmail.embl.de> Dear UCSC browser team! Recently, a large microarray study profiling the expression of transcripts in Drosophila melanogaster has been performed using whole-genome tiling arrays. In their publication in Nature Genetics, the authors (Manak et al, Biological function of unannotated transcription during the early development of Drosophila melanogaster, Nature Genetics - 38, 1151 - 1158 (2006)) mention that "all RNA graphs representing data described in this manuscript [...] will be made available through the UCSC (http://genome.ucsc.edu)." Therefore I was wondering if you were planning to integrate this large dataset into the genome browser. (I have tried to import the data into our local installation of the browser as a .wig annotation file, but failed, most likely due to its enormous size (400 Mb). The data is available for public download on a website hosted by Affymetrix.) Thanks, Thomas Sandmann Thomas Sandmann, PhD EMBL Heidelberg, Germany From ann at soe.ucsc.edu Wed Dec 6 10:14:57 2006 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 06 Dec 2006 10:14:57 -0800 Subject: [Genome] Drosophila Affymetrix timecourse track ?
In-Reply-To: <20061206114343.vzrkunk3oziocwss@webmail.embl.de> References: <20061206114343.vzrkunk3oziocwss@webmail.embl.de> Message-ID: <457708A1.8060306@soe.ucsc.edu> Hello Thomas, Thanks for pointing this out. We will be in contact with the authors to see about displaying these data in the genome browser. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu sandmann at embl.de wrote: > Dear UCSC brower team ! > > Recently, a large microarray study profiling the expression of transcripts in > Drosophila melanogaster has been performed using whole-genome tiling arrays. > In their publication in nature genetics, the authors (Manak et al, Biological > function of unannotated transcription during the early development of > Drosophila melanogaster, Nature Genetics - 38, 1151 - 1158 (2006))mention that > "all RNA graphs representing data described in this manuscript [...] will be > made available through the UCSC (http://genome.ucsc.edu)." Therefore I was > wondering if you were planning to integrate this large dataset into the genome > browser. (I have tried to import the data into our local installation of the > browser as a .wig annotation file, but failed - most likely due to its enormous > size (400 Mb.) The data is available for public download on a website hosted by > Affymetrix.) > > Thanks, > Thomas Sandmann > > Thomas Sandmann, PhD > EMBL Beidelberg, Germany > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From thefferon at mail.nih.gov Wed Dec 6 13:42:38 2006 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Wed, 6 Dec 2006 15:42:38 -0600 Subject: [Genome] problem with ABCC6 proteome entry Message-ID: <6E17B886-227F-4638-B9E8-87B1D1393267@mail.nih.gov> Hi, There appears to be a problem with the Proteome Browser entry for the human gene, ABCC6. The PB returns [Q8TCY8 Up-regulated gene 7 (ABCC6 protein)]. 
This is a severely truncated isoform of the full-length protein. NCBI's EntrezGene RefSeq entry for the full-length mRNA and protein is: [NM_001171.3?NP_001162.3 ATP-binding cassette, sub-family C, member 6 isoform 1]. The protein that is currently pulled up on the PB is the other RefSeq entry, [NM_001079528.1?NP_001072996.1 ATP-binding cassette, sub- family C, member 6 isoform 2]. Please resolve this issue if possible; there is a research community that works with this gene and would greatly appreciate it! Thank you, Tim Hefferon From thefferon at mail.nih.gov Wed Dec 6 13:49:19 2006 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Wed, 6 Dec 2006 15:49:19 -0600 Subject: [Genome] Fwd: problem with ABCC6 proteome entry References: <6E17B886-227F-4638-B9E8-87B1D1393267@mail.nih.gov> Message-ID: <0D52DF77-E28F-42D5-A4B6-23D3C8526E27@mail.nih.gov> PS The Uniprot ID for the full-length entry is O95255 thanks Begin forwarded message: > From: Tim Hefferon > Date: December 6, 2006 3:42:38 PM GMT-06:00 > To: genome at soe.ucsc.edu > Subject: problem with ABCC6 proteome entry > > Hi, > > There appears to be a problem with the Proteome Browser entry for > the human gene, ABCC6. The PB returns [Q8TCY8 Up-regulated gene 7 > (ABCC6 protein)]. This is a severely truncated isoform of the full- > length protein. > > NCBI's EntrezGene RefSeq entry for the full-length mRNA and protein > is: [NM_001171.3?NP_001162.3 ATP-binding cassette, sub-family C, > member 6 isoform 1]. > > The protein that is currently pulled up on the PB is the other > RefSeq entry, [NM_001079528.1?NP_001072996.1 ATP-binding cassette, > sub-family C, member 6 isoform 2]. > > Please resolve this issue if possible; there is a research > community that works with this gene and would greatly appreciate it! 
> > Thank you, > > Tim Hefferon > > From major at cbio.mskcc.org Wed Dec 6 14:14:09 2006 From: major at cbio.mskcc.org (John Major) Date: Wed, 06 Dec 2006 17:14:09 -0500 Subject: [Genome] Proteome Browser Data disagrees with Genome Browser? Message-ID: <457740B1.6050509@cbio.mskcc.org> Hello - I have been doing some work with CHEK2, specifically the 2 isoforms: (1) NM_001005735 and (2)NM_007194. The Genome browser view, and tables, show that : (1) should have 13 coding sequence exons, 2 exons with UTR and coding sequence mixed, and one exon of only coding sequence for a total of 16 transcribed exons, and 15 coding sequence exons. (2) should have 12 coding sequence exons, 2 exons with UTR and coding sequence mixed, and one exon of only coding sequence for a total of 15 transcribed exons and 14 coding sequence exons. My first question is why do both of these isoforms map to the same protein ID? (CHK2_HUMAN), and secondly, the canonical form is supposedly the longest form- but the proteome browser view for this protein only has up to coding exon 14 when the canonical form according to the genome view has 15 coding exons? Please advise- John Major From fanhsu at soe.ucsc.edu Wed Dec 6 14:27:25 2006 From: fanhsu at soe.ucsc.edu (Fan Hsu) Date: Wed, 6 Dec 2006 14:27:25 -0800 Subject: [Genome] problem with ABCC6 proteome entry In-Reply-To: <6E17B886-227F-4638-B9E8-87B1D1393267@mail.nih.gov> Message-ID: Hi Tim, I looked into this case and found that at the time when we created the annotation for UCSC Known Genes and UCSC Proteome Browser, the RefSeq NM_001171.1 had 99.5% identity when aligned to the base genome. Our gene-check logic raised two error flags (badFrame and noStop), thus this RefSeq did not make it to the KG candidate genes list. I noticed that the latest version of NM_001171, (NM_001171.3) has 100% identity with base genome. I expect it would be picked up by our future UCSC Known Genes update, which I anticipate to happen within next few months. Fan. 
-----Original Message----- From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On Behalf Of Tim Hefferon Sent: Wednesday, December 06, 2006 1:43 PM To: genome at soe.ucsc.edu Subject: [Genome] problem with ABCC6 proteome entry Hi, There appears to be a problem with the Proteome Browser entry for the human gene, ABCC6. The PB returns [Q8TCY8 Up-regulated gene 7 (ABCC6 protein)]. This is a severely truncated isoform of the full-length protein. NCBI's EntrezGene RefSeq entry for the full-length mRNA and protein is: [NM_001171.3?NP_001162.3 ATP-binding cassette, sub-family C, member 6 isoform 1]. The protein that is currently pulled up on the PB is the other RefSeq entry, [NM_001079528.1?NP_001072996.1 ATP-binding cassette, sub-family C, member 6 isoform 2]. Please resolve this issue if possible; there is a research community that works with this gene and would greatly appreciate it! Thank you, Tim Hefferon
From fanhsu at soe.ucsc.edu Wed Dec 6 14:39:08 2006 From: fanhsu at soe.ucsc.edu (Fan Hsu) Date: Wed, 6 Dec 2006 14:39:08 -0800 Subject: [Genome] Proteome Browser Data disagrees with Genome Browser? In-Reply-To: <457740B1.6050509@cbio.mskcc.org> Message-ID: Hi John, The processing logic that maps UCSC Known Genes (KG) to proteins and the UCSC Gene Sorter logic that determines the canonical gene of a KG cluster are independent. We are also currently aware of some problems in situations where the corresponding mRNA and protein for KGs are not one-to-one. So some inconsistencies are possible.
I hope our next refined KG process, KG III, will address some of those problems. Fan. -----Original Message----- From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On Behalf Of John Major Sent: Wednesday, December 06, 2006 2:14 PM To: genome at soe.ucsc.edu Subject: [Genome] Proteome Browser Data disagrees with Genome Browser? Hello - I have been doing some work with CHEK2, specifically the 2 isoforms: (1) NM_001005735 and (2)NM_007194. The Genome browser view, and tables, show that : (1) should have 13 coding sequence exons, 2 exons with UTR and coding sequence mixed, and one exon of only coding sequence for a total of 16 transcribed exons, and 15 coding sequence exons. (2) should have 12 coding sequence exons, 2 exons with UTR and coding sequence mixed, and one exon of only coding sequence for a total of 15 transcribed exons and 14 coding sequence exons. My first question is why do both of these isoforms map to the same protein ID? (CHK2_HUMAN), and secondly, the canonical form is supposedly the longest form- but the proteome browser view for this protein only has up to coding exon 14 when the canonical form according to the genome view has 15 coding exons? Please advise- John Major _______________________________________________ Genome maillist - Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome -- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.409 / Virus Database: 268.15.11/575 - Release Date: 12/6/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. 
Version: 7.1.409 / Virus Database: 268.15.11/575 - Release Date: 12/6/2006 From grisham at lifesci.ucla.edu Wed Dec 6 15:33:58 2006 From: grisham at lifesci.ucla.edu (William Grisham) Date: Wed, 6 Dec 2006 15:33:58 -0800 Subject: [Genome] Questions on gene expression data Message-ID: <88d6221b39a6ca9d78d2a1950dbafea3@lifesci.ucla.edu> Hi, I am teaching an undergraduate neuroscience class and using the UCSC Genome Browser as a tool in instruction. We are interested in the expression data. I took a class from Open Helix a couple of years ago and was led to believe that if the box for a given tissue was red, the gene was upregulated. If the box is green, it is downregulated. My question is up or down regulated relative to what? Many thanks for the UCSC Genome Browser--it has been a terrific teaching tool! Sincerely, William Grisham, Ph.D. Department of Psychology UCLA 1285 Franz Hall P.O. Box 951563 Los Angeles, CA 90095-1563 grisham at lifesci.ucla.edu (310) 825-7990 From malachig at bcgsc.ca Wed Dec 6 16:13:51 2006 From: malachig at bcgsc.ca (Malachi Griffith) Date: Wed, 6 Dec 2006 16:13:51 -0800 Subject: [Genome] Custom Track Issue Message-ID: Hi, I have a problem with the behavior of my custom tracks. I load a custom track from a web accessible location on our web server by pointing to it in a tailored URL. This works great, except that when I load a different custom track file with tracks that have different names, the old tracks remain present as blank tracks (they take up space). To get rid of these tracks I have to clear by session cookie. Is there some way to load a series of tracks from a file and explicitly force the browser to show only these tracks, not every track I have viewed during the current session?? 
Cheers, Malachi Griffith Genome Sciences Centre Vancouver, BC Canada From jiang.qian at jhmi.edu Wed Dec 6 23:31:32 2006 From: jiang.qian at jhmi.edu (Jiang Qian) Date: Wed, 06 Dec 2006 23:31:32 -0800 Subject: [Genome] spliced and unspliced Message-ID: <4577C354.3060103@jhmi.edu> Hi, From the genome browser, I founded that the tracks for ESTs are grouped into "ESTs have been spliced" and "ESTs including unspliced". How did you decide which group an EST belongs to? Also, which downloadable file includes the flag for spliced or unspliced? I downloaded all_est.txt, but cannot see which ESTs are spliced or unspliced? Thanks, -Jiang From twofaiths at ibms.sinica.edu.tw Wed Dec 6 23:52:22 2006 From: twofaiths at ibms.sinica.edu.tw (spi_imbs) Date: Thu, 7 Dec 2006 15:52:22 +0800 Subject: [Genome] Question about table "knownGeene" Message-ID: <000601c719d4$a10a0ca0$4400a8c0@spiX41> Dear Sir, I download the table "knownGeene" of "Known Genes" tarck (Human assembly Mar. 2006). I want to know the location information in these fields (ex: txStart, txEnd). dose the value start with 0 ? or with 1? Thanks! Sylvia From Pieter.Mestdagh at UGent.be Thu Dec 7 04:31:07 2006 From: Pieter.Mestdagh at UGent.be (Pieter Mestdagh) Date: Thu, 07 Dec 2006 13:31:07 +0100 Subject: [Genome] question Message-ID: <4578098B.8040702@UGent.be> Dear, I would like to extract the promotor regions from a gene list with unique identifiers (f.i. ensembl transcript id's) that are conserved among species, hereby maintaining the same identifier for each set of conserved sequences. Could you tell me the best way to tackle this problem? Kind regards, -- ir. 
Pieter Mestdagh Center for Medical Genetics Ghent (CMGG) Ghent University Hospital Medical Research Building (MRB), 2nd floor, room 120.055 De Pintelaan 185, B-9000 Ghent, Belgium +32 9 240 6979 (phone) | +32 9 240 6549 (fax) http://medgen.ugent.be Pieter.Mestdagh at UGent.be From major at cbio.mskcc.org Thu Dec 7 09:50:57 2006 From: major at cbio.mskcc.org (John Major) Date: Thu, 07 Dec 2006 12:50:57 -0500 Subject: [Genome] CCDS track question Message-ID: <45785481.40303@cbio.mskcc.org> Hello- I'd like to know if the CCDS track info for hg17 has been updated since it was first built for UCSC? Specifically, has it been updated for hg17 since 3/3/05 ? Thanks, John Major From Alal.Eran at childrens.harvard.edu Thu Dec 7 08:15:56 2006 From: Alal.Eran at childrens.harvard.edu (Eran, Alal) Date: Thu, 7 Dec 2006 11:15:56 -0500 Subject: [Genome] RYR3 on chr4 Message-ID: Houston I have a new problem: When I look at the known genes track hg18 chr4:37,000,000-37,584,105 there is a gene called RYR3. If I click on it its description it says Description: Hypothetical protein FLJ36345. Alternate Gene Symbols: AX748249, NM_001036 Representative mRNA: AK093664 Protein: Q8N212 It has nothing to do with RYR3, which is located on chr15. Where did this annotation come from? Thanks for your help, Ally From brownjd at mail.nih.gov Thu Dec 7 09:23:36 2006 From: brownjd at mail.nih.gov (Jacob Brown) Date: Thu, 7 Dec 2006 12:23:36 -0500 Subject: [Genome] Batch retrieval using gene symbols Message-ID: <80D20E5A-8CCC-435E-B0C1-56B5CBE1708F@mail.nih.gov> Hello, I am just trying to retrieve promoter regions for ~300 genes using the table browser. I am pasting a list of gene symbols in the identifier box and using the known genes track. When I ask for sequence in output it does not find any hits. Suggestions? 
Many thanks, Jacob Brown National Eye Institute From rhead at soe.ucsc.edu Thu Dec 7 10:27:34 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 07 Dec 2006 10:27:34 -0800 Subject: [Genome] Question about table "knownGeene" In-Reply-To: <000601c719d4$a10a0ca0$4400a8c0@spiX41> References: <000601c719d4$a10a0ca0$4400a8c0@spiX41> Message-ID: <45785D16.4000900@soe.ucsc.edu> Hi Sylvia, You can find descriptions of tables and their individual fields in the Table Browser. Click on the "Tables" link in the blue bar at the top of our website. Then make the following selections: clade: Vertebrate genome: Human assembly: Mar. 2006 group: Genes and Gene Prediction Tracks track: Known Genes table: knownGene Click on the "describe table schema" button to get information about the table. All of our table values have a 0-based start. (The data is displayed with a 1-based start in the Genome Browser). See more information about that here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 I hope this information helps. -- Brooke Rhead UCSC Genome Bioinformatics Group spi_imbs wrote: > Dear Sir, > > I download the table "knownGeene" of "Known Genes" tarck (Human assembly Mar. 2006). > I want to know the location information in these fields (ex: txStart, txEnd). > dose the value start with 0 ? or with 1? > Thanks! > Sylvia > > > ------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From tara.lauriat at mssm.edu Thu Dec 7 10:26:29 2006 From: tara.lauriat at mssm.edu (Tara Lauriat) Date: Thu, 07 Dec 2006 13:26:29 -0500 Subject: [Genome] question about output Message-ID: I have taken the human exons and flanking sequence for my gene of interest and done a BLAT against the mouse genome to look at conservation. Most of the exons had a nice conservation over the exon as one would expect. 
However, some like the attached file had strange profiles where part of the exon was not conserved. I was wondering if you know how to interpret this type of result. Sincerely, Tara Lauriat From rhead at soe.ucsc.edu Thu Dec 7 10:42:01 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 07 Dec 2006 10:42:01 -0800 Subject: [Genome] CCDS track question In-Reply-To: <45785481.40303@cbio.mskcc.org> References: <45785481.40303@cbio.mskcc.org> Message-ID: <45786079.50708@soe.ucsc.edu> Hello John, Yes, the hg17 CCDS track was most recently updated with data from September 20, 2006. This information is located in our Release Log. To get to it, go to our home page and click on "Release Log" in the light blue bar on the left-hand side of the page. Click on "hg17" and scroll down until you see information about CCDS. Or just click on this link: http://genome.ucsc.edu/goldenPath/releaseLog.html#hg17 If you have any further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group John Major wrote: > Hello- > > I'd like to know if the CCDS track info for hg17 has been updated since > it was first built for UCSC? > Specifically, has it been updated for hg17 since 3/3/05 ? > > Thanks, > John Major > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rthurman at u.washington.edu Thu Dec 7 11:32:24 2006 From: rthurman at u.washington.edu (Bob Thurman) Date: Thu, 07 Dec 2006 11:32:24 -0800 Subject: [Genome] coloring custom wiggle tracks Message-ID: <45786C48.2040807@u.washington.edu> Hi, I have a question about coloring custom wiggle tracks. I have two wiggle tracks, one of which I am coloring with the 'color=' and 'altColor=' options on the track line. And what I'd like to do is color the second track using the same color scheme as the first. 
I.e., I'd like the 2nd track to be green exactly when the first track is, and to switch to red exactly when the first track does. Is this possible? I found the message below, which mentions the use of wigColorBy, but I'm not sure that this is applicable to my situation, and any way, I can't get it to work (when I add 'wigColorBy=track1' in the second track's track line, I get the message "track1 not found", even though the first track has 'name=track1'). Another option I tried was using the 'itemRgb' option on each data line of the second track, but apparently that is not available for wiggle track. Thanks for your help. Bob -- =========================================================== Bob Thurman, Ph.D. Research Scientist Division of Medical Genetics Noble Lab J-205 Health Sciences Building Box 357720 University of Washington Seattle, WA 98195-7720 206-543-8916 =========================================================== From kayla at soe.ucsc.edu Thu Dec 7 11:37:42 2006 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 07 Dec 2006 11:37:42 -0800 Subject: [Genome] Questions on gene expression data In-Reply-To: <88d6221b39a6ca9d78d2a1950dbafea3@lifesci.ucla.edu> References: <88d6221b39a6ca9d78d2a1950dbafea3@lifesci.ucla.edu> Message-ID: <45786D86.8030108@cse.ucsc.edu> William, Thank you for your compliments on the Genome Browser. The most popular expression data on the Genome Browser is the GNF Atlas 2 expression data. This data represents 79 human tissues, 61 mouse tissues or 44 rat tissues run on affymetrix arrays. I'll give you some insight into this data set. There are two ways to browse this expression data on our website. One is to turn on the gnfAtlas2 track in the browser. This will let you see expression values for various tissues on genomic intervals of your choice. The other is to use the Gene Sorter (there is a link for "Gene Sorter" on the blue bar on the top of the main page) which shows you expression values for tissues for each gene. 
One particularly useful feature of the Gene Sorter is that you can sort by expression pattern. On the top of the Gene Sorter's gnfAtlas2 column, you can click where you see the names of the tissues. This will take you to more information on the gnfAtlas2 data (You can also click on the top of the column for other types of expression data, such as Gladstone, GNF U74A, arbeitman, etc.). Here is the link (for human) for your convenience: http://genome.ucsc.edu/cgi-bin/hgNear?near.do.colInfo=gnfHumanAtlas2 In the "Methods" section you'll see: "When calculating expression ratios, the overall expression level in the denominator was calculated by first taking the median of replicates for each non-cancerous tissue, and then taking the median of these medians." That means that up- and down-regulation of a gene in a tissue is relative to the the median of other tissues. For example a red square in the "heart" column of the gene sorter for the gene "ACTC" means that this gene is up-regulated in heart compared to how it is expressed in the other tissues. You are correct that the data is displayed such that green represents downregulation and red represents upregulation. I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group William Grisham wrote: > Hi, > > I am teaching an undergraduate neuroscience class and using the UCSC > Genome Browser as a tool in instruction. > > We are interested in the expression data. I took a class from Open > Helix a couple of years ago and was led to believe that if the box for > a given tissue was red, the gene was upregulated. If the box is green, > it is downregulated. > > My question is up or down regulated relative to what? > > Many thanks for the UCSC Genome Browser--it has been a terrific > teaching tool! > > Sincerely, > > William Grisham, Ph.D. > Department of Psychology > UCLA > 1285 Franz Hall > P.O. 
Box 951563 > Los Angeles, CA 90095-1563 > > grisham at lifesci.ucla.edu > (310) 825-7990 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rthurman at u.washington.edu Thu Dec 7 11:39:38 2006 From: rthurman at u.washington.edu (Bob Thurman) Date: Thu, 07 Dec 2006 11:39:38 -0800 Subject: [Genome] coloring custom wiggle tracks In-Reply-To: <45786C48.2040807@u.washington.edu> References: <45786C48.2040807@u.washington.edu> Message-ID: <45786DFA.7060801@u.washington.edu> Oops, here's that message about using wigColorBy (that I couldn't apply to my situation) -- -----Original Message----- From: Hiram Clawson [mailto:hiram at soe.ucsc.edu ] Sent: Monday, April 10, 2006 2:16 PM To: Ma, Jin Cc: 'genome at soe.ucsc.edu ' Subject: Re: [Genome] Suggestion on coloring the wiggle track Good Morning Jin: You can get the wiggle track to color itself based on the contents of a different track. In the trackDb entry for the wiggle track you want to color, place the declaration: wigColorBy The wiggle will now color itself in the same manner as the other track is colored. The other track is some kind of simple bed track. If you create your other bed track item colors in a color scheme that indicates your level of significance, then the wiggle track will indicate those colors. --Hiram Bob Thurman wrote: > Hi, > > I have a question about coloring custom wiggle tracks. I have two > wiggle tracks, one of which I am coloring with the 'color=' and > 'altColor=' options on the track line. And what I'd like to do is color > the second track using the same color scheme as the first. I.e., I'd > like the 2nd track to be green exactly when the first track is, and to > switch to red exactly when the first track does. Is this possible? 
> > I found the message below, which mentions the use of wigColorBy, but I'm > not sure that this is applicable to my situation, and any way, I can't > get it to work (when I add 'wigColorBy=track1' in the second track's > track line, I get the message "track1 not found", even though the first > track has 'name=track1'). > > Another option I tried was using the 'itemRgb' option on each data line > of the second track, but apparently that is not available for wiggle track. > > Thanks for your help. > > Bob > > -- =========================================================== Bob Thurman, Ph.D. Research Scientist Division of Medical Genetics Noble Lab J-205 Health Sciences Building Box 357720 University of Washington Seattle, WA 98195-7720 206-543-8916 =========================================================== From Johanne.Duhaime at ircm.qc.ca Thu Dec 7 11:34:29 2006 From: Johanne.Duhaime at ircm.qc.ca (Duhaime Johanne) Date: Thu, 7 Dec 2006 14:34:29 -0500 Subject: [Genome] Not all identities returned by blat Message-ID: <96C071542ED58D49BC08210D3456D5802EBDF9@pandore.ircm.priv> Hello I am trying to find all perfect matches of 45-55 oligos on the Saccharomyces cerevisiae. I would like to use blat instead of megablast because of the speed since I have a lot of oligos. A blast with a certain sequence will return 20 matches 100% identity on the full oligo lenght but with blat I get 18. And this happens for quite a lot of sequences. I always get less matches with blat. I am struggling with the parameters but I cannot get the same results. Ex: TAGTCGCACTAGTCCTGACGTTGATGCTGGCAGTGGTAGTAGCACT blast=20 blat=18 I am using the following : /apps/programs/bin/blat x11.seq ../../../sgd/all.fsa -out=blast8 x11.blat I have tried to change tileSize an stepSIze without success. The matches for the above sequence that do not appear in the blat result are in the neighborhood of another match. Database, blast and blat are installed locally. 
The megablast command used: /apps/programs/sources/blast-2.2.15/bin/megablast -d ../../../sgd/all -i x11.seq -o x11.blast -m 8 -F f -p 0 Is there a parameter that will conciliate the result or say differently how can I have all the perfect matches with blat? Thank you very much in advance Johanne Duhaime From rhead at soe.ucsc.edu Thu Dec 7 11:55:36 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 07 Dec 2006 11:55:36 -0800 Subject: [Genome] spliced and unspliced In-Reply-To: <4577C354.3060103@jhmi.edu> References: <4577C354.3060103@jhmi.edu> Message-ID: <457871B8.3010002@soe.ucsc.edu> Hello Jiang, To find information about any track, click on the "mini-button" for that track -- the thin, vertical, blue or gray box at the far left of the track in the Genome Browser display. (Alternatively, you can click on the blue title of the track in the track controls section -- in this case, under "mRNA and EST Tracks", click on "Spliced ESTs" and "Human ESTs", assuming you're looking at the human browser.) Here you will find a description of the track. The "Spliced ESTs" track description contains this information: "To be considered spliced, an EST must show evidence of at least one canonical intron, i.e. the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends." There is not a table of ESTs that includes information about whether each EST is spliced or not. The 'intronEst' table contains the spliced ESTs, and it is a subset of the 'all_est' table. I hope this information helps. If you have further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Jiang Qian wrote: > > Hi, > > From the genome browser, I founded that the tracks for ESTs are grouped > into "ESTs have been spliced" and "ESTs including unspliced". > How did you decide which group an EST belongs to? Also, which > downloadable file includes the flag for spliced or unspliced? 
> I downloaded all_est.txt, but cannot see which ESTs are spliced or > unspliced? > Thanks, > > -Jiang
From fanhsu at soe.ucsc.edu Thu Dec 7 12:18:48 2006 From: fanhsu at soe.ucsc.edu (Fan Hsu) Date: Thu, 7 Dec 2006 12:18:48 -0800 Subject: [Genome] RYR3 on chr4 In-Reply-To: Message-ID:

Hi Ally,

You brought up a difficult case that took me a while to figure out. We have a relatively complex logic for assigning gene symbols to Known Genes (KG):

1. If a KG's representative mRNA is a RefSeq, we assign that RefSeq's gene symbol as the KG's gene symbol.
2. Otherwise, we look for a gene symbol in the UniProt database using the representative protein.
3. If that fails, we check whether the representative mRNA has an associated RefSeq. If it does, we use that RefSeq's gene symbol as the KG's gene symbol.
4. If all of the above fail, we use the mRNA accession itself as the gene symbol.

In your particular RYR3 case, the KG AK093664 is represented by mRNA AK093664 and UniProt protein Q8N212. Step 3 kicked in because Entrez lists AK093664 as one of the related sequences of the RYR3 gene (NM_001036). See http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=6263

So my current conclusion is that maybe there is a problem in Entrez in listing AK093664 as a related sequence of RYR3. Alternatively, we might consider dropping or refining step 3 in our future KG build process to avoid situations like the one you reported.

Fan.
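The four-step fallback Fan describes can be sketched roughly as follows. This is only an illustration of the order of the checks, not the actual KG build code: the function name and the three lookup dictionaries are hypothetical, and the example assumes (as the message implies) that the UniProt lookup for Q8N212 yields no gene symbol.

```python
def assign_gene_symbol(mrna, protein, refseq_symbols, uniprot_symbols, mrna_to_refseq):
    """Return a gene symbol for a Known Gene, following the four steps in order.

    All three dictionaries are hypothetical stand-ins for database lookups:
      refseq_symbols:  RefSeq accession -> gene symbol
      uniprot_symbols: UniProt accession -> gene symbol
      mrna_to_refseq:  mRNA accession -> associated RefSeq accession
    """
    # Step 1: the representative mRNA is itself a RefSeq.
    if mrna in refseq_symbols:
        return refseq_symbols[mrna]
    # Step 2: look up the representative protein in UniProt.
    symbol = uniprot_symbols.get(protein)
    if symbol:
        return symbol
    # Step 3: fall back to a RefSeq associated with the mRNA.
    related = mrna_to_refseq.get(mrna)
    if related in refseq_symbols:
        return refseq_symbols[related]
    # Step 4: nothing found, so use the mRNA accession itself.
    return mrna


# The RYR3 case from the message: step 3 supplies the symbol.
refseq_symbols = {"NM_001036": "RYR3"}
uniprot_symbols = {}  # assumption: no symbol available for Q8N212
mrna_to_refseq = {"AK093664": "NM_001036"}

print(assign_gene_symbol("AK093664", "Q8N212",
                         refseq_symbols, uniprot_symbols, mrna_to_refseq))
```

With the step 3 association removed from `mrna_to_refseq`, the same call would fall through to step 4 and return the mRNA accession instead.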
-----Original Message----- From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On Behalf Of Eran, Alal Sent: Thursday, December 07, 2006 8:16 AM To: genome at soe.ucsc.edu Subject: [Genome] RYR3 on chr4 Houston I have a new problem: When I look at the known genes track hg18 chr4:37,000,000-37,584,105 there is a gene called RYR3. If I click on it its description it says Description: Hypothetical protein FLJ36345. Alternate Gene Symbols: AX748249, NM_001036 Representative mRNA: AK093664 Protein: Q8N212 It has nothing to do with RYR3, which is located on chr15. Where did this annotation come from? Thanks for your help, Ally _______________________________________________ Genome maillist - Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Thu Dec 7 12:46:06 2006 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 07 Dec 2006 12:46:06 -0800 Subject: [Genome] Custom Track Issue In-Reply-To: References: Message-ID: <45787D8E.7080403@soe.ucsc.edu> Hello Malachi, To remove existing custom tracks when loading from a URL, add "&ctfile_db=" to the URL, as described in this answer to a previously asked question: http://www.cse.ucsc.edu/pipermail/genome/2006-November/012103.html To hide all annotation tracks except for your custom tracks, use the "browser hide all" line in your custom track data file, as described in the Custom Track User's Guide, here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#BROWSER I hope this information helps. Please let us know if you have any further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Malachi Griffith wrote: > Hi, > > I have a problem with the behavior of my custom tracks. I load a custom > track from a web accessible location on our web server by pointing to it > in a tailored URL. 
> > This works great, except that when I load a different custom track file > with tracks that have different names, the old tracks remain present as > blank tracks (they take up space). To get rid of these tracks I have to > clear by session cookie. > > Is there some way to load a series of tracks from a file and explicitly > force the browser to show only these tracks, not every track I have > viewed during the current session?? > > Cheers, > > Malachi Griffith > Genome Sciences Centre > Vancouver, BC > Canada > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From thefferon at mail.nih.gov Thu Dec 7 13:45:45 2006 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Thu, 7 Dec 2006 15:45:45 -0600 Subject: [Genome] save all custom tracks in a single file, please Message-ID: <879466B1-F2C0-4F12-924D-71BADE73EE5E@mail.nih.gov> Hi, Can you pretty-please make a feature whereby a user can save a file that contains all of the custom tracks that (s)he is working with? I work with many custom tracks at once, and generate new ones from them, and I wind up losing them all if I don't save each one separately, or cut and paste them one at a time into a new file. This feature would be hugely useful. Thanks for listening. Thanks, Tim From hiram at soe.ucsc.edu Thu Dec 7 15:44:54 2006 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 7 Dec 2006 15:44:54 -0800 Subject: [Genome] coloring custom wiggle tracks In-Reply-To: <45786DFA.7060801@u.washington.edu> References: <45786C48.2040807@u.washington.edu> <45786DFA.7060801@u.washington.edu> Message-ID: <222370c485b7b64a14057bdafc374f0f@soe.ucsc.edu> Good Afternoon Bob: The wigColorBy option works if the named track used for the color reference is a bed track. You instead have two wiggle tracks. What you will need to do is take the wiggle track that you want to be a reference and create a bed track out of it that has the two colors. 
Aren't the two colors simply the plus and minus values of the wiggle track ? You can threshold these values in the table browser and produce a bed track for the positives, and a bed track for the negatives. Set the positive bed items to be on the positive strand (color) and the negative bed items on the negative strand (altColor). Put these twos sets of bed items together into a single track. Now use that track as your wigColorBy reference. --Hiram On 2006 Dec 07, , at 11:39 AM, Bob Thurman wrote: > Oops, here's that message about using wigColorBy (that I couldn't apply > to my situation) -- > > -----Original Message----- > From: Hiram Clawson [mailto:hiram at soe.ucsc.edu > ] > Sent: Monday, April 10, 2006 2:16 PM > To: Ma, Jin > Cc: 'genome at soe.ucsc.edu > ' > Subject: Re: [Genome] Suggestion on coloring the wiggle track > > > Good Morning Jin: > > You can get the wiggle track to color itself based on the contents > of a different track. In the trackDb entry for the wiggle track you > want to color, place the declaration: > wigColorBy > > The wiggle will now color itself in the same manner as the other > track is colored. The other track is some kind of simple bed track. > If you create your other bed track item colors in a color scheme > that indicates your level of significance, then the wiggle track > will indicate those colors. > > --Hiram > > > > Bob Thurman wrote: >> Hi, >> >> I have a question about coloring custom wiggle tracks. I have two >> wiggle tracks, one of which I am coloring with the 'color=' and >> 'altColor=' options on the track line. And what I'd like to do is >> color >> the second track using the same color scheme as the first. I.e., I'd >> like the 2nd track to be green exactly when the first track is, and to >> switch to red exactly when the first track does. Is this possible? 
>> >> I found the message below, which mentions the use of wigColorBy, but >> I'm >> not sure that this is applicable to my situation, and any way, I can't >> get it to work (when I add 'wigColorBy=track1' in the second track's >> track line, I get the message "track1 not found", even though the >> first >> track has 'name=track1'). >> >> Another option I tried was using the 'itemRgb' option on each data >> line >> of the second track, but apparently that is not available for wiggle >> track. >> >> Thanks for your help. >> >> Bob >> >> > > -- > =========================================================== > Bob Thurman, Ph.D. Research Scientist > Division of Medical Genetics > Noble Lab > J-205 Health Sciences Building > Box 357720 > University of Washington > Seattle, WA 98195-7720 206-543-8916 > =========================================================== From thefferon at mail.nih.gov Fri Dec 8 07:35:34 2006 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Fri, 8 Dec 2006 09:35:34 -0600 Subject: [Genome] GB save function? Message-ID: <29CDA781-9087-4D12-9967-945EB2305128@mail.nih.gov> Hi, In the interests of improving the Genome Browser's function and utility.... I spend a lot of time creating particular profiles of track settings - tracks of interest on or off, specific custom tracks, specific genome coordinates, etc. Once I have spent time on a view and have everything the way I want it, when I close my browser or go home for the day, I lose all those settings. (I know that cookies, etc. remember my last settings, but that only saves one view, and isn't very good at recreating coordinates.) There is no way (that I know of) to save a particular customized view, so all the time I spent customizing the view is wasted. Could you please consider implementing a Save function that can store (either on my local machine or on the server) all the parameters for custumized genome browser shots? 
Ideally this would include all custom tracks currently loaded, all other track settings, genome coordinates, etc. That way I could load up a view I have spent time creating and wouldn't have to recreate it each time I want to investigate a particular research question. This would be really helpful. Thanks for considering! Tim From thefferon at mail.nih.gov Fri Dec 8 07:37:18 2006 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Fri, 8 Dec 2006 09:37:18 -0600 Subject: [Genome] Fwd: GB save function? References: <29CDA781-9087-4D12-9967-945EB2305128@mail.nih.gov> Message-ID: <7A6E5BB5-DC28-4150-B014-A38A4DFD70C2@mail.nih.gov> PS I know I can save the screenshot as a PostScript or PDF, but that doesn't make it interactive - i.e., I can't continue to work with it in the genome browser. From donnak at soe.ucsc.edu Fri Dec 8 08:25:03 2006 From: donnak at soe.ucsc.edu (Donna Karolchik) Date: Fri, 8 Dec 2006 08:25:03 -0800 Subject: [Genome] GB save function? References: <29CDA781-9087-4D12-9967-945EB2305128@mail.nih.gov> Message-ID: <003001c71ae5$6d4463e0$d6af8640@donnakLT> hi Tim, We are on the verge of releasing a session management utility that will allow users to save track sessions and switch between sessions. It should do most of what you're asking below. Barring problems, this should be out by the end of the month. -Donna ----------------------------------- Donna Karolchik UCSC Genome Bioinformatics Group http://genome.ucsc.edu ----- Original Message ----- From: "Tim Hefferon" To: Sent: Friday, December 08, 2006 7:35 AM Subject: [Genome] GB save function? > Hi, > > In the interests of improving the Genome Browser's function and > utility.... > > I spend a lot of time creating particular profiles of track settings > - tracks of interest on or off, specific custom tracks, specific > genome coordinates, etc. Once I have spent time on a view and have > everything the way I want it, when I close my browser or go home for > the day, I lose all those settings. 
> (I know that cookies, etc.
> remember my last settings, but that only saves one view, and isn't
> very good at recreating coordinates.) There is no way (that I know
> of) to save a particular customized view, so all the time I spent
> customizing the view is wasted.
>
> Could you please consider implementing a Save function that can store
> (either on my local machine or on the server) all the parameters for
> customized genome browser shots? Ideally this would include all
> custom tracks currently loaded, all other track settings, genome
> coordinates, etc. That way I could load up a view I have spent time
> creating and wouldn't have to recreate it each time I want to
> investigate a particular research question.
>
> This would be really helpful.
>
> Thanks for considering!
>
> Tim

From archanat at soe.ucsc.edu Fri Dec 8 09:29:50 2006
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Fri, 08 Dec 2006 09:29:50 -0800
Subject: [Genome] Batch retrieval using gene symbols
In-Reply-To: <80D20E5A-8CCC-435E-B0C1-56B5CBE1708F@mail.nih.gov>
References: <80D20E5A-8CCC-435E-B0C1-56B5CBE1708F@mail.nih.gov>
Message-ID: <4579A10E.5000405@soe.ucsc.edu>

Hello Jacob,

The geneSymbol field is located in the kgXref table under the Known Genes track. Unfortunately, you cannot paste/upload gene symbols as identifiers to search the kgXref table. You can only search for identifiers in the primary field of the table, which in this case is the kgID (the accession). Also, you can't get the sequence information directly using this table.

However, you could get the information that you are looking for by retrieving information from the kgXref and the knownGene tables using the Table Browser. This is going to be a two-step process.
First, you need to find the known gene IDs corresponding to your gene symbols using the kgXref table, and then use your list of known gene IDs to extract the promoter/upstream regions that you are interested in, using the knownGene table.

To do this, make the following selections in the Table Browser:

clade: vertebrate
genome: human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: Known Genes
table: kgXref

Click on the "filter: create" button, paste a whitespace-separated list of your gene symbols into the geneSymbol text box, and then click "submit".

Then choose "output format: selected fields from primary and related tables" and hit "get output".

On the Select Fields page, check the kgID field from the kgXref table and then hit "get output". This gives you the list of known gene IDs corresponding to your gene symbols.

Now, back on the Table Browser page, choose "table: knownGene" and "region: genome". Then paste your list of known gene IDs in the paste/upload list box and hit the "submit" button.

Then choose "output format: sequence" and hit "get output". Select "genomic" and hit "submit". Under "Sequence Retrieval Region Options", check only the box "Promoter/Upstream by -- bases". You can specify the number of bases you are interested in in the text box here, and then hit "get sequence".

In case you are looking for the actual promoter regions for your genes, we do have a couple of tracks on hg18 that may help: the 'FirstEF' track, the 'TFBS Conserved' track and the 'Reg Potential 7 species' track. The FirstEF track predicts exons, promoters and CpG windows. The 'Reg Potential 7 species' track predicts regulatory regions. The TFBS Conserved track contains the location and score of transcription factor binding sites conserved in the human/mouse/rat alignment.

I hope that this helps you. Please let us know if you have further questions.
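
For anyone who prefers to do the same two-step lookup offline, here is a minimal sketch in Python. It assumes you have already exported the relevant columns of kgXref and knownGene (for example with the Table Browser) and loaded them as tuples; the function name `upstream_regions` and the tuple layouts are illustrative assumptions, not part of any UCSC tool. Coordinates are treated as zero-based, half-open, as in the UCSC tables.

```python
def upstream_regions(symbols, kgxref_rows, knowngene_rows, upstream=1000):
    """Map gene symbols to known gene IDs, then to upstream windows.

    kgxref_rows:    iterable of (kgID, geneSymbol)                    -- assumed layout
    knowngene_rows: iterable of (name, chrom, strand, txStart, txEnd) -- assumed layout
    Returns a list of (symbol, kgID, chrom, start, end, strand).
    """
    wanted = set(symbols)

    # Step 1: gene symbol -> known gene IDs (one symbol may map to several IDs).
    ids = {kg_id: sym for kg_id, sym in kgxref_rows if sym in wanted}

    # Step 2: for each matching transcript, take the window upstream of its start.
    regions = []
    for name, chrom, strand, tx_start, tx_end in knowngene_rows:
        if name not in ids:
            continue
        if strand == "+":
            # Upstream of a plus-strand gene lies before txStart.
            start, end = max(0, tx_start - upstream), tx_start
        else:
            # Upstream of a minus-strand gene lies after txEnd
            # (not clipped to the chromosome size in this sketch).
            start, end = tx_end, tx_end + upstream
        regions.append((ids[name], name, chrom, start, end, strand))
    return regions


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    kgxref = [("uc001aaa.1", "GENEA"), ("uc001bbb.1", "GENEB")]
    known = [
        ("uc001aaa.1", "chr1", "+", 5000, 9000),
        ("uc001bbb.1", "chr2", "-", 2000, 4000),
    ]
    for row in upstream_regions(["GENEA", "GENEB"], kgxref, known):
        print(*row, sep="\t")
```

The resulting intervals would still need to be turned into sequence separately; the sketch only covers the ID lookup and the coordinate arithmetic.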

Regards,
Archana
UCSC Genome Bioinformatics Group

Jacob Brown wrote:
> Hello,
>
> I am just trying to retrieve promoter regions for ~300 genes using
> the table browser. I am pasting a list of gene symbols in the
> identifier box and using the known genes track. When I ask for
> sequence in output it does not find any hits. Suggestions?
>
> Many thanks,
>
> Jacob Brown
> National Eye Institute

From g39428011 at ym.edu.tw Fri Dec 8 10:10:45 2006
From: g39428011 at ym.edu.tw (=?big5?B?svq10w==?=)
Date: Sat, 9 Dec 2006 02:10:45 +0800
Subject: [Genome] Question about rs17182545
Message-ID: <001501c71af4$2fae6850$8b00a9c0@spiX41>

Dear Sir:

When I check the flanking sequence around a SNP (rs17182545, location: chr21:46,528,767) using the Genome Browser, the flanking seq is "AGTCCTTTCA T TAGGCTGGGA", and the SNP is "T". Yet, when I follow the SNP hyperlink on the Genome Browser page, the observed nt is given as A/G (reference allele: A), and the dbSNP flanking sequence is "TCCCAGCCTA A/G TGAAAGGACT".

This information about rs17182545 is obviously different. Could you explain why?
Thanks for your help, and I look forward to your response.

Sylvia
IBMS SINICA, TAIWAN

From ann at soe.ucsc.edu Fri Dec 8 12:08:49 2006
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 08 Dec 2006 12:08:49 -0800
Subject: [Genome] Question about rs17182545
In-Reply-To: <001501c71af4$2fae6850$8b00a9c0@spiX41>
References: <001501c71af4$2fae6850$8b00a9c0@spiX41>
Message-ID: <4579C651.5080107@soe.ucsc.edu>

Hello Sylvia,

That SNP is located on the minus strand, so its alleles are reported as the complement of the plus-strand sequence that the browser displays. The reference allele is an "A" on the minus strand, which corresponds to the "T" you see in the browser.
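
To check this on the sequences quoted in the question, one can reverse-complement the flanking sequence shown in the browser and compare it with the dbSNP flank. The snippet below is only an illustrative sketch; the helper name `reverse_complement` is made up for this example.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Plus-strand flank from the Genome Browser, with the SNP base "T" in the middle.
browser_flank = "AGTCCTTTCA" + "T" + "TAGGCTGGGA"

# dbSNP flank with the reference allele "A" substituted for "A/G".
dbsnp_flank = "TCCCAGCCTA" + "A" + "TGAAAGGACT"

print(reverse_complement(browser_flank))
print(reverse_complement(browser_flank) == dbsnp_flank)
```

The two flanks are the same locus read from opposite strands, which is why the browser's "T" and dbSNP's reference "A" do not conflict.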

Strand: -
Observed: A/G
Reference allele: A

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Sylvia wrote:
> Dear Sir:
>
> When I check the flanking sequence around a SNP (rs17182545, location: chr21:46,528,767) using the Genome Browser,
> the flanking seq is "AGTCCTTTCA T TAGGCTGGGA", and the SNP is "T". Yet, when I follow the SNP hyperlink on
> the Genome Browser page, the observed nt is given as A/G (reference allele: A), and the dbSNP flanking sequence is
> "TCCCAGCCTA A/G TGAAAGGACT".
>
> This information about rs17182545 is obviously different. Could you explain why?
> Thanks for your help, and I look forward to your response.
>
> Sylvia
> IBMS SINICA, TAIWAN

From rhead at soe.ucsc.edu Fri Dec 8 12:41:35 2006
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Fri, 08 Dec 2006 12:41:35 -0800
Subject: [Genome] save all custom tracks in a single file, please
In-Reply-To: <879466B1-F2C0-4F12-924D-71BADE73EE5E@mail.nih.gov>
References: <879466B1-F2C0-4F12-924D-71BADE73EE5E@mail.nih.gov>
Message-ID: <4579CDFF.2080805@soe.ucsc.edu>

Hello Tim,

We agree that saving all of the custom tracks generated in a session would be a useful feature. We are discussing ways this could be implemented, but I have no estimated date for completion.

As an alternative, you might be interested in using Galaxy. Galaxy is a set of tools created and maintained at Penn State University that works in concert with the UCSC Genome Browser. It is located here:

http://main.g2.bx.psu.edu/

Galaxy allows users to retrieve and manipulate data from UCSC, and to save, reload, and share data with the "history" option.

Thank you for your input on the Browser. I hope you find Galaxy to be a useful tool as well.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Tim Hefferon wrote:
> Hi,
>
> Can you pretty-please make a feature whereby a user can save a file
> that contains all of the custom tracks that (s)he is working with? I
> work with many custom tracks at once, and generate new ones from
> them, and I wind up losing them all if I don't save each one
> separately, or cut and paste them one at a time into a new file. This
> feature would be hugely useful.
>
> Thanks for listening.
>
> Thanks,
> Tim

From kayla at soe.ucsc.edu Fri Dec 8 14:55:10 2006
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Fri, 08 Dec 2006 14:55:10 -0800
Subject: [Genome] question about output
In-Reply-To:
References:
Message-ID: <4579ED4E.30907@cse.ucsc.edu>

Tara,

Short answer: The reason you are not seeing perfect conservation in that region is that the exon is simply different between human and mouse.

You appear to be comparing human and mouse by looking at the conservation wiggle track. I want to point out a few things that may be useful to you:

1. The conservation wiggle graph incorporates data from many different genomes, not just human and mouse, in case you were looking at that pictorial representation of conservation.

2. You can click on the Conservation track controls and uncheck the species you are not interested in. You can then click on the conservation wiggle graph (which won't change in response to what you've unchecked) and you will be given a side-by-side alignment of the human and mouse genomes in that region. You can see from this alignment that there are some bases for which human and mouse are not the same.

3. The net and chain tracks in the mouse and human browsers can give you some insight into orthologous regions between the two genomes.

I hope this helps you. Please don't hesitate to contact us again if you have further questions.

Kayla Smith
UCSC Genome Bioinformatics Group

Tara Lauriat wrote:
> I have taken the human exons and flanking sequence for my gene of interest and done a BLAT against the mouse genome to look at conservation. Most of the exons had a nice conservation over the exon as one would expect. However, some like the attached file had strange profiles where part of the exon was not conserved. I was wondering if you know how to interpret this type of result.
> Sincerely,
> Tara Lauriat

From rhead at soe.ucsc.edu Fri Dec 8 17:53:15 2006
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Fri, 08 Dec 2006 17:53:15 -0800
Subject: [Genome] Not all identities returned by blat
In-Reply-To: <96C071542ED58D49BC08210D3456D5802EBDF9@pandore.ircm.priv>
References: <96C071542ED58D49BC08210D3456D5802EBDF9@pandore.ircm.priv>
Message-ID: <457A170B.4020104@soe.ucsc.edu>

Hello Johanne,

I tried BLATing your example sequence against sacCer1 using different parameters. I used -minIdentity=100 to get only matches with 100% identity. Strangely, using the default parameters, I got 19 full-length matches, one more than you found. I am using Standalone BLAT v. 33x5.

I got more 100% identity results by lowering the tileSize. With -tileSize=6 I found 22 full-length, 100% hits. With tileSize=7, I found 20, and with tileSize of 8 to 11, I found 19 hits.

I should also point out that with web-based blat on our site (at http://genome.ucsc.edu/cgi-bin/hgBlat? ), your example query gives 20 100% identity, full-length matches. Of course, some of the results have a span greater than 46 (the length of the query).

We have advice on replicating web-based BLAT results here:
http://genome.ucsc.edu/FAQ/FAQblat#blat5
(actually, this title is somewhat misleading . . .
these are instructions on replicating web-based BLAT using gfClient and gfServer, not blat. You still may find them useful.)

Also, if by "perfect matches" you mean something other than the way I interpreted your meaning, please let us know, and we will try to advise you on how to alter the parameters.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Duhaime Johanne wrote:
> Hello
>
> I am trying to find all perfect matches of 45-55 oligos on the
> Saccharomyces cerevisiae genome.
>
> I would like to use blat instead of megablast because of the speed since
> I have a lot of oligos.
>
> A blast with a certain sequence will return 20 matches 100% identity on
> the full oligo length but with blat I get 18. And this happens for quite
> a lot of sequences. I always get fewer matches with blat.
>
> I am struggling with the parameters but I cannot get the same results.
>
> Ex: TAGTCGCACTAGTCCTGACGTTGATGCTGGCAGTGGTAGTAGCACT blast=20 blat=18
>
> I am using the following :
>
> /apps/programs/bin/blat x11.seq ../../../sgd/all.fsa -out=blast8
> x11.blat
>
> I have tried to change tileSize and stepSize without success.
> The matches for the above sequence that do not appear in the blat result
> are in the neighborhood of another match.
>
> Database, blast and blat are installed locally.
> The megablast command used:
> /apps/programs/sources/blast-2.2.15/bin/megablast -d ../../../sgd/all -i
> x11.seq -o x11.blast -m 8 -F f -p 0
>
> Is there a parameter that will reconcile the results or, put differently,
> how can I get all the perfect matches with blat?
>
> Thank you very much in advance
>
>
> Johanne Duhaime

From condie_carmack at agilent.com Sat Dec 9 09:57:44 2006
From: condie_carmack at agilent.com (condie_carmack@agilent.com)
Date: Sat, 9 Dec 2006 10:57:44 -0700
Subject: [Genome] Question - Custom Track Annotation
Message-ID: <6F94DB018DCB9E4AB34E7FE503AF2B16F8F503@wcosmb05.cos.agilent.com>

Dear Genome Help-

I have a question about annotation in Custom Tracks.

I submitted the following custom track to the Genome Browser but the Browser reports the start as one base pair higher (10901569 instead of the submitted 10901568 position). I know you start the database with 0 instead of 1. Is there a way to fix this in the view?

Thanks,

Condie

Submitted:

track name="Agilent-013282" description="Human Genomic CGH" color="255,0,0"
chr3 10901568 10901627 A_14_P110067 1000 + 10901568 10901627 0 1 59 0

Result:

Custom Track: Agilent-013282 Human Genomic CGH
Item: A_14_P110067
Score: 1000
Position: chr3:10901569-10901627
Band: 3p25.3
Genomic Size: 59
Strand: +
View DNA for this feature

Condie E. Carmack, Ph.D.
Agilent Technologies, Inc.
5301 Stevens Creek Blvd.
MS 53U WG
Santa Clara, CA 95051-7201
(408) 553-6217 (office)
(408) 553-7100 (fax)
(650) 814-1912 (cell)
www.agilent.com
condie_carmack at agilent.com
Please note: email has changed from non.agilent to agilent.com.

From hiram at soe.ucsc.edu Sat Dec 9 21:03:56 2006
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Sat, 9 Dec 2006 21:03:56 -0800
Subject: [Genome] Question - Custom Track Annotation
In-Reply-To: <6F94DB018DCB9E4AB34E7FE503AF2B16F8F503@wcosmb05.cos.agilent.com>
References: <6F94DB018DCB9E4AB34E7FE503AF2B16F8F503@wcosmb05.cos.agilent.com>
Message-ID: <8c924a0c397a6a1ef5f88fb8b2942828@soe.ucsc.edu>

Good Evening Condie:

What you are noticing is not an error and is not something that can be configured.
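
The one-base shift in the submitted track comes from two coordinate conventions: a BED start is zero-based (and the end is exclusive), while the position shown in the browser is one-based and inclusive. As a rough sketch of the arithmetic, with made-up helper names that are not part of any UCSC tool:

```python
def bed_to_display(chrom, bed_start, bed_end):
    """Zero-based, half-open BED interval -> one-based, closed position string."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


def display_to_bed(position):
    """One-based, closed position string -> (chrom, bed_start, bed_end)."""
    chrom, span = position.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    return chrom, start - 1, end


# The BED line from the submitted track: chr3 10901568 10901627
print(bed_to_display("chr3", 10901568, 10901627))

# Going the other way recovers the submitted BED coordinates.
print(display_to_bed("chr3:10901569-10901627"))
```

Only the start changes between the two forms; the end stays the same, and the feature length is end minus start in BED terms (59 here, matching the "Genomic Size" reported above).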

When the browser displays, for example, the first three nucleotides of chrom 1, it would read:

    chr1:1-3

When you submit a custom bed track for the same first three bases of chrom 1, your bed file would read:

    chr1 0 3

This is just a difference in coordinate systems: one for internal use, one for external display. Please note the following discussion of these coordinate systems:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1

--Hiram

On 2006 Dec 09, , at 9:57 AM, wrote:

> Dear Genome Help-
>
> I have a question about annotation in Custom Tracks.
>
> I submitted the following custom track to the Genome Browser but the
> Browser reports the start as one base pair higher (10901569 instead of
> the submitted 10901568 position). I know you start the database with 0
> instead of 1. Is there a way to fix this in the view?
>
> Thanks,
>
> Condie

From zhuodu at gmail.com Sun Dec 10 08:27:41 2006
From: zhuodu at gmail.com (du zhuo)
Date: Mon, 11 Dec 2006 00:27:41 +0800
Subject: [Genome] help-about gene name
Message-ID: <6cf4d78a0612100827w11d6eca9rd420dc5f675cc06a@mail.gmail.com>

Dear all

I have downloaded the Upstream5000 data for the chicken genome, but the gene names are all displayed as GenBank accession numbers, such as NM_001001189. How can I get the gene names for these accession numbers all at once, rather than searching the database one by one? For example, NM_001001189 is craniofacial development protein 1 [*Gallus gallus*]...

Thank you for your attention.

Best wishes,
Zhuo Du

From andre at bioinf.uni-leipzig.de Mon Dec 11 08:00:28 2006
From: andre at bioinf.uni-leipzig.de (Andre Schuetzenmeister)
Date: Mon, 11 Dec 2006 17:00:28 +0100
Subject: [Genome] request
Message-ID: <1165852828.9506.9.camel@slibowiz>

Hello,

my name is Andre Schuetzenmeister, I am a MSc student at the University of Leipzig, Germany. I use the UCSC Genome Browser to recheck the conservation of binding sites which have previously been classified as conserved using PhastCons.
I have already asked whether an R interface exists to do this rechecking. I was told that no R interface exists; thus, I decided to download all multiz-alignment pages and parse these HTML pages to extract the alignments.
I had started my R script, and after ~800 requests I got an error page (see the text below) which told me to write to this email address in order to ask for more efficient ways to do my batch job.

Are there more efficient ways to obtain the multiz alignments for hundreds or thousands of such genomic locations? This would speed up the computation a lot.

I would be pleased to get an answer soon.

Sincerely,
Andre Schuetzenmeister


Your message:

There is a very high volume of traffic coming from your site (IP address 139.18.75.83) as of Mon Dec 11 07:12:45 2006 (California time). So that other users get a fair share of our bandwidth, we are putting in a delay of 12.9 seconds before we service your request. This delay will slowly decrease over a half hour as activity returns to normal. This high volume of traffic is likely due to program-driven rather than interactive access, or the submission of queries on a large number of sequences. If you are making large batch queries, please write to our genome at cse.ucsc.edu public mailing list and inquire about more efficient ways to access our data. If you are sharing an IP address with someone who is submitting large batch queries, we apologize for the inconvenience. Please contact genome-www at cse.ucsc.edu if you think this delay is being imposed unfairly.

From aindap at yahoo.com Mon Dec 11 08:48:38 2006
From: aindap at yahoo.com (Amit Indap)
Date: Mon, 11 Dec 2006 08:48:38 -0800 (PST)
Subject: [Genome] citing programs from jksrc
Message-ID: <20061211164838.36444.qmail@web32610.mail.mud.yahoo.com>

Hi UCSC

We have several manuscripts in preparation in our lab which have made extensive use of several programs in jksrc (liftOver, pslCDnaFilter, twoBitToFa, etc.).

Do I cite the NAR paper from 2003 for these tools? Or is there something more appropriate?

Thanks for your help,

Amit Indap
Cornell University

From mcquain at microarrays.com Mon Dec 11 08:07:30 2006
From: mcquain at microarrays.com (Mark McQuain)
Date: Mon, 11 Dec 2006 10:07:30 -0600
Subject: [Genome] Genome Browser Question - HEEBO probe set
Message-ID: <5.2.1.1.0.20061211100025.0531ec00@mail.microarrays.com>

Hello:

I am interested in learning if there is any information available generated by the HEEBO oligo set in a format similar to the UCSC "Human Gene Sorter" page, which currently contains information generated by Novartis-cDNA, Stanford-cDNA, Affy-U95, and Gladstone-cDNA microarrays.

Thank you for your attention.

Mark

Mark McQuain
Microarrays Inc.
mcquain at microarrays.com
615-327-5495

From kate at soe.ucsc.edu Mon Dec 11 09:02:36 2006
From: kate at soe.ucsc.edu (Kate Rosenbloom)
Date: Mon, 11 Dec 2006 09:02:36 -0800
Subject: [Genome] request
In-Reply-To: <1165852828.9506.9.camel@slibowiz>
References: <1165852828.9506.9.camel@slibowiz>
Message-ID: <457D8F2C.1030202@cse.ucsc.edu>

Hello Andre,

Your best solution would be to download the full set of multiz alignment files in .maf format from our downloads site:

http://hgdownload.cse.ucsc.edu/

and then to process the regions of interest to you locally.
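
As a rough illustration of what the local processing could look like, here is a minimal Python sketch that reads MAF text and keeps the alignment blocks overlapping a region. It relies only on the basic MAF layout ("a" lines open a block, "s" lines carry `src start size strand srcSize text`, with zero-based starts) and assumes that the first "s" row of each block is the reference species on the plus strand. The function names and the tiny sample are made up for this example.

```python
def maf_blocks(lines):
    """Yield each alignment block as a list of
    (src, start, size, strand, src_size, text) tuples, one per "s" line."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the previous block.
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            src, start, size, strand, src_size, text = fields[1:7]
            block.append((src, int(start), int(size), strand, int(src_size), text))
        # Header/comment lines ("#...") and other line types are skipped.
    if block:
        yield block


def overlapping(blocks, src, start, end):
    """Keep blocks whose first (reference) row overlaps the
    zero-based, half-open interval [start, end) on src."""
    for block in blocks:
        ref_src, ref_start, ref_size = block[0][:3]
        if ref_src == src and ref_start < end and ref_start + ref_size > start:
            yield block


if __name__ == "__main__":
    # Tiny made-up MAF fragment, for illustration only.
    sample = """##maf version=1
a score=100.0
s hg18.chr1 100 5 + 247249719 ACGTA
s mm8.chr4 200 5 + 155029701 ACGTT

a score=50.0
s hg18.chr1 500 4 + 247249719 AC-GT
"""
    for block in overlapping(maf_blocks(sample.splitlines()), "hg18.chr1", 102, 110):
        for src, start, size, strand, src_size, text in block:
            print(src, start, size, strand, text)
```

For real files one would pass an open file handle instead of `sample.splitlines()`; blocks only partly inside the region are returned whole, so trimming alignment columns would be a further step.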

Hope this helps,
Kate

---
Kate Rosenbloom
UCSC Genome Bioinformatics

Andre Schuetzenmeister wrote:
> Hello,
>
> my name is Andre Schuetzenmeister, I am a MSc student at the University
> of Leipzig, Germany. I use the UCSC Genome Browser to recheck the
> conservation of binding sites which have previously been classified as
> conserved using PhastCons. I have already asked whether an R interface
> exists to do this rechecking. I was told that no R interface exists;
> thus, I decided to download all multiz-alignment pages and parse these
> HTML pages to extract the alignments.
> I had started my R script, and after ~800 requests I got an error page
> (see the text below) which told me to write to this email address in
> order to ask for more efficient ways to do my batch job.
>
> Are there more efficient ways to obtain the multiz alignments for
> hundreds or thousands of such genomic locations? This would speed up the
> computation a lot.
>
> I would be pleased to get an answer soon.
>
> Sincerely,
> Andre Schuetzenmeister
>
>
> Your message:
>
> There is a very high volume of traffic coming from your site (IP address
> 139.18.75.83) as of Mon Dec 11 07:12:45 2006 (California time). So that
> other users get a fair share of our bandwidth, we are putting in a delay
> of 12.9 seconds before we service your request. This delay will slowly
> decrease over a half hour as activity returns to normal. This high
> volume of traffic is likely due to program-driven rather than
> interactive access, or the submission of queries on a large number of
> sequences. If you are making large batch queries, please write to our
> genome at cse.ucsc.edu public mailing list and inquire about more efficient
> ways to access our data. If you are sharing an IP address with someone
> who is submitting large batch queries, we apologize for the
> inconvenience. Please contact genome-www at cse.ucsc.edu if you think this
> delay is being imposed unfairly.
>

From vlb2 at cornell.edu Mon Dec 11 08:50:44 2006
From: vlb2 at cornell.edu (Vanessa Bauer)
Date: Mon, 11 Dec 2006 11:50:44 -0500
Subject: [Genome] 12 drosophila genome intron alignments
Message-ID:

Hello,

Below you will find a few e-mails that I have exchanged with Angie Hinrichs regarding intron alignments. She was very helpful and I now have a better understanding of how your data is organized. That being said, I still have questions.

Basically, what we would like to have are intron alignments specifically for the species in the melanogaster subgroup. With Angie's help I can now get such alignments directly, but they are based on 15 species. Will this lead to a bias toward more conserved sequence being available? In other words, would there be more aligned intron nucleotides if the alignments were based on a smaller, more closely related set of species?

thanks,
Tessa Bauer DuMont


Hi Tessa,

I'm not sure if this will get you exactly what you're looking for, but hopefully it will be a start: our Table Browser tool can make a track of introns, and i