From jtrubisz at mac.com Sun Apr 1 13:51:38 2007
From: jtrubisz at mac.com (Joe Trubisz)
Date: Sun, 1 Apr 2007 16:51:38 -0400
Subject: [Genome] Using liftover
Message-ID: <716EC20C-2766-4A64-ADB2-D9597BBF0739@mac.com>

Hi...

Maybe I'm a total idiot, but for the life of me, I cannot get liftover to work at all. I tried it both on linux and mac and I keep getting the same results. Perhaps you can explain to me where I'm going wrong.

I'm trying to do a hg18-17 coordinate mapping of a human genome .gff file (it's in gff3 if that makes any difference).

A sample of the file (ref.txt) is as follows:

##gff-version 3
chr14 UCSC_1 mRNA 91108540 91111136 . - ID=NM_001080113;Name=NM_001080113;Alias=PP8961;Note=hypothetical protein LOC650662
chr14 UCSC_1 three_prime_UTR 91108540 91110276 . - Parent=NM_001080113

I downloaded the hg18ToHg17.over.chain file.

I tried liftover with -gff appearing in all places as follows:

liftover ref.txt hg18ToHg17.over.chain xxx -gff

and all I keep getting back is the usage page. I tried to include the -errorHelp, but it was of no help.

Suggestions are welcome...I'm stumped!!!!

Thanks,
Joe Trubisz
Cold Spring Harbor Labs.

From Russell.Smithies at agresearch.co.nz Sun Apr 1 15:30:39 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 2 Apr 2007 10:30:39 +1200
Subject: [Genome] Btau3 released yet?
Message-ID:

Hi,
I was looking at Ensembl's Btau3 http://www.ensembl.org/Bos_taurus/contigview?c=1:41422300:1&w=2000 and noticed they have a link to "View region at UCSC" (points to http://genome.ucsc.edu/cgi-bin/hgTracks?db=bosTau3&position=Chr1:41421300-41423299) but as far as I'm aware, you haven't released the current annotations yet. Is this link actually pointing to the 'old' bovine genome, as none of the features seem to be in the correct place?

Are you planning on releasing annotations for the Btau_3.1 genome or are you still waiting (like the rest of us) on the next release?
Thanx, Russell Smithies Bioinformatics Software Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz AgResearch Farming Food and Health. First > Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi > Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From heather at soe.ucsc.edu Mon Apr 2 07:13:33 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Mon, 2 Apr 2007 07:13:33 -0700 (PDT) Subject: [Genome] Btau3 released yet? In-Reply-To: References: Message-ID: Russell: We are in fact building a browser for bosTau3. It is currently available at http://genome-test.cse.ucsc.edu. There are a few more annotations that we will build before moving the browser to our production servers at http://genome.ucsc.edu. Heather Trumbower UCSC Genome Bioinformatics Group On Mon, 2 Apr 2007, Smithies, Russell wrote: > Hi, > I was looking at Ensembl's Btau3 > http://www.ensembl.org/Bos_taurus/contigview?c=1:41422300:1&w=2000 and > noticed thaey have a link to "View region at UCSC" (points to > http://genome.ucsc.edu/cgi-bin/hgTracks?db=bosTau3&position=Chr1:4142130 > 0-41423299) but as far as I'm aware, you haven't released the current > annotations yet. 
> Is this link actually pointing to the 'old' bovine genome as none of the > features seem to be in the correct place. > > Are you planning on releasing annotations for the Btau_3.1 genome or are > you still waiting (like the rest of us) on the next release? > > Thanx, > > Russell Smithies > Bioinformatics Software Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > AgResearch Farming Food and Health. First >> Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi >> > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From zhouyubio at hotmail.com Mon Apr 2 03:11:04 2007 From: zhouyubio at hotmail.com (Yu Zhou) Date: Mon, 02 Apr 2007 10:11:04 +0000 Subject: [Genome] How to determine whether a short element is in 5UTR, exon, intron or 3UTR? Message-ID: Hi ... I have lots of short DNA elements to analyze. After I do BLAT, I could view the their positions by using the genome browser. In a further step, I would like to know their local positions in their own genes and answer the questions like " Whether they are in 5'UTR, or 3'UTR, or exons, or introns? 
If in exons or introns, which exon they reside?" How can I solve it by using Known Gene of your database? Thanks a lot in advance! Best regards, Yu ZHOU _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ From heather at soe.ucsc.edu Mon Apr 2 10:16:56 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Mon, 2 Apr 2007 10:16:56 -0700 (PDT) Subject: [Genome] Using liftover In-Reply-To: <716EC20C-2766-4A64-ADB2-D9597BBF0739@mac.com> References: <716EC20C-2766-4A64-ADB2-D9597BBF0739@mac.com> Message-ID: Joe: I think you need one more argument (the file name for unmapped elements). You've got "xxx" for the filename for the successful mappings, but command-line liftOver also requires "yyy" where a lift isn't possible. Heather Trumbower UCSC Genome Bioinformatics Group On Sun, 1 Apr 2007, Joe Trubisz wrote: > Hi... > > Maybe I'm a total idiot, but for the life of me, I cannot get > liftover to work at all. > I tried it both on linux and mac and I keep getting the same results. > Perhaps you can explain to me where I'm going wrong. > > I'm trying to do a hg18-17 coordinate mapping of a human genome .gff > file (it's in gff3 if that makes any difference). > > A sample of the file (ref.txt) is as follows: > > ##gff-version 3 > chr14 UCSC_1 mRNA 91108540 91111136 . - > ID=NM_001080113;Name=NM_001080113;Alias=PP8961;Note=hypothetical > protein LOC650662 > chr14 UCSC_1 three_prime_UTR 91108540 91110276 . - Parent=NM_001080113 > > I downloaded the hg18Tohg17.over.chain file. > > I tried liftover with -gff appearing in all places as follows: > > liftover ref.txt hg18ToHg17.over.chain xxx -gff > > and all I keep getting back is the usage page. I tried to include the > -errorHelp, but it was of no help. > > Suggestions are welcome...I'm stumped!!!! > > Thanks, > Joe Trubisz > Cold Spring Harbor Labs. 
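Heather's point, that command-line liftOver takes four file arguments (input, chain file, mapped output, unmapped output), can be sketched as below. This is only an illustration: the helper name `liftover_args` is hypothetical, the binary is assumed to be installed as `liftOver`, and "yyy" is just the placeholder name Heather used for the unmapped file. The sketch only assembles the argument list; it does not run anything.

```python
import shlex


def liftover_args(old_file, chain_file, new_file, unmapped_file, gff=False):
    """Assemble a liftOver command line (hypothetical helper, not part of the kent tools).

    The four positional arguments follow the order described in the thread:
    input file, chain file, output for mapped records, output for unmapped records.
    """
    args = ["liftOver", old_file, chain_file, new_file, unmapped_file]
    if gff:
        # -gff is the option Joe was already passing for his GFF input.
        args.append("-gff")
    return args


# Joe's original call with the missing fourth file name added.
cmd = liftover_args("ref.txt", "hg18ToHg17.over.chain", "xxx", "yyy", gff=True)
print(shlex.join(cmd))
# prints: liftOver ref.txt hg18ToHg17.over.chain xxx yyy -gff
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where liftOver is actually on the PATH.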
> > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From galt at soe.ucsc.edu Mon Apr 2 11:51:08 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Mon, 2 Apr 2007 11:51:08 -0700 (PDT) Subject: [Genome] Standalone BLAT and very short sequences In-Reply-To: References: Message-ID: blat -noTrimA Don't trim trailing poly-A This may help. You may need to set your stepSize to 1 for such tiny alignments. MinIdentity is a %, so you may want that to be high. BLAT is optimized for aligning mRNA to DNA, but can be used for other jobs too. My experience with very short sequences is that -fastMap should not be used. Hiram Clawson wrote findMotif found in src/utils in the kent cvs project. This may be useful for very short exact matches of size 4 to 16 bases. -Galt On Sat, 31 Mar 2007 calhoujd at notes.udayton.edu wrote: > I am experimenting with using standalone BLAT on very short queries (5-10 > bp). When I use genomic data (a short query against a human chromosome), > I seem to be getting fairly good results for short perfect matches. > However, when I use simulated data, I am experiencing some issues. For > example, if I query with the sequence 'AAAAAAA' (7 bp) against a '.fa' > database that contains 150 base pairs of just 'A' nucleotides, I only get > one match. Shouldn't I get hits at every position along the database? My > usage on cygwin is the following: > > $ ./blat.exe chr22.fa a.fa afaagainstchr22.psl -tileSize=7 -minScore=4 > -minIdentity=75 -oneOff=1 minMatch=1 -stepSize=2 > > Can I tweak the usage to return all of the simulated hits? I would like > to get the simulations working properly before moving on to testing > genomic data. Thanks! 
> > Jeff Calhoun > Undergraduate Researcher > University of Dayton > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From angie at soe.ucsc.edu Mon Apr 2 13:15:55 2007 From: angie at soe.ucsc.edu (Angie Hinrichs) Date: Mon, 2 Apr 2007 13:15:55 -0700 (PDT) Subject: [Genome] Chained blastz alignments In-Reply-To: References: Message-ID: Hi Eric, > So if you first shred the sequences into chunks, then use axtChain to create > chains that can span across these smaller chunks, how and where do you store > the positions of the chunks on the larger assembly? How do you use this > information to later piece back together the chromosome/scaffold that you > have previously broken into chunks. First we invoke blastz with a [start,end] range appended to the sequence filenames. blastz's lav output includes the [start,end] range. Then we run a postprocessing script (written by Scott Schwartz, formerly of PSU, now at Google I think) that parses the range and adjusts the coordinates in the lav blocks. If you have downloaded the kent source tree, the postprocessing script is kent/src/hg/utils/automation/blastz-normalizeLav . It is invoked by blastz-run-ucsc in the same dir, and that is invoked by a job script created by doBlastzChainNet.pl (which in turn depends on several *.pm files in that dir). Scott Schwartz developed a pretty sophisticated system for running blastz on our compute cluster with various optional tweaks (for example, snipping of lineage-specific repeats prior to running blastz, and then adjusting the lav coords to un-snip!). Then Jim added the chaining and netting flow. We staff developers used to run a sequence of their scripts and programs manually from a template, but then automated the whole flow with yet another layer of scripts... so the whole system is pretty challenging to reverse engineer at this point, but the scripts are there to peruse. 
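The coordinate bookkeeping Angie describes (a [start,end] range attached to each chunk, then alignment coordinates shifted back onto the full sequence) can be sketched roughly as follows. This is not the blastz-normalizeLav script itself. The `name[start,end]` spelling of the range, the 1-based inclusive convention, and both function names are assumptions made for illustration.

```python
import re

# Assumed chunk spec of the form "name[start,end]", 1-based and inclusive.
RANGE_RE = re.compile(r"^(?P<name>.+)\[(?P<start>\d+),(?P<end>\d+)\]$")


def parse_chunk_spec(spec):
    """Split 'name[start,end]' into (name, start, end)."""
    match = RANGE_RE.match(spec)
    if match is None:
        raise ValueError(f"no [start,end] range found in {spec!r}")
    return match.group("name"), int(match.group("start")), int(match.group("end"))


def lift_block(local_start, local_end, chunk_start):
    """Shift a chunk-local (1-based) alignment block onto full-sequence coordinates."""
    offset = chunk_start - 1
    return local_start + offset, local_end + offset


name, start, end = parse_chunk_spec("chr1.fa[1000001,2000000]")
# A block covering chunk positions 1..500 maps to 1000001..1000500 on the chromosome.
print(name, lift_block(1, 500, start))
```

Once every block carries full-sequence coordinates, a chaining step can link blocks that came from different chunks, which is the stitching effect discussed in the thread.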
You can run doBlastzChainNet.pl -help to get a lengthy description of the config file input that it expects, and then you can run "doBlastzChainNet.pl -debug DEF" where DEF is a properly constructed config file, to see a sequence of commands that would be run if not for -debug -- also, it will dump out slave .csh scripts. A lot of the commands in there are specific to our computing environment (e.g. parasol commands for cluster jobs), but they do show exactly how we invoke the many programs in the pipeline. That said, you probably don't need to exactly duplicate our pipeline. For example, if your sequences are small enough (or computers big enough :) so that you don't need to chunk them, then you don't need to bother with the blastz-normalizeLav stuff. Still, axtChain can find much longer chains than blastz because of its handling of double-sided gaps, so I still think it's worth your while -- there are benefits beyond stitching across those chunks. Angie From AHE at stowers-institute.org Mon Apr 2 13:22:33 2007 From: AHE at stowers-institute.org (Eldakak, Amr) Date: Mon, 2 Apr 2007 15:22:33 -0500 Subject: [Genome] question about 3' UTR sequences Message-ID: Hi, I would like to retrieve the 3'-UTR sequence of a yeast protein (Tpo1) using your browser. It did not work, could you please let me know how I could do it. Thank you for your help. Amr Amr H. Eldakak, Ph.D. Postdoctoral Research Associate Stowers Institute for Medical Research 1000 E. 50th Street Kansas City, Missouri 64110 Phone: 816-926-4103 Fax: 816-395-2950 e-mail: aeldakak at stowers-institute.org From ann at soe.ucsc.edu Mon Apr 2 13:52:08 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Apr 2007 13:52:08 -0700 Subject: [Genome] question about 3' UTR sequences In-Reply-To: References: Message-ID: <46116CF8.4030408@cse.ucsc.edu> Hello Amr, Yes, you can do this using our browser. Open the Saccharomyces cerevisiae (sacCer1) browser, and type in Tpo1 in the search box. 
Your gene will be highlighted. Press the "DNA" link in the top blue navigation bar. From here you can get the DNA for the entire gene. Note that we are using the sacCer assembly from the Saccharomyces Genome Database released 1 Oct. 2003. There may be more recent updates to this gene, but we do not have them displayed on our browser. Hope this answers your question. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Eldakak, Amr wrote: > Hi, > > I would like to retrieve the 3'-UTR sequence of a yeast protein (Tpo1) using your browser. It did not work, could you please let me know how I could do it. Thank you for your help. > > Amr > > > Amr H. Eldakak, Ph.D. > Postdoctoral Research Associate > > Stowers Institute for Medical Research > 1000 E. 50th Street > Kansas City, Missouri 64110 > Phone: 816-926-4103 > Fax: 816-395-2950 > e-mail: aeldakak at stowers-institute.org > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Mon Apr 2 14:10:21 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Apr 2007 14:10:21 -0700 Subject: [Genome] Entrez gene ID - LocusLink ID for chimp In-Reply-To: <461123F8.6060907@ucsc.edu> References: <461123F8.6060907@ucsc.edu> Message-ID: <4611713D.5000102@cse.ucsc.edu> Hello Youlian, In assemblies where there is an "Other RefSeq" track in addition to the "RefSeq" track, the refGene table (and associated refFlat table) will contain RefSeqs for the native assembly as well as the other assemblies. Conversely, in assemblies where there is no "Other RefSeq" track, the refGene table will contain only native refSeqs. 
The only exception to this rule is the chimp assembly! In the case of the chimp, because it is so close to human, and because there are not a lot of chimp RefSeqs, the refGene table contains both human and chimp. Hope this helps you understand the tables. In the future, please direct your questions to the genome mailing list at genome at soe.ucsc.edu -- our moderated forum for user questions and discussion. You will likely get a quicker response to your question. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. > Hello, > > > > I recently downloaded the *panTro2.refGene* table with links to > *panTro2.refLink fields* for chimp from your genome browser. I found all > the *LocusLink IDs* and *ProtAccs* are the same as that for human > orthologues of the genes. However, from the homologGene database in NCBI > I got the completely different set of LocusLink IDs and ProtAccs. > > Could you please explain the discrepancy? > > Thanks > > ********************************************************************************************************* > > Youlian Pan, Ph.D. > Research Officer Agent de recherche > /Integrated Reasoning Raisonnement int?gr?/ > /Institute for Information Technology (IIT) Institut de > technologie de l'information (ITI)/ > /National Research Council Canada Conseil national de > recherches Canada/ > /1200 Montreal Rd, Bldg. 
//M-50 M-50, 1200 chemin > Montr?al/ > /Ottawa, Ontario, Canada/ > /K1A 0R6/ > Tel: (613) 993-8556 > Fax (613) 952-0215 > E-mail: youlian.pan at nrc-cnrc.gc.ca > ********************************************************************************************************* From e-just at northwestern.edu Mon Apr 2 14:17:54 2007 From: e-just at northwestern.edu (Eric Just) Date: Mon, 2 Apr 2007 16:17:54 -0500 Subject: [Genome] Chained blastz alignments In-Reply-To: References: Message-ID: Thanks again, Angie, that really clarifies stuff for me. You have been very helpful and the pipeline is fascinating. Eric On 4/2/07, Angie Hinrichs wrote: > > Hi Eric, > > > So if you first shred the sequences into chunks, then use axtChain to > create > > chains that can span across these smaller chunks, how and where do you > store > > the positions of the chunks on the larger assembly? How do you use > this > > information to later piece back together the chromosome/scaffold that > you > > have previously broken into chunks. > > First we invoke blastz with a [start,end] range appended to the > sequence filenames. blastz's lav output includes the [start,end] > range. Then we run a postprocessing script (written by Scott > Schwartz, formerly of PSU, now at Google I think) that parses the > range and adjusts the coordinates in the lav blocks. > > If you have downloaded the kent source tree, the postprocessing script > is kent/src/hg/utils/automation/blastz-normalizeLav . It is invoked > by blastz-run-ucsc in the same dir, and that is invoked by a job > script created by doBlastzChainNet.pl (which in turn depends on > several *.pm files in that dir). > > Scott Schwartz developed a pretty sophisticated system for running > blastz on our compute cluster with various optional tweaks (for > example, snipping of lineage-specific repeats prior to running blastz, > and then adjusting the lav coords to un-snip!). Then Jim added the > chaining and netting flow. 
We staff developers used to run a sequence > of their scripts and programs manually from a template, but then > automated the whole flow with yet another layer of scripts... so the > whole system is pretty challenging to reverse engineer at this point, > but the scripts are there to peruse. You can run doBlastzChainNet.pl > -help to get a lengthy description of the config file input that it > expects, and then you can run "doBlastzChainNet.pl -debug DEF" where > DEF is a properly constructed config file, to see a sequence of > commands that would be run if not for -debug -- also, it will dump out > slave .csh scripts. A lot of the commands in there are specific to > our computing environment (e.g. parasol commands for cluster jobs), > but they do show exactly how we invoke the many programs in the > pipeline. > > That said, you probably don't need to exactly duplicate our pipeline. > For example, if your sequences are small enough (or computers big > enough :) so that you don't need to chunk them, then you don't need to > bother with the blastz-normalizeLav stuff. Still, axtChain can find > much longer chains than blastz because of its handling of double-sided > gaps, so I still think it's worth your while -- there are benefits > beyond stitching across those chunks. > > Angie > > > From ann at soe.ucsc.edu Mon Apr 2 16:08:41 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 02 Apr 2007 16:08:41 -0700 Subject: [Genome] How to determine whether a short element is in 5UTR, exon, intron or 3UTR? In-Reply-To: References: Message-ID: <46118CF9.6050403@cse.ucsc.edu> Hello Yu, Yes, you can do this using the browser. It's a bit of work, but it is possible. First you will need to get the positional information from your BLAT results for your DNA elements. Then you can make a Custom Track with that positional information. Then you can intersect your Custom Track with the parts of the Known Gene track to get the information you are looking for. 
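For readers who would rather script the last step than use the Table Browser intersection, here is a minimal sketch of classifying one position against one gene model. It assumes the gene is a protein-coding record with the knownGene-style fields strand, txStart, txEnd, cdsStart, cdsEnd, exonStarts and exonEnds (0-based, half-open coordinates); the function name, the return labels and the example gene are made up for illustration.

```python
def classify_position(pos, gene):
    """Return (region, number) for a 0-based position within one gene model.

    region is one of "intergenic", "5'UTR", "coding exon", "3'UTR", "intron";
    number is the exon or intron number counted in the direction of transcription.
    Assumes a protein-coding gene (cdsStart < cdsEnd).
    """
    if not (gene["txStart"] <= pos < gene["txEnd"]):
        return ("intergenic", None)

    exons = list(zip(gene["exonStarts"], gene["exonEnds"]))
    count = len(exons)
    minus = gene["strand"] == "-"

    for i, (exon_start, exon_end) in enumerate(exons):
        if exon_start <= pos < exon_end:
            # Exons are stored in genomic order; flip numbering on the minus strand.
            number = count - i if minus else i + 1
            if pos < gene["cdsStart"]:
                region = "3'UTR" if minus else "5'UTR"
            elif pos >= gene["cdsEnd"]:
                region = "5'UTR" if minus else "3'UTR"
            else:
                region = "coding exon"
            return (region, number)

    for i in range(count - 1):
        if exons[i][1] <= pos < exons[i + 1][0]:
            number = count - 1 - i if minus else i + 1
            return ("intron", number)

    return ("intergenic", None)


# Made-up three-exon gene on the plus strand.
gene = {
    "strand": "+", "txStart": 100, "txEnd": 1000,
    "cdsStart": 150, "cdsEnd": 900,
    "exonStarts": [100, 400, 800], "exonEnds": [200, 500, 1000],
}
for pos in (50, 120, 300, 450, 950):
    print(pos, classify_position(pos, gene))
```

BLAT hits span a range rather than a single base, so in practice the start and end of each hit would be classified separately, since a short element can straddle an exon boundary.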
Last month a very similar question was asked and answered at this list. It is archived in the list and located here: http://www.cse.ucsc.edu/pipermail/genome/2007-March/013036.html I encourage you to read that question and answer. I think you will find your answer there. If not, please feel free to write back to the list for more help. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Yu Zhou wrote: > Hi ... > > I have lots of short DNA elements to analyze. After I do BLAT, I could view > the their positions by using the genome browser. In a further step, I would > like to know their local positions in their own genes and answer the > questions like " Whether they are in 5'UTR, or 3'UTR, or exons, or introns? > If in exons or introns, which exon they reside?" > > How can I solve it by using Known Gene of your database? > > Thanks a lot in advance! > > Best regards, > Yu ZHOU > > _________________________________________________________________ > Express yourself instantly with MSN Messenger! Download today it's FREE! > http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From chenk at mmc.org Tue Apr 3 07:42:14 2007 From: chenk at mmc.org (Kang (Alex) Chen) Date: Tue, 03 Apr 2007 10:42:14 -0400 Subject: [Genome] one question Message-ID: Hi, This is Alex from MMCRI(Maine Medical center Research Institute).I want to establish a plasmid with Nkx2.5 promoter and i search it on your website.Sequence Retrieval Region Options: Promoter/Upstream by XXXX bases. 
How could i get the correct information about the exact promoter sequence since there are too much or too less sequence according to my choose. Thanks From talcon at iastate.edu Mon Apr 2 17:47:05 2007 From: talcon at iastate.edu (Tim Alcon) Date: Mon, 02 Apr 2007 19:47:05 -0500 Subject: [Genome] identifiers Message-ID: <4611A409.2030401@iastate.edu> What types of gene IDs can be used for a batch query? Is it possible to use Entrez Gene IDs? Thanks in advance for your help. Tim From talcon at iastate.edu Tue Apr 3 09:42:16 2007 From: talcon at iastate.edu (Tim Alcon) Date: Tue, 03 Apr 2007 11:42:16 -0500 Subject: [Genome] orthologous sequences Message-ID: <461283E8.4070201@iastate.edu> Is it possible when doing a batch query for a list of gene sequences in mouse to have it also give you all the orthologous sequences for human? Specifically I'm after the upstream regions, in FASTA format. It seems that Galaxy will automatically get the orthologous sequences if you request a blastz alignment, but I just want the unaligned sequences. Thanks in advance for your help. Tim From peter.scacheri at case.edu Tue Apr 3 10:52:25 2007 From: peter.scacheri at case.edu (Peter Scacheri) Date: Tue, 3 Apr 2007 13:52:25 -0400 Subject: [Genome] Custom Tracks Message-ID: I'm trying to add a custom track with P-value scores. How do I display the track so that the scores are represented in different shades of grey, ie., the higher the score, the darker the color. Here is an example of my input file. 
track name=test chr1 147973262 147973312 0.372351348 chr1 147973300 147973350 0.372351348 chr1 147973338 147973388 0.353911558 chr1 147973376 147973426 0.335036585 chr1 147973414 147973464 0.315663303 chr1 147973452 147973502 0.29571216 chr1 147973490 147973540 0.275080532 chr1 147973528 147973578 0.253632178 chr1 147974174 147974224 1.725645723 chr1 147974212 147974262 1.725645723 chr1 147974858 147974908 0.08374733 chr1 147975124 147975174 3.82209489 chr1 147975162 147975212 3.409069051 chr1 147975200 147975250 3.064172447 chr1 147975238 147975288 2.77180428 chr1 147975276 147975326 2.520802915 chr1 147975314 147975364 2.302969809 chr1 147975352 147975402 2.112146061 chr1 147975390 147975440 2.112146061 chr1 147975428 147975478 0.948169009 chr1 147975466 147975516 0.178606953 chr1 147975504 147975554 0.372351348 chr1 147975542 147975592 0.372351348 chr1 147975580 147975630 0.178606953 chr1 147975618 147975668 0.178606953 chr1 147975656 147975706 0.178606953 Thank you! Peter From kayla at soe.ucsc.edu Tue Apr 3 11:25:44 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 03 Apr 2007 11:25:44 -0700 Subject: [Genome] one question In-Reply-To: References: Message-ID: <46129C28.5070302@cse.ucsc.edu> Alex, I'm not sure I quite understand what you're asking but I'll see if I can answer your question anyway. It's great that you found the Sequence Retrieval Region Options menu. There are two boxes which you may find useful. The "Promoter/Upstream by XXXX bases" and also the "Downstream by XXXX" bases. You can put which ever value you like in those boxes. Just backspace over where it says "1000" and type in how much upstream and downstream sequence you want. One thing to note is that Nkx2.5 is on the negative strand. There are two ways to get sequence and they will give you different results. 1. Using the blue "DNA" button on the top of the page when you have Nkx2.5 in view. This will give you genome sequence in the 5' to 3' direction. 2. 
Clicking into the Known Gene details page for Nkx2.5 and requesting sequence from there. Once you've clicked on Nkx2.5 in the Known Genes track in the browser, you're taken to a details page. Scroll down on this page until you get to the "Sequence" link. This sequence will be given to you in the direction of transcription of the gene, which for Nkx2.5 (a negative-strand gene) is the reverse complement of the forward genomic strand.

If you were asking whether there is some official promoter sequence to Nkx2.5 I can't answer that for you. That's more of a scientific question than a Genome Browser related question.

I hope this information is helpful to you for extracting sequence. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Kang (Alex) Chen wrote:
> Hi,
> This is Alex from MMCRI(Maine Medical center Research Institute).I want
> to establish a plasmid with Nkx2.5 promoter and i search it on your
> website.Sequence Retrieval Region Options: Promoter/Upstream by XXXX
> bases. How could i get the correct information about the exact promoter
> sequence since there are too much or too less sequence according to my
> choose.
> Thanks

From kayla at soe.ucsc.edu Tue Apr 3 12:29:08 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Tue, 03 Apr 2007 12:29:08 -0700
Subject: [Genome] Custom Tracks
In-Reply-To: References:
Message-ID: <4612AB04.1020101@cse.ucsc.edu>

Peter,

Please see the Custom Tracks FAQ here: http://genome.ucsc.edu/goldenPath/help/customTrack.html

You will find Example 3 relevant. I've converted the data you gave me into a format which shows shades of grey for your data. I multiplied your p-values by 1000 so that you could see the gradient better.
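The conversion Kayla describes (insert a name column, scale the values by 1000) can be sketched as a few lines of Python. The function name and the "itemN" names are hypothetical. One deliberate difference from the hand-converted track: the BED format documents the score column as an integer from 0 to 1000, so this sketch rounds and clamps the scaled value, which means anything above 1.0 in the input ends up at the darkest shade.

```python
def pvalues_to_bed5(rows, scale=1000, max_score=1000):
    """Turn 'chrom start end value' rows into BED5 lines with a shadeable score.

    Hypothetical helper: names are generated as item1, item2, ... and the
    scaled value is rounded and clamped to the 0..max_score range.
    """
    bed_lines = []
    for index, row in enumerate(rows, start=1):
        chrom, start, end, value = row.split()
        score = round(float(value) * scale)
        score = max(0, min(max_score, score))
        bed_lines.append("\t".join([chrom, start, end, f"item{index}", str(score)]))
    return bed_lines


rows = [
    "chr1 147973262 147973312 0.372351348",
    "chr1 147974858 147974908 0.08374733",
    "chr1 147975124 147975174 3.82209489",
]
print('track name="scaled scores" useScore=1')
for line in pvalues_to_bed5(rows):
    print(line)
```

If the full dynamic range matters more than strict adherence to the 0 to 1000 range, a different scale factor (for example 1000 divided by the largest input value) could be passed instead of clamping.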
Try pasting the custom track below into our custom track tool, or clicking here to see my session with this custom track: http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=CT_for_Peter

browser position chr1:147973262-147975706
browser hide all
track name="Peter's BED track" description="Peter" visibility=2 color=0,128,0 useScore=1
chr1 147973262 147973312 a 372.351348
chr1 147973300 147973350 b 372.351348
chr1 147973338 147973388 c 353.911558
chr1 147973376 147973426 d 335.036585
chr1 147973414 147973464 e 315.663303
chr1 147973452 147973502 f 295.71216
chr1 147973490 147973540 g 275.080532
chr1 147973528 147973578 h 253.632178
chr1 147974174 147974224 i 1725.645723
chr1 147974212 147974262 j 1725.645723
chr1 147974858 147974908 k 83.74733
chr1 147975124 147975174 l 3822.09489
chr1 147975162 147975212 m 3409.069051
chr1 147975200 147975250 n 3064.172447
chr1 147975238 147975288 o 2771.80428
chr1 147975276 147975326 p 2520.802915
chr1 147975314 147975364 q 2302.969809
chr1 147975352 147975402 r 2112.146061
chr1 147975390 147975440 s 2112.146061
chr1 147975428 147975478 t 948.169009
chr1 147975466 147975516 u 178.606953
chr1 147975504 147975554 v 372.351348
chr1 147975542 147975592 w 372.351348
chr1 147975580 147975630 x 178.606953
chr1 147975618 147975668 y 178.606953
chr1 147975656 147975706 z 178.606953

I hope this helps you to visualize your data with our Custom Track tools. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Peter Scacheri wrote:
> I'm trying to add a custom track with P-value scores. How do I
> display the track so that the scores are represented in different
> shades of grey, ie., the higher the score, the darker the color.
> Here is an example of my input file.
> > track name=test > chr1 147973262 147973312 0.372351348 > chr1 147973300 147973350 0.372351348 > chr1 147973338 147973388 0.353911558 > chr1 147973376 147973426 0.335036585 > chr1 147973414 147973464 0.315663303 > chr1 147973452 147973502 0.29571216 > chr1 147973490 147973540 0.275080532 > chr1 147973528 147973578 0.253632178 > chr1 147974174 147974224 1.725645723 > chr1 147974212 147974262 1.725645723 > chr1 147974858 147974908 0.08374733 > chr1 147975124 147975174 3.82209489 > chr1 147975162 147975212 3.409069051 > chr1 147975200 147975250 3.064172447 > chr1 147975238 147975288 2.77180428 > chr1 147975276 147975326 2.520802915 > chr1 147975314 147975364 2.302969809 > chr1 147975352 147975402 2.112146061 > chr1 147975390 147975440 2.112146061 > chr1 147975428 147975478 0.948169009 > chr1 147975466 147975516 0.178606953 > chr1 147975504 147975554 0.372351348 > chr1 147975542 147975592 0.372351348 > chr1 147975580 147975630 0.178606953 > chr1 147975618 147975668 0.178606953 > chr1 147975656 147975706 0.178606953 > > > Thank you! > Peter > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Apr 3 16:57:14 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 03 Apr 2007 16:57:14 -0700 Subject: [Genome] orthologous sequences In-Reply-To: <461283E8.4070201@iastate.edu> References: <461283E8.4070201@iastate.edu> Message-ID: <4612E9DA.5060702@cse.ucsc.edu> Tim, There are a couple of different steps to get the information you're looking for. I'll break it up into a few steps for you. First I will make a custom track of coordinates of your LocusLink genes. Then I will get some relevant sequence. After that we can use the hgBlastTab table to get some human data that corresponds to your mouse data. 1. 
Get a custom track of coordinates of your LocusLink genes

Use the following settings:
clade: vertebrate
genome: mouse
assembly: Feb 2006
group: Genes and Gene Prediction Tracks
track: Known Genes
table: knownGene
region: genome
filter: create
Check the box next to "knownToLocusLink"
Click "Allow filtering using fields in checked tables"
Under "mm8.knownToLocusLink based filters" paste your LocusLink IDs in the box next to "value does match"
Click Submit
output format: custom track
click "get output"
name your custom track
click "get custom track in table browser"

2. Get sequence

Back in the Table Browser, change settings to:
group: Custom Tracks
track: Tim_LL (Whatever you named your custom track)
table: ct_TimLL
output format: sequence
click "get output"

Here you can elect to get the sequence of the genes or a user-determined amount of promoter/upstream sequence. To get only promoter sequence, check the box next to "Promoter/Upstream by XXXX bases" and uncheck the other boxes. Then click "get sequence".

3. Orthologs.

There are a few different ways to get the human ortholog information you are looking for. I will show you how to use the Table Browser to get information from the mm8.hgBlastTab table, filtering on the linked table knownToLocusLink.

Use the following settings:
clade: vertebrate
genome: mouse
assembly: Feb 2006
group: All Tables
database: mm8
table: hgBlastTab
filter: create
Check the box next to mm8.knownToLocusLink
Click "Allow filtering using fields in checked tables"
Under "mm8.knownToLocusLink based filters", paste in your values as in the previous part of the email ("value does match ___")
Click Submit
output format: "selected fields from primary and related tables"
choose the fields you want to output.

So now you have an association between human and mouse ids. You can use the human ids that you got from this step to make a custom track and get sequence, upstream sequence or whatever you like.
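As an aside, the tab-separated output from step 3 can be turned into a plain list of human ids with a few lines of Python. This is only a minimal sketch: it assumes the output starts with a '#'-prefixed header line, and the column names "query" and "target" as well as the sample rows are placeholders for whichever fields you actually selected, not real data.

```python
import csv
import io
from collections import defaultdict


def load_ortholog_pairs(handle, mouse_col, human_col):
    """Map each mouse id to the set of human ids listed beside it.

    mouse_col and human_col are header names; they are assumptions here
    and must match the fields chosen in the Table Browser output.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line is assumed to start with '#'
    mouse_idx = header.index(mouse_col)
    human_idx = header.index(human_col)
    pairs = defaultdict(set)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any further comment lines
        pairs[row[mouse_idx]].add(row[human_idx])
    return dict(pairs)


# Made-up rows, for illustration only.
SAMPLE = (
    "#query\ttarget\n"
    "mouseGeneA\thumanGeneX\n"
    "mouseGeneB\thumanGeneY\n"
    "mouseGeneA\thumanGeneZ\n"
)

orthologs = load_ortholog_pairs(io.StringIO(SAMPLE), "query", "target")
human_ids = sorted(set().union(*orthologs.values()))
print("\n".join(human_ids))
```

The printed list is one id per line, which is the form the Table Browser filter box accepts when pasting identifiers.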
I think you have the tools to do this on your own. If you need help with this part write back and I'll walk you through it. Here is a related mailing list question from our archives which you may find useful: https://www.soe.ucsc.edu/pipermail/genome/2007-March/013096.html I hope this helps you to get the data you need. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Tim Alcon wrote: > Is it possible when doing a batch query for a list of gene sequences in > mouse to have it also give you all the orthologous sequences for human? > Specifically I'm after the upstream regions, in FASTA format. It seems > that Galaxy will automatically get the orthologous sequences if you > request a blastz alignment, but I just want the unaligned sequences. > Thanks in advance for your help. > > Tim > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From mark.cowley at student.unsw.edu.au Wed Apr 4 02:18:10 2007 From: mark.cowley at student.unsw.edu.au (Mark Cowley) Date: Wed, 04 Apr 2007 09:18:10 +0000 Subject: [Genome] DBA genomic sequence Message-ID: <1175678290.8827.46.camel@sparta.littlelab.bioinformatics.unsw.edu.au> Hello all, This is not entirely related to the UCSC genome browser, so my apologies, but i think if anyone knows the answer, it will be this audience! I would like to obtain the genomic sequence data from the DBA/2J inbred mouse strain. I understand that Celera sequenced it along with C57BL/6J and others, but since Celera released their data to NCBI it has been difficult to find specific DBA sequence data. All I can find from NCBI are references to "the alternate assembly" which is the Celera consensus build based primarily upon sequencing C57BL/6J. 
I have found helpful SNP resources in the form of your snp126 track, and the SNP utilities at Ensembl, but am after the raw sequences for specific genomic intervals, to gain extra insight into subtle variations that may or may not exist in my regions of interest. I hope that someone can help Mark Mark Cowley PhD candidate Systems Biology Laboratory School of Biotechnology and Biomolecular Sciences University of New South Wales NSW, 2052, Australia From mike.boursnell at aht.org.uk Wed Apr 4 02:20:41 2007 From: mike.boursnell at aht.org.uk (MIKE BOURSNELL) Date: Wed, 4 Apr 2007 10:20:41 +0100 Subject: [Genome] Equine sequence Message-ID: Hi, Is the equine sequence available on the UCSC browser? Thanks, Mike Mike Boursnell Ph.D Genetics Department Centre for Preventive Medicine Animal Health Trust Lanwades Park Kentford Newmarket Suffolk CB8 7UU Tel +44 (0) 1638 750659 x1286 FAX 08700 502461 Email: mike.boursnelle at aht.org.uk From archie_russell at merck.com Wed Apr 4 11:45:45 2007 From: archie_russell at merck.com (Russell, Archie) Date: Wed, 4 Apr 2007 11:45:45 -0700 Subject: [Genome] Determining seqs with poor alignments Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DEEF0@ussemx1100.merck.com> Hi, I am trying to find a way to filter out sequences that have poor exon/intron alignments. In the browser it looks like good alignments have introns that are represented with lines that have herringbone-style arrows along them, and in poor alignments the introns are represented with lines without the arrows. How can I find determine which are which in the database? Thanks, Archie ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. 
(One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. ------------------------------------------------------------------------------ From cdaroqui at montefiore.org Wed Apr 4 11:31:03 2007 From: cdaroqui at montefiore.org (Cecilia Daroqui) Date: Wed, 04 Apr 2007 14:31:03 -0400 Subject: [Genome] Gene lenght Message-ID: Hello, I was wondering if you know of a program/site where I can upload a bunch of gene (names, ID) and sort them by gene lenght (bp, kb). Thank you so much, Cecilia From nsabath at uh.edu Wed Apr 4 12:50:09 2007 From: nsabath at uh.edu (Niv Sabath) Date: Wed, 04 Apr 2007 13:50:09 -0600 Subject: [Genome] cytogenetic map Message-ID: <003201c776f2$738074b0$eee20781@universi6f65ab> Hello, I am a PhD student in the department of Biology and Biochemistry at the University of Houston. I am trying to find the exact base pair locations of the cytogenetic map on the human genome sequence. The paper by Terrence S. Furey and David Haussler (http://hmg.oxfordjournals.org/cgi/content/full/12/9/1037) mention you as one of the databases holding this data. Can you please tell me how to find the mapping of all the bandings in the human genome? Thanks in advance, Niv Sabath From sfinch at childhooddiseases.org Wed Apr 4 11:32:02 2007 From: sfinch at childhooddiseases.org (Sarah Finch) Date: Wed, 4 Apr 2007 14:32:02 -0400 Subject: [Genome] Question about % of exons represented by repeats? 
Message-ID:

Good afternoon,

I need to determine the % of exons in the human genome (March 2006 build) that are represented by repeats. Or in other words what % of all the human exons is blocked by the RepeatMasker program?

I would really appreciate any help you could give me.

Thanks,
Sarah Finch

Sarah E.W. Finch, Ph.D.
Postdoctoral Research Scientist
The Rothberg Institute for Childhood Diseases
530 Whitfield Street
Guilford, CT 06437
203.458.7100 ext. 217
sfinch at childhooddiseases.org

From rhead at soe.ucsc.edu Wed Apr 4 13:00:37 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 04 Apr 2007 13:00:37 -0700
Subject: [Genome] cytogenetic map
In-Reply-To: <003201c776f2$738074b0$eee20781@universi6f65ab>
References: <003201c776f2$738074b0$eee20781@universi6f65ab>
Message-ID: <461403E5.5060305@soe.ucsc.edu>

Hello Niv,

The cytogenetic map information is contained in a table called 'cytoBand'. To get to it, click on the "Tables" link in the top blue bar of our site, then make the following selections:

clade: vertebrate
genome: human
assembly: Mar. 2006 (this is our latest human assembly)
group: mapping and sequencing tracks
track: chromosome band
table: cytoBand
region: genome
output format: all fields from selected table

If you wish to save the results in a file on your computer, enter a name in the "output file" box. Hit "get output". You should get a list like this:

#chrom chromStart chromEnd name gieStain
chr1 0 2300000 p36.33 gneg
chr1 2300000 5300000 p36.32 gpos25
chr1 5300000 7100000 p36.31 gneg
chr1 7100000 9200000 p36.23 gpos25
.
.
.

Please let us know if you have any further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Niv Sabath wrote:
> Hello,
>
> I am a PhD student in the department of Biology and Biochemistry at the
> University of Houston. I am trying to find the exact base pair locations of
> the cytogenetic map on the human genome sequence. The paper by Terrence S.
> Furey and David Haussler
> (http://hmg.oxfordjournals.org/cgi/content/full/12/9/1037) mention you as
> one of the databases holding this data. Can you please tell me how to find
> the mapping of all the bandings in the human genome?
>
> Thanks in advance, Niv Sabath
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From rhead at soe.ucsc.edu Wed Apr 4 13:08:03 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 04 Apr 2007 13:08:03 -0700
Subject: [Genome] Equine sequence
In-Reply-To:
References:
Message-ID: <461405A3.1050701@soe.ucsc.edu>

Hello Mike,

Yes, the sequence is available now:
http://hgdownload.cse.ucsc.edu/downloads.html#horse

There will not be a horse browser available for a few weeks, at least, but there is one in the works.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

MIKE BOURSNELL wrote:
> Hi,
>
> Is the equine sequence available on the UCSC browser?
>
> Thanks,
>
> Mike
>
> Mike Boursnell Ph.D
> Genetics Department
> Centre for Preventive Medicine
> Animal Health Trust
> Lanwades Park
> Kentford
> Newmarket
> Suffolk
> CB8 7UU
>
> Tel +44 (0) 1638 750659 x1286
> FAX 08700 502461
> Email: mike.boursnelle at aht.org.uk
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From bina at purdue.edu Wed Apr 4 13:43:14 2007
From: bina at purdue.edu (bina@purdue.edu)
Date: Wed, 4 Apr 2007 16:43:14 -0400
Subject: [Genome] GC-rich human rpts
Message-ID: <1175719394.46140de283394@webmail.purdue.edu>

How can I obtain a copy of the human DNA sequences that are collectively referred to as GC-rich and purine-rich repeats?
Minou Bina

From gtg894p at mail.gatech.edu Wed Apr 4 14:14:35 2007
From: gtg894p at mail.gatech.edu (Jittima Piriyapongsa)
Date: Wed, 04 Apr 2007 17:14:35 -0400
Subject: [Genome] BLAT program
Message-ID: <7.0.1.0.2.20070404165243.02349d20@mail.gatech.edu>

Hi,

I have a question regarding BLAT search (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start). What is the default cut-off value for the program? The details on the same page mention that BLAT quickly finds sequences of 95% and greater similarity of length 40 bases or more. The results I got sometimes have hits with lower identity (e.g., 80% identity) or with short span length (e.g., ~20). Is it possible to know the exact cut-off values you used?

Thank you very much.
Jittima

From galt at soe.ucsc.edu Wed Apr 4 15:33:40 2007
From: galt at soe.ucsc.edu (Galt Barber)
Date: Wed, 4 Apr 2007 15:33:40 -0700 (PDT)
Subject: [Genome] BLAT program
In-Reply-To: <7.0.1.0.2.20070404165243.02349d20@mail.gatech.edu>
References: <7.0.1.0.2.20070404165243.02349d20@mail.gatech.edu>
Message-ID:

With stepSize=5 (the old default, which applied when we did not specify it, was 11, the same as tileSize), the smallest sequences it finds are around 2 * stepSize + tileSize - 1, which is 2*5 + 11 - 1 = 20, as you found out.

This is for the hgBlat program on our website against our gfServer instance. When we run cluster jobs offline, other versions of blat are used, and the settings don't necessarily match what we do with hgBlat. Also, as blat changes and gets upgraded from time to time, this may change some subtle details of alignments.

The ones with lower %identity are probably cases of individual exons. Across the entire multiple-exon alignment, the %identity is as reported >= 95%.

You can see the actual steps used to create various tracks in kent/src/hg/makeDb/doc/*.txt, e.g.
from hg18.txt: blat -fine -mask=lower -minIdentity=95 -ooc=/cluster/bluearc/hg18/11.ooc -Galt On Wed, 4 Apr 2007, Jittima Piriyapongsa wrote: > Hi, > > I have a question regarding BLAT search > (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start). What is the > default cut-off value for the program? I saw in the details on the > same page mentioned that BLAT quickly find sequences of 95% and > greater similarity of length 40 bases or more. The results I got > sometimes have hits with lower identity (i.e, 80% identity) or with > short span length (i.e. ~20). Is it possible to know what is the > exact cut-off values you used? > > Thank you very much. > Jittima > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Wed Apr 4 19:53:25 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 04 Apr 2007 19:53:25 -0700 Subject: [Genome] Determining seqs with poor alignments In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DEEF0@ussemx1100.merck.com> References: <9BEE7CC4462DB14997A5C8CF8F3BEB02010DEEF0@ussemx1100.merck.com> Message-ID: <461464A5.7090402@soe.ucsc.edu> Hi Archie, I assume you are looking at one of the mRNA or EST tracks. The arrows on the introns in these tracks are actually calculated on the fly in the Genome Browser, using the sequence to determine canonical introns. The information is not stored in any database table. If you are interested, we could point you to the code that does this and help you figure out how to replicate what is done by the Genome Browser. -- Brooke Rhead UCSC Genome Bioinformatics Group Russell, Archie wrote: > > Hi, > > I am trying to find a way to filter out sequences that have poor > exon/intron alignments. 
In the browser it looks like good alignments > have introns that are represented with lines that have herringbone-style > arrows along them, and in poor alignments the introns are represented > with lines without the arrows. How can I find determine which are > which in the database? > > Thanks, > Archie > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From mashimad at jbirc.aist.go.jp Thu Apr 5 00:22:13 2007 From: mashimad at jbirc.aist.go.jp (Makoto Shimada) Date: Thu, 05 Apr 2007 16:22:13 +0900 Subject: [Genome] Haplotype data of HapMap - ENCODE regions Message-ID: <20070405160317.F724.MASHIMAD@jbirc.aist.go.jp> To whom it may concern, I wonder if I could have public haplotype data around genic regions by re-sequencing of some human population containing several individuals. I guess that there are haplotype data of HapMap - ENCODE regions, which were re-sequenced in 48 unrelated individuals. Could you tell me where those are avilable? 
I know how to get genotype data in each SNP site from following site, http://www.hapmap.org/downloads/genotypes/ . However these are not done haplotype inference (phasing). I would like to have (phased) haplotype data. I attached archives relating this subject. With my best, Makoto ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Makoto Shimada, Ph.D. Integrated Database Team Japan Biological Information Research Center AIST Tokyo Waterfront Bio-IT Bldg 7F 2-42 Aomi, Koto-ku, Tokyo, Japan, 135-0064 TEL: +81-3-3599-8537, FAX: +81-3-3599-8801 E-mail: mashimad at jbirc.aist.go.jp http://www.jbirc.aist.go.jp/idt/index_e.html ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > From: Daryl Thomas daryl at soe.ucsc.edu > Sent: Wed Jul 26 23:11:19 PDT 2006 > Subject: [Genome] Encode Variation Options for Genome Browser March > Holly, > > For hg17, the current May 2004 HapMap LD data contains only Phase I. The > haplotype inference (phasing) of the new Phase I + II data has recently been > completed, and we are planning to update this track in August or September > as priorities allow. > > We are waiting for the HapMap data coordination center to lift their > annotations from hg17 to hg18. I don't have a good timetable for this, but > the hg18 tracks should be available in the next couple of months. > > The Perlegen Haplotypes and Haplotype Blocks tracks contain pretty old data, > and are limited to chr21 and chr22, respectively. These are available on > hg16 and will probably not be lifted to hg17 and hg18 -- they're more likely > to be replaced entirely by dbSnp's haplotype resources as time allows. 
> > Daryl > > > -----Original Message----- > > From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] > > On Behalf Of Holly Mack > > Sent: Wednesday, July 26, 2006 8:26 PM > > To: genome at soe.ucsc.edu > > Subject: [Genome] Encode Variation Options for Genome Browser March > > 2006Assembly > > > > Hello, > > > > I recently started learning how to use the UCSC Gene Browser site, which > > is a wonderful resource! I noticed that under 'Gene Sorter' the latest > > assembly date (March 2006) doesn't include all the options under the > > 'Variation and Repeats' section as the previous assembly versions. The > > specific options I'm interested in are the 'HapMap LD' and the 'Haplotype > > Blocks'. > > > > I was wondering if the 2006 version will be updated to include these > > options? Also if the May 2004 HapMap LD information would be considered > > outdated and therefore shouldn't be used? > > > > Thank you in advance! Cheers, Holly > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Holly A. Mack, PhD > > Postdoctoral Research Fellow > > Ageing Research Unit > > The Centre for Mental Health Research > > Building #63, Eggleston Road > > The Australian National University > > Canberra, ACT 0200 Australia > > > > Email: Holly.Mack at anu.edu.au > > Phone: +61 2 6125 8967 > > Fax: +61 2 6125 0733 > > Web: www.anu.edu.au/cmhr > > > > CRICOS Provider #00120C > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > From: Heather Trumbower heather at soe.ucsc.edu > Sent: Mon Mar 5 10:06:47 PST 2007 > Subject: [Genome] Sequence of Hapmap-Encode project > > Gil: > > HapMap data is managaed by the Data Coordination Center at > http://www.hapmap.org. We are developing a track that is based on the > data available at > www.hapmap.org/downloads/genotypes/2007-01/rs_strand/non-redundant/genotypes_chr*_*_r21a_nr.txt.gz > You can obtain that data directly. This data is described at > http://www.hapmap.org/genotypes/latest/00README.txt. 
> You may also wish to use the data at > www.hapmap.org/downloads/raw_data/affy500k/ > > We would be happy to try to answer further questions about HapMap data, > but your best source of information would be help at hapmap.org. > > Heather Trumbower > UCSC Genome Bioinformatics Group > > > On Sun, 4 Mar 2007, gatzmon at aecom.yu.edu wrote: > > > Dear Researchers > > I'm trying to track the sequence data from the 48 individuals (of the > > Hapmap-Encode project) that are stored in NCBI's Trace Archive using your > > web browser without success. > > Do you know a better way to do so? Is there a web site where I can > > download those sequences? > > Thanks in advance > > Gil > > > > > > > > Gil Atzmon, Ph.D. > > Assistant Professor of Medicine > > Director of the Genetic Core for LonGenety > > Institute for Aging Research and the Diabetes Research Center > > Albert Einstein College of Medicine > > 1300 Morris Park Ave; Forchheimer #523 > > Bronx, NY, 10461 > > Tel: 718-430-3628 > > Fax: 718-430-8557 > > From anthony.melillo at yale.edu Thu Apr 5 07:48:32 2007 From: anthony.melillo at yale.edu (Tony Melillo) Date: Thu, 05 Apr 2007 10:48:32 -0400 Subject: [Genome] question about drosophila data Message-ID: <46150C40.9000303@yale.edu> Hi, A while back I sent an inquiry regarding Drosophila release 5 data. I was told that it was in progress and undergoing QA. I was also given a link to a test server that had the release 5 data. I'm wondering if there is an eta of when it will be in production. In the mean time, is it possible to get MySQL access to the test server? Thanks. -- Anthony Melillo, M.S. 
Systems Programmer Yale School of Public Health http://info.med.yale.edu/eph/ From isabelle.dupanloup at zoo.unibe.ch Thu Apr 5 00:15:53 2007 From: isabelle.dupanloup at zoo.unibe.ch (Isabelle Dupanloup) Date: Thu, 5 Apr 2007 09:15:53 +0200 Subject: [Genome] Exon conservation Message-ID: <000401c77752$3f2f0c20$8f8a5c82@cmpg.unibe.ch> Hello there, I'm trying to get some info about exon conservation in mammals (genome scale). For example, I would like to know if exons annotated by ENSEMBL in the human genome are present in the genome of mouse, chimp, etc. One of your colleagues (Heather) suggested me to use the table browser to generate a custom track that contains exons from gene annotations. Then to use the table browser to intersect the exons with the Mouse/Chimp chain (from the conservation track). This is what I did (see screen shots attached). But the output does not contain anything about exon conservation: ----------------------- track name="tb_ensGene" description="table browser query on ensGene" visibility=2 url= chrX 119906996 119909329 ENST00000328078 0 + 119907073 119908750 0 1 2333, 0, ----------------------- Could you help me to solve that problem ? For me what could be nice is to get orthologous positions of human exons in the mouse/chimp genome (if they exist) and an indication of the absence of the human exons in the mouse/chimp genome (if they are not found in these genomes). I hope my msg is clear enough. Thank you very much in advance for your help and your great UCSC tool ! 
Best, Isabelle ------------------- Isabelle Dupanloup PhD Computational and Molecular Population Genetics Group Baltzerstrasse 6 3012 Bern Switzerland Tel: +41 31 631 45 49 Fax: +41 31 631 48 88 From heather at soe.ucsc.edu Thu Apr 5 09:14:45 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Thu, 5 Apr 2007 09:14:45 -0700 (PDT) Subject: [Genome] Haplotype data of HapMap - ENCODE regions In-Reply-To: <20070405160317.F724.MASHIMAD@jbirc.aist.go.jp> References: <20070405160317.F724.MASHIMAD@jbirc.aist.go.jp> Message-ID: Makoto: You are indeed looking for HapMap data which is available from http://www.hapmap.org. Specifically, genome-wide phasing data is available by going to http://www.hapmap.org/ and clicking on 'Bulk Data Download' under 'Project Data'. This takes you to http://www.hapmap.org/downloads/index.html.en. You can then click on 'Phasing Data' under ('Bulk Data'). There are two directories there. Phase II includes all of Phase I. The documentation for the Phase II data is http://www.hapmap.org/downloads/phasing/2006-07_phaseII/00README.txt. You may also be interested in the 'ENCODE Project' link which is available from the main page in the 'Project Data' section. As far as I know, the ENCODE re-sequencing data have not been phased separately. The HapMap mailing list -- help at hapmap.org -- may be able to provide more information. Heather Trumbower UCSC Genome Bioinformatics Group On Thu, 5 Apr 2007, Makoto Shimada wrote: > To whom it may concern, > > I wonder if I could have public haplotype data around genic regions by > re-sequencing of some human population containing several individuals. > I guess that there are haplotype data of HapMap - ENCODE regions, which > were re-sequenced in 48 unrelated individuals. > Could you tell me where those are available? > > I know how to get genotype data in each SNP site from following site, > http://www.hapmap.org/downloads/genotypes/ . > However these are not done haplotype inference (phasing). 
> I would like to have (phased) haplotype data.
>
> I attached archives relating this subject.
>
> With my best,
> Makoto
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Makoto Shimada, Ph.D.
> Integrated Database Team
> Japan Biological Information Research Center
>
> AIST Tokyo Waterfront Bio-IT Bldg 7F
> 2-42 Aomi, Koto-ku,
> Tokyo, Japan, 135-0064
> TEL: +81-3-3599-8537, FAX: +81-3-3599-8801
> E-mail: mashimad at jbirc.aist.go.jp
> http://www.jbirc.aist.go.jp/idt/index_e.html
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From mis at hi.is Thu Apr 5 09:36:59 2007
From: mis at hi.is (Martin Ingi Sigurðsson)
Date: Thu, 5 Apr 2007 16:36:59 -0000 (GMT)
Subject: [Genome] Liftover file from hg8 to a current version
Message-ID: <3389.130.208.133.67.1175791019.squirrel@webmail.hi.is>

Hi UCSC guys.

I was searching for a LiftOver function from hg8 to a more current version. I saw this correspondence on your site:

http://www.cse.ucsc.edu/pipermail/genome/2005-June/007863.html

I didn't see any LiftOver file though. Has it been made yet?

Thank you in advance

Best regards

Martin Ingi
University of Iceland, department of Medicine

Martin Ingi Sigurðsson
E-mail: mis at hi.is
Sími : 695-9769

From ann at soe.ucsc.edu Thu Apr 5 10:28:42 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Thu, 05 Apr 2007 10:28:42 -0700
Subject: [Genome] Liftover file from hg8 to a current version
In-Reply-To: <3389.130.208.133.67.1175791019.squirrel@webmail.hi.is>
References: <3389.130.208.133.67.1175791019.squirrel@webmail.hi.is>
Message-ID: <461531CA.2030106@soe.ucsc.edu>

Hello Martin,

Unfortunately, we no longer support liftOver from versions that are that old. Given the roughness of the assemblies from that era, I suspect a liftOver from hg8 to hg18 might not be very useful anyway (the hg8 assembly predates our standardization on the NCBI assembly).

If you want to let us know what you're trying to do, we may be able to offer an alternate suggestion.
Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Martin Ingi Sigurðsson wrote:
> Hi UCSC guys.
>
> I was searching for a LiftOver function from hg8 to a more current
> version. I saw this correspondence on your site:
>
> http://www.cse.ucsc.edu/pipermail/genome/2005-June/007863.html
>
> I didn't see any LiftOver file though. Has it been made yet?
>
> Thank you in advance
>
> Best regards
>
> Martin Ingi
> University of Iceland, department of Medicine
>
>
> Martin Ingi Sigurðsson
> E-mail: mis at hi.is
> Sími : 695-9769
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From ann at soe.ucsc.edu Thu Apr 5 11:49:30 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Thu, 05 Apr 2007 11:49:30 -0700
Subject: [Genome] question about drosophila data
In-Reply-To: <46150C40.9000303@yale.edu>
References: <46150C40.9000303@yale.edu>
Message-ID: <461544BA.5010006@soe.ucsc.edu>

Hello Tony,

Since your original request, our priorities have shifted somewhat. The release 5 browser (we call it dm3) is still in the works, but the developer who is working on it has other, higher priorities right now.

We do not allow access to MySQL on the test server, but if you just need a few table dumps, I can probably get them to you. Write me off-list and let me know what you're looking for.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Tony Melillo wrote:
> Hi,
>
> A while back I sent an inquiry regarding Drosophila release 5 data. I
> was told that it was in progress and undergoing QA. I was also given a
> link to a test server that had the release 5 data. I'm wondering if
> there is an eta of when it will be in production. In the mean time, is
> it possible to get MySQL access to the test server?
>
> Thanks.
>

From rhead at soe.ucsc.edu Thu Apr 5 12:07:35 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Thu, 05 Apr 2007 12:07:35 -0700
Subject: [Genome] Question about % of exons represented by repeats?
In-Reply-To:
References:
Message-ID: <461548F7.1010502@soe.ucsc.edu>

Hello Sarah,

You can use the Table Browser to intersect the RepeatMasker table (rmsk) with a custom track containing human exons.

To do this, first make the custom track. Go to the Table Browser (the blue "Tables" link at the top of the page) and select the March 2006 human assembly. Select "group: genes and gene prediction tracks". Here you will need to decide which of the gene tracks you wish to use for this calculation (you can see a description of each track by clicking on the track name back on the Genome Browser page). Once you have decided on a gene track, choose "region: genome" and "output format: custom track". Hit "get output" and choose the option to make one BED record per coding exon. (If you would like to include the untranslated exons, you will need to do these steps again and choose the 5' UTR or 3' UTR exon options.) Hit the "get custom track in Table Browser" button. You should now have a custom track that contains only the coding exons from the gene track.

Now choose your newly-made custom track in the Table Browser. Be sure "region: genome" is still selected. Hit the "intersection: create" button. In the drop-down menus on the next page, select the rmsk table as the table with which you would like to intersect. This table is under:

group: variation and repeats
track: RepeatMasker
table: rmsk

Here you will need to decide how much of an overlap between the exons and the repeats you wish to count in your calculation. For instance, if you want to consider an exon blocked if ANY of it overlaps the RepeatMasker track, choose "All [custom track] records that have any overlap with RepeatMasker". Hit "submit".
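For anyone who would rather check the numbers offline from downloaded BED files, the same overlap calculation can be sketched roughly as below. This is a minimal illustration and not how the Table Browser itself is implemented: the function names and the tiny example intervals are made up, coordinates are assumed to be BED-style half-open (start, end), and the per-exon scan is written for clarity rather than speed.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_summary(exons, repeats):
    """Summarise how much of a set of exons is covered by repeats.

    exons, repeats: lists of (chrom, start, end) tuples.
    Returns (exons with any overlap, total exons,
             exon bases covered by repeats, total exon bases).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in repeats:
        by_chrom[chrom].append((start, end))
    # Merging first means a base under two repeats is counted only once.
    merged = {chrom: merge(ivs) for chrom, ivs in by_chrom.items()}

    hit_exons = hit_bases = exon_bases = 0
    for chrom, start, end in exons:
        exon_bases += end - start
        covered = sum(
            max(0, min(end, r_end) - max(start, r_start))
            for r_start, r_end in merged.get(chrom, [])
        )
        hit_bases += covered
        hit_exons += covered > 0
    return hit_exons, len(exons), hit_bases, exon_bases


# Made-up intervals, for illustration only.
exons = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
repeats = [("chr1", 150, 250), ("chr1", 180, 220), ("chr2", 500, 600)]

summary = overlap_summary(exons, repeats)
hit, total, covered_bases, total_bases = summary
print(f"{hit}/{total} exons overlap a repeat")
print(f"{covered_bases}/{total_bases} exon bases are covered by repeats")
```

Note that exons shared between transcripts are counted once per record in this sketch, so the exon list may need de-duplicating first if per-base totals matter.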
Now you can use the "summary/statistics" button to get a count of how many of the exons intersect with the RepeatMasker track and the number of bases that are covered by those exons. You can compare these numbers with statistics from your custom track (just clear the intersection with the rmsk table and hit the "summary/statistics" button again). I hope these instructions help you get the information you need. Please let us know if you have further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Sarah Finch wrote: > Good afternoon, > > > > I need to determine the % of exons in the human genome (March 2006 > build) that are represented by repeats. Or in other words what % of all > the human exons is blocked by the RepeatMasker program? > > > > I would really appreciate any help you could give me. > > > > Thanks, > > Sarah Finch > > > > > > > > Sarah E.W. Finch, Ph.D. > > Postdoctoral Research Scientist > > The Rothberg Institute for Childhood Diseases > > 530 Whitfield Street > > Guilford, CT 06437 > > 203.458.7100 ext. 217 > > sfinch at childhooddiseases.org > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Thu Apr 5 14:14:06 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 05 Apr 2007 14:14:06 -0700 Subject: [Genome] DBA genomic sequence In-Reply-To: <1175678290.8827.46.camel@sparta.littlelab.bioinformatics.unsw.edu.au> References: <1175678290.8827.46.camel@sparta.littlelab.bioinformatics.unsw.edu.au> Message-ID: <4615669E.5070903@soe.ucsc.edu> Hello Mark, We contacted the director of the mouse genome project at NCBI (Deanna Church) to figure out why the DBA/2J sequence is so difficult to find. Apparently the Celera assembly submitted to NCBI does contain DBA/2J sequence, but it also includes three different 129 substrains, Balb/c and the public C57BL/6J sequence. 
The consensus sequence was generated from a mixture of traces from all of these strains. Unfortunately, there was not sufficient data for NCBI to determine which parts of the Celera assembly were derived from which strain. I hope this information helps. If you have further questions, you can also try the NCBI help desk: info at ncbi.nlm.nih.gov. -- Brooke Rhead UCSC Genome Bioinformatics Group Mark Cowley wrote: > Hello all, > > This is not entirely related to the UCSC genome browser, so my > apologies, but i think if anyone knows the answer, it will be this > audience! I would like to obtain the genomic sequence data from the > DBA/2J inbred mouse strain. I understand that Celera sequenced it > along with C57BL/6J and others, but since Celera released their data > to NCBI it has been difficult to find specific DBA sequence data. All > I can find from NCBI are references to "the alternate assembly" which > is the Celera consensus build based primarily upon sequencing > C57BL/6J. > > I have found helpful SNP resources in the form of your snp126 track, > and the SNP utilities at Ensembl, but am after the raw sequences for > specific genomic intervals, to gain extra insight into subtle > variations that may or may not exist in my regions of interest. 
> > I hope that someone can help > > Mark > > > > Mark Cowley PhD candidate > > Systems Biology Laboratory School of Biotechnology and Biomolecular > Sciences University of New South Wales NSW, 2052, Australia > > _______________________________________________ Genome maillist - > Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Thu Apr 5 14:52:44 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Thu, 05 Apr 2007 14:52:44 -0700 Subject: [Genome] Exon conservation In-Reply-To: <000401c77752$3f2f0c20$8f8a5c82@cmpg.unibe.ch> References: <000401c77752$3f2f0c20$8f8a5c82@cmpg.unibe.ch> Message-ID: <46156FAC.9090704@soe.ucsc.edu> Hello Isabelle, Sorry that you are continuing to have difficulty with this. Let's try one more thing. Take, for example, this location on human chromosome 1: chr1:157,236,032-157,291,851. In this location, there are 7 Ensembl Genes. If you load my Session, we can ensure that we are viewing the exact same thing. To load my Session, press the 'Session' link in the top blue navigation bar. Then under Load Settings, enter user: MLQ and session name: ensemblExons. When you follow the link back to the Genome Browser, you will see 5 tracks loaded in the latest human browser: 1. A Custom Track made from the exons of the Ensembl Genes. 2. The Ensembl Gene track. 3. The Conservation track (with only mouse and chimp lines displayed). 4. The Mouse Net track. 5. The Chimp Net track. This particular configuration of the browser will give you an overview of how conserved these particular human Ensembl Exons are in the mouse and chimp. In this particular part of chr1, there is good conservation in some of these exons and not-so-good conservation in others. Click on a blue or gray mini-button to the left of a track to read more about how the track was created and what the data mean. Note that if you hold your mouse over an item in either of the two Net tracks, you will see what the orthologous location is in that assembly.
In this case, this location in human maps to chromosome 1 in both mouse and chimp. That explains how to visualize the data. If you do not have a large area to review, then this is probably the best method to use. If, however, you are looking at the entire genome, you would probably rather get data in an output file. After you create the Custom Track containing the Exons, you will intersect it first with the mouse Net track, then separately, with the chimp Net. If you are having difficulty creating a custom track using the table browser, read this User's guide: http://genome.cse.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack The output you pasted in below looks like part of your exon Custom Track. > ----------------------- > track name="tb_ensGene" description="table browser query on ensGene" > visibility=2 url= > chrX 119906996 119909329 ENST00000328078 0 + > 119907073 119908750 0 1 2333, 0, > ----------------------- You will now need to intersect this Custom Track with the mouse Net track. If you need help on intersections, see this page: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection By performing this intersection, you will find those areas in mouse that are orthologous to the human Ensembl Gene Exons. If, after reading this and the two help pages, you still don't understand, please feel free to write back to the genome list. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Isabelle Dupanloup wrote: > Hello there, > > I'm trying to get some info about exon conservation in mammals (genome > scale). > For example, I would like to know if exons annotated by ENSEMBL in the > human genome are present in the genome of mouse, chimp, etc. > > One of your colleagues (Heather) suggested me to use the table browser > to generate a custom track that contains exons > from gene annotations. Then to use the table browser to intersect the > exons with the Mouse/Chimp chain (from the conservation track). 
> > This is what I did (see screen shots attached). But the output does not > contain anything about exon conservation: > > ----------------------- > track name="tb_ensGene" description="table browser query on ensGene" > visibility=2 url= > chrX 119906996 119909329 ENST00000328078 0 + > 119907073 119908750 0 1 2333, 0, > ----------------------- > > Could you help me to solve that problem ? > > For me what could be nice is to get orthologous positions of human exons > in the mouse/chimp genome (if they exist) and an indication of the > absence of the human exons in the mouse/chimp genome (if they are not > found in these genomes). > > I hope my msg is clear enough. > > Thank you very much in advance for your help and your great UCSC tool ! > > Best, Isabelle > > ------------------- > Isabelle Dupanloup PhD > Computational and Molecular Population Genetics Group > Baltzerstrasse 6 > 3012 Bern > Switzerland > Tel: +41 31 631 45 49 > Fax: +41 31 631 48 88 > > > ------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Thu Apr 5 15:09:25 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 05 Apr 2007 15:09:25 -0700 Subject: [Genome] Gene lenght In-Reply-To: References: Message-ID: <46157395.6010502@soe.ucsc.edu> Hello Cecilia, There is an excellent tool for all kinds of genome data manipulation called Galaxy: http://main.g2.bx.psu.edu/ It is run by Penn State and works in conjunction with the UCSC Genome Browser. You can fetch Genome Browser information from us through their interface and then use their tools to get what you need. Two tools you should find particularly useful are: - Compute an expression on every row (under the Edit Queries heading). This tool will let you subtract the transcription start coordinate from the transcription end coordinate to get gene length. 
- Sort data in ascending or descending order (under the Filter, Sort, Join and Compare heading). This will sort your data on the gene length column you create with the compute tool. I hope this information helps. If you have questions about using Galaxy, you can contact their help desk: galaxy-user at bx.psu.edu or check out their wiki site: http://g2.trac.bx.psu.edu/. -- Brooke Rhead UCSC Genome Bioinformatics Group Cecilia Daroqui wrote: > Hello, > I was wondering if you know of a program/site where I can upload a bunch > of gene (names, ID) and sort them by gene lenght (bp, kb). > Thank you so much, > Cecilia > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From dgrigor1 at jhmi.edu Thu Apr 5 15:30:55 2007 From: dgrigor1 at jhmi.edu (Dmitry) Date: Thu, 05 Apr 2007 18:30:55 -0400 Subject: [Genome] downloads for Variation and Repeats tracks Message-ID: <000001c777d2$13826a60$64b181a2@jhu8880e0e5aa9> Hi, Where can I find and download psl files for the Variation and Repeats tracks? Thank you Dmitry From rhead at soe.ucsc.edu Thu Apr 5 16:09:14 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 05 Apr 2007 16:09:14 -0700 Subject: [Genome] GC-rich human rpts In-Reply-To: <1175719394.46140de283394@webmail.purdue.edu> References: <1175719394.46140de283394@webmail.purdue.edu> Message-ID: <4615819A.3080906@soe.ucsc.edu> Hello Minou, We do not have a Genome Browser track that specifically contains GC-rich repeat regions. You could get GC-rich repeat region sequences by (1) creating a BED file of GC-rich regions, (2) uploading it as a custom track, and (3) using the Table Browser to intersect your custom track with the RepeatMasker track and outputting sequence for the regions in the intersection. You might be able to use the GC Percent track instead of creating your own custom track. 
However, it probably will not work well for your purposes, as that track contains GC percent calculated in 5-base windows, and an intersection of that track with the RepeatMasker track will result in a lot of 5-bp regions. Instead, you will likely want to determine your own GC-rich regions in larger bp windows. There is a tool in the Kent source tree, called 'hgGcPercent', that will calculate GC percent in larger windows. The program calculates GC percentage in 20kb windows by default, but the window size can be changed, so you can experiment with different sizes to find something that suits your needs. If you have not already downloaded our source code, info on doing so is located here: http://genome.ucsc.edu/FAQ/FAQlicense.html#license3

Here is the usage statement for the hgGcPercent program:

=====================
hgGcPercent - Calculate GC Percentage in 20kb windows
usage:
   hgGcPercent [options] database nibDir
     nibDir can be a .2bit file, a directory that contains a
     database.2bit file, or a directory that contains *.nib files.
     Loads gcPercent table with counts from sequence.
options:
   -win= - change windows size (default 20000)
   -noLoad - do not load mysql table - create bed file
   -file= - output to (stdout OK) (implies -noLoad)
   -chr= - process only chrN from the nibDir
   -noRandom - ignore randome chromosomes from the nibDir
   -noDots - do not display ... progress during processing
   -doGaps - process gaps correctly (default: gaps are not counted as GC)
   -wigOut - output wiggle ascii data ready to pipe to wigEncode
   -overlap=N - overlap windows by N bases (default 0)
   -verbose=N - display details to stderr during processing
   -bedRegionIn=input.bed   Read in a bed file for GC content in specific regions and write to bedRegionsOut
   -bedRegionOut=output.bed Write a bed file of GC content in specific regions from bedRegionIn
example:
  calculate GC percent in 5 base windows using a 2bit nib assembly (dp2):
    hgGcPercent -wigOut -doGaps -file=stdout -win=5 dp2 \
        /cluster/data/dp2 | wigEncode stdin gc5Base.wig gc5Base.wib
=====================

The -noLoad option will create a BED file that you can filter for GC-rich regions; you can then use the filtered results to create a custom track. Once you have the BED file, you can upload it as a custom track (instructions here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#ADD_CT). It can then be selected in the Table Browser and an intersection created with the RepeatMasker (rmsk) table. To get the sequence of the intersected regions, simply choose "output format: sequence". I hope this information is useful to you. Let us know if you have further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group bina at purdue.edu wrote: > > How can I obtain a copy of the human DNA sequences that are collectively > referred to as GC-rich and purine-rich repeats?
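The filtering step Brooke mentions (keeping only GC-rich windows from the BED file written with -noLoad) could look roughly like the Python sketch below. This is only an illustration under stated assumptions: it assumes the GC value sits in the fifth BED column, the function name and the threshold are hypothetical, and the units of that column should be checked against your own output file before choosing a cutoff.

```python
def gc_rich_windows(bed_lines, min_gc):
    """Return 'chrom<TAB>start<TAB>end' strings for windows whose GC value
    (ASSUMED to be in the fifth BED column) is at least min_gc."""
    kept = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and custom-track header or comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # no GC column present; ignore the record
        if float(fields[4]) >= min_gc:
            kept.append("\t".join(fields[:3]))
    return kept


if __name__ == "__main__":
    # Made-up example records, not real hgGcPercent output.
    sample = [
        "track name=gcExample",
        "chr1 0 20000 chr1.1 380",
        "chr1 20000 40000 chr1.2 610",
    ]
    for record in gc_rich_windows(sample, min_gc=550):
        print(record)
```

The kept records are plain three-column BED lines, so they can be pasted or uploaded as a custom track and intersected with rmsk as described in the message.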
> > Minou Bina > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Thu Apr 5 16:34:43 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Thu, 05 Apr 2007 16:34:43 -0700 Subject: [Genome] downloads for Variation and Repeats tracks In-Reply-To: <000001c777d2$13826a60$64b181a2@jhu8880e0e5aa9> References: <000001c777d2$13826a60$64b181a2@jhu8880e0e5aa9> Message-ID: <46158793.5070101@soe.ucsc.edu> Hello Dmitry, You can download the MySQL tables underlying the data from our download server. Follow the 'Downloads' link from the blue navigation bar on the left side of the home page. From there, press Human, then under hg18 (or the assembly you are interested in) press Annotation Database. This directory contains a file for each table in the database. You will need to determine the name of the table that supports the track you are interested in. To do this in the genome browser, simply press the hyperlink for the name of the track in the track controls under the display. In the URL on this page, you will see (usually at the very end), "g=abc". The 'abc' is the name of the table that underlies this track. This is the table you will want to download from the Download server. For example, for the SNP track in the hg18 browser, the table name is snp126. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Dmitry wrote: > Hi, > > Where can I find and download psl files for the Variation and Repeats > tracks? 
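Ann's tip about reading the table name from the "g=abc" part of the track page URL can be sketched in a few lines of Python. The URL used here is only an illustrative example built from the names in her message (hg18, snp126), and the function name is hypothetical; the sketch simply parses the query string with the standard library.

```python
from urllib.parse import urlparse, parse_qs


def track_table_from_url(url):
    """Return the value of the 'g' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("g")
    return values[0] if values else None


if __name__ == "__main__":
    # Illustrative URL only; copy the real one from your browser's address bar.
    example = "http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=snp126"
    print(track_table_from_url(example))
```

The returned name is the one to look for in the Annotation Database directory on the download server.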
> > Thank you > > Dmitry > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From mashimad at jbirc.aist.go.jp Fri Apr 6 01:32:03 2007 From: mashimad at jbirc.aist.go.jp (Makoto Shimada) Date: Fri, 06 Apr 2007 17:32:03 +0900 Subject: [Genome] Haplotype data of HapMap - ENCODE regions In-Reply-To: References: <20070405160317.F724.MASHIMAD@jbirc.aist.go.jp> Message-ID: <20070406172938.3199.MASHIMAD@jbirc.aist.go.jp> Heather: Thank you so much for your prompt reply. I understand, and I am inspecting the following directory: http://www.hapmap.org/downloads/phasing/2005-03_phaseI/ENCODE/ . This looks like suitable data for my purposes. I would like to ask a further question about this data. The directory "/downloads/phasing" contains two subdirectories, "2005-03_phaseI/" and "2006-07_phaseII/". Do the dates (2005-03 & 2006-07) in these subdirectory names represent the fixed dates of the genotype data used? If so, were these two phased data sets derived from the following two genotype data sets, respectively? http://www.hapmap.org/genotypes/ 2005-03_16a_phaseI/ 2006-07/ Could you tell me whether I understand correctly? -- Makoto Shimada On Thu, 5 Apr 2007 09:14:45 -0700 (PDT) Heather Trumbower wrote: > Makoto: > > You are indeed looking for HapMap data which is available from > http://www.hapmap.org. > > Specifically, genome-wide phasing data is available by going to > http://www.hapmap.org/ and clicking on 'Bulk Data Download' under > 'Project Data'. > > This takes you to http://www.hapmap.org/downloads/index.html.en. > You can then click on 'Phasing Data' under ('Bulk Data'). > > There are two directories there. Phase II includes all of Phase I. > The documentation for the Phase II data is > http://www.hapmap.org/downloads/phasing/2006-07_phaseII/00README.txt. > > You may also be interested in the 'ENCODE Project' link which is available > from the main page in the 'Project Data' section.
> > As far as I know, the ENCODE re-sequencing data have not been phased > separately. > > The HapMap mailing list -- help at hapmap.org -- may be able to provide more > information. > > Heather Trumbower > UCSC Genome Bioinformatics Group > > > On Thu, 5 Apr 2007, Makoto Shimada wrote: > > > To whom it may concern, > > > > I wonder if I could have public haplotype data around genic regions by > > re-sequencing of some human population containing several individuals. > > I guess that there are haplotype data of HapMap - ENCODE regions, which > > were re-sequenced in 48 unrelated individuals. > > Could you tell me where those are available? > > > > I know how to get genotype data in each SNP site from following site, > > http://www.hapmap.org/downloads/genotypes/ . > > However these are not done haplotype inference (phasing). > > I would like to have (phased) haplotype data. > > > > I attached archives relating this subject. > > > > With my best, > > Makoto > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Makoto Shimada, Ph.D. > > Integrated Database Team > > Japan Biological Information Research Center > > > > AIST Tokyo Waterfront Bio-IT Bldg 7F > > 2-42 Aomi, Koto-ku, > > Tokyo, Japan, 135-0064 > > TEL: +81-3-3599-8537, FAX: +81-3-3599-8801 > > E-mail: mashimad at jbirc.aist.go.jp > > http://www.jbirc.aist.go.jp/idt/index_e.html > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > From birney at ebi.ac.uk Thu Apr 5 23:06:52 2007 From: birney at ebi.ac.uk (Ewan Birney) Date: Fri, 6 Apr 2007 07:06:52 +0100 Subject: [Genome] DBA genomic sequence In-Reply-To: <4615669E.5070903@soe.ucsc.edu> References: <1175678290.8827.46.camel@sparta.littlelab.bioinformatics.unsw.edu.au> <4615669E.5070903@soe.ucsc.edu> Message-ID: <30F6C87D-247C-4D9D-AE6F-1C733B5473D4@ebi.ac.uk> Hi guys - at ensembl we've (in collaboration with the Sanger Variation group) aligned the Celera reads individually and called the specific strain sequence. 
This is there for Human (Celera individuals), Rat (the Sprague Dawley rat sequence, sequenced commercially at Celera and then released) and Mouse. There are two specific displays for this information. Both are gene orientated, as this seemed to be the most common request. One is graphical - which you can get to using the link "Compare SNPs in Transcript" on the left hand side of each gene view: http://www.ensembl.org/Mus_musculus/transcriptsnpview?db=core;transcript=ENSMUST00000037478 Each horizontal line represents one strain, each SNP is a vertical line, the SNP colours indicate its consequence (synonymous/non synonymous) and the small grey bars indicate where we have information, i.e., where there was sequencing information but no SNP is present. At the bottom is a green/purple display of the SNPs. The second is a multiple alignment view of the resequenced strains: http://www.ensembl.org/Mus_musculus/sequencealignview?db=core;gene=ENSMUSG00000036330 This is in "beta" and is not yet linked from the left hand side of each gene page. This gives a complete multiple alignment across the gene, where you can see all the bases in the resequenced strains. Finally you can get complete genome-wide dumps of the resequencing information in the emf files, at: ftp://ftp.ensembl.org/pub/release-43/mus_musculus_43_36d/data/emf/variation/ We also have this information in Human (Celera individuals) and Rat (the Celera Sprague Dawley strain). This will be extended to platypus and tetraodon and other species with resequenced data. We also will incorporate the Sanger SNP discovery reads (per chromosome) and the mouse Perlegen data. ftp://ftp.ensembl.org/pub/release-43/homo_sapiens_43_36e/data/emf/variation ftp://ftp.ensembl.org/pub/release-43/rattus_norvegicus_43_34m/data/emf/variation Do contact helpdesk at ensembl.org if you want more information on these resources; in addition, if people have any particular visualisation/presentation ideas for this information, do contact us.
On 5 Apr 2007, at 22:14, Brooke Rhead wrote: > Hello Mark, > > We contacted the director of the mouse genome project at NCBI (Deanna > Church) to figure out why the DBA/2J sequence is so difficult to find. > > Apparently the Celera assembly submitted to NCBI does contain DBA/2J > sequence, but it also includes three different 129 substrains, Balb/c > and the public C57BL/6J sequence. The consensus sequence was > generated > from a mixture of traces from all of these strains. Unfortunately, > there was not sufficient data for NCBI to determine which parts of the > Celera assembly were derived from which strain. > > I hope this information helps. If you have further questions, you can > also try the NCBI help desk: info at ncbi.nlm.nih.gov. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > > Mark Cowley wrote: >> Hello all, >> >> This is not entirely related to the UCSC genome browser, so my >> apologies, but i think if anyone knows the answer, it will be this >> audience! I would like to obtain the genomic sequence data from the >> DBA/2J inbred mouse strain. I understand that Celera sequenced it >> along with C57BL/6J and others, but since Celera released their data >> to NCBI it has been difficult to find specific DBA sequence data. All >> I can find from NCBI are references to "the alternate assembly" which >> is the Celera consensus build based primarily upon sequencing >> C57BL/6J. >> >> I have found helpful SNP resources in the form of your snp126 track, >> and the SNP utilities at Ensembl, but am after the raw sequences for >> specific genomic intervals, to gain extra insight into subtle >> variations that may or may not exist in my regions of interest. 
>> >> I hope that someone can help >> >> Mark >> >> >> >> Mark Cowley PhD candidate >> >> Systems Biology Laboratory School of Biotechnology and Biomolecular >> Sciences University of New South Wales NSW, 2052, Australia >> >> _______________________________________________ Genome maillist - >> Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From dgrigor1 at jhmi.edu Thu Apr 5 20:22:28 2007 From: dgrigor1 at jhmi.edu (Dmitry) Date: Thu, 5 Apr 2007 23:22:28 -0400 Subject: [Genome] downloads for Variation and Repeats tracks Message-ID: <000001c777fa$ce4c8880$6600a8c0@GriGoDm91YKN11> Thank you Ann very much! I never looked at the end of the page. I thought that in the Annotation DB all tracks are stored by a chromosome number and never went beyond Chr1. Silly me :-) Thank you again Dmitry -----Original Message----- From: Ann Zweig [mailto:ann at soe.ucsc.edu] Sent: Thursday, April 05, 2007 7:35 PM To: Dmitry Cc: genome at soe.ucsc.edu Subject: Re: [Genome] downloads for Variation and Repeats tracks Hello Dmitry, You can download the MySQL tables underlying the data from our download server. Follow the 'Downloads' link from the blue navigation bar on the left side of the home page. From there, press Human, then under hg18 (or the assembly you are interested in) press Annotation Database. This directory contains a file for each table in the database. You will need to determine the name of the table that supports the track you are interested in. To do this in the genome browser, simply press the hyperlink for the name of the track in the track controls under the display. In the URL on this page, you will see (usually at the very end), "g=abc". The 'abc' is the name of the table that underlies this track. This is the table you will want to download from the Download server. 
For example, for the SNP track in the hg18 browser, the table name is snp126. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Dmitry wrote: > Hi, > > Where can I find and download psl files for the Variation and Repeats > tracks? > > Thank you > > Dmitry > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ericho at eden.rutgers.edu Fri Apr 6 08:06:03 2007 From: ericho at eden.rutgers.edu (ericho@eden.rutgers.edu) Date: Fri, 6 Apr 2007 11:06:03 -0400 (EDT) Subject: [Genome] direction of ESTs Message-ID: <24242.165.230.31.178.1175871963.squirrel@webmail.eden.rutgers.edu> Hi Helpdesk, I have a question about the meaning of the arrowheads displayed on the EST tracks. E.g. in the Human Mar 2006 Assembly, region chr11:46,707,502-46,707,591, with both spliced and unspliced human ESTs set to 'full', many ESTs are displayed under the gene. I noticed that some ESTs have arrowheads '>>>>>' or '<<<<<<' on them. I thought '>>>>>' mapped to the + strand and '<<<<<<' to the - strand, but that is not the case. E.g. if you click on EST CB156997 with '>>>>>', it says it is aligned to the '-' strand in chromosome 11. OK, maybe the arrowheads work the opposite way, i.e. '>>>>>' means - strand and '<<<<<<' means + strand. However, when I click on EST AI133113, which has '<<<<<' on it, the record also says it is mapped to the '-' strand. Now I'm confused. Could you tell me how I should interpret the arrowheads? Also, many ESTs don't have arrowheads on them; what does that mean?
Thanks, -- Eric From gxing at soe.ucsc.edu Thu Apr 5 17:32:05 2007 From: gxing at soe.ucsc.edu (Guoliang Xing) Date: Thu, 5 Apr 2007 17:32:05 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping Message-ID: Dear Genome help team, I want to code some Perl programs to compare different microarray signals on the same genes using UCSC genome browser annotation data. I'd like to download relevant gene track info in MySQL database and run in local machine mode. Let's say Affy Expression array U133 plus2 vs. Affy U95, they have different probe coverages, but some of the probes mapped to the same gene. My intuition is to map different probes on the two different arrays to a common Unigene cluster id, and then run my statistical comparison based on common cluster ID, gene by gene. I know UCSC browser has the Affy array mapping info. But I don't know which track or MySQL UCSC table is the current gold standard for gene annotation. Please advise. Thanks Guoliang From hartera at soe.ucsc.edu Fri Apr 6 10:05:33 2007 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Fri, 6 Apr 2007 10:05:33 -0700 (PDT) Subject: [Genome] direction of ESTs In-Reply-To: <24242.165.230.31.178.1175871963.squirrel@webmail.eden.rutgers.edu> References: <24242.165.230.31.178.1175871963.squirrel@webmail.eden.rutgers.edu> Message-ID: Dear Eric, The EST orientation and hence the direction of arrows on the Genome Browser is decided using the intronOrientation from the estOrientInfo table (intronOrientation column) and/or the read direction (if specified) which is in the gbCdnaInfo table (direction column). Please take a look at this answer to a similar question which explains how this is done. http://www.cse.ucsc.edu/pipermail/genome/2003-December/003672.html Scroll to the bottom of the window to start at the beginning of the e-mail thread. Also, note that if intronOrientation is a low number then the direction of transcription could be wrong when calculated in this way.
I hope that this helps you. Please let us know if you have further questions. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu On Fri, 6 Apr 2007 ericho at eden.rutgers.edu wrote: > Hi Helpdesk, > > I've a question about the meaning of arrowheads displayed on the EST tracks. > > E.g. in Human Mar 2006 Assembly, region chr11:46,707,502-46,707,591, with > both spliced and unspliced human ESTS set to 'full'. > > many ESTs are displayed under the gene. > > I noticed that some ESTs have arrowheads '>>>>>' or '<<<<<<' on them, I > think '>>>>>' map to + and '<<<<<<' map to - strand. But it is not. > > e.g. If you click on EST CB156997 with '>>>>>', it says it is aligned to > '-' strand in chromosome 11. Ok, may be the arrowhead represents in the > opposite way i.e. '>>>>>' means - strand, '<<<<<<' means + strand. > > However, when I click on EST AI133113 which has '<<<<<' on it, the record > also say it is mapped to '-' strand too. Then I'm confused. > > Could you tell me in what way I should interpret the arrowheads? Also, > many ESTs don't have arrowheads on it, what does that mean? > > > Thanks, > -- Eric > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From heather at soe.ucsc.edu Fri Apr 6 10:27:34 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Fri, 6 Apr 2007 10:27:34 -0700 (PDT) Subject: [Genome] Haplotype data of HapMap - ENCODE regions In-Reply-To: <20070406172938.3199.MASHIMAD@jbirc.aist.go.jp> References: <20070405160317.F724.MASHIMAD@jbirc.aist.go.jp> <20070406172938.3199.MASHIMAD@jbirc.aist.go.jp> Message-ID: Makoto: The first line of http://www.hapmap.org/downloads/phasing/2005-03_phaseI/00README.txt "Phasing was done using the Phase I / rel#16a datafiles". It seems that http://www.hapmap.org/downloads/phasing/2006-07_phaseII/00README.txt doesn't mention data version, but I'm quite sure it is release 21. 
However I would recommend that you confirm by writing to help at hapmap.org as they are the data providers and would be happy to help you. Heather Trumbower UCSC Genome Bioinformatics Group On Fri, 6 Apr 2007, Makoto Shimada wrote: > Heather: > Thank you so much for your prompt reply. > I understand and I am inspecting the following directory, > http://www.hapmap.org/downloads/phasing/2005-03_phaseI/ENCODE/ . > This looks suitable data for my intention. > > I would like to ask further question about this data. > The directory /downloads/phasing contains two subdirectories, > 2005-03_phaseI/and 2006-07_phaseII/. > Do the dates (2005-03 & 2006-7) in these subdirectories represent fixed > date of genotype data used? If so, these two phased data have been > derived from following two genotype data, respectively? > http://www.hapmap.org/genotypes/ > 2005-03_16a_phaseI/ > 2006-07/ > Could you tell me whether I understand correctly? > > From hartera at soe.ucsc.edu Fri Apr 6 10:33:51 2007 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Fri, 6 Apr 2007 10:33:51 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping In-Reply-To: References: Message-ID: Guoliang, Currently, we have the "Known Genes" set of genes which has not been updated recently. However, within the next few days we are releasing a new gene set (UCSC Genes) that will be our new gold standard for gene annotation - the method for producing this gene set has changed resulting in an even higher quality set of annotations which will include more splice variants and also non-protein-coding genes. Watch out for an announcement of the release of this gene set on our home page. The track control will appear in the "Genes and Gene Prediction Tracks" group under the Genome Browser image. By clicking on the link above this track control, you will see a description of the track and a "View Table Schema" link shows the main table for this track which contains the alignment information for the genes. 
There are other connected tables that are also listed on this page. I hope that this helps you. Please let us know if you have further questions. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu On Thu, 5 Apr 2007, Guoliang Xing wrote: > Dear Genome help team, > > I want to code some Perl programs to compare different microarray > signals on the same genes using UCSC genome browser annotation data. I'd > like to download relevant gene track info in MySQL dataabse and run in > local machine mode. > > Let's say Affy Expression array U133 plus2 vs. Affy U95, they have > different probe coverages, but some of the probes mapped to the same gene. > My intuition is to map different probes on the two different arrays to a > common Unigene cluster id, and then run my statistical comparison based on > common cluster ID, gene by gene. > > I know UCSC browser has the Affy array mapping info. > > But I don't know which track or mySQL UCSC table is the current gold > starndard for gene annotation. > > Please advise. > > Thanks > > Guoliang > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From gxing at soe.ucsc.edu Fri Apr 6 12:21:09 2007 From: gxing at soe.ucsc.edu (Guoliang Xing) Date: Fri, 6 Apr 2007 12:21:09 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping In-Reply-To: References: Message-ID: Hi Rachel, Thank you for the info. It is what I need. The next step for me is to figure out which tables to query once you release the new annotation. What I have now is the blat result of all the probes on the affy arrays against Human Genome build 18. I guess in your annotation, the gene (or exon) boundary coordinates are based on genome position from 5'-most base 1 against HG 18. If this is the case, I can combine my BLAT results with your gene annotation info directly. 
Then I can query my data gene-by-gene based on unique gene IDs, or do a "chromosome walk" from 5' to 3' and apply my statistical analysis in a flexible way. Please confirm if my guess is right, and if you have additional info please let me know. Thanks Guoliang

From hartera at soe.ucsc.edu Fri Apr 6 19:14:23 2007 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Fri, 6 Apr 2007 19:14:23 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping In-Reply-To: References: Message-ID: Guoliang, The coordinates in both the BLAT output and the database tables have 0-based starts, so the first base (5' most) on a chromosome or scaffold is position 0. For the ends, the positions are 1-based. You are correct in saying that the BLAT output and the database table coordinates are directly compatible in this way. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu

From valouev at stanford.edu Fri Apr 6 21:41:31 2007 From: valouev at stanford.edu (Anton Valouev) Date: Fri, 06 Apr 2007 21:41:31 -0700 Subject: [Genome] Custom plots In-Reply-To: References: <46030375.8090603@stanford.edu> Message-ID: <461720FB.5020303@stanford.edu> Dear Heather, we are in the process of compiling a set of custom tracks for genome-wide C. elegans nucleosome positioning derived using next-generation ultra-high-throughput sequencing technologies. As such, I often have to upload different tracks into your browser and share them with my collaborators. Each chromosome usually has 6-8 tracks associated with it, each approximately 50 Mb. This works fairly well for us; the only inconvenient part is that when sessions expire, I have to upload all the tracks again, and that takes a very long time. Is there any way to assign a non-expiring session ID to us so that we wouldn't have to reload tracks every other day? We plan to later share these tracks as a public resource for the genomics community. Many thanks for your help. Best, Anton. Heather Trumbower wrote: > On Thu, 22 Mar 2007, Heather Trumbower wrote: > >> Anton: >> >> We currently have support for custom WIG files, but not for custom >> MAF files. The documentation for custom WIG files is available at >> http://genome.ucsc.edu/goldenPath/help/wiggle.html.
>> >> I thought I would also mention, if you are able to configure our >> existing >> Conservation tracks to suit your needs, you can export the image as >> PDF or postscript. We support a number of different configuration >> options for >> Conservation tracks. >> >> I hope this helps, let us know if you have further questions. >> >> Heather Trumbower >> UCSC Genome Bioinformatics Group >> >> >> On Thu, 22 Mar 2007, Anton Valouev wrote: >> >>> Hello, >>> I am trying to display a custom plot on the genome browser, >>> a plot like a conservation score that is a part of your browser. >>> Is there any way to compile a custom track to display a plot >>> for each position of the genome? >>> >>> Thanks in advance. >>> Very best, Anton. >>> _______________________________________________ >>> Genome maillist - Genome at soe.ucsc.edu >>> http://www.soe.ucsc.edu/mailman/listinfo/genome >>> >> >

From verjo at iq.usp.br Sat Apr 7 19:59:54 2007 From: verjo at iq.usp.br (verjo@iq.usp.br) Date: Sat, 07 Apr 2007 23:59:54 -0300 Subject: [Genome] custom tracks for TIN and PIN RNAs in hg17 Message-ID: <20070407235954.xw2kza4s6t0w8840@webmail.iq.usp.br> Dear colleagues: We would like to contribute our custom tracks to the hg17 browser, which we think will be of interest to a number of investigators. The track named TIN_RNAs shows the genome mapping coordinates of all 55,139 Totally Intronic Noncoding RNA (TIN RNA) transcripts that we have identified in the human genome, as described in our paper in Genome Biology (see enclosed). The track files in PSL format can be retrieved from the Genome Biology website: http://genomebiology.com/imedia/1795171211288678/supp13.psl The track named PIN_RNAs shows the mapping coordinates of all 12,592 Partially Intronic Noncoding RNA (PIN RNA) transcripts, as described in our paper. http://genomebiology.com/imedia/2029173154128867/supp14.psl Finally, the track named TIN_PIN_probes shows the genomic coordinates of all TIN and PIN sense and antisense intronic probes plus the exonic probes in our custom-designed 44K intron-exon oligoarray. This array was used for our gene expression experiments with human prostate, kidney and liver tissues. http://genomebiology.com/imedia/1258251625128868/supp16.psl Please let us know if you need any additional information. Sincerely, Sergio Verjovski-Almeida Eduardo M. Reis Helder I. Nakaya

From gxing at soe.ucsc.edu Sun Apr 8 16:22:05 2007 From: gxing at soe.ucsc.edu (Guoliang Xing) Date: Sun, 8 Apr 2007 16:22:05 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping In-Reply-To: References: Message-ID: Hi Rachel, Thanks for the info. I'll use base 0 for calculation. Now I am looking at the MySQL db dump at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ And I've read Fan Hsu's Bioinformatics paper "The UCSC Known Genes", so it's time to get to table details. My question is, which master table should I query to get chr-by-chr UCSC known gene ID (kgID)? Is the chr1_gold table the correct one to start with for chr1? Is the bin field equal to the kgID described in Fan's paper?
I've copied its table structure below:

-- Table structure for table `chr1_gold`
--
CREATE TABLE chr1_gold (
  bin smallint(6) NOT NULL default '0',
  chrom varchar(255) NOT NULL default '',
  chromStart int(10) unsigned NOT NULL default '0',
  chromEnd int(10) unsigned NOT NULL default '0',
  ix int(11) NOT NULL default '0',
  type char(1) NOT NULL default '',
  frag varchar(255) NOT NULL default '',
  fragStart int(10) unsigned NOT NULL default '0',
  fragEnd int(10) unsigned NOT NULL default '0',
  strand char(1) NOT NULL default '',
  UNIQUE KEY chromStart (chromStart),
  UNIQUE KEY chromEnd (chromEnd),
  KEY bin (bin),
  KEY frag (frag(14))
) TYPE=MyISAM;

Thanks Guoliang

From dgrigor1 at jhmi.edu Mon Apr 9 09:02:55 2007 From: dgrigor1 at jhmi.edu (Dmitry) Date: Mon, 09 Apr 2007 12:02:55 -0400 Subject: [Genome] downloads for Variation and Repeats tracks In-Reply-To: <46158793.5070101@soe.ucsc.edu> Message-ID: <000e01c77ac0$88dfdb30$64b181a2@jhu8880e0e5aa9> Hi All, The provided Variation and Repeats datasets are very useful, thank you. But unfortunately I noticed that there are a lot of tandem repeats that are not reported. For example, in the 2000-base upstream region of the CXCL2 gene there is a 22xCA repeat at Chr4 75180980, GTTGAAACACACACACACACACACACACACACACACACACACACACACACACGTGATA, which I couldn't find in either the Trf or the Microsatellite datasets. Are those repeats skipped or masked purposely? If so, where can I find unmasked datasets? Also, you are not reporting tandem repeats that have fewer than 12 components. Is there a way to retrieve those as well? Thank you for your help Dmitry -----Original Message----- From: Ann Zweig [mailto:ann at soe.ucsc.edu] Sent: Thursday, April 05, 2007 7:35 PM To: Dmitry Cc: genome at soe.ucsc.edu Subject: Re: [Genome] downloads for Variation and Repeats tracks Hello Dmitry, You can download the MySQL tables underlying the data from our download server. Follow the 'Downloads' link from the blue navigation bar on the left side of the home page.
From there, press Human, then under hg18 (or the assembly you are interested in) press Annotation Database. This directory contains a file for each table in the database. You will need to determine the name of the table that supports the track you are interested in. To do this in the genome browser, simply press the hyperlink for the name of the track in the track controls under the display. In the URL on this page, you will see (usually at the very end), "g=abc". The 'abc' is the name of the table that underlies this track. This is the table you will want to download from the Download server. For example, for the SNP track in the hg18 browser, the table name is snp126. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Dmitry wrote: > Hi, > > Where can I find and download psl files for the Variation and Repeats > tracks? > > Thank you > > Dmitry > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome

From hartera at soe.ucsc.edu Mon Apr 9 09:07:15 2007 From: hartera at soe.ucsc.edu (Rachel Harte) Date: Mon, 9 Apr 2007 09:07:15 -0700 (PDT) Subject: [Genome] gene walking & Unigene cluster mapping In-Reply-To: References: Message-ID: Hello Guoliang, The chrN_gold tables show the positions of the clones that were sequenced to create the genome assembly. They do not have an ID that relates directly to Known Genes. The kgID is in the kgXref table and it is typically the GenBank accession. The paper that you read (Hsu et al. (2006)) about "The UCSC Known Genes" describes the old Known Genes process.
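As a rough illustration of using kgXref to go from a kgID to other identifiers while walking along one chromosome, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for a local MySQL copy of the UCSC tables. The column names (`name`, `chrom`, `txStart`, `txEnd` in the gene table; `kgID`, `geneSymbol` in kgXref) and the idea that the gene table's `name` column holds the kgID are assumptions that should be checked against "View Table Schema"; all rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for a local MySQL copy of the UCSC tables.
# Table and column names below are assumptions; verify against the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                        txStart INTEGER, txEnd INTEGER);
CREATE TABLE kgXref (kgID TEXT, mRNA TEXT, geneSymbol TEXT, refseq TEXT);
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?, ?)", [
    ("idB", "chr1", "+", 5000, 9000),
    ("idA", "chr1", "-", 1000, 3000),
    ("idC", "chr2", "+", 200, 800),
])
conn.executemany("INSERT INTO kgXref VALUES (?, ?, ?, ?)", [
    ("idA", "mrnaA", "GENEA", "refA"),
    ("idB", "mrnaB", "GENEB", "refB"),
    ("idC", "mrnaC", "GENEC", "refC"),
])


def genes_on_chrom(conn, chrom):
    """Return (kgID, geneSymbol, txStart, txEnd) ordered along the chromosome."""
    return conn.execute(
        """SELECT g.name, x.geneSymbol, g.txStart, g.txEnd
           FROM knownGene AS g JOIN kgXref AS x ON x.kgID = g.name
           WHERE g.chrom = ? ORDER BY g.txStart""",
        (chrom,),
    ).fetchall()


for row in genes_on_chrom(conn, "chr1"):
    print(row)
```

Ordering by `txStart` gives the chromosome-by-chromosome walk asked about earlier in the thread; the join supplies the cross-references for each gene.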
This track is now called "Old Known Genes" on the human hg18 Genome Browser and the table for the gene alignments is knownGeneOld2. The kgXref table for Known Genes has been renamed as kgXrefOld2. This table has identifiers relating to other databases e.g. RefSeq, SWISS-PROT. The new gene set, UCSC Genes, that was released last week was produced by a new process and it is a high quality gene set that is more up to date than the Known Genes track. The old Known Genes track is a year old. The method used to create UCSC Genes is described on the track description page. You can find this either by clicking on the blue/gray button at the left side of the track in the Browser display or by clicking on the "UCSC Genes" link above the track control in the "Genes and Gene Prediction Tracks" group below the Browser image. This new UCSC Genes track also has a kgXref table but the kgID has a new format. This is so that we can produce IDs that can be easily tracked from one release of this gene set to the next. The format is ucNNNXXX.1 where N is a number and X is a lower case letter and the .1 is the version number. It starts with uc001aaa.1, then uc001aab.1, uc001aac.1 ... uc001aba.1, uc001abb.1 etc. I hope that this helps you. Rachel Rachel Harte UCSC Genome Bioinformatics Group http://genome.ucsc.edu

From heather at soe.ucsc.edu Mon Apr 9 09:07:19 2007 From: heather at soe.ucsc.edu (Heather Trumbower) Date: Mon, 9 Apr 2007 09:07:19 -0700 (PDT) Subject: [Genome] Custom plots In-Reply-To: <461720FB.5020303@stanford.edu> References: <46030375.8090603@stanford.edu> <461720FB.5020303@stanford.edu> Message-ID: Anton: We could host your data at http://genome.ucsc.ed