From vkuryshev at tum.de Mon Oct 1 07:22:30 2007
From: vkuryshev at tum.de (Vladimir Kuryshev)
Date: Mon, 01 Oct 2007 16:22:30 +0200
Subject: [Genome] from blast output to custom track
Message-ID: <470102A6.8020705@tum.de>

Dear UCSC Browser gurus,

Do you know any simple way (with or without scripting) to convert blast output with contig hits coordinates (e.g., NW_001100389.1: 3121253-3121745 in Mmul_051212) to whole assembly form (e.g., chr14:91521199-91521691, rheMac2)?

Would appreciate very much any advice.

Sincerely,
Vladimir

From roedelsp at molgen.mpg.de Mon Oct 1 08:42:47 2007
From: roedelsp at molgen.mpg.de (roedelsp at molgen.mpg.de)
Date: Mon, 1 Oct 2007 17:42:47 +0200
Subject: [Genome] Known Genes in UCSC
Message-ID: <1191253367.4701157711aae@imp.molgen.mpg.de>

Hello,

I'd like to retrieve the complete gene annotation for hg18 for UTRs, exons and introns of all known genes, including gene ID, chromosome, start, end and strand. I would be very glad if anyone could help me.

Thanks in advance

Christian Roedelsperger

From kayla at soe.ucsc.edu Mon Oct 1 10:35:34 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 01 Oct 2007 10:35:34 -0700
Subject: [Genome] query
In-Reply-To:
References:
Message-ID: <47012FE6.1070409@cse.ucsc.edu>

Hello Jamel,

Our Table Browser can be used to retrieve the information you've requested. First click on the "Tables" button on the blue bar on the top of the main page. Then set the following options:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: RefSeq Genes
table: refGene
position: chrX
output format: sequence

Click "get output", select "genomic", and press "submit". Here you can choose to extract only exons, and if desired, you can ask for bases upstream and downstream of those exons. You could also select to have introns as well. Finally, you can click the radio button for one FASTA record per gene, or the other radio button for one FASTA record per exon.
I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Chelly Jamel wrote:
> Dear Colleague,
> I am an extensive user of the UCSC genome browser
> and data base, and I would like to know if there
> is an automatic way (I mean not by analyzing
> "manually" gene by gene) to obtain a file with
> only exonic and their flanking intronic sequences
> and their position in the genome. For example a
> file with exonic sequences (and their intronic
> sequences) of all genes (refseq) on the X
> chromosome. Of course, this file could be
> constructed by analyzing manually each individual
> gene, but such approach would take days and
> days...
>
> Thank you in advance for your valuable help,
> With my best regards
> Jamel Chelly

From kayla at soe.ucsc.edu Mon Oct 1 15:10:32 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 01 Oct 2007 15:10:32 -0700
Subject: [Genome] Known Genes in UCSC
In-Reply-To: <1191253367.4701157711aae@imp.molgen.mpg.de>
References: <1191253367.4701157711aae@imp.molgen.mpg.de>
Message-ID: <47017058.90109@cse.ucsc.edu>

Hello, Christian,

You can either download the whole table here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/knownGene.txt.gz

Or you can use the Table Browser to extract only the columns you are interested in. Click on "Tables" on the blue bar on the top of the main page. Then set the following settings:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: UCSC Genes
table: knownGene
region: genome
output format: selected fields from primary and related tables

Click "get output". Check the boxes next to the relevant columns, and click "get output" again.

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.
Kayla Smith
UCSC Genome Bioinformatics Group

roedelsp at molgen.mpg.de wrote:
> Hello,
>
> I'd like to retrieve the complete gene annotation for hg18 for UTRs, exon and
> introns of all known genes, including gene ID, chromosome, start, end and strand.
> I would be very glad if anyone could help me.
>
> Thanks in advance
>
> Christian Roedelsperger
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Mon Oct 1 16:17:10 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 01 Oct 2007 16:17:10 -0700
Subject: [Genome] from blast output to custom track
In-Reply-To: <470102A6.8020705@tum.de>
References: <470102A6.8020705@tum.de>
Message-ID: <47017FF6.8000002@cse.ucsc.edu>

Dear Vladimir:

On our rheMac2 browser, the "Contigs" track has contigs with identifiers:
1099213919518 thru 1099214725199

The "Assembly" track has scaffolds and identifiers:
1099213921808 thru 1099803004090_2.

We have this agp file available for download:

http://hgdownload.cse.ucsc.edu/goldenPath/rheMac2/bigZips/rheMac2.agp.tar.gz

If you could convert your set of names into the names above, we could look into rearranging this agp file into a lift file.

I hope this is helpful. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Vladimir Kuryshev wrote:
> Dear UCSC Browser gurus,
>
> Do you know any simple way (with or without scripting) to convert blast
> output with contig hits coordinates (e.g., NW_001100389.1:
> 3121253-3121745 in Mmul_051212) to whole assembly form (e.g.,
> chr14:91521199-91521691, rheMac2)?
>
> Would appreciate very much any advice.
>
> Sincerely,
> Vladimir

From kayla at soe.ucsc.edu Mon Oct 1 16:35:50 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 01 Oct 2007 16:35:50 -0700
Subject: [Genome] pseudo autosomal regions in cow genome
In-Reply-To:
References:
Message-ID: <47018456.3090702@cse.ucsc.edu>

Dear João,

The cow assembly has chrX but not chrY, so even if the cow chromosomes have a PAR, bosTau3 doesn't. But yes, the PAR is a mammalian thing, so cows have it too. If you google "cow pseudoautosomal region" the first match is a commentary on an article that used genomic alignments to look at the boundary of the region in a bunch of mammals, concluding that it had moved "fairly recently" in mammalian evolution (~26-70Mya).

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

João Fadista wrote:
> Dear all,
>
> I would like to know if there is any pseudo autosomal regions (PAR) in the cow genome, just like there is in the human genome where the chromosome Y have regions that have the exact same sequences as in some regions in the X chromosome.
>
>
> Best regards
>
> João Fadista
> Ph.d. student
>
> UNIVERSITY OF AARHUS
> Faculty of Agricultural Sciences
> Dept. of Genetics and Biotechnology
> Blichers Allé 20, P.O. BOX 50
> DK-8830 Tjele
>
> Phone: +45 8999 1900
> Direct: +45 8999 1900
> E-mail: Joao.Fadista at agrsci.dk
> Web: www.agrsci.org
>
> This email may contain information that is confidential. Any use or publication of this email without written permission from Faculty of Agricultural Sciences is not allowed. If you are not the intended recipient, please notify Faculty of Agricultural Sciences immediately and delete this email.
From moushengxu at gmail.com Tue Oct 2 12:48:14 2007
From: moushengxu at gmail.com (mousheng xu)
Date: Tue, 2 Oct 2007 15:48:14 -0400
Subject: [Genome] [Question] discrepancy of multiz17way score between the database and the browser
Message-ID: <5b9ceee40710021248v34170dbeu6f2bae6bb1f46c2e@mail.gmail.com>

Dear Genome Help,

If you do a "select * from multiz17way where chrom='chr6' and chromStart = 31647790", you will get the following return:

826 chr6 31647790 31649567 2389798 430695879 0.698777

Which means that the multiz17way score for the region chr6:31647790-31649567 (total 1778 bps) is 0.698777. But if you take this region into the browser, the "17-Way Cons" track shown on the top as "Conservation" shows a lot of variation (not a constant value of 0.698777) with peaks and dips.

Does the browser combine other information to compute the "Conservation"? If I am interested in getting the conservation of a SNP rs909253, how can I get it?

Thanks a lot!

Mousheng Xu

Research Fellow,
BWH, Harvard Medical School

From mss at berkeley.edu Tue Oct 2 13:05:43 2007
From: mss at berkeley.edu (Mark Schlissel)
Date: Tue, 2 Oct 2007 13:05:43 -0700
Subject: [Genome] GFF format question for custom track
Message-ID: <7C0439E6-BF66-4C46-BCED-1A16BA10E833@berkeley.edu>

Dear Community,

I'm a relative novice at this. I've uploaded a GFF file from Nimblegen from a genomic tiling array project we're doing. The UCSC browser correctly reads all of the data EXCEPT the critical sixth field, score. My scores are log2 values of relative hybridization with three significant figures (i.e. 0.47 or 1.26 or -0.05). The Browser is interpreting these by only using the first digit, 0, 1, or 0 in my example, and ignoring the decimal place and sign, making the display useless for examining our anti-histone ChIP data.
How can I get the browser to read a GFF file whose "score" field has a decimal point value?

THANKS.

Mark

our data file--

chr10 NimbleScan 3014102:3T3:meH3K4/3T3:INPUT:BLOCK1 126463268 126463327 0.45 . . seq_id=chr10:126463133-126470241;probe_id=CHR10FS126463268;count=1
chr10 NimbleScan 3014102:3T3:meH3K4/3T3:INPUT:BLOCK1 126463441 126463500 1.07 . . seq_id=chr10:126463133-126470241;probe_id=CHR10FS126463441;count=1

table output from UCSC Browser--

#chrom chromStart chromEnd name score strand thickStart thickEnd itemRgb blockCount blockSizes chromStarts
chr10 126463267 126463327 seq_id=chr10:126463133-126470241 0 . 0 0 0,0,0 1 60, 0,
chr10 126463440 126463500 seq_id=chr10:126463133-126470241 1 . 0 0 0,0,0 1 60, 0,

I've bold-faced the problematic "score" data

Mark Schlissel M.D., Ph.D.
Professor of Immunology
Professor of Biochemistry & Molecular BIology
UC-Berkeley
Department of Molecular & Cell Biology
439 Life Science Addition (#3200)
Berkeley, CA 94720-3200
510-643-2462 (office)
510-642-6845 (Fax)
mss at berkeley.edu

From hiram at soe.ucsc.edu Tue Oct 2 14:22:54 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Tue, 02 Oct 2007 14:22:54 -0700
Subject: [Genome] Problem of uploading custom track
In-Reply-To:
References:
Message-ID: <4702B6AE.7090401@soe.ucsc.edu>

Good Afternoon Koichi:

After looking at the example data set you forwarded to me, I have found some conversion code that we could improve in the browser to handle the larger gff file that is failing to load for you. We will attempt to fix this in the software in the near future.

In the meantime, after looking at your gff file, it appears that it would be more efficient to encode your data as a simple 4-column bed file which would require no conversion to display in the browser.

To convert your gff file to a 4-column bed file, select columns 1,4,5,2 from your gff file, e.g.:

awk '{printf "%s\t%d\t%d\t%s\n", $1, $4-1, $5, $2}' yourData.gff > yourData.bed

Then try loading that bed file.
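For readers who would rather script this step, the awk one-liner above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `gff_to_bed4` is hypothetical, fields are split on whitespace as awk does by default, and blank or `#` comment lines are skipped, which the awk command does not do.

```python
def gff_to_bed4(gff_lines):
    """Convert GFF lines to 4-column BED lines: chrom, start, end, name.

    Mirrors the awk command: GFF columns 1, 4, 5, 2, with the 1-based
    GFF start shifted down by one to give a 0-based BED start.
    """
    bed_lines = []
    for line in gff_lines:
        line = line.strip()
        # Assumption: skip blank lines and comments (awk would not).
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) < 5:
            continue  # not enough columns to be a GFF record
        chrom, source = fields[0], fields[1]
        start, end = int(fields[3]), int(fields[4])
        bed_lines.append(f"{chrom}\t{start - 1}\t{end}\t{source}")
    return bed_lines


if __name__ == "__main__":
    example = ["chr10 NimbleScan feature 126463268 126463327 0.45 . . id=1"]
    print("\n".join(gff_to_bed4(example)))
```

The start coordinate is decremented because GFF starts are 1-based while BED starts are 0-based; the end coordinate is left unchanged.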
The other columns of your gff file appear to be redundant information to these four columns.

--Hiram

Koichi Ichimura wrote:
> Hello,
>
> I have been trying to update my custom annotation track (GFF format,
> 18MB, uncompressed) to Human Genome Browser, however Internet Explorer
> displays error message "Internet Explorer cannot display the webpage"
> after 5 minutes or so before it complete uploading. I appreciate your
> help.

From archanat at soe.ucsc.edu Tue Oct 2 15:14:49 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 02 Oct 2007 15:14:49 -0700
Subject: [Genome] GFF format question for custom track
In-Reply-To: <7C0439E6-BF66-4C46-BCED-1A16BA10E833@berkeley.edu>
References: <7C0439E6-BF66-4C46-BCED-1A16BA10E833@berkeley.edu>
Message-ID: <4702C2D9.1040900@soe.ucsc.edu>

Hello Mark,

Our BED format (into which PSL and GFF are translated for storage) expects an integer score from 0 to 1000. Besides rounding to integer, we don't do any transforms on incoming GFF scores. So if you could transform your score into integer 0..1000, the score would be preserved.

If your custom track has many data points, a wiggle custom track might also be a good way to plot the scores, and it does transform incoming scores into its compressed range.

I hope that this helps. If you have any further questions please don't hesitate to contact us again.

Regards,
Archana
UCSC Genome Bioinformatics Group

Mark Schlissel wrote:
> Dear Community,
>
> I'm a relative novice at this. I've uploaded a GFF file from
> Nimblegen from a genomic tiling array project we're doing. The UCSC
> browser correctly reads all of the data EXCEPT the critical sixth
> field, score. My scores are log2 values of relative hybridization
> with three significant figures (i.e. 0.47 or 1.26 or -0.05).
> The Browser is interpreting these by only using the first digit, 0, 1, or
> 0 in my example, and ignoring the decimal place and sign making the
> display useless for examining our anti-histone ChIP data.
>
> How can I get the browser to read a GFF file whose "score" field has
> a decimal point value?
>
> THANKS.
>
> Mark
>
>
> our data file--
>
> chr10 NimbleScan 3014102:3T3:meH3K4/3T3:INPUT:BLOCK1
> 126463268 126463327 0.45 . .
> seq_id=chr10:126463133-126470241;probe_id=CHR10FS126463268;count=1
> chr10 NimbleScan 3014102:3T3:meH3K4/3T3:INPUT:BLOCK1
> 126463441 126463500 1.07 . .
> seq_id=chr10:126463133-126470241;probe_id=CHR10FS126463441;count=1
>
>
> table output from UCSC Browser--
>
> #chrom chromStart chromEnd name score strand thickStart thickEnd
> itemRgb blockCount blockSizes chromStarts
> chr10 126463267 126463327 seq_id=chr10:126463133-126470241 0 . 0 0
> 0,0,0 1 60, 0,
> chr10 126463440 126463500 seq_id=chr10:126463133-126470241 1 . 0 0
> 0,0,0 1 60, 0,
>
> I've bold-faced the problematic "score" data
>
>
> Mark Schlissel M.D., Ph.D.
> Professor of Immunology
> Professor of Biochemistry & Molecular BIology
> UC-Berkeley
> Department of Molecular & Cell Biology
> 439 Life Science Addition (#3200)
> Berkeley, CA 94720-3200
> 510-643-2462 (office)
> 510-642-6845 (Fax)
> mss at berkeley.edu

From archanat at soe.ucsc.edu Tue Oct 2 15:31:24 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 02 Oct 2007 15:31:24 -0700
Subject: [Genome] [Question] discrepancy of multiz17way score between the database and the browser
In-Reply-To: <5b9ceee40710021248v34170dbeu6f2bae6bb1f46c2e@mail.gmail.com>
References: <5b9ceee40710021248v34170dbeu6f2bae6bb1f46c2e@mail.gmail.com>
Message-ID: <4702C6BC.8080907@soe.ucsc.edu>

Hello Mousheng Xu,

There are 2 different scores associated with the Conservation track: the multiple alignment score associated with the multiz* tables and the phastCons conservation score associated with the phastCons*way tables, which draws the blue wiggle Conservation on the top. The conservation scores are displayed as a "wiggle" (histogram), where the height reflects the size of the score.

You can obtain these scores from the conservation scores downloads for the Human assembly. However, if you want to retrieve the scores associated with specific chromosomal ranges, it's easiest to use the Table Browser. To do this:

1. Set the clade, genome, and assembly to the appropriate genome.
2. Set group=Comparative Genomics, track=Conservation, table=phastCons*way, and region=the chromosomal position in which you're interested.
3. Set output format=data points, then click "get output". This will return the conservation scores associated with each base position within your range.
You can read more about the methods used for our Conservation track in the description page here:

http://www.genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=multiz17way

See this help page from our website for more details about the phastCons file format:

http://genome.ucsc.edu/goldenPath/help/phastCons.html

Please see this answer to a previously answered mailing list question that explains how to get the conservation score from rs numbers:

http://www.soe.ucsc.edu/pipermail/genome/2006-February/009806.html

I hope this helps to answer your question. If you have any further questions please don't hesitate to contact us again.

Regards,
Archana
UCSC Genome Bioinformatics Group

mousheng xu wrote:
> Dear Genome Help,
>
> If you do a "select * from multiz17way where chrom='chr6' and chromStart =
> 31647790", you will get the following return:
>
> 826 chr6 31647790 31649567 2389798 430695879
> 0.698777
>
> Which means that the multiz17way score for the region chr6:31647790-31649567
> (total 1778 bps) is 0.698777. But if you take this region into the browser,
> the "17-Way Cons" track shown on the top as "Conservation" shows a lot of
> variation (not a constant value of 0.698777) with peaks and dips.
>
> Does the browser combine other information to compute the "Conservation"? If
> I am interested in getting the conservation of a SNP rs909253, how can I get
> it?
>
> Thanks a lot!
>
> Mousheng Xu
>
> Research Fellow,
> BWH, Harvard Medical School

From Charleston_Chiang at hms.harvard.edu Tue Oct 2 19:32:46 2007
From: Charleston_Chiang at hms.harvard.edu (Chiang, Charleston Wen-Kai)
Date: Tue, 2 Oct 2007 22:32:46 -0400
Subject: [Genome] changing default display size on genome browser
Message-ID: <81435EDDF9833B4FB76790B081965223CEE1EB@MAILSERVER02.MED.HARVARD.EDU>

Hi,

When I searched the genome browser for a SNP (say, rs363153) in the search box at the Gateway page, the first screen I get to is the display of a region 501 bps long (250 bps left and right of the SNP). Is there a way to change the default setting such that this initial screen would display a larger area around the SNP, so that I will not have to zoom out multiple times every time I searched a SNP.

Some specification:
browser: firefox 2.0.0.7
os: windows XP

Sincerely, Charleston

From rhead at soe.ucsc.edu Wed Oct 3 10:18:19 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 03 Oct 2007 10:18:19 -0700
Subject: [Genome] changing default display size on genome browser
In-Reply-To: <81435EDDF9833B4FB76790B081965223CEE1EB@MAILSERVER02.MED.HARVARD.EDU>
References: <81435EDDF9833B4FB76790B081965223CEE1EB@MAILSERVER02.MED.HARVARD.EDU>
Message-ID: <4703CEDB.5080101@soe.ucsc.edu>

Hello Charleston,

The padding of 250 bases on either side of a SNP is a setting that we configure here, and it is applicable to all users. There is not a way to change that setting on your end (unless you want to set up a mirror of the Genome Browser).

There are a couple of work-arounds you might try. First, another way to zoom out quickly with a single click is to click on the bands in the chromosome ideogram above the Genome Browser display. This will zoom out the display to the width of the band you click in the chromosome image.
However, this might be too drastic -- the image will likely be zoomed out too far. Another possibility, if you are working with a particular gene and have one of the gene tracks turned on, is to click on a gene that encompasses your SNP, and then on the gene details page, click on the "position" link. This will zoom you out to the level of the gene. (Note that for the "UCSC Genes" track, the position link is under the "Sequence and Links to Tools and Databases" section and is simply called "Genome Browser".)

I hope this information is useful. If you have further questions, please do not hesitate to contact us again.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Chiang, Charleston Wen-Kai wrote:
> Hi,
>
> When I searched the genome browser for a SNP (say, rs363153) in the
> search box at the Gateway page, the first screen I get to is the
> display of a region 501 bps long (250 bps left and right of the SNP).
> Is there a way to change the default setting such that this initial
> screen would display a larger area around the SNP, so that I will not
> have to zoom out multiple times every time I searched a SNP.
>
> Some specification:
> browser: firefox 2.0.0.7
> os: windows XP
>
> Sincerely, Charleston

From deepreds at ucla.edu Wed Oct 3 11:21:39 2007
From: deepreds at ucla.edu (Namshin Kim)
Date: Wed, 3 Oct 2007 11:21:39 -0700
Subject: [Genome] apiMel3 genome assembly
Message-ID: <3658dd5f0710031121y1b5d017br9d6caa6d63c20977@mail.gmail.com>

Hi,

Could you upload the apiMel3 genome assembly used for the dm3-apiMel3 pairwise alignments and dm3-multiz15way? I couldn't find the apiMel3 genome assembly on the download page.

By the way, it would be great if you could provide a UCSC version of apiMel4 (I saw that the latest assembly for honeybee is apiMel4).
Thanks,
Namshin Kim

From Ron_Shigeta at affymetrix.com Wed Oct 3 12:13:12 2007
From: Ron_Shigeta at affymetrix.com (Shigeta, Ron)
Date: Wed, 3 Oct 2007 12:13:12 -0700
Subject: [Genome] custom track size limits
Message-ID:

I'm putting together BED files for our Tiling Array Products and they are pretty large.

Is there a size limitation for custom track uploads? It's about 250MB when gzipped.

Thanks,

ron

Ron Shigeta Ph.D.|Team Lead, NetAffx| Affymetrix
6550 Vallejo St .Suite 100 Emeryville, CA 94608 | 510 428 8547

From rhead at soe.ucsc.edu Wed Oct 3 15:44:04 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 03 Oct 2007 15:44:04 -0700
Subject: [Genome] custom track size limits
In-Reply-To:
References:
Message-ID: <47041B34.30603@soe.ucsc.edu>

Hello Ron,

There is only one programmed size limit for custom tracks: 300,000,000 data points in a single wiggle track submission (which doesn't apply in your case). Please see this previously answered mailing list question for a discussion of the practical limits you might encounter and how to work around them:

http://www.soe.ucsc.edu/pipermail/genome/2007-April/013324.html

Please write back to this list if you have further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Shigeta, Ron wrote:
> I'm putting together BED files for our Tiling Array Products and they
> are pretty large.
>
> Is there a size limitation for custom track uploads? It's about 250MB
> when gzipped.
>
> Thanks,
>
> ron
>
> Ron Shigeta Ph.D.|Team Lead, NetAffx| Affymetrix
> 6550 Vallejo St .Suite 100 Emeryville, CA 94608 | 510 428 8547

From rhead at soe.ucsc.edu Wed Oct 3 16:37:22 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 03 Oct 2007 16:37:22 -0700
Subject: [Genome] changing default display size on genome browser
In-Reply-To: <4703CEDB.5080101@soe.ucsc.edu>
References: <81435EDDF9833B4FB76790B081965223CEE1EB@MAILSERVER02.MED.HARVARD.EDU> <4703CEDB.5080101@soe.ucsc.edu>
Message-ID: <470427B2.2080005@soe.ucsc.edu>

Hello again Charleston,

A colleague came up with another solution. If you have a list of SNP identifiers of interest, you could make a custom track of those SNPs with some padding on either side, then select the custom track in the Table Browser with the output format "hyperlinks to Genome Browser" option selected. This will give you a clickable list of your SNPs that will open a new Genome Browser centered around the SNP you clicked.

To do this, first go to the Table Browser and select the SNP track and table. Paste the list of rsIDs in the "identifiers (names/accessions):" field. Choose "selected fields" output and get the fields chrom, chromStart, chromEnd, and name.

At this point you will need to use your own program (or send your results to the Galaxy website) to subtract some number (1,000 say) from chromStart and add to chromEnd. Now you can take that data and re-upload it as a custom track. You can hit the "add custom track" button on the main Genome Browser page and paste the data there to create the custom track.

To get a list of hyperlinks, go back to the Table Browser, select the custom track you just made, and choose "hyperlinks to Genome Browser" as the output format.
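The "use your own program" padding step described above could look something like the following Python sketch. It is an illustration only: the function name `pad_intervals`, the tuple layout (chrom, chromStart, chromEnd, name), and the example coordinates are assumptions, not part of the Table Browser output specification. Starts are clamped at 0, but ends are not clamped to chromosome length, so a SNP very near a chromosome end could produce an out-of-range interval.

```python
def pad_intervals(rows, padding=1000):
    """Widen each (chrom, chromStart, chromEnd, name) row by `padding` bases.

    Subtracts `padding` from chromStart (never going below 0) and adds it
    to chromEnd, as described in the steps above.
    """
    padded = []
    for chrom, start, end, name in rows:
        new_start = max(0, int(start) - padding)
        new_end = int(end) + padding
        padded.append((chrom, new_start, new_end, name))
    return padded


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    snps = [("chr4", 3200000, 3200001, "rsExample1")]
    for chrom, start, end, name in pad_intervals(snps):
        # Tab-separated BED-style lines suitable for pasting as a custom track.
        print(f"{chrom}\t{start}\t{end}\t{name}")
```

The printed lines are in 4-column BED layout, which is one of the formats accepted when pasting data into the "add custom track" form.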
Now you should have a list of links that will go to each SNP, with 2,000 bases surrounding it (if you added and subtracted 1,000 bases from either end in the previous step).

Note that with this method, the SNP of interest will no longer be highlighted in the Genome Browser unless you previously entered it into the position/search box. However, it will be centered in the Genome Browser display.

I hope this information helps.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Brooke Rhead wrote:
> Hello Charleston,
>
> The padding of 250 bases on either side of a SNP is a setting that we
> configure here, and it is applicable to all users. There is not a way
> to change that setting on your end (unless you want to set up a mirror
> of the Genome Browser).
>
> There are a couple of work-arounds you might try. First, another way to
> zoom out quickly with a single click is to click on the bands in the
> chromosome ideogram above the Genome Browser display. This will zoom
> out the display to the width of the band you click in the chromosome
> image. However, this might be too drastic -- the image will likely be
> zoomed out too far. Another possibility, if you are working with a
> particular gene and have one of the gene tracks turned on, is to click
> on a gene that encompasses your SNP, and then on the gene details page,
> click on the "position" link. This will zoom you out to the level of
> the gene. (Note that for the "UCSC Genes" track, the position link is
> under the "Sequence and Links to Tools and Databases" section and is
> simply called "Genome Browser".)
>
> I hope this information is useful. If you have further questions,
> please do not hesitate to contact us again.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
>
> Chiang, Charleston Wen-Kai wrote:
>> Hi,
>>
>> When I searched the genome browser for a SNP (say, rs363153) in the
>> search box at the Gateway page, the first screen I get to is the
>> display of a region 501 bps long (250 bps left and right of the SNP).
>> Is there a way to change the default setting such that this initial
>> screen would display a larger area around the SNP, so that I will not
>> have to zoom out multiple times every time I searched a SNP.
>>
>> Some specification:
>> browser: firefox 2.0.0.7
>> os: windows XP
>>
>> Sincerely, Charleston

From yongmei_ji at merck.com Wed Oct 3 17:34:26 2007
From: yongmei_ji at merck.com (Ji, Yongmei)
Date: Wed, 3 Oct 2007 17:34:26 -0700
Subject: [Genome] Hg18 to mm8 net files
In-Reply-To:
References:
Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB0201A926ED@ussemx1100.merck.com>

Dear UCSC Genome Browser,

Could you please still provide the following files in your ftp site, as you used to?

ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/netMm8.txt.gz
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/netMm8.sql

You have a netMm9.txt.gz file in this directory, but the netMm8.txt.gz was deleted. Could you put them back?

Thanks,
Yongmei

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc.
(One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
------------------------------------------------------------------------------

From GaoJ at NEI.NIH.GOV Thu Oct 4 08:24:28 2007
From: GaoJ at NEI.NIH.GOV (Gao, James (NIH/NEI) [C])
Date: Thu, 4 Oct 2007 11:24:28 -0400
Subject: [Genome] Retrieve chromosomal positions between STS markers
In-Reply-To:
References:
Message-ID:

Hello,

In the human Genome browser, one can enter a pair of STS markers in the search term box, and the browser will display a region of the chromosome, such as entering RH18061;RH80175, the browser will display chr7:27,169,003-27,172,296. Here is my question: I have a list of pairs of these STS markers, can I use your MySQL database to retrieve those regions of chromosomes as displayed inside the browser?

Thank you.

James

From asaflev1 at post.tau.ac.il Thu Oct 4 09:45:04 2007
From: asaflev1 at post.tau.ac.il (asaflev1 at post.tau.ac.il)
Date: Thu, 4 Oct 2007 18:45:04 +0200
Subject: [Genome] liftover files mm6->mm9
Message-ID: <1191516304.4705189054057@webmail.tau.ac.il>

Hi,
Can you please supply liftover files for converting mouse mm6 assembly to mm9?

Regards,
Asaf Levy
TAU

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
From kayla at soe.ucsc.edu Thu Oct 4 11:48:44 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 04 Oct 2007 11:48:44 -0700
Subject: [Genome] liftover files mm6->mm9
In-Reply-To: <1191516304.4705189054057@webmail.tau.ac.il>
References: <1191516304.4705189054057@webmail.tau.ac.il>
Message-ID: <4705358C.9010004@cse.ucsc.edu>

Asaf,

You can use the mm6 --> mm8 liftOver file, and then lift again from mm8 --> mm9.

http://hgdownload.cse.ucsc.edu/goldenPath/mm6/liftOver/mm6ToMm8.over.chain.gz
http://hgdownload.cse.ucsc.edu/goldenPath/mm8/liftOver/mm8ToMm9.over.chain.gz

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

asaflev1 at post.tau.ac.il wrote:
>
> Hi,
> Can you please supply liftover files for converting mouse mm6 assembly to mm9?
>
> Regards,
> Asaf Levy
> TAU

From kayla at soe.ucsc.edu Thu Oct 4 12:05:26 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 04 Oct 2007 12:05:26 -0700
Subject: [Genome] Hg18 to mm8 net files
In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB0201A926ED@ussemx1100.merck.com>
References: <9BEE7CC4462DB14997A5C8CF8F3BEB0201A926ED@ussemx1100.merck.com>
Message-ID: <47053976.3050203@cse.ucsc.edu>

Hello Yongmei,

We don't keep old versions of net/chain files in the directory you have specified. Please see if this file will suit your purposes:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm8/hg18.mm8.net.gz

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.
Kayla Smith UCSC Genome Bioinformatics Group Ji, Yongmei wrote: > Dear UCSC Genome Browser, > > Could you please still provide the following files in your ftp site, as > you used to? > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/netMm8.txt.gz > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/netMm8.sql > > You have a netMm9.txt.gz file in this directory, but the netMm8.txt.gz > was deleted. Could you put them back? > > Thanks, > Yongmei > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From gbell at wi.mit.edu Fri Oct 5 08:20:52 2007 From: gbell at wi.mit.edu (George Bell) Date: Fri, 05 Oct 2007 11:20:52 -0400 Subject: [Genome] Adding a "miRNA sites" track to the human genome browser? Message-ID: <47065654.5050603@wi.mit.edu> Hi UCSC Genome Bioinformatics people, We'd like to submit data on microRNA sites to your current human genome browser. 
We currently have this data shown as links from the TargetScan.org web site, for example, http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr6:135580812-135582003&hgt.customText=http://jura.wi.mit.edu/targetscan/vert_40/ucsc/NR/hg18ConsChr6.bed and would like this same data to be available under the "Expression and Regulation" section of the main track list as "miRNA sites". Is this possible? You did this with a previous TargetScan-generated dataset that is displayed with the hg17 assembly as described here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg17&g=targetScanS We also have similar data for mouse (mm8), rat (rn4) and dog (canFam2) and would be interested in sharing those datasets too. If this submission is possible, let us know how to proceed. George ******************************************** George Bell, Ph.D. Senior Bioinformatics Scientist, Bioinformatics and Research Computing Whitehead Institute for Biomedical Research Room 209 9 Cambridge Center Cambridge, MA 02142 Tel.: (617) 258-5747 From bina at purdue.edu Fri Oct 5 11:58:15 2007 From: bina at purdue.edu (bina at purdue.edu) Date: Fri, 5 Oct 2007 14:58:15 -0400 Subject: [Genome] custom track Message-ID: <1191610695.470689476fe55@webmail.purdue.edu> Hello I would like to create a custom track to display potential TF binding site, as you have done on the browser for TFBS conserved. How do I do that? 
Minou Bina Purdue U From ann at soe.ucsc.edu Fri Oct 5 12:19:55 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 05 Oct 2007 12:19:55 -0700 Subject: [Genome] custom track In-Reply-To: <1191610695.470689476fe55@webmail.purdue.edu> References: <1191610695.470689476fe55@webmail.purdue.edu> Message-ID: <47068E5B.5020903@cse.ucsc.edu> Hello Minou, You can read about the details behind a track (description, methods, display, credits, references) by pressing on the 'mini-button' to the left of the actual track display, or by clicking on the hyperlinked track name in the track controls (below the display). In this case, open the hg17 browser and press the 'TFBS Conserved' hyper link. It gives detailed step-by-step instructions on how we created this track. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. bina at purdue.edu wrote: > > Hello > > I would like to create a custom track to display potential TF binding site, as > you have done on the browser for TFBS conserved. > > How do I do that? > > > Minou Bina > Purdue U > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From aguo at vcu.edu Fri Oct 5 15:16:38 2007 From: aguo at vcu.edu (Anyuan Guio) Date: Fri, 05 Oct 2007 15:16:38 -0700 Subject: [Genome] question about UCSC STS Message-ID: <4706B7C6.5060100@vcu.edu> Dear colleague, Thank you for your excellent work on UCSC genome browser. But now I have a question about it. I find the physical location of some STS markers on UCSC are not consistent with the NCBI UniSTS. 
Such as marker AFM280ZA5 (alias D1S469), the location start at 221008275 and end at 221008670 on chr 1 on UCSC http://genome.ucsc.edu/cgi-bin/hgc?hgsid=98338741&o=221008274&t=221008670&g=stsMap&i=AFM280ZA5&c=chr1&l=220908274&r=221108670&db=hg18&pix=620 But the same marker is located from 221008331 to 221008659 of chr 1 annotated on NCBI UniSTS (http://www.ncbi.nlm.nih.gov/genome/sts/sts.cgi?uid=65827). I noticed that the NCBI current human assembly version is Build 36.2. However this Build 36.2 is identical to build 36.1 as NCBI release notes described. (http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html) And the UCSC human genome version is Human Mar. 2006 (hg18) assembly, based on the NCBI Build 36.1 as described on http://genome.ucsc.edu/cgi-bin/hgGateway. So, the two human genome assembly is the same, but a STS marker located on different physical location on UCSC and NCBI. Why? Which can I used? Thanks very much! Anyuan Guo From ann at soe.ucsc.edu Fri Oct 5 12:27:19 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 05 Oct 2007 12:27:19 -0700 Subject: [Genome] Adding a "miRNA sites" track to the human genome browser? In-Reply-To: <47065654.5050603@wi.mit.edu> References: <47065654.5050603@wi.mit.edu> Message-ID: <47069017.2050601@cse.ucsc.edu> Hello George, We would very much like to host these data for all of the organisms you have mentioned. Let's converse off-list as to how we can best do this. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu George Bell wrote: > Hi UCSC Genome Bioinformatics people, > > We'd like to submit data on microRNA sites to your current human genome > browser. 
We currently have this data shown as links from the > TargetScan.org web site, for example, > > http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr6:135580812-135582003&hgt.customText=http://jura.wi.mit.edu/targetscan/vert_40/ucsc/NR/hg18ConsChr6.bed > > and would like this same data to be available under the "Expression and > Regulation" section of the main track list as "miRNA sites". Is this > possible? You did this with a previous TargetScan-generated dataset > that is displayed with the hg17 assembly as described here: > http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg17&g=targetScanS > > We also have similar data for mouse (mm8), rat (rn4) and dog (canFam2) > and would be interested in sharing those datasets too. > > If this submission is possible, let us know how to proceed. > > George > > > ******************************************** > George Bell, Ph.D. > Senior Bioinformatics Scientist, Bioinformatics and Research Computing > Whitehead Institute for Biomedical Research > Room 209 > 9 Cambridge Center > Cambridge, MA 02142 > Tel.: (617) 258-5747 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From hiram at soe.ucsc.edu Fri Oct 5 12:30:25 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Fri, 05 Oct 2007 12:30:25 -0700 Subject: [Genome] custom track In-Reply-To: <1191610695.470689476fe55@webmail.purdue.edu> References: <1191610695.470689476fe55@webmail.purdue.edu> Message-ID: <470690D1.9050303@soe.ucsc.edu> Good Afternoon Professor Bina: Is there a particular functionality you see working on this track that you would like to reproduce ? This track is essentially a simple bed-type of track with the fields: chrom, chromStart, chromEnd, name, score and strand. 
They do have an extra zScore column but that can not be reproduced in a bed-type of track unless you transformed that floating point number into an integer in the range 0 to 1000 and used that in the score column. --Hiram bina at purdue.edu wrote: > Hello > > I would like to create a custom track to display potential TF binding site, as > you have done on the browser for TFBS conserved. > > How do I do that? > > Minou Bina > Purdue U From rhead at soe.ucsc.edu Fri Oct 5 13:04:51 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 05 Oct 2007 13:04:51 -0700 Subject: [Genome] apiMel3 genome assembly In-Reply-To: <3658dd5f0710031121y1b5d017br9d6caa6d63c20977@mail.gmail.com> References: <3658dd5f0710031121y1b5d017br9d6caa6d63c20977@mail.gmail.com> Message-ID: <470698E3.4030509@soe.ucsc.edu> Hello Namshin, We have made the apiMel3 genome assembly available on our test server, here: http://genome-test.cse.ucsc.edu/goldenPath/apiMel3/bigZips/ (also available by chromosome, here: http://genome-test.cse.ucsc.edu/goldenPath/apiMel3/chromosomes/ ) Regarding using the latest honeybee assembly, apiMel4: given our present direction that focuses on vertebrates and model organisms, we currently don't have plans to put out an updated browser for the honeybee. If you have any further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Namshin Kim wrote: > Hi, > Could you upload apiMel3 genome assembly used for dm3-apiMel3 pairwise > alignments and dm3-multiz15way? I couldn't find apiMel3 genome assembly in > download page. > > By the way, it would be great if you provide with UCSC version of apiMel4 (I > saw that latest assembly for honeybee is apiMel4). 
> > Thanks, > Namshin Kim > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Gonzales.Patrick at mayo.edu Fri Oct 5 13:14:44 2007 From: Gonzales.Patrick at mayo.edu (Gonzales, Patrick R.) Date: Fri, 5 Oct 2007 15:14:44 -0500 Subject: [Genome] X and Y Pseudoautosomal Regions Message-ID: <572057D3BDD52A46BD05BC6DA5068611EE4ADB@MSGEBE22.mfad.mfroot.org> Why are the ~100% identical pseudoautosomal regions on Xp/Yp and Xq/Yq not fully shown in the "Segmental Duplications" track of the genome browser? This would be very helpful to investigators interested in these regions to have them clearly demarcated. You mention the defined coordinates under the "Genomes" tab. Thanks! Pat Patrick R. Gonzales, MS Clinical Development Technologist Cytogenetics Array CGH Mayo Clinic Hilton 932 (507)284-8338 gonzales.patrick at mayo.edu From ann at soe.ucsc.edu Fri Oct 5 14:27:06 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 05 Oct 2007 14:27:06 -0700 Subject: [Genome] X and Y Pseudoautosomal Regions In-Reply-To: <572057D3BDD52A46BD05BC6DA5068611EE4ADB@MSGEBE22.mfad.mfroot.org> References: <572057D3BDD52A46BD05BC6DA5068611EE4ADB@MSGEBE22.mfad.mfroot.org> Message-ID: <4706AC2A.8040104@cse.ucsc.edu> Hello Patrick, Good question. Note that the full name of the Segmental Duplications track is "Duplications of >1000 Bases of Non-RepeatMasked Sequence". For a region to be included in the track, at least 1 Kb of the total sequence (containing at least 500 bp of non-RepeatMasked sequence) had to align and a sequence identity of at least 90% was required. In the case of the PARs, they pass both the length and identity requirements. However, those locations are full of repeats. You can turn on the Repeat Masker track to get a sense of just how many repeats there are. That would be my gut feel for why there isn't a 100% match in the Seg Dups track. 
You might want to contact the lab that produced the track to verify. You can get the contact information from the track description page. As an alternative solution, you could make a Custom Track and display that in the browser, so you would always be aware if you were in that region. Here's a simple Custom Track annotation that would work.

browser position chrX:154580000-154913754
track name=PARs description="chrX and chrY pseudoautosomal regions" visibility=2
chrY 1 2709520
chrX 1 2709520
chrY 57443438 57772954
chrX 154584237 154913754

For details on using the Custom Track tool, see: http://genome.ucsc.edu/goldenPath/help/customTrack.html Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Gonzales, Patrick R. wrote: > Why are the ~100% identical pseudoautosomal regions on Xp/Yp and Xq/Yq > not fully shown in the "Segmental Duplications" track of the genome > browser? This would be very helpful to investigators interested in > these regions to have them clearly demarcated. You mention the defined > coordinates under the "Genomes" tab. Thanks! > > Pat > > > Patrick R. Gonzales, MS > > Clinical Development Technologist > Cytogenetics Array CGH > Mayo Clinic > Hilton 932 > (507)284-8338 > gonzales.patrick at mayo.edu > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Fri Oct 5 15:51:04 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 05 Oct 2007 15:51:04 -0700 Subject: [Genome] question about UCSC STS In-Reply-To: <4706B7C6.5060100@vcu.edu> References: <4706B7C6.5060100@vcu.edu> Message-ID: <4706BFD8.9020600@cse.ucsc.edu> Hello Anyuan, I see exactly what you are talking about. I did a little bit of digging around regarding this particular STS Marker and noticed a few interesting things. If you follow the link from the UniSTS page to the GenBank accession page (Z23995), you will see the full sequence.
This sequence matches the exact location of where we have placed the marker on the genome in the hg18 assembly: chr1:221008275-221008670. Now, returning to the UniSTS page for this marker, you will see the sequence for the two primers. Note that both of these primers fall *within* the GenBank accession full sequence. More precisely, the forward primer starts 56 bases after the start of the GenBank sequence, and the reverse primer ends 11 bases before the end. If you add 56 to the UCSC starting coordinate and subtract 11 from the UCSC end coordinate, you will get the exact coordinates listed on the UniSTS site: chr1:221008331-221008659. So, that appears to explain the discrepancy. To help you visualize this more clearly, I have created a session that you can view in your own Internet browser: http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Ann&hgS_otherUserSessionName=AnyuanSTSmarker From this browser configuration, you can see exactly where the primers fall within the STS Marker, as well as the results of BLATting the sequence from GenBank to the hg18 assembly. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Anyuan Guio wrote: > Dear colleague, > Thank you for your excellent work on UCSC genome browser. But now I > have a question about it. > I find the physical location of some STS markers on UCSC are not > consistent with the NCBI UniSTS. > Such as marker AFM280ZA5 (alias D1S469), the location start at > 221008275 and end at 221008670 on chr 1 on UCSC > http://genome.ucsc.edu/cgi-bin/hgc?hgsid=98338741&o=221008274&t=221008670&g=stsMap&i=AFM280ZA5&c=chr1&l=220908274&r=221108670&db=hg18&pix=620 > But the same marker is located from 221008331 to 221008659 of chr 1 > annotated on NCBI UniSTS > (http://www.ncbi.nlm.nih.gov/genome/sts/sts.cgi?uid=65827). > > I noticed that the NCBI current human assembly version is Build > 36.2. 
However this Build 36.2 is identical to build 36.1 as NCBI release > notes described. > (http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html) > And the UCSC human genome version is Human Mar. 2006 (hg18) > assembly, based on the NCBI Build 36.1 as described on > http://genome.ucsc.edu/cgi-bin/hgGateway. > > So, the two human genome assembly is the same, but a STS marker > located on different physical location on UCSC and NCBI. Why? Which can > I used? > > Thanks very much! > > Anyuan Guo > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From iamyeoni at berkeley.edu Sat Oct 6 21:26:42 2007 From: iamyeoni at berkeley.edu (Soyeon Ahn) Date: Sat, 06 Oct 2007 20:26:42 -0800 Subject: [Genome] A. gambiae (mosquito) update ? Message-ID: <47086002.9000400@berkeley.edu> //Dear UCSC team, I'm currently working on the mosquito genome and found out that the UCSC browser allows //A. gambiae annotated in 2003. I was wondering whether it is possible to update the mosquito genome assembly. (Thank you for the browser!) Regards, Soyeon From ross.lazarus at channing.harvard.edu Fri Oct 5 17:25:50 2007 From: ross.lazarus at channing.harvard.edu (Ross Lazarus) Date: Fri, 05 Oct 2007 20:25:50 -0400 Subject: [Genome] Question about extending genome graph (/cgi-bin/hgGenome) parameters Message-ID: <4706D60E.7000407@channing.harvard.edu> Genome Graphs is a great way to see multiple custom tracks. I'm using it on an internal private mirror here at the Channing lab to display hundreds of outputs from analyses of large association studies for the investigators. I have a proposal that would make it even greater with a relatively small increase in complexity. I have a working Galaxy genome graphs datatype that works exactly like the current Galaxy bed file viewer - it has a "view at ucsc" link that opens a (in our case!) local mirror genome graphs display. 
Users really like it, but they complain about 3 things - the lines joining points are misleading for these statistics; they don't like the fact that the default is all tracks on at 30 pixels or so; and they want the tracks scaled sensibly. I was wondering if we could have some additional parameters - eg 1) a flag to substitute bars when you don't want the apparent smoothing of joined dots, and 2) some ways to hint at each track's vertical size 3) and scale, 4) and default state (full, packed etc)? I am constructing the URL in python code and it's easy for me to comply with extensions to the current genome graphs call to /cgi-bin/hgGenome I quickly understood from reading the code that making these changes correctly would be a non trivial, high risk venture for me. Does anyone with deep knowledge have time for this please? I would be interested in discussing the possibility of supporting this work via a small (!) subcontract from one of my grants if that is a realistic option and would help get this done ? > From ann at soe.ucsc.edu Mon Feb 5 05:42:17 2007 > From: ann at soe.ucsc.edu (Ann Zweig) > Date: Mon, 05 Feb 2007 07:42:17 -0600 > Subject: [Genome-announce] New Genome Graphs Tool Available in Genome Browser > Message-ID: <45C73439.6090804 at soe.ucsc.edu> > > We are pleased to announce the release of a new software tool in the > Genome Browser collection, the Genome Graphs tool. Genome Graphs offers > the ability to upload and display genome-wide data sets such as the > results of genome-wide SNP association studies, linkage studies and > homozygosity mapping. 
The Genome Graphs tool may be accessed from the > menu on the UCSC Genome Bioinformatics home page, or from this link: > http://genome.ucsc.edu/cgi-bin/hgGenome > > The initial release of Genome Graphs includes the following features: > > - upload several sets of genome-wide data and display them simultaneously > - click on an area of interest and go directly to the genome browser at > that position > - set a significance threshold for your data and view only regions that > meet that threshold > - view the genes that exist in areas where your data meet your > significance threshold > > For more information about the Genome Graphs tool, visit the Gateway > page or consult the Getting Started on Genome Graphs section in the > User's Guide: > http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GenomeGraphs > > Genome Graphs was written by Jim Kent of the UCSC Genome Bioinformatics > Group. > > ------- > Regards, > Ann Zweig -- Ross Lazarus MBBS MPH, Director of Bioinformatics Channing Laboratory, 181 Longwood Ave., Boston MA 02115, USA. Tel: +617 525 2730 Fax: +617 525 0958 The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information. 
From zhouj at gis.a-star.edu.sg Mon Oct 8 00:44:52 2007 From: zhouj at gis.a-star.edu.sg (ZHOU Jiangtao) Date: Mon, 8 Oct 2007 15:44:52 +0800 Subject: [Genome] formula to locate sequence in the fasta files Message-ID: <318F9A7177894D4F9061D20E1D2318D01BF0EE@gisexch.gis.a-star.edu.sg> Hi, To get the genome sequence for a given gene location, let's say (chr1, +, txStart, txEnd), I downloaded the FASTA files from goldenPath/hg18/chromosomes/ and used this formula: Starting position in file chr1.fa: strlen("chr1")+2+($txStart/50)*51+$txStart%50; length: $txEnd-$txStart+1; Are these formulas correct? I found that the result is sometimes 1 position earlier than the one I get from the genome browser. Is txStart 0-based or 1-based? Regards, Zhou Jiangtao From asaflev1 at post.tau.ac.il Sat Oct 6 10:15:28 2007 From: asaflev1 at post.tau.ac.il (asaf levy) Date: Sat, 6 Oct 2007 19:15:28 +0200 Subject: [Genome] strand in liftover application Message-ID: <20071006171152.334C0BCC028@post.tau.ac.il> Hi, Does the strand data have any meaning in the liftover tool? Is there a possibility that some genomic positions had their strand changed between different genome versions? Regards, Asaf TAU From dag23 at duke.edu Sun Oct 7 09:16:47 2007 From: dag23 at duke.edu (David Garfield) Date: Sun, 7 Oct 2007 12:16:47 -0400 Subject: [Genome] Alignment techniques for finding SNPs (and other polymorphisms) Message-ID: <80760395-3B3A-42E6-AFEA-498FB0A413C9@duke.edu> Hi genome-crew, I've got a question about the best ways of using alignment software to locate SNPs. I'm working with the sea urchin genomes (the S. purpuratus 2.0 release along with the recently added 1.2x coverage 454 sequences of the related S. franciscanus and A. fragilis). I've had great luck using blastz/TBA to do whole genome alignments before, but in this case I'd like to make use of the high levels of polymorphism in urchin populations and track not only which regions of the A. frag and S.
franc genomes map to the annotated S. purp sequence, but also track the polymorphisms that pop up on the occasions where the 454 sequences cover both haplotypes from the same region (and perhaps use the WGS reads from S_purp to find polymorphisms within that genome). Any suggestions on how to proceed? I'm savvy enough, I think, to figure out how to use most of the programs out there, and can certainly convert output from one format to another, but given the bewildering array of alignment tools out there, I'd rather not reinvent the wheel if folks have already tried to do something similar. Thanks in advance, David Garfield Wray Lab Duke University From kayla at soe.ucsc.edu Mon Oct 8 11:44:10 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 8 Oct 2007 11:44:10 -0700 (PDT) Subject: [Genome] A. gambiae (mosquito) update ? In-Reply-To: <47086002.9000400@berkeley.edu> References: <47086002.9000400@berkeley.edu> Message-ID: Hello Soyeon, We do not currently have plans to update the mosquito genome browser. If you have any other questions about the Genome Browser, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group On Sat, 6 Oct 2007, Soyeon Ahn wrote: > //Dear UCSC team, > > I'm currently working on the mosquito genome and found out that the UCSC browser allows //A. gambiae annotated in 2003. > I was wondering whether it is possible to update the mosquito genome assembly. > > (Thank you for the browser!)
> > Regards, > Soyeon > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Mon Oct 8 13:13:50 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 8 Oct 2007 13:13:50 -0700 (PDT) Subject: [Genome] formula to locate sequence in the fasta files In-Reply-To: <318F9A7177894D4F9061D20E1D2318D01BF0EE@gisexch.gis.a-star.edu.sg> References: <318F9A7177894D4F9061D20E1D2318D01BF0EE@gisexch.gis.a-star.edu.sg> Message-ID: Hello Zhou, Information about 0-based and 1-based coordinates is available in our FAQ: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 The formula you've provided may work on a set of fasta files that have exactly 50 characters on each line, but we cannot guarantee that all our fasta files have 50 characters on each line. The recommended method is to use the fasta reading utilities in the kent source tree that can extract specific sequences, such as faFrag, or to use twoBitToFa on the 2bit file. Here is a FAQ on downloading the Genome Browser utilities: http://genome.ucsc.edu/FAQ/FAQlicense#license3 I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group On Mon, 8 Oct 2007, ZHOU Jiangtao wrote: > Hi, > > > > To get the genome sequences for a given gene location, let say, (chr1, > +, txStart, txEnd), I downloaded the FASTA files > > >From goldenPath/hg18/chromosomes/ > > And use this formula: > > > > Starting position of file chr1.fa: > > strlen("chr1")+2+($txStart/50)*51+$txStart%50; > > length:$txEnd-$txStart+1; > > > > are these formula correct? I found out sometimes it will be 1 position > earlier than the one I can get from the genome browser. Are txStart 0 > based or 1 based?
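The offset arithmetic discussed in this exchange can be sketched in Python as follows. This is an illustration only, under loud assumptions: a single-record FASTA file whose header is exactly `>name` plus a newline, Unix newlines, and a fixed number of bases per line (50 here, which, as the reply notes, is not guaranteed for every file). `fasta_byte_range` and `read_interval` are hypothetical helper names, and the interval is treated as 0-based and half-open, so the length is end minus start with no +1. For real work, faFrag or twoBitToFa remain the recommended route.

```python
def fasta_byte_range(name, start0, end0, line_width=50):
    """Byte offset and byte count covering the 0-based, half-open interval
    [start0, end0) in a single-record FASTA with fixed-width lines (assumed)."""
    if not 0 <= start0 < end0:
        raise ValueError("need 0 <= start0 < end0")
    header_len = len(name) + 2  # '>' + name + newline

    def offset(pos):
        # Each full line of bases occupies line_width bytes plus one newline.
        return header_len + (pos // line_width) * (line_width + 1) + pos % line_width

    first = offset(start0)
    last = offset(end0 - 1)  # byte holding the final base of the interval
    return first, last - first + 1


def read_interval(path, name, start0, end0, line_width=50):
    """Read the bases of [start0, end0), dropping any embedded newlines."""
    first, nbytes = fasta_byte_range(name, start0, end0, line_width)
    with open(path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(nbytes)
    return raw.replace(b"\n", b"").decode("ascii")
```

Two points follow from the sketch. The number of bytes to read is larger than the number of bases whenever the interval crosses a line break, and if txStart is a 0-based table start, the browser's 1-based display of the same base is txStart + 1, which would account for a one-position difference.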
> > > > Regards, > > > > Zhou Jiangtao > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kavitag at u.washington.edu Mon Oct 8 10:25:30 2007 From: kavitag at u.washington.edu (K. Garg) Date: Mon, 8 Oct 2007 10:25:30 -0700 (PDT) Subject: [Genome] forward and reverse ESTs Message-ID: How can I find two ESTs (forward and reverse) that came from the same clone? Given an EST, how do I find its pair if it exists? thanks -kavita ----------------------------------------------------------------------- Kavita Garg, Ph.D. Email: kavitag at u.washington.edu Dept of Genome Sciences, Phone: 206-616-5051 Box 357730 Fax: 206-685-7301 University of Washington School of Medicine 1705 NE Pacific, Seattle, WA 98195-7730, USA ----------------------------------------------------------------------- From georg at jhmi.edu Mon Oct 8 13:10:34 2007 From: georg at jhmi.edu (GEORG EHRET) Date: Mon, 08 Oct 2007 16:10:34 -0400 Subject: [Genome] codon position and splicing acceptor and donor sites Message-ID: Good Morning! I am looking for the codon position (or non-codon) for each nucleotide of a large genomic sequence. Is this data available on UCSC? What about splicing donor and acceptor sites? Thank you! Georg. ********************************************************** Georg B. Ehret, MD Fellow McKusick-Nathans Institute of Genetic Medicine Johns Hopkins University School of Medicine Broadway Research Building, Room 572 733 N. Broadway Baltimore, MD 21205 Phone: (410) 502-7530 Fax: (410) 502-7544 From gua110 at bx.psu.edu Mon Oct 8 12:15:20 2007 From: gua110 at bx.psu.edu (Guruprasad Ananda) Date: Mon, 8 Oct 2007 15:15:20 -0400 Subject: [Genome] Quality scores Message-ID: <74CD0E9B-6AB1-4EB5-965A-8101E6907073@bx.psu.edu> Hi, I noticed that on the UCSC genome browser, quality scores are available for download for chimpanzee and rhesus genomes.
I was wondering if you had quality scores for other genomes. If not, could you suggest me an alternative place from where I can obtain quality scores for other genomes? Regards, Guru. Guruprasad Ananda Graduate Student Bioinformatics and Genomics The Pennsylvania State University From kayla at soe.ucsc.edu Mon Oct 8 13:44:07 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 8 Oct 2007 13:44:07 -0700 (PDT) Subject: [Genome] Quality scores In-Reply-To: <74CD0E9B-6AB1-4EB5-965A-8101E6907073@bx.psu.edu> References: <74CD0E9B-6AB1-4EB5-965A-8101E6907073@bx.psu.edu> Message-ID: Hello Guru, We have quality scores for some other assemblies. The table is named "quality" in the respective assembly: anoCar1 bosTau1 bosTau2 bosTau3 caePb1 canFam1 canFam2 equCab1 felCat3 galGal2 gasAcu1 monDom1 monDom4 oryLat1 panTro1 panTro2 priPac1 rheMac2 strPur1 For assemblies which do not have a quality table, you might be interested in checking out the PHRED program: http://www.phrap.com/phred/ I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group On Mon, 8 Oct 2007, Guruprasad Ananda wrote: > Hi, > > I noticed that on the UCSC genome browser, quality score are > available for download for chimpanzee and rhesus genomes. I was > wondering if you had quality scores for other genomes. If not, could > you suggest me an alternative place from where I can obtain quality > scores for other genomes? > > Regards, > Guru. 
> > Guruprasad Ananda > Graduate Student > Bioinformatics and Genomics > The Pennsylvania State University > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Mon Oct 8 14:21:42 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 08 Oct 2007 14:21:42 -0700 Subject: [Genome] Retrieve chromosomal positions between STS markers In-Reply-To: References: Message-ID: <470A9F66.3060505@soe.ucsc.edu> Hello James, The table that contains this information is called 'stsMap', and the fields you will need to retrieve are 'chrom', 'chromStart', 'chromEnd' and 'name'. For instance:

mysql> select chrom, chromStart, chromEnd, name from stsMap where name = 'RH18061' or name = 'RH80175';
+-------+------------+----------+---------+
| chrom | chromStart | chromEnd | name    |
+-------+------------+----------+---------+
| chr7  |   27169002 | 27169152 | RH18061 |
| chr7  |   27172089 | 27172296 | RH80175 |
+-------+------------+----------+---------+
2 rows in set (0.01 sec)

From here, you will need to select the smallest chromStart and the biggest chromEnd from each pair to get the list of genomic regions. In case you don't already have the instructions for accessing our public MySQL database, they are located here: http://genome.ucsc.edu/FAQ/FAQdownloads#download29 I hope this information helps. If you have further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Gao, James (NIH/NEI) [C] wrote: > Hello, > > In the human Genome browser, one can enter a pair of STS markers in the > search term box, and the browser will display a region of the > chromosome, such as entering RH18061;RH80175, the browser will display > chr7:27,169,003-27,172,296.
Here is my question: I have a list of pairs > of these STS markers, can I use your MySQL database to retrieve those > regions of chromosomes as displayed inside the browser? > > Thank you. > > James > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From TPtacek at genetics.uab.edu Tue Oct 9 10:11:54 2007 From: TPtacek at genetics.uab.edu (Travis Ptacek) Date: Tue, 9 Oct 2007 12:11:54 -0500 Subject: [Genome] Retrieving intergenic regions from specific regions on chromosomes Message-ID: <47A4320DAB0F2542AEC31FDA690A78E3011B1C04@GEN-MAIL.genetics.uab.edu> I referred to the following link for the method to retrieve intergenic sequence using Galaxy: http://www.soe.ucsc.edu/pipermail/genome/2007-June/013907.html I can perform this method successfully, but I need to retrieve intergenic sequence from a specific region, not an entire chromosome or genome. When I perform this method using a region of, for example, chr7:115,000,000-116,000,000 using the Known Genes track, Galaxy correctly retrieves information for the known genes in that region. However, when I complement the interval of the query, I get two huge intervals covering all of chromosome 7 upstream of 115,000,000 and downstream of 116,000,000, in addition to the intergenic regions within chr7:115,000,000-116,000,000. What I want, in this example, are only the intergenic regions within chr7:115,000,000-116,000,000. Thanks in advance, Travis Ptacek From kuhn at soe.ucsc.edu Tue Oct 9 12:00:01 2007 From: kuhn at soe.ucsc.edu (Robert Kuhn) Date: Tue, 9 Oct 2007 12:00:01 -0700 Subject: [Genome] Question about extending genome graph (/cgi-bin/hgGenome) parameters Message-ID: <200710091900.MAA28919@moondance.cse.ucsc.edu> Hello, Ross, We're glad you like the Genome Graphs tool. We will take your suggestions to heart and see if we can implement them.
Best wishes, --b0b kuhn ucsc genome bioinformatics group > From genome-bounces at soe.ucsc.edu Mon Oct 8 02:17:48 2007 > To: genome at soe.ucsc.edu > Cc: ross.lazarus at channing.harvard.edu > Subject: [Genome] Question about extending genome graph (/cgi-bin/hgGenome) > parameters > > Genome Graphs is a great way to see multiple custom tracks. I'm using it on an > internal private mirror here at the Channing lab to display hundreds of outputs > from analyses of large association studies for the investigators. > > I have a proposal that would make it even greater with a relatively small > increase in complexity. I have a working Galaxy genome graphs datatype that > works exactly like the current Galaxy bed file viewer - it has a "view at ucsc" > link that opens a (in our case!) local mirror genome graphs display. Users > really like it, but they complain about 3 things - the lines joining points are > misleading for these statistics; they don't like the fact that the default is > all tracks on at 30 pixels or so; and they want the tracks scaled sensibly. > > I was wondering if we could have some additional parameters - eg 1) a flag to > substitute bars when you don't want the apparent smoothing of joined dots, and > 2) some ways to hint at each track's vertical size 3) and scale, 4) and default > state (full, packed etc)? I am constructing the URL in python code and it's easy > for me to comply with extensions to the current genome graphs call to > /cgi-bin/hgGenome > > I quickly understood from reading the code that making these changes correctly > would be a non trivial, high risk venture for me. Does anyone with deep > knowledge have time for this please? I would be interested in discussing the > possibility of supporting this work via a small (!) subcontract from one of my > grants if that is a realistic option and would help get this done ?
> > > > From ann at soe.ucsc.edu Mon Feb 5 05:42:17 2007 > > From: ann at soe.ucsc.edu (Ann Zweig) > > Date: Mon, 05 Feb 2007 07:42:17 -0600 > > Subject: [Genome-announce] New Genome Graphs Tool Available in Genome Browser > > Message-ID: <45C73439.6090804 at soe.ucsc.edu> > > > > We are pleased to announce the release of a new software tool in the > > Genome Browser collection, the Genome Graphs tool. Genome Graphs offers > > the ability to upload and display genome-wide data sets such as the > > results of genome-wide SNP association studies, linkage studies and > > homozygosity mapping. The Genome Graphs tool may be accessed from the > > menu on the UCSC Genome Bioinformatics home page, or from this link: > > http://genome.ucsc.edu/cgi-bin/hgGenome > > > > The initial release of Genome Graphs includes the following features: > > > > - upload several sets of genome-wide data and display them simultaneously > > - click on an area of interest and go directly to the genome browser at > > that position > > - set a significance threshold for your data and view only regions that > > meet that threshold > > - view the genes that exist in areas where your data meet your > > significance threshold > > > > For more information about the Genome Graphs tool, visit the Gateway > > page or consult the Getting Started on Genome Graphs section in the > > User's Guide: > > http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#GenomeGraphs > > > > Genome Graphs was written by Jim Kent of the UCSC Genome Bioinformatics > > Group. > > > > ------- > > Regards, > > Ann Zweig > > > > -- > Ross Lazarus MBBS MPH, Director of Bioinformatics > Channing Laboratory, 181 Longwood Ave., Boston MA 02115, USA. > Tel: +617 525 2730 Fax: +617 525 0958 > > > The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information. > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Tue Oct 9 12:44:19 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 09 Oct 2007 12:44:19 -0700 Subject: [Genome] codon position and splicing acceptor and donor sites In-Reply-To: References: Message-ID: <470BDA13.9070405@cse.ucsc.edu> Georg, In the refGene table, the columns exonStarts, exonEnds and exonFrames will be of use to you. The exonFrame column is described as "Exon frame {0,1,2}, or -1 if no frame for exon" You can use the Table Browser ("Tables" on the blue bar on the top of the main page) to access this data. Set the following options: clade: Vertebrate genome: Human assembly: Mar. 2006 group: Genes and Gene Prediction Tracks track: RefSeq Genes table: refGene region: this part is up to you. I used chr1:6573371-6585516 as an example output format: all fields from selected table click: "get output" As for your question about splicing, there is a similar previously answered mailing list question here: http://www.soe.ucsc.edu/pipermail/genome/2007-September/014658.html This should help you get started. I hope this information has been helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group GEORG EHRET wrote: > Good Morning! > I am looking for the codon position (or non-codon) for each nucleotide > of a large genomic sequence. Is this data available on UCSC? What about > splicing donor and acceptor sites? > > Thank you! > Georg. 
> ********************************************************** > Georg B. Ehret, MD > Fellow > McKusick-Nathans Institute of Genetic Medicine > Johns Hopkins University School of Medicine > Broadway Research Building, Room 572 > 733 N. Broadway > Baltimore, MD 21205 > Phone: (410) 502-7530 > Fax: (410) 502-7544 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Oct 9 15:10:25 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 09 Oct 2007 15:10:25 -0700 Subject: [Genome] Alignment techniques for finding SNPs (and other polymorphisms) In-Reply-To: <80760395-3B3A-42E6-AFEA-498FB0A413C9@duke.edu> References: <80760395-3B3A-42E6-AFEA-498FB0A413C9@duke.edu> Message-ID: <470BFC51.8070507@cse.ucsc.edu> Hello David, I've asked Jim Kent for advice on your question. He says that the divergence level in the sea urchin is high enough that you might want to stick with blastz. On the other hand, BLAT would be much faster and would probably only miss about 1%. Our source code is available for download here: http://genome.ucsc.edu/FAQ/FAQdownloads#download27 Good luck with your research. If you have any questions about the Genome Browser, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group David Garfield wrote: > Hi genome-crew, > > I've got a question about the best ways of using alignment > software to locate SNPs. I'm working with the sea urchin genomes > (the S. purpuratus 2.0 release along with the recently added 1.2x > coverage 454 sequences of the related S. franciscanus and A. fragilis). > > I've had great luck using blastz/TBA to do whole genome alignments > before, but in this case I'd like to make use of the high levels of > polymorphism in urchin populations and track not only which regions > of the A. frag and S. franc genomes map to the annotated S.
purp > sequence, but also track the polymorphisms that pop up on the > occasions where the 454 sequences cover both haplotypes from the same > region (and perhaps use the WGS reads from S_purp to find > polymorphisms within that genome). > > > Any suggestions on how to proceed? I'm savvy enough, I think, to > figure out how to use most of the programs out there, and can > certainly parse outputs of one format to another, but given the > bewildering array of alignment tools out there, I'd rather not > reinvent the wheel if folks have already tried to do something similar. > > > Thanks in advance, > > David Garfield > Wray Lab > Duke University > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archanat at soe.ucsc.edu Tue Oct 9 17:23:09 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 09 Oct 2007 17:23:09 -0700 Subject: [Genome] Retrieving intergenic regions from specific regions on chromosomes In-Reply-To: <47A4320DAB0F2542AEC31FDA690A78E3011B1C04@GEN-MAIL.genetics.uab.edu> References: <47A4320DAB0F2542AEC31FDA690A78E3011B1C04@GEN-MAIL.genetics.uab.edu> Message-ID: <470C1B6D.3050302@soe.ucsc.edu> Hello Travis, You could get this information using our Table Browser. First of all, you will need to make a few custom tracks using the Table Browser. 1. A custom track of the introns only from knownGene. 2. A custom track of the exons only from knownGene. 3. A custom track of the introns+exons from knownGene (make this by combining the first two CTs). Note that this is the same as simply making a custom track of the knownGenes as they are. 4. A custom track of the complement of #3 for "everything else" (aka intergenic regions). Some details on how to make these custom tracks: 1. Custom track of introns: --------------------------- 1a. Open the Table Browser 1b.
Set the following options: clade: Vertebrate genome: Human assembly: Mar 2006 group: Genes and Gene Prediction Tracks track: UCSC Genes table: knownGene region: position (enter the position that you are interested in, chr7:115,000,000-116,000,000, in the text box) output format: custom track Click "get output" 1c. On the next page, select the radio button for "Introns", be sure to name this custom track appropriately, and press "get custom track in table browser." 1d. You now have a custom track of the introns of the Known Genes for your region of interest. 2. Custom track of exons: ------------------------- Follow the above steps, except select the radio button for "Exons" in step 1c. 3. Custom track of introns+exons: --------------------------------- 3a. Set the following options: clade: Vertebrate genome: Human assembly: Mar 2006 group: Custom Tracks track: tb_knownGene_INTRONS (this is what I named my CT of the introns) and select the related table region: position intersection: create 3b. On the intersection page, pull down the menu to choose your exons track, tb_knownGene_EXONS (I used this name for my CT of the exons). Choose a "base-pair-wise union [OR] of tb_knownGene_INTRONS and tb_knownGene_EXONS" and click submit. output format: custom track Click "get output" 3c. Give an appropriate name to the CT (I used "unionExonsIntrons") and choose "Create one BED record per: Whole gene". Click 'get custom track in table Browser'. 4. Custom track of the intergenic regions: ----------------------------------------- In this step you have to complement the unionExonsIntrons track to get the intergenic regions. 4a. Choose the track 'unionExonsIntrons' 4b. Create an intersection with itself by choosing "Base-pair-wise intersection (AND) of unionExonsIntrons and unionExonsIntrons" 4c. Also check both the boxes for Complement unionExonsIntrons before intersection/union Complement unionExonsIntrons before intersection/union Click submit. 4d.
Back on the Table Browser choose output format: custom track and I named the CT as 'complementUnionIntronsExons' and then press 'get custom track in table browser'. This gives you the intergenic regions for the position: chr7:115,000,000-116,000,000 Please see this session that I've created for you: http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Archana&hgS_otherUserSessionName=hg18_intergenic_%20regions I hope that this helps. Please let us know if you have further questions. Regards, Archana UCSC Genome Bioinformatics Group Travis Ptacek wrote: > I refered to the following link for the method to retrieve intergenic sequence using galaxy: > http://www.soe.ucsc.edu/pipermail/genome/2007-June/013907.html > > I can perform this method sucessfully, but I need to retrieve intergenic sequence from a specific region, not an entire chromosome or genome. > > When I perform this method using a region of, for example, chr7:115,000,000-116,000,000 using the Known Genes track, Galaxy correctly retrieves information for the known genes in that region. However, when I complement the interval of the query, I get an two huge intervals covering all of chromosome 7 upstream of 115,000,000 and downstream of 116,000,000 in addtion to intergenic regions within chr7:115,000,000-116,000,000. What I want, in this example, are only the intergenic regions within chr7:115,000,000-116,000,000. > > Thanks in advance, > > Travis Ptacek > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From taysk at novasprint.com Tue Oct 9 21:23:12 2007 From: taysk at novasprint.com (Tay Sen Kwan) Date: Wed, 10 Oct 2007 12:23:12 +0800 Subject: [Genome] Question on multiz Message-ID: <470C53B0.7050803@novasprint.com> I am using the Genome Browser to look at the conservation of certain cDNAs across a number of species, in particular the 28-species multiz conservation track. 
For example, BC069077, which lies on chr15 q11.2, showed sequence alignment with the rabbit. However, when I extracted the rabbit sequence from the rabbit track, removed the gaps "-" and ran 3 alignment tests (with other tools) on it: (a) an "Align two sequences using BLAST (bl2seq)" with BC069077 on NCBI Blast - it gave a "no significant similarity" result (b) an NCBI BLASTN on the human sequence with a very relaxed Expect=10 value, which found no alignments on chr15 (c) a BLAT, which also turned up no alignments on chr15 None of these tests indicated alignment to BC069077. Could you help me resolve this apparent contradiction ? Many thanks. Regards, Sen Kwan From xiw015 at ucsd.edu Tue Oct 9 21:03:56 2007 From: xiw015 at ucsd.edu (xiaoxia) Date: Tue, 9 Oct 2007 21:03:56 -0700 Subject: [Genome] question abou genome sequences Message-ID: Dear Sir/Madam: "Chimp Mar 2006 ". In Region "42036672-4204077", there are many "n"s. My question is whether the length of those n's shows the exact length of the missing data. Is it possible to get those masked sequences somewhere? Thank you very much. Xiaoxia From ross.lazarus at channing.harvard.edu Wed Oct 10 08:43:05 2007 From: ross.lazarus at channing.harvard.edu (Ross Lazarus) Date: Wed, 10 Oct 2007 11:43:05 -0400 Subject: [Genome] Question about extending genome graph (/cgi-bin/hgGenome) parameters In-Reply-To: References: <4706D60E.7000407@channing.harvard.edu> Message-ID: <470CF309.1010705@channing.harvard.edu> Galt, thanks for asking - it helps me to clarify what I have in mind. I basically want to make genome graphs more flexible and controllable - more like custom tracks in their options. I'd like to be able to specify these new options as parameters to /cgi-bin/hgGenome. I'm constructing URL's using a python script, so it's easy for me to make really long, ugly ones!
Attached is an image showing 3 tracks - the 2 top ones are from loading a gg file using an URL my code created - http://hg/cgi-bin/hgGenome?db=hg18&hgGenome_dataSetName=hmYRI_CEU_tdt_TDTgg&hgGenome_dataSetDescription=GalaxyGG_data&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess&hgGenome_columnLabels=first%20row&hgGenome_maxVal=&hgGenome_labelVals=&hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http%3A%2F%2Fgalaxy%2Fdisplay_as%3Fid%3D351%26display_app%3Ducsc&hgGenome_doSubmitUpload=submit The lower track is a custom track on our local UCSC mirror site - a bar graph for similar p value data. I want to be able to create a GG call URL with additional parameters so the GG tracks end up looking more like the custom track - bar graph, 90pixels,...etc The specific options I'd like to control are: 1) a track height pixels for each column - eg as a comma separated value list of values 2) a default display (packed, full..) as above, one per column 3) a scale hint for each column 4) an optional colour for each column 5) graph type for each column (points, lines, bars,...) The genome graphs upload form would also need to be adjusted to reflect these for interactive users. I know this is substantial, particularly since you have to already know how to correctly amend the input form and input parameter parsing, and how to correctly adjust track drawing - none of which seem easy to me! My primary motivation is getting bar graphs, but I'm guessing that there are plenty of other representations that users might want to have available. A lot of the work we do involves contemplating test statistics from half a million or a million markers on thousands of phenotypes, so alternative ways of summarizing with something like the -log10 pvalue coarse heatmap karyotype at http://www.ncbi.nlm.nih.gov/SNP/GaP.cgi?rm=genomeTraitView&test_id=13 would be useful... Galt Barber wrote: > Hi, Ross. > I work on hgGenome sometimes. 
> Can you provide more detailed descriptions, > or even some visuals of what you have/what you want to have, > so I'm sure I understand it. > > No promises here, just wanting to communicate clearly > about what features you had in mind. > > thanks! > > -Galt > > > On Fri, 5 Oct 2007, Ross Lazarus wrote: > >> Genome Graphs is a great way to see multiple custom tracks. I'm using it on an >> internal private mirror here at the Channing lab to display hundreds of outputs >> from analyses of large association studies for the investigators. >> >> I have a proposal that would make it even greater with a relatively small >> increase in complexity. I have a working Galaxy genome graphs datatype that >> works exactly like the current Galaxy bed file viewer - it has a "view at ucsc" >> link that opens a (in our case!) local mirror genome graphs display. Users >> really like it, but they complain about 3 things - the lines joining points are >> misleading for these statistics; they don't like the fact that the default is >> all tracks on at 30 pixels or so; and they want the tracks scaled sensibly. >> >> I was wondering if we could have some additional parameters - eg 1) a flag to >> substitute bars when you don't want the apparent smoothing of joined dots, and >> 2) some ways to hint at each track's vertical size 3) and scale, 4) and default >> state (full, packed etc)? I am constructing the URL in python code and it's easy >> for me to comply with extensions to the current genome graphs call to >> /cgi-bin/hgGenome >> -- Ross Lazarus MBBS MPH, Director of Bioinformatics Channing Laboratory, 181 Longwood Ave., Boston MA 02115, USA. Tel: +617 525 2730 Fax: +617 525 0958 The information transmitted in this electronic communication is intended only for the person or entity to whom it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this information in error, please contact the Compliance HelpLine at 800-856-1983 and properly dispose of this information. From zhou-xy02 at mails.tsinghua.edu.cn Wed Oct 10 10:02:36 2007 From: zhou-xy02 at mails.tsinghua.edu.cn (Xueya Zhou) Date: Thu, 11 Oct 2007 01:02:36 +0800 Subject: [Genome] Question about hgGcPercent program Message-ID: <6053b0950710101002t19f1bc69lbfc59f0262c80444@mail.gmail.com> Dear UCSC Genome Browser Crew, I encountered a problem when using your hgGcPercent program to calculate GC percentage in regions that I specified in a bed file (in.bed). The command line I used looked like the following: hgGcPercent -bedRegionIn=in.bed -bedRegionOut=out.bed -doGaps -noLoads hg18 hg18.2bit The program seemed to ignore the bed file that I provided and went on to calculate the whole-genome GC percent with the default window size. It produced a gcPercent.bed file detailing the GC percent in each window across the whole genome, while out.bed, the file I actually wanted, was left empty. I have checked that both the file format and my command spelling were correct. I also had a glance at the source code, kent/src/hg/makeDb/hgGcPercent/hgGcPercent.c, and it does process the bedRegionIn and bedRegionOut options. So I am puzzled about what is going wrong. Could you help me figure it out? Thanks!
Xueya -- Xueya Zhou Bioinformatics Division, Tsinghua National Laboratory of Information Science and Technology Address: FIT 1-107, Tsinghua University, Beijing 100084, China Phone: +86-10-6279-5578 ext 822 From rhead at soe.ucsc.edu Wed Oct 10 16:41:03 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 10 Oct 2007 16:41:03 -0700 Subject: [Genome] question abou genome sequences In-Reply-To: References: Message-ID: <470D630F.1000609@soe.ucsc.edu> Hello Xiaoxia, The "n"s actually represent gaps in sequence rather than known sequence that has been masked. If you turn on the "Gap" track in Genome Browser, you will be able to see annotations corresponding to the "n"s. The Gap track contains information about the types of gaps present and whether or not they are bridged (that is, whether the relative order and orientation of the contigs on either side of the gap is known). For many of the gaps, the length is estimated. You can read more about the chimp sequencing process and get more information about the way the assemblers handled gaps by following the links to the sequencing center on the Genome Browser chimp gateway page: http://genome.ucsc.edu/cgi-bin/hgGateway (select chimp, Mar 2006, and scroll down). There is also a bit of information about gaps in the "Assembly details" section of the Genome Browser gateway page. If you scroll down further on the gateway page, you will see that "Centromeres were introduced into the chimp sequence at the positions of the centromeres in the human chromosomes." (The gap type "centromere" is one of the annotated regions of "n"s you will see in the gap track.) I hope this information is helpful. -- Brooke Rhead UCSC Genome Bioinformatics Group xiaoxia wrote: > Dear Sir/Madam: > > "Chimp Mar 2006 ". > In Region "42036672-4204077", there are many "n"s. > My question is whether the length of those n's shows the exact length of the > missing data. Is it possible to get those masked sequences somewhere? > > Thank you very much. 
> > > Xiaoxia > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From jyoti_shah at merck.com Wed Oct 10 17:41:07 2007 From: jyoti_shah at merck.com (Shah, Jyoti K.) Date: Wed, 10 Oct 2007 17:41:07 -0700 Subject: [Genome] Multiple genome alignment Message-ID: <23B0A4FBD181A44D9B89C4FB3E96D594DB0104@ussemx1100.merck.com> Dear UCSC team, I have a question regarding your conservation track. I want to extract the genomic region in mouse, rat and other species which align with the corresponding genomic region in human species. When I use the human genome browser and Mouse chain track, I can see the chromosome and coordinates in mouse species for certain human genomic region. But I am unable to get the same information when I try the table browser. Ideally, I would like to have a table which gives the following fields: Human chr Human genomic start Human genomic end Human sequence Mouse chr Mouse genomic start Mouse genomic end Mouse sequence Is it possible to extract the above using your table browser? It would be great if you could help me with this. Thanks Jyoti ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. 
------------------------------------------------------------------------------ From xu_jianzhen at gibh.ac.cn Wed Oct 10 18:47:12 2007 From: xu_jianzhen at gibh.ac.cn (xu_jianzhen) Date: Thu, 11 Oct 2007 09:47:12 +0800 (CST) Subject: [Genome] question about UCSC 'Human ESTs Including Unspliced' track Message-ID: <470D80A0.000149.28847@app-01> Dear Genome Help, I have a question about the UCSC 'Human ESTs Including Unspliced' track. The methods section of this track says, "To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align". However, to my knowledge of biology, I do not think there are many pre-mRNAs longer than 750 kb, so my question is: when I find an unspliced EST longer than 200 kb, does it mean that this is a false positive? Thank you all! Xu Jerry GIBH,CAS From rhead at soe.ucsc.edu Wed Oct 10 19:15:07 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 10 Oct 2007 19:15:07 -0700 Subject: [Genome] Multiple genome alignment In-Reply-To: <23B0A4FBD181A44D9B89C4FB3E96D594DB0104@ussemx1100.merck.com> References: <23B0A4FBD181A44D9B89C4FB3E96D594DB0104@ussemx1100.merck.com> Message-ID: <470D872B.5020904@soe.ucsc.edu> Hello Jyoti, Sequence is not actually stored in the chain and net tables, so the Table Browser can't retrieve that information from the mouse chain track. Also, when you click on the mouse chain track in the Genome Browser, information about only the current region selected in the Genome Browser window is displayed; this type of info also cannot be retrieved from the mouse chain track in the Table Browser. Another solution is to use the "Conservation" track, and the table 'multiz28way' (for hg18).
The table contains pointers to a multiple alignment file (described here: http://genome.ucsc.edu/FAQ/FAQformat#format5 -- be sure to note the way coordinates on the negative strand are handled), which contains sequences for all 28 genomes in the alignment. The Table Browser has a special output format available for this table: "MAF - multiple alignment format". This output format returns the aligned sequences for the region you specified in the Table Browser. The multiple alignment in the conservation track is based on the chain and net tracks. Click on the "Conservation" track title above the track control to read about the methods used to create the track. If you would like to parse out sequence for only certain organisms from the MAF, there is a set of tools run by Penn State that can do this, located here: http://main.g2.bx.psu.edu/ Look under the "Fetch Alignments" link on the left-hand side of the page. I hope this helps. If you have further questions, please feel free to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Shah, Jyoti K. wrote: > Dear UCSC team, > > I have a question regarding your conservation track. I want to extract > the genomic region in mouse, rat and other species which align with the > corresponding genomic region in human species. > > When I use the human genome browser and Mouse chain track, I can see the > chromosome and coordinates in mouse species for certain human genomic > region. But I am unable to get the same information when I try the table > browser. Ideally, I would like to have a table which gives the following > fields: > > Human chr > Human genomic start > Human genomic end > Human sequence > Mouse chr > Mouse genomic start > Mouse genomic end > Mouse sequence > > Is it possible to extract the above using your table browser? It would > be great if you could help me with this.
> > Thanks > Jyoti > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From james at jamestaylor.org Thu Oct 11 08:15:53 2007 From: james at jamestaylor.org (James Taylor) Date: Thu, 11 Oct 2007 11:15:53 -0400 Subject: [Genome] Quality scores In-Reply-To: References: <74CD0E9B-6AB1-4EB5-965A-8101E6907073@bx.psu.edu> Message-ID: <4486B4AB-00CC-4BCC-86D8-7F99D32E9605@jamestaylor.org> Hi Kayla, Is there any way to get the original qa files that were used to generate the quality tables in these assemblies? These appear to be available for panTro2 [1] but not for most of the assemblies listed below. We would prefer not to use the data from the table browser, since it has been converted to the lossy "wib" format and we are concerned about accumulated error. Thanks, James .. [1]: http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/chromQuals.tar.gz On Oct 8, 2007, at 4:44 PM, Kayla Smith wrote: > > Hello Guru, > > We have quality scores for some other assemblies.
The table is named > "quality" in the respective assembly: > > anoCar1 > bosTau1 > bosTau2 > bosTau3 > caePb1 > canFam1 > canFam2 > equCab1 > felCat3 > galGal2 > gasAcu1 > monDom1 > monDom4 > oryLat1 > panTro1 > panTro2 > priPac1 > rheMac2 > strPur1 > > For assemblies which do not have a quality table, you might be > interested > in checking out the PHRED program: > http://www.phrap.com/phred/ > > I hope this information is helpful to you. Please don't hesitate to > contact us again if you require further assistance. > > Kayla Smith > UCSC Genome Bioinformatics Group > > On Mon, 8 Oct 2007, Guruprasad Ananda wrote: > >> Hi, >> >> I noticed that on the UCSC genome browser, quality score are >> available for download for chimpanzee and rhesus genomes. I was >> wondering if you had quality scores for other genomes. If not, could >> you suggest me an alternative place from where I can obtain quality >> scores for other genomes? >> >> Regards, >> Guru. >> >> Guruprasad Ananda >> Graduate Student >> Bioinformatics and Genomics >> The Pennsylvania State University >> >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome >> > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From maximilianh at gmail.com Thu Oct 11 10:06:30 2007 From: maximilianh at gmail.com (Maximilian Haeussler) Date: Thu, 11 Oct 2007 19:06:30 +0200 Subject: [Genome] wigmaf-type in trackdb.ra Message-ID: <76f031ae0710111006m602b424ao4c066715f26254f4@mail.gmail.com> Dear gurus, I've aligned C. intestinalis and C. savingyi some time ago and wrote a how-to about it ( http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto). Tried to do blastz/chain/netting/maffing and loaded the mafs into the browser. The alignments look nice when I zoom to basepair resolution. 
However, the wigmaf-track doesn't seem to be recognized as a maf-track: When I click onto it, no alignments are displayed and the table browser doesn't offer the "maf" format for download. I have no clue where to search, can you give me a hint into the right direction? Example: http://genome.ciona.cnrs-gif.fr/cgi-bin/hgTracks?hgt.out1=1.5x&position=chr10p%3A958170-959006 Thanks a lot in advance, Max From hiram at soe.ucsc.edu Thu Oct 11 10:07:51 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 11 Oct 2007 10:07:51 -0700 Subject: [Genome] Quality scores In-Reply-To: <4486B4AB-00CC-4BCC-86D8-7F99D32E9605@jamestaylor.org> References: <74CD0E9B-6AB1-4EB5-965A-8101E6907073@bx.psu.edu> <4486B4AB-00CC-4BCC-86D8-7F99D32E9605@jamestaylor.org> Message-ID: <470E5867.1080907@soe.ucsc.edu> Good Morning James: This would be a lot of work to collect this data together in one location. In most cases the original quality scores are in qual fasta files from the sequencing centers and are in scaffold or contig coordinate systems. We convert those to our own in-house condensed format (qac) and lift to chrom coordinates, which then becomes source data for the wiggle format conversion. I haven't proven this, but I suspect that the output of the table browser for these numbers doesn't actually have any loss of information. The input numbers were integers in the range of 0 to 100. The wiggle conversion has a histogram range of 0 to 127. So if you take the output of the table browser and round them to the nearest integer, I think you have the original data. 
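As a minimal sketch of the rounding rule described above (this is not a UCSC tool; the function name `recover_quality` is hypothetical, and the 0 to 100 range check is taken from the assumption stated in this message), the following uses the first ten anoCar1 values from the sample in this message:

```python
def recover_quality(values, lo=0, hi=100):
    """Round Table Browser wiggle values back to integer quality scores.

    Assumes the original scores were integers in [lo, hi], as stated in
    the message; anything outside that range raises an error.
    """
    recovered = []
    for v in values:
        q = int(v + 0.5)  # round to nearest integer (inputs are non-negative)
        if not lo <= q <= hi:
            raise ValueError(f"value {v} rounds to {q}, outside {lo}..{hi}")
        recovered.append(q)
    return recovered


# First ten values of the anoCar1 sample: Table Browser output vs. qac data.
table_browser = [20.9764, 19.7795, 47.9055, 30.8504, 23.9685,
                 35.937, 28.7559, 19.7795, 50, 20.9764]
original_qac = [21, 20, 48, 31, 24, 36, 29, 20, 50, 21]

assert recover_quality(table_browser) == original_qac
```

The range check is there because the rule is only expected to hold for scores between 0 and 100; a data set with larger values would need to be examined separately.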
Here is a sample from anoCar1:

Original qac data:
>scaffold_0
21 20 48 31 24 36 29 20 50 21 21 28 36 41 50 49 26 24 49 37
17 18 50 20 12 43 38 50 42 50 30 21 49 43 30 40 42 50 50 49

Table browser output:
variableStep chrom=scaffold_0 span=1
1 20.9764
2 19.7795
3 47.9055
4 30.8504
5 23.9685
6 35.937
7 28.7559
8 19.7795
9 50
10 20.9764
11 20.9764
12 27.8583
13 35.937
14 40.7244
15 50
16 48.8032
17 25.7638
18 23.9685
19 48.8032
20 36.8346
... etc ...

I'm pretty sure this rounding rule will apply to quality data in the range of 0 to 100. Be wary if you find a data set with numbers > 100.

--Hiram

James Taylor wrote:
> Hi Kayla,
>
> Is there any way to get the original qa files that were used to
> generate the quality tables in these assemblies? These appear to be
> available for panTro2 [1] but not for most of the assemblies listed
> below. We would prefer not to use the data from the table browser,
> since it has been converted to the lossy "wib" format and we are
> concerned about accumulated error.
>
> Thanks,
> James
>
> .. [1]: http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/chromQuals.tar.gz
>
> On Oct 8, 2007, at 4:44 PM, Kayla Smith wrote:
>
>> Hello Guru,
>>
>> We have quality scores for some other assemblies. The table is named
>> "quality" in the respective assembly:
>>
>> anoCar1
>> bosTau1
>> bosTau2
>> bosTau3
>> caePb1
>> canFam1
>> canFam2
>> equCab1
>> felCat3
>> galGal2
>> gasAcu1
>> monDom1
>> monDom4
>> oryLat1
>> panTro1
>> panTro2
>> priPac1
>> rheMac2
>> strPur1
>>
>> For assemblies which do not have a quality table, you might be
>> interested in checking out the PHRED program:
>> http://www.phrap.com/phred/
>>
>> I hope this information is helpful to you. Please don't hesitate to
>> contact us again if you require further assistance.
>>
>> Kayla Smith
>> UCSC Genome Bioinformatics Group
>>
>> On Mon, 8 Oct 2007, Guruprasad Ananda wrote:
>>
>>> Hi,
>>>
>>> I noticed that on the UCSC genome browser, quality score are
>>> available for download for chimpanzee and rhesus genomes. I was
>>> wondering if you had quality scores for other genomes. If not, could
>>> you suggest me an alternative place from where I can obtain quality
>>> scores for other genomes?
>>>
>>> Regards,
>>> Guru.
>>>
>>> Guruprasad Ananda
>>> Graduate Student
>>> Bioinformatics and Genomics
>>> The Pennsylvania State University

From hiram at soe.ucsc.edu Thu Oct 11 10:23:36 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Thu, 11 Oct 2007 10:23:36 -0700
Subject: [Genome] wigmaf-type in trackdb.ra
In-Reply-To: <76f031ae0710111006m602b424ao4c066715f26254f4@mail.gmail.com>
References: <76f031ae0710111006m602b424ao4c066715f26254f4@mail.gmail.com>
Message-ID: <470E5C18.4070707@soe.ucsc.edu>

Good Morning Max:

Please take a look at the example conservation track entries in the source tree trackDb files for assemblies that have conservation tracks. For example:

src/hg/makeDb/trackDb/human/hg18/trackDb.ra

the 28-way multiz track entry. That will get your trackDb entry correct.

For the alignments to display, your maf files need to be in a /gbdb/ location. Please note the sequence of loading instructions in the source tree make docs, for example, the 28-way loading in src/hg/makeDb/doc/hg18.txt:

> # load tables for a look
> ssh hgwdev
> mkdir -p /gbdb/hg18/multiz28way/maf
> ln -s /cluster/data/hg18/bed/multiz28way/maf/*.maf \
>     /gbdb/hg18/multiz28way/maf
> cd /cluster/data/hg18/bed/multiz28way
> hgLoadMaf -pathPrefix=/gbdb/hg18/multiz28way/maf hg18 multiz28way
> # load summary table
> cat maf/*.maf | nice hgLoadMafSummary hg18 -minSize=30000 -mergeGap=1500 \
>     -maxSize=200000 multiz28waySummary stdin

Note how the /gbdb/ symlinks are made to the actual maf files.
Then note the -pathPrefix option in the hgLoadMaf command, which properly points to them.

--Hiram

Maximilian Haeussler wrote:
> Dear gurus,
>
> I've aligned C. intestinalis and C. savingyi some time ago and wrote a
> how-to about it (
> http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto).
> Tried to do blastz/chain/netting/maffing and loaded the mafs into the
> browser. The alignments look nice when I zoom to basepair resolution.
>
> However, the wigmaf-track doesn't seem to be recognized as a maf-track: When
> I click onto it, no alignments are displayed and the table browser doesn't
> offer the "maf" format for download.
> I have no clue where to search, can you give me a hint into the right
> direction?
>
> Example:
> http://genome.ciona.cnrs-gif.fr/cgi-bin/hgTracks?hgt.out1=1.5x&position=chr10p%3A958170-959006
>
> Thanks a lot in advance,
> Max

From rhead at soe.ucsc.edu Thu Oct 11 10:51:41 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Thu, 11 Oct 2007 10:51:41 -0700
Subject: [Genome] Question about hgGcPercent program
In-Reply-To: <6053b0950710101002t19f1bc69lbfc59f0262c80444@mail.gmail.com>
References: <6053b0950710101002t19f1bc69lbfc59f0262c80444@mail.gmail.com>
Message-ID: <470E62AD.8050106@soe.ucsc.edu>

Hello Xueya,

I spoke to the developer who added the bedRegionIn and bedRegionOut options to the hgGcPercent program. He said that your syntax looks correct, except that 'noLoads' should be written as 'noLoad'. If you change that and still see the problem, perhaps you could send us your input file, or a sample of your input file to help us debug the program. (No need to cc the whole list for that . . . attachments get stripped by our mailing list software, anyway.)

Let us know if the first fix does not resolve the issue.
-- Brooke Rhead UCSC Genome Bioinformatics Group Xueya Zhou wrote: > Dear UCSC Genome Browser Crews, > > I encountered a problem when using your hgGcPercent program to calculate GC > percentage in regions that I specified in a bed file (in.bed). > > The command line I used looked like the following: > > hgGcPercent -bedRegionIn=in.bed -bedRegionOut=out.bed -doGaps -noLoads hg18 > hg18.2bit > > Then It seemed that the program ignored the bed file that I provided, and > went on directly to calculate the whole genome GC percent with the default > window size. It produced gcPercent.bed file that detailed the GC percent in > each region across whole genome, and produced the empty file out.bed which > was really wanted. > > I have checked that both the file format and my command spelling were > correct. I also had a glance at the source code > kent/src/hg/makeDb/hgGcPercent/hgGcPercent.c, it did process the options of > bedRegionIn and bedRegionOut. So I was puzzled about what is going wrong. > Could you help me figure it out? Thanks! > > Xueya > From rhead at soe.ucsc.edu Thu Oct 11 12:27:14 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 11 Oct 2007 12:27:14 -0700 Subject: [Genome] Question on multiz In-Reply-To: <470C53B0.7050803@novasprint.com> References: <470C53B0.7050803@novasprint.com> Message-ID: <470E7912.4090508@soe.ucsc.edu> Hello Sen Kwan, One of our developers had the following comments for you: ---- I can't address why he didn't get alignments out of blastn or bl2seq, but this region of human (chr15:22,888,200-22,888,617) is somewhat homologous to scaffold_169278:1189-1699 in rabbit though only at about 50% id, so BLAT is not going to find it. Blastz is very sensitive. The user may want to look at the Rabbit Chain and the Rabbit browser on genome-test to check out the detailed alignments with just human and rabbit. You can see here that the contig this is in (contig_315977) doesn't cover the entire region in human. 
Since the rabbit is only 2X coverage, there isn't a way to determine synteny, so the alignment needs to be carefully evaluated by the researcher before accepting it as really orthologous. ---- The rabbit chain (on hg18) and rabbit browser he is referring to reside on our test server, here: http://genome-test.cse.ucsc.edu/ The contig mentioned above is located in this position on the rabbit browser: scaffold_169278:1-1,806. Note that much of the data available on our test server has not gone through our quality assurance process and may contain errors. -- Brooke Rhead UCSC Genome Bioinformatics Group Tay Sen Kwan wrote: > I am using the Genome Browser to look at the conservation of certain > cDNAs across a number of species, in particular the 28-species multiz > conservation track. For example, BC069077 which lies on chr15 q11.2, > showed sequence alignment with the rabbit. However, when I extracted > the rabbit sequence from the rabbit track, remove the gaps "-" and did 3 > alignment tests (with other tools) on it: > (a) a "Align two sequences using BLAST (bl2seq)" with BC069077 on NCBI > Blast - it gave a "no significant similarity" result > (b) a NCBI BLASTN on the human sequence with a very relaxed Expect=10 > value, and it found no alignments on chr15 > (c) a BLAT which also turn up no alignments on chr15 > > None of these tests indicated alignment to BC069077. Could you help me > resolve this apparent contradiction ? Many thanks. 
>
> Regards,
>
> Sen Kwan
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Thu Oct 11 14:58:26 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 11 Oct 2007 14:58:26 -0700
Subject: [Genome] strand in liftover application
In-Reply-To: <20071006171152.334C0BCC028@post.tau.ac.il>
References: <20071006171152.334C0BCC028@post.tau.ac.il>
Message-ID: <470E9C82.6040007@cse.ucsc.edu>

Hello Asaf,

The liftOver tool takes as input either a position (which doesn't have a strand component) or a BED file, which may have a strand value. If you provide a BED6 or higher, you will see the strand being maintained. When using a BED 3, 4 or 5, liftOver assumes that the input is + and returns the corresponding results, but without reporting the resulting strand.

Here is a chart describing how the sign on the strand behaves:

input   chain   result
  +       +       +
  -       +       -
  +       -       -
  -       -       +

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

asaf levy wrote:
> Hi,
>
> Does the strand data have any meaning in the liftover tool?
>
> Is there a possibility that some genomic positions changed had their strand
> changed between different genomic versions?
>
>
>
> Regards,
>
> Asaf
>
> TAU
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From wzxiao at stanford.edu Thu Oct 11 16:16:24 2007
From: wzxiao at stanford.edu (wenzhong)
Date: Thu, 11 Oct 2007 16:16:24 -0700
Subject: [Genome] a question about tRNA and rRNA in human genome
Message-ID: <006e01c80c5c$be29e3c0$3a7dab40$@edu>

Hello,

Can I please ask a quick question about tRNA and ribosomal RNA regions in the human genome?
I wonder how these species are annotated by the UCSC genome browser and if there is a good way to retrieve these annotations, especially chromosome coordinates. Thanks for the help! -Wenzhong From kayla at soe.ucsc.edu Thu Oct 11 16:45:53 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 11 Oct 2007 16:45:53 -0700 Subject: [Genome] forward and reverse ESTs In-Reply-To: References: Message-ID: <470EB5B1.5040704@cse.ucsc.edu> Hello Kavita, While we don't have information about pairs of ESTs, we have clone information in the gbCdnaInfo.mrnaClone column, which is a relational link to the mrnaClone.id column. If the clone is in IMAGE, the imageClone table has some information. You may also wish to look into dbEST for this information: http://www.ncbi.nlm.nih.gov/dbEST/ This should get you started. I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group K. Garg wrote: > How can I find two ESTs (forward and reverse) which came from the same > clone ? Given an EST, how do I find its pair if it exists ? > > thanks > -kavita > > > ----------------------------------------------------------------------- > Kavita Garg, Ph.D. 
Email: kavitag at u.washington.edu > Dept of Genome Sciences, Phone: 206-616-5051 > Box 357730 Fax: 206-685-7301 > University of Washington School of Medicine > 1705 NE Pacific, Seattle, WA 98195-7730, USA > ----------------------------------------------------------------------- > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Oct 11 17:54:40 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 11 Oct 2007 17:54:40 -0700 Subject: [Genome] a question about tRNA and rRNA in human genome In-Reply-To: <006e01c80c5c$be29e3c0$3a7dab40$@edu> References: <006e01c80c5c$be29e3c0$3a7dab40$@edu> Message-ID: <470EC5D0.2060204@cse.ucsc.edu> Hello Wenzhong: Here are some previously answered mailing list questions on this topic: http://www.soe.ucsc.edu/pipermail/genome/2007-August/014299.html http://www.soe.ucsc.edu/pipermail/genome/2007-May/013491.html http://www.soe.ucsc.edu/pipermail/genome/2006-April/010436.html I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group wenzhong wrote: > Hello, > > > > Can I please ask a quick question about tRNA and ribosomal RNA regions in > the human genome? I wonder how these species are annotated by the UCSC > genome browser and if there is a good way to retrieve these annotations, > especially chromosome coordinates. > > > > Thanks for the help! 
>
>
>
> -Wenzhong
>
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Thu Oct 11 18:12:06 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 11 Oct 2007 18:12:06 -0700
Subject: [Genome] question about UCSC 'Human ESTs Including Unspliced' track
In-Reply-To: <470D80A0.000149.28847@app-01>
References: <470D80A0.000149.28847@app-01>
Message-ID: <470EC9E6.2090905@cse.ucsc.edu>

Hello Xu Jerry,

Thank you for your close reading of the EST description page. There are many genes with introns longer than 200kb. In fact it's possible to use the Table Browser to make a Custom Track of just the introns of the UCSC Genes track, if you're interested in looking at introns.

Unspliced ESTs in the browser are not "pre-mRNAs". They are ESTs where we don't find any canonical splice sites when we align the EST. These could be single exon ESTs that will never be spliced, DNA contamination, or possibly even "pre-mRNAs" that got caught before splicing and that are included in the GenBank data set.

Meanwhile, if you do find any data on our website that seems questionable to you, feel free to send us specific examples and we can look into them. More often than not it's a case of us simply displaying what is available from GenBank (we don't curate here!), but sometimes it is a bad alignment or filter.

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

xu_jianzhen wrote:
> Dear Genome Help,
> I've a question about UCSC 'Human ESTs Including Unspliced' track.The methods of this track said,"To generate this track, human ESTs from GenBank were aligned against the genome using blat.
Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align".However,to my knowledge of biology,i do not think there are too much such long pre-mRNA that is more than 750kb, so my question is when I find a unspliced EST more than 200kb,does it mean this is a false positive? > > > Thank you all! > > > > Xu Jerry > GIBH,CAS > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From acorradin at dei.unipd.it Fri Oct 12 01:29:01 2007 From: acorradin at dei.unipd.it (acorradin at dei.unipd.it) Date: Fri, 12 Oct 2007 10:29:01 +0200 Subject: [Genome] human UTR sequence-gene name-probe_setID Message-ID: <20071012102901.5nt38fiehw4sc4sw@mail.dei.unipd.it> Dear Sirs and Madams, I need to relate microRNAs data with gene expression data. I would like to relate UTR sequences with the gene names of their entire sequences or Affymetrix probeset ID. Is it possible to do this? Thank you very much Alberto, Venice (ITALY) ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From pauhsi at nhri.org.tw Thu Oct 11 18:26:28 2007 From: pauhsi at nhri.org.tw (pauhsi) Date: Fri, 12 Oct 2007 09:26:28 +0800 Subject: [Genome] About Human Gene Sorter Message-ID: <000c01c80c6e$e995e5b0$020a450a@nhri.local> Hi, How do I considerably map knownGene.name to Gene Sorter? May I download them in your download page? From ann at soe.ucsc.edu Fri Oct 12 08:43:32 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 12 Oct 2007 08:43:32 -0700 Subject: [Genome] About Human Gene Sorter In-Reply-To: <000c01c80c6e$e995e5b0$020a450a@nhri.local> References: <000c01c80c6e$e995e5b0$020a450a@nhri.local> Message-ID: <470F9624.8010503@soe.ucsc.edu> Hello, I am unclear on exactly what you are trying to do. 
The Gene Sorter is a tool that is distinct from the Genome Browser. However, the Gene Sorter does refer to data in the knownGene table. Please let us know exactly what you are trying to accomplish and we will try to offer our assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu pauhsi wrote: > Hi, > > How do I considerably map knownGene.name to Gene Sorter? > May I download them in your download page? > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Fri Oct 12 11:53:18 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 12 Oct 2007 11:53:18 -0700 Subject: [Genome] human UTR sequence-gene name-probe_setID In-Reply-To: <20071012102901.5nt38fiehw4sc4sw@mail.dei.unipd.it> References: <20071012102901.5nt38fiehw4sc4sw@mail.dei.unipd.it> Message-ID: <470FC29E.10307@soe.ucsc.edu> Hello Alberto, I'm not completely clear about what you are trying to do. I will give you a little bit of information and after you read it, if you still have questions, please feel free to write back to the list with a more detailed question. We have an annotation track on the latest human assembly (hg18) that contains microRNAs. The track is called sno/miRNA. The name of the underlying database table for this track is wgRna. You can use the Table Browser ('Tables' from the top blue navigation bar) to mine data from this table. If you wish to select only the microRNAs from the wgRna table using the Table Browser, choose to 'filter' the table in this way: type does match miRna You can also use the Table Browser to intersect data from one table with data from another table. It appears that you may be interested in this. For example, you could intersect the miRna's from the wgRna table with an Affy probe alignment track. We have several such tables including: affyU133, affyHuEx1, affyGnf1h, etc. 
You would need to decide for yourself which table is most appropriate for your needs. For help on using the Table Browser see this page: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html Hopefully this will be enough information for you to get started on your task. As I said, please feel free to write back to the list with more detailed questions. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. acorradin at dei.unipd.it wrote: > Dear Sirs and Madams, > I need to relate microRNAs data with gene expression data. > I would like to relate UTR sequences with the gene names of their > entire sequences or Affymetrix probeset ID. > Is it possible to do this? > Thank you very much > > Alberto, Venice (ITALY) > > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Fri Oct 12 15:06:07 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 12 Oct 2007 15:06:07 -0700 Subject: [Genome] Question about hgGcPercent program In-Reply-To: <6053b0950710111935h256da0c2jc97aac8f0c96600f@mail.gmail.com> References: <6053b0950710101002t19f1bc69lbfc59f0262c80444@mail.gmail.com> <470E62AD.8050106@soe.ucsc.edu> <6053b0950710111935h256da0c2jc97aac8f0c96600f@mail.gmail.com> Message-ID: <470FEFCF.2090809@soe.ucsc.edu> Hi Xueya, It turns out that when the bedRegionIn/bedRegionOut support for hgGcPercent was added, it was done for nib (makeGcTabFromNib) files but not for twoBit (makeGcTabFromTwoBit) files. 
We have now corrected this oversight. The change was made in the Kent source tree here: kent/src/hg/makeDb/hgGcPercent/hgGcPercent.c . . . with this revision: revision 1.26 date: 2007/10/12 06:27:17; author: daryl; state: Exp; lines: +63 -33 added support for the bedRegionIn/bedRegionOut option for 2bit files If you update to the latest source tree and try again, the problem should be fixed. Thank you for bringing it to our attention! -- Brooke Rhead UCSC Genome Bioinformatics Group Xueya Zhou wrote: > Hi Rhead, > > Thanks very much for your reply. Sorry that 'noLoads' is a typo in my > last email. > I did feed the program with correct syntax, but still could not get what > I want. > > I have attached the first few lines of my input bed file. Hope you can > help find the problem for me. > > In my computer, after I typed in > ~bin/i686/hgGcPercent -bedRegionIn=in.bed -bedRegionOut=out.bed -noLoad > -doGaps hg17 ~/dat/UCSC/sequence/hg17_2bit/out.2bit > > Then the program began with > # Calculating gcPercent with window size window size 20000 > # Using twoBit: /home/xzhou/dat/UCSC/sequence/hg17_2bit/out.2bit > ............................................................... > > After the program terminated, I saw the file gcPercent.bed that given GC > percent in each 20kb bin, and an empty file out.bed that I specified in > the command line. > > Xueya > > On 10/12/07, *Brooke Rhead* > wrote: > > Hello Xueya, > > I spoke to the developer who added the bedRegionIn and bedRegionOut > options to the hgGcPercent program. He said that your syntax looks > correct, except that 'noLoads' should be written as 'noLoad'. If you > change that and still see the problem, perhaps you could send us your > input file, or a sample of your input file to help us debug the program. > (No need to cc the whole list for that . . . attachments get stripped > by our mailing list software, anyway.) > > Let us know if the first fix does not resolve the issue. 
> > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > > Xueya Zhou wrote: > > Dear UCSC Genome Browser Crews, > > > > I encountered a problem when using your hgGcPercent program to > calculate GC > > percentage in regions that I specified in a bed file (in.bed). > > > > The command line I used looked like the following: > > > > hgGcPercent -bedRegionIn=in.bed -bedRegionOut=out.bed -doGaps > -noLoads hg18 > > hg18.2bit > > > > Then It seemed that the program ignored the bed file that I > provided, and > > went on directly to calculate the whole genome GC percent with > the default > > window size. It produced gcPercent.bed file that detailed the GC > percent in > > each region across whole genome, and produced the empty file > out.bed which > > was really wanted. > > > > I have checked that both the file format and my command spelling > were > > correct. I also had a glance at the source code > > kent/src/hg/makeDb/hgGcPercent/hgGcPercent.c, it did process the > options of > > bedRegionIn and bedRegionOut. So I was puzzled about what is > going wrong. > > Could you help me figure it out? Thanks! > > > > Xueya > > > > > > > -- > Xueya Zhou > > Bioinformatics Division, Tsinghua National Laboratory of Information > Science and Technology > > Address: FIT 1-107, Tsinghua University, Beijing 100084, China > > Phone: +86-10-6279-5578 ext 822 From al-hasani at dife.de Sat Oct 13 13:35:05 2007 From: al-hasani at dife.de (Hadi Al-Hasani) Date: Sat, 13 Oct 2007 22:35:05 +0200 Subject: [Genome] mm8 to hg17 Message-ID: <4711481A0200009200015AE9@mail.dife.de> Dear all, we have been mapping quite a lot mouse QTL peak markers to mm8, i.e. NCBIs build 36. Now we've been asked by collegues to provide corresponding coordinates for the human genome, NCBIs build 35, i.e. hg 17. It seems that mm8<->hg17 is a rare combination....Does anyone have an idea how to deal with this, i.e. come up with a (basic) synteny table/alignment for hg17:mm8? 
Many thanks, Hadi ----- Hadi Al-Hasani, Ph.D. German Institute for Human Nutrition Potsdam-Rehbruecke Arthur-Scheunert-Allee 114-116 14558 Nuthetal / Germany Phone: +49.33200.88.388 Fax: +49.33200.88.334 e-mail: al-hasani at dife.de From zhaoh1 at gis.a-star.edu.sg Sun Oct 14 19:15:55 2007 From: zhaoh1 at gis.a-star.edu.sg (ZHAO Hao) Date: Mon, 15 Oct 2007 10:15:55 +0800 Subject: [Genome] questions about centromere and te