From dsmith00 at inbox.com Mon Sep 3 07:08:19 2007
From: dsmith00 at inbox.com (Dave Smith)
Date: Mon, 3 Sep 2007 06:08:19 -0800
Subject: [Genome] mysql intersect query
Message-ID:

Hello

I would like to replicate the intersect queries that are performed by your table browser, but using the public access mysql server instead.

The type of intersect queries I am interested in would be like these two (for human):

All Known Genes records that have any overlap with SNPs
All Known Genes records that have no overlap with SNPs

I would be most grateful if you could provide me with the exact mysql queries that are used to produce this data in the table browser. I hope you can help and many thanks for providing the excellent browser!

Dave

From hiram at soe.ucsc.edu Mon Sep 3 07:31:10 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Mon, 03 Sep 2007 07:31:10 -0700
Subject: [Genome] mysql intersect query
In-Reply-To:
References:
Message-ID: <46DC1AAE.2000805@soe.ucsc.edu>

Good Morning Dave:

For almost all cases in the table browser, intersections are not performed with MySQL queries. Almost always, each item of an intersection is turned into a bed format representation, then the two bed format representations are intersected. We have a kent source tool command, featureBits, that can do this type of intersection. So if you extract your two items of interest as bed files, you can then intersect them with the featureBits tool.

--Hiram

_______________________________________________
Genome maillist - Genome at soe.ucsc.edu
http://www.soe.ucsc.edu/mailman/listinfo/genome

From alc at sanger.ac.uk Mon Sep 3 07:17:12 2007
From: alc at sanger.ac.uk (Avril Coghlan)
Date: Mon, 03 Sep 2007 15:17:12 +0100
Subject: [Genome] converting coordinates from build 35 to build 36
Message-ID: <1188829032.32438.24.camel@deskpro104.dynamic.sanger.ac.uk>

Hello,

I would like very much to be able to convert genomic coordinates from human build 35 (hg17) to human build 36 (hg18). I looked on your website on the page
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/
but there is no liftover file 'hg17Tohg18.zip' available. I'm wondering, do you have such a file that you can give me, or could you make one for me? I will be very grateful for your help.

regards
Avril Coghlan
Sanger Institute, Cambridge, UK

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From zhangr04 at post.kiz.ac.cn Sat Sep 1 21:42:14 2007
From: zhangr04 at post.kiz.ac.cn (zhangrui)
Date: Sun, 2 Sep 2007 12:42:14 +0800
Subject: [Genome] Known Genes and Refseq genes
Message-ID: <20070902043204.0D7533E00BF@post.kiz.ac.cn>

Hi,

I found that the known genes set of human contains some noncoding genes, such as microRNAs. For example, "uc001dbx.1 - 65296704 65296779 chr1", which corresponds to "hsa-mir-101-1". However, on the website, it is said that the Known Genes track shows known protein coding genes based on proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their corresponding mRNAs from GenBank. Could you tell me the possible reason, and how I can get the known protein coding genes in the human genome?

Also, do the RefSeq genes only contain protein coding genes?

Thanks,
Zhang

From hartera at soe.ucsc.edu Mon Sep 3 11:03:42 2007
From: hartera at soe.ucsc.edu (Rachel Harte)
Date: Mon, 3 Sep 2007 11:03:42 -0700 (PDT)
Subject: [Genome] converting coordinates from build 35 to build 36
In-Reply-To: <1188829032.32438.24.camel@deskpro104.dynamic.sanger.ac.uk>
References: <1188829032.32438.24.camel@deskpro104.dynamic.sanger.ac.uk>
Message-ID:

Hello Avril,

The liftOver files are normally stored in the downloads for the genome from which you are lifting. In this case, you would like to lift from hg17, so the liftOver file is in the hg17 downloads:
http://hgdownload.cse.ucsc.edu/goldenPath/hg17/liftOver/
The file is called hg17ToHg18.over.chain.gz.

I hope that this helps you. Please let us know if you have further questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

From hartera at soe.ucsc.edu Mon Sep 3 11:23:06 2007
From: hartera at soe.ucsc.edu (Rachel Harte)
Date: Mon, 3 Sep 2007 11:23:06 -0700 (PDT)
Subject: [Genome] Known Genes and Refseq genes
In-Reply-To: <20070902043204.0D7533E00BF@post.kiz.ac.cn>
References: <20070902043204.0D7533E00BF@post.kiz.ac.cn>
Message-ID:

Hello Zhang,

Known Genes has recently been updated to UCSC Genes, which now includes both protein-coding and noncoding genes. The ID that you provide below (uc001dbx.1) is a UCSC stable ID from this new track on the human hg18 assembly. If you go to the latest human assembly, hg18, and click on the blue/gray bar at the left side of the UCSC Genes track (or on the hyperlink above the track control), you will see the description for this track, where it is stated that it includes gene predictions for noncoding genes:

"The UCSC Genes track shows gene predictions based on data from RefSeq, Genbank, and UniProt. This is a moderately conservative set of predictions, requiring the support of one GenBank RNA sequence plus at least one additional line of evidence. The RefSeq RNAs are an exception to this, requiring no additional evidence. The track includes both protein-coding and putative non-coding transcripts. Some of these non-coding transcripts may actually code for protein, but the evidence for the associated protein is weak at best."

Our Table Browser will allow you to select only the protein-coding genes in the UCSC Genes set. If you click on the "Tables" link on the top blue menu bar, you will be taken to the Table Browser interface. Select the following:

genome: Human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: UCSC Genes
table: kgTxInfo

Then click on the "create" button next to filter and you can set the category to be coding.

Alternatively, you can use our public MySQL server to query our database tables directly:
http://genome.ucsc.edu/FAQ/FAQdownloads#download29

I hope that this will help you. Please let us know if you have further questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

From macampbell at davidson.edu Mon Sep 3 14:25:47 2007
From: macampbell at davidson.edu (A. Malcolm Campbell)
Date: Mon, 3 Sep 2007 17:25:47 -0400
Subject: [Genome] Genome Browser Aligned Sequence suggestion
Message-ID: <791E634B-11FA-4849-BC23-FE28AF906E2E@davidson.edu>

In the comparative genomics section, could you PLEASE add a yeast comparison? As one of the great model organisms, it would be very powerful to see its sequences aligned with the other eukaryotes.

As a second request, it would be instructional to be able to compare Arabidopsis by this method as well.

Thank you,
Malcolm

________________________________________________________
A. Malcolm Campbell, Ph.D.
Professor of Biology
Director, James G. Martin Genomics Program
Davidson College
Founding Director of GCAT (www.bio.davidson.edu/GCAT)

Box 7118 (US Mail)
209 Ridge Road (shipping)
Davidson, NC 28036
704-894-2692 (phone)
704-894-2512 (fax)
www.bio.davidson.edu/campbell

From archanat at soe.ucsc.edu Tue Sep 4 11:34:38 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 04 Sep 2007 11:34:38 -0700
Subject: [Genome] Genome Browser Aligned Sequence suggestion
In-Reply-To: <791E634B-11FA-4849-BC23-FE28AF906E2E@davidson.edu>
References: <791E634B-11FA-4849-BC23-FE28AF906E2E@davidson.edu>
Message-ID: <46DDA53E.3090304@soe.ucsc.edu>

Hello Malcolm,

I have passed your suggestions to our developers and here is the response:

Regarding the yeast alignments, we cannot align genomic sequence across such large evolutionary distances. Only the most conserved of things (histones, chromatin modification, etc.) are alignable at this distance and there's effectively no synteny. So orthology is impossible to determine.

Regarding Arabidopsis, unfortunately we do not have plans to include plants in our browser. We are sorry about that.

Thank you for the suggestions.

Regards,
Archana
UCSC Genome Bioinformatics Group

From emmanuel.mongin at mail.mcgill.ca Tue Sep 4 12:07:50 2007
From: emmanuel.mongin at mail.mcgill.ca (Emmanuel Mongin)
Date: Tue, 4 Sep 2007 15:07:50 -0400
Subject: [Genome] Liftover, multiple hits in target genome
Message-ID:

Hi,

I am trying to map human regions to a teleost genome using liftover and the liftover files available in your download section. Using liftover and these files I can only map the region to the best hit in the target genome. Is there a way I can produce liftover files which would allow me to map a query region to different target sequences?

Thanks a lot.

Emmanuel Mongin

From SU_CHEN at LILLY.COM Tue Sep 4 12:34:09 2007
From: SU_CHEN at LILLY.COM (Chen Su)
Date: Tue, 4 Sep 2007 15:34:09 -0400
Subject: [Genome] Question
Message-ID:

Hi,

I have a quick question. I have a BED file with chrom regions, and I'd like to know the list of genes that either overlap or are within 1kb distance of those regions. What's the best way to do this? Thanks a lot for your help!

- Chen

From archanat at soe.ucsc.edu Tue Sep 4 13:33:39 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 04 Sep 2007 13:33:39 -0700
Subject: [Genome] RP11-85P8
In-Reply-To: <46D48290.5000507@soe.ucsc.edu>
References: <25072009.215891188283662431.OPEN-XCHANGE.WebMail.tomcat@hboxf1> <46D48290.5000507@soe.ucsc.edu>
Message-ID: <46DDC123.9080007@soe.ucsc.edu>

Hello Dominique,

Sorry for the delay in responding to your question. We are examining the issue to restore the clone to the fishClones track.

In the meantime, to locate RP11-85P8 on hg18, take the hg17 sequence:

>hg17_RP11-85P8
TCACTAAATAGCTGAAAAATTTACATATTATTTTAAAACATAGACTTAAA
AAATCATATTAGCTTCTCCTTAGCAAAATGCTTTTGTTTTATGTATTTAC
AAGAATATACTGTACTTCAGGTACACAATTCACTCAAGCCAGCCTGAGAA
GGCCTTGGATGCAGATCAATGCTCCAATAAAGTTCATTATCAGCTCCTCC
TGCCTTGTGACAGGATGATTTGATTTTACAAAAGTCCCTTTGAAAACAAG
AGTAAACGCAGACAGCTTCTAGAGAAAAGTCTGGTGAAGCAGCAGTTGAT
AATAGATTTTCTTTTAGTGATGAAATTAATCTTGTTTTGGTAATCTACAG
CCTGTTAGGGATAGGTGGAGGGATGAAGTCCTTAAAACTAAATTGTTCCT

And blat that to hg18 to find the location: chr13:27475789-27476188, which can also be found on hg18 as the STS Marker RH1828.

I hope that this helps. If you have further questions please don't hesitate to contact us.

Regards,
Archana.

Archana Thakkapallayil wrote:
> Hello Dominique,
>
> Our developers are looking into this issue and it may take a few days. I
> will get back to you when I have more information.
>
> Regards,
>
> Archana
> UCSC Genome Bioinformatics Group
>
> Dominique Muehlematter wrote:
>
>> Sir, At the beginning of the year, I was interested to perform FISH
>> experiments in 13q12 region and consequently I looked for BACs in UCSC
>> database and ordered them to BACPAC Resources. One of these BACs is
>> RP11-85P8 (found with UCSC Genome Browser v149). Now I am unable to
>> relocalize it in 13q12. Do you know what happens ? Thank you in advance
>> for your response. Yours sincerely. D. Mühlematter
>> Dr. Dominique Mühlematter Unité de cytogénétique du cancer
>> Service de génétique médicale CHUV 1011 Lausanne, Suisse
>> tel. 41'21'314'33'82 Fax 41'21'314'34'44
>> email : Dominique.Muhlematter at chuv.ch

From archanat at soe.ucsc.edu Tue Sep 4 14:35:04 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 04 Sep 2007 14:35:04 -0700
Subject: [Genome] Question
In-Reply-To:
References:
Message-ID: <46DDCF88.2040601@soe.ucsc.edu>

Hello Chen,

This task can be accomplished with the Table Browser intersection tool. It will involve first creating a custom track with your BED file.

In order to get the list of genes that are within 1kb distance of your regions, first take the BED file and then subtract 1,000 from all the starts and add 1,000 to all the ends (so that all of the chrom regions are now 1,000 bases longer on either side). Then upload that BED file as a custom track. Please see the following link for information on creating custom annotation tracks:
http://www.genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

Then intersect your custom track with the 'UCSC Genes' or the 'RefSeq Genes' track using the Table Browser intersection tool. More information on using the intersection tool is here:
http://www.genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection

Note that when you do a Table Browser intersection, the results only give the items and their positions from the first table that you selected. So to obtain the list of gene names, you'll have to select the gene track as the first track, and your custom track as the second track for the intersection.

I hope that this helps. If you have further questions please don't hesitate to contact us.

Regards,
Archana
UCSC Genome Bioinformatics Group

From gopinath at cshl.edu Tue Sep 4 16:14:29 2007
From: gopinath at cshl.edu (Gopinathrao, Gopal)
Date: Tue, 4 Sep 2007 19:14:29 -0400
Subject: [Genome] [Reactome-announce] Ver 22 released!
Message-ID:

Version 22 of the Reactome Knowledgebase has been released and is accessible at http://www.reactome.org!

Reactome is a curated knowledgebase developed and maintained by the Reactome Knowledgebase team (Lincoln Stein's group at CSHL and Ewan Birney's group at European Bioinformatics Institute). Reactome covers human biological processes ranging from basic pathways of metabolism to complex events such as hormonal signaling and apoptosis. The information in Reactome is provided by expert bench biologists, and edited and managed as a relational database by the Reactome staff. New material is peer-reviewed and revised as necessary before publication to the web. Reactome entries are linked to corresponding ones in NCBI, Entrez Gene, RefSeq, OMIM, Ensembl genome annotations, HapMap, UCSC Genome Browser, KEGG, ChEBI and Gene Ontology (GO).

New topics released in Version 22 include Botulinum neurotoxicity, Membrane trafficking and Metabolism of vitamins, pathways for HIV Nef protein interactions (HIV infection pathway), Immunoregulatory interactions (Immune signaling pathway) and AMPK regulation of fatty acid oxidation (Integration of energy metabolism pathways).

A new tool, Reactome Mart, is available for comprehensive datamining. Reactome Wiki is available for editing documentation by the user community.

As before, Reactome data can be exported in SBML, Protégé, and BioPAX level 2 formats.
Protein-protein interaction datasets derived from curated human and predicted non-human events are available. A SOAP based Web Services API is also available along with other resources to access the Reactome data. Links to these are on the Download page.

The standard display feature allows the user to choose the focus species annotations: curated (for human) and electronically inferred (for 22 other species).

Updated release statistics and the Editorial Calendar are available. Like everything in Reactome, these downloaded and exported materials can be reused and redistributed freely.

For questions and comments please reply to this message or write to help at reactome.org

_______________________________________________
Reactome-announce mailing list
Reactome-announce at reactome.org
http://mail.reactome.org/mailman/listinfo/reactome-announce

From archanat at soe.ucsc.edu Tue Sep 4 17:11:09 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 04 Sep 2007 17:11:09 -0700
Subject: [Genome] Liftover, multiple hits in target genome
In-Reply-To:
References:
Message-ID: <46DDF41D.8020002@soe.ucsc.edu>

Hello Emmanuel,

You can get multiple output regions using the liftOver tool by checking the box 'Allow multiple output regions' and using the coordinates in the BED4 format.

However, the liftOver tool gives you only the best net in the specified region. In this case, if you view this region on the Human browser with the Zebrafish chain and net tracks ON, you can see that there is only one level-one net in that location on chr25 on zebrafish.

If you are using the standalone liftOver tool, there are several parameters that can be used at the command line to allow for multiple regions (but they still must be level-one nets), or to loosen up the matching requirements.

Sorry, we are not going to produce a liftOver file that will enable you to do this. You could get the information that you need using the Table Browser and the appropriate chain table.

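[Archive note] The BED4 input mentioned above can be prepared from browser-style position strings with a few lines of Python. This is only a minimal sketch, not a UCSC tool: the function names `position_to_bed4` and `positions_to_bed4` and the `region` name prefix are made up for illustration. It assumes the usual conventions that browser positions ("chrom:start-end") are 1-based and inclusive, while BED starts are 0-based and BED ends are exclusive.

```python
import re

# Browser-style positions such as "chr13:27475789-27476188" are 1-based and
# inclusive. BED is 0-based and half-open, so only the start shifts by one.
POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def position_to_bed4(position, name):
    """Convert one 'chrom:start-end' string into a tab-separated BED4 line."""
    match = POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates in position: {position!r}")
    # BED4 columns: chrom, chromStart (0-based), chromEnd (exclusive), name
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def positions_to_bed4(positions, prefix="region"):
    """Name the regions region1, region2, ... so liftOver output can be traced back."""
    return [
        position_to_bed4(position, f"{prefix}{index}")
        for index, position in enumerate(positions, start=1)
    ]


if __name__ == "__main__":
    for line in positions_to_bed4(["chr13:27475789-27476188", "chr1:1,001-2,000"]):
        print(line)
```

Giving every region its own name in the fourth column is what makes it possible to tell which input region each of several output regions came from.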
I hope this information helps you. Please let us know if you have further questions.

Regards,
Archana
UCSC Genome Bioinformatics Group

From rhead at soe.ucsc.edu Tue Sep 4 17:12:27 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Sep 2007 17:12:27 -0700
Subject: [Genome] BLATing probe sequences against human genome
In-Reply-To: <7.0.1.0.2.20070830120537.022d5118@llnl.gov>
References: <7.0.1.0.2.20070828182012.00e46598@llnl.gov> <46D622A0.40401@soe.ucsc.edu> <7.0.1.0.2.20070830120537.022d5118@llnl.gov>
Message-ID: <46DDF46B.8060205@soe.ucsc.edu>

Hello Tim,

Thank you for searching the archives. The answer you found does contradict what I said! I spoke to Galt about it, and he did some testing and confirmed the following:

----
I should have said "-q rnax" and "-q dnax" in that old post instead of "-q dna" and "-q rna". It is only on the translated queries that using q=dnax will double your results compared to q=rnax because the latter assumes you have the query sequence from the right strand and don't need the query to get reverse-complemented too. With "-q dnax" that assumption is not held and blat aligns the reverse-complemented query too.

For translated queries, there is a complete index for each of the strands, and they both get searched on any query.

I think the user is right that polyAAAA trimming is activated by default if the "-q rna" is specified, so that would be the only difference between using "-q dna" and "-q rna" with untranslated queries.

For untranslated queries, only the positive strand of the target is indexed, and both the query and its reverse-complement are automatically searched against that pos-strand index by BLAT.
----

I hope this clears up the confusion. Please let us know if you have further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Tim Gernat wrote:
> At 06:51 PM 8/29/2007, you wrote:
>
> Hi Brooke,
>
> Thanks for your quick answer and the suggestions for parameters.
>
> In the meantime I found an old mail from the genome mailing list on
> Google. It says that if q=dna is used, BLAT will also take the reverse
> complement of a query and align it against the database (see Galt's
> first answer in
> http://www.soe.ucsc.edu/pipermail/genome/2006-August/011440.html).
> Basically, what I am doing is aligning probe sequences (i.e. short
> things) against the whole human genome and then count how many
> full-length hits without gaps I get. So, if I use -q=dna instead of
> q=rna and if what Galt says is true, I should potentially get hits for
> loci that don't produce transcripts detected by the probes.
>
> Can you confirm this?
>
> Thanks,
> Tim
>
>> Hello Tim,
>>
>> You can actually let the -t and -q parameters both default to "dna".
>> (Apparently there is not any real difference between -q=dna and
>> -q=rna. It makes a difference when you are using translated BLAT, so
>> -q=dnax vs. -q=rnax will give different results).
>>
>> Since the probes are only 25 bases long, which is near the lower limit
>> of the query length to which BLAT can detect matches, you will want to
>> use parameters that will maximize BLAT's sensitivity.
>>
>> One of our developers has come up with a guide for standalone blat and
>> gfServer/gfClient to maximize sensitivity for short sequences:
>>
>> ----
>> If a tile is not marked as overused, then here is the formula for the
>> shortest guaranteed exact match:
>>
>> 2 * stepSize + tileSize - 1
>>
>> With stepSize=5 and tileSize=11 you can find things
>> as short as 2*5+11-1 = 20 bp.
>>
>> - stepSize can be from 1 to tileSize.
>>
>> - tileSize can be from 6 to 15.
>>
>> - Do not use -fastMap.
>>
>> - Do not use masking commandline options.
>>
>> - Use a large value for repMatch, e.g. -repMatch=1000000 to reduce the
>> chance of a tile being marked as over-used
>>
>> - Do not use a .ooc file.
>>
>> Note that these changes will make BLAT more sensitive, but also make
>> it slower and increase memory used. You can do one chromosome at a
>> time to reduce memory requirements if needed.
>>
>> -minScore will not actually go less than 1 or greater than about
>> qSize/2. Therefore use either pslReps or pslCDnaFilter program
>> available in the Genome Browser source code to filter for
>> size/score/coverage/quality desired.
>> ----
>>
>> I hope this is helpful. Please feel free to contact us again if you
>> have further questions.
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>> Tim Gernat wrote:
>>> Hi!
>>> I want to use BLAT to align probe sequences from an Affymetrix
>>> expression array to the human genome. The goal is to find out which
>>> probes potentially detect RNA originating from multiple genes. Each
>>> of the probes is described in a file that looks like this ...
>>> Probe Set Name,Probe X,Probe Y,Probe Interrogation Position,Probe Sequence,Target Strandedness
>>> 1007_s_at,416,177,3330,CACCCAGCTGGTCCTGTGGATGGGA,Antisense
>>> 1007_s_at,569,289,3443,GCCCCACTGGACAACACTGATTCCT,Antisense
>>> ...
>>> ... and the field "Target Strandedness" always says "Antisense".
>>> My question is, how do I set the BLAT parameters -t and -q in order
>>> to find all genomic loci from which RNA could be produced that can be
>>> detected by one of the probes? Currently I set the parameters to
>>> -t=dna and -q=rna, but I'm new to BLAT and not sure whether this is
>>> right.
>>> Thanks in advance for your help!
>>> Regards,
>>> Tim
>>>
>>> Tim Gernat
>>> Genome Biology Group, CMLS
>>> Lawrence Livermore National Laboratory
>>> 7000 East Avenue, L-441
>>> Livermore, CA 94550

From SU_CHEN at LILLY.COM Wed Sep 5 07:47:57 2007
From: SU_CHEN at LILLY.COM (Chen Su)
Date: Wed, 5 Sep 2007 10:47:57 -0400
Subject: [Genome] Question
In-Reply-To: <46DDCF88.2040601@soe.ucsc.edu>
Message-ID:

Thank you very much! This is very helpful.

- Chen

From donnak at soe.ucsc.edu Wed Sep 5 11:42:26 2007
From: donnak at soe.ucsc.edu (Donna Karolchik)
Date: Wed, 5 Sep 2007 11:42:26 -0700
Subject: [Genome] publishing screen shots]
References: <46D75730.8050509@cse.ucsc.edu> <0e1d01c7eb61$b78377f0$0ba8a8c0@donnakLT> <46D75B86.9080700@cse.ucsc.edu>
Message-ID: <049d01c7efec$9a85aa80$0ba8a8c0@donnakLT>

Hi Julie,

Thanks for including the Genome Browser in your article! Feel free to include screen shots of the browser. We'd appreciate it if you follow our citation guidelines at http://genome.ucsc.edu/cite.html (note the "Screen shots" section at the bottom of the page). You are welcome to display the UCSC logo, although it is not required.

We'd appreciate the opportunity to review the poster/review. You can send the draft directly to me at the email address below.

-Donna

-----------------------------------
Donna Karolchik
donnak at soe.ucsc.edu
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

>>> -------- Original Message --------
>>> Subject: [Genome] publishing screen shots
>>> Date: Thu, 30 Aug 2007 17:41:12 -0400
>>> From: Julie Segre
>>> To: genome at soe.ucsc.edu
>>> CC: cdgstrong at mail.nih.gov
>>>
>>> I am an investigator at NHGRI/NIH in Bethesda. Dr. Cristina Strong
>>> and I are preparing a mini-review article for the Journal of Cell
>>> Science on "Navigating the Genome". We plan to include screen shot
>>> captures of the UCSC website to instruct cell biologists how to query
>>> the website and return useful information. Do you have constraints
>>> on publication of this data? Would you like to have the UCSC logo
>>> displayed? Do you need to review the poster/review? Thanks, and
>>> feel free to contact me in whichever way is easiest for you.
>>> Sincerely, Julie Segre
>>> --
>>> Julie Segre, Ph.D.
>>> Senior Investigator
>>> National Human Genome Research Institute, NIH
>>> 49 Convent Drive
>>> Bldg 49, Room 4A26, MSC 4442
>>> Bethesda, MD 20892-4442
>>> Phone: 301 402 2314
>>> FAX: 301 402 4929
>>> e-mail: jsegre at nhgri.nih.gov
>>> http://www.genome.gov/Staff/Segre/

From calhoujd at umich.edu Wed Sep 5 15:13:56 2007
From: calhoujd at umich.edu (calhoujd at umich.edu)
Date: Wed, 05 Sep 2007 18:13:56 -0400
Subject: [Genome] findMotif on a Vista 64-bit machine
Message-ID: <20070905181356.v3xmqmdgg0scsk8g@web.mail.umich.edu>

I am looking to run the findMotif utility on a Windows Vista 64-bit machine. I had previously been running it on Windows XP using Cygwin, but I've read that Cygwin doesn't run properly on the Vista OS. Any advice for how to get findMotif running on Vista would be greatly appreciated. Thanks in advance.

Sincerely,

Jeff Calhoun
University of Michigan
Graduate Student

From hiram at soe.ucsc.edu Wed Sep 5 15:58:44 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Wed, 05 Sep 2007 15:58:44 -0700
Subject: [Genome] findMotif on a Vista 64-bit machine
In-Reply-To: <20070905181356.v3xmqmdgg0scsk8g@web.mail.umich.edu>
References: <20070905181356.v3xmqmdgg0scsk8g@web.mail.umich.edu>
Message-ID: <46DF34A4.6000706@soe.ucsc.edu>

Good Afternoon Jeff:

It's too bad that Cygwin doesn't work on Vista. There isn't much advice we can offer on that. Your best bet is to get some kind of gcc compiler system to function on Vista.

--Hiram

calhoujd at umich.edu wrote:
> I am looking to run the findMotif utility on a Windows Vista 64-bit
> machine. I had previously been running it on Windows XP using Cygwin,
> but I've read that Cygwin doesn't run properly on the Vista OS. Any
> advice for how to get findMotif running on Vista would be greatly
> appreciated. Thanks in advance.
>
> Sincerely,
>
> Jeff Calhoun
> University of Michigan
> Graduate Student

From peter.shepard at gmail.com Wed Sep 5 16:10:54 2007
From: peter.shepard at gmail.com (Pete Shepard)
Date: Wed, 5 Sep 2007 16:10:54 -0700
Subject: [Genome] ref_seq_id from bed coordinates
Message-ID: <5c2c43620709051610h18d12904y6a68b6a9b2b9209a@mail.gmail.com>

Hello Browser Folks,

I have a list of genomic coordinates that map to regions in genes. Can you please tell me how I might use the Table Browser to get the refseq ids to which these coordinates map?

Thanks

From rhead at soe.ucsc.edu Wed Sep 5 16:42:20 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Wed, 05 Sep 2007 16:42:20 -0700
Subject: [Genome] ref_seq_id from bed coordinates
In-Reply-To: <5c2c43620709051610h18d12904y6a68b6a9b2b9209a@mail.gmail.com>
References: <5c2c43620709051610h18d12904y6a68b6a9b2b9209a@mail.gmail.com>
Message-ID: <46DF3EDC.3080806@soe.ucsc.edu>

Hello Pete,

To do this, first create a custom track of your genomic coordinates.
If you need more information on this step, the custom track user's guide is here: http://genome.ucsc.edu/goldenPath/help/customTrack.html. Next, select the 'refGene' table in the Table Browser, then hit the "intersection: create" button. Select your custom track as the table with which to intersect. Choose one of the options under "These combinations will maintain the gene/alignment structure (if any) of RefSeq Genes" (otherwise, you will lose the refSeq ID information). Hit "submit". Now you can choose to output the lines from the refGene table that intersect with your custom track in whatever output format you wish to use (all fields, selected fields, BED, custom track, etc.). Note that this method will return a list of the RefSeq Genes and their genomic coordinates. The original genomic coordinates in the custom track you made will not be maintained. If you need to keep information from both tables, there is another set of tools you could use, called "Galaxy", that works in conjunction with the UCSC Table Browser: http://main.g2.bx.psu.edu/ Galaxy is run by Penn State. Their helpdesk email is galaxy-user at bx.psu.edu. -- Brooke Rhead UCSC Genome Bioinformatics Group Pete Shepard wrote: > Hello Browser Folks, > > I have a list of genomic coordinates that map to regions in genes. Can you > please tell me how I might use the Table Browser to get the refseq ids to > which these coordinates map? > > Thanks > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Pelleymounter.Linda at mayo.edu Thu Sep 6 06:25:36 2007 From: Pelleymounter.Linda at mayo.edu (Pelleymounter, Linda L.) Date: Thu, 6 Sep 2007 08:25:36 -0500 Subject: [Genome] Affymetrix All Exon Chips Message-ID: <8FF81777FF523B40B3D2618A77E6BB2AD37581@MSGEBE20.mfad.mfroot.org> I was looking at the Affymetrix All Exon Chips (tissue). 
I noticed that the scale is from -4.0 to 4.0, where a green color indicates underexpression and a red color indicates overexpression. What does the black mean? Does the black mean no hybridization? Or does the black mean expressed, but only at the median level?

I was specifically looking at the expression level within tissue of the areas represented by the Affymetrix Exon 1.0 Probe Sets, 2411275 and 2411276.

Thank you.
Linda

From sukhinder.sandhu at osumc.edu Thu Sep 6 05:55:39 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Thu, 06 Sep 2007 08:55:39 -0400
Subject: [Genome] Table browser output typo?
Message-ID:

Hi

I was looking at the Table Schema of the following table at the table browser and I got the output from which it seems one of the header values is misplaced. Looks like 'exonCount' comes after 'exonStarts' and 'exonEnds', but it's listed otherwise.

TABLE browser options used:
Vertebrate->Human->Mar2006->Genes and Gene Prediction Tracks->UCSC Genes->knownGene->chr2:239504713-239716637->all fields from selected table

OUTPUT
#name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinID alignID
uc002vyj.1 chr2 - 239505934 239512902 239511019 239512311 10 239505934,239506360,239507120,239508015,239508242,239508628,239509705,239511490,239512570,239512871, 239506149,239506480,239507213,239508109,239508504,239508971,239511023,239512320,239512753,239512902, Q6ZUA0 uc002vyj.1
uc002vyk.1 chr2 - 239635318 239987580 239639729 239939331 27 239635318,239640092,239641381,239653354,239655121,239667727,239668734,239670787,239674191,239676641,239681642,239689408,239694682,239698158,239701685,239713088,239720877,239721159,239726316,239731215,239743284,239750435,239763045,239776465,239823225,239939309,239987007, 239639769,239640234,239641466,239653488,239655240,239667825,239668854,239670875,239674247,239676749,239681689,239689529,239694816,239698345,239701943,239713312,239721076,239721276,239726429,239731347,239743406,239750556,239763196,239776710,239823297,239939550,239987580, P56524 uc002vyk.1

Thanks

sukhinder

From jill.recla at jax.org Thu Sep 6 07:00:51 2007
From: jill.recla at jax.org (Jill Recla)
Date: Thu, 06 Sep 2007 10:00:51 -0400
Subject: [Genome] Filter by phenotype?
Message-ID: <46E00813.5000009@jax.org>

Good morning,

I'm trying to find all genes in a human QTL region that are related to a specific phenotype. I can find all genes in the region using the UCSC genome browser, but is there any way to filter them by phenotype? I have several QTLs that need to be searched and going through the genes manually will be quite time consuming.

Thanks,
Jill

From kuhn at soe.ucsc.edu Thu Sep 6 09:07:13 2007
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Thu, 6 Sep 2007 09:07:13 -0700
Subject: [Genome] Table browser output typo?
Message-ID: <200709061607.JAA16862@moondance.cse.ucsc.edu>

sukhinder,

I read the table output as giving exonCounts as 10 and 27 for your two records. These numbers are followed by two comma-separated lists of 10 or 27 items. I believe the fields and the headers match.

best wishes,

--b0b kuhn
genome bioinformatics group

> From genome at soe.ucsc.edu Thu Sep 6 08:47:34 2007
> To:
> Subject: [Genome] Table browser output typo?
>
> Hi
> I was looking at the Table Schema of the following table at the table
> browser and I got the output from which it seems one of the header values is
> misplaced. Looks like 'exonCount' comes after 'exonStarts' and 'exonEnds',
> but its listed otherwise.
> > TABLE browser options used: > Vertebrate->Human->Mar2006->Genes and Gene Prediction Tracks->UCSC > Genes->knownGene->chr2:239504713-239716637->all fields from selected table > > OUTPUT > #name chrom strand txStart txEnd cdsStart cdsEnd > exonCount exonStarts exonEnds proteinID alignID > uc002vyj.1 chr2 - 239505934 239512902 239511019 239512311 > 10 > 239505934,239506360,239507120,239508015,239508242,239508628,239509705,239511 > 490,239512570,239512871, > 239506149,239506480,239507213,239508109,239508504,239508971,239511023,239512 > 320,239512753,239512902, Q6ZUA0 uc002vyj.1 > uc002vyk.1 chr2 - 239635318 239987580 239639729 239939331 > 27 > 239635318,239640092,239641381,239653354,239655121,239667727,239668734,239670 > 787,239674191,239676641,239681642,239689408,239694682,239698158,239701685,23 > 9713088,239720877,239721159,239726316,239731215,239743284,239750435,23976304 > 5,239776465,239823225,239939309,239987007, > 239639769,239640234,239641466,239653488,239655240,239667825,239668854,239670 > 875,239674247,239676749,239681689,239689529,239694816,239698345,239701943,23 > 9713312,239721076,239721276,239726429,239731347,239743406,239750556,23976319 > 6,239776710,239823297,239939550,239987580, P56524 uc002vyk.1 > > > Thanks > > sukhinder > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Thu Sep 6 15:29:47 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 06 Sep 2007 15:29:47 -0700 Subject: [Genome] Filter by phenotype? In-Reply-To: <46E00813.5000009@jax.org> References: <46E00813.5000009@jax.org> Message-ID: <46E07F5B.70208@soe.ucsc.edu> Hello Jill, At the present time, we are not hosting phenotype data on our site. 
These data sets are subject to misinterpretation and errors, and due to the potential medical ramifications of the presence of incorrect or misleading information (and because we do not have sufficient time and expertise on our staff to ensure that errors are not present), we have elected not to host the data. Therefore, there is currently not a way to filter by phenotype using the Genome Browser. -- Brooke Rhead UCSC Genome Bioinformatics Group Jill Recla wrote: > Good morning, > > I'm trying to find all genes in a human QTL region that are related to a > specific phenotype. I can find all genes in the region using the UCSC > genome browser, but is there any way to filter them by phenotype? I > have several QTLs that need to be searched and going through the genes > manually will be quite time consuming. > > Thanks, > Jill > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Sep 6 16:05:13 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Sep 2007 16:05:13 -0700 Subject: [Genome] Affymetrix All Exon Chips In-Reply-To: <8FF81777FF523B40B3D2618A77E6BB2AD37581@MSGEBE20.mfad.mfroot.org> References: <8FF81777FF523B40B3D2618A77E6BB2AD37581@MSGEBE20.mfad.mfroot.org> Message-ID: <46E087A9.60806@cse.ucsc.edu> Hello Linda, The black color in the Affymetrix All Exon track means at or around median expression. I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Pelleymounter, Linda L. wrote: > I was looking at the Affymetrix All Exon Chips (tissue). I noticed that > the scale is from -4.0 to 4.0, where a green color indicates > underexpression and a red color indicates overexpression. What does the > black mean? Does the black mean no hybridization? or, does the black > mean expressed, but only at the median level? 
> I was specifically looking at the expression level within tissue of the
> areas represented by the Affymetrix Exon 1.0 Probe Sets, 2411275 and
> 2411276.
>
> Thank you.
> Linda

From Keith.Brown at bristol.ac.uk Fri Sep 7 07:37:09 2007
From: Keith.Brown at bristol.ac.uk (KW Brown, Cellular & Molecular Medicine)
Date: Fri, 07 Sep 2007 15:37:09 +0100
Subject: [Genome] WT1 gene clone
Message-ID: <24647030.1189179429@pam-1007.pam.bris.ac.uk>

Hi,

Just been looking at the human WT1 gene (chr11:32,366,412-32,471,435) for longer transcripts in UCSC Genome Browser and found S75264, which at first sight looks very interesting (listed under "Human mRNAs from Genbank"). However, looking up the reference, it is clear that this is the sequence of a polyhistidine-tagged recombinant clone. Could you get this removed from the next human assembly to avoid further confusion?

Many thanks,
Keith Brown

----------------------
Dr Keith Brown
Reader in Molecular Pathology
University of Bristol
Department of Cellular & Molecular Medicine
School of Medical Sciences
University Walk
Bristol BS8 1TD
UK
Tel:+44 (0)117 3312071
Fax:+44 (0)117 9287896
EMail:Keith.Brown at bristol.ac.uk

From aramamoo at iupui.edu Fri Sep 7 09:07:17 2007
From: aramamoo at iupui.edu (Anuradha Ramamoorthy)
Date: Fri, 7 Sep 2007 12:07:17 -0400
Subject: [Genome] miRNA promoter
Message-ID: <000e01c7f169$296bcfe0$db7ba695@ads.iu.edu>

Hi,

I recently came across a mail from the mailing list. Link:
http://www.soe.ucsc.edu/pipermail/genome/2005-June/007710.html
But, the solution given there did not work for me. Is it correct?
Thanks, Anu From ann at soe.ucsc.edu Fri Sep 7 09:45:10 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 07 Sep 2007 09:45:10 -0700 Subject: [Genome] miRNA promoter In-Reply-To: <000e01c7f169$296bcfe0$db7ba695@ads.iu.edu> References: <000e01c7f169$296bcfe0$db7ba695@ads.iu.edu> Message-ID: <46E18016.3080607@cse.ucsc.edu> Hello Anu, The solution given in the email from our archives is old, but correct. Let me see if I can give a few more details. If you still have trouble, please feel free to write back and let us know exactly where it's failing for you. Navigate to the Table Browser tool on our website ('Tables' from the top blue navigation bar), and configure it like so: clade: Vertebrate genome: Human assembly: May 2004 (hg17) group: Genes and Gene Prediction Tracks track: sno/miRNA table: wgRna region: genome Then press the filter 'create' button and set type does match miRna. This will filter the table to include only those miRNAs from the miRNA Registry at Wellcome Trust Sanger Institute. Press 'submit' to return to the main Table Browser page. Then choose your output format. If you want promoter sequence (as in the other email), choose 'sequence' then press the 'get output' button. On the next page, choose 'Promoter/Upstream by XXXX bases' and un-check all of the other boxes. Press 'get sequence'. 
This will return a page of promoter sequence for all miRNAs in the track, like so: >hg17_wgRna_hsa-mir-200b range=chr1:1141407-1142406 5'pad=0 3'pad=0 revComp=FALSE strand=+ repeatMasking=none ggtggaaggtgccagaaaacttgaagagtggctctggccagctctctggg cccagttggcaccaggtggttgcagagaaagggtgggaaggaggacagga aggacggcgtgtccagcgggcggggagccttggtctggcctgagggctga acctccctcgggtcctgagtgtgcctggagtagaagcctagggtctctgg gctccaggcagggccctgagcaaggaggggccacagggctgcccacttcc tcctgcccccctgcagaggcggctcagccctcgcggcgtctgaggcttga ctgcctgtgtctgtgtttgtggccggtctgcctctgtgcctgggtcagac cccagaccagcagacacacaaaccgcagggacggctgggcagggtcagga gcctgccccgcccgcacccccacccacacccccacccccacccccgcctg caccccctgccctcagacgctgtgcagtgagcggggcagcatgggagagg ggtctccaggtggcggggaccgttctgtctcgagagcctcgcagacaccg ggcctttgagaagagaaggggctgggcagggaagcagctcctggaacacc atcctggagacagaggcccttgtcccctgcctcagacaaggcagcacgtg gggcccggggggctggggctgctgtccaggccttcctatgggaccaccca gagggaaggtcccccgcagaggggtgggggcagagggccgagcggggcgg gcagagggcccgtgtcagccccactccgacctagtcctcggccgtctggc caggacacttcggccccccaggtgcccaccccaggacccaaagctggtgg ctgctggactcggcagggctggcgggtggggctcacccgggcccctgccc tccggcgatgctgtcctcagtgccccaggaggacgaggccccccagctac tgagcttcccagcgagtcccatgcaaccctcagccgggcggcccccggac There is another track on the hg17 assembly that may be of interest to you as well. It is the T-ScanS miRNA Track. This track shows conserved mammalian microRNA regulatory target sites in the 3' UTR regions of Refseq Genes, as predicted by TargetScanS. We have also created the sno/miRNA track for the latest human assembly (hg18). I hope this is helpful to you. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Anuradha Ramamoorthy wrote: > Hi, > > I recently came across a mail from the mailing list. Link: > http://www.soe.ucsc.edu/pipermail/genome/2005-June/007710.html > But, the solution given there did not work for me. Is it correct? 
>
> Thanks,
> Anu

From rileen at gmail.com Fri Sep 7 09:54:36 2007
From: rileen at gmail.com (Rileen)
Date: Fri, 7 Sep 2007 18:54:36 +0200
Subject: [Genome] Finding alternative terminal exons .......
Message-ID: <6dc7bd4a0709070954r633c3a93jafdc4abca41b8cc1@mail.gmail.com>

Hi,

Thanks once again for the great resource you provide :-)

I've played around with the "knownAlt" track a bit, and was wondering whether there's any simple way of deriving a list of alternative terminal exons (ATEs) from it, or any other table/track. By this I mean instances where the gene has transcripts ending in different exons, not merely different positions in the same exon.

For all the other events in the knownAlt track, the two positions seem to provide useful information, but for the altFinish category, they always seem to differ by 1.

Why is this so?

I was hoping that given the two positions, one could check whether they were in two different exons to derive the list of ATEs as a subset of altFinish, but that seems wrong.

Looking forward to your reply,
Yours,
Rileen

--
******************************************************************
"I know nothing, but i _know_ that."
Rileen Sinha
rileen at yahoo.com
Personal Phone : (0049)3641412276 (cheaper to call)
(0049)17624078373
******************************************************************

From kuhn at soe.ucsc.edu Fri Sep 7 10:25:16 2007
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Fri, 7 Sep 2007 10:25:16 -0700
Subject: [Genome] WT1 gene clone
Message-ID: <200709071725.KAA12144@moondance.cse.ucsc.edu>

Dear Dr. Brown,

I agree with your interpretation, as the first two short alignment blocks are essentially poly-(CAT) and encode polyhistidine, according to the genbank record from 1995.
However, our display of genbank records is an automated process and we do not do any manual curation of the mRNA alignments. The only way to remove this accession from the browser is for the original contributors of the sequence to remove the record from genbank, though perhaps the genbank staff could change the DEFINITION field to reflect the poly-His tag (see REMARK field below). The latter solution would still not remove the alignment from the mRNA track in the browser, however. The 3' end of the sequence does have a good alignment to this region.

As with any database resource of biological data, the user needs to use his/her own judgement, as you have done here, to determine how best to proceed in the interpretation of data.

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

LOCUS       S75264 521 bp mRNA linear PRI 11-JUL-1995
DEFINITION  WT1=Wilms' tumor suppressor protein [human, fetal kidney, mRNA, 521 nt].
ACCESSION   S75264
VERSION     S75264.1 GI:896246
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 521)
  AUTHORS   Hamilton,T.B., Barilla,K.C. and Romaniuk,P.J.
  TITLE     High affinity binding sites for the Wilms' tumour suppressor protein WT1
  JOURNAL   Nucleic Acids Res. 23 (2), 277-284 (1995)
  PUBMED    7862533
  REMARK    GenBank staff at the National Library of Medicine created this
            entry [NCBI gibbsq 160293] from the original journal article.

> From genome-bounces at soe.ucsc.edu Fri Sep 7 09:08:32 2007
> To: genome at soe.ucsc.edu
> Subject: [Genome] WT1 gene clone
>
> Hi,
> Just been looking at the human WT1 gene (chr11:32,366,412-32,471,435) for
> longer transcripts in UCSC Genome Browser and found S75264, which at first
> site looks very interesting (listed under "Human mRNAs from Genbank).
> However, looking up the reference, it is clear that this is the sequence of
> a polyhistidine tagged recombinant clone. Could you get this removed from
> the next human assembly to avoid further confusion?
> Many thanks,
> Keith Brown
> ----------------------
> Dr Keith Brown
> Reader in Molecular Pathology
> University of Bristol
> Department of Cellular & Molecular Medicine
> School of Medical Sciences
> University Walk
> Bristol BS8 1TD
> UK
> Tel:+44 (0)117 3312071
> Fax:+44 (0)117 9287896
> EMail:Keith.Brown at bristol.ac.uk

From ann at soe.ucsc.edu Fri Sep 7 13:27:48 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 07 Sep 2007 13:27:48 -0700
Subject: [Genome] Finding alternative terminal exons .......
In-Reply-To: <6dc7bd4a0709070954r633c3a93jafdc4abca41b8cc1@mail.gmail.com>
References: <6dc7bd4a0709070954r633c3a93jafdc4abca41b8cc1@mail.gmail.com>
Message-ID: <46E1B444.20809@cse.ucsc.edu>

Hello Rileen,

Thanks for the compliments on the browser.

You are correct that you are not going to find what you want directly from the altFinish items in the Alt Events track. However, you can use that track as a starting point to get what you need. Below I outline the steps you will need to take to find the ending exon for each transcript.

1. Make a custom track of the altFinish items from the Alt Events track.

Navigate to the Table Browser ('Tables' from the top blue navigation bar) and create a custom track from the Alt Events track. Be sure to configure the filter to include only altFinish items from the table:
"name does match altFinish"

Read more about creating custom tracks using the table browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack

2. Intersect that custom track with the UCSC Known Gene track (to make a new custom track).
Again, using the Table Browser, intersect the UCSC Genes track with your custom track from step 1. Create a new custom track. This will be a track containing all UCSC Genes that have overlap with altFinish items. Read more about doing intersections with the Table Browser here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection 3. Get the exons from the resulting track. Choose the resulting custom track from step 2 in the Table Browser. Choose BED as the output type. From the next page, choose only the "Exons" button. Press 'get BED'. This will give you a list (in BED format) of each exon in each UCSC Gene from the custom track in step 2. 4. Mine the output for the location of the "last" exon. The BED output will have the following columns: chromosome chromStart chromEnd name score strand Here's an example of the 9 exons from one gene in the track: chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 - chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 - chr1 41756659 41756746 uc001cgz.1_exon_2_0_chr1_41756660_r 0 - chr1 41762992 41763168 uc001cgz.1_exon_3_0_chr1_41762993_r 0 - chr1 41813801 41813947 uc001cgz.1_exon_4_0_chr1_41813802_r 0 - chr1 41817994 41823576 uc001cgz.1_exon_5_0_chr1_41817995_r 0 - chr1 41867006 41867205 uc001cgz.1_exon_6_0_chr1_41867007_r 0 - chr1 41939173 41939253 uc001cgz.1_exon_7_0_chr1_41939174_r 0 - chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 - Here's how to interpret the name field: name: uc001cgz.1_exon_0_0_chr1_41748271_r uc001cgz.1 gene name exon part of gene (all exon in your case) 0 exon number 0 score chr1 chromosome 41748271 chromStart r strand (r = reverse, f = forward) So, in the example above, there are 9 exons for gene uc001cgz.1. Because it is on the reverse strand, you want exon 0. Conversely, if it were on the forward strand, you'd be looking for the exon with the highest number. 
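The selection rule in step 4 can be sketched in a few lines of Python. This is only an illustration, assuming the exon BED lines from step 3 are available as a list of strings and that every item name follows the name_exon_N_... pattern shown above; the function name last_exons is hypothetical and is not part of any UCSC tool.

```python
from collections import defaultdict


def last_exons(bed_lines):
    """Map each transcript name to (chrom, chromStart, chromEnd) of its ending exon.

    Hypothetical helper: assumes six-column BED lines whose name field looks
    like uc001cgz.1_exon_0_0_chr1_41748271_r, as in the example above.
    """
    exons = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, start, end, name, _score, strand = fields[:6]
        if "_exon_" not in name:
            continue  # not an exon item
        transcript, _, rest = name.partition("_exon_")
        exon_number = int(rest.split("_")[0])
        exons[transcript].append((exon_number, chrom, int(start), int(end), strand))

    result = {}
    for transcript, items in exons.items():
        strand = items[0][4]
        # Reverse strand: the ending exon has the lowest exon number.
        # Forward strand: the ending exon has the highest exon number.
        pick = min(items) if strand == "-" else max(items)
        result[transcript] = (pick[1], pick[2], pick[3])
    return result
```

Applied to the nine uc001cgz.1 lines above, the sketch picks exon 0, because that transcript is on the reverse strand.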
So, for this transcript, the location of the ending exon is: chr1:41748270-41749524 This should get you well on your way. Be sure to write back to the list if you get stuck or need more direction. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Rileen wrote: > Hi, > Thanks once again for the great resource you provide :-) > > I've played around with the "knownAlt" track a bit, and was wondering > whether there's any simple way of deriving a list of alternative terminal > exons (ATEs) from it, or any other table/track. By this I mean instances > where the gene has transcripts ending in different exons, not merely > different positions in the same exon. > > For all the other events in the knownAlt track, the two positions seem > to provide useful information, but for the altFinish category, they always > seem to differ by 1. > > Why is this so? > > I was hoping that given the two positions, one cold check whether they > were in two > different exons to derive the list of ATEs as a subset of altFinish, > but that seems > wrong. > > Looking forward to your reply, > Yours, > Rileen > From gkishore at yahoo.com Fri Sep 7 15:48:39 2007 From: gkishore at yahoo.com (kishore) Date: Fri, 7 Sep 2007 15:48:39 -0700 (PDT) Subject: [Genome] custom track format regarding "color" Message-ID: <722474.96680.qm@web38808.mail.mud.yahoo.com> dear UCSC group, i am creating custom tracks for our internal data and am using the .WIG format. i encountered one problem with including the color info in the track line. if u include the color info in the track line and upload the track using the browse and upload option in the UCSC website, it shows an error. however, if u copy and paste the data in the space provided and then submit it, no errors are shown and the data is displayed in color. is there some glitch somewhere as to why it doesn't like the color info in the track line if u are planning to browse the file and upload it directly??? 
ur help will be greatly appreciated since i have lots of different samples and would like to color code them and share the web link to other researchers. thanks in advance.

Kishore Guda, BVSc&A.H (DVM)., Ph.D.
Howard Hughes Medical Institute
Case Western Reserve University-Ireland Cancer Center
Cleveland, OH 44106

From hiram at soe.ucsc.edu Fri Sep 7 15:57:32 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Fri, 07 Sep 2007 15:57:32 -0700
Subject: [Genome] custom track format regarding "color"
In-Reply-To: <722474.96680.qm@web38808.mail.mud.yahoo.com>
References: <722474.96680.qm@web38808.mail.mud.yahoo.com>
Message-ID: <46E1D75C.5080201@soe.ucsc.edu>

Good Afternoon Dr. Guda:

Can you provide a sample set of data that will illustrate the problem? Remember to verify that your "track" line is a single line despite the number of arguments on the line.

--Hiram

kishore wrote:
> dear UCSC group,
> i am creating custom tracks for our internal data and am using the .WIG format. i encountered one problem with including the color info in the track line. if u include the color info in the track line and upload the track using the browse and upload option in the UCSC website, it shows an error. however, if u copy and paste the data in the space provided and then submit it, no errors are shown and the data is displayed in color. is there some glitch somewhere as to why it doesn't like the color info in the track line if u are planning to browse the file and upload it directly??? ur help will be greatly appreciated since i have lots of different samples and would like to color code them and share the web link to other researchers. thanks in advance.
>
> Kishore Guda, BVSc&A.H (DVM)., Ph.D.
> Howard Hughes Medical Institute > Case Western Reserve University-Ireland Cancer Center > Cleveland, OH 44106 From emmanuel.buschiazzo at pg.canterbury.ac.nz Fri Sep 7 22:40:11 2007 From: emmanuel.buschiazzo at pg.canterbury.ac.nz (Emmanuel Buschiazzo) Date: Sat, 08 Sep 2007 17:40:11 +1200 Subject: [Genome] chromInfo Message-ID: <46E235BB.9020409@pg.canterbury.ac.nz> Hello team, I am looking for lengths of all scaffolds for the dasNov1, echTel1, loxAfr1 and oryCun1 assemblies. I can't find the chromInfo files for these assemblies on your ftp site. Could you make them available, or alternatively, could you indicate me another way to get this information? Thanks in advance. Emmanuel. From davidgloriam at googlemail.com Sun Sep 9 10:16:59 2007 From: davidgloriam at googlemail.com (David Gloriam) Date: Sun, 9 Sep 2007 18:16:59 +0100 Subject: [Genome] Looking for g2gOverlap and other BLAT-associated programs Message-ID: <000201c7f305$3f0f6cb0$6400a8c0@LAPTOP> Hi, I found a number of useful BLAT-associated programs in this document in the downloads section: http://genome-test.cse.ucsc.edu/~kent/exe/usage.txt Unfortunately, I can only find a small subset of these in the download folders for various OSs. At the moment I would need "g2gOverlap". Would anyone be able to tell me if it is possible to get a hold of these programs and if so where? Thanks, David From reiner.schulz at kcl.ac.uk Mon Sep 10 04:01:17 2007 From: reiner.schulz at kcl.ac.uk (Reiner Schulz) Date: Mon, 10 Sep 2007 12:01:17 +0100 Subject: [Genome] cpglh Message-ID: <46E523FD.2070801@kcl.ac.uk> i found the message below looking for the CpG island prediction program that UCSC uses for its browser tracks. i would be very much interested in obtaining the source code since i would like to apply the program to non-repeat-masked sequence. i work for King's College London, i.e., would use the program for academic purposes only. 
much appreciated, Reiner >>>>>>>>>>>>> We use the cpglh program from Washington University (St. Louis) Genome Sequencing Center. The original author was Gos Miklem from the Sanger Center. The version we use has been modified by LaDeana Hillier at WUGSC. If you are a non-profit, we could send you a copy of the source. cpglh requires hardmasked (Ns) fa files for input. Heather Trumbower UCSC Genome Bioinformatics Group > Hi > I was wondering what program you use for detecting CpG islands and what > are the parameters used? > Thanks > Razi -- (*)->[]->()->[]->(**)->[]->()->[]->(*)->[]->()->[]->()->[]->()->[]->()->[] (Humboldt University Berlin, Germany)->[]-> ... (University of Maryland, USA)->[]-> ... (King's College London, UK) https://josh.umds.ac.uk/~rschulz From rileen at gmail.com Sun Sep 9 12:48:36 2007 From: rileen at gmail.com (Rileen) Date: Sun, 9 Sep 2007 21:48:36 +0200 Subject: [Genome] Finding alternative terminal exons ....... In-Reply-To: <6dc7bd4a0709091247i63bd06dev12b37c741518faf3@mail.gmail.com> References: <6dc7bd4a0709070954r633c3a93jafdc4abca41b8cc1@mail.gmail.com> <46E1B444.20809@cse.ucsc.edu> <6dc7bd4a0709091247i63bd06dev12b37c741518faf3@mail.gmail.com> Message-ID: <6dc7bd4a0709091248r6c628182x5959a9a875d115e0@mail.gmail.com> Hi, Thanks for that, I now have the list of all the exons in UCSC genes with altFinish entries. I just need to put this together with a table giving the info on gene names/symbols, so that I know all the different transcripts of a given gene, and can check for different terminal exons accordingly, i.e something similar to the "refFlat" table for RefSeq data. Thanks once again, Yours, Rileen On 07/09/2007, Ann Zweig wrote: > Hello Rileen, > > Thanks for the compliments on the browser. > > You are correct that you are not going to find what you want directly > from the altFinish items in the Alt Events track. However, you can use > that track as a starting point to get what you need. 
Below I outline > the steps you will need to take to find the ending exon for each transcript. > > 1. Make a custom track of the altFinish items from the Alt Events track. > > Navigate to the Table Browser ('Tables' from the top blue navigation > bar) and create a custom track from the Alt Events track. Be sure to > configure the filter to include on altFinish items from the table: > "name does match altFinish" > > Read more about creating custom tracks using the table browser here: > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack > > > 2. Intersect that custom track with the UCSC Known Gene track (to make a > new CT). > > Again, using the Table Browser, intersect the UCSC Genes track with your > custom track from step 1. Create a new custom track. This will be a > track containing all UCSC Genes that have overlap with altFinish items. > > Read more about doing intersections with the Table Browser here: > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection > > > 3. Get the exons from the resulting track. > Choose the resulting custom track from step 2 in the Table Browser. > Choose BED as the output type. From the next page, choose only the > "Exons" button. Press 'get BED'. This will give you a list (in BED > format) of each exon in each UCSC Gene from the custom track in step 2. > > > 4. Mine the output for the location of the "last" exon. 
> The BED output will have the following columns: > chromosome > chromStart > chromEnd > name > score > strand > > Here's an example of the 9 exons from one gene in the track: > > chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 - > chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 - > chr1 41756659 41756746 uc001cgz.1_exon_2_0_chr1_41756660_r 0 - > chr1 41762992 41763168 uc001cgz.1_exon_3_0_chr1_41762993_r 0 - > chr1 41813801 41813947 uc001cgz.1_exon_4_0_chr1_41813802_r 0 - > chr1 41817994 41823576 uc001cgz.1_exon_5_0_chr1_41817995_r 0 - > chr1 41867006 41867205 uc001cgz.1_exon_6_0_chr1_41867007_r 0 - > chr1 41939173 41939253 uc001cgz.1_exon_7_0_chr1_41939174_r 0 - > chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 - > > Here's how to interpret the name field: > > name: uc001cgz.1_exon_0_0_chr1_41748271_r > > uc001cgz.1 gene name > exon part of gene (all exon in your case) > 0 exon number > 0 score > chr1 chromosome > 41748271 chromStart > r strand (r = reverse, f = forward) > > So, in the example above, there are 9 exons for gene uc001cgz.1. > Because it is on the reverse strand, you want exon 0. Conversely, if it > were on the forward strand, you'd be looking for the exon with the > highest number. > > So, for this transcript, the location of the ending exon is: > chr1:41748270-41749524 > > > This should get you well on your way. Be sure to write back to the > list if you get stuck or need more direction. > > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > > > Rileen wrote: > > Hi, > > Thanks once again for the great resource you provide :-) > > > > I've played around with the "knownAlt" track a bit, and was wondering > > whether there's any simple way of deriving a list of alternative terminal > > exons (ATEs) from it, or any other table/track. 
By this I mean instances > > where the gene has transcripts ending in different exons, not merely > > different positions in the same exon. > > > > For all the other events in the knownAlt track, the two positions seem > > to provide useful information, but for the altFinish category, they always > > seem to differ by 1. > > > > Why is this so? > > > > I was hoping that given the two positions, one cold check whether they > > were in two > > different exons to derive the list of ATEs as a subset of altFinish, > > but that seems > > wrong. > > > > Looking forward to your reply, > > Yours, > > Rileen > > > -- ****************************************************************** "I know nothing, but i _know_ that." Rileen Sinha rileen at yahoo.com Personal Phone : (0049)3641412276 (cheaper to call) (0049)17624078373 ****************************************************************** -- ****************************************************************** "I know nothing, but i _know_ that." Rileen Sinha rileen at yahoo.com Personal Phone : (0049)3641412276 (cheaper to call) (0049)17624078373 ****************************************************************** From yaelshemla at gmail.com Mon Sep 10 02:37:08 2007 From: yaelshemla at gmail.com (yael shemla) Date: Mon, 10 Sep 2007 12:37:08 +0300 Subject: [Genome] repeats for refseq genes Message-ID: Hello, I have a list of refseq genes and Im trying to get a list of repeats that match each gene. If I use the Table of refseq gene with intersection of repeat master table, i don't get the names of the repeats. If i use the other way, and make intersection of repeat master with refseq-table, I cant enter the list of genes. Is there another way to get this data? Thanks. 
From kayla at soe.ucsc.edu Mon Sep 10 09:44:04 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 10 Sep 2007 09:44:04 -0700 (PDT) Subject: [Genome] chromInfo In-Reply-To: <46E235BB.9020409@pg.canterbury.ac.nz> References: <46E235BB.9020409@pg.canterbury.ac.nz> Message-ID: Emmanuel, The assemblies you mention do not have browsers on our public site. However, you can have a look at this data on our test server http://genome-test.cse.ucsc.edu. The easiest way to get to the chromInfo tables would be to use the Table Browser, with the following settings: clade: Vertebrate genome: Rabbit assembly: May 2005 group: All Tables database: oryCun1 tables: chromInfo output format: all fields from selected table. click "get output" I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group On Sat, 8 Sep 2007, Emmanuel Buschiazzo wrote: > Hello team, > > I am looking for lengths of all scaffolds for the dasNov1, echTel1, > loxAfr1 and oryCun1 assemblies. I can't find the chromInfo files for > these assemblies on your ftp site. > > Could you make them available, or alternatively, could you indicate me > another way to get this information? > > Thanks in advance. > > Emmanuel. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From hiram at soe.ucsc.edu Mon Sep 10 10:09:25 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Mon, 10 Sep 2007 10:09:25 -0700 Subject: [Genome] Looking for g2gOverlap and other BLAT-associated programs In-Reply-To: <000201c7f305$3f0f6cb0$6400a8c0@LAPTOP> References: <000201c7f305$3f0f6cb0$6400a8c0@LAPTOP> Message-ID: <46E57A45.2080205@soe.ucsc.edu> Good Morning David: We do not package the hundreds of kent source tree utilities. 
	http://genome.ucsc.edu/license/
	http://genomewiki.ucsc.edu/index.php/Kent_source_utilities

You can fetch and build the entire source tree:
	http://genome.ucsc.edu/admin/cvs.html
or:
	http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
	http://genome.ucsc.edu/admin/jk-install.html

And the program you are looking for is in the source directory:
	src/hg/gigAssembler/g2gOverlap/
although you may be looking for the newer program: pslFilter
which is in src/hg/pslFilter/

--Hiram

$ pslFilter
pslFilter - filter out psl file
    pslFilter in.psl out.psl
options
    -dir  Input files are directories rather than single files
    -reward=N (default 1) Bonus to score for match
    -cost=N (default 1) Penalty to score for mismatch
    -gapOpenCost=N (default 4) Penalty for gap opening
    -gapSizeLogMod=N (default 1.00) Penalty for gap sizes
    -minScore=N (default 15) Minimum score to pass filter
    -minMatch=N (default 30) Min match (including repeats to pass)
    -minUniqueMatch (default 20) Min non-repeats to pass)
    -maxBadPpt (default 700) Maximum divergence in parts per thousand
    -minAli (default 600) Minimum ratio query in alignment in ppt
    -noHead  Don't output psl header
    -minAliT (default 0) Like minAli for target

David Gloriam wrote:
> Hi,
>
> I found a number of useful BLAT-associated programs in this document in the
> downloads section:
>
> http://genome-test.cse.ucsc.edu/~kent/exe/usage.txt
>
> Unfortunately, I can only find a small subset of these in the download
> folders for various OSs. At the moment I would need "g2gOverlap". Would
> anyone be able to tell me if it is possible to get a hold of these programs
> and if so where?
>
> Thanks,
>
> David

From kayla at soe.ucsc.edu Mon Sep 10 13:25:34 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 10 Sep 2007 13:25:34 -0700 (PDT)
Subject: [Genome] repeats for refseq genes
In-Reply-To:
References:
Message-ID:

Hello Yael,

You are correct: if you start your intersection with the refGene table, you won't be able to get the names of the repeats.
And if you start your intersection with the repeatMasker table, you won't be able to get the names of the refSeqs.

In order to get the information for both of these tables, you can try using Galaxy: http://main.g2.bx.psu.edu/. Galaxy has more extensive data intersection tools. We provide a link in the Table Browser to "send output to Galaxy".

Here is a previously answered mailing list question with some tips on how to use Galaxy:
http://www.soe.ucsc.edu/pipermail/genome/2006-November/012256.html

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Mon, 10 Sep 2007, yael shemla wrote:

> Hello,
> I have a list of refseq genes and Im trying to get a list of repeats that
> match each gene.
> If I use the Table of refseq gene with intersection of repeat master table,
> i don't get the names of the repeats.
> If i use the other way, and make intersection of repeat master with
> refseq-table, I cant enter the list of genes.
> Is there another way to get this data?
>
> Thanks.
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From kayla at soe.ucsc.edu Mon Sep 10 17:26:36 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 10 Sep 2007 17:26:36 -0700 (PDT)
Subject: [Genome] Finding alternative terminal exons .......
In-Reply-To: <6dc7bd4a0709091248r6c628182x5959a9a875d115e0@mail.gmail.com>
References: <6dc7bd4a0709070954r633c3a93jafdc4abca41b8cc1@mail.gmail.com> <46E1B444.20809@cse.ucsc.edu> <6dc7bd4a0709091247i63bd06dev12b37c741518faf3@mail.gmail.com> <6dc7bd4a0709091248r6c628182x5959a9a875d115e0@mail.gmail.com>
Message-ID:

Rileen,

There are a couple of things you can do from here:

1. Notice that in the example output section of Ann's message below,

> chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r

That "uc001cgz.1" is an identifier from the knownGene table.
You can connect this name to a gene symbol by using the table hg18.kgXref. However, this name is just a substring of the name of an exon in your output, so you'd have to parse the name out yourself. That is to say, the information you need is there, but it's difficult to get at.

2. You can use the Galaxy tool to perform more advanced intersection operations. Galaxy is a set of tools created and maintained at Penn State University that works in concert with the UCSC Genome Browser. It is located here: http://main.g2.bx.psu.edu/

Galaxy is capable of intersecting two tables and keeping identifying information from both. Use the "Get Data" link on the left hand side of the page to upload data or to retrieve it from the Genome Browser. You can also use the "send output to Galaxy" link on your existing custom track in the Table Browser. The "Operate on Genomic Intervals" link contains join and intersect tools, which you can use to join information from both your exon custom track data and from the UCSC Genes (knownGene) track, and then intersect that with the kgXref table, thereby keeping information from all tracks in the intersection.

If you have trouble with any of the Galaxy tools, they have a helpdesk as well: galaxy-user at bx.psu.edu

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Sun, 9 Sep 2007, Rileen wrote:

> Hi,
> Thanks for that, I now have the list of all the exons in
> UCSC genes with altFinish entries.
>
> I just need to put this together with a table giving the info on gene
> names/symbols, so that I know all the different transcripts of a
> given gene, and can check for different terminal exons accordingly,
> i.e something similar to the "refFlat" table for RefSeq data.
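
The name parsing mentioned in point 1 of the reply above could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: it assumes the exon names follow the pattern shown in Ann's example (transcript ID, "exon", exon number, score, chromosome, start, and f or r for the strand, joined by underscores) and that the BED lines are whitespace-separated. The function name `terminal_exons` and the regular expression are hypothetical, not part of any UCSC tool. The ending exon is chosen by the rule Ann gives: exon 0 on the reverse strand, the highest-numbered exon on the forward strand.

```python
import re
from collections import defaultdict

# Assumed name layout, taken from the example in the thread:
#   uc001cgz.1_exon_0_0_chr1_41748271_r
NAME_RE = re.compile(
    r"^(?P<tx>.+)_exon_(?P<num>\d+)_\d+_(?P<chrom>.+)_(?P<start>\d+)_(?P<strand>[fr])$"
)


def terminal_exons(bed_lines):
    """Map each transcript ID to the (chrom, start, end) of its ending exon."""
    exons_by_tx = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a BED line with a name column
        match = NAME_RE.match(fields[3])
        if match is None:
            continue  # name does not follow the assumed pattern
        exons_by_tx[match.group("tx")].append(
            (int(match.group("num")), fields[0], int(fields[1]), int(fields[2]),
             match.group("strand"))
        )

    result = {}
    for tx, exons in exons_by_tx.items():
        strand = exons[0][4]
        # Reverse strand: exon 0 is the ending exon; forward strand: highest number.
        _, chrom, start, end, _ = min(exons) if strand == "r" else max(exons)
        result[tx] = (chrom, start, end)
    return result


example = [
    "chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 -",
    "chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 -",
    "chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 -",
]
print(terminal_exons(example))
# {'uc001cgz.1': ('chr1', 41748270, 41749524)}
```

The dictionary keys are the transcript IDs that would then be matched against hg18.kgXref to find the gene symbol for each transcript.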
> > Thanks once again, > Yours, > Rileen > > On 07/09/2007, Ann Zweig wrote: > > Hello Rileen, > > > > Thanks for the compliments on the browser. > > > > You are correct that you are not going to find what you want directly > > from the altFinish items in the Alt Events track. However, you can use > > that track as a starting point to get what you need. Below I outline > > the steps you will need to take to find the ending exon for each transcript. > > > > 1. Make a custom track of the altFinish items from the Alt Events track. > > > > Navigate to the Table Browser ('Tables' from the top blue navigation > > bar) and create a custom track from the Alt Events track. Be sure to > > configure the filter to include on altFinish items from the table: > > "name does match altFinish" > > > > Read more about creating custom tracks using the table browser here: > > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack > > > > > > 2. Intersect that custom track with the UCSC Known Gene track (to make a > > new CT). > > > > Again, using the Table Browser, intersect the UCSC Genes track with your > > custom track from step 1. Create a new custom track. This will be a > > track containing all UCSC Genes that have overlap with altFinish items. > > > > Read more about doing intersections with the Table Browser here: > > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection > > > > > > 3. Get the exons from the resulting track. > > Choose the resulting custom track from step 2 in the Table Browser. > > Choose BED as the output type. From the next page, choose only the > > "Exons" button. Press 'get BED'. This will give you a list (in BED > > format) of each exon in each UCSC Gene from the custom track in step 2. > > > > > > 4. Mine the output for the location of the "last" exon. 
> > The BED output will have the following columns: > > chromosome > > chromStart > > chromEnd > > name > > score > > strand > > > > Here's an example of the 9 exons from one gene in the track: > > > > chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 - > > chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 - > > chr1 41756659 41756746 uc001cgz.1_exon_2_0_chr1_41756660_r 0 - > > chr1 41762992 41763168 uc001cgz.1_exon_3_0_chr1_41762993_r 0 - > > chr1 41813801 41813947 uc001cgz.1_exon_4_0_chr1_41813802_r 0 - > > chr1 41817994 41823576 uc001cgz.1_exon_5_0_chr1_41817995_r 0 - > > chr1 41867006 41867205 uc001cgz.1_exon_6_0_chr1_41867007_r 0 - > > chr1 41939173 41939253 uc001cgz.1_exon_7_0_chr1_41939174_r 0 - > > chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 - > > > > Here's how to interpret the name field: > > > > name: uc001cgz.1_exon_0_0_chr1_41748271_r > > > > uc001cgz.1 gene name > > exon part of gene (all exon in your case) > > 0 exon number > > 0 score > > chr1 chromosome > > 41748271 chromStart > > r strand (r = reverse, f = forward) > > > > So, in the example above, there are 9 exons for gene uc001cgz.1. > > Because it is on the reverse strand, you want exon 0. Conversely, if it > > were on the forward strand, you'd be looking for the exon with the > > highest number. > > > > So, for this transcript, the location of the ending exon is: > > chr1:41748270-41749524 > > > > > > This should get you well on your way. Be sure to write back to the > > list if you get stuck or need more direction. > > > > > > Regards, > > > > ---------- > > Ann Zweig > > UCSC Genome Bioinformatics Group > > http://genome.ucsc.edu > > > > > > > > Rileen wrote: > > > Hi, > > > Thanks once again for the great resource you provide :-) > > > > > > I've played around with the "knownAlt" track a bit, and was wondering > > > whether there's any simple way of deriving a list of alternative terminal > > > exons (ATEs) from it, or any other table/track. 
By this I mean instances > > > where the gene has transcripts ending in different exons, not merely > > > different positions in the same exon. > > > > > > For all the other events in the knownAlt track, the two positions seem > > > to provide useful information, but for the altFinish category, they always > > > seem to differ by 1. > > > > > > Why is this so? > > > > > > I was hoping that given the two positions, one cold check whether they > > > were in two > > > different exons to derive the list of ATEs as a subset of altFinish, > > > but that seems > > > wrong. > > > > > > Looking forward to your reply, > > > Yours, > > > Rileen > > > > > > > > -- > ****************************************************************** > "I know nothing, but i _know_ that." > > Rileen Sinha rileen at yahoo.com > Personal Phone : (0049)3641412276 (cheaper to call) > (0049)17624078373 > ****************************************************************** > > > -- > ****************************************************************** > "I know nothing, but i _know_ that." > > Rileen Sinha rileen at yahoo.com > Personal Phone : (0049)3641412276 (cheaper to call) > (0049)17624078373 > ****************************************************************** > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From Laetitia.Poidevin at titus.u-strasbg.fr Tue Sep 11 05:55:08 2007 From: Laetitia.Poidevin at titus.u-strasbg.fr (Laetitia Poidevin) Date: Tue, 11 Sep 2007 14:55:08 +0200 Subject: [Genome] mouse mm9 Message-ID: <46E6902C.9030804@igbmc.u-strasbg.fr> Hello, In the new version of the mouse (mm9), several tables disappeared (kgAlias, kgProt, kgSpAlias, kgXref, knownGene, rnaRefSeq...). The files have been named differently or suppressed or ... ? 
Thank you for your answer
Best regards
Laetitia Poidevin

From wxzheng_tju at hotmail.com Tue Sep 11 07:25:55 2007
From: wxzheng_tju at hotmail.com (ZhengWenXin)
Date: Tue, 11 Sep 2007 22:25:55 +0800
Subject: [Genome] Help!
Message-ID:

Dear Prof.,
Would you please help me to resolve a problem about the conservation track of the UCSC genome browser?
I'm interested in the conservation of the human genome with other species. As described in the Methods part, pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. For example, how can I get the best alignment of the human genome and the mouse genome? If I use the blastz software to do it on a PC, how long will it take? Would you please give me an estimate? I also want to know the hardware environment in which you perform the pairwise alignments.
Thank you very much! Your help will be greatly appreciated.
Best wishes,
WenXin Zheng

Wen-Xin Zheng, PhD candidate
Bioinformatics Center
Tianjin University
Tianjin 300072
China
Fax: +86-22-27402697
Website: http://tubic.tju.edu.cn

From kuhn at soe.ucsc.edu Tue Sep 11 09:18:12 2007
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Tue, 11 Sep 2007 09:18:12 -0700
Subject: [Genome] mouse mm9
Message-ID: <200709111618.JAA03497@moondance.cse.ucsc.edu>

Dear Laetitia,

The tables you named are still under development. They should be available for the latest mouse assembly in a few weeks. Some of them are becoming available on our test server, genome-test.cse.ucsc.edu, but they have not yet been through our QA process and may not be in their final form.

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

> From genome-bounces at soe.ucsc.edu Tue Sep 11 09:02:46 2007
> To: genome at soe.ucsc.edu
> Subject: [Genome] mouse mm9
>
> Hello,
> In the new version of the mouse (mm9), several tables disappeared
> (kgAlias, kgProt, kgSpAlias, kgXref, knownGene, rnaRefSeq...).
> The files have been named differently or suppressed or ... ?
> Thank you for your answer
> Best regards
> Laetitia Poidevin
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From archanat at soe.ucsc.edu Tue Sep 11 09:23:33 2007
From: archanat at soe.ucsc.edu (Archana Thakkapallayil)
Date: Tue, 11 Sep 2007 09:23:33 -0700
Subject: [Genome] mouse mm9
In-Reply-To: <46E6902C.9030804@igbmc.u-strasbg.fr>
References: <46E6902C.9030804@igbmc.u-strasbg.fr>
Message-ID: <46E6C105.1080902@soe.ucsc.edu>

Hello Laetitia,

We have released the mm9 assembly with a minimum set of fundamental tracks. Development of the other tracks is in progress, and we are adding each one as it is completed. At this time, this assembly does not contain the Known Genes track. These tables will be added when we complete the development of the Known Genes track. Until then, please use these tables on the previous assembly (mm8).

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Regards,
Archana
UCSC Genome Bioinformatics Group

Laetitia Poidevin wrote:
> Hello,
> In the new version of the mouse (mm9), several tables disappeared
> (kgAlias, kgProt, kgSpAlias, kgXref, knownGene, rnaRefSeq...).
> The files have been named differently or suppressed or ... ?
> Thank you for your answer > Best regards > Laetitia Poidevin > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From dzhang at burnham.org Tue Sep 11 09:41:58 2007 From: dzhang at burnham.org (dongxian zhang) Date: Tue, 11 Sep 2007 09:41:58 -0700 Subject: [Genome] sequence search Message-ID: Hi, I am searching the BAC clone information for hypothetical protein XM_001471686 in your Genome Browser. The search result located the protein to Chromosome 12. However, the same clone was identified by NCBI Map Viewer in Chromosome 4. Because this protein matched a genomic contig sequence NT_039268, I used NT_039268 to search your Genome browser. This time I matched to Chromosome 4. However, I noticed the sequence number for the beginning and end of NT_039268 were different between your and NCBI genome viewer. Also, if I took the 100K sequence from NT_039268 that contains XM_001471686 to search your Genome Browser using BLAT, I could not match it to chromosome 4. Could you explain all these? How can I identify the right BAC clone that contains XM_001471686? Your help will be greatly appreciated. Sincerely, Dongxian Dongxian Zhang, Ph.D. Associate Professor Burnham Institute for Medical Research The Neuroscience and Aging Center 10901 North Torrey Pines Road La Jolla, CA 92037 USA Tel # (858)-795-5263 Fax# (858)-795-5292 Web: http://www.burnham.org/default.asp?contentID=194 This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. From hiram at soe.ucsc.edu Tue Sep 11 10:01:24 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Tue, 11 Sep 2007 10:01:24 -0700 Subject: [Genome] Help! 
In-Reply-To:
References:
Message-ID: <46E6C9E4.5030609@soe.ucsc.edu>

Good Morning WenXin Zheng:

Please use the Mouse Chain and Net tracks on the Human genome browser to observe the alignment between the Mouse and Human sequences.

The blastz operation between Mouse and Human is performed here on a supercomputer with 394 CPUs (AMD Opterons at 2 GHz) with 4 GB of memory. The operation is typically broken into approximately 100,000 separate instances of blastz, each job running for an average time of about 9 minutes. That is roughly 900,000 CPU-minutes, or a total run time of about 1.7 years on a single CPU.

--Hiram

ZhengWenXin wrote:
> Dear Prof.,
> Would you please help me to resolve a problem about the conservation track of the UCSC genome browser?
> I'm interested in the conservation of the human genome with other species. As described in the Methods part, pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. For example, how can I get the best alignment of the human genome and the mouse genome? If I use the blastz software to do it on a PC, how long it will take? Would you please give me an estimate? And I also want to know you hardware environment in which you perform the pairwise alignments.
> Thank you very much! Your help will be greatly appreciated.
> Best wishes,
> WenXin Zheng
>
> Wen-Xin Zheng, PhD candidate
> Bioinformatics Center
> Tianjin University
> Tianjin 300072
> China
> Fax: +86-22-27402697
> Website: http://tubic.tju.edu.cn

From hiram at soe.ucsc.edu Tue Sep 11 10:15:48 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Tue, 11 Sep 2007 10:15:48 -0700
Subject: [Genome] sequence search
In-Reply-To:
References:
Message-ID: <46E6CD44.4000605@soe.ucsc.edu>

Good Morning Dongxian:

Please note the differences in the versions of Mouse you are working with here.
The NCBI Map viewer is currently showing NCBI Build 37. The UCSC Genome browser current default mouse display is NCBI Build 36. You can view the NCBI Build 37 sequence in the UCSC Genome browser. Go to the gateway browser page, http://genome.ucsc.edu/cgi-bin/hgGateway and select the July 2007 Mouse genome (UCSC mm9) When you blat your XM_001471686 sequence to the mm9 sequence, it is found exactly at chr4:155,573,561-155,574,905 Which is contig NT_039268.5 --Hiram dongxian zhang wrote: > Hi, > > I am searching the BAC clone information for hypothetical protein > XM_001471686 in your Genome Browser. The search result located the > protein to Chromosome 12. However, the same clone was identified by > NCBI Map Viewer in Chromosome 4. Because this protein matched a > genomic contig sequence NT_039268, I used NT_039268 to search your > Genome browser. This time I matched to Chromosome 4. However, I > noticed the sequence number for the beginning and end of NT_039268 > were different between your and NCBI genome viewer. Also, if I took > the 100K sequence from NT_039268 that contains XM_001471686 to search > your Genome Browser using BLAT, I could not match it to chromosome > 4. Could you explain all these? How can I identify the right BAC > clone that contains XM_001471686? Your help will be greatly > appreciated. > > Sincerely, > > Dongxian > > Dongxian Zhang, Ph.D. > Associate Professor > Burnham Institute for Medical Research > The Neuroscience and Aging Center > 10901 North Torrey Pines Road > La Jolla, CA 92037 > USA > Tel # (858)-795-5263 > Fax# (858)-795-5292 > Web: http://www.burnham.org/default.asp?contentID=194 From kate at soe.ucsc.edu Tue Sep 11 10:34:00 2007 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Tue, 11 Sep 2007 10:34:00 -0700 Subject: [Genome] Help! In-Reply-To: References: Message-ID: <46E6D188.6010605@soe.ucsc.edu> Hello WenXin Zheng, If you want to obtain pairwise alignments for the human genome vs. 
other species in our conservation track, you can do this by downloading the pairwise alignments we used to create the multiple alignment, which are available here: http://hgdownload.cse.ucsc.edu/downloads.html#human These alignments are single-coverage (best alignments of other genome to human), that result from our chaining & netting processes applied to the blastz alignments. An alternative is to extract the pairwise alignments from the multiple alignment. You can do this by downloading the multiple alignment, eg. from: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way/ together with some simple scripting to filter the file. Another option is to use the Galaxy tool provided by the Penn State bioinformatics group -- there is a link to Galaxy from the menu bar on the Genome Browser home page. Blastz is developed and supported by the Penn State group: http://www.bx.psu.edu At UCSC we perform whole-genome blastz alignments on our high-performance computing clusters. If you want to align your own sequences in your local environment, you will want to download the blastz package from Penn State. They may also have suggestions regarding hardware configuration. Hope this helps, Kate --- Kate Rosenbloom UCSC Genome Bioinformatics ZhengWenXin wrote: > Dear Prof., > Would you please help me to resolve a problem about the conservation track of the UCSC genome browser? > I?m interested in the conservation of the human genome with other species. As described in the Methods part, pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. For example, how can I get the best alignment of the human genome and the mouse genome? If I use the blastz software to do it on a PC, how long it will take? Would you please give me an estimate? And I also want to know you hardware environment in which you perform the pairwise alignments. > Thank you very much! Your help will be greatly appreciated. 
> Best wishes, > WenXin Zheng > > > > > Wen-Xin Zheng, PhD candidate > Bioinformatics Center > Tianjin University > Tianjin 300072 > China > Fax: +86-22-27402697 > Website: http://tubic.tju.edu.cn > _________________________________________________________________ > ????? MSN ?????????? > http://mobile.msn.com.cn/ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From archie_russell at merck.com Tue Sep 11 11:01:09 2007 From: archie_russell at merck.com (Russell, Archie) Date: Tue, 11 Sep 2007 11:01:09 -0700 Subject: [Genome] Ucsc known genes track for cow In-Reply-To: <46E6902C.9030804@igbmc.u-strasbg.fr> References: <46E6902C.9030804@igbmc.u-strasbg.fr> Message-ID: <23B0A4FBD181A44D9B89C4FB3E96D594A22FCD@ussemx1100.merck.com> Hi, Do you have any plans to create the Ucsc Genes track for cow? Also, how does the the Ucsc Genes method compare with NCBI's XMs and the ensembl pipeline? It looks like UCSC Genes is based on RNA evidence where the others might not be, is that correct? Thanks, Archie ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. 
------------------------------------------------------------------------------ From archanat at soe.ucsc.edu Tue Sep 11 11:50:21 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 11 Sep 2007 11:50:21 -0700 Subject: [Genome] Ucsc known genes track for cow In-Reply-To: <23B0A4FBD181A44D9B89C4FB3E96D594A22FCD@ussemx1100.merck.com> References: <46E6902C.9030804@igbmc.u-strasbg.fr> <23B0A4FBD181A44D9B89C4FB3E96D594A22FCD@ussemx1100.merck.com> Message-ID: <46E6E36D.3060805@soe.ucsc.edu> Hello Archie, Unfortunately, we don't have any plans to create the UCSC Genes track for cow. We are sorry about that. You can read more about our UCSC Genes track here: http://www.genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=knownGene I hope this information is helpful. Please let us know if you have further questions. Regards, Archana UCSC Genome Bioinformatics Group Russell, Archie wrote: > > Hi, > > Do you have any plans to create the Ucsc Genes track for cow? > > Also, how does the the Ucsc Genes method compare with NCBI's XMs and the > ensembl pipeline? It looks like UCSC Genes is based on RNA evidence > where the others might not be, is that correct? > > Thanks, > Archie > > > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. 
If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From Jing.Ma at STJUDE.ORG Tue Sep 11 12:21:17 2007 From: Jing.Ma at STJUDE.ORG (Ma, Jing) Date: Tue, 11 Sep 2007 14:21:17 -0500 Subject: [Genome] adjust the thickness of the custom track display Message-ID: Hi, I'm trying to display some genomic segments using a custom track. I can display them following the format without any problem. My question is, is there any way that I can adjust the thickness of the rectangles representing my segments (I hope to make them thinner than the default look)? Thanks, Jing Ma St. Jude Children's Research Hospital From archanat at soe.ucsc.edu Tue Sep 11 14:24:36 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 11 Sep 2007 14:24:36 -0700 Subject: [Genome] adjust the thickness of the custom track display In-Reply-To: References: Message-ID: <46E70794.6040701@soe.ucsc.edu> Hello Jing, Unfortunately, there isn't any option to adjust the thickness of the items displayed in the custom track. However, the thickness of the items gets adjusted when you change the "text size" on the 'Configure' page. To do this, choose "configure" from the browser display page, then change the size of the text from the drop-down menu. Also, I would like to mention bed-12 type tracks, which draw the part of each item within the thickStart-thickEnd region thicker than the rest; this is typically used to show UTR regions as narrow items. See this page for help on formatting a BED file: http://www.genome.ucsc.edu/goldenPath/help/customTrack.html#BED I hope this information is helpful to you. Please let us know if you have further questions. 
Regards, Archana UCSC Genome Bioinformatics Group Ma, Jing wrote: > Hi, > > > > I'm trying to display some genomic segments using custom track. I can > display them following the format without any problem. My question is, > is there any way that I can adjust the thickness of the rectangles > representing my segments (I hope to make them thinner than the default > look)? > > > > Thanks, > > > > Jing Ma > > St. Jude Children's Research Hospital > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From wxzheng_tju at hotmail.com Tue Sep 11 20:38:38 2007 From: wxzheng_tju at hotmail.com (ZhengWenXin) Date: Wed, 12 Sep 2007 11:38:38 +0800 Subject: [Genome] Thanks for your help! Message-ID: Hello Hiram and Kate, Thank you very much for your timely and detailed guidance, which will help me greatly with my study. Best wishes, WenXin Zheng Wen-Xin Zheng, PhD candidate Bioinformatics Center Tianjin University Tianjin 300072 China Fax: +86-22-27402697 Website: http://tubic.tju.edu.cn From yuhsuanl at umich.edu Wed Sep 12 11:09:27 2007 From: yuhsuanl at umich.edu (Lin Yu-Hsuan) Date: Wed, 12 Sep 2007 14:09:27 -0400 Subject: [Genome] Chain file for conversion of hg15 coordinates to hg18? Message-ID: <20070912140927.88one1dz40sw0ksw@web.mail.umich.edu> To whom it may concern, I was trying to convert hg15 coordinates to hg18 but found no such liftOver chain file under hg15 directory. I wonder if it is possible to get the file needed to convert hg15 coordinates to hg18 from you. Thank you very much. Best regards, Yu-Hsuan From rhead at soe.ucsc.edu Wed Sep 12 12:15:26 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 12 Sep 2007 12:15:26 -0700 Subject: [Genome] Chain file for conversion of hg15 coordinates to hg18? 
In-Reply-To: <20070912140927.88one1dz40sw0ksw@web.mail.umich.edu> References: <20070912140927.88one1dz40sw0ksw@web.mail.umich.edu> Message-ID: <46E83ACE.6050001@soe.ucsc.edu> Hello Yu-Hsuan, We do not have plans to make a liftOver file for hg15 -> hg18. However, you can instead do a "double-lift" to go from hg15 coordinates to hg18 coordinates. First lift from hg15 -> hg17 (May 2004), then lift from hg17 -> hg18 (March 2006). I hope this solution works for you. Please let us know if you have further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Lin Yu-Hsuan wrote: > To whom it may concern, > > I was trying to convert hg15 coordinates to hg18 but found no such liftOver > chain file under hg15 directory. I wonder if it is possible to get the > file needed to convert hg15 coordinates to hg18 from you. Thank you very much. > > Best regards, > > Yu-Hsuan > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Johanne.Duhaime at ircm.qc.ca Thu Sep 13 07:11:36 2007 From: Johanne.Duhaime at ircm.qc.ca (Duhaime Johanne) Date: Thu, 13 Sep 2007 10:11:36 -0400 Subject: [Genome] Yeast genome assembly Message-ID: <96C071542ED58D49BC08210D3456D5808612BA@pandore.ircm.priv> Is there a new assembly planned for the yeast genome? Our genomic facility has developed a multi-organism application creating BED and WIG files for data visualization. Since our users working on yeast also want to benefit the very useful and powerful tools available in your Browser and its Galaxy extension, but considering that the only assembly for the yeast is from 2003, they ask us to create a 2007-like assembly. Instead of doing so only for us, is there a possibility to add this new assembly for all the UCSC's yeast users with your help? 
Thank you Johanne Duhaime duhaimj at ircm.qc.ca From Vidar.Blikstad at medsci.uu.se Thu Sep 13 04:25:53 2007 From: Vidar.Blikstad at medsci.uu.se (Vidar Blikstad) Date: Thu, 13 Sep 2007 13:25:53 +0200 Subject: [Genome] EST sequences Message-ID: <20070913132553.p7fyi0vsbkggkss0@webmail3.uu.se> Hello! I'm a user of the Table Browser. Can you help me obtain EST sequences from a UniGene library (e.g. 2NbHMSP)? How can I display these in the Genome Browser? Sincerely Vidar Blikstad Uppsala university From itot at tll.org.sg Wed Sep 12 23:05:38 2007 From: itot at tll.org.sg (Toshiro Ito) Date: Thu, 13 Sep 2007 14:05:38 +0800 Subject: [Genome] result output problem? Message-ID: Hello, I am trying to see H3K27me3 and LHP1 status in the Arabidopsis genome. When I change the magnification, the information seems to be lost. For example, at a size of 9,000 bp, I see a strong mark for H3K27me3 and binding of LHP1, http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks?hgsid=4053&hgt.in1=1.5x&position=chr5%3A3169750-3183249 but when I zoom in or out, the H3K27 and LHP1 results become flat. I tried several other genes which should have these modifications, but they did not show any enrichment (zero on the Y-axis). Could you help with this? Thank you, Toshiro Ito From kayla at soe.ucsc.edu Thu Sep 13 11:29:42 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 13 Sep 2007 11:29:42 -0700 Subject: [Genome] Yeast genome assembly In-Reply-To: <96C071542ED58D49BC08210D3456D5808612BA@pandore.ircm.priv> References: <96C071542ED58D49BC08210D3456D5808612BA@pandore.ircm.priv> Message-ID: <46E98196.4010400@cse.ucsc.edu> Johanne, We don't currently have plans to update the existing yeast assembly on our website. We may consider updating it in the future if our scientific advisory board decides that it is a priority. You might want to consider running a mirror of our Genome Browser and adding the latest yeast assembly to that. 
Details for how to set up a mirror are here: http://genome.ucsc.edu/admin/mirror.html Thanks, Kayla Smith UCSC Genome Bioinformatics Group Duhaime Johanne wrote: > Is there a new assembly planned for the yeast genome? > > Our genomic facility has developed a multi-organism application creating > BED and WIG files for data visualization. Since our users working on > yeast also want to benefit the very useful and powerful tools available > in your Browser and its Galaxy extension, but considering that the only > assembly for the yeast is from 2003, they ask us to create a 2007-like > assembly. Instead of doing so only for us, is there a possibility to add > this new assembly for all the UCSC's yeast users with your help? > > Thank you > > Johanne Duhaime > duhaimj at ircm.qc.ca > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From hiram at soe.ucsc.edu Thu Sep 13 11:38:39 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 13 Sep 2007 11:38:39 -0700 Subject: [Genome] result output problem? In-Reply-To: References: Message-ID: <46E983AF.8010506@soe.ucsc.edu> Good Morning Toshiro: Something is unusual with the data as it is loaded in these tracks. I suspect different "spans" of data have been mixed together, or the .wib file is not the correct file. Can you examine the table H3K27triM_origUniqHMMbpmap in the database and verify that it has only a single value of span. Also, verify that the .wib file referenced in the database table is the correct file for the table. Something is out of sync with this data. --Hiram Toshiro Ito wrote: > Hello, > I am trying to see H3K27me3 and LHP1 status in the Arabidopsis genome. > When I change the magnification, the information seems lost. 
> For example, size 9,000 bp, I see strong mark for H3K27me3 and binding of LHP1, > http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks?hgsid=4053&hgt.in1=1.5x&position=chr5%3A3169750-3183249 > but when I zoom-in or out, H3K27 and LHP1 results become flat. > > I tried several other genes which should have these modification, but > it did not show any enrichment (zero for Y-axis). > > Could you help this out? > Thank you, > > Toshiro Ito From kayla at soe.ucsc.edu Thu Sep 13 11:40:36 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 13 Sep 2007 11:40:36 -0700 Subject: [Genome] result output problem? In-Reply-To: References: Message-ID: <46E98424.2000803@cse.ucsc.edu> Hello Toshiro, The link you've sent us is to a mirror site at UCLA. I recommend contacting them about any questions you have about their data. It looks like their contact information is: matteop at mcdb.ucla.edu I hope this is helpful to you. Kayla Smith UCSC Genome Bioinformatics Group Toshiro Ito wrote: > Hello, > I am trying to see H3K27me3 and LHP1 status in the Arabidopsis genome. > When I change the magnification, the information seems lost. > For example, size 9,000 bp, I see strong mark for H3K27me3 and binding of LHP1, > http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks?hgsid=4053&hgt.in1=1.5x&position=chr5%3A3169750-3183249 > but when I zoom-in or out, H3K27 and LHP1 results become flat. > > I tried several other genes which should have these modification, but > it did not show any enrichment (zero for Y-axis). > > Could you help this out? 
> Thank you, > > Toshiro Ito > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ChinYi_Chu at URMC.Rochester.edu Thu Sep 13 13:14:28 2007 From: ChinYi_Chu at URMC.Rochester.edu (Chu, ChinYi) Date: Thu, 13 Sep 2007 16:14:28 -0400 Subject: [Genome] Question regarding how to view GeneChip Mouse Tiling Arrays data using UCSC Genome Browser Message-ID: <50FAE284F91C964DB5ED44B75AE7D717DC4BD8@e2k3ms3.urmc-sh.rochester.edu> Hi! Affymetrix technical support told me that I can use UCSC Genome Browser to view Affymetrix tiling array data. I went through the web site but I couldn't find a way to do it... could you please give me some hints? Thank you! Sincerely, Chin-Yi ------------------------------------------ Chin-Yi Chu Bioinformatics Service Functional Genomics Center University of Rochester 211 Bailey Road, Suite 1 West Henrietta, NY14586 Office: (585)276-9988 Functional Genomics Center: (585)427-9334 Email: chinyi_chu at urmc.rochester.edu URL: http://fgc.urmc.rochester.edu ------------------------------------------ From sdavis2 at mail.nih.gov Thu Sep 13 13:43:34 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 13 Sep 2007 16:43:34 -0400 Subject: [Genome] Question regarding how to view GeneChip Mouse Tiling Arrays data using UCSC Genome Browser In-Reply-To: <50FAE284F91C964DB5ED44B75AE7D717DC4BD8@e2k3ms3.urmc-sh.rochester.edu> References: <50FAE284F91C964DB5ED44B75AE7D717DC4BD8@e2k3ms3.urmc-sh.rochester.edu> Message-ID: <46E9A0F6.7080105@mail.nih.gov> Chu, ChinYi wrote: > Hi! > > > > Affymetrix technical support told me that I can use UCSC Genome Browser > to view Affymetrix tiling array data. I went through the web site but I > couldn't find a way to do it... could you please give me some hints? > Thank you! 
> They may have also meant to use the Affymetrix Integrated Genome Browser: http://www.affymetrix.com/support/developer/tools/download_igb.affx Just a suggestion. Sean From kayla at soe.ucsc.edu Thu Sep 13 14:04:01 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 13 Sep 2007 14:04:01 -0700 Subject: [Genome] Question regarding how to view GeneChip Mouse Tiling Arrays data using UCSC Genome Browser In-Reply-To: <50FAE284F91C964DB5ED44B75AE7D717DC4BD8@e2k3ms3.urmc-sh.rochester.edu> References: <50FAE284F91C964DB5ED44B75AE7D717DC4BD8@e2k3ms3.urmc-sh.rochester.edu> Message-ID: <46E9A5C1.10905@cse.ucsc.edu> Hello, ChinYi: Yes, we have some Affymetrix data on the Genome Browser. The data we have is on human, mouse, rat, zebrafish and drosophila. We have both alignment tracks where we align consensus sequences to the genome, and data tracks, which show actual experimental data in those regions. An easy way to find Affymetrix tracks on our website is to open a browser and scroll down to the "Expression and Regulation" track controls. To get you started, here is a session for human (hg18) with a few Affymetrix tracks turned on: http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=hg18%2Daffymlq I hope this is helpful to you. Please don't hesitate to contact us again if we can be of further assistance. Kayla Smith UCSC Genome Bioinformatics Group Chu, ChinYi wrote: > Hi! > > > > Affymetrix technical support told me that I can use UCSC Genome Browser > to view Affymetrix tiling array data. I went through the web site but I > couldn't find a way to do it... could you please give me some hints? > Thank you! 
> > > > Sincerely, > > Chin-Yi > > > > ------------------------------------------ > > Chin-Yi Chu > > Bioinformatics Service > > Functional Genomics Center > > University of Rochester > > 211 Bailey Road, Suite 1 > > West Henrietta, NY14586 > > Office: (585)276-9988 > > Functional Genomics Center: (585)427-9334 > > Email: chinyi_chu at urmc.rochester.edu > > > URL: http://fgc.urmc.rochester.edu > > ------------------------------------------ > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Thomas.Lacroix at UCHSC.edu Thu Sep 13 14:34:12 2007 From: Thomas.Lacroix at UCHSC.edu (Thomas.Lacroix at UCHSC.edu) Date: Thu, 13 Sep 2007 15:34:12 -0600 Subject: [Genome] problem discrepancy Comparative genomics Message-ID: Hi, We are downloading the "mouse chain" track from the "Comparative genomics" group of the human Mar 2006 assembly for the whole of chromosome 21, and we aim to look at what portion of the genome is conserved in mouse, but there seems to be a discrepancy between the data in the downloaded table and what is displayed in the genome browser. I manually checked some tStart and tEnd values from the downloaded table and they don't correspond to what is displayed in the genome browser when I turn on the chain or net track! In other words, they do not match the boxes that represent conservation in the browser... Can you help? Thomas From rhead at soe.ucsc.edu Thu Sep 13 18:08:08 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 13 Sep 2007 18:08:08 -0700 Subject: [Genome] EST sequences In-Reply-To: <20070913132553.p7fyi0vsbkggkss0@webmail3.uu.se> References: <20070913132553.p7fyi0vsbkggkss0@webmail3.uu.se> Message-ID: <46E9DEF8.3000006@soe.ucsc.edu> Hello Vidar, To display ESTs from a particular library in the Genome Browser, first turn on the "Human ESTs" track (in the "mRNA and EST Tracks" section). 
Then go to the track control page, either by hitting the small button on the far left-hand side of the track, or by clicking the blue "Human ESTs" link right above the track control. The track control page contains several options for filtering the data displayed in the EST track, one of which is "library". Enter the library name of interest in the box. You may need to add wildcard characters to the library names, as we store longer names than your example in our database. For instance, the library name "2NbHMSP" used as a filter term yields no results, but the name "*2NbHMSP" matches these two libraries: Soares 2NbHMSP Soares_multiple_sclerosis_2NbHMSP If you click on an EST in the Genome Browser, you will be taken to a details page, where there is a link to the EST sequence. To get a lot of EST sequences at once, you will need to download the entire file of EST sequences that we get from GenBank. (The Table Browser can be used to retrieve genomic sequence from the areas where ESTs align, but the EST sequence itself is not stored in a table and is not available via the Table Browser.) The file is located here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/est.fa.gz Be aware that this is a very large file (~1.4 GB). The file is in FASTA format. The Table Browser *could* be used to get a list of the EST accession numbers in a particular library, but due to the huge size of the all_est table and the complexity of the query, it is very time-consuming to use it for this purpose. It is much easier to use our public MySQL database to obtain the information: http://genome.ucsc.edu/FAQ/FAQdownloads#download29 Here is an example MySQL query that will get you a list of accessions in the second library listed above: mysql> SELECT gbCdnaInfo.acc FROM gbCdnaInfo, library WHERE gbCdnaInfo.type='est' AND gbCdnaInfo.library=library.id AND library.name='Soares_multiple_sclerosis_2NbHMSP'; I hope this information is helpful. 
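Once you have a list of accessions from a query like the one above and the downloaded FASTA file, pulling out the matching sequences could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not a UCSC tool: the function name filter_fasta is made up for this example, the record names in the demo are placeholders rather than real accessions, and it assumes the accession is the first whitespace-delimited token on each ">" header line, which should be checked against the actual file.

```python
def filter_fasta(lines, wanted):
    """Yield the lines of every FASTA record whose ID is in `wanted`.

    `lines` is any iterable of text lines; `wanted` is a set of IDs.
    Assumption: the ID is the first whitespace-delimited token of the header.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Decide once per record whether its sequence lines are kept.
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    # Tiny in-memory demo with placeholder record names.
    demo = [">EST_A 1\n", "ACGTACGT\n", ">EST_B 1\n", "GGCCTTAA\n", "TTTT\n"]
    print("".join(filter_fasta(demo, {"EST_B"})), end="")
```

For the real file, the lines could be streamed with gzip.open("est.fa.gz", "rt") from the standard library so that the whole file never has to be held in memory.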
If you have further questions, please feel free to contact this mailing list again. -- Brooke Rhead UCSC Genome Bioinformatics Group Vidar Blikstad wrote: > Hello! > > I'm a user of the table browser - can you help me to obtain EST > sequences from a UniGene library (i.e.2NbHMSP). How to display these > at the genome browser? > > Sincerely > > Vidar Blikstad > Uppsala university > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ko.pang.cn at gmail.com Fri Sep 14 14:43:17 2007 From: ko.pang.cn at gmail.com (Pang Ko) Date: Fri, 14 Sep 2007 14:43:17 -0700 Subject: [Genome] Difference between Genome Browser and Table Browser output Message-ID: <178f32960709141443x68ab85e2v435854f305650217@mail.gmail.com> Hi, I am interested in the location chr12:64912201-64912645 in the human genome (hg17). I am using the Genome Browser with Genome: Human Assembly: May 2004 and set the Conservation track to full. When I click on the conservation track and choose rat in the multiple alignment, it brings me to rat genome location chr7:59446696-59,444,569 (rn3). When I use the Table Browser and choose Genome: Human Assembly: May 2004 Group: Comparative Genomics Track: Conservation table: multiz17way Position: chr12:64912201-64912645 The output of the Table Browser shows an alignment with rn3 location chr7:83636219-83638448 My question is: why are the two outputs different? (One shows an alignment with chr7:59446696-59444569 and the other shows chr7:83636219-83638448 in the same assembly.) Did I do something wrong? Thank you, Ko. 
From rhead at soe.ucsc.edu Fri Sep 14 16:23:02 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 14 Sep 2007 16:23:02 -0700 Subject: [Genome] Difference between Genome Browser and Table Browser output In-Reply-To: <178f32960709141443x68ab85e2v435854f305650217@mail.gmail.com> References: <178f32960709141443x68ab85e2v435854f305650217@mail.gmail.com> Message-ID: <46EB17D6.7070706@soe.ucsc.edu> Hello Ko, You did not do anything wrong; there is just a difference in formats in these two locations. The Table Browser outputs conservation data in MAF format, described here: http://genome.ucsc.edu/FAQ/FAQformat#format5 The numbering of coordinates on the negative strand in MAF files is relative to the other end of the chromosome (that is, it is relative to the reverse-complement of the positive strand). So, to calculate Genome Browser coordinates from MAF coordinates, you need to subtract each start that occurs on the negative strand from the total chromosome size. For example, the first conservation track output line for rat in the Table Browser (in the human (hg17) region chr12:64912201-64912645) looks like this: s rn3.chr7 83636219 54 - 143082968 TCATT... Which corresponds to: source: rn3.chr7 start: 83636219 size: 54 strand: - source size: 143082968 To convert these numbers to Genome Browser coordinates, first subtract the start from the source size: 143082968 - 83636219 = 59446749 This is the right-most coordinate displayed in the browser. To get the left-most coordinate, subtract 54 from this number: 59446749 - 54 = 59446695 This is almost the exact same coordinate you see in the Genome Browser. However, because MAF coordinates are zero-based and Browser coordinates are one-based, you need to add 1 to the start coordinate, so the final coordinate range becomes: chr7:59446696-59446749 which matches the first alignment block shown in the Genome Browser. I hope this explanation helps. 
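The arithmetic above can be written as a short Python sketch. This is only an illustration of the conversion as described in this message, not code from the Genome Browser source: the function name maf_to_browser and its argument order are made up for the example, and it assumes the values are taken from an "s" line of MAF output in the order source, start, size, strand, source size.

```python
def maf_to_browser(src, start, size, strand, src_size):
    """Convert the fields of one MAF "s" line to a browser position string.

    MAF starts are zero-based; on the "-" strand they are counted from the
    other end of the chromosome (the reverse complement).
    """
    if strand == "-":
        end = src_size - start         # right-most coordinate
        zero_based_start = end - size  # left-most coordinate, still zero-based
    elif strand == "+":
        zero_based_start = start
        end = start + size
    else:
        raise ValueError("strand must be '+' or '-'")
    # "rn3.chr7" -> "chr7"; a name without a database prefix is kept as-is.
    chrom = src.split(".", 1)[-1]
    # Browser positions are one-based, so only the start is shifted by 1.
    return f"{chrom}:{zero_based_start + 1}-{end}"


if __name__ == "__main__":
    # The rat line from the example above.
    print(maf_to_browser("rn3.chr7", 83636219, 54, "-", 143082968))
    # prints chr7:59446696-59446749
```

For positive-strand lines no flip is needed, so only the zero-based to one-based shift of the start applies.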
If you have further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Pang Ko wrote: > Hi, > > I am interested in the location chr12:64912201-64912645 in human genome (hg17) > > I am using the Genome Browser with > Genome: Human > Assembly: May 2004 > and choose Track Conservation to be full. Then I click on the > conservation track and then choose rat in the multiple alignment, it > will bring me to rat genome location chr7:59446696-59,444,569 (rn3). > > When I use Table Browser and choose > Genome: Human > Assembly: May 2004 > Group: Comparative Genomics > Track: Conservation > table: multiz17way > Position: chr12:64912201-64912645 > > The output of table browser shows an alignment with rn3 location > chr7:83636219-83638448 > > My question is why are the two output different? (one shows > alignment with chr7:59446696-59444569 and the other shows > chr7:83636219-83638448 in the same assembly) Did I do something wrong? > > Thank you, > Ko. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From sachikoreed at gmail.com Fri Sep 14 21:12:10 2007 From: sachikoreed at gmail.com (sachiko reed) Date: Fri, 14 Sep 2007 21:12:10 -0700 Subject: [Genome] Genome Bioinformatics question Message-ID: Hi, My name is Sachiko Reed and I'm a third year PhD student in Sociology at UCSC and recent fellow with CBSE. I'm working with Professor Jenny Reardon on implications of genomic research and mixed race identity... I would love to learn more about the Genome Bioinformatics. I will be on campus October 3 and was wondering if there is someone I can meet with to learn more about your browser. I look forward to hearing from you! Sincerely, Sachiko 415.823.2330 From rjfeldma at globaldeterminants.com Mon Sep 17 08:01:01 2007 From: rjfeldma at globaldeterminants.com (Richard J. 
Feldmann) Date: Mon, 17 Sep 2007 11:01:01 -0400 Subject: [Genome] Venter Genome Message-ID: Do you have any sense of whether and when the J. Craig Venter genome presented recently in PLoS will be available at UCSC? -- ----------------------------------------------------------------- Richard J. Feldmann (v) 301-926-0921 Global Determinants, Inc. (c) 301-526-8524 17800 Mill Creek Dr. Derwood, Maryland 20855-1019 rjfeldma at globaldeterminants.com ----------------------------------------------------------------- From Alexandre.Blais at uOttawa.ca Sat Sep 15 10:06:13 2007 From: Alexandre.Blais at uOttawa.ca (Alexandre Blais) Date: Sat, 15 Sep 2007 13:06:13 -0400 Subject: [Genome] Using Tables with custom tracks Message-ID: Hello UCSC team, I am experiencing a problem using Tables with my own custom tracks. My goal is to get genomic sequence for a bunch of genomic coordinates. I am uploading the coordinates as custom tracks. So under "Tables", my approach is to select "sequence output", and choose the following: Group: Custom tracks Track: User track Table: ct_UserTrack And enter the names of my tracks there. When I click on "paste list", I get the following message: "Can't start query: describe ct_UserTrack mySQL error 1146: Table 'hg18.ct_UserTrack' doesn't exist" What am I doing wrong? I double checked to make sure I am looking in the same assembly I uploaded the tracks in. I can see the entries of my custom tracks just fine, in the browser... I am assuming I don't have to intersect or create any filter. I tried with various genome assemblies, and I get the same message in all cases. I am afraid I am doing something completely wrong... Thanks much for your help! 
Alex From jlu at bio.fsu.edu Sat Sep 15 19:16:59 2007 From: jlu at bio.fsu.edu (jlu at bio.fsu.edu) Date: Sat, 15 Sep 2007 22:16:59 -0400 Subject: [Genome] synteny map Message-ID: <20070915221659.3n1cy7p4qockg4ow@webmail.bio.fsu.edu> Dear UCSC, I have been studying synteny maps between species (human vs mouse) but didn't find anything that gives something as simple as what I attached here. The net/chain files are quite complicated and don't seem to do this. Could you please give me a hint as to whether this kind of data is available, where to look for it, or how to compute it? Thanks, Best wishes, Junjie ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From bob at bionet.nsc.ru Mon Sep 17 09:26:30 2007 From: bob at bionet.nsc.ru (Vlad Babenko) Date: Mon, 17 Sep 2007 23:26:30 +0700 Subject: [Genome] phastcons_scores Message-ID: <003501c7f947$84d1ba80$640419ac@amg5> Greetings, First, thanks to the UCSC developers for the work and assistance ;-) Next, a problem: I have not managed to reconcile the hg18.phastcons tracks with what I see in the browser. For example, it can be seen that the score = 1 for chr6 position 26195725 (http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr6:26195715-26195740&db=hg18&ss=../trash/hgSs/hgSs_www_221b_95c6a0.pslx+../trash/hgSs/hgSs_www_221b_95c6a0.fa&hgsid=92639599) Still I may see that in file (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons2