From rstallings at rcsi.ie Mon Mar 3 05:08:19 2008
From: rstallings at rcsi.ie (Ray Stallings)
Date: Mon, 3 Mar 2008 13:08:19 -0000
Subject: [Genome] (no subject)
Message-ID:

Hi,

I am having difficulty getting Genome Browser to display MYCN binding sites. For example, the region chr13:90,795,999-90,802,001 contains two MYCN binding sites that appear in the tables (see attached pdf). When these coordinates are typed into genome browser, and the TFBS conserved box is set at "full", these two MYCN sites do not appear on the browser display (see attached pdf). This is not an isolated incident - MYCN is not appearing for many regions of the genome. Any ideas why this is so, and how I can correct the problem?

Thanks

Prof. Raymond L. Stallings, PhD.
Chair of Cancer Genetics
Royal College of Surgeons in Ireland
123 St. Stephen's Green
York House
Dublin 2, Ireland
&
Programme Leader, Cancer Genetics
Children's Research Centre
Our Lady's Children's Hospital, Crumlin
PHONE: 353 1 402-8533
FAX: 353 1 402-2453

From andrewjyee at gmail.com Mon Mar 3 07:28:32 2008
From: andrewjyee at gmail.com (Andrew Yee)
Date: Mon, 3 Mar 2008 10:28:32 -0500
Subject: [Genome] further questions about causes of duplicated gene
Message-ID: <5dff5a0d0803030728x3098e8faq9c16eb3a11627b42@mail.gmail.com>

When I search for "DUX4", there are two locations listed for the reference gene NM_033178, on chromosome 4 and 10.

I've read through the FAQ on http://genome.ucsc.edu/FAQ/FAQtracks#tracks9 discussing "cause of duplicated gene."

But is there a way to know ahead of time which one is the "true" coding sequence or which one is considered the coding sequence by RefSeq without having to manually examine the Genome Browser details of each result? The entry in NCBI has it listed on chromosome 4 (see http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=119220600).
Thanks,
Andrew

From weirauch at soe.ucsc.edu Mon Mar 3 09:36:34 2008
From: weirauch at soe.ucsc.edu (Matt Weirauch)
Date: Mon, 3 Mar 2008 09:36:34 -0800
Subject: [Genome] (no subject)
In-Reply-To:
References:
Message-ID:

Hi Ray,

Try adjusting the Z score cutoff for the track on the track description page. (You can do this by clicking on the track name. The field is at the top of the page.) By default, the browser displays TFBS with z-scores greater than or equal to 2.33 (p < 0.01). This can be adjusted to be as low as 1.64 (p < 0.05). I'll bet the missing binding sites fall somewhere in this 1.64 to 2.33 range, which is why they are currently not being displayed in your browser.

Matt

On Mon, Mar 3, 2008 at 5:08 AM, Ray Stallings wrote:
> Hi,
>
> I am having difficulty getting Genome Browser to display MYCN binding sites.
> For example, the region chr13:90,795,999-90,802,001 contains two MYCN
> binding sites that appear in the tables (see attached pdf). When these
> coordinates are typed into genome browser, and the TFBS conserved box is set
> at "full", these two MYCN sites do not appear on the browser display (see
> attached pdf). This is not an isolated incident - MYCN is not appearing for
> many regions of the genome. Any ideas why this is so, and how I can correct
> the problem?
>
> Thanks
>
> Prof. Raymond L. Stallings, PhD.
> Chair of Cancer Genetics
> Royal College of Surgeons in Ireland
> 123 St. Stephen's Green
> York House
> Dublin 2, Ireland
> &
> Programme Leader, Cancer Genetics
> Children's Research Centre
> Our Lady's Children's Hospital, Crumlin
>
> PHONE: 353 1 402-8533
> FAX: 353 1 402-2453
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From klein.christopher at mayo.edu Mon Mar 3 09:26:57 2008
From: klein.christopher at mayo.edu (Klein, Christopher J., M.D.)
Date: Mon, 3 Mar 2008 11:26:57 -0600
Subject: [Genome] Question on referencing?
Message-ID:

Dear Sir/Madam;

How do I reference for publication your gene expression data for the SEPT-9 gene at your site (GNF Expression Atlas 2 Data from U133A and GNF1H Chips)? Is there a manuscript to refer to, or a standard practice?

Thanks
Chris Klein

From rhead at soe.ucsc.edu Mon Mar 3 10:06:32 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 03 Mar 2008 10:06:32 -0800
Subject: [Genome] Question on referencing?
In-Reply-To:
References:
Message-ID: <47CC3E28.6020608@soe.ucsc.edu>

Hello Chris,

Please see our guidelines for citing Genome Browser annotation tracks here:
http://genome.cse.ucsc.edu/cite.html

To find references specific to a particular Genome Browser track, click on the blue track name on the main Genome Browser display page (http://genome.cse.ucsc.edu/cgi-bin/hgTracks).

Please let us know if you have further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From john.woods at mail.utexas.edu Mon Mar 3 17:14:56 2008
From: john.woods at mail.utexas.edu (John Woods)
Date: Mon, 03 Mar 2008 19:14:56 -0600
Subject: [Genome] BLAT Search Genome: hgc XML output?
Message-ID: <47CCA290.6090005@mail.utexas.edu>

I've Googled quite a bit for this, but no luck. I also see that BioPython has a BLAT parser which will output XML, which leads me to believe that the answer to my question will be no. But I thought I should ask anyway--before reinventing the wheel.

So, the BLAT Search Results outputs this nice list with links to browser and details.
If I click on "details" for a hit, I get cgi-bin/hgc, which highlights--in both the query and the match--the portions of the sequence which align well.

Is it possible to get XML output for this page? Particularly, I'd like to be able to extract both the aligned and unaligned portions of sequence (from both query and result) in some sort of array or list.

Alternatively, is there pal2nal functionality for this script?

And finally, if the answer to both of those is no, any recommendations for parsers? I can't find the docs on BioPython's BLAT parser, sadly.

Thanks very much!

Cheers,
John Woods

Institute for Cellular and Molecular Biology
The University of Texas at Austin

From hiram at soe.ucsc.edu Mon Mar 3 23:17:11 2008
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Mon, 03 Mar 2008 23:17:11 -0800
Subject: [Genome] BLAT Search Genome: hgc XML output?
In-Reply-To: <47CCA290.6090005@mail.utexas.edu>
References: <47CCA290.6090005@mail.utexas.edu>
Message-ID: <47CCF777.3060505@soe.ucsc.edu>

Good Evening John:

This is an interesting problem you present here. I haven't seen references to XML outputs for blat. As you are no doubt aware, the fundamental output of blat is the psl file format:
http://genome.ucsc.edu/FAQ/FAQformat#format2

You can get psl output as one of the options for output in the blat WEB screen. If you are running blat locally, you will also be getting this psl output. See also, blat usage restrictions:
http://genome.ucsc.edu/FAQ/FAQblat#blat3

At a command line, to get ASCII output similar to what you see in the hgc click-through, we use the command line utility from the kent source tree:
pslPretty
which can put together a psl output and the two sequences to produce the side by side alignment picture.
http://genome.ucsc.edu/admin/cvs.html
http://genome.ucsc.edu/admin/jk-install.html

You will find numerous psl manipulation commands in the kent source tree.
Here is a listing of these commands from our bin directory:

pslCDnaFilter   pslFilterPrimers  pslPairs        pslSimp           pslToPslx
pslCat          pslGlue           pslPartition    pslSort           pslToXa
pslCheck        pslHisto          pslPretty       pslSortAcc        pslUniq
pslCoverage     pslHitPercent     pslQuickFilter  pslSplitOnTarget  pslUnpile
pslDiff         pslIntronsOnly    pslRecalcMatch  pslStats          pslxToFa
pslDropOverlap  pslMap            pslReps         pslSwap
pslFilter       pslMrnaCover      pslSelect       pslToBed

Each command will indicate how to use it if run with no arguments.

I don't know if these will help, but I'm guessing that since the psl output is the fundamental output from blat, one of these formatting and filtering commands might get what you want.

You would also need the .2bit sequence files for the genomes of interest to work with them locally. Those can be fetched from the downloads server.

--Hiram

From jlwang at imcb.a-star.edu.sg Tue Mar 4 00:41:51 2008
From: jlwang at imcb.a-star.edu.sg (Wang JianLi)
Date: Tue, 4 Mar 2008 16:41:51 +0800
Subject: [Genome] can blastz be used for a single fasta sequence file vs. multiple fasta sequence file?
References: <47C6F224.7020007@charite.de>
Message-ID: <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg>

Dear all,

I am not sure if blastz can be used for a single fasta sequence file vs. a multiple fasta sequence file. When I use lav2maf, the fasta headers from the multiple fasta sequence file are not recognized, and in the maf file all the fasta sequences end up with the same name.

Thanks.

Regards,
JL

From K0435205 at kingston.ac.uk Tue Mar 4 09:44:48 2008
From: K0435205 at kingston.ac.uk (Gonzales-Sanchez, Ester)
Date: Tue, 4 Mar 2008 17:44:48 -0000
Subject: [Genome] Output interpretation help
Message-ID:

Dear Webmaster,

I need some help interpreting the output of UCSC. I do not understand the output very well. I have added the output for the molecule sequence I am interested in: the adhesion molecule ICAM-1. How can I comment on and interpret this output? What do the bars and arrows mean? What about the blue, green, red and black colours? How can I interpret SNPs?

position/search size 14,382 bp.
move start
Click on a feature for details. Click on base position to zoom in around cursor.
Click gray/blue >ref|NC_000019.8|NC_000019:10237779-10242779 Homo sapiens chromosome 19, reference assembly, complete sequence GTCTCGAACTCCTGACCTCAGGTGATTCTCCTGCCTTGGCCTCCCAAAGTGCTGAGATTACAGGTGTGAG CCACTGCACACGGCCTTAAATTTTATTTATTATTTATTTATTTATTTATTTAGAGACTTAGTCTCACTCT GTTGCCCAGGCTGGAGTGCAGTGGCATGGTCTCGGCTCACTGCACTCCACCTCCTGGGTTCACGCCATTC TCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCACCACCACTCCCGGCTAATTTTTGTATTT TTAGTAGAGATGGGGTTTCACTGTGTTAGCCAGGATAGTCTCGATCTCCTGACCTCGTGATCCGCCTGCC TCGGCCTCCCAAAGTGCTGGGATTACTTATTTTGTTTTTTGTAGAGACAGGTTCTCACTGTGTTGCCCAG GCTGGTCTTGAACTCCTGATCTCAAGTGATCTTCCCACCTCAGTCTCTCAAAGGGCTGGGATTACAGGGG TGAGCCACTGCACCCCACCTTCCCTCTACTTTTTGACGGTTTCCTTCTGCTATGAATGTGCATGTCCAGT TGTCTGCTTCTTAGAACTGATATTTACCTTCCTCATCCATCAGCCATTGGAGGAGGACTGGGACCGCTCA GATTATTGATCTGACCCATTCTTTCGGCAGGGTTTCCTGGTGGCTGTCTTCCATCACCAAAACTGGAATC AGAAGAGTTTCCATAGCCCTTTTTTTTTCCCCACATCTTTGCTGAAGCAGAGTTTTGAAAAACAAAACCA CAAACTAAGCTATTCCCCAGAAGAAATCTGTAATCAAAGATAAGCTCTGCCGGGCACAGTGGCTCACGCC TTTTGGAGGCCAAGGCGGGCGGATCACCTGAGGTCAGGAGTTCTAGACCTGCCAGGCCAACATGGTAAAA CCTCATCTCTACTAAAAATACAAAAATTAGCTAGATGTGGTGGTGGGTACCTGTAGTCTCAGCTACCTGG GAGGCTGAGGCAAGAGAATCGCTTGAACCTGGGAAGTAGAGGTTGCAGTGAGCCGAGATTGCACCACTGC ACTCCAGCCTGGGCGACGGAGTGAGACGACCTCACAAAAATTTACATAAATAAAATGAAAAGTAAAATAA AAATACAAAAGTTGGCCGGGTGCGTTTGCTCACGCCTGTAATCCCAGCACTTTGGGAGGGTGAGGCAGGC AGATAATGAGGTAAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCTGTCTCTACTAAAAATACA AAAAATTAGCTGTGCGTGGTGACACGCACCTGTAGTCCCAGCTATTTGGGAGGCTGAGGCAGGAGAATCA CTTGAACCTGGGAGGTGGAGGTTGCAGTGAGCCGAGATCGCACCACTGCACTCCAGCCTGGGCCACAGAG TGAGACTCCATCTTGAAAAAAAAAAAAAATACAAAAGTTAGCCAGGGGTGTTGGTGGGTGCCTGTAATCC CAGCTATTTGGGAGGCTAAGGCAGAAGAATTTCTTGAACCTAGGAAACGGAGGTTGCAGTGAGCCGAGAT CACACCTCTGTACTCCAGCCTGGACAACAGAGCGAGACTTTGTCTCAAAAAAAAAAAAAAAAAAAAAACT AAATAGGCCGGGAGCAGTGGCTCATGCCTATAATCCCAGCCCTTTGGGAGGCCAAGGCAGGTGGATCACT TGAGGTCAGGAGTTTGAGACCAGGCTGGCCAACATGGTGTAACCCCGTCTCTACTAAAAACACAAAAATT AGCCGGGTCTGGTGGCGTATGTCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCACTTAAAC 
CTGGGAGGCAGGGGTTGCAGTGAGCTGAGATCGTGCCACTGCACTCTAGCCAGGGTGACAGAGTGAAACT
CTGTCTCAAAAAATTAAAAAAGAAATTCAGCAAGTAATGAGTTAAGGAATTCGAATATTAAGGCGAGTGA
CAAGGAACGCCCAGGATGTGGCCCAGGATGGAGTAGGGGGGACACTCATTTAGGAGAAAGCTCAGGCCAC
AAGACAGGAGGAGCCAGCCTTGTTGGGGTTGAAGGGAAGAGCATTCCAGGCTGAGGGAACTGCAAGGCGT
TTGCATGGGACACTATGGGATGGCTTCTGCCCTTGGTGGGCAGCCTCTGGTCTGAGGCCATTCTTTGGCC
TCTTGAACTCCTGGGCTCAAGTGATCCTCCCATCTCGGCCTCCCAAAATGCTGGGATTACAGGTGGGAGC
CGCGCCCAGGTGGATTTTTGTCTGACTCTGTTCATTCCTGTGTCCCCAGTACCTGGAAGGACGCCAAGCA

From LeeMH at neuropeds.ucsf.edu Tue Mar 4 14:13:10 2008
From: LeeMH at neuropeds.ucsf.edu (Lee, Michaela)
Date: Tue, 4 Mar 2008 14:13:10 -0800
Subject: [Genome] GenBank IDs
Message-ID:

I don't know how to generate a table with only GenBank IDs for a given chromosomal region. Any suggestions? Thanks.

From v.tropepe at utoronto.ca Tue Mar 4 13:39:17 2008
From: v.tropepe at utoronto.ca (Vince Tropepe)
Date: Tue, 4 Mar 2008 16:39:17 -0500
Subject: [Genome] Segment duplication on zebrafish chr 16?
Message-ID: <406C019E-CFBB-465C-82F7-2731A93120B9@utoronto.ca>

Hi,

We are in the process of mapping a mutation in zebrafish. We thought we had mapped the mutation to chr 16 matching contig BX255877.14. However, upon closer inspection it seems that some of the genes on this contig (e.g. BC092825, BC076364, BC096802) are predicted to also be present on contig CU104697.5, based on sequence comparisons with the chained tBLASTn and zebrafish refseq genes. This is the interval I am focusing on: chr16:5,740,301-6,199,660 (zebrafish danRer5 July 2007 assembly). My question is whether there is a real segment duplication in this region.

Thanks for your help!
vince

------
Vince Tropepe
Department of Cell & Systems Biology
University of Toronto
25 Harbord Street
Toronto, ON, M5S 3G5
Canada
T: 416-946-0338
F: 416-978-8532
v.tropepe at utoronto.ca

From yang_shan88 at yahoo.com Tue Mar 4 13:07:38 2008
From: yang_shan88 at yahoo.com (Shan Yang)
Date: Tue, 4 Mar 2008 13:07:38 -0800 (PST)
Subject: [Genome] shade of repeat elements?
Message-ID: <773808.59853.qm@web82712.mail.mud.yahoo.com>

Hi,

When I display the repeat track in its "full" mode, I noticed that different repeats of the same family (e.g. LINEs) have different shades. Some are darker than others. Does the shading have any meaning?

Thanks!

Shan

From rhead at soe.ucsc.edu Tue Mar 4 14:50:02 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 14:50:02 -0800
Subject: [Genome] shade of repeat elements?
In-Reply-To: <773808.59853.qm@web82712.mail.mud.yahoo.com>
References: <773808.59853.qm@web82712.mail.mud.yahoo.com>
Message-ID: <47CDD21A.7030502@soe.ucsc.edu>

Hello Shan,

From the RepeatMasker details page:

"The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading."

(You can get to the details page by either clicking an item in the track, clicking on the blue track name in the Genome Browser, or clicking the blue or gray "mini-button" to the left of the track in the display.)

I hope this information helps.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From mtong at seralogix.com Tue Mar 4 14:50:44 2008
From: mtong at seralogix.com (Mark Tong)
Date: Tue, 4 Mar 2008 16:50:44 -0600
Subject: [Genome] A question on chromsome band
Message-ID: <000a01c87e4a$2e258670$bd0211ac@sanger>

Hi there,

I searched for the gene ALDOA with the following information:
chr16:29,971,992-29,989,236
The genome browser tells me this gene is on chromosome 16 p11.2, and this is consistent with the data in table cytoband.

However, if you check any NCBI web site (entrez, omim, etc.), it will tell you this gene is on chromosome 16 band q22-24. I'm not particular about the sub-band, but p versus q makes a big difference.

Could anyone help explain this?

Thanks in advance,

Mark Tong, Ph.D.

From yang_shan88 at yahoo.com Tue Mar 4 15:36:48 2008
From: yang_shan88 at yahoo.com (Shan Yang)
Date: Tue, 4 Mar 2008 15:36:48 -0800 (PST)
Subject: [Genome] Bug in table browser?
Message-ID: <619140.96571.qm@web82703.mail.mud.yahoo.com>

Hi,

I want to get all gap coordinates of hg18 through the table browser. So I chose table as "Gap" and region as "genome" and then clicked "get output". But I found that there is no information for chrX, chrY and chrM. Not believing the result, I did the same thing, chose "position" and typed in "chrX". The query did return some gaps on chrX. The same is true for "chrY". And chrM seems to have no gaps.

So I am wondering if there is a bug in the table browser so that when querying for "genome", chrX and chrY are not included, which I think they should be. The results are quite misleading. In this case, they made me think that there are no gaps in the sex chromosomes.
So is there a bug, or did I do something wrong?

Thanks!

Shan

From yang_shan88 at yahoo.com Tue Mar 4 15:41:47 2008
From: yang_shan88 at yahoo.com (Shan Yang)
Date: Tue, 4 Mar 2008 15:41:47 -0800 (PST)
Subject: [Genome] Bug in table browser?
Message-ID: <777742.3696.qm@web82706.mail.mud.yahoo.com>

Sorry, I take back this question. I saw data for chrX and chrY in the results. They are between chr9 and chr10. (quite unusual place!)

Shan

From rhead at soe.ucsc.edu Tue Mar 4 16:31:52 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 16:31:52 -0800
Subject: [Genome] Output interpretation help
In-Reply-To:
References:
Message-ID: <47CDE9F8.4040904@soe.ucsc.edu>

Hello Ester,

For help learning how to use the Genome Browser, please see the online tutorials and training materials on the Open Helix web site:
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

Additionally, please see our user's guide, here:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html

and our frequently asked questions page:
http://genome.ucsc.edu/FAQ/

The materials at these links should help you get started using the Genome Browser. If you still have questions, please feel free to contact us again at the genome mailing list address.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From rhead at soe.ucsc.edu Tue Mar 4 16:48:33 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 16:48:33 -0800
Subject: [Genome] GenBank IDs
In-Reply-To:
References:
Message-ID: <47CDEDE1.3010408@soe.ucsc.edu>

Hello Michaela,

You can use the Table Browser's "selected fields from primary and related tables" output format option to generate a table with only the information you need. (Hit the blue "Tables" link at the top of the page to get to the Table Browser.)

There are several ways to define a chromosomal region of interest in the Table Browser. If there is only one region, paste it in the "position" box. If there are multiple regions of interest (but fewer than 1,000), you can use the "define regions" button in the Table Browser. For more than 1,000 regions, you can create a custom track to define the regions and intersect it with the GenBank track you wish to use.
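
As a rough illustration of the custom-track route for a long list of regions, the sketch below turns browser-style positions into a small BED custom track that could be pasted into the custom track page. The region list, the track name and the function names are made up for the example. It assumes the usual conventions that browser positions are 1-based and inclusive while BED starts are 0-based and BED ends are exclusive.

```python
import re

# Illustrative regions only, written the way they are typed into the
# browser position box (1-based, inclusive, commas allowed).
REGIONS = [
    "chr16:29,971,992-29,989,236",
    "chr13:90,795,999-90,802,001",
]

# chrom:start-end, where start and end may contain thousands separators.
POSITION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def position_to_bed(position, name):
    """Convert one browser position string into a tab-separated BED line.

    Only the start coordinate shifts by one: BED starts are 0-based,
    BED ends are exclusive, so the 1-based inclusive end is unchanged.
    """
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", "")) - 1
    end = int(end_text.replace(",", ""))
    if start < 0 or end <= start:
        raise ValueError(f"empty or inverted region: {position!r}")
    return f"{chrom}\t{start}\t{end}\t{name}"


def build_custom_track(positions, track_name="myRegions"):
    """Return custom track text: one track line, then one BED line per region."""
    lines = [f'track name={track_name} description="Regions of interest"']
    for index, position in enumerate(positions, start=1):
        lines.append(position_to_bed(position, f"region{index}"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_custom_track(REGIONS), end="")
```

The resulting text would be loaded as a custom track and then chosen as the intersection target in the Table Browser; the names in the fourth column are arbitrary labels.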
Detailed instructions on using the Table Browser are here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

There are several tracks associated with data from GenBank, including:

RefSeq Genes
Other RefSeq
mRNAs
Spliced ESTs
ESTs
Other mRNAs
Other ESTs

The table you select in the Table Browser will depend on which track you would like to retrieve data from.

I hope this information helps. If you have further questions, please feel free to contact us again at the Genome mailing list address.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From rhead at soe.ucsc.edu Tue Mar 4 17:30:18 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 17:30:18 -0800
Subject: [Genome] can blastz be used for a single fasta sequence file vs. multiple fasta sequence file?
In-Reply-To: <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg>
References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg>
Message-ID: <47CDF7AA.5060106@soe.ucsc.edu>

Hello JL,

You are correct, blastz does not work with MAF input. You can use multiz instead.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From rhead at soe.ucsc.edu Tue Mar 4 20:44:27 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 20:44:27 -0800
Subject: [Genome] A question on chromsome band
In-Reply-To: <000a01c87e4a$2e258670$bd0211ac@sanger>
References: <000a01c87e4a$2e258670$bd0211ac@sanger>
Message-ID: <47CE252B.7060706@soe.ucsc.edu>

Hello Mark,

I also see the ALDOA gene listed at NCBI on 16q22-24. On OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=OMIM&term=103850), I see a link to "Gene map locus 16q22-q24"; in the Entrez record (here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=34577111), I see '/map="16q22-q24"' in the "source" section.

However, I also looked at the gene in NCBI's Map Viewer and Sequence Viewer, and they display ALDOA in the exact same position as the Genome Browser, on chr16:29,971,973-29,989,236, within 16p11.2:
http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000016.8&v=29971992-29989236

Ensembl GeneView also shows ALDOA in this region:
http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000149925

I do not know why NCBI has contradictory information on this gene's location. I suggest contacting their helpdesk at info at ncbi.nlm.nih.gov. (And if you find a good explanation, let us know!)

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From rhead at soe.ucsc.edu Tue Mar 4 21:00:06 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 04 Mar 2008 21:00:06 -0800
Subject: [Genome] further questions about causes of duplicated gene
In-Reply-To: <5dff5a0d0803030728x3098e8faq9c16eb3a11627b42@mail.gmail.com>
References: <5dff5a0d0803030728x3098e8faq9c16eb3a11627b42@mail.gmail.com>
Message-ID: <47CE28D6.90003@soe.ucsc.edu>

Hello Andrew,

We do not store RefSeq's mappings in our database, so there is not a quick way of finding the RefSeq mappings with our data alone. You would need to get the information directly from RefSeq.

However, one Genome Browser track that might be useful to you is the "CCDS" track, which contains the protein coding genes that RefSeq and Ensembl agree on. Unlike RefSeq mRNAs, which we align, these are agreed upon mappings.

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

From n.tran at centenary.org.au Tue Mar 4 20:58:44 2008
From: n.tran at centenary.org.au (nham tran)
Date: Wed, 05 Mar 2008 15:58:44 +1100
Subject: [Genome] Sense/antisense genes
In-Reply-To: <4669D7FC.9080204@cse.ucsc.edu>
References: <4669D7FC.9080204@cse.ucsc.edu>
Message-ID: <47CE2884.5050506@centenary.org.au>

Dear Ann,

I am a novice trying to find antisense and sense pairs. I searched the archives and found your advice there. Can I ask you some specific questions about the task?

Step One. Make a Custom Track with all of the genes on the positive strand.

1. Navigate to the Table Browser ("Tables" in the blue navigation bar across the top of the browser).
2. Configure like so:
   clade: vertebrate
   genome: human
   assembly: Mar. 2006 (assuming you want the latest human assembly)
   group: Genes and Gene Prediction Tracks
   track: UCSC Genes (or choose another gene track here)
   table: knownGene
   region: genome
3. filter: strand does match +
4. output format: custom track
5. name the custom track (i.e. sense)
*6. HIT the output button*
6. get the custom track in table browser. *This will take you back to the table browser.*

Now your custom track of all positive-strand genes is available in both the Table Browser and for viewing in the Genome Browser.

*I cannot see the custom track in the table browser*

Step Two. Repeat step one for genes on the negative strand, *but for step 3. filter: strand does match - (is this correct)*

Step Three. Intersect the positive and negative Custom Tracks.

1. Use the Table Browser again.
2. Choose the positive-strand custom track from the table browser controls.
*Here is the problem, I cannot find the named tracks "sense" or "antisense" in the table browser. Can you help?*
3. Press the intersect button and choose the negative-strand custom track from this page.
Leave the first choice checked "All AAA records that have any overlap with BBB". Press submit. Many thanks Nham -- Nham Tran PhD Address: Vascular Biology (Room 5.39) Centenary Institute of Cancer Medicine and Cell Biology University of Sydney NSW 2042 Phone 61 2 9565 6226 From mtong at seralogix.com Wed Mar 5 06:44:15 2008 From: mtong at seralogix.com (Mark Tong) Date: Wed, 5 Mar 2008 08:44:15 -0600 Subject: [Genome] A question on chromsome band References: <000a01c87e4a$2e258670$bd0211ac@sanger> <47CE252B.7060706@soe.ucsc.edu> Message-ID: <000801c87ecf$6298cbd0$bd0211ac@sanger> Thanks, Brooke. I see your point. I will ask NCBI about the differences. Just for your info, this is not about Aldoa: there are lots of genes with contradictory banding information. Mark ----- Original Message ----- From: "Brooke Rhead" To: "Mark Tong" Cc: Sent: Tuesday, March 04, 2008 10:44 PM Subject: Re: [Genome] A question on chromsome band > Hello Mark, > > I also see the ALDOA gene listed at NCBI on 16q22-24. On OMIM > (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=OMIM&term=103850), > I see a link to "Gene map locus 16q22-q24", in the Entrez record (here: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=34577111), I > see '/map="16q22-q24"' in the "source" section. > > However, I also looked at the gene in NCBI's Map Viewer and Sequence > Viewer, and they display ALDOA in the exact same position as the Genome > Browser, on chr16:29,971,973-29,989,236, within 16p11.2: > http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000016.8&v=29971992-29989236 > > Ensembl GeneView also shows ALDOA in this region: > http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000149925 > > I do not know why NCBI has contradictory information on this gene's > location. I suggest contacting their helpdesk at info at ncbi.nlm.nih.gov. > (And if you find a good explanation, let us know!) 
> > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > > Mark Tong wrote: >> Hi there, I search gene ALDOA with the following information: >> chr16:29,971,992-29,989,236 >> The genome browser tells me this gene is on Chromosome 16 p11.2, and > this is consistent with the data in table cytoband. >> >> However, if you check any NCBI web site (entrez, omim, etc.), it >> will tell you this gene is on chromosome 16 band q22-24. >> I'm not particular about the subband, but the p and q make a big >> difference. >> >> Could anyone help me explain this? >> >> Thanks in advance, Mark Tong, Ph.D. > From yongshengbaicool at gmail.com Wed Mar 5 09:58:36 2008 From: yongshengbaicool at gmail.com (Yongsheng Bai) Date: Wed, 5 Mar 2008 11:58:36 -0600 Subject: [Genome] genome-mirror@soe.ucsc.edu Message-ID: <8fda134a0803050958m5bea1b81k28516e0a4b3d5458@mail.gmail.com> Hi, Could you please tell me how to determine the "ChromEnd" for the following tables from your website, since there is only "ChromStart" for them? > * Bipolar disorder > * Coronary artery disease > * Crohn's disease > * Hypertension > * Rheumatoid arthritis > * Type 1 diabetes > * Type 2 diabetes Same question for "NIMH Bipolar Disease"? > > * US (European Descent) - 461 cases in 7 pools, 563 controls in 9 pools > * German - 772 cases in 13 pools, 876 controls in 10 pools Thanks a lot. Yongsheng Bai, Ph.D. From barbj at mail.nih.gov Wed Mar 5 10:12:10 2008 From: barbj at mail.nih.gov (Barb, Jennifer (NIH/CIT) [E]) Date: Wed, 5 Mar 2008 13:12:10 -0500 Subject: [Genome] custom tracks Message-ID: <08BFEF2D7CC3104FA411B6E1991200C2826E5E@NIHCESMLBX3.nih.gov> Hi, Does anyone know of any sort of online tutorial for setting up custom tracks for microarrays? I read the GenomeWiki file but it still is a little bit fuzzy to me.
Thanks, Jennifer From gareth.wilson at cancer.ucl.ac.uk Wed Mar 5 10:00:07 2008 From: gareth.wilson at cancer.ucl.ac.uk (Gareth Wilson) Date: Wed, 5 Mar 2008 18:00:07 -0000 Subject: [Genome] liftOver file Message-ID: <595C4E81822E244DB8853767F6A47A970BF5C8@pc6-13.pogb.cancer.ucl.ac.uk> Hello, I'm currently trying to convert my sequence coordinates from mm7 (Mouse build 35) to mm9 (Mouse build 37). I couldn't see the necessary file on your website. Do you happen to have it? Many Thanks Gareth. ----- Dr Gareth A Wilson Bioinformatician Medical Genomics Group UCL Cancer Institute Paul O'Gorman Building University College London 72 Huntley Street London WC1E 6BT tel: +44 (0) 20 7679 0999 ----- From mccarthk at mail.nih.gov Wed Mar 5 10:12:00 2008 From: mccarthk at mail.nih.gov (McCarthy, Kateri (NIH/NICHD) [F]) Date: Wed, 5 Mar 2008 13:12:00 -0500 Subject: [Genome] Stumped Message-ID: To Whom It May Concern: I have a question and I don't know if you might be able to help me out. I am trying to find homologies between pseudoautosomal-region genes on the X chromosome to the cat. I'm using "Cat Chained Alignments" but I am concerned I don't see a chromosomal position for the scaffolds that are aligning. I would hope the homologs on the cat are located on the cat's X chromosome - but I can't seem to find a chromosome position. How can I determine the chromosomal position on the segments that are aligning to the human X chromosome gene?
For example, I am aligned CD99 using the Cat Chained Alignment (I CAT Blat'ed the human CD99 sequence to get a quantitative sense of its alignment) - but how do I know where it's located? My assumption is that it should be on the cat's X as well. I hope you understand the problem. If you can offer any assistance I would be very grateful! Thank you for your time, Kateri From ann at soe.ucsc.edu Wed Mar 5 10:32:08 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 10:32:08 -0800 Subject: [Genome] liftOver file In-Reply-To: <595C4E81822E244DB8853767F6A47A970BF5C8@pc6-13.pogb.cancer.ucl.ac.uk> References: <595C4E81822E244DB8853767F6A47A970BF5C8@pc6-13.pogb.cancer.ucl.ac.uk> Message-ID: <47CEE728.9070908@cse.ucsc.edu> Hello Gareth, We do not have that file. However, I do have a solution that will work for you. You can do a two-step liftOver: first lift from mm7 to mm8, then from mm8 to mm9. Both of these files are available. This will give you the results you are looking for. I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Gareth Wilson wrote: > Hello, > > I'm currently trying to convert my sequence coordinates from mm7 (Mouse > build 35) to mm9 (Mouse build 37). I couldn't see the necessary file on > your website. Do you happen to have it? > > Many Thanks > > Gareth. 
> > ----- > Dr Gareth A Wilson > > Bioinformatician > Medical Genomics Group > > UCL Cancer Institute > Paul O'Gorman Building > University College London > 72 Huntley Street > London > WC1E 6BT > > tel: +44 (0) 20 7679 0999 > ----- From ann at soe.ucsc.edu Wed Mar 5 10:41:37 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 10:41:37 -0800 Subject: [Genome] custom tracks In-Reply-To: <08BFEF2D7CC3104FA411B6E1991200C2826E5E@NIHCESMLBX3.nih.gov> References: <08BFEF2D7CC3104FA411B6E1991200C2826E5E@NIHCESMLBX3.nih.gov> Message-ID: <47CEE961.4040905@cse.ucsc.edu> Hello Jennifer, What you read is the best on-line help we have: http://genomewiki.cse.ucsc.edu/index.php/Microarray_track If you also need to learn about setting up custom tracks in general, see this help page as well: http://hgwdev.cse.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks After reading these two help pages, if you still have questions, please feel free to ask specific questions to the list, or even send some sample data directly to me. We will help you troubleshoot as best we can. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box.
On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Barb, Jennifer (NIH/CIT) [E] wrote: > Hi, > > Does anyone know of any sort of online tutorial for setting up custom > tracks for microarrays? I read the GenomeWiki file but it still is a > little bit fuzzy to me. > > Thanks, > > Jennifer > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Wed Mar 5 11:00:33 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 11:00:33 -0800 Subject: [Genome] Stumped In-Reply-To: References: Message-ID: <47CEEDD1.2000201@cse.ucsc.edu> Hello Kateri, As you probably noticed in the "Cat Chained Alignments" track on the human assembly, all you see are scaffolds (not chromosomes). This is because the entire cat assembly (felCat3) is still on scaffolds; that is, it has not yet been assembled onto chromosomes. You may be able to find more information about the cat genome on these sites: http://home.ncifcrf.gov/ccr/lgd/comparative_genome/catgenome/index_n.asp GARfield browser: http://lgd.abcc.ncifcrf.gov/cgi-bin/gbrowse/cat/ I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu McCarthy, Kateri (NIH/NICHD) [F] wrote: > To Whom It May Concern: > > > > I have a question and I don't know if you might be able to help me out. > I am trying to find homologies between pseudoautosomal-region genes on > the X chromosome to the cat. I'm using "Cat Chained Alignments" but I > am concerned I don't see a chromosomal position for the scaffolds that > are aligning. I would hope the homologs on the cat are located on the > cat's X chromosome - but I can't seem to find a chromosome position. 
> How can I determine the chromosomal position on the segments that are > aligning to the human X chromosome gene? For example, I am aligned CD99 > using the Cat Chained Alignment (I CAT Blat'ed the human CD99 sequence > to get a quantitative sense of its alignment) - but how do I know where > it's located? My assumption is that it should be on the cat's X as > well. > > > > I hope you understand the problem. If you can offer any assistance I > would be very grateful! > > > > Thank you for your time, > > > > Kateri > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Wed Mar 5 11:28:20 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 11:28:20 -0800 Subject: [Genome] genome-mirror@soe.ucsc.edu In-Reply-To: <8fda134a0803050958m5bea1b81k28516e0a4b3d5458@mail.gmail.com> References: <8fda134a0803050958m5bea1b81k28516e0a4b3d5458@mail.gmail.com> Message-ID: <47CEF454.5030905@cse.ucsc.edu> Hello Yongsheng Bai, The tracks that you are looking at contain genome-wide association data. Unlike many of our other tracks that annotate specific parts [start - stop] of the genome, these two tracks simply assign a value to certain points in the genome. They are meant to be viewed from a more zoomed-out viewpoint. You can use our Genome Graphs tool to view them (choose 'Genome Graphs' from the navigation bar on the left side of the home page). Genome Graphs is a tool for displaying genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies and homozygosity mapping. Specifically, the Case Control Consortium tracks display the trend p-values (-log10) of the seven diseases reported by The Wellcome Trust Case Control Consortium. Reported p-values were taken for each of the Illumina550 probes on the genome. For visualization purposes, these p-values were logged and negated. 
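As a minimal sketch of that transform (the function name and the sample p-values below are illustrative assumptions, not the code that was used to build the tracks), the negated log can be computed in Python like this:

```python
import math


def neg_log10(p_value):
    """Return -log10(p) for a p-value in the interval (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to show the direction of the transform.
    for p in (0.05, 1e-3, 5e-7):
        print(p, round(neg_log10(p), 2))
```

Smaller p-values map to larger plotted values, so the strongest associations stand out as the tallest points when the data are viewed zoomed out.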
For more details see the CCC website: http://www.wtccc.org.uk/ Likewise for the NIMH tracks: http://nimhgenetics.org/ I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Yongsheng Bai wrote: > Hi, > > Could you please tell me how to determine the "ChromEnd" for the following > tables from your website, since there is only "ChromStart" for them? > >> * Bipolar disorder >> * Coronary artery disease >> * Crohn's disease >> * Hypertension >> * Rheumatoid arthritis >> * Type 1 diabetes >> * Type 2 diabetes > > Same question for "NIMH Bipolar Disease"? >> * US (European Descent) ? 461 cases in 7 pools, 563 controls in 9 > pools >> * German ? 772 cases in 13 pools, 876 controls in 10 pools > > Thanks a lot. > > Yongsheng Bai, Ph.D. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From grobertson at bcgsc.ca Wed Mar 5 11:14:24 2008 From: grobertson at bcgsc.ca (Gordon Robertson) Date: Wed, 05 Mar 2008 11:14:24 -0800 Subject: [Genome] Minor Web LIftOver suggestion Message-ID: In the Web LiftOver UI, it would be helpful if the 'Original Assembly' and 'New Assembly' lists included an assembly's 'name' as well as its date. E.g. for Mouse, instead of 'May 2004', display 'May 2004 (mm5)'. 
G --- Gordon Robertson Gene Regulation Informatics Canada's Michael Smith Genome Sciences Centre Vancouver BC Canada www.bcgsc.ca grobertson at bcgsc.ca From ann at soe.ucsc.edu Wed Mar 5 11:32:23 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 11:32:23 -0800 Subject: [Genome] Minor Web LIftOver suggestion In-Reply-To: References: Message-ID: <47CEF547.5040503@cse.ucsc.edu> Hello Gordon, This is an excellent suggestion! I agree that this would be very helpful to users and I will pass it along to our developers. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Gordon Robertson wrote: > In the Web LiftOver UI, it would be helpful if the 'Original Assembly' and > 'New Assembly' lists included an assembly's 'name' as well as its date. E.g. > for Mouse, instead of 'May 2004', display 'May 2004 (mm5)'. > > G > --- > Gordon Robertson > Gene Regulation Informatics > Canada's Michael Smith Genome Sciences Centre > Vancouver BC Canada > www.bcgsc.ca > grobertson at bcgsc.ca > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Wed Mar 5 11:53:17 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Mar 2008 11:53:17 -0800 Subject: [Genome] Sense/antisense genes In-Reply-To: <47CE2884.5050506@centenary.org.au> References: <4669D7FC.9080204@cse.ucsc.edu> <47CE2884.5050506@centenary.org.au> Message-ID: <47CEFA2D.4030001@cse.ucsc.edu> Hello Nham, First let me thank you for searching the archives before asking your question. I am able to follow the steps and see the Custom Track in the Table Browser that you are not able to see. Let me give you a few more details. After you set up the parameters (steps 1-4) press "get output". On the next page, name the custom track (step 5) then press the "get custom track in table browser" button. 
Your custom track will be available in the drop-down menu at the top of the Table Browser page. Change "group" to Custom Tracks, then choose your track from the list in the "track" list. The rest should fall out after you get this step set up correctly. I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu nham tran wrote: > Dear Ann, > > I am a novice trying to find antisense and sense pairs. I search the > archives and found your advice on the archives. Can i ask you some > specific questions about the task. > > Step One. Make a Custom Track with all of the genes on the positive strand. > > 1. Navigate to the Table Browser ("Tables" in the blue navigation bar > across the top of the browser). > > 2. Configure like so: > clade: vertebrate > genome: human > assembly: Mar. 2006 (assuming you want the latest human assembly) > group: Genes and Gene Prediction Tracks > track: UCSC Genes (or choose another gene track here) > table: knownGene > region: genome > > 3. filter: strand does match + > > 4. output format: custom track > > 5. name the custom track (ie sense) > > *6. HIT the output button* > > 6. get the custom track in table browser. *This will take you back to > the table browser.* Now your custom track of all > positive-strand genes is available in both the Table Browser and for > viewing in the Genome Browser. *I cannot see the custom track in the > table browers* > > > Step Two. Repeat step one for genes on the negative strand,* but for > step 3. filter: strand does match - (is this correct)* > > > Step Three. Intersect the positive and negative Custom Tracks. > 1. Use the Table Browser again. > > 2. Choose the positive-strand custom track from the table browser controls. > *Here is the problem, I cannot find the name tracks "sense" or antisense > in the table browers. Can you help. * > > 3. 
Press the intersect button and choose the negative-strand custom > track from this page. Leave the first choice checked "All AAA records > that have any overlap with BBB". Press submit. > > Many thanks > > Nham > From grobertson at bcgsc.ca Wed Mar 5 12:00:28 2008 From: grobertson at bcgsc.ca (Gordon Robertson) Date: Wed, 05 Mar 2008 12:00:28 -0800 Subject: [Genome] Request - WIG files can have overlapped regions Message-ID: I discussed this a few months ago, perhaps with Hiram, and understand that your current WIG functionality does not allow BED-format records to overlap. But such overlap occurs in many published datasets. The case I face today is a set of ~24k promoter regions in mouse (Barrera et al. 2008), which I lifted from mm5 to mm8 through your web interface. If I select out the scored BED records for a tissue from these data, and format the records as a BED-format WIG file, I cannot load the file, because at least one pair of records overlaps. I anticipate that changing the functionality of your code to permit overlap in all three WIG file types is nontrivial, but could I ask that you consider addressing this? Barrera LO, Li Z, Smith AD, Arden KC, Cavenee WK, Zhang MQ, Green RD, Ren B. 2008. Genome-wide mapping and analysis of active promoters in mouse embryonic stem cells and adult organs. Genome Res. 18(1):46-59. G --- Gordon Robertson Gene Regulation Informatics Canada's Michael Smith Genome Sciences Centre Vancouver BC Canada www.bcgsc.ca grobertson at bcgsc.ca From melissa.s.cline at gmail.com Wed Mar 5 15:12:47 2008 From: melissa.s.cline at gmail.com (Melissa Cline) Date: Wed, 5 Mar 2008 15:12:47 -0800 Subject: [Genome] Bemused by an "invalid signed number" error in liftOver Message-ID: <8c9747eb0803051512i643d32fek206dfaf9de97fdc8@mail.gmail.com> Hi, I was trying to use the liftOver utility to translate some mm9 coordinates to mm8.
I got an error on the following line: chr10 102471840 102471903 Homeo_OAR(PF03826) 2.1e-09- 102471840 102471903 0 1 63 0 (note that all fields should be tab-delimited, whether or not the tabs have been preserved as tabs). When I enter this line into the "Paste in data:" field on the Lift Genome Annotations form, I get the following response: invalid signed number: "2.1e-09" Why is this number invalid? The BED file documentation says that the score field should contain a number between 0 and 1000, which it is, so I'm honestly unclear on what's wrong. What do I need to do to fix this? Thanks, Melissa From hiram at soe.ucsc.edu Wed Mar 5 15:37:49 2008 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Wed, 05 Mar 2008 15:37:49 -0800 Subject: [Genome] Bemused by an "invalid signed number" error in liftOver In-Reply-To: <8c9747eb0803051512i643d32fek206dfaf9de97fdc8@mail.gmail.com> References: <8c9747eb0803051512i643d32fek206dfaf9de97fdc8@mail.gmail.com> Message-ID: <47CF2ECD.8010000@soe.ucsc.edu> Good Afternoon Melissa: That score column is an integer, not a float. Only digits 0-9 allowed. --Hiram Melissa Cline wrote: > Hi, > > I was trying to use the liftOver utility to translate some mm9 coordinates > to mm8. I got an error on the following line: > > chr10 102471840 102471903 Homeo_OAR(PF03826) > 2.1e-09- 102471840 102471903 0 1 63 > 0 > (note that all fields should be tab-delimited, whether or not the tabs have > been preserved as tabs). > > When I enter this line into the "Paste in data:" field on the Lift Genome > Annotations form, I get the following response: > > invalid signed number: "2.1e-09" > > > Why is this number invalid? The BED file documentation says that the score > field should contain a number between 0 and 1000, which it is, so I'm > honestly unclear on what's wrong. What do I need to do to fix this? 
> > Thanks, > > Melissa > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From mdbiggin at lbl.gov Wed Mar 5 15:35:53 2008 From: mdbiggin at lbl.gov (Mark Biggin) Date: Wed, 5 Mar 2008 15:35:53 -0800 Subject: [Genome] Drosophila ChIP/chip Message-ID: <1D6DF502-28FB-4134-AEE4-8A13F0E01BA4@lbl.gov> Dear UCSC team, we wonder if you would be interested in including ChIP/ chip data produced by the Berkeley Drosophila Transcription Network Project as a track on your browser? For now we would suggest providing 1% and 25% False Discovery Rate thresholded data for 7 sequence specific transcription factors and RNA polymerase. Wiggle files of the 1% FDR data are at the links below, so you can get a sense of the data. (these are to Release 4 of the mel genome sequence). We could improve the appearance and add color if you agree to host these. http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_1_012505-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_2_092005-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/cad_1_020107-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/gt_2_020107-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_1_012505-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_2_092305-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_1_092706-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_2_092706-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_1_113005-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_2_113005-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/polII_8WG16_092905-sym-1.wig http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/z_2_051504-sym-1.wig Our data are particularly well justified and of high 
quality. In addition, our interpretation and analysis of the in vivo DNA binding data differ significantly from most others, emphasizing differences in the classes of region bound at different levels and showing that many weakly bound regions are likely to be non functional. This is explained in our recent PLoS Biology paper. http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pbio.0060027 I'd be happy to provide any further information you may require. We have data on a further 14 factors that we hope to release by the summer. sincerely mark biggin Mark Biggin Genomics Division Lawrence Berkeley National Laboratory Berkeley CA 94720 Phone (510) 486 7606 Fax (510) 486 4229 email mdbiggin at lbl.gov From rhead at soe.ucsc.edu Wed Mar 5 18:34:14 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 05 Mar 2008 18:34:14 -0800 Subject: [Genome] Segment duplication on zebrafish chr 16? In-Reply-To: <406C019E-CFBB-465C-82F7-2731A93120B9@utoronto.ca> References: <406C019E-CFBB-465C-82F7-2731A93120B9@utoronto.ca> Message-ID: <47CF5826.2080900@soe.ucsc.edu> Hello Vince, I see the two adjacent sets of the genes you mention on the danRer5 assembly, but I can't really tell you if the duplication is "real" or not. The RefSeq Genes and Human Proteins (mapped by chained tBLASTn) tracks were both made here by aligning sequence to the zebrafish assembly. However, the assembly of the genomic sequence was done at Sanger: http://www.sanger.ac.uk/Projects/D_rerio/Zv7_assembly_information.shtml I see this note on their site: "This is still a *preliminary* assembly and there are a number of points to remember. The regions of the assembly covered by WGS contigs are of lower quality. In general regions which are highly variable do not form clusters since they are quite likely from different haplotypes. This also affects the generation of the physical map resulting in assembly dropouts and false duplications. 
In this assembly special attention has been paid to these issues and over 200 Mb of duplicated sequence has been removed compared to Zv6." Another thing to note is that our mapping of the genes you mention (BC092825, for instance) is only duplicated in the most recent zebrafish assembly (Zv7/danRer5). In our four previous assemblies, they only mapped to one location. I suggest contacting Sanger to see if they can provide better insight to what is occurring in this region. -- Brooke Rhead UCSC Genome Bioinformatics Group Vince Tropepe wrote: > Hi, > > We are in the process of mapping a mutation in zebrafish. We though > we had mapped the mutation to chr 16 matching contig BX255877.14. > However, upon closer inspection it seems that some of the genes on > this contig (e.g. BC092825, BC076364, BC096802) are predicted to also > be present on contig CU104697.5, based on sequence comparisons with > the chained tBLASTn and zebrafish refseq genes. > > This is the interval I am focusing on: chr16:5,740,301-6,199,660 > (zebrafish danRer5 July 2007 assembly). > > My question is whether there is a real segment duplication in this > region. > > Thanks for your help! 
> vince > > > ------ > Vince Tropepe > Department of Cell & Systems Biology > University of Toronto > 25 Harbord Street > Toronto, ON, M5S 3G5 > Canada > > T: 416-946-0338 > F: 416-978-8532 > v.tropepe at utoronto.ca From jlwang at imcb.a-star.edu.sg Wed Mar 5 22:52:08 2008 From: jlwang at imcb.a-star.edu.sg (Wang JianLi) Date: Thu, 6 Mar 2008 14:52:08 +0800 Subject: [Genome] correction of chunk coordinates in blastz alignments References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg> <47CDF7AA.5060106@soe.ucsc.edu> Message-ID: <889F4F6D1CFC9945B3610F97CD400C1501532B@PIGEONS-MA.imcb.a-star.edu.sg> Hi All, I split the human genomic sequences into chunks of size 10M with an overlap of 10K, and ran blastz to align each chunk against another genome. I now have a number of .lav files. I am not sure how to piece them together and correct the coordinates of the sequence chunks. Within each chunk the sequence coordinates start from 1, but in the original chromosome sequence they should be adjusted for the chunk size and the overlap region. I read that blastz-normalizeLav, written by Scott Schwartz, can do that. Does anybody know where to get that script? Thank you. Regards, J.L From liunasophia at gmail.com Wed Mar 5 17:56:36 2008 From: liunasophia at gmail.com (na liu) Date: Wed, 5 Mar 2008 20:56:36 -0500 Subject: [Genome] Get special sequences from UCSC browser Message-ID: Hi, professors: I would like to get all the FB elements in Drosophila melanogaster from the UCSC browser. How can I get the full information about them, including the coordinate information? Look forward to your reply.
Best Na From acromero at caltech.edu Wed Mar 5 22:45:31 2008 From: acromero at caltech.edu (Alex Romero) Date: Wed, 5 Mar 2008 22:45:31 -0800 Subject: [Genome] Source of genomic DNA Message-ID: Hello, I am doing an analysis of all animal genomes and an important part of it involves the source from which the genomic DNA was obtained. Is there anywhere where this is available? I was able to find it for some genomes here and there, but many of the genomes I could not find anything. Thank you. Alex Romero From bush at HMC.Edu Wed Mar 5 18:01:05 2008 From: bush at HMC.Edu (Eliot Bush) Date: Wed, 05 Mar 2008 18:01:05 -0800 Subject: [Genome] 30-way multiz on human Message-ID: <47CF5061.2000605@hmc.edu> hello, I see that the current mouse browser is using 30-way multiz alignments which include orangutan and marmoset. I'm wondering when the human browser is going to start using these alignments. thanks much, Eliot From therealsisterdot at gmail.com Thu Mar 6 01:44:58 2008 From: therealsisterdot at gmail.com (thereal sisterdot) Date: Thu, 6 Mar 2008 10:44:58 +0100 Subject: [Genome] old topic revived: multiz sequence accessibility Message-ID: Hey all... i know the subject has been discussed before, so i apologize for bringing it up again... as already discussed there are sequences in multiz alignments, which are not available for browsing- applies to many of the drosophilids... they are also not available in the genome-test... so the only solution for people who want to extract the whole fragments around the alignment seems to be to download the complete genomes and extract the segments they want with the methods they choose... wouldn't it be nice to provide a genome browsing capability within UCSC for those only partially represented genomes- even if there is no annotation associated with them (besides the already available- e.g. multiz )... 
thanx sisterdot From hiram at soe.ucsc.edu Thu Mar 6 06:12:58 2008 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 06 Mar 2008 06:12:58 -0800 Subject: [Genome] correction of chunk coordinates in blastz alignments In-Reply-To: <889F4F6D1CFC9945B3610F97CD400C1501532B@PIGEONS-MA.imcb.a-star.edu.sg> References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg> <47CDF7AA.5060106@soe.ucsc.edu> <889F4F6D1CFC9945B3610F97CD400C1501532B@PIGEONS-MA.imcb.a-star.edu.sg> Message-ID: <47CFFBEA.4010605@soe.ucsc.edu> Good Morning J.L.: All of the tools required to run the blastz sequence are in the kent source tree. Fetch the source tree: http://genome.ucsc.edu/admin/cvs.html and then look at the file: src/hg/utils/automation/doBlastzChainNet.pl and the library there: HgAutomate.pm See also articles about this subject on genomewiki.ucsc.edu --Hiram Wang JianLi wrote: > > Hi All, > > I split human genomic sequences into chunks with size 10M, and overlap size 10K, and run blastz to align each chunk with another genome. Then I have a number of .lav files after the blastz. I am not sure how I can piece them together, to correct the coordinates of the sequence chunks. As in each chunk, the sequence coordinate will start from 1, but in the original chromosome sequence, this coordinate should be adjusted by the size of chunks and overlap region. > > I read that blastz-normalizeLav written by Scott Schwartz can do that. Does anybody know where to get that script? > > Thank you. > > Regards, > J.L From hiram at soe.ucsc.edu Thu Mar 6 06:16:45 2008 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 06 Mar 2008 06:16:45 -0800 Subject: [Genome] Source of genomic DNA In-Reply-To: References: Message-ID: <47CFFCCD.4020403@soe.ucsc.edu> Good Morning Alex: From the gateway page for each genome: http://genome.ucsc.edu/cgi-bin/hgGateway there is a link to the sequencing center for each genome.
The sequencing centers describe the animal used for that genome. In many cases, the photograph we display is the actual animal that was sequenced. --Hiram Alex Romero wrote: > Hello, I am doing an analysis of all animal genomes and an important part of > it involves the source from which the genomic DNA was obtained. Is there > anywhere where this is available? I was able to find it for some genomes > here and there, but many of the genomes I could not find anything. Thank > you. > > Alex Romero > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From ravids at yahoo.com Thu Mar 6 06:13:11 2008 From: ravids at yahoo.com (ravid straussman) Date: Thu, 6 Mar 2008 06:13:11 -0800 (PST) Subject: [Genome] Affy all exon database Message-ID: <531753.56873.qm@web50301.mail.re2.yahoo.com> Dear Sir, I couldn't find any indication to whether the log ratios (ExpScores) in the Affy all exon table are in base-2 or base-10. Any idea? Thanks, Ravid Straussman. --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From vebaev at gmail.com Thu Mar 6 06:54:47 2008 From: vebaev at gmail.com (Vesselin Baev) Date: Thu, 6 Mar 2008 16:54:47 +0200 Subject: [Genome] genome alignment Message-ID: Dear All, Is there an already done genome-wide human-chimp alignment that I can use for extracting portions of it with specified coordinates? If not, what should I use to make such alignment and what to use to extract regions of it with coordinates (Galaxy or?)? Vesko -- ------------------------------------------------ Vesselin Baev, PhD University of Plovdiv Dept. 
Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA 032/ 261 (534) 089/ 43 80 945 Skype: vesselin_baev vebaev at gmail.com baev at uni-plovdiv.bg From ann at soe.ucsc.edu Thu Mar 6 10:03:26 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Thu, 06 Mar 2008 10:03:26 -0800 Subject: [Genome] Drosophila ChIP/chip In-Reply-To: <1D6DF502-28FB-4134-AEE4-8A13F0E01BA4@lbl.gov> References: <1D6DF502-28FB-4134-AEE4-8A13F0E01BA4@lbl.gov> Message-ID: <47D031EE.1040408@soe.ucsc.edu> Hello Mark, Our team is looking at your data and discussing the possibility of hosting it on our website. We will get back to you off-list with any questions. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Mark Biggin wrote: > Dear UCSC team, we wonder if you would be interested in including ChIP/ > chip data produced by the Berkeley Drosophila Transcription Network > Project as a track on your browser? For now we would suggest providing > 1% and 25% False Discovery Rate thresholded data for 7 sequence > specific transcription factors and RNA polymerase. Wiggle files of the > 1% FDR data are at the links below, so you can get a sense of the > data. (these are to Release 4 of the mel genome sequence). We could > improve the appearance and add color if you agree to host these. 
> > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_1_012505-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_2_092005-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/cad_1_020107-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/gt_2_020107-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_1_012505-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_2_092305-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_1_092706-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_2_092706-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_1_113005-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_2_113005-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/polII_8WG16_092905-sym-1.wig > http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/z_2_051504-sym-1.wig > > Our data are particularly well justified and of high quality. In > addition, our interpretation and analysis of the in vivo DNA binding > data differ significantly from most others, emphasizing differences in > the classes of region bound at different levels and showing that many > weakly bound regions are likely to be non functional. This is > explained in our recent PLoS Biology paper. > http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pbio.0060027 > > I'd be happy to provide any further information you may require. We > have data on a further 14 factors that we hope to release by the summer. 
> > sincerely > > mark biggin > > > Mark Biggin > Genomics Division > Lawrence Berkeley National Laboratory > Berkeley CA 94720 > > Phone (510) 486 7606 > Fax (510) 486 4229 > email mdbiggin at lbl.gov > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Mar 6 10:57:54 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 10:57:54 -0800 Subject: [Genome] genome alignment In-Reply-To: References: Message-ID: <47D03EB2.60203@cse.ucsc.edu> Hello Vesko, Yes, we have chain and net data between human and chimp on our website. To see the chimp chains and nets on human, open the hg18 browser, and scroll down to the "Comparative Genomics" section. You'll see "Chimp Chain" and "Chimp Net" which you can turn on. You can use our Table Browser ("Tables" on the blue bar on the top of the main page) to extract a subset of this data. And finally, to download this data in its entirety, click on "Downloads" --> "Human" --> "Human/Chimp pairwise alignments" or click here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/ I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Vesselin Baev wrote: > Dear All, > Is there an already done genome-wide human-chimp alignment that I can > use for extracting portions of it with specified coordinates? > If not, what should I use to make such alignment and what to use to > extract regions of it with coordinates (Galaxy or?)? > > Vesko > > From kayla at soe.ucsc.edu Thu Mar 6 11:28:35 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 11:28:35 -0800 Subject: [Genome] Get special sequences from UCSC browser In-Reply-To: References: Message-ID: <47D045E3.4040402@cse.ucsc.edu> Hello Na, Our Table Browser can be used to retrieve our FlyBase data.
Click on "Tables" on the blue bar on the top of the main page and select the following settings: clade: Insect; genome: D. melanogaster; assembly: Apr. 2006 group: Genes and Gene Prediction Tracks; track: FlyBase Genes table: flyBaseGene; region: genome; output format: all fields from selected table; click on "get output". There is a link toward the top of the page for help on using the Table Browser. Alternatively, the flyBaseGene table can be downloaded by clicking on "Downloads" --> "D. Melanogaster" --> "Annotation Database" or by clicking here: http://hgdownload.cse.ucsc.edu/goldenPath/dm3/database/ I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group na liu wrote: > Hi, professors: > I want all the FB elements in Drosophila melanogaster from UCSC browser. > How can I get the full information about them, including the coordinate > information, etc. > > Look forward to your reply. > > Best > > Na > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Mar 6 11:45:42 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 11:45:42 -0800 Subject: [Genome] Affy all exon database In-Reply-To: <531753.56873.qm@web50301.mail.re2.yahoo.com> References: <531753.56873.qm@web50301.mail.re2.yahoo.com> Message-ID: <47D049E6.8040403@cse.ucsc.edu> Hello, Ravid, The log ratios are using base-2. As an aside, I like to point out our wiki page to users of the Affy All Exon track, in case it can answer future questions: http://genomewiki.cse.ucsc.edu/index.php/Microarray_track I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. 
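For readers working with these scores, here is a minimal sketch of turning a score back into a linear fold change. It assumes only what is stated above, namely that an ExpScore is a base-2 log ratio; the function name is hypothetical and is not part of any UCSC tool.

```python
def log2_ratio_to_fold_change(score):
    """Convert a base-2 log ratio to a linear fold change.

    Assumption: `score` is log2(sample / reference), as stated in the
    reply above for the Affy All Exon ExpScores.
    """
    return 2.0 ** score


if __name__ == "__main__":
    # A log2 ratio of 1 means a two-fold increase, -1 a two-fold decrease.
    for s in (1.0, 0.0, -1.0):
        print(s, log2_ratio_to_fold_change(s))
```

Had the scores been base-10, the same value of 1.0 would instead have meant a ten-fold change, which is why the base matters when reading the table.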
Kayla Smith UCSC Genome Bioinformatics Group ravid straussman wrote: > Dear Sir, > > I couldn't find any indication to whether the log ratios (ExpScores) in the Affy all exon table are in base-2 or base-10. > > Any idea? > > Thanks, > > Ravid Straussman. > > > --------------------------------- > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From MDBiggin at lbl.gov Thu Mar 6 10:41:09 2008 From: MDBiggin at lbl.gov (Mark Biggin) Date: Thu, 6 Mar 2008 10:41:09 -0800 Subject: [Genome] Drosophila ChIP/chip In-Reply-To: <47D031EE.1040408@soe.ucsc.edu> References: <1D6DF502-28FB-4134-AEE4-8A13F0E01BA4@lbl.gov> <47D031EE.1040408@soe.ucsc.edu> Message-ID: <5C315F85-C1DC-4B15-B7AA-F99C73CAE684@lbl.gov> Ann, I should perhaps have also provided a link to our own web site, which may answer some potential questions. http://bdtnp.lbl.gov/Fly-Net/chipchip.jsp?w=summary You'll see that our motivation for preferring that you host a visualization of our data is that we currently provide links to the UCSC browser for loading files as guest tracks, but to allow quick upload times, we only upload crude simplifications of the data that do not show much critical detail. (http://bdtnp.lbl.gov/Fly-Net/SearchChipper?first=12 the links on the right). It would be great to allow folks to quickly see the richer data shown in the wiggle files in the message below. Thank you for considering our request. mark On Mar 6, 2008, at 10:03 AM, Ann Zweig wrote: > Hello Mark, > > Our team is looking at your data and discussing the possibility of > hosting it on our website. We will get back to you off-list with > any questions.
> > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > > > Mark Biggin wrote: >> Dear UCSC team, we wonder if you would be interested in including >> ChIP/ chip data produced by the Berkeley Drosophila Transcription >> Network Project as a track on your browser? For now we would >> suggest providing 1% and 25% False Discovery Rate thresholded data >> for 7 sequence specific transcription factors and RNA polymerase. >> Wiggle files of the 1% FDR data are at the links below, so you can >> get a sense of the data. (these are to Release 4 of the mel genome >> sequence). We could improve the appearance and add color if you >> agree to host these. >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_1_012505-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/bcd_2_092005-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/cad_1_020107-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/gt_2_020107-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_1_012505-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/hb_2_092305-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_1_092706-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kni_2_092706-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_1_113005-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/kr_2_113005-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/polII_8WG16_092905-sym-1.wig >> http://rana.lbl.gov/~smacarth/ChIPChip/Public_Release/wiggle/z_2_051504-sym-1.wig >> Our data are particularly well justified and of high quality. 
In >> addition, our interpretation and analysis of the in vivo DNA >> binding data differ significantly from most others, emphasizing >> differences in the classes of region bound at different levels and >> showing that many weakly bound regions are likely to be non >> functional. This is explained in our recent PLoS Biology paper. >> http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pbio.0060027 >> I'd be happy to provide any further information you may require. >> We have data on a further 14 factors that we hope to release by >> the summer. >> sincerely >> mark biggin >> Mark Biggin >> Genomics Division >> Lawrence Berkeley National Laboratory >> Berkeley CA 94720 >> Phone (510) 486 7606 >> Fax (510) 486 4229 >> email mdbiggin at lbl.gov >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome Mark Biggin Genomics Division Lawrence Berkeley National Laboratory Berkeley CA 94720 Phone (510) 486 7606 Fax (510) 486 4229 email mdbiggin at lbl.gov From czaleski at albany.edu Thu Mar 6 12:05:56 2008 From: czaleski at albany.edu (Chris Zaleski) Date: Thu, 6 Mar 2008 15:05:56 -0500 (EST) Subject: [Genome] downloaded exons Message-ID: <42770.74.76.159.124.1204833956.squirrel@webmail.albany.edu> Greetings, I have a question about downloaded BED files from the table browser. I've chosen a 'Gene' track, BED output format, and then Exons (from the 2nd page). An example 'name' field looks like the following: NM_152486_exon_0_0_chr1_850984_f I understand most, but not all, of the tokens in this string. Could you please explain what all the items represent? 
Thanks very much, Chris Zaleski Bioinformatics Team Lead Tenenbaum Lab - SUNY Albany From ann at soe.ucsc.edu Thu Mar 6 12:36:40 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Thu, 06 Mar 2008 12:36:40 -0800 Subject: [Genome] Request - WIG files can have overlapped regions In-Reply-To: References: Message-ID: <47D055D8.2090209@soe.ucsc.edu> Gordon and Hiram are discussing possible solutions to this off-list. If you are particularly interested in the solutions they come up with, please don't hesitate to write back to the list for details. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Gordon Robertson wrote: > I discussed this a few months ago, perhaps with Hiram, and understand that > your current WIG functionality does not allow BED-format records to overlap. > > But such overlap occurs in many published datasets. The case I face today is > a set of ~24k promoter regions in mouse (Barrera et al. 2008), which I > lifted from mm5 to mm8 through your web interface. If I select out the > scored BED records for a tissue from these data, and format the records as a > BED-format WIG file, I cannot load the file, because at least one pair of > records overlaps. > > I anticipate that changing the functionality of your code to permit overlap > in all three WIG file types is nontrivial, but could I ask that you consider > addressing this? > > Barrera LO, Li Z, Smith AD, Arden KC, Cavenee WK, Zhang MQ, Green RD, Ren B. > 2008. Genome-wide mapping and analysis of active promoters in mouse > embryonic stem cells and adult organs. Genome Res. 18(1):46-59.
> > G > --- > Gordon Robertson > Gene Regulation Informatics > Canada's Michael Smith Genome Sciences Centre > Vancouver BC Canada > www.bcgsc.ca > grobertson at bcgsc.ca > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Mar 6 13:29:00 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 13:29:00 -0800 Subject: [Genome] 30-way multiz on human In-Reply-To: <47CF5061.2000605@hmc.edu> References: <47CF5061.2000605@hmc.edu> Message-ID: <47D0621C.70508@cse.ucsc.edu> Hello Eliot, We are not actively working on a 30-way multiz alignment for human at this time. However, you may be interested in the human/marmoset and human/orangutan nets and chains tracks on our development server. These will be out on our public server soon, but for now, you can see them here (this is a session with those tracks turned on): http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=hg18_ponAbe2_calJac1 I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Eliot Bush wrote: > hello, > > I see that the current mouse browser is using 30-way multiz alignments > which include orangutan and marmoset. I'm wondering when the human > browser is going to start using these alignments. > > thanks much, > Eliot > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From vebaev at gmail.com Thu Mar 6 13:28:00 2008 From: vebaev at gmail.com (Vesselin Baev) Date: Thu, 6 Mar 2008 23:28:00 +0200 Subject: [Genome] genome-wide align Message-ID: Dear all, I explored the "Tables" menu in Genome browser. 
I wonder what I should use to extract regions (with coordinates from RefSeqs) of the aligned human-chimp genome so that the output looks something like this format:

>NM_017821.0.1 hg17.chr1 206501580 206501621 - 245522847 mm5.chr4 122200029 122200071 + 154141344 rn3.chr5 143135020 143135060 + 173106704 canFam1.chr15 6850518 6850559 + 67237905
CCTGCCCCTATTGTAAGTCAATTAATA-AAAAGAGCCATCTGG
CTTGCCTCTATGATAAACCAGTTAATATAAAAGTGTCACATGG
CTTGCCTCTGTTATAAGCCACTTAATA--AAAGTGTCACATGG
CCTGCCTCTCATAGGAAGCAAGTAATG-AAAAGAGCCATCTGG

>NM_004070.0.0 hg17.chr1 16105460 16105489 + 245522847 mm5.chr4 14280459 14280491 - 154141344 rn3.chr5 12802003 12802032 - 173106704 canFam1.chr2 5413700 5413729 - 87725193
GCCGGCCCAGCAAGATGAAACAG---GGCACCC
GTCAGCCTGGGGGGGGTCGGCAGCCTGGCACCC
GTCAGCCTGG---GAGTCGGCAGCCTGGCGCCC
GCCAGCCCAGCAAGATGAAACAG---GGTGGCC

>NM_004070.0.1 hg17.chr1 16105490 16105510 + 245522847 mm5.chr4 14280499 14280519 - 154141344 rn3.chr5 12821881 12821900 - 173106704 canFam1.chr2 5413730 5413751 - 87725193
CAGCTGACCTGGTACTGAGGT-
CAGCTGCCATGGATCTGGGAT-
CAGCTAAGA-GCTGCAGAGGC-
CGGCCGCCCTGGTGAAGGAGAT

Vesko 2008/3/6, Kayla Smith : > > Hello Vesko, > > Yes, we have chain and net data between human and chimp on our website. > To see the chimp chains and nets on human, open the hg18 browser, and > scroll down to the "Comparative Genomics" section. You'll see "Chimp > Chain" and "Chimp Net" which you can turn on. You can use our Table > Browser ("Tables" on the blue bar on the top of the main page) to > extract a subset of this data. And finally, to download this data in > its entirety, click on "Downloads" --> "Human" --> "Human/Chimp > pairwise alignments" or click here: > > http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/ > > I hope this information is helpful to you. Please don't hesitate to > contact us again if you require further assistance.
> > > Kayla Smith > UCSC Genome Bioinformatics Group > > > > > Vesselin Baev wrote: > > Dear All, > > Is there an already done genome-wide human-chimp alignment that I can > > use for extracting portions of it with specified coordinates? > > If not, what should I use to make such alignment and what to use to > > extract regions of it with coordinates (Galaxy or?)? > > > > Vesko > > > > > > -- ------------------------------------------------ Vesselin Baev, PhD University of Plovdiv Dept. Molecular Biology Bioinformatics Group Tzar Assen 24 Plovdiv 4000, BULGARIA 032/ 261 (534) 089/ 43 80 945 Skype: vesselin_baev vebaev at gmail.com baev at uni-plovdiv.bg From keithlamont at mac.com Thu Mar 6 13:58:18 2008 From: keithlamont at mac.com (Keith Lamont) Date: Thu, 6 Mar 2008 15:58:18 -0600 Subject: [Genome] Request - WIG files can have overlapped regions In-Reply-To: <47D055D8.2090209@soe.ucsc.edu> References: <47D055D8.2090209@soe.ucsc.edu> Message-ID: <09F15549-D3A9-4E80-84B9-D315DB541DA7@mac.com> We have the same issue, so I am interested in solutions to this. Keith Lamont On Mar 6, 2008, at 2:36 PM, Ann Zweig wrote: > Gordon and Hiram are discussing possible solutions to this off- > list. If you are > particularly interested in the solutions they come up with, please > don't > hesitate to write back to the list for details. > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > > > > Gordon Robertson wrote: >> I discussed this a few months ago, perhaps with Hiram, and >> understand that >> your current WIG functionality does not allow BED-format records to >> overlap. >> >> But such overlap occurs in many published datasets.
The case I face >> today is >> a set of ~24k promoter regions in mouse (Barrera et al. 2008), >> which I >> lifted from mm5 to mm8 through your web interface. If I select out >> the >> scored BED records for a tissue from these data, and format the >> records as a >> BED-format WIG file, I cannot load the file, because at least one >> pair of >> records overlaps. >> >> I anticipate that changing the functionality of your code to permit >> overlap >> in all three WIG file types is nontrivial, but could I ask that you >> consider >> addressing this? >> >> Barrera LO, Li Z, Smith AD, Arden KC, Cavenee WK, Zhang MQ, Green >> RD, Ren B. >> 2008. Genome-wide mapping and analysis of active promoters in mouse >> embryonic stem cells and adult organs. Genome Res. 18(1):46-59. >> >> G >> --- >> Gordon Robertson >> Gene Regulation Informatics >> Canada's Michael Smith Genome Sciences Centre >> Vancouver BC Canada >> www.bcgsc.ca >> grobertson at bcgsc.ca >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Mar 6 14:36:21 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 14:36:21 -0800 Subject: [Genome] old topic revived: multiz sequence accessibility In-Reply-To: References: Message-ID: <47D071E5.9060909@cse.ucsc.edu> Hello Sisterdot, Thank you for your suggestion. I have passed your suggestion on to our developers and project manager. Meanwhile, if you have any other questions about the Genome Browser, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group thereal sisterdot wrote: > Hey all... > > i know the subject has been discussed before, so i apologize for bringing it > up again... 
> > as already discussed there are sequences in multiz alignments, which are not > available for browsing- applies to many of the drosophilids... they are also > not available in the genome-test... > > so the only solution for people who want to extract the whole fragments > around the alignment seems to be to download the complete > genomes and extract the segments they want with the methods they choose... > > wouldn't it be nice to provide a genome browsing capability within UCSC for > those only partially represented genomes- even if there is no annotation > associated with them (besides the already available- e.g. multiz )... > > thanx > sisterdot > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Mar 6 15:28:59 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 15:28:59 -0800 Subject: [Genome] downloaded exons In-Reply-To: <42770.74.76.159.124.1204833956.squirrel@webmail.albany.edu> References: <42770.74.76.159.124.1204833956.squirrel@webmail.albany.edu> Message-ID: <47D07E3B.3030201@cse.ucsc.edu> Hello Chris, This is best described with an example. I shall choose the "CITED2" gene on the hg18 assembly. For your reference, this gene is at this position: chr6:139,735,090-139,737,478. Please note that this gene is on the reverse strand, which you can see from the <<< arrows on the display of the gene. Once I have this gene in view in the Genome Browser, I click on "Tables" to go to the Table Browser. I use the following settings: clade: Vertebrate; genome: Human; assembly: Mar. 2006 group: Genes and Gene Prediction Tracks; track: UCSC Genes; table: knownGene; position: chr6:139,735,090-139,737,478 output format: BED. Click "Get output" and on the next page select the radio button next to "Exons", and click "Get BED". 
Here are the 2 lines of output: chr6 139735089 139736782 uc003qip.1_exon_0_0_chr6_139735090_r 0 - chr6 139737242 139737478 uc003qip.1_exon_1_0_chr6_139737243_r 0 - Looking at: uc003qip.1_exon_0_0_chr6_139735090_r Here we have: 1. uc003qip.1 is the name of the UCSC gene 2. exon_0_0 is the number of the exon. You'll note that the next exon is labeled "exon_1_0". 3. chr6_139735090 is the starting position of that exon 4. r means "reverse strand" I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Chris Zaleski wrote: > Greetings, > > I have a question about downloaded BED files from the table browser. I've > chosen a 'Gene' track, BED output format, and then Exons (from the 2nd > page). An example 'name' field looks like the following: > > NM_152486_exon_0_0_chr1_850984_f > > I understand most, but not all, of the tokens in this string. Could you > please explain what all the items represent? > > Thanks very much, > Chris Zaleski > Bioinformatics Team Lead > Tenenbaum Lab - SUNY Albany > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From angie at soe.ucsc.edu Thu Mar 6 15:54:47 2008 From: angie at soe.ucsc.edu (Angie Hinrichs) Date: Thu, 6 Mar 2008 15:54:47 -0800 (PST) Subject: [Genome] downloaded exons In-Reply-To: <47D07E3B.3030201@cse.ucsc.edu> References: <42770.74.76.159.124.1204833956.squirrel@webmail.albany.edu> <47D07E3B.3030201@cse.ucsc.edu> Message-ID: Hi Chris, Here's a small addition to Kayla's explanation. You might be wondering about the extra _0 following the exon number: > 2. exon_0_0 is the number of the exon. You'll note that the next exon > is labeled "exon_1_0". -- I was! I had to look it up in the code. It is the number of extra bases appended to each end of the exon. 
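The name tokens discussed in this thread can be pulled apart with a short script. This is a minimal sketch, not part of the Table Browser: the regular expression and the function name `parse_item_name` are assumptions built only from the example names quoted in this thread, and chromosome names that contain an underscore are not handled.

```python
import re

# Assumed layout, inferred from the examples in this thread:
#   <item>_<feature>_<index>_<padding>_<chrom>_<start>_<strand>
# e.g. uc003qip.1_exon_0_0_chr6_139735090_r
NAME_RE = re.compile(
    r"^(?P<item>.+)_(?P<feature>[A-Za-z0-9]+)_(?P<index>\d+)_(?P<padding>\d+)"
    r"_(?P<chrom>chr[^_]+)_(?P<start>\d+)_(?P<strand>[fr])$"
)


def parse_item_name(name):
    """Split a Table Browser BED 'name' field into its tokens."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("unrecognized name: %r" % name)
    return {
        "item": m.group("item"),            # gene or transcript id
        "feature": m.group("feature"),      # e.g. "exon" or "utr3"
        "index": int(m.group("index")),     # 0-based feature number
        "padding": int(m.group("padding")), # extra bases added at each end
        "chrom": m.group("chrom"),
        "start": int(m.group("start")),     # start as written in the name
        "strand": "+" if m.group("strand") == "f" else "-",
    }


if __name__ == "__main__":
    # One of the two output lines shown above, split on whitespace.
    line = "chr6 139735089 139736782 uc003qip.1_exon_0_0_chr6_139735090_r 0 -"
    fields = line.split()
    info = parse_item_name(fields[3])
    print(info)
    # In this example the start in the name is the BED start plus one.
    print(int(fields[1]) + 1 == info["start"])
```

In the quoted output the BED start column is one less than the start embedded in the name (139735089 versus 139735090), which matches the 0-based BED convention mentioned elsewhere on this list.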
In your case (and in most cases I think), 0 bases were appended so you're getting just the exons. It comes from the Table Browser's "Output track as BED" options page: Create one BED record per: ... (*) Exons plus [0] bases at each end Hope that helps, Angie On Thu, 6 Mar 2008, Kayla Smith wrote: > > Hello Chris, > > This is best described with an example. I shall choose the "CITED2" > gene on the hg18 assembly. For your reference, this gene is at this > position: chr6:139,735,090-139,737,478. Please note that this gene is > on the reverse strand, which you can see from the <<< arrows on the > display of the gene. > > Once I have this gene in view in the Genome Browser, I click on "Tables" > to go to the Table Browser. I use the following settings: > > clade: Vertebrate; genome: Human; assembly: Mar. 2006 > group: Genes and Gene Prediction Tracks; track: UCSC Genes; > table: knownGene; position: chr6:139,735,090-139,737,478 > output format: BED. > > Click "Get output" and on the next page select the radio button next to > "Exons", and click "Get BED". > > Here are the 2 lines of output: > > chr6 139735089 139736782 uc003qip.1_exon_0_0_chr6_139735090_r > 0 - > chr6 139737242 139737478 uc003qip.1_exon_1_0_chr6_139737243_r > 0 - > > > Looking at: uc003qip.1_exon_0_0_chr6_139735090_r > > Here we have: > > 1. uc003qip.1 is the name of the UCSC gene > 2. exon_0_0 is the number of the exon. You'll note that the next exon > is labeled "exon_1_0". > 3. chr6_139735090 is the starting position of that exon > 4. r means "reverse strand" > > I hope this information is helpful to you. Please don't hesitate to > contact us again if you require further assistance. > > Kayla Smith > UCSC Genome Bioinformatics Group > > > > Chris Zaleski wrote: > > Greetings, > > > > I have a question about downloaded BED files from the table browser. I've > > chosen a 'Gene' track, BED output format, and then Exons (from the 2nd > > page). 
An example 'name' field looks like the following: > > > > NM_152486_exon_0_0_chr1_850984_f > > > > I understand most, but not all, of the tokens in this string. Could you > > please explain what all the items represent? > > > > Thanks very much, > > Chris Zaleski > > Bioinformatics Team Lead > > Tenenbaum Lab - SUNY Albany > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > -- angie at soe.ucsc.edu Software Developer, UCSC CBSE / Genome Bioinformatics Group From kayla at soe.ucsc.edu Thu Mar 6 16:01:29 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Mar 2008 16:01:29 -0800 Subject: [Genome] Foxp3 Message-ID: <47D085D9.9070904@cse.ucsc.edu> Hello Kikuo, The coordinates you have given, chrX:47,348,709-47,355,827, do not correspond to the Foxp3 gene on the human hg18 browser. When I search for "Foxp3" in the position/search box on the hg18 browser, I get 3 hits for "UCSC Genes" and 2 hits for "RefSeq Genes". Could this be the discrepancy you had noticed? I hope this helps to clear things up. If not, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group ---- Original Message ----- From: ????? To: genome at soe.ucsc.edu Sent: Wednesday, March 05, 2008 10:13 PM Subject: Foxp3 Dear Sir There are two human Foxp3 genes in the map chrX:47,348,709-47,355,827, and their sequences are completely different. Yesterday there were three. Please let me know which one is correct.
Sincerely Kikuo Onozaki, Professor, Ph.D Department of Molecular Health Sciences, Graduate School of Pharmaceutical Sciences, Nagoya City University Tanabe, Mizuho-ku, Nagoya 467-8603, Japan Tel/Fax: +81-52-836-3419 E-mail: konozaki at phar.nagoya-cu.ac.jp From rhead at soe.ucsc.edu Thu Mar 6 16:05:28 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 06 Mar 2008 16:05:28 -0800 Subject: [Genome] 3'utr coordinates from the table browser In-Reply-To: <47ACFD78.80503@soe.ucsc.edu> References: <83CD6ECF-DCA7-4478-9C9E-CF11A1A9F7F2@ustc.edu> <47ACFD78.80503@soe.ucsc.edu> Message-ID: <47D086C8.3050002@soe.ucsc.edu> Hi Wen, I finally have an explanation for you about the second "0" in the Table Browser output headers (your question #2 below): http://www.soe.ucsc.edu/pipermail/genome/2008-March/015765.html Thanks for being patient! -- Brooke Rhead UCSC Genome Bioinformatics Group Brooke Rhead wrote: > Hello Wen, > > Please see answers to your questions interspersed below: > > Wen Huang wrote: >> Hi, >> >> I have a few questions about the BED file generated when retrieving >> 3'utr coordinates. >> >> I choose 3'UTR exons from the output. >> >> Below are a few lines from the file. I have a few questions: >> >> 1) since these are 3'UTR exons (not just 3'UTR), do they also include >> some exon sequences that are from the coding region? (e.g. STOP in the >> middle of an exon) For 3'UTRs that span multiple exons, are all the >> exons included? > > The UTRs are the untranslated regions of exons. They do not include > coding regions. For UTRs that span multiple exons, all exons are > included in the Table Browser output, but multiple exons will occupy > multiple lines in the BED file. > > An easy way to examine the regions output by the Table Browser is to > choose "custom track" as the output format -- the selected regions will > appear in a "user track" at the top of the Genome Browser display.
>
>> 2) the first three and the last columns are easy to understand,
>> correct me if I am wrong. chromosome id, start, end, strand. What do
>> the numbers in, for example, "NM_174812_utr3_0_0_chr16_3883_r" mean
>> and what does the second column from the last mean? (they are all 0's).
>
> I think the second zero in the name generated by the Table Browser is
> unused in this instance, but I am not certain. I have asked our
> developers about it, and I will send a follow-up to this answer when I
> know for sure.
>
>> 3) do these coordinates start from 0 or 1?
>
> The BED coordinates start from 0. See an explanation here:
> http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
>
> I hope this information is helpful.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>> Thank you very much.
>>
>> Wen
>> chr16 3882 5634 NM_174812_utr3_0_0_chr16_3883_r 0 -
>> chr16 35447 35625 NM_001080309_utr3_1_0_chr16_35448_f 0 +
>> chr16 71582 72004 NM_001098464_utr3_4_0_chr16_71583_f 0 +
>> chr16 191157 192344 NM_174143_utr3_10_0_chr16_191158_f 0 +
>> chr16 354114 354322 NM_174088_utr3_4_0_chr16_354115_f 0 +

From kayla at soe.ucsc.edu Thu Mar 6 17:36:41 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 06 Mar 2008 17:36:41 -0800
Subject: [Genome] genome alignment
In-Reply-To:
References: <47D03EB2.60203@cse.ucsc.edu>
Message-ID: <47D09C29.8010406@cse.ucsc.edu>

Hello again, Vesselin,

What method did you use to obtain the data you have listed below? Did you
use Galaxy to get those results?
What you can do with the Genome Browser is to go to the Table Browser
("Tables" on the blue bar on the top of the main page) and set the
following options:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: Comparative Genomics
track: Conservation
table: multiz28way
region: (put in the region you are interested in)
output format: MAF

and click "Get output". You can also make an intersection in the Table
Browser between a Custom Track of the coordinates you are interested in
and the multiz28way table.

Further instructions on the Table Browser can be found here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

If you want to get results exactly as you have listed below, you will
have to use Galaxy.

Kayla Smith
UCSC Genome Bioinformatics Group

Vesselin Baev wrote:
> Dear Kayla Smith,
> I explored the "Tables" menu in Genome browser. I wonder what I should
> use to extract regions (with coordinates from RefSeqs) of aligned
> human-chimp genome to look something like this format:
>
>
>> NM_017821.0.1 hg17.chr1 206501580 206501621 - 245522847 mm5.chr4
> 122200029 122200071 + 154141344 rn3.chr5 143135020 143135060 +
> 173106704 canFam1.chr15 6850518 6850559 + 67237905
> CCTGCCCCTATTGTAAGTCAATTAATA-AAAAGAGCCATCTGG
> CTTGCCTCTATGATAAACCAGTTAATATAAAAGTGTCACATGG
> CTTGCCTCTGTTATAAGCCACTTAATA--AAAGTGTCACATGG
> CCTGCCTCTCATAGGAAGCAAGTAATG-AAAAGAGCCATCTGG
>
>> NM_004070.0.0 hg17.chr1 16105460 16105489 + 245522847 mm5.chr4
> 14280459 14280491 - 154141344 rn3.chr5 12802003 12802032 - 173106704
> canFam1.chr2 5413700 5413729 - 87725193
> GCCGGCCCAGCAAGATGAAACAG---GGCACCC
> GTCAGCCTGGGGGGGGTCGGCAGCCTGGCACCC
> GTCAGCCTGG---GAGTCGGCAGCCTGGCGCCC
> GCCAGCCCAGCAAGATGAAACAG---GGTGGCC
>
>> NM_004070.0.1 hg17.chr1 16105490 16105510 + 245522847 mm5.chr4
> 14280499 14280519 - 154141344 rn3.chr5 12821881 12821900 - 173106704
> canFam1.chr2 5413730 5413751 - 87725193
> CAGCTGACCTGGTACTGAGGT-
> CAGCTGCCATGGATCTGGGAT-
> CAGCTAAGA-GCTGCAGAGGC-
> CGGCCGCCCTGGTGAAGGAGAT
>
>
> Vesko
>
> 2008/3/6, Kayla Smith :
>> Hello Vesko,
>>
>> Yes, we have chain and net data between human and chimp on our website.
>> To see the chimp chains and nets on human, open the hg18 browser, and
>> scroll down to the "Comparative Genomics" section. You'll see "Chimp
>> Chain" and "Chimp Net" which you can turn on. You can use our Table
>> Browser ("Tables" on the blue bar on the top of the main page) to
>> extract a subset of this data. And finally, to download this data in
>> its entirety, click on "Downloads" --> "Human" --> "Human/Chimp
>> pairwise alignments" or click here:
>>
>> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/
>>
>> I hope this information is helpful to you. Please don't hesitate to
>> contact us again if you require further assistance.
>>
>>
>> Kayla Smith
>> UCSC Genome Bioinformatics Group
>>
>>
>> Vesselin Baev wrote:
>> > Dear All,
>> > Is there an already done genome-wide human-chimp alignment that I can
>> > use for extracting portions of it with specified coordinates?
>> > If not, what should I use to make such alignment and what to use to
>> > extract regions of it with coordinates (Galaxy or?)?
>> >
>> > Vesko

From jlwang at imcb.a-star.edu.sg Thu Mar 6 19:33:37 2008
From: jlwang at imcb.a-star.edu.sg (Wang JianLi)
Date: Fri, 7 Mar 2008 11:33:37 +0800
Subject: [Genome] correction of chunk coordinates in blastz alignments
References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg> <47CDF7AA.5060106@soe.ucsc.edu> <889F4F6D1CFC9945B3610F97CD400C1501532B@PIGEONS-MA.imcb.a-star.edu.sg> <47CFFBEA.4010605@soe.ucsc.edu>
Message-ID: <889F4F6D1CFC9945B3610F97CD400C1501532D@PIGEONS-MA.imcb.a-star.edu.sg>

Thank you. I have found blastz-normalizeLav, but I couldn't figure out
what parameters I need to specify. Does blastz-normalizeLav read all the
.lav files in one run?

Thanks.
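
As a rough illustration of the coordinate adjustment this thread is asking about (this is not blastz-normalizeLav, whose parameters are not described here), the arithmetic can be sketched in Python. The function names are hypothetical, and the layout is an assumption: each chunk is `chunk_size` bases long and shares `overlap` bases with the next one, so chunk starts advance by `chunk_size - overlap`. If the chunks were cut differently, only the offset calculation changes.

```python
def chunk_offsets(chrom_length, chunk_size, overlap):
    """Return the 0-based start offset of every chunk on one chromosome.

    Assumption (not confirmed by the thread): consecutive chunks share
    `overlap` bases, so each start advances by chunk_size - overlap.
    """
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    offsets = []
    start = 0
    while True:
        offsets.append(start)
        # Stop once the current chunk reaches the end of the chromosome.
        if start + chunk_size >= chrom_length:
            break
        start += step
    return offsets


def to_chrom_coord(local_pos, chunk_offset):
    """Map a 1-based position inside a chunk to a 1-based chromosome position."""
    return chunk_offset + local_pos


# Hypothetical 25 Mb chromosome, 10M chunks, 10K overlap.
offsets = chunk_offsets(25_000_000, 10_000_000, 10_000)
first_base_of_second_chunk = to_chrom_coord(1, offsets[1])
```

Each alignment block in a chunk's .lav file would then have its start and end shifted by that chunk's offset; hits that fall entirely inside an overlap region may appear twice and need de-duplicating.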
________________________________

From: Hiram Clawson [mailto:hiram at soe.ucsc.edu]
Sent: Thu 3/6/2008 10:12 PM
To: Wang JianLi
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] correction of chunk coordinates in blastz alignments

Good Morning J.L.:

All of the tools required to run the blastz sequence are in the kent
source tree. Fetch the source tree:
http://genome.ucsc.edu/admin/cvs.html
and then look at the file:
src/hg/utils/automation/doBlastzChainNet.pl
and library there: HgAutomate.pm

See also articles about this subject in the genomewiki.ucsc.edu

--Hiram

Wang JianLi wrote:
>
> Hi All,
>
> I split human genomic sequences into chunks with size 10M, and overlap size 10K, and run blastz to align each chunk with another genome. Then I have a number of .lav files after the blastz. I am not sure how I can piece them together, to correct the coordinates of the sequence chunks. As in each chunk, the sequence coordinate will start from 1, but in the original chromosome sequence, this coordinate should be adjusted by the size of chunks and overlap region.
>
> I read that blastz-normalizeLav written by Scott Schwartz can do that. Does anybody know where to get that script?
>
> Thank you.
>
> Regards,
> J.L

From whuang.ustc at gmail.com Thu Mar 6 20:04:27 2008
From: whuang.ustc at gmail.com (Wen Huang)
Date: Thu, 6 Mar 2008 22:04:27 -0600
Subject: [Genome] Is the bovine genome refseq in UCSC genome browser up-to-date?
Message-ID: <690AB1D0-15AD-4F67-AD14-77AA5D3612BB@gmail.com>

Hi,

I am just curious that the Refseq Genes Track of the Table browser of
cow genome has only 10,452 genes. Do you keep up with the NCBI refseq
database? Because EntrezGene (not refseq though, I don't find
information about refseq count) has close to 30,000 genes.
I understand that the annotation of bovine genome is far from complete.
I just don't know when people do genomic analysis, do they actually
use only these 10,000 genes?

Thanks,
Wen

From jayuan2008 at yahoo.com Thu Mar 6 20:43:20 2008
From: jayuan2008 at yahoo.com (Yuan Jian)
Date: Thu, 6 Mar 2008 20:43:20 -0800 (PST)
Subject: [Genome] kgID
Message-ID: <422574.3975.qm@web46006.mail.sp1.yahoo.com>

Hello UCSC,

In the Genome Browser, how can I show a kgID such as uc001aaa.1 in the image?

From jlwang at imcb.a-star.edu.sg Thu Mar 6 23:09:05 2008
From: jlwang at imcb.a-star.edu.sg (Wang JianLi)
Date: Fri, 7 Mar 2008 15:09:05 +0800
Subject: [Genome] nib file
References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg> <47CDF7AA.5060106@soe.ucsc.edu>
Message-ID: <889F4F6D1CFC9945B3610F97CD400C1501532E@PIGEONS-MA.imcb.a-star.edu.sg>

Hi, I am running the chainToAxt program. I am not quite clear about the .nib files that the program asks for.

Could somebody enlighten me about that?

Thanks.

Regards,
J.L

From hefh at big.ac.cn Thu Mar 6 17:45:28 2008
From: hefh at big.ac.cn (Fuhong He)
Date: Fri, 7 Mar 2008 09:45:28 +0800
Subject: [Genome] Questions on axtNet alignments
Message-ID: <200803070945284215019@big.ac.cn>

Dear sir,

To our knowledge, the axtNet file axtNet/chr21.hg18.mm9.net.axt provides the chained and netted alignments of the best chain (hg18.mm9.all.chain.gz). But we noticed that a chain was segmented into small axtNet alignments. What are the criteria you used to segment the chain? Is this done by the netToAxt programme?

Thank you for any help!

Best regards!
Yours sincerely,
Fuhong

From Joao.Fadista at agrsci.dk Fri Mar 7 02:34:20 2008
From: Joao.Fadista at agrsci.dk (João Fadista)
Date: Fri, 7 Mar 2008 11:34:20 +0100
Subject: [Genome] add custom tracks
Message-ID:

Hi,

I would like to know if it is possible to add a custom track with the sequences of my reads to be able to see SNPs in relation to the reference genome.

Best regards

João Fadista
Ph.d. student

UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone: +45 8999 1900
Direct: +45 8999 1900
E-mail: Joao.Fadista at agrsci.dk
Web: www.agrsci.org

From hiram at soe.ucsc.edu Fri Mar 7 07:48:12 2008
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Fri, 07 Mar 2008 07:48:12 -0800
Subject: [Genome] nib file
In-Reply-To: <889F4F6D1CFC9945B3610F97CD400C1501532E@PIGEONS-MA.imcb.a-star.edu.sg>
References: <47C6F224.7020007@charite.de> <889F4F6D1CFC9945B3610F97CD400C15015323@PIGEONS-MA.imcb.a-star.edu.sg> <47CDF7AA.5060106@soe.ucsc.edu> <889F4F6D1CFC9945B3610F97CD400C1501532E@PIGEONS-MA.imcb.a-star.edu.sg>
Message-ID: <47D163BC.5080005@soe.ucsc.edu>

Good Morning J.L.:

Please note the description of the file formats in the FAQ:
2bit: http://genome.ucsc.edu/FAQ/FAQformat#format7
and nib: http://genome.ucsc.edu/FAQ/FAQformat#format8

You want to use the 2bit file; it is more efficient. The 2bit files
are available on the downloads server (which appears to be off-line
at this moment?)

--Hiram

Wang JianLi wrote:
>
> Hi, I am running the chainToAxt program.
> I am not quite clear about the .nib files that the program asks for.
>
> Could somebody enlighten me about that?
>
> Thanks.
>
> Regards,
> J.L

From elizabetho at mail.utexas.edu Fri Mar 7 06:12:38 2008
From: elizabetho at mail.utexas.edu (Elizabeth Kahanek)
Date: Fri, 07 Mar 2008 08:12:38 -0600
Subject: [Genome] CDT files for Acuity
Message-ID:

Hello,

Molecular Dynamics indicates your Genome Browser can be used to download
data for creating CDT (chromosome descriptor) files for their Acuity
software. I have tried following their instructions, but they don't seem
to work, primarily because the main identifier for each row of data is a
character string whose identity I cannot determine. (The string begins
with "uc00".) Do you have a protocol for creating CDT files using your
browser?

Thanks for any help you can provide.

Elizabeth Kahanek

--
Elizabeth Osterndorff-Kahanek
Waggoner Center for Alcohol and Addiction Research
University of Texas at Austin
2500 Speedway MBB 1.124, A4800
Austin, TX 78712

From liun at mskcc.org Fri Mar 7 08:20:29 2008
From: liun at mskcc.org (Na Liu)
Date: Fri, 7 Mar 2008 11:20:29 -0500
Subject: [Genome] uncertain annotations in UCSC
Message-ID:

Dear professors,

I want to obtain information on all FB elements of Drosophila
melanogaster from UCSC. I am not sure if my extraction method is
correct, because the results look suspect:

First, I chose 'Variation and Repeats' in the group box of the Table
Browser. Below, at the output format box, I chose "all fields from
selected table".

Then I obtained a long list. I notice there are some entries annotated
as FB4_DM. Are they meant to be FB elements? Why do you name them FB4,
not FB?
What do you mean by the number '4'? They look suspect because some of
them are very short (maybe ~30nt, 40nt, 50nt, ...) and cannot form a
hairpin structure (according to the definition, an FB element has long
inverted terminal repeats).

If my extraction method is not correct, could you please tell me the
correct one?

I sincerely look forward to your reply.

Best
Na

From stefanie.figura at gmx.de Fri Mar 7 02:22:08 2008
From: stefanie.figura at gmx.de (Stefanie Figura)
Date: Fri, 07 Mar 2008 11:22:08 +0100
Subject: [Genome] tfbsConsSites and TRANSFAC annotation
Message-ID: <20080307102208.163250@gmx.net>

Dear All!

Is there any file which can be downloaded to get the mapping for all
TRANSFAC matrices, like V$LMO2COM_02 -> M00278? I need the M00xxx
annotation, so I would be very glad if somebody can help me.

Thanks in advance!
Best regards,
Stefanie

From kuhn at soe.ucsc.edu Fri Mar 7 08:51:04 2008
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Fri, 7 Mar 2008 08:51:04 -0800
Subject: [Genome] add custom tracks
Message-ID: <200803071651.IAA18773@moondance.cse.ucsc.edu>

Joao,

That is an excellent suggestion. It is not possible at the moment,
but we will add it to our list of things we'd like to introduce into
the Browser. I'm sorry that I can't give you an idea of if/when we'd
be able to implement it.

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

> From genome-bounces at soe.ucsc.edu Fri Mar 7 02:34:44 2008
> To:
> Subject: [Genome] add custom tracks
>
> Hi,
>
> I would like to know if it is possible to add a custom track with the sequences of my reads to be able to see SNPs in relation to the reference genome.
>
> Best regards
>
> João Fadista
> Ph.d. student
>
> UNIVERSITY OF AARHUS
> Faculty of Agricultural Sciences
> Dept. of Genetics and Biotechnology
> Blichers Allé 20, P.O. BOX 50
> DK-8830 Tjele
>
> Phone: +45 8999 1900
> Direct: +45 8999 1900
> E-mail: Joao.Fadista at agrsci.dk
> Web: www.agrsci.org

From Xianjun.Dong at bccs.uib.no Fri Mar 7 08:51:49 2008
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 07 Mar 2008 17:51:49 +0100
Subject: [Genome] how scaffold-->chrUn?
Message-ID: <47D172A5.6030901@ii.uib.no>

hi,

How can you create chrUn by scaffolds?
Is there a way to get the coordinate on chrUn for a region on some
scaffold? e.g. scaffold212:12345-67890

Thanks

Xianjun

--
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong

From kuhn at soe.ucsc.edu Fri Mar 7 09:43:04 2008
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Fri, 7 Mar 2008 09:43:04 -0800
Subject: [Genome] kgID
Message-ID: <200803071743.JAA22417@moondance.cse.ucsc.edu>

Hello,

The kgID can be displayed by visiting the configuration page for the
KnownGenes track. You can reach this page in two ways:

Either click on the small minibutton at the left side of the track in
the track display image, or in the track controls below the image,
click on the name of the track, Known Genes, above the menu pulldown
in the Genes and Gene Prediction track group.

Choose "UCSC Known Gene ID."

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

From angie at soe.ucsc.edu Fri Mar 7 10:06:30 2008
From: angie at soe.ucsc.edu (Angie Hinrichs)
Date: Fri, 7 Mar 2008 10:06:30 -0800 (PST)
Subject: [Genome] uncertain annotations in UCSC
In-Reply-To:
References:
Message-ID:

Hi Na,

UCSC does not assign the names; we simply run RepeatMasker on the
genome and display its results. RepeatMasker works by aligning
consensus sequences from a library file to the genome. The library
file is the source of the repeat name, class and family annotated by
RepeatMasker. The library file is owned by RepBase Update (GIRI), but
can be viewed after completing a registration process. To retrieve
the library file, visit this web page:

http://www.girinst.org/repbase/index.html

On the left there is a "Free registration" link. After you have
completed the registration process, the information in RepBase Update
and/or RepBase Reports may be helpful.

Best wishes for your research,
Angie

On Fri, 7 Mar 2008, Na Liu wrote:

> Dear professors,
>
> I want to obtain information on all FB elements of Drosophila
> melanogaster from UCSC. I am not sure if my extraction method is
> correct, because the results look suspect:
>
> First, I chose 'Variation and Repeats' in the group box of the
> Table Browser. Below, at the output format box, I chose "all
> fields from selected table".
>
> Then I obtained a long list. I notice there are some entries
> annotated as FB4_DM. Are they meant to be FB elements? Why do you
> name them FB4, not FB? What do you mean by the number '4'? They look
> suspect because some of them are very short (maybe ~30nt, 40nt,
> 50nt, ...) and cannot form a hairpin structure (according to the
> definition, an FB element has long inverted terminal repeats).
>
> If my extraction method is not correct, could you please tell me the
> correct one?
>
> I sincerely look forward to your reply.
>
> Best
> Na

From ann at soe.ucsc.edu Fri Mar 7 10:25:03 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 07 Mar 2008 10:25:03 -0800
Subject: [Genome] Is the bovine genome refseq in UCSC genome browser up-to-date?
In-Reply-To: <690AB1D0-15AD-4F67-AD14-77AA5D3612BB@gmail.com>
References: <690AB1D0-15AD-4F67-AD14-77AA5D3612BB@gmail.com>
Message-ID: <47D1887F.3010107@cse.ucsc.edu>

Hello Wen,

We do keep up-to-date with RefSeq Genes; we download and display new
data from NCBI every night. The latest cow assembly (bosTau3: Aug.
2006), has just over 10,000 genes. This is true on our website (in the
RefSeq Gene track) as well as on the NCBI website. I'm not sure where
you are seeing the count of 30,000.

You may want to visit the Human Genome Sequencing Center Bos taurus
website: http://www.hgsc.bcm.tmc.edu/projects/bovine/ for more
information.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Wen Huang wrote:
> Hi,
>
> I am just curious that the Refseq Genes Track of the Table browser of
> cow genome has only 10,452 genes. Do you keep up with the NCBI refseq
> database? Because EntrezGene (not refseq though, I don't find
> information about refseq count) has close to 30,000 genes. I
> understand that the annotation of bovine genome is far from complete.
> I just don't know when people do genomic analysis, do they actually
> use only these 10,000 genes?
>
> Thanks,
> Wen

From hiram at soe.ucsc.edu Fri Mar 7 11:07:40 2008
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Fri, 07 Mar 2008 11:07:40 -0800
Subject: [Genome] Questions on axtNet alignments
In-Reply-To: <200803070945284215019@big.ac.cn>
References: <200803070945284215019@big.ac.cn>
Message-ID: <47D1927C.8050505@soe.ucsc.edu>

Good Morning Fuhong:

You can read about the chain and net process in the paper:

http://www.pnas.org/cgi/content/full/100/20/11484
Evolution's cauldron: Duplication, deletion, and rearrangement in the
mouse and human genomes
W. James Kent, Robert Baertsch, Angie Hinrichs, Webb Miller, and David
Haussler
PNAS | September 30, 2003 | vol. 100 | no. 20 | 11484-11489

And more discussion in the genomewiki:

http://genomewiki.ucsc.edu/index.php/Chains_Nets
http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto

--Hiram

Fuhong He wrote:
> Dear sir,
>
> To our knowledge, the axtNet file axtNet/chr21.hg18.mm9.net.axt provides the chained and netted alignments of the best chain (hg18.mm9.all.chain.gz). But we noticed that a chain was segmented into small axtNet alignments. What are the criteria you used to segment the chain? Is this done by the netToAxt programme?
>
> Thank you for any help!
>
> Best regards!
>
>
> Yours sincerely,
> Fuhong

From angie at soe.ucsc.edu Fri Mar 7 11:21:56 2008
From: angie at soe.ucsc.edu (Angie Hinrichs)
Date: Fri, 7 Mar 2008 11:21:56 -0800 (PST)
Subject: [Genome] uncertain annotations in UCSC
In-Reply-To: <4D73B3FC-BFD3-4E30-8D3B-59AE7C928D54@mskcc.org>
References: <4D73B3FC-BFD3-4E30-8D3B-59AE7C928D54@mskcc.org>
Message-ID: