From Joao.Fadista at agrsci.dk Fri Feb 1 03:49:36 2008
From: Joao.Fadista at agrsci.dk (João Fadista)
Date: Fri, 1 Feb 2008 12:49:36 +0100
Subject: [Genome] karyoview and retrieving features
Message-ID:

Hello,

I would like to know if the UCSC (or maybe another website) has a similar tool to the karyoview tool in ensembl but with an extra function: I can decide the size of each chromosome. I would also like to know if there is a tool where I can paste a fasta sequence file and then the output would be like the number and type of repeats there, the %GC content and other possible features as well.

Best regards

João Fadista
Ph.d. student

UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone: +45 8999 1900
Direct: +45 8999 1900
E-mail: Joao.Fadista at agrsci.dk
Web: www.agrsci.org
________________________________

This email may contain information that is confidential. Any use or publication of this email without written permission from Faculty of Agricultural Sciences is not allowed. If you are not the intended recipient, please notify Faculty of Agricultural Sciences immediately and delete this email.

From angelique.chauvin at free.fr Fri Feb 1 05:22:43 2008
From: angelique.chauvin at free.fr (angelique.chauvin at free.fr)
Date: Fri, 01 Feb 2008 14:22:43 +0100
Subject: [Genome] Informations request
Message-ID: <1201872163.47a31d2301f2a@imp.free.fr>

I'm working on a PCR technique and I have a lot of primers to test. I've thus made a program to treat this data by submission on your site.
I've received this message from your server :

There is a very high volume of traffic coming from your site (IP address 81.80.225.127) as of Thu Jan 31 08:06:25 2008 (California time). So that other users get a fair share of our bandwidth, we are putting in a delay of 10.2 seconds before we service your request.
This delay will slowly decrease over a half hour as activity returns to normal. This high volume of traffic is likely due to program-driven rather than interactive access, or the submission of queries on a large number of sequences. If you are making large batch queries, please write to our genome at cse.ucsc.edu public mailing list and inquire about more efficient ways to access our data. If you are sharing an IP address with someone who is submitting large batch queries, we apologize for the inconvenience. Please contact genome-www at cse.ucsc.edu if you think this delay is being imposed unfairly.

I am contacting you to get more information. I will not use this program often, but I'm interested in more efficient ways to access your data.

Thank you in advance for your response

Angélique CHAUVIN
angelique.chauvin at free.fr
Laboratoire de genetique moleculaire
INSERM U613
BREST

From Richard.Dixon at medical-solutions.co.uk Fri Feb 1 07:40:41 2008
From: Richard.Dixon at medical-solutions.co.uk (Richard Dixon)
Date: Fri, 1 Feb 2008 15:40:41 -0000
Subject: [Genome] human 3' utr seqs that do not cross hyb to orthologous rat gene seqs
Message-ID: <8EB869FD03D63F499B0C7D1BE3CEA50B930B3D@palm.medical-solutions.co.uk>

Dear Genome people,

Please could someone tell me if it is possible to use the UCSC browser to solve my task.

I have a list of ~1500 human genes for which I want to find the 3' UTR seqs (~70-100 bp) that are NOT similar (e.g. <75% similar) to the orthologous rat gene seq.

This is to design human oligo probes that will not cross hyb with rat genes.

I would really appreciate your advice on this

Rick

Dr.
Richard Dixon

Medical Solutions plc
1 Orchard Place
Nottingham Business Park
Nottingham, NG8 6PX
United Kingdom

Phone: +44 (0)115 973 9027
Email: richard.dixon at medical-solutions.co.uk
Web: http://www.medical-solutions.co.uk/
Fax: +44 (0)115 973 9021

This email has been sent from Medical Solutions plc, a company registered in England and Wales, or from one of the companies within its control, including Medical Solutions (Nottingham) Limited and Geneservice Limited. The information in this email and any attachments are confidential and must not be copied, distributed or made available to anyone other than the intended addressee. If you receive this communication in error, please notify the sender as soon as possible and destroy the original email. When expressed to our clients any opinions or advice contained in this email are subject to the terms and conditions expressed in the governing services agreement. Please note that, whilst we try to ensure that attachments are virus-free, we cannot accept responsibility for situations where this is not the case.

Medical Solutions plc (registered number 79136); Medical Solutions (Nottingham) Limited (registered number 4078501) and Geneservice Limited (registered number 5355417). All companies registered in England and Wales, registered offices 1 Orchard Place, Nottingham Business Park, Nottingham NG8 6PX United Kingdom.
www.medical-solutions.co.uk

From sabrown at uvm.edu Fri Feb 1 09:04:23 2008
From: sabrown at uvm.edu (Stephen Brown)
Date: Fri, 1 Feb 2008 12:04:23 -0500
Subject: [Genome] sequence discrepancy
Message-ID: <557c732730f41b776cc3932b2329446e@uvm.edu>

Dear All:

I have found a rat gene (GAJ1) where electronic PCR on the UCSC browser yields a product length (with the primers i've chosen) of 429 bases. PCR from the cDNA/mRNA yields a predicted length of 498. i have not yet done the PCR to see what the truth is, but i assume it will be 498.
primers in question are:

CTGCGAAAACGTCTGCTATG
GGACGTGAGAGGAAGCAGTC

its really not that important, but i wondered if you (pl) would have insight into what happened to the ~70 bases??

thanks

steve brown
University of Vermont

From kuhn at soe.ucsc.edu Fri Feb 1 09:13:57 2008
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Fri, 1 Feb 2008 09:13:57 -0800
Subject: [Genome] Dealay warning
Message-ID: <200802011713.JAA11936@sundance.cse.ucsc.edu>

Hello, John,

The warning is indeed, as stated, due to traffic specific to your IP. If you get such messages again, please send your IP with your message (offline to me would be ok), so we can investigate.

Is it possible that you are sharing an IP through a firewall? In that case, we might be seeing your personal traffic added to that of others at your institution.

We have a collection of scripts on our wiki page that allow automated blat activity at a level that does not trigger the delay:

http://genomewiki.cse.ucsc.edu/index.php/Blat_Scripts

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

> From genome-bounces at soe.ucsc.edu Thu Jan 31 15:10:00 2008
> To: genome at soe.ucsc.edu
> Subject: [Genome] Dealay warning
>
> Using Blat (maximum 20 AA sequences on a single form) manually, I'm
> getting sporadic high traffic and delay warnings. I have been using such
> searches over months without difficulty. My IP is not shared with any
> other user, and there is definitely not high traffic to UCSC.
>
> Is this a response to high general traffic, rather than as worded,
> specific to my IP?
>
> John Edwards
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

From hiram at soe.ucsc.edu Fri Feb 1 10:15:36 2008
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Fri, 01 Feb 2008 10:15:36 -0800
Subject: [Genome] karyoview and retrieving features
In-Reply-To:
References:
Message-ID: <47A361C8.2020304@soe.ucsc.edu>

Good Morning João:

The "karyotype" equivalent function at the UCSC genome browser is "custom tracks", see also:
http://genome.ucsc.edu/goldenPath/help/customTrack.html

If you have sequence and you want to find out where it is in the genome, use the blat tool at the UCSC genome browser:
http://genome.ucsc.edu/cgi-bin/hgBlat

This will locate your sequence in the genome and you can use the genome browser to examine features in that area, including the GC%. See also:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html

If you would like to repeat mask some sequence, the web site:
http://www.girinst.org/censor/index.php
can repeat mask sequence.

--Hiram

João Fadista wrote:
> Hello,
>
> I would like to know if the UCSC (or maybe another website) has a
> similar tool to the karyoview tool in ensembl but with an extra
> function: I can decide the size of each chromosome. I would also
> like to know if there is a tool where I can paste a fasta sequence
> file and then the output would be like the number and type of repeats
> there, the %GC content and other possible features as well.
>
>
> Best regards
>
> João Fadista
> Ph.d.
> student

From angie at soe.ucsc.edu Fri Feb 1 10:13:40 2008
From: angie at soe.ucsc.edu (angie at soe.ucsc.edu)
Date: Fri, 1 Feb 2008 10:13:40 -0800 (PST)
Subject: [Genome] interupted repeats track in UCSC genome browser
In-Reply-To: <002c01c86427$f6538530$aa5e4284@gilastlab8>
References: <002c01c86427$f6538530$aa5e4284@gilastlab8>
Message-ID: <3109.76.105.6.155.1201889620.squirrel@webmail.soe.ucsc.edu>

Hi Asaf,

The RepeatMasker ID is a number in the last column of .out that normally increments every line -- but when RepeatMasker sees fragments of the same repeat that appear to be from the same original insertion, it assigns the same ID to all fragments. To build our Interrupted Repeats, we use a script (attached) to identify fragments joined by common IDs.

I think the repeatmasker.org website has a more detailed display that explicitly shows nesting of the repeats when multiple repeat insertions have broken up prior insertions. For more information about how RepeatMasker decides to assign the common IDs to identify fragments of original insertions, I suggest you contact the authors, Arian Smit and Robert Hubley (repeatmasker.org). Robert in particular did a lot of work for the IDs/nesting, and asked us to add the Interrupted Repeats track.

Your MLT1C example does look pretty long, but common IDs are in fact used in the RepeatMasker .out file. You can download the .out files here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromOut.zip

(generally I would recommend ftp:// instead of http:// there, but at the moment our ftp server seems to be oversubscribed)

In chr1.fa.out, these lines show the common IDs that join the MLT1C in your example (ID=56 in final column):

 1438  21.8  10.6  3.9  chr1  24070757  24070927  (223178792)  +  MLT1C  LTR/MaLR     6   192  (275)   56
 2227  11.4   0.0  0.0  chr1  24070928  24071233  (223178486)  +  AluSx  SINE/Alu     1   306    (6)   57
 1438  21.8  10.6  3.9  chr1  24071234  24071477  (223178242)  +  MLT1C  LTR/MaLR   193   449   (18)   56
 ...
 2535   4.7   0.0  0.7  chr1  24127081  24127381  (223122338)  C  AluY   SINE/Alu  (12)   299      1  219
  562  20.4   1.1  4.9  chr1  24127444  24127595  (223122124)  +  MLT1C  LTR/MaLR   418   593    (0)   56
 2118  10.9   1.7  0.7  chr1  24127646  24127941  (223121778)  +  AluSx  SINE/Alu     1   299   (13)  220

Our database tables store only the first character of the final column, because that column used to contain ' ' or '-' back when the table format was defined. So unfortunately the IDs can't be retrieved from the rmsk database tables, only from the RepeatMasker .out files.

Hope that helps,
Angie

> Hi
>
> I would like to receive detailed information about how the interrupted
> repeats track in human hg18 build was constructed.
>
> I don't understand what exactly the repeat masker id is, that you mention
> in the description of this track. Since I am using this data as the base
> for a whole genomic bioinformatic research, I would like to receive an
> answer from the people who know this issue well.
>
> I also found a few cases which seem like bugs in this track, for example:
> in the region of human hg18 chr1:24,070,000-24,130,000 there is an MLT1C
> interrupted repeat stretching over 60K bp. This seems quite an unusual
> distance in my opinion. I don't believe that there is an actual connection
> between the 2 parts of this repeat. I saw that 25% of the interrupted
> repeats are > 2000 nts, with the largest stretching for 463K bases. I have
> this feeling that too long interrupted repeats might be wrong.
>
> Regards,
>
> Asaf Levy
>

From ann at soe.ucsc.edu Fri Feb 1 10:53:45 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Feb 2008 10:53:45 -0800
Subject: [Genome] conversion of genome coordinates between genome assemblies
In-Reply-To: <47d9979f0801312051m21db5e78g99db2b726a426de4@mail.gmail.com>
References: <47d9979f0801312051m21db5e78g99db2b726a426de4@mail.gmail.com>
Message-ID: <47A36AB9.70303@cse.ucsc.edu>

Hello Bogdan,

The LiftOver tool on the website will help you with this task. This tool converts genome coordinates and genome annotation files between assemblies. The input data can be pasted into the text box, or uploaded from a file. You will find the tool here:
http://genome.ucsc.edu/cgi-bin/hgLiftOver

There is no direct lift from mm7 to mm9, but you can do a double-lift: first from mm7 (Aug 2005) to mm8 (Feb 2006), then from mm8 to mm9 (July 2007). Paste all of your input (predicted binding site locations) into the large text box in either position format (e.g. chr1:111111-222222) or BED format (e.g. chr1 111111 222222 itemName).

I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.

Bogdan Tanasa wrote:
> Hello hello. I would like to have your suggestions on the following:
>
> I do have a set of predicted binding sites of transcription factors in
> mouse genome assembly 2005 (mm7). I would like to have the
> coordinates of the predicted binding sites in mouse mm9 assembly.
>
> I am looking for an efficient and quick way to convert the coordinates
> between these mouse genome assemblies. I would appreciate your
> suggestions. Thanks !
>
> Bogdan

From Kenneth_Wan at lbl.gov Fri Feb 1 11:26:03 2008
From: Kenneth_Wan at lbl.gov (Kenneth Wan)
Date: Fri, 1 Feb 2008 11:26:03 -0800
Subject: [Genome] Genome Browser - Affy data for D. Melanogaster Release 5
Message-ID: <19E385F7-1F9D-4460-8EE6-55557A79CD61@lbl.gov>

Dear UCSC,

Do you have a timeline for incorporating the Affymetrix microarray data for the Expression and Regulation tracks for the latest release of the D. melanogaster genome browser? This data is extremely helpful for us to analyze our EST/cDNA data.

Thanks,
Ken

Kenneth H. Wan
Berkeley Drosophila Genome Project
Lawrence Berkeley National Laboratory
1 Cyclotron Road; MS 64-119
Berkeley, CA 94720
510-486-4456
510-486-6798 (fax)
http://oskibear95.blogspot.com

From jim at psi.utoronto.ca Fri Feb 1 16:50:22 2008
From: jim at psi.utoronto.ca (Jim Huang)
Date: Fri, 1 Feb 2008 14:50:22 -1000
Subject: [Genome] Conversion between hg17 and hg18 coordinates
Message-ID: <43543365-727B-4176-9BA8-6128D18FFB62@psi.toronto.edu>

Hi,

Do you have a lift file handy of all conversions from hg17 to hg18?

Thanks,

Jim C. Huang
Ph.D. candidate
Probabilistic and Statistical Inference Group
Department of Electrical and Computer Engineering
University of Toronto
http://www.psi.toronto.edu/~jim/

From jim at psi.utoronto.ca Fri Feb 1 17:32:50 2008
From: jim at psi.utoronto.ca (Jim Huang)
Date: Fri, 1 Feb 2008 15:32:50 -1000
Subject: [Genome] Conversion from hg17 to mm9 coordinates
Message-ID: <26956300-CAA6-44A4-85FA-A4770F94A9CD@psi.toronto.edu>

Hi,

Regarding my previous email, I was in fact asking for a liftOver file from hg17 to mm9?

Sorry for the confusion,

Jim C. Huang
Ph.D.
candidate
Probabilistic and Statistical Inference Group
Department of Electrical and Computer Engineering
University of Toronto
http://www.psi.toronto.edu/~jim/

From ann at soe.ucsc.edu Fri Feb 1 13:28:18 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Feb 2008 13:28:18 -0800
Subject: [Genome] Conversion from hg17 to mm9 coordinates
In-Reply-To: <26956300-CAA6-44A4-85FA-A4770F94A9CD@psi.toronto.edu>
References: <26956300-CAA6-44A4-85FA-A4770F94A9CD@psi.toronto.edu>
Message-ID: <47A38EF2.2060805@cse.ucsc.edu>

Hello Jim,

There is no direct liftOver file from hg17 to mm9. I suggest you do a double-lift: hg17 -> hg18, then hg18 -> mm9.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Jim Huang wrote:
> Hi,
>
> Regarding my previous email, I was in fact asking for a liftOver file
> from hg17 to mm9?
>
> Sorry for the confusion,
>
> Jim C. Huang
> Ph.D. candidate
> Probabilistic and Statistical Inference Group
> Department of Electrical and Computer Engineering
> University of Toronto
> http://www.psi.toronto.edu/~jim/

From ann at soe.ucsc.edu Fri Feb 1 13:58:38 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Feb 2008 13:58:38 -0800
Subject: [Genome] Informations request
In-Reply-To: <1201872163.47a31d2301f2a@imp.free.fr>
References: <1201872163.47a31d2301f2a@imp.free.fr>
Message-ID: <47A3960E.2000305@cse.ucsc.edu>

Hello Angélique,

Our software automatically detects usage that exceeds our restrictions and places a block on that site. This block will be automatically removed (most likely sometime today). Our usage restrictions, which are posted on our home page and in the FAQ, limit program-driven use to a maximum of one hit every 15 seconds and no more than 5,000 hits per day.
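
The two limits quoted above (one hit every 15 seconds, no more than 5,000 hits per day) can be enforced on the client side before any request is sent. The following is only a minimal sketch under stated assumptions: the class name `Throttle` and its parameters are hypothetical, it is not one of the UCSC-provided scripts, and it interprets "per day" as a rolling 24-hour window.

```python
import time


class Throttle:
    """Hypothetical client-side limiter: at most one request every
    min_interval seconds and at most max_per_day requests in any
    rolling 24-hour window (assumed reading of "per day")."""

    DAY = 86400.0  # seconds in 24 hours

    def __init__(self, min_interval=15.0, max_per_day=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_per_day = max_per_day
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep
        self.sent = []          # timestamps of requests in the last 24 h

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self.clock()
        # Forget requests that are at least 24 hours old.
        self.sent = [t for t in self.sent if now - t < self.DAY]
        delay = 0.0
        if self.sent:
            # Keep min_interval seconds between consecutive requests.
            delay = max(delay, self.sent[-1] + self.min_interval - now)
        if len(self.sent) >= self.max_per_day:
            # Daily quota used up: wait until the oldest request expires.
            delay = max(delay, self.sent[0] + self.DAY - now)
        if delay > 0:
            self.sleep(delay)
        self.sent.append(self.clock())
        return delay
```

A batch program would create one `Throttle()` and call its `wait()` method immediately before each submission, so the pacing lives in one place rather than being scattered through the code.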
If you tell us exactly what you're trying to retrieve from our site, we can recommend the best way to obtain the data without exceeding our limits.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

angelique.chauvin at free.fr wrote:
> I'm working on a PCR technique and I have a lot of primers to test. I've thus
> made a program to treat this data by submission on your site.
> I've received this message from your server :
>
> There is a very high volume of traffic coming from your site (IP address
> 81.80.225.127) as of Thu Jan 31 08:06:25 2008
> (California time). So that other users get a fair share of our bandwidth, we
> are putting in a delay of 10.2 seconds before we service your request. This
> delay will slowly decrease over a half hour as activity returns to normal. This
> high volume of traffic is likely due to program-driven rather than interactive
> access, or the submission of queries on a large number of sequences. If you are
> making large batch queries, please write to our genome at cse.ucsc.edu public
> mailing list and inquire about more efficient ways to access our data. If you
> are sharing an IP address with someone who is submitting large batch queries, we
> apologize for the inconvenience. Please contact genome-www at cse.ucsc.edu if you
> think this delay is being imposed unfairly.
>
> I am contacting you to get more information. I will not use this program often, but
> I'm interested in more efficient ways to access your data.
>
> Thank you in advance for your response
>
> Angélique CHAUVIN
> angelique.chauvin at free.fr
> Laboratoire de genetique moleculaire
> INSERM U613
> BREST

From galt at soe.ucsc.edu Fri Feb 1 14:05:32 2008
From: galt at soe.ucsc.edu (Galt Barber)
Date: Fri, 1 Feb 2008 14:05:32 -0800 (PST)
Subject: [Genome] Question on the exon structure
In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEACDB@ussemx1100.merck.com>
References: <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEAC79@ussemx1100.merck.com> <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEACDB@ussemx1100.merck.com>
Message-ID:

Hi, Jin

We think you may be able to do this with a BED 9 custom track. You can give each set a different color. All probes in the same set will share the same color.

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED

I haven't actually seen your probe data, so this is just an example. You would probably set fields as follows:

chrom       (col 1)          e.g. chr22
chromStart  (col 2)          e.g. 20001
chromEnd    (col 3)          e.g. 20200
name        (col 4)          e.g. coolProbeId999
score       (col 5)          just set to 0 if unneeded
strand      (col 6)          just set to + if unneeded
thickStart  (col 7 = col 2)  set to chromStart
thickEnd    (col 8 = col 3)  set to chromEnd
itemRgb     (col 9 = r,g,b)  where r,g,b are range 0 to 255.
                             e.g. 255,0,255 = red and blue = purple

You need to set the custom track line attribute itemRgb=On.

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

This should allow you to color your sets. The system will probably work better if you can stick to fewer than 10 different colors.

-Galt

On Fri, 1 Feb 2008, Ma, Jin wrote:

> Galt,
>
> Thanks for your reply. We are trying to display probe sets together. If
> we load each individual probe as a separate sequence, it is difficult to
> show which probes belong to the same set.
> So one way of doing that, we
> think, would be to use the whole probe set as the sequence and treat
> each probe as the exon. However, the central line makes it really look
> like the exon, which is confusing for the user because we only want them
> to be grouped but not "linked". Is there a way (parameter setting?) we
> can remove the central line? Or is there any suggestion of a different
> way of doing it? We need to apply it to only one track. Thanks so much
> for your time and help.
>
> Jin
>
>
> -----Original Message-----
> From: Galt Barber [mailto:galt at soe.ucsc.edu]
> Sent: Thursday, January 31, 2008 6:34 PM
> To: Ma, Jin
> Cc: genome at soe.ucsc.edu
> Subject: Re: [Genome] Question on the exon structure
>
>
> Could you please explain more about what you're trying to do?
>
> Do you need it to apply to everything or to just one
> custom/customized track?
>
> -Galt
>
>
> On Thu, 31 Jan 2008, Ma, Jin wrote:
>
> > Hi,
> >
> > I need your help on a question. Is there a way to remove or
> > significantly lighten the central line connecting all the exons for a
> > sequence (one row in the BED file)?
> >
> > Jin
> >
> > ------------------------------------------------------------------------------
> > Notice: This e-mail message, together with any attachments, contains
> > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> > New Jersey, USA 08889), and/or its affiliates (which may be known
> > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
> > and in Japan, as Banyu - direct contact information for affiliates is
> > available at http://www.merck.com/contact/contacts.html) that may be
> > confidential, proprietary copyrighted and/or legally privileged. It is
> > intended solely for the use of the individual or entity named on this
> > message. If you are not the intended recipient, and have received this
> > message in error, please notify us immediately by reply e-mail and then
> > delete it from your system.
> > ------------------------------------------------------------------------------

From ann at soe.ucsc.edu Fri Feb 1 16:05:19 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 01 Feb 2008 16:05:19 -0800
Subject: [Genome] human 3' utr seqs that do not cross hyb to orthologous rat gene seqs
In-Reply-To: <8EB869FD03D63F499B0C7D1BE3CEA50B930B3D@palm.medical-solutions.co.uk>
References: <8EB869FD03D63F499B0C7D1BE3CEA50B930B3D@palm.medical-solutions.co.uk>
Message-ID: <47A3B3BF.4020509@cse.ucsc.edu>

Hello Richard,

Yes, you can do this using tools available on the UCSC Browser website.
The general steps will include getting the 3' UTR positions you are interested in, using the liftOver tool to lift those positions to the rat genome, getting the sequence for both human and rat, then running an aligner on the two sequences.

You can get the 3' UTR positions for your genes of interest by using the Table Browser tool on our website ('Tables' from the top blue navigation bar). Choose your gene set (e.g. RefSeq or UCSC Known Genes), then enter your gene names in the "identifiers (names/accessions)" field. Choose BED as your output format. This will give you positional information for the 3' UTRs of all of your genes in human.

To convert these positions to rat, use the liftOver tool:
http://genome.ucsc.edu/cgi-bin/hgLiftOver

Get the underlying sequence for both human and rat by creating a custom track, then choosing that custom track in the Table Browser to get the sequence. See this FAQ for more details on this step:
http://genome.ucsc.edu/FAQ/FAQdownloads#download32.

Once you have the sequence for both human and rat, use an alignment tool to align the sequence and filter on percent similarity.

This should be enough to get you started. If you need more detailed instruction, don't hesitate to write back to the list.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Richard Dixon wrote:
> Dear Genome people,
>
> Please could someone tell me if it is possible to use the UCSC browser
> to solve my task.
>
> I have a list of ~1500 human genes for which I want to find the 3'utr
> seqs (~70-100bps) that are NOT similar (eg :<75% similar) to the
> orthologous Rat gene seq.
>
> This is to design human oligo probes that will not cross hyb with rat
> genes.
>
> I would really appreciate your advice on this
>
> Rick
>
> Dr. Richard Dixon
>
> Medical Solutions plc
> 1 Orchard Place
> Nottingham Business Park
> Nottingham, NG8 6PX
> United Kingdom
>
> Phone: +44 (0)115 973 9027
> Email: richard.dixon at medical-solutions.co.uk
> Web: http://www.medical-solutions.co.uk/
> Fax: +44 (0)115 973 9021

From kuhn at soe.ucsc.edu Sat Feb 2 14:43:25 2008
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Sat, 2 Feb 2008 14:43:25 -0800
Subject: [Genome] sequence discrepancy
Message-ID: <200802022243.OAA09157@moondance.cse.ucsc.edu>

Steve,

It is not clear to me where you get the value for the PCR on the transcripts. When I use your primers on the rat assembly, I also get a region of 429 bases (in gene GJA1). When I click into the details page of the mRNA that aligns at this location (e.g., X06656), I see that the mRNA is a perfect match to the reference assembly for the entire 429 bases.

Could you please explain where you are seeing the extra 70 bases?

best wishes,

--b0b kuhn
ucsc genome bioinformatics group

> From genome at soe.ucsc.edu Fri Feb 1 09:07:18 2008
> Subject: [Genome] sequence discrepancy
>
> Dear All:
>
> I have found a rat gene (GAJ1) where electronic PCR on the UCSC browser
> yields a product length (with the primers i've chosen) of 429 bases.
> PCR from the cDNA/mRNA yields a predicted length of 498. i have not
> yet done the PCR to see what the truth is, but i assume it will be
> 498.
>
> primers in question are:
>
> CTGCGAAAACGTCTGCTATG
> GGACGTGAGAGGAAGCAGTC
>
>
> its really not that important, but i wondered if you (pl) would have
> insight into what happened to the ~70 bases??
>
> thanks
>
> steve brown
> University of Vermont

From pbuckley at rcsi.ie Mon Feb 4 03:12:42 2008
From: pbuckley at rcsi.ie (Patrick Buckley)
Date: Mon, 4 Feb 2008 11:12:42 -0000
Subject: [Genome] miRNA genomic position
Message-ID:

Hi there,

I was wondering if you could tell me why the genomic position of miRNA genes is different in UCSC compared to the Sanger miRNA registry?

Thank you in advance for your help

Best regards,

Patrick Buckley

From kayla at soe.ucsc.edu Mon Feb 4 09:41:15 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 4 Feb 2008 09:41:15 -0800 (PST)
Subject: [Genome] miRNA genomic position
In-Reply-To:
References:
Message-ID:

Hello Patrick,

We use BLAT to align miRNAs that we obtain from the Sanger miRNA Registry. You can read more about this on the details page for the sno/miRNA Track here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgRna

If you could provide a specific example of an miRNA that does not appear to be aligned properly, I can look into it further.

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Mon, 4 Feb 2008, Patrick Buckley wrote:

> Hi there,
>
> I was wondering if you could tell me why the genomic position of miRNA genes
> is different in UCSC compared to the Sanger miRNA registry?
>
> Thank you in advance for your help
>
>
> Best regards,
>
> Patrick Buckley

From ann at soe.ucsc.edu Mon Feb 4 10:02:16 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Mon, 04 Feb 2008 10:02:16 -0800
Subject: [Genome] Genome Browser - Affy data for D.
Melanogaster Release 5
In-Reply-To: <19E385F7-1F9D-4460-8EE6-55557A79CD61@lbl.gov>
References: <19E385F7-1F9D-4460-8EE6-55557A79CD61@lbl.gov>
Message-ID: <47A75328.30809@cse.ucsc.edu>

Hello Ken,

We have added this to our project list, although it is lower priority than our work on the vertebrate assemblies and therefore may not show up for a while.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Kenneth Wan wrote:
> Dear UCSC,
>
> Do you have a timeline for incorporating the Affymetrix microarray
> data for the Expression and Regulation tracks for the latest release
> of the D. melanogaster genome browser? This data is extremely
> helpful for us to analyze our EST/cDNA data.
>
> Thanks,
> Ken
>
> Kenneth H. Wan
> Berkeley Drosophila Genome Project
> Lawrence Berkeley National Laboratory
> 1 Cyclotron Road; MS 64-119
> Berkeley, CA 94720
> 510-486-4456
> 510-486-6798 (fax)
> http://oskibear95.blogspot.com
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From galt at soe.ucsc.edu Mon Feb 4 10:36:05 2008
From: galt at soe.ucsc.edu (Galt Barber)
Date: Mon, 4 Feb 2008 10:36:05 -0800 (PST)
Subject: [Genome] Trouble with BLAT to align SNPs
In-Reply-To: <479D0771.2010501@stanford.edu>
References: <479D0771.2010501@stanford.edu>
Message-ID:

1. Question about noise, i.e. low-quality alignments in output.

Use pslReps and pslCDnaFilter to apply filtering criteria to the psl output of BLAT. This will allow you to get rid of low-quality alignments. It works much more reliably and flexibly than using -minScore or -minIdentity blat commandline options.

Here is our BLAT FAQ which has lots of handy info:
http://genome.ucsc.edu/FAQ/FAQblat

2. Question about why the first FASTA record seems to be skipped.

It appears that you may have a leading space in the first line before the ">" character. Apparently BLAST tolerates it but not BLAT.
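
A quick way to check for the leading-space problem described in point 2 is to scan the FASTA file for header lines that only start with ">" after whitespace is stripped. The following is a minimal Python sketch, not part of BLAT or the kent tools; the function name and the sample records are illustrative assumptions.

```python
import io


def find_leading_whitespace_headers(handle):
    """Return (line_number, line) pairs for FASTA headers preceded by whitespace."""
    problems = []
    for number, line in enumerate(handle, start=1):
        stripped = line.lstrip()
        # The line looks like a header only after stripping, so whitespace
        # precedes the ">" character.
        if stripped.startswith(">") and not line.startswith(">"):
            problems.append((number, line.rstrip("\n")))
    return problems


# Illustrative input: the first header has a stray leading space.
example = io.StringIO(
    " >25.m01838|hypothetical protein\n"
    "MEGTQPTCYCTVLRGG\n"
    ">second|record\n"
    "ACDEFGHIKL\n"
)
for number, line in find_leading_whitespace_headers(example):
    print(f"line {number}: header has leading whitespace: {line!r}")
```

To check a real file, pass an open file handle in place of the `io.StringIO` object.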
-Galt On Sun, 27 Jan 2008, Michael Reese wrote: > Dear BLAT gurus, > > I'm pretty new to BLAT, and am attempting to use it to align gene > models from one strain of a parasite to the genomes of other strains > that have been sequenced. The genome is relatively small (~60Mb), so > it's no problem to hold it all in memory. I used seg to mask low > complexity regions in both my genome (nucleotide) and query sequences > (protein) with lower case. BLAT works great to align things when > there's no low complexity. But as soon as I have stretches of low > complexity, I end up with a bunch of noise. I've tried the various > flags (mask=; qMask=; repeats= lower) for masking, with very little > effect. If I use the "mask" flag, I see that the output is represented > as "X" in a blast output style alignment, but I don't see any > improvement in the noise. I see no change when I use the qMask or > repeats flags. > > I'm using standalone BLAT v33x5. My database and input sequences > are both in fasta format, as faToNib tells me it'll only work on single > sequences, and I really need to do everything in batch. I've assumed > the file format isn't a problem, since the help file suggests that it > deals with all file types equivalently. > > An unrelated problem: BLAT is ignoring the first entry in my input > (multi-seq) fasta file. (BLAST does not and finds the seq in the genome > no problem, so I'm unsure as to what that problem is). 
> > An example of my input (lower-case masked): > > >25.m01838|hypothetical protein > MEGTQPTCYCTVLRGGIGLLRQYSTEKAMTAGREQLLHTEDAADRVLHPTSDSLLTLQLV > LTKGTTTFRQAYILGTALPFLDLSYHNIFLDNATTRANLFLHAPAYIWTGGWDTGIplvp > flllaplasllvYAFLWKSLWRPLKREEERTGRLALTRGEDGKKALHSPSPKPFSLSLVP > LYASDLLSHLGPFLAKHVESRGVCKRALSELQDPLLAREAKGSALAGETDDETEGHRGKK > TQDTGTELKQQRDKTPLEIVLSVFVGENLPEGIVKQSTEWLLELSCLCTILRHTPPEDRE > VPYDLLLRLVNANASLWAHEPSFEELRGRILLKLLENTSNALTVYDREAPQLQARSVESD > SGHQTEKDELSVPEKVLSSDHGSSADLAERASAKNesspehassssapsdasdVETQEEK > KGTQDTPRLVLDLLLHPGPYTTPDSIFLQLWPVGQRIRAGEPEHALRRAVKLLRSFVEAR > DEARISERHASDELRENsvslsssassfwsKLRGRESRKQAGEKDASGDYGAQDVSPLSA > ALLLCDTSLpsprrsppWRMKlllreletdllcvlSVVFLGVFSRKSGGESLFGEVGKIA > LTGIRALRGLAILEAAYHLETNFIHSPAYYEATDSDMIKQSLGLVTFNGLLAAAVFRTHK > YALLPFFLLRIRDMFAMDFRI > > In addition to a hit that returns the entire sequence (in its two > exons), I get a series of things that hit to the low complexity regions, > e.g. (in blast output format): > > Query: 9 SSPFAFSSSHSASPSS 24 > SS + SSS SASPSS > Sbjct: 685739 SSSSSASSSSSASPSS 685786 > > Any hints I could get would be greatly appreciated. > > Regards, > Michael > > -- > > Michael Reese, Ph.D. > Postdoctoral Scholar > Stanford University > Dept. Microbiology & Immunology > Fairchild D305 > 300 Pasteur Dr. (299 Campus Dr. for Courier) > Stanford CA, 94305-5124 > Phone: (650)-723-7884 > Fax: (650)-723-6853 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Mon Feb 4 11:08:07 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 04 Feb 2008 11:08:07 -0800 Subject: [Genome] A question about liftover program In-Reply-To: <267274.14898.qm@web82712.mail.mud.yahoo.com> References: <267274.14898.qm@web82712.mail.mud.yahoo.com> Message-ID: <47A76297.100@soe.ucsc.edu> Hello Shan, I apologize for the delay in answering your question. 
This is indeed a problem that other users have encountered, and a page has been created on our wiki site with instructions for creating liftOver chain files: http://genomewiki.cse.ucsc.edu/index.php/Minimal_Steps_For_LiftOver The wiki page above uses tools in the Genome Browser source code. See this link for downloading the source if you have not already done so: http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads There is also some discussion of creating liftOver chains in the mailing list archives: http://www.soe.ucsc.edu/pipermail/genome-mirror/2007-June/000509.html I hope this information is helpful. Please feel free to write back to the mailing list if you have further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Shan Yang wrote: > Dear Genome Browser > > I am studying a microbial genome that hasn't been fully annotated. It > is a substrain of E.Coli K12, which is annotated. Since they are very > similar in sequence, I would like to use the K12 annotation for my > genome for now. It looks like that liftover program is very suitable > for such task. But the problem is that I don't have the chain files > between the two strains to run liftover. I am wondering if I can get > some idea how to generate such chain files from two fasta files of my > microbial genome and K12. I think it is a quite general problem many > researchers may run into. So it will be great if you can provide such > program on the website. > > Thank you very much! > > Shan > > > > > ____________________________________________________________________________________ > Looking for last minute shopping deals? Find them fast with Yahoo! > Search. 
> http://tools.search.yahoo.com/newsearch/category.php?category=shopping > _______________________________________________ Genome maillist - > Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From hao.chen at gmail.com Mon Feb 4 11:15:01 2008 From: hao.chen at gmail.com (Hao Chen) Date: Mon, 4 Feb 2008 11:15:01 -0800 Subject: [Genome] Annotation for a gene Message-ID: <59de9e970802041115q3a9a383eod2998ac96f2c7dd2@mail.gmail.com> Hi there, I am trying to find the 5'UTR and 3'UTR of a gene Gdf2. I used to be able to do it, but could not figure out how I did it earlier. Could you give me instructions on how to find annotation of a gene? Thanks! Hao From rhead at soe.ucsc.edu Mon Feb 4 12:17:45 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 04 Feb 2008 12:17:45 -0800 Subject: [Genome] Annotation for a gene In-Reply-To: <59de9e970802041115q3a9a383eod2998ac96f2c7dd2@mail.gmail.com> References: <59de9e970802041115q3a9a383eod2998ac96f2c7dd2@mail.gmail.com> Message-ID: <47A772E9.2080306@soe.ucsc.edu> Hello Hao, To visualize the untranslated regions for any gene, type the gene name into the "position/search" box in the Genome Browser. The UTRs are represented by the thinner lines on each end of the gene. More information about gene track displays is located here: http://hgwdev.cse.ucsc.edu/goldenPath/help/hgTracksHelp.html#GeneDisplay Note that there are several gene annotation tracks, and the annotations may differ from one another. It is up to you to determine which annotation track to use. To obtain the genome coordinates and sequence of the UTRs, click on a gene in the gene track you wish to use and look for the genomic sequence link. When you click on it, you will be taken to a page where you can select the regions of the gene for which to retrieve sequence. Select the checkboxes for "5' UTR Exons" and "3' UTR Exons", and de-select the checkboxes for the other regions. Also select the option for "One FASTA record per region". 
When you hit "submit", you should get the genomic sequence of the UTRs in FASTA format. The header for each region should include the genomic coordinates of the sequence and look similar to this:

>hg18_knownGene_uc001jfa.1_0 range=chr10:48036700-48036859 5'pad=0 3'pad=0 revComp=TRUE strand=- repeatMasking=none

Another way to get this information is with the Table Browser (the "Tables" link in the blue bar at the top of the page). For detailed instructions on using the Table Browser, see:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

I hope this information is helpful. Should you have further questions, please feel free to contact us again at the genome mailing list address, genome at soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Hao Chen wrote:
> Hi there,
>
> I am trying to find the 5'UTR and 3'UTR of a gene Gdf2. I used to be able to
> do it, but could not figure out how I did it earlier. Could you give me
> instructions on how to find annotation of a gene?
>
> Thanks!
> Hao
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From Aaron.Fletcher at utsouthwestern.edu Mon Feb 4 14:53:54 2008
From: Aaron.Fletcher at utsouthwestern.edu (Aaron Fletcher)
Date: Mon, 04 Feb 2008 16:53:54 -0600
Subject: [Genome] ?'s about sequence alignments via UCSC genome
Message-ID: <47A743220200001A0002FBCB@swnw124.swmed.edu>

Hi, my name is Dr. Aaron Fletcher and I am a post-doc who is trying to determine sequence homology throughout different species "evolution". I started with a 1300 bp fragment and performed a BLAST search on it to narrow it down to a more manageable 161 bp sequence I am particularly interested in. I was told that UCSC genome was a good site to do sequence homology on. I am not familiar with doing sequence homology analyses and would appreciate any help/guidance you could provide.
I only have a DNA sequence...I do not know if it is a coding region or not and therefore do not currently have a peptide sequence to compare as well. What is the best way to compare my mouse sequence to other eukaryotes to determine conservation, etc. Thank you in advance for your help, Aaron Fletcher From fahre001 at umn.edu Tue Feb 5 09:00:35 2008 From: fahre001 at umn.edu (Scott C. Fahrenkrug) Date: Tue, 05 Feb 2008 11:00:35 -0600 Subject: [Genome] new bovine genome build Message-ID: <47A89633.6070201@umn.edu> Greetings, I am writing to inquire as to the possibility that you can upload the most recent build of the bovine genome, Build 4.0. The entire community realizes that build 3.1 is a mess, and a new build has been generated. This new build can be found at ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Btaurus/fasta/Btau20070913-freeze/ Can you please let me know if/when you will post this data? Thanks for your help. Cheers, Scott Scott C. Fahrenkrug, Ph.D. Associate Professor of Functional Genomics Director, U of MN Animal Biotechnology Center Beckman Center for Transposon Research Department of Animal Science, University of Minnesota 495B Animal Science/Veterinary Medicine 1988 Fitch Avenue St. Paul, MN 55108 O:612-624-7216 F:612-625-2743 C:612-670-2078 fahre001 at umn.edu http://primer.ansci.umn.edu/Fahrenkruglab/ http://www.ansci.umn.edu/faculty/fahrenkrug.htm From rhead at soe.ucsc.edu Tue Feb 5 10:55:06 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Tue, 05 Feb 2008 10:55:06 -0800 Subject: [Genome] ?'s about sequence alignments via UCSC genome In-Reply-To: <47A743220200001A0002FBCB@swnw124.swmed.edu> References: <47A743220200001A0002FBCB@swnw124.swmed.edu> Message-ID: <47A8B10A.5000202@soe.ucsc.edu> Hello Aaron, On the most recent mouse assembly (mm9, July 2007), there is a track in the Genome Browser called "Conservation" that shows multiple alignments of mouse and 29 other vertebrate species, along with measures of conservation. 
Click on the blue track name for more details about how the track was made, and to display more species. You may also be interested in the "Most Conserved" track, which shows predictions of conserved elements produced by the phastCons program, as well as the pairwise alignments of mouse against other species in the chain and net tracks. All of these tracks are in the "Comparative Genomics" section of the Genome Browser. If you need to find the location of your sequence on the mm9 assembly, you can use the BLAT tool. Click on "Blat" in the blue bar at the top of the page, select the Mouse, July 2007 assembly, paste the sequence into the box, and hit "submit". From there you should be able to select the best match to the sequence you entered. I hope this information helps get you started. If you have further questions, please feel free to write back to the genome mailing list address. -- Brooke Rhead UCSC Genome Bioinformatics Group Aaron Fletcher wrote: > Hi, my name is Dr. Aaron Fletcher and I am a post-doc who is trying > to determine sequence homology throughout different species > "evolution". I started with a 1300 bp fragment and performed a > BLAST search on it to narrow it down to a more manageable 161 bp > sequence I am particularly interested in. I was told that UCSC > genome was a good sight to do sequence homology on. I am not > familiar/endowed in doing sequence homologies and would appriciate > any help/guidance you could provide. I only have a DNA sequence...I > do not know if it is a coding region or not and therefore do not > currently have a peptide sequence to compare as well. What is the > best way to compare my mouse sequence to other eukaryotes to > determine conservation, etc. 
>
> Thank you in advance for your help, Aaron Fletcher
>
> _______________________________________________ Genome maillist - Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome

From rgodinez at fas.harvard.edu Tue Feb 5 13:21:22 2008
From: rgodinez at fas.harvard.edu (Ricardo Godinez Moreno)
Date: Tue, 5 Feb 2008 15:21:22 -0600
Subject: [Genome] Where are the promoters and PlyA in Genescan (BED)
Message-ID: <1202246482.47a8d352092eb@webmail.fas.harvard.edu>

Hi,

I would like to know how introns are located and reformatted as BED or GTF files in the UCSC Genome Browser.

The reason is that I am computing intron-length statistics from Genscan gene predictions in the original Genscan output format, which includes promoters and polyA signals. I am comparing these results to intron lengths from the reformatted Genscan gene sets (GTF, BED) in the Genome Browser, which include only exons and introns but not promoters or polyA signals.

My concern is about how intron length is calculated: since the BED and GTF formats have only exons and introns but no promoters or polyA signals, I suppose that polyA signals and promoters may be included as introns in the Genscan gene sets.

In summary, I need to compare Genome Browser intron lengths from the Genscan gene sets to my own Genscan results, but I wonder if introns have been confused with polyA or promoter elements. If so, how can I fix that?

Ricardo.

--
RICARDO GODINEZ MORENO
PhD Student
HARVARD UNIVERSITY
26th OXFORD ST.
Dept of Organismic Evolutionary Biology
Cambridge MA.
02138 Lab: (617)4962401 Office:(617)4968387 http://www.oeb.harvard.edu/faculty/edwards/people/RicardoGodinez.htm From ann at soe.ucsc.edu Tue Feb 5 13:45:30 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 05 Feb 2008 13:45:30 -0800 Subject: [Genome] new bovine genome build In-Reply-To: <47A89633.6070201@umn.edu> References: <47A89633.6070201@umn.edu> Message-ID: <47A8D8FA.5060001@soe.ucsc.edu> Hello Scott, The new bovine assembly release is on our project todo list. We plan to discuss its priority at the next project meeting (mid February). After that meeting, I will send you a follow-up email with an approximate release time-frame. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Scott C. Fahrenkrug wrote: > Greetings, > I am writing to inquire as to the possibility that you can upload the > most recent build of the bovine genome, Build 4.0. > The entire community realizes that build 3.1 is a mess, and a new build > has been generated. > > This new build can be found at > > ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Btaurus/fasta/Btau20070913-freeze/ > > Can you please let me know if/when you will post this data? > > Thanks for your help. > Cheers, > Scott > > Scott C. Fahrenkrug, Ph.D. > Associate Professor of Functional Genomics > Director, U of MN Animal Biotechnology Center > Beckman Center for Transposon Research > Department of Animal Science, University of Minnesota > 495B Animal Science/Veterinary Medicine > 1988 Fitch Avenue > St. 
Paul, MN 55108 > > O:612-624-7216 > F:612-625-2743 > C:612-670-2078 > fahre001 at umn.edu > http://primer.ansci.umn.edu/Fahrenkruglab/ > http://www.ansci.umn.edu/faculty/fahrenkrug.htm > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From hiram at soe.ucsc.edu Tue Feb 5 13:57:09 2008 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Tue, 05 Feb 2008 13:57:09 -0800 Subject: [Genome] Where are the promoters and PlyA in Genescan (BED) In-Reply-To: <1202246482.47a8d352092eb@webmail.fas.harvard.edu> References: <1202246482.47a8d352092eb@webmail.fas.harvard.edu> Message-ID: <47A8DBB5.2040404@soe.ucsc.edu> Good Afternoon Ricardo: Go to the "Tables" browser. Select the genscan track. Ask for "bed" type of output, then in the "get output" screen, select introns. You will then have a bed file of introns only. awk '{print $3-$2}' to see the intron sizes. --Hiram Ricardo Godinez Moreno wrote: > Hi, > > I would like to know how introns are localized and re-formated as bed or gtf > files in the UCSC genome browser. > > The reason of this is because I am doing statistics of intron length using > Genescan gene predictions with the original genescan format wich includes > Promoters, PlyA signals. I am comparing this results to intron lengths using > reformated genescan genesets (GTF BED) in genome browser which only include > exons and introns but not Promoters or plyA signals. > > My concern about how intron length is calculated, is because I think that since > bed and gtf formats only have exons and introns but not promoters or poly A > signals. I suppose that plyA and promoters may be included as introns in the > genescan genesets. > > In summary. I need to compare genome browser intron length using genescan > genesets and compare them to my own genescan results. But I wonder if introns > have been confused with poly A or promoter elements. If so, how can I fix that. > > Ricardo. 
>
>
>
>
> --
> RICARDO GODINEZ MORENO
> PhD Student
> HARVARD UNIVERSITY
> 26th OXFORD ST.
> Dept of Organismic Evolutionary Biology
> Cambridge MA. 02138
> Lab: (617)4962401
> Office:(617)4968387
> http://www.oeb.harvard.edu/faculty/edwards/people/RicardoGodinez.htm

From cshi at spectragenetics.com Tue Feb 5 14:48:12 2008
From: cshi at spectragenetics.com (Chaofu Shi)
Date: Tue, 05 Feb 2008 17:48:12 -0500
Subject: [Genome] downloading fosmid sequence
Message-ID: <47A8E7AC.8000902@spectragenetics.com>

Hi, I have a quick question regarding how to download the sequence of a fosmid. When you use the Genome Browser Gateway to view a gene, you can also see the fosmids which cover this gene. Could you please tell me a quick way to download the sequence of a fosmid of interest from the window in which the gene and fosmids are displayed?

Thank you very much!

Chaofu Shi, Ph.D.
SpectraGenetics
Pittsburgh, PA

From ann at soe.ucsc.edu Tue Feb 5 16:02:41 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Tue, 05 Feb 2008 16:02:41 -0800
Subject: [Genome] downloading fosmid sequence
In-Reply-To: <47A8E7AC.8000902@spectragenetics.com>
References: <47A8E7AC.8000902@spectragenetics.com>
Message-ID: <47A8F921.1060607@soe.ucsc.edu>

Hello Chaofu,

There are a couple of ways to do this using the Genome Browser.

If you have only a few genes and corresponding fosmids of interest, you can just click one-by-one on the individual fosmids while viewing them in the Browser. From the details page for the fosmid, click on each of the two "Genomic alignments of Fosmid ends" links to see the sequence for that end of the pair.

If you have several fosmids of interest and you know the identifier of each fosmid, you can use the Table Browser to get the underlying sequence in bulk. Navigate to the Table Browser tool on our website ('Tables' from the top blue navigation bar). Configure it like so:

clade: Vertebrate
genome: Human
assembly: Mar. 2006 (the latest human assembly = hg18)
group: Mapping and Sequencing Tracks
track: Fosmid End Pairs
table: fosEndPairs
region: genome
identifiers: (paste in the list of fosmid identifiers)
output format: sequence

Then configure the Sequence Retrieval page like so:

x Blocks
x One FASTA record per region (block/between blocks)

Then press the "get sequence" button. This will give you a page containing the underlying genomic sequence for each end of the fosmid. Note that the fosmid end that is on the negative strand will *not* be reverse complemented (whereas in the first method above, it *is* reverse complemented).

I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.

Chaofu Shi wrote:
> Hi, I have a quick question regarding how to download the sequence of a
> fosmid. When you use the Genome Browser Gateway to view a gene, you can
> also see the fosmids which cover this gene. Could you please tell me a
> quick way to download the sequence of a fosmid of interest from the
> window in which the gene and fosmids are displayed?
>
> Thank you very much!
>
> Chaofu Shi, Ph.D.
> SpectraGenetics
> Pittsburgh, PA
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Tue Feb 5 16:16:39 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Tue, 05 Feb 2008 16:16:39 -0800
Subject: [Genome] RepeatMasker .out file for bosTau2
In-Reply-To:
References:
Message-ID: <47A8FC67.4050905@cse.ucsc.edu>

Hello Hideo,

Try downloading the bosTau2 rmsk table using the Table Browser, or from here:
http://hgdownload.cse.ucsc.edu/goldenPath/bosTau2/database/rmsk.txt.gz

This should have the data you are looking for.

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Hideo Imamura wrote:
> Is it possible to obtain the repeat annotation for bosTau2? That is, I need the RepeatMasker .out file for bosTau2. Similar data is available for bosTau3.
> Thank you very much.
> Hideo
> --------------------------------------------------------
> Hideo Imamura, Ph.D.
> Bioinformatics and computational biology,
> Biology Department --- Boston College.
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From rebecca.robilotto at yale.edu Tue Feb 5 21:35:27 2008
From: rebecca.robilotto at yale.edu (Rebecca Robilotto)
Date: Wed, 6 Feb 2008 00:35:27 -0500
Subject: [Genome] Processed pseudogenes
Message-ID: <4ad5f4d60802052135t4baa163j52da9627e0630f45@mail.gmail.com>

Hello,

I am working on a project that involves looking at the syntenic regions between C. elegans and C. briggsae. In the chain and net files I found that it creates its own subset of syntenic regions. We have a set of pseudogenes that I want to compare with the syntenic regions to see if there is any relationship. I was reading the PNAS paper "Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes", and it mentioned that the NETFILTER program creates the syntenic subset and that it includes the pseudogenes that arise from tandem duplication but eliminates the processed pseudogene information. This paper discusses mouse and human, but I was wondering if it has the same result with C. elegans and C. briggsae. I would like to know about all the possible pseudogenes. Is there a way that I could get all the pseudogene information in the chain and net file and not just the duplicated ones?

Thank you,
Rebecca Robilotto

From ann at soe.ucsc.edu Wed Feb 6 11:16:02 2008
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Wed, 06 Feb 2008 11:16:02 -0800
Subject: [Genome] Processed pseudogenes
In-Reply-To: <4ad5f4d60802052135t4baa163j52da9627e0630f45@mail.gmail.com>
References: <4ad5f4d60802052135t4baa163j52da9627e0630f45@mail.gmail.com>
Message-ID: <47AA0772.4020301@cse.ucsc.edu>

Hello Rebecca,

The chain track includes all of the pseudogene mappings (whereas the net track keeps only the top alignment for each point in the genome). The chains include coherent alignments to paralogs, but the net discards them and keeps the ortholog.

You can use the chainFilter program (available in our source code) to extract chains in a certain region from a chain file. chainFilter has a lot of parameters, so you can constrain the score (to get rid of the really short alignments).

The Genome Browser and Blat software are free for academic, nonprofit, and personal use. A license is required for commercial use.
How to download the software: http://genome.cse.ucsc.edu/FAQ/FAQlicense#license3 You can obtain the source tree either via CVS: http://genome.ucsc.edu/admin/cvs.html or a zip file: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip Please note the build instructions: http://genome.ucsc.edu/admin/jk-install.html All of the kent utilities output their usage message and command line options by running them with no arguments. Download the chain file here: http://hgdownload.cse.ucsc.edu/goldenPath/ce4/vsCb3/ Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Rebecca Robilotto wrote: > Hello, > > I am working on a project that involves looking at the syntenic regions > between C. elegans and C. Briggsae. In the chain and net files I found that > it creates it own subset of syntenic regions. We have a set of pseudogenes > that I want to compare with the syntenic regions to see if there is any > relationship. I was reading the PNAS paper Evolution's cauldron: > Duplication, deletion, and rearrangement in the mouse and human genomes and > it mentioned that the NETFILTER program creates the syntenic subset and it > includes the pseudogenes that arise from tandem duplication, but it > eliminates the processed pseudogenes information. This paper discusses mouse > and human, but I was wondering it has the same result with C. elegans and C. > Briggsae. I would like to know about all the possible pseudogenes. Is there > a way that I could get all the pseudogene information in the chain and net > file and not just the duplicated ones? 
> > Thank you, > Rebecca Robilotto > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From wguo at ambrygen.com Wed Feb 6 11:42:43 2008 From: wguo at ambrygen.com (Wei Guo) Date: Wed, 6 Feb 2008 11:42:43 -0800 Subject: [Genome] Question about UCSC genome browser. Message-ID: <6FEDCB9BFE41C3478EF9D770BDC551BA3050AD@ambrysrv.AmbryGenetics.local> To whom it may concern, Can custom sequences such as many short Solexa reads or longer consensus gene sequences, be uploaded to UCSC genome browser, be aligned to the human genome and be shown with regions of known and novel SNPs, indels, if any? Or can the genome browser show align percent, coverage depth, etc of uploaded custom sequences? Best regards, Wei Wei Guo, PhD Bioinformatics Scientist Ambry Genetics 100 Columbia, #200 Aliso Viejo, CA 92656 Direct: 949-900-5543 Cell: 512-577-2125 Fax: 949-900-5501 http://www.ambrygen.com Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended ecipient(s) and may contain confidential and privileged information. Any unauthorized review, use, copying, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. From ann at soe.ucsc.edu Wed Feb 6 15:35:52 2008 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 06 Feb 2008 15:35:52 -0800 Subject: [Genome] Question about UCSC genome browser. In-Reply-To: <6FEDCB9BFE41C3478EF9D770BDC551BA3050AD@ambrysrv.AmbryGenetics.local> References: <6FEDCB9BFE41C3478EF9D770BDC551BA3050AD@ambrysrv.AmbryGenetics.local> Message-ID: <47AA4458.9070402@soe.ucsc.edu> Hello Wei, If you have sequences, you can BLAT those against the latest human genome assembly. BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. 
It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 20 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. BLAT will return all of the hits in the genome including %identity. To read more about using the BLAT tool, see this help page: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign If you make a custom track out of your BLAT results, you will be able to view your sequence hits in conjunction with any other track in the genome browser, such as the SNP track. To read more about creating Custom Tracks, see this help page: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks This should be enough to get you started. Please feel free to write back to the list if you need more detailed instructions. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Wei Guo wrote: > To whom it may concern, > > > > Can custom sequences such as many short Solexa reads or longer consensus > gene sequences, be uploaded to UCSC genome browser, be aligned to the > human genome and be shown with regions of known and novel SNPs, indels, > if any? Or can the genome browser show align percent, coverage depth, > etc of uploaded custom sequences? > > > > Best regards, > > Wei > > > > Wei Guo, PhD > > Bioinformatics Scientist > > Ambry Genetics > > > > 100 Columbia, #200 > > Aliso Viejo, CA 92656 > > Direct: 949-900-5543 > > Cell: 512-577-2125 > > Fax: 949-900-5501 > > http://www.ambrygen.com > > > > Confidentiality Notice: This e-mail message, including > > any attachments, is for the sole use of the intended > > ecipient(s) and may contain confidential and privileged > > information. Any unauthorized review, use, copying, > > disclosure or distribution is prohibited. 
If you are not > > the intended recipient, please contact the sender by > > reply e-mail and destroy all copies of the original message. > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From mahmudbt at hotmail.com Thu Feb 7 07:43:35 2008 From: mahmudbt at hotmail.com (Mahmudul Hasan) Date: Thu, 7 Feb 2008 16:43:35 +0100 Subject: [Genome] Making predicted protein sequence. Message-ID: Hello, I am wondering about what method do you use to make the predicted protein sequence from mRNA or genomic sequence. Thanking you, Mahmud Lund University Sweden _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ From chunhua_qin at merck.com Thu Feb 7 10:15:09 2008 From: chunhua_qin at merck.com (Qin, Chunhua) Date: Thu, 7 Feb 2008 13:15:09 -0500 Subject: [Genome] Divergence calculation Message-ID: <2F74E7A6F7BF9D4EAC844009C380003C01FE7171@usctmx1113.merck.com> Hi, How were the divergence numbers (milliDiv, milliDel, milliIns in the following table) for the repetitive elements generated? Can we assume the milliDiv is the "sum" of milliDel and milliIns? If yes, why some elements showed milliDiv>0 when both milliDel and milliIns are equal to 0 in the example below? Thanks, C.Q. 
#filter: chr12_rmsk.repName like 'IAPLTR%'
#milliDiv  milliDel  milliIns  genoName  genoStart  genoEnd   repName
28         0         44        chr12     48201400   48201742  IAPLTR2b
124        82        16        chr12     48518724   48519028  IAPLTR4
5          0         0         chr12     49633169   49633536  IAPLTR1_Mm
8          0         0         chr12     49639937   49640304  IAPLTR1_Mm
24         0         0         chr12     50095777   50096114  IAPLTR1a_Mm
21         0         0         chr12     50102541   50102878  IAPLTR1a_Mm
64         30        20        chr12     51010569   51010873  IAPLTR2b
32         12        10        chr12     51440316   51440726  IAPLTR2a
112        15        6         chr12     52554531   52554865  IAPLTR1a_Mm
118        50        44        chr12     52655062   52655221  IAPLTR4_I
12         0         0         chr12     52682164   52682490  IAPLTR3
------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
------------------------------------------------------------------------------ From kayla at soe.ucsc.edu Thu Feb 7 11:39:05 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 07 Feb 2008 11:39:05 -0800 Subject: [Genome] Divergence calculation In-Reply-To: <2F74E7A6F7BF9D4EAC844009C380003C01FE7171@usctmx1113.merck.com> References: <2F74E7A6F7BF9D4EAC844009C380003C01FE7171@usctmx1113.merck.com> Message-ID: <47AB5E59.4060403@cse.ucsc.edu> Hello Chunhua, You can find some information about the RepeatMasker track here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=rmsk For more technical information, I recommend contacting the creators of the RepeatMasker software at http://www.repeatmasker.org I hope this points you in the right direction. If you have any further questions about the Genome Browser, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group Qin, Chunhua wrote: > Hi, > > How were the divergence numbers (milliDiv, milliDel, milliIns in the > following table) for the repetitive elements generated? Can we assume > the milliDiv is the "sum" of milliDel and milliIns? If yes, why some > elements showed milliDiv>0 when both milliDel and milliIns are equal to > 0 in the example below? > > Thanks, > > C.Q.
> > ------------------------------------------------------------------------------ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Feb 7 13:07:51 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 07 Feb 2008 13:07:51 -0800 Subject: [Genome] Divergence calculation In-Reply-To: <47AB5E59.4060403@cse.ucsc.edu> References: <2F74E7A6F7BF9D4EAC844009C380003C01FE7171@usctmx1113.merck.com> <47AB5E59.4060403@cse.ucsc.edu> Message-ID: <47AB7327.40807@cse.ucsc.edu> Chunhua, Hello again. Let me follow up with something a coworker has pointed out to me: RepeatMasker .out files are in the format of cross_match output (a fast Smith-Waterman aligner written a long time ago by Phil Green). The repeatmasker.org help page describes the format in the "How to read the results" section: http://www.repeatmasker.org/webrepeatmaskerhelp.html#reading The cross_match / .out files actually show percentages, which we convert into parts per thousand (hence the "milli" prefix) so we can store an int instead of a float; for example, 12.4% is stored as 124. "Div" is for "divergence" and is actually the proportion of base substitutions, independent of the proportion of bases inserted or deleted with respect to the consensus sequence. I hope that helps to get you started. Kayla Smith UCSC Genome Bioinformatics Group Kayla Smith wrote: > Hello Chunhua, > > You can find some information about the RepeatMasker track here: > http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=rmsk > > For more technical information, I recommend contacting the creators of > the RepeatMasker software at http://www.repeatmasker.org > > I hope this points you in the right direction. If you have any further > questions about the Genome Browser, please don't hesitate to contact us > again.
>> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Feb 7 13:36:24 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 07 Feb 2008 13:36:24 -0800 Subject: [Genome] Making predicted protein sequence. In-Reply-To: References: Message-ID: <47AB79D8.50806@cse.ucsc.edu> Hello Mahmudul, Assuming that you're looking at the UCSC Genes track, check the details page here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene Step 8 says: "Protein predictions are generated. For non-RefSeq transcripts we use the txCdsPredict program to determine if the transcript is protein-coding and if so, the locations of the start and stop codons. The program weighs as positive evidence the length of the protein, the presence of a Kozak consensus sequence at the start codon, and the length of the orthologous predicted protein in other species. As negative evidence it considers nonsense-mediated decay and start codons in any frame upstream of the predicted start codon. For RefSeq transcripts the RefSeq protein prediction is used." I hope this helps to get you started. If you have further questions, please don't hesitate to contact us again. Kayla Smith UCSC Genome Bioinformatics Group Mahmudul Hasan wrote: > Hello, > I am wondering about what method do you use to make the predicted protein sequence from mRNA or genomic sequence. > > Thanking you, > > Mahmud > Lund University > Sweden > _________________________________________________________________ > Express yourself instantly with MSN Messenger! Download today it's FREE! 
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From dglemay at ucdavis.edu Thu Feb 7 15:07:58 2008 From: dglemay at ucdavis.edu (Danielle Lemay) Date: Thu, 07 Feb 2008 15:07:58 -0800 Subject: [Genome] Btau 4.0 (cow) Message-ID: <47AB8F4E.1090604@ucdavis.edu> Hello, Is there an expected arrival date for Bos taurus 4.0? (The 3.1 assembly has many problems.) Thanks, Danielle ==================================================== Danielle Lemay PhD Candidate, Nutritional Biology German Lab University of California at Davis dglemay at ucdavis.edu (530) 297 7688 From roht at nhlbi.nih.gov Thu Feb 7 16:14:03 2008 From: roht at nhlbi.nih.gov (Roh, Tae-Young (NIH/NHLBI) [F]) Date: Thu, 7 Feb 2008 19:14:03 -0500 Subject: [Genome] wigEncode Message-ID: <54EB8D559F57334392453C8AB62D06C7015DBE5C@NIHCESMLBX6.nih.gov> Hi, When I ran wigEncode to prepare wiggle format data, I saw that the upper limit is fixed at 25. My data contains even higher values. I want to know which algorithm is used to calculate the values and if there is a trick to change the upper limit in wigEncode. Thanks, Tae-Young ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tae-Young Roh, Ph.D. Laboratory of Molecular Immunology NHLBI/NIH Building 10, Room 7B20A 9000 Rockville Pike Bethesda, MD 20892-1674 Phone: (301) 496-9459 fax: (301) 402-0971 email: roht at mail.nih.gov From jin_ma at merck.com Thu Feb 7 17:03:44 2008 From: jin_ma at merck.com (Ma, Jin) Date: Thu, 7 Feb 2008 17:03:44 -0800 Subject: [Genome] Where to download these genome files Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEAF90@ussemx1100.merck.com> Hi, Would you please advise on which URL to use to download the following genome files for mm9? Thanks.
Chr*.2bit Chr*.fa Chr*.fa.nib Chr*.nib Jin ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. ------------------------------------------------------------------------------ From rhead at soe.ucsc.edu Thu Feb 7 17:43:18 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 07 Feb 2008 17:43:18 -0800 Subject: [Genome] Where to download these genome files In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEAF90@ussemx1100.merck.com> References: <9BEE7CC4462DB14997A5C8CF8F3BEB0201BEAF90@ussemx1100.merck.com> Message-ID: <47ABB3B6.4090004@soe.ucsc.edu> Hello Jin, To get to the genome files for mm9, start from our homepage (http://genome.ucsc.edu/) and click the "Downloads" link in the light blue bar on the left-hand side of the page, then click on "Mouse". Under the heading "Jul. 2007 (mm9)" you should see two links: * Full data set (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/) * Data set by chromosome (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/) The first link contains the mm9.2bit file, and the second link contains the chr*.fa files. 
There are no .nib files for mm9, as the .nib format was replaced by the .2bit format (see this page: http://genome.ucsc.edu/FAQ/FAQformat for descriptions of the .nib and .2bit formats). I hope this information is helpful. -- Brooke Rhead UCSC Genome Bioinformatics Group Ma, Jin wrote: > Hi, > > Would you please advise on which URL to use to download the following > genome files for mm9? Thanks. > > Chr*.2bit > Chr*.fa > Chr*.fa.nib > Chr*.nib > > Jin > > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From brianherb at jhmi.edu Fri Feb 8 07:18:41 2008 From: brianherb at jhmi.edu (Brian Herb) Date: Fri, 8 Feb 2008 10:18:41 -0500 Subject: [Genome] Bee genome Message-ID: <70131f240802080718y582b03e1sb175d3abbe211cc2@mail.gmail.com> Hi, Just wondering when the latest release of the Apis Mellifera genome, Amel 4.0, will be available to view on the UCSC browser. Thank you for this service, I use the browser almost every day! 
-Brian From geoffroy at titus.u-strasbg.fr Fri Feb 8 02:33:01 2008 From: geoffroy at titus.u-strasbg.fr (Geoffroy Véronique) Date: Fri, 08 Feb 2008 11:33:01 +0100 Subject: [Genome] Problems with FTP connection Message-ID: <47AC2FDD.7070403@igbmc.u-strasbg.fr> Hello, We have had problems accessing the UCSC ftp site (hgdownload.cse.ucsc.edu) for the last six days. We cannot connect to the site: "> ftp hgdownload.cse.ucsc.edu Connected to hgdownload.cse.ucsc.edu (128.114.119.140). " (no connection) Here is the error message we got with a mirror: "Cannot connect, skipping package" FTP connection through a Web browser seems to be working though. Do you have any idea of what's happening? Thank you very much in advance for your help, Véronique Geoffroy -- --------------------------------------------------------------------- Véronique GEOFFROY | Plate-forme Bio-informatique | Tél : (+33) 388 65 33 09 de Strasbourg | Fax : (+33) 388 65 32 76 I.G.B.M.C. 1, rue Laurent Fries | Email: geoffroy at igbmc.u-strasbg.fr 67404 Illkirch | FRANCE | http://bips.u-strasbg.fr/ --------------------------------------------------------------------- From rosenfel at cshl.edu Fri Feb 8 07:22:59 2008 From: rosenfel at cshl.edu (Jeffrey Rosenfeld) Date: Fri, 8 Feb 2008 10:22:59 -0500 Subject: [Genome] Conservation scores Message-ID: Hi, I am trying to find conservation scores for a human/rat/mouse alignment from hg18. I looked around your site, but I couldn't find it. Do you have such a file, or would it be easy to make one?
Thank You, Jeffrey Rosenfeld Cold Spring Harbor Laboratory From rhead at soe.ucsc.edu Fri Feb 8 10:22:36 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 08 Feb 2008 10:22:36 -0800 Subject: [Genome] Bee genome In-Reply-To: <70131f240802080718y582b03e1sb175d3abbe211cc2@mail.gmail.com> References: <70131f240802080718y582b03e1sb175d3abbe211cc2@mail.gmail.com> Message-ID: <47AC9DEC.4070400@soe.ucsc.edu> Hello Brian, I'm glad to hear that you find the Genome Browser useful -- thanks! At the moment we don't have any plans to further update our insect assemblies (other than perhaps Drosophila). However, we will add Amel 4.0 to our request list. -- Brooke Rhead UCSC Genome Bioinformatics Group Brian Herb wrote: > Hi, > Just wondering when the latest release of the Apis Mellifera genome, Amel > 4.0, will be available to view on the UCSC browser. Thank you for this > service, I use the browser almost every day! -Brian > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From D.Ellsworth at wriwindber.org Fri Feb 8 10:44:11 2008 From: D.Ellsworth at wriwindber.org (Darrell Ellsworth) Date: Fri, 8 Feb 2008 13:44:11 -0500 Subject: [Genome] Public repository for SNP data Message-ID: <7690673B79E70D429A12057919339223C39AE0@wri-xchng.WRIWINDBER.ORG> Hello, I am trying to identify a public repository for raw data from 500K (Affy) SNP files. This data was used to generate a manuscript in press in the Journal of Molecular Diagnostics and they require submission of the data to a public database. All I have found thus far are gene expression data repositories. Thanks for any assistance you can provide.
From kayla at soe.ucsc.edu Fri Feb 8 12:15:26 2008 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 08 Feb 2008 12:15:26 -0800 Subject: [Genome] Public repository for SNP data In-Reply-To: <7690673B79E70D429A12057919339223C39AE0@wri-xchng.WRIWINDBER.ORG> References: <7690673B79E70D429A12057919339223C39AE0@wri-xchng.WRIWINDBER.ORG> Message-ID: <47ACB85E.2000107@cse.ucsc.edu> Hello Darrell, I can tell you what resources we have available for you: 1. We have a SNP track and an SNP array track on our Browser. Here is a session with those two tracks turned on: http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=hg18%2Daffysnp You can click on the blue bar on the left hand side of either of those tracks to read more information on how they're created. 2. We get our SNP data from dbSNP: http://www.ncbi.nlm.nih.gov/SNP/ 3. If you're looking for some specific raw data, it's probably best to contact the people who generated that data. You could also try asking Affymetrix, if you think it's something that they would have: http://affymetrix.com/index.affx I hope this points you in a useful direction. Please don't hesitate to contact us again if we can be of further assistance. Kayla Smith UCSC Genome Bioinformatics Group Darrell Ellsworth wrote: > Hello, > > > > I am trying to identify a public repository for raw data from 500K > (Affy) SNP files. This data was used to generate a manuscript in press > in the Journal of Molecular Diagnostics and they require submission of > the data to a public database. All I have found thus far are gene > expression data repositories. > > > > Thanks for any assistance you can provde. 
> > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Fri Feb 8 12:32:06 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 08 Feb 2008 12:32:06 -0800 Subject: [Genome] Problems with FTP connection In-Reply-To: <47AC2FDD.7070403@igbmc.u-strasbg.fr> References: <47AC2FDD.7070403@igbmc.u-strasbg.fr> Message-ID: <47ACBC46.5020902@soe.ucsc.edu> Hello Véronique Geoffroy, Our ftp server is generally quite busy, but we wouldn't expect it to deny connections for six days due to being too busy. Are you able to connect manually, as instructed here: http://genome.ucsc.edu/FAQ/FAQdownloads#download1 ? If you are not able to connect manually, may I ask what ftp program and operating system you are using? -- Brooke Rhead UCSC Genome Bioinformatics Group Geoffroy Véronique wrote: > Hello, > > We have had problems accessing UCSC ftp site (hgdownload.cse.ucsc.edu) > for the last six days. We can not connect to the site: > > "> ftp hgdownload.cse.ucsc.edu > Connected to hgdownload.cse.ucsc.edu (128.114.119.140). > " > (no connection) > > Here are the error message we got with a mirror: > "Cannot connect, skipping package" > > FTP connection through a Web browser seems to be working though. > > Do you have any idea of what's happening? > > Thank you very much in advance for your help, > > Véronique Geoffroy > From huang_wen at ustc.edu Fri Feb 8 14:46:26 2008 From: huang_wen at ustc.edu (Wen Huang) Date: Fri, 8 Feb 2008 16:46:26 -0600 Subject: [Genome] 3'utr coordinates from the table browser Message-ID: <83CD6ECF-DCA7-4478-9C9E-CF11A1A9F7F2@ustc.edu> Hi, I have a few questions about the BED file generated when retrieving 3'utr coordinates. I choose 3'UTR exons from the output. Below are a few lines from the file. I have a few questions: 1) since these are 3'UTR exons (not just 3'UTR), do they also include some exon sequences that are from the coding region?
(e.g. STOP in the middle of an exon) For 3'UTRs that span multiple exons, are all the exons included? 2) the first three and the last columns are easy to understand, correct me if I am wrong. chromosome id, start, end, strand. What do the numbers in, for example, "NM_174812_utr3_0_0_chr16_3833_r" mean and what does the second column from the last mean? (they are all 0's). 3) do these coordinates start from 0 or 1? Thank you very much. Wen
chr16  3882    5634    NM_174812_utr3_0_0_chr16_3883_r      0  -
chr16  35447   35625   NM_001080309_utr3_1_0_chr16_35448_f  0  +
chr16  71582   72004   NM_001098464_utr3_4_0_chr16_71583_f  0  +
chr16  191157  192344  NM_174143_utr3_10_0_chr16_191158_f   0  +
chr16  354114  354322  NM_174088_utr3_4_0_chr16_354115_f    0  +
From hiram at soe.ucsc.edu Fri Feb 8 15:16:56 2008 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Fri, 08 Feb 2008 15:16:56 -0800 Subject: [Genome] wigEncode In-Reply-To: <54EB8D559F57334392453C8AB62D06C7015DBE5C@NIHCESMLBX6.nih.gov> References: <54EB8D559F57334392453C8AB62D06C7015DBE5C@NIHCESMLBX6.nih.gov> Message-ID: <47ACE2E8.2080006@soe.ucsc.edu> Good Afternoon Tae-Young: I'm not sure I understand what the difficulty is here. When you run wigEncode to encode wiggle data, wigEncode finds the minimum and maximum values in your entire data set and prints out to stderr a line about these values when it is finished. It imposes no limits on the data. When you have finished encoding your data, you can note the minimum and maximum values as found by wigEncode and use those in your trackDb.ra file, or 'track' line for a custom track to determine how the viewing window functions for that data. Please note the "viewLimits" option for the custom track 'track' definition line: http://genome.ucsc.edu/goldenPath/help/wiggle.html Can you clarify how you are seeing the limit fixed at 25? --Hiram Roh, Tae-Young (NIH/NHLBI) [F] wrote: > Hi, > > When I ran winEncode to prepare wiggle format data, I saw the upper > limit is fixed to 25.
> My data contains even higher values. > I want to know which algorithm is used to calculate the values and if > there is a trick to change the upper limit at wigEncode. > Thanks, > Tae-Young From rhead at soe.ucsc.edu Fri Feb 8 15:41:07 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 08 Feb 2008 15:41:07 -0800 Subject: [Genome] Conservation scores In-Reply-To: References: Message-ID: <47ACE893.6050505@soe.ucsc.edu> Hello Jeffrey, The hg18 assembly in the Genome Browser has a few tracks that may be what you are looking for: - Conservation This track shows multiple alignments of 28 vertebrate species and two measures of evolutionary conservation -- conservation across all 28 species and an alternative measurement restricted to the placental mammal subset (17 species plus human) of the alignment. - Most Conserved This track shows predictions of conserved elements produced by the phastCons program based on a whole-genome alignment of vertebrates, and for the placental mammal subset of species in the alignment. - TFBS Conserved This track contains the location and score of transcription factor binding sites conserved in the human/mouse/rat alignment. You can read more details about each track by clicking on its name in the Genome Browser, or by clicking the "mini-button" for the track -- the thin blue or gray bar to the far left of the track in the main Genome Browser display. I hope one of these tracks is helpful. Please feel free to contact us again at this mailing list address if we can be of further assistance. -- Brooke Rhead UCSC Genome Bioinformatics Group Jeffrey Rosenfeld wrote: > Hi, > > I am trying to find conservation scores for a human/rat/mouse > alignment from hg18. I looked around your site, but I couldn't find > it. Do you have such a file, or would it be easy to make one? 
> > Thank You, > Jeffrey Rosenfeld > Cold Spring Harbor Laboratory > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Fri Feb 8 17:10:16 2008 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 08 Feb 2008 17:10:16 -0800 Subject: [Genome] 3'utr coordinates from the table browser In-Reply-To: <83CD6ECF-DCA7-4478-9C9E-CF11A1A9F7F2@ustc.edu> References: <83CD6ECF-DCA7-4478-9C9E-CF11A1A9F7F2@ustc.edu> Message-ID: <47ACFD78.80503@soe.ucsc.edu> Hello Wen, Please see answers to your questions interspersed below: Wen Huang wrote: > Hi, > > I have a few questions about the BED file generated when retrieving > 3'utr coordinates. > > I choose 3'UTR exons from the output. > > Below are a few lines from the file. I have a few questions: > > 1) since these are 3'UTR exons (not just 3'UTR), do they also include > some exon sequences that are from the coding region? (e.g. STOP in the > middle of a exon) For 3'UTRs that span multiple exons, are all the > exons included? The UTRs are the untranslated regions of exons. They do not include coding regions. For UTRs that span multiple exons, all exons are included in the Table Browser output, but multiple exons will occupy multiple lines in the BED file. An easy way to examine the regions output by the Table Browser is to choose "custom track" as the output format -- the selected regions will appear in a "user track" at the top of the Genome Browser display. > 2) the first three and the last columns are easy to understand, > correct me if I am wrong. chromosome id, start, end, strand. What do > the numbers in, for example, "NM_174812_utr3_0_0_chr16_3833_r" mean > and what does the second column from the last mean?(they are all 0's). I think the second zero in the name generated by the Table Browser is unused in this instance, but I am not certain.
I have asked our developers about it, and I will send a follow-up to this answer when I know for sure. > 3) are these coordinates start from 0 or 1? The BED coordinates start from 0. See an explanation here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 I hope this information is helpful. -- Brooke Rhead UCSC Genome Bioinformatics Group > > Thank you very much. > > Wen > chr16 3882 5634 NM_174812_utr3_0_0_chr16_3883_r 0 - > chr16 35447 35625 NM_001080309_utr3_1_0_chr16_35448_f 0 + > chr16 71582 72004 NM_001098464_utr3_4_0_chr16_71583_f 0 + > chr16 191157 192344 NM_174143_utr3_10_0_chr16_191158_f 0 + > chr16 354114 354322 NM_174088_utr3_4_0_chr16_354115_f 0 + > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From achpj at hotmail.com Sun Feb 10 06:25:52 2008 From: achpj at hotmail.com (Maria Astrom) Date: Sun, 10 Feb 2008 08:25:52 -0600 Subject: [Genome] downloading specific type of repeats In-Reply-To: <641511.74007.qm@web38814.mail.mud.yahoo.com> References: <641511.74007.qm@web38814.mail.mud.yahoo.com> Message-ID: I am grad student. I am in need of downloading the specific type of repeats(such as CTG triple repeats, in the promoter regions of human genome. I could not find any criteria in table browser, specifically addressing such kind of repeats. Is there anyway, to make the download the profile of promoters matching only specific criteria ? Thanks Maria _________________________________________________________________ Connect and share in new ways with Windows Live. http://www.windowslive.com/share.html?ocid=TXT_TAGHM_Wave2_sharelife_012008 From arhan at ucla.edu Sat Feb 9 12:22:15 2008 From: arhan at ucla.edu (Areum Han) Date: Sat, 9 Feb 2008 12:22:15 -0800 Subject: [Genome] self pair wise alignment file for mm Message-ID: <000801c86b59$7662b1a0$02f46180@AreumHan> Hi. in hgdownload.cse.ucsc.edu ftp, Is there a way that I can download self alignment files for a mm8? 
(expected path : /goldenPath/mm8/vsSelf/axtNet/). ----------------------------------------------------------- Areum Han Graduate student, Biomedical Engineering Dept., UCLA 310-775-1606 / arhan at ucla.edu ----------------------------------------------------------- From davidsalako at gmail.com Sat Feb 9 09:18:54 2008 From: davidsalako at gmail.com (David Salako) Date: Sat, 9 Feb 2008 11:18:54 -0600 Subject: [Genome] Accessing UCSC database using mysql client within Matlab Message-ID: <284c688c0802090918i6cb2234bg540c737da34faedb@mail.gmail.com> Hi, I need to know what 'root' and 'password' I need to use for access to the UCSC database. I am using the values 'genomep' or 'genome' and 'password'. Doesn't seem to work. The 'host' I am using in the connection string is 'genome-mysql.cse.ucsc.edu '. Please advise. Thanks, -- David Salako Tel: 713 254 8370 E-mail: davidsalako at gmail.com From newgumtree at gmail.com Sat Feb 9 12:28:50 2008 From: newgumtree at gmail.com (Qu Zhang) Date: Sat, 9 Feb 2008 15:28:50 -0500 Subject: [Genome] question on downloading est/genomic alignments info Message-ID: Hi there, I am trying to download the EST/genomic alignment information for all human-EST, but I can't find the right option in Table browser. Could you tell me what I can do to get that information? Another question is that is it possible that an EST is mapped to more than one location in the genome? Many thanks for your help! Best, Qu From nsakabe at bsd.uchicago.edu Sat Feb 9 14:59:47 2008 From: nsakabe at bsd.uchicago.edu (Sakabe, Noboru [BSD] - HGD) Date: Sat, 9 Feb 2008 16:59:47 -0600 Subject: [Genome] exonWalk.txt missing in hg18? Message-ID: <74518891A099FA4EB488FF7F636AC4881FEF91@ADM-EXCHVS04.bsdad.uchicago.edu> Where can I find the exonWalk.txt file for hg18? Thanks, Noboru This email is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged and confidential. 
If the reader of this email message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is prohibited. If you have received this email in error, please notify the sender and destroy/delete all copies of the transmittal. Thank you.

From B.Glaser at bristol.ac.uk Mon Feb 11 07:15:45 2008
From: B.Glaser at bristol.ac.uk (Beate Glaser)
Date: Mon, 11 Feb 2008 15:15:45 -0000
Subject: [Genome] file conversion
Message-ID: <15980812.1202742945@epi-pc32.ads.bris.ac.uk>

Hi,

I tried to convert some human methylation data (BED file format) given in hg16 (July 2003) coordinates into hg18 coordinates (March 2006) using:

liftOver.linux.i386 test1_2003.txt hg16ToHg18.over.chain test1_2006.txt UnMapped

The conversion runs under Linux; however, the alignment is wrong. I only know because the methylation position is always C in hg16 (+ strand) and it's all over the place in hg18. Would anyone know what went wrong?

Thanks a lot for any help.

Beate

----------------------
Beate Glaser
Dept Social Medicine
Canynge Hall
Room 3.5
Whiteladies Road
Bristol BS8 2PR
UK

++44-117-331-3901

From enrique.muro at mdc-berlin.de Mon Feb 11 05:15:58 2008
From: enrique.muro at mdc-berlin.de (Enrique M. Muro)
Date: Mon, 11 Feb 2008 14:15:58 +0100
Subject: [Genome] hg18, is static data?
Message-ID: <1202735758.6908.19.camel@cuatrocaminos>

We downloaded hg18 a long time ago (2006) and set up a MySQL database. Now we have downloaded hg18 again (Feb. 2008) in another lab and set up a MySQL database again.

Comparing the 2006 and 2008 MySQL versions, we realized that the MySQL data differs between the two
versions (see for instance chromEnd in the next example)

OLD hg18 download (2006)

mysql> select * from estOrientInfo where name='DB511739';
+-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
| bin | chrom        | chromStart | chromEnd | name     | intronOrientation | sizePolyA | revSizePolyA | signalPos | revSignalPos |
+-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
...
| 827 | chr17        |   31820231 | 31820657 | DB511739 |                 0 |         0 |            2 |         0 |
...

NEW hg18 download (2008)

mysql> select * from estOrientInfo where name='DB511739';
+-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
| bin | chrom        | chromStart | chromEnd | name     | intronOrientation | sizePolyA | revSizePolyA | signalPos | revSignalPos |
+-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
...
| 827 | chr17        |   31820231 | 31820891 | DB511739 |                 0 |         0 |            0 |         0 |            0 |
...

Is the hg18 data static, and did we make some kind of mistake with the MySQL setup? Or can the hg18 data be modified over time?

Thanks in advance,
Enrique Muro

From kayla at soe.ucsc.edu Mon Feb 11 09:02:28 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 11 Feb 2008 09:02:28 -0800 (PST)
Subject: [Genome] self pair wise alignment file for mm
In-Reply-To: <000801c86b59$7662b1a0$02f46180@AreumHan>
References: <000801c86b59$7662b1a0$02f46180@AreumHan>
Message-ID:

Hello Areum,

Try this link:
http://genome-test.cse.ucsc.edu/goldenPath/mm8/vsSelf/

Keep in mind though that this data is on our test server, and as it has not gone through our QA process, may contain errors.

Kayla Smith
UCSC Genome Bioinformatics Group

On Sat, 9 Feb 2008, Areum Han wrote:

> Hi.
> in hgdownload.cse.ucsc.edu ftp,
> Is there a way that I can download self alignment files for a mm8?
> (expected path : /goldenPath/mm8/vsSelf/axtNet/).
>
> -----------------------------------------------------------
> Areum Han
> Graduate student,
> Biomedical Engineering Dept., UCLA
> 310-775-1606 / arhan at ucla.edu
> -----------------------------------------------------------

From rhead at soe.ucsc.edu Mon Feb 11 10:03:09 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 10:03:09 -0800
Subject: [Genome] hg18, is static data?
In-Reply-To: <1202735758.6908.19.camel@cuatrocaminos>
References: <1202735758.6908.19.camel@cuatrocaminos>
Message-ID: <47B08DDD.2090803@soe.ucsc.edu>

Hello Enrique,

The EST data for all assemblies is updated weekly with new data from GenBank. See more information on GenBank updates on this page:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

We frequently add or update data. The changes are reflected in our Release Log:
http://genome.ucsc.edu/goldenPath/releaseLog.html

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Enrique M. Muro wrote:
> We downloaded long time ago (2006) the hg18 and set up a mysql database.
> Now we have downloaded again hg18 (febr. 2008) in another lab setting up
> again a mysql database.
>
> we were comparing the 2006 and 2008 mysql versions realizing that the
> mysql data is different in the diff.
versions (see for instance chromEnd
> in the next example)
>
> OLD hg18 download (2006)
>
> mysql> select * from estOrientInfo where name='DB511739';
> +-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
> | bin | chrom | chromStart | chromEnd | name |
> intronOrientation | sizePolyA | revSizePolyA | signalPos | revSignalPos
> |
> +-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
> ...
> | 827 | chr17 | 31820231 | 31820657 | DB511739 |
> 0 | 0 | 2 | 0 |
> ...
>
>
> NEW hg18 download (2008)
> mysql> select * from estOrientInfo where name='DB511739';
> +-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
> | bin | chrom | chromStart | chromEnd | name |
> intronOrientation | sizePolyA | revSizePolyA | signalPos | revSignalPos
> |
> +-----+--------------+------------+----------+----------+-------------------+-----------+--------------+-----------+--------------+
> ...
> | 827 | chr17 | 31820231 | 31820891 | DB511739 |
> 0 | 0 | 0 | 0 | 0 |
> ...
>
>
> is hg18 data static and we committed some kind of mistake with the mysql
> set up? or the hg18 data could be modified over time?
>
> Thanks in advance,
> Enrique Muro
>

From kayla at soe.ucsc.edu Mon Feb 11 10:21:14 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 11 Feb 2008 10:21:14 -0800 (PST)
Subject: [Genome] exonWalk.txt missing in hg18?
In-Reply-To: <74518891A099FA4EB488FF7F636AC4881FEF91@ADM-EXCHVS04.bsdad.uchicago.edu>
References: <74518891A099FA4EB488FF7F636AC4881FEF91@ADM-EXCHVS04.bsdad.uchicago.edu>
Message-ID:

Hello Noboru,

We don't have an exonWalk track for the hg18 browser.
However, you can take the data from the hg17 version of this track and lift it to hg18. The hg17 version is here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/

LiftOver is here:
http://genome.ucsc.edu/cgi-bin/hgLiftOver

This should help get you started. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

On Sat, 9 Feb 2008, Sakabe, Noboru [BSD] - HGD wrote:

> Where can I find the exonWalk.txt file for hg18?
> Thanks,
>
> Noboru
>
>
> This email is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this email message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is prohibited. If you have received this email in error, please notify the sender and destroy/delete all copies of the transmittal. Thank you.
>

From rhead at soe.ucsc.edu Mon Feb 11 10:31:27 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 10:31:27 -0800
Subject: [Genome] question on downloading est/genomic alignments info
In-Reply-To:
References:
Message-ID: <47B0947F.30403@soe.ucsc.edu>

Hello Qu,

To get to the table corresponding to the track "Human ESTs" in the Genome Browser, make the following selections in the Table Browser (assuming you are using the latest human assembly):

clade: vertebrate
genome: human
assembly: Mar. 2006
group: mRNA and EST tracks
track: Human ESTs
table: est

Note that this is a very large table. If you plan to download all of it, I recommend using the "gzip compressed" option for your output.
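For a download of that size, it may be easier to stream the compressed file than to unpack it first. A minimal sketch in Python, assuming the output was saved as a gzip-compressed, tab-separated file whose header line starts with "#"; the function name, file name, and column index below are hypothetical and chosen only for illustration:

```python
import gzip
from collections import Counter


def count_rows_by_column(path, column, comment_prefix="#"):
    """Count rows per distinct value in one column of a gzipped TSV file.

    The file is read line by line, so it never has to fit in memory.
    Blank lines and lines starting with comment_prefix are skipped.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:  # "rt" decompresses to text
        for line in handle:
            if not line.strip() or line.startswith(comment_prefix):
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[column]] += 1
    return counts
```

Calling `count_rows_by_column("est_output.txt.gz", 1)` (hypothetical file name and column index) would tally rows by whatever the second column holds; check the header line of the actual download to pick the right index.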
Alternatively, the table is split into chromosomes on our downloads page, here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
The files are named chr1_est.txt.gz, chr2_est.txt.gz, etc.

It is possible for an EST to map to more than one location. From the details page for the track:

"When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept."

You can get to the details page by clicking on the blue track name link on the main Genome Browser display page.

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Qu Zhang wrote:
> Hi there,
>
> I am trying to download the EST/genomic alignment information for all
> human-EST, but I can't find the right option in Table browser. Could you
> tell me what I can do to get that information? Another question is that is
> it possible that an EST is mapped to more than one location in the genome?
> Many thanks for your help!
>
> Best,
> Qu

From rhead at soe.ucsc.edu Mon Feb 11 10:39:27 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 10:39:27 -0800
Subject: [Genome] Public repository for SNP data
In-Reply-To: <47ACB85E.2000107@cse.ucsc.edu>
References: <7690673B79E70D429A12057919339223C39AE0@wri-xchng.WRIWINDBER.ORG> <47ACB85E.2000107@cse.ucsc.edu>
Message-ID: <47B0965F.6050508@soe.ucsc.edu>

Hi Darrell,

Let me add to what my colleague has said. If you are looking to submit SNP array data to a public repository, dbSNP is the right place for that.

There is a 'SNP SUBMISSION' link on the left pane of this page:
http://www.ncbi.nlm.nih.gov/projects/SNP/

http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html

I hope this information is helpful.
--
Brooke Rhead
UCSC Genome Bioinformatics Group

Kayla Smith wrote:
> Hello Darrell,
>
> I can tell you what resources we have available for you:
>
> 1. We have a SNP track and an SNP array track on our Browser. Here is
> a session with those two tracks turned on:
>
> http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=hg18%2Daffysnp
>
> You can click on the blue bar on the left hand side of either of those
> tracks to read more information on how they're created.
>
> 2. We get our SNP data from dbSNP:
> http://www.ncbi.nlm.nih.gov/SNP/
>
> 3. If you're looking for some specific raw data, it's probably best to
> contact the people who generated that data. You could also try asking
> Affymetrix, if you think it's something that they would have:
>
> http://affymetrix.com/index.affx
>
> I hope this points you in a useful direction. Please don't hesitate to
> contact us again if we can be of further assistance.
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
>
> Darrell Ellsworth wrote:
>
>> Hello,
>>
>>
>>
>> I am trying to identify a public repository for raw data from 500K
>> (Affy) SNP files. This data was used to generate a manuscript in press
>> in the Journal of Molecular Diagnostics and they require submission of
>> the data to a public database. All I have found thus far are gene
>> expression data repositories.
>>
>>
>>
>> Thanks for any assistance you can provde.
>>

From rhead at soe.ucsc.edu Mon Feb 11 11:21:16 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 11:21:16 -0800
Subject: [Genome] Accessing UCSC database using mysql client within Matlab
In-Reply-To: <284c688c0802090918i6cb2234bg540c737da34faedb@mail.gmail.com>
References: <284c688c0802090918i6cb2234bg540c737da34faedb@mail.gmail.com>
Message-ID: <47B0A02C.2070405@soe.ucsc.edu>

Hello David,

I don't have any specific information on using Matlab to access our public MySQL database, but perhaps I can still help. I assume you are already looking at the instructions on our FAQ page:
http://genome.ucsc.edu/FAQ/FAQdownloads#download29

The user 'genome' does not require a password. (The user 'genomep' was set up to require a password and to be added to a configuration file for access from tools in the Kent source tree.) I suggest trying user 'genome' with no password.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

David Salako wrote:
> Hi,
> I need to know what 'root' and 'password' I need to use for access to the
> UCSC database.
> I am using the values 'genomep' or 'genome' and 'password'. Doesn't seem to
> work.
> The 'host' I am using in the connection string is 'genome-mysql.cse.ucsc.edu
> '.
> Please advise.
> Thanks,
>
>

From rhead at soe.ucsc.edu Mon Feb 11 11:43:07 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 11:43:07 -0800
Subject: [Genome] file conversion
In-Reply-To: <15980812.1202742945@epi-pc32.ads.bris.ac.uk>
References: <15980812.1202742945@epi-pc32.ads.bris.ac.uk>
Message-ID: <47B0A54B.5020406@soe.ucsc.edu>

Hello Beate,

Do you get a similar result if you use our web-based BLAT to convert coordinates from hg16 to hg18?
http://genome.ucsc.edu/cgi-bin/hgBlat

If you get a better result using web-based BLAT, instructions for replicating results using command-line BLAT are here:
http://genome.ucsc.edu/FAQ/FAQblat#blat7

If you are still having trouble, can you send us an example of hg16 coordinates that do not seem to be lifting correctly to hg18?

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Beate Glaser wrote:
> Hi,
>
> I tried to convert some human methylation data (BED file format) given in
> hg16 (July 2003) coordinates into hg 18 coordinates (March 2006) using :
>
> liftOver.linux.i386 test1_2003.txt hg16ToHg18.over.chain test1_2006.txt
> UnMapped
>
> The conversion under linux is carried out however, the alignment is wrong.
> I only know because the methylation position is always C in hg16 (+ strand)
> and its all over the place in hg18. Would anyone know what went wrong?
>
> Thanks a lot for any help.
>
> Beate
>
> ----------------------
> Beate Glaser
> Dept Social Medicine
> Canynge Hall
> Room 3.5
> Whiteladies Road
> Bristol BS8 2PR
> UK
>
> ++44-117-331-3901
>

From rhead at soe.ucsc.edu Mon Feb 11 11:49:55 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 11 Feb 2008 11:49:55 -0800
Subject: [Genome] file conversion
In-Reply-To: <47B0A54B.5020406@soe.ucsc.edu>
References: <15980812.1202742945@epi-pc32.ads.bris.ac.uk> <47B0A54B.5020406@soe.ucsc.edu>
Message-ID: <47B0A6E3.5000900@soe.ucsc.edu>

Hi again Beate,

A colleague suggested a couple of things to double-check:

1. Make sure that the input coordinates in test1_2003.txt really are from hg16.

2. Make sure you are using our 0-based half-open coordinates in the input bed file given to liftOver (See an explanation here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1).

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Brooke Rhead wrote:
> Hello Beate,
>
> Do you get a similar result if you use our web-based BLAT to convert
> coordinates from hg16 to hg18?
> http://genome.ucsc.edu/cgi-bin/hgBlat
>
> If you get a better result using web-based BLAT, instructions for
> replicating results using command-line BLAT are here:
> http://genome.ucsc.edu/FAQ/FAQblat#blat7 .
>
> If you are still having troubles, can you send us an example of hg16
> coordinates that do not seem to be lifting correctly to hg18?
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
>
> Beate Glaser wrote:
>
>> Hi,
>>
>> I tried to convert some human methylation data (BED file format) given in
>> hg16 (July 2003) coordinates into hg 18 coordinates (March 2006) using :
>>
>> liftOver.linux.i386 test1_2003.txt hg16ToHg18.over.chain test1_2006.txt
>> UnMapped
>>
>> The conversion under linux is carried out however, the alignment is wrong.
>> I only know because the methylation position is always C in hg16 (+ strand)
>> and its all over the place in hg18. Would anyone know what went wrong?
>>
>> Thanks a lot for any help.
>>
>> Beate
>>
>> ----------------------
>> Beate Glaser
>> Dept Social Medicine
>> Canynge Hall
>> Room 3.5
>> Whiteladies Road
>> Bristol BS8 2PR
>> UK
>>
>> ++44-117-331-3901
>>

From MCM at stowers-institute.org Mon Feb 11 11:45:23 2008
From: MCM at stowers-institute.org (Gogol, Madelaine)
Date: Mon, 11 Feb 2008 13:45:23 -0600
Subject: [Genome] different fonts?
Message-ID:

Hi,

I noticed that the genome browser recently started having a different appearance on my Windows machine in IE6/FF2 than on my Linux machine (FF 1.5).

The fonts are bigger in Windows and genes are clunkier (see attached). I started noticing this recently and I like the smaller fonts better. Is this something that's happened on my end?

Thanks,
Madelaine Gogol
Programmer Analyst II
Stowers Institute

From hiram at soe.ucsc.edu Mon Feb 11 12:01:36 2008
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Mon, 11 Feb 2008 12:01:36 -0800
Subject: [Genome] different fonts?
In-Reply-To:
References:
Message-ID: <47B0A9A0.6090700@soe.ucsc.edu>

Good Afternoon Madelaine:

You can change the font displayed in the browser. Click on the "configure" button and select a font size from there. These fonts are internal and specific to the genome browser; they have no relationship to your desktop preferences and appear exactly as specified despite any particular desktop or web browser preferences you may have at your end.

I suspect one of your sessions that you see as different has been set to a different sized font.
--Hiram

Gogol, Madelaine wrote:
> Hi,
>
> I noticed that the genome browser recently started having a different
> appearance on my windows machine in IE6/FF2 than on my linux machine (FF
> 1.5).
>
> The fonts are bigger in windows and genes are clunkier (see attached). I
> started noticing this recently and I like the smaller fonts better. Is
> this something that's happened on my end?
>
> Thanks,
> Madelaine Gogol
> Programmer Analyst II
> Stowers Institute
>
>
> ------------------------------------------------------------------------
>

From MCM at stowers-institute.org Mon Feb 11 11:57:36 2008
From: MCM at stowers-institute.org (Gogol, Madelaine)
Date: Mon, 11 Feb 2008 13:57:36 -0600
Subject: [Genome] nevermind about "different fonts?"
Message-ID:

I just figured out configure: text size. Thanks!

From mkoda at saitama-med.ac.jp Mon Feb 11 17:19:35 2008
From: mkoda at saitama-med.ac.jp (Masakazu Kohda)
Date: Tue, 12 Feb 2008 10:19:35 +0900
Subject: [Genome] Public repository for SNP data
In-Reply-To: <47B0965F.6050508@soe.ucsc.edu>
References: <7690673B79E70D429A12057919339223C39AE0@wri-xchng.WRIWINDBER.ORG> <47ACB85E.2000107@cse.ucsc.edu> <47B0965F.6050508@soe.ucsc.edu>
Message-ID: <530dbf580802111719h6aa85f4fv656e5da82b1f481e@mail.gmail.com>

Hi Darrell,

Did you see GEO?
http://www.ncbi.nlm.nih.gov/geo/

It is not only for expression array data but also for SNP array data.

I hope this information is helpful.

On Feb 12, 2008 3:39 AM, Brooke Rhead wrote:

> Hi Darrell,
>
> Let me add to what my colleague has said. If you are looking to submit
> SNP array data to a public repository, dbSNP is the right place for that.
>
> There is a 'SNP SUBMISSION' link on the left pane of this page:
> http://www.ncbi.nlm.nih.gov/projects/SNP/
>
> http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html
>
> I hope this information is helpful.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
>
> Kayla Smith wrote:
> > Hello Darrell,
> >
> > I can tell you what resources we have available for you:
> >
> > 1. We have a SNP track and an SNP array track on our Browser. Here is
> > a session with those two tracks turned on:
> >
> > http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=hg18%2Daffysnp
> >
> > You can click on the blue bar on the left hand side of either of those
> > tracks to read more information on how they're created.
> >
> > 2. We get our SNP data from dbSNP:
> > http://www.ncbi.nlm.nih.gov/SNP/
> >
> > 3. If you're looking for some specific raw data, it's probably best to
> > contact the people who generated that data. You could also try asking
> > Affymetrix, if you think it's something that they would have:
> >
> > http://affymetrix.com/index.affx
> >
> > I hope this points you in a useful direction. Please don't hesitate to
> > contact us again if we can be of further assistance.
> >
> > Kayla Smith
> > UCSC Genome Bioinformatics Group
> >
> >
> > Darrell Ellsworth wrote:
> >
> >> Hello,
> >>
> >>
> >>
> >> I am trying to identify a public repository for raw data from 500K
> >> (Affy) SNP files. This data was used to generate a manuscript in press
> >> in the Journal of Molecular Diagnostics and they require submission of
> >> the data to a public database. All I have found thus far are gene
> >> expression data repositories.
> >>
> >>
> >>
> >> Thanks for any assistance you can provde.
> >>

--
Kohda Masakazu
mkoda at saitama-med.ac.jp
Research Center for Genomic Medicine
Saitama Medical University

From B.Glaser at bristol.ac.uk Tue Feb 12 06:15:53 2008
From: B.Glaser at bristol.ac.uk (Beate Glaser)
Date: Tue, 12 Feb 2008 14:15:53 -0000
Subject: [Genome] file conversion
In-Reply-To: <47B0A54B.5020406@soe.ucsc.edu>
References: <47B0A54B.5020406@soe.ucsc.edu>
Message-ID: <10916640.1202825753@epi-pc32.ads.bris.ac.uk>

Hi Brooke,

thanks a lot for looking into that. I figured out where the error was.

Using:

> liftOver.linux.i386 test1_2003.txt hg16ToHg18.over.chain test1_2006.txt
> UnMapped

only affects columns 2 and 3 of the BED file (chromStart, chromEnd), but not columns 7 and 8 (thickStart, thickEnd), which I used for display as well. So the old coordinates were still present in columns 7 and 8 of the converted file and messed up the alignment in hg18. I took columns 7 and 8 out and the conversion seems fine now.

Thanks a lot for your help,

Beate

----------------------
Beate Glaser
Dept Social Medicine
Canynge Hall
Room 3.5
Whiteladies Road
Bristol BS8 2PR
UK

++44-117-331-3901

From mjohnson at sfbrgenetics.org Tue Feb 12 08:35:26 2008
From: mjohnson at sfbrgenetics.org (Matt Johnson)
Date: Tue, 12 Feb 2008 10:35:26 -0600
Subject: [Genome] Extended DNA case/color options
Message-ID: <002301c86d95$45bba820$2c3a7cce@win.sfbrgenetics.org>

Dear UCSC Genome Browser

I want to view genomic sequence using the 'Get DNA in Window' option and customize my view with the 'extended case/color options'.
When I click on the 'extended case/color options' button the 'Track Name' list in the next window is limited to only 10 items. I know there are more options available because when performing the same task on my laptop I am presented with an extensive list of options to color code and format the sequence text after selecting the 'submit' button.

Is there a way of extending this track name list on my desktop PC?

Thanking you in advance.

Matt

____________________________________________
Matt Johnson Ph.D.
Department of Genetics
Southwest Foundation for Biomedical Research
P.O. Box 760549
San Antonio, TX 78245-0549
USA
P; +1 210 258 9465
F; +1 210 259 9595
www.sfbr.org

From barbj at mail.nih.gov Tue Feb 12 08:36:29 2008
From: barbj at mail.nih.gov (Barb, Jennifer (NIH/CIT) [E])
Date: Tue, 12 Feb 2008 11:36:29 -0500
Subject: [Genome] Gene ontology annotations
Message-ID: <08BFEF2D7CC3104FA411B6E1991200C2826D35@NIHCESMLBX3.nih.gov>

Does anyone know how to obtain the gene ontology annotations for each RefSeq id using the Tables option at UCSC? Any help that can be provided will be greatly appreciated.

Sincerely,
Jennifer

From kayla at soe.ucsc.edu Tue Feb 12 11:23:13 2008
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Tue, 12 Feb 2008 11:23:13 -0800
Subject: [Genome] Btau 4.0 (cow)
In-Reply-To: <47AB8F4E.1090604@ucdavis.edu>
References: <47AB8F4E.1090604@ucdavis.edu>
Message-ID: <47B1F221.5070507@cse.ucsc.edu>

Hello Danielle,

Please see this previously answered mailing list question on the topic:
https://www.soe.ucsc.edu/pipermail/genome/2008-February/015501.html

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Danielle Lemay wrote:
> Hello,
>
> Is there an expected arrival date for Bos taurus 4.0 ?
> (The 3.1 assembly has many problems.)
>
> Thanks,
> Danielle
>
> ====================================================
> Danielle Lemay
> PhD Candidate, Nutritional Biology
> German Lab
> University of California at Davis
> dglemay at ucdavis.edu
> (530) 297 7688

From schaffer at scripps.edu Tue Feb 12 15:07:35 2008
From: schaffer at scripps.edu (Lana Schaffer)
Date: Tue, 12 Feb 2008 15:07:35 -0800
Subject: [Genome] 3' fasta sequence
Message-ID: <8F897BD2B0EB714CB79B96D84911B30209E0DF@EXCHV1.lj.ad.scripps.edu>

Hi,
Is there an easy batch query to get the fasta sequence or 3' fasta sequence?

Lana Schaffer
Biostatistics/Informatics
The Scripps Research Institute
DNA Array Core Facility
La Jolla, CA 92037
(858) 784-2263
(858) 784-2994
schaffer at scripps.edu

From rhead at soe.ucsc.edu Tue Feb 12 15:43:15 2008
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Tue, 12 Feb 2008 15:43:15 -0800
Subject: [Genome] 3' fasta sequence
In-Reply-To: <8F897BD2B0EB714CB79B96D84911B30209E0DF@EXCHV1.lj.ad.scripps.edu>
References: <8F897BD2B0EB714CB79B96D84911B30209E0DF@EXCHV1.lj.ad.scripps.edu>
Message-ID: <47B22F13.40606@soe.ucsc.edu>

Hello Lana,

You can use the Table Browser (the "Tables" link in the blue bar at the top of the page) to get fasta sequences. Choose the track for which you would like to retrieve sequence, and "output format: sequence", then hit "get output". If you are using a "Genes and Gene Predictions" track, you should see an option on the next page to select genomic, protein or mRNA sequence. If you choose to retrie