From kuhn at soe.ucsc.edu Sun Dec 2 11:26:53 2007 From: kuhn at soe.ucsc.edu (Robert Kuhn) Date: Sun, 2 Dec 2007 11:26:53 -0800 Subject: [Genome] Annotated transcription start sites Message-ID: <200712021926.LAA22620@moondance.cse.ucsc.edu> Hello, Raga, For the last four assemblies, you should use the Table Browser to retrieve the values you seek from the knownGene table. You could also use the refGene table, which contains the RefSeq annotations. You will want to determine for yourself which data set best suits your needs. Feel free to contact the list again if you need more detail. You can also find answers to previously asked questions by searching the mailing-list archives: http://genome.ucsc.edu/contacts.html (use the center search box) Best wishes, --b0b kuhn UCSC Genome Bioinformatics Group > From genome-bounces at soe.ucsc.edu Sun Dec 2 01:41:40 2007 > To: genome at soe.ucsc.edu > Subject: [Genome] Annotated transcription start sites > > Hi, > Where and how can I download a list of all the annotated transcription > start sites with their relative positions on the genome (chromosome and > base number) for any of the assemblies of the human genome? > Thank you, > Raga > > > Raga Krishnakumar > Ph.D. Candidate > Department of Molecular Biology and Genetics > Cornell University > > rk254 at cornell.edu > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From ann at soe.ucsc.edu Mon Dec 3 09:30:34 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Mon, 03 Dec 2007 09:30:34 -0800 Subject: [Genome] anything can I do with mm5 data? In-Reply-To: References: Message-ID: <47543D3A.50901@cse.ucsc.edu> Hello Weike, First of all, let me apologize for taking so long to answer your question. You are correct that there is no liftOver from mm5 directly to mm9. However, you can do a double-lift from mm5 to mm8 (Feb 2006), and then from mm8 to mm9. 
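For readers running the command-line liftOver utility rather than the web tool, the double-lift can be sketched as two chained invocations. This is only a sketch under assumptions: the input file name `fantom3.mm5.bed` is hypothetical, the chain file names follow the usual UCSC naming pattern and are assumed to have been downloaded already, and the helper `double_lift_commands` is illustrative and not part of any UCSC package.

```python
def chain_file_name(src, dst):
    """Build a chain file name in the usual UCSC pattern, e.g. mm5ToMm8.over.chain.gz.

    Assumption: the chain files were downloaded locally under these names.
    """
    return f"{src}To{dst[0].upper()}{dst[1:]}.over.chain.gz"


def double_lift_commands(bed_in, src="mm5", via="mm8", dst="mm9"):
    """Return the two liftOver command lines for a lift through an intermediate assembly.

    liftOver is called as: liftOver oldFile map.chain newFile unMapped
    """
    via_out = f"lifted.{via}.bed"
    dst_out = f"lifted.{dst}.bed"
    return [
        ["liftOver", bed_in, chain_file_name(src, via), via_out, f"unmapped.{via}.bed"],
        # The second step reads the output of the first step.
        ["liftOver", via_out, chain_file_name(via, dst), dst_out, f"unmapped.{dst}.bed"],
    ]


# Hypothetical input file name, used only for illustration.
for command in double_lift_commands("fantom3.mm5.bed"):
    print(" ".join(command))
```

Records that fail either step end up in the corresponding unmapped file, so it is worth checking both before relying on the result.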
This should give you the results you need. I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Weike Mo wrote: > Hi, > > I am looking at the FANTOM3 data set of RIKEN. However, to my surprise, > they only have the newest annotation to mm5. Is there any program I can > use for transform mm5 data sets to mm9 annotation? I can not find any > chain file for liftOver for this transformation. Thanks very much. > > Weike Mo > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From John.Darlow at ucd.ie Mon Dec 3 08:32:14 2007 From: John.Darlow at ucd.ie (John Darlow) Date: Mon, 03 Dec 2007 16:32:14 +0000 (GMT) Subject: [Genome] BAT2D1 protein Message-ID: On the page for Human BAT2D1, chr1:169,721,290-169,829,273, Mar 2006 Assembly, http://genome.ucsc.edu/cgi-bin/hgGene? hgg_gene=uc001ghs.1&hgg_prot=NP_055987&hgg_chrom=chr1&hgg_start=1697212 89&hgg_end=169829273&hgg_type=knownGene&db=hg18&hgsid=100350226 In the table under 'Sequence and Links to Tools and Databases' you have 'Protein (2817 aa)', but clicking on the link to Proteome Browser brings one to a page about a protein (Q9Y520) of 2701 amino-acids, not 2817, and all the other links I tried to find out about domain structure of the protein also indicated that it only has 2701 amino- acids. So, I downloaded your genomic sequence of the BAT2D1 gene, first with exons in capitals, and then with CDS in capitals to identify the start and stop, then identified exactly which amino-acids were in each exon (since your diagram of the protein in the Protein-Browser does not show the exons), and labelled this on my copy. I found that it actually has apparently 2816 amino-acids, not 2817 (perhaps you counted the stop-codon as an amino-acid?) 
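The off-by-one suspected here can be illustrated with a small sketch. It only shows how counting the stop codon changes the residue count; it does not settle which number is correct for this gene. The 8451-nt length is a hypothetical value chosen because it contains 2817 codons, and `residue_count` is an illustrative helper.

```python
def residue_count(cds_length_nt, includes_stop_codon=True):
    """Number of amino acids encoded by a CDS of the given length in nucleotides.

    If the CDS coordinates include the stop codon, that final codon does not
    contribute an amino acid.
    """
    if cds_length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = cds_length_nt // 3
    return codons - 1 if includes_stop_codon else codons


# Hypothetical CDS length: 8451 nt = 2817 codons.
print(residue_count(8451))                             # stop codon excluded from the count
print(residue_count(8451, includes_stop_codon=False))  # every codon counted
```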
Then I translated it and aligned the 2816-amino-acid sequence with the 2701-amino-acid sequence from the Proteome Browser. When I found the place where the two sequences diverge, I then looked at my annotated genomic sequence and found where the difference comes. You have exon 32 (chr1:169,823,411-169,823,506) and then just two nucleotides, cc, before a 5-nt exon 33 (chr1:169823509-169823513). The 2701-amino-acid protein is made by translating the cc, which makes a different frame with an earlier stop-codon than the one in which the cc is spliced out of the RNA. The question is: why does your page say 'Protein (2817 aa)' (which should be 2816) and then give a link to a protein of 2701 aa, and which is the right answer? John Darlow National Centre for Medical Genetics Dublin Ireland From wxzheng_tju at hotmail.com Mon Dec 3 01:07:40 2007 From: wxzheng_tju at hotmail.com (ZhengWenXin) Date: Mon, 3 Dec 2007 17:07:40 +0800 Subject: [Genome] Help! Message-ID: Dear Prof., Would you please help me to resolve a problem about the repeat track of the UCSC genome browser? I'm interested in the repeats of the human genome. Through the Repeat track, I could easily find what repeats were aligned along the sequence. But would you please tell me the occurrence frequencies of short repeats in the whole human genome? Thank you very much! Your help will be greatly appreciated. Best wishes, WenXin Zheng Wen-Xin Zheng, PhD candidate Bioinformatics Center Tianjin University Tianjin 300072 China Fax: +86-22-27402697 Website: http://tubic.tju.edu.cn From rhead at soe.ucsc.edu Mon Dec 3 11:24:56 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 03 Dec 2007 11:24:56 -0800 Subject: [Genome] Help! In-Reply-To: References: Message-ID: <47545808.8030908@soe.ucsc.edu> Hello Wen-Xin, You can get summary information on Genome Browser tracks using the Table Browser. 
To do so, hit the "Tables" link in the blue bar at the top of the page, and make the following selections:
clade: vertebrate
genome: human
assembly: March 2006 (or whichever assembly you are using)
group: variation and repeats
track: RepeatMasker or Simple Repeats (whichever is of interest)
table: rmsk or simpleRepeat
region: genome
Then hit the "summary/statistics" button at the bottom of the page. You should get a list of statistics about the track, including an "item count", which is the number of items in a track, and "item bases", which is the number of genomic bases covered by items in the track. You can also go back and use the filter button to limit the types of repeats that are counted in the summary. For instance, if you are using the RepeatMasker track, you can hit the "filter: create" button, and then choose to limit your search so that "repClass does match SINE" (this is just an example -- you might want to use some other kind of filter). I hope this is helpful. If you have further questions please feel free to write to this mailing list again. -- Brooke Rhead UCSC Genome Bioinformatics Group ZhengWenXin wrote: > Dear Prof., Would you please help me to resolve a problem about the > repeat track of the UCSC genome browser? > I'm interested in the repeat of the human genome. Though the Repeat track, I could easily find what repeats were aligned along the sequence. But would you please tell me the occurrence frequencies of short repeats in the whole human genome? > Thank you very much! Your help will be greatly appreciated. > Best wishes, > WenXin Zheng > > > > > Wen-Xin Zheng, PhD candidate > Bioinformatics Center > Tianjin University > Tianjin 300072 > China > Fax: +86-22-27402697 > Website: http://tubic.tju.edu.cn > _________________________________________________________________ > MSN ???????????????????? 
> http://cn.msn.com > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From yxiang at superarray.net Mon Dec 3 12:47:54 2007 From: yxiang at superarray.net (Yan Xiang) Date: Mon, 3 Dec 2007 15:47:54 -0500 Subject: [Genome] human chromosomes Message-ID: <00d401c835ed$c76fbfd0$6b01020a@IT3> Dear Sir/Madam, I was using Human Genome Browser and had a question about these chromosomes below(what chromosomes they are? Or contig accession numbers for them?): chr1_random chr10_random chr11_random chr13_random chr15_random chr16_random chr17_random chr19_random chr2_random chr21_random chr22_h2_hap1 chr22_random chr3_random chr4_random chr5_h2_hap1 chr5_random chr6_cox_hap1 chr6_qbl_hap2 chr6_random chr7_random chr8_random chr9_random chrX_random Below are some chromosomes I got from NCBI webpage. Is there any relationship between UCSC chromosomes up and the NCBI chromosomes below? chr1|NT_113870.1 chr1|NT_113871.1 chr1|NT_113872.1 chr1|NT_113874.1 chr1|NT_113875.1 chr1|NT_113878.1 chr10|NT_113918.1 chr13|NT_113923.1 chr15|NT_113924.1 chr15|NT_113925.1 chr15|NT_113926.1 chr15|NT_113927.1 chr15|NT_113928.1 chr16|NT_113929.1 chr17|NT_113930.1 chr17|NT_113931.1 chr17|NT_113932.1 chr17|NT_113933.1 chr17|NT_113934.1 chr17|NT_113935.1 chr17|NT_113936.1 chr17|NT_113937.1 chr17|NT_113939.1 chr17|NT_113942.1 chr17|NT_113943.1 chr17|NT_113944.1 chr19|NT_113949.1 chr2|NT_113880.1 chr21|NT_113951.1 chr21|NT_113952.1 chr21|NT_113953.1 chr21|NT_113954.1 chr21|NT_113955.1 chr21|NT_113956.1 chr21|NT_113957.1 chr21|NT_113958.1 chr22|NT_113961.1 chr3|NT_113881.1 chr3|NT_113883.1 chr3|NT_113884.1 chr4|NT_113885.1 chr4|NT_113886.1 chr4|NT_113888.1 chr4|NT_113889.1 chr5|NT_113890.1 chr6|NT_113898.1 chr6|NT_113899.1 chr7|NT_113900.1 chr7|NT_113901.1 chr7|NT_113902.1 chr8|NT_113906.1 chr8|NT_113908.1 chr8|NT_113910.1 chr9|NT_113911.1 chr9|NT_113912.1 chr9|NT_113913.1 chr9|NT_113915.1 chr9|NT_113917.1 
chrX|NT_113962.1 chrX|NT_113963.1 chrX|NT_113964.1 chrX|NT_113965.1 chrX|NT_113966.1 Thank you! Best Regards, Yan Xiang From kayla at soe.ucsc.edu Mon Dec 3 13:44:07 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 03 Dec 2007 13:44:07 -0800 Subject: [Genome] human chromosomes In-Reply-To: <00d401c835ed$c76fbfd0$6b01020a@IT3> References: <00d401c835ed$c76fbfd0$6b01020a@IT3> Message-ID: <475478A7.9030908@cse.ucsc.edu> Hello Yan, Here is a link to our FAQ on the "_random" chromosomes you asked about. http://genome.ucsc.edu/FAQ/FAQdownloads#download10 I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Yan Xiang wrote: > Dear Sir/Madam, > > I was using Human Genome Browser and had a question about these chromosomes below(what chromosomes they are? Or contig accession numbers for them?): > > chr1_random > chr10_random > chr11_random > chr13_random > chr15_random > chr16_random > chr17_random > chr19_random > chr2_random > chr21_random > chr22_h2_hap1 > chr22_random > chr3_random > chr4_random > chr5_h2_hap1 > chr5_random > chr6_cox_hap1 > chr6_qbl_hap2 > chr6_random > chr7_random > chr8_random > chr9_random > chrX_random > > Below are some chromosomes I got from NCBI webpage. Is there any relationship between UCSC chromosomes up and the NCBI chromosomes below? 
> > chr1|NT_113870.1 > chr1|NT_113871.1 > chr1|NT_113872.1 > chr1|NT_113874.1 > chr1|NT_113875.1 > chr1|NT_113878.1 > chr10|NT_113918.1 > chr13|NT_113923.1 > chr15|NT_113924.1 > chr15|NT_113925.1 > chr15|NT_113926.1 > chr15|NT_113927.1 > chr15|NT_113928.1 > chr16|NT_113929.1 > chr17|NT_113930.1 > chr17|NT_113931.1 > chr17|NT_113932.1 > chr17|NT_113933.1 > chr17|NT_113934.1 > chr17|NT_113935.1 > chr17|NT_113936.1 > chr17|NT_113937.1 > chr17|NT_113939.1 > chr17|NT_113942.1 > chr17|NT_113943.1 > chr17|NT_113944.1 > chr19|NT_113949.1 > chr2|NT_113880.1 > chr21|NT_113951.1 > chr21|NT_113952.1 > chr21|NT_113953.1 > chr21|NT_113954.1 > chr21|NT_113955.1 > chr21|NT_113956.1 > chr21|NT_113957.1 > chr21|NT_113958.1 > chr22|NT_113961.1 > chr3|NT_113881.1 > chr3|NT_113883.1 > chr3|NT_113884.1 > chr4|NT_113885.1 > chr4|NT_113886.1 > chr4|NT_113888.1 > chr4|NT_113889.1 > chr5|NT_113890.1 > chr6|NT_113898.1 > chr6|NT_113899.1 > chr7|NT_113900.1 > chr7|NT_113901.1 > chr7|NT_113902.1 > chr8|NT_113906.1 > chr8|NT_113908.1 > chr8|NT_113910.1 > chr9|NT_113911.1 > chr9|NT_113912.1 > chr9|NT_113913.1 > chr9|NT_113915.1 > chr9|NT_113917.1 > chrX|NT_113962.1 > chrX|NT_113963.1 > chrX|NT_113964.1 > chrX|NT_113965.1 > chrX|NT_113966.1 > > Thank you! > > Best Regards, > > > Yan Xiang > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Mon Dec 3 15:40:17 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 03 Dec 2007 15:40:17 -0800 Subject: [Genome] Standardizing the heights of different tracks Message-ID: <475493E1.2000407@cse.ucsc.edu> Hello Steve, There are two ways to change the browser view you've mentioned. 1. Click on the blue bar on the left hand side of the track in question. In the list of settings at the top of the track details page, set "Data view scaling" to "use vertical viewing range setting". 
You can also set the "Vertical Viewing Range" to whatever values are appropriate. 2. Since you are running a mirror, you can add some information to the trackDb.ra block for the tracks in question. This information is in the README file in the trackDb directory. I'll paste the relevant part of the README here:
------------------------------------------------------------------------------
14. type wig [lower] [upper]
    Continuous value graphing track.
    [lower] - overall lower limit of the data, default 0.0
    [upper] - overall upper limit of the data, default 127.0
    trackDb record options:
    autoScale on|off                        # default is off
    gridDefault on|off                      # default is off (draw y=0.0 line)
    maxHeightPixels max:default:min         # default is 128:128:11
    graphType bar|points                    # default is bar
    viewLimits lower:upper                  # default is from the type line limits
    yLineMark real-value                    # default is 0.0
    yLineOnOff on|off                       # default is off (draw y=yLineMark line)
    windowingFunction maximum|mean|minimum  # default is maximum
    smoothingWindow off|[2-16]              # default is off
    wigColorBy                              # use colors in bed for wiggle
                                            # in overlapping regions
    spanList s1,s2,s3...                    # list of spans in the loaded table
                                            # you can find the spans by doing:
                                            # "select span from group by span"
                                            # typically spanList is only one:
                                            # spanList 1
                                            # rarely it may be more:
                                            # spanList 1,1000
                                            # special efforts must be made to load extra spans
                                            # into the table for special purposes.
--------------------------------------------------------------------------
I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Hi, Could you please help us on the heights of custom tracks? We have separate ChIP-sequencing experiments on Arabidopsis, each is being represented by a separate track. The "score" is the coverage by sequencing reads. 
We need that 5x coverage (denoted by a horizontal black line here) would be displayed at identical height in each experiment and in each window, Unfortunately, so far it depends on the maximal coverage at the individual window of the individual track. See the green scores (H3 Drought) 5x runs very high, but at TM Drought (bottom score track with Navy blue) it is about a tenth of the height of H3 control. We need that the same coverage would result in the same height in all windows in all experiments. Could you please help us with that? Thank you so much! Steve Steve Ladunga Professor of Computational Biology UNL Center for Biotechnology and Department of Statistics E204 Beadle Center, University of Nebraska-Lincoln 1901 Vine St., Lincoln, NE 68588-0665 Phone: (402) 472-6074 Fax: (402) 472-3139 sladunga at unl.edu From jje at gate.sinica.edu.tw Tue Dec 4 00:18:13 2007 From: jje at gate.sinica.edu.tw (J.J. Emerson) Date: Tue, 4 Dec 2007 16:18:13 +0800 Subject: [Genome] Chimpanzee self alignments Message-ID: I notice that the self alignments at UCSC are done only for select genomes which appear to be vertebrate, relatively "high quality" and "important". I was wondering if the chimpanzee genome could be run through the pipeline in order to collect the self alignments? I know this is a big undertaking, but considering the attention to primate genomics, especially with regard to duplication, this would seem useful, even if the chimp genome isn't as high quality as the human genome. It seems as if chimp meets most of these criteria well, its assembly quality notwithstanding. It would be a really great resource, even with the caveats. I can think of many people who would find such a resource invaluable. Thanks a lot! Cheers, J.J. 
PS These are some keywords that might help others using a search engine to find this post and its response: Pan troglodytes, paralogy, synteny, self alignment, duplication, chimp, chimpanzee, CNV From trees at gate.sinica.edu.tw Tue Dec 4 00:34:40 2007 From: trees at gate.sinica.edu.tw (Trees_Juen Chuang) Date: Tue, 4 Dec 2007 16:34:40 +0800 Subject: [Genome] non-human self alignments Message-ID: <00a501c83650$83686680$330ea8c0@grctrees> Dear UCSC genome browser: Could your browser provide non-human self alignments (e.g., chimpanzee, macaque, mouse self alignments) (chain format)? Thank you very much. Best regards, TJ Chuang Academia Sinica, Taiwan From dromin at cs.bgu.ac.il Tue Dec 4 07:50:59 2007 From: dromin at cs.bgu.ac.il (dromi nir) Date: Tue, 4 Dec 2007 17:50:59 +0200 (IST) Subject: [Genome] question about Direct MySQL access to data Message-ID: <200712041551.lB4FoxCl026735@indigo.cs.bgu.ac.il> Hi, I wasn't able to download the MySQL client from the link you gave. Are there other places where I can download a MySQL client (for WINDOWS!!)? (I looked on the net and couldn't find any.) Thanks a lot, Nir From chandrashekara.mallappa at umassmed.edu Tue Dec 4 08:23:20 2007 From: chandrashekara.mallappa at umassmed.edu (Chandrashekara Mallappa) Date: Tue, 04 Dec 2007 11:23:20 -0500 Subject: [Genome] Annotation-confused Message-ID: Hi, I am trying to clone the CHD2 gene from mouse. As far as I know, the full-length cDNA of this gene has never been cloned. This gene is annotated as having 6 exons and a transcript length of 739 bp in Ensembl, while the same gene is annotated as having more than 25 exons and a transcript length of 5794 bp in NCBI and the UCSC Genome Browser. It is very important for me to clone this gene for my research, so please advise me on which transcript I should consider. Thanks a lot for your time and help. 
From birney at ebi.ac.uk Tue Dec 4 09:06:37 2007 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue, 4 Dec 2007 17:06:37 +0000 Subject: [Genome] Annotation-confused In-Reply-To: References: Message-ID: On 4 Dec 2007, at 16:23, Chandrashekara Mallappa wrote: > Hi I am trying to clone CHD2 gene from mouse. As for as my > knowledge goes > full length cDNA of this gene is never cloned. This gene is > annotated to > have 6 exons and a transcript length of 739bp in Ensemble and the > same gene > is annotated to have more than 25 exons with a transcript length of > 5794bp > in NCBI and UCSC genome browser. It is very important for me to > clone this > gene for my research. So please guide me about what transcript > should I > consider...Thanks a lot for your time and help. > Looking into this, the NCBI/UCSC model looks far better. The Ensembl model has broken the gene into two portions due to two partial cDNAs which have not been linked together. This is a split/merge decision therefore that we (Ensembl) called incorrectly. I'm also going to forward this to the Ensembl gene builders to look into this case to work out if we could have called it better from the evidence. > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From viswanathl at mail.nih.gov Tue Dec 4 11:12:47 2007 From: viswanathl at mail.nih.gov (Viswanath, Lalitha (NIH/NCI) [C]) Date: Tue, 4 Dec 2007 14:12:47 -0500 Subject: [Genome] Query about knownGenes tables at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ Message-ID: Hi I am looking for detailed documentation regarding the following 3 tables knownGenes.txt.gz knownToLocusLink.txt.gz knownToRefSeq.txt.gz at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ I am using the first file to get chromosomal starts and stops for genes from UCSC's Genome Browser and the latter 2 to get the mappings to corresponding RefSeq and Entrez Gene Ids. 
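One way to check the kind of lookup described here is to compare the ID columns of the two dumps directly. The sketch below is an illustration under assumptions: it assumes each dump is tab-separated with the UCSC ID in its first column, it uses tiny made-up rows instead of the real files, and the helper names are hypothetical.

```python
import csv
import io


def first_column_ids(table_text):
    """Collect the values of the first tab-separated column (assumed to be the UCSC ID)."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    return {row[0] for row in reader if row}


def ids_without_mapping(gene_table_text, mapping_table_text):
    """Return gene-table IDs that never appear in the mapping table, sorted."""
    return sorted(first_column_ids(gene_table_text) - first_column_ids(mapping_table_text))


# Made-up rows for illustration only; real dumps have many more columns and rows.
gene_rows = "uc001aaa.1\tchr1\t+\t100\t200\nuc001aab.1\tchr1\t-\t300\t400\n"
mapping_rows = "uc001aaa.1\tNM_000001\n"

print(ids_without_mapping(gene_rows, mapping_rows))
```

If every ID in the first file comes back as unmapped, it is worth confirming that all of the files were downloaded from the same assembly directory.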
Strangely I find that none of the UCSC Ids in the first file find a mention in the other two files?? The ids in the first file are prefixed uc007xxxx while the ids in the other two files are prefixed uc004xxx. Any inputs on understanding the above better would be helpful. Hoping to hear further Thanks Lalitha From archanat at soe.ucsc.edu Tue Dec 4 12:14:05 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 04 Dec 2007 12:14:05 -0800 Subject: [Genome] Query about knownGenes tables at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ In-Reply-To: References: Message-ID: <4755B50D.4090008@soe.ucsc.edu> Hello Lalitha, I am not sure what you are looking at. But, I don't see anything wrong with the IDs in the tables that you've mentioned. They all have values from uc001 ... uc004. To view a description for a specific table, select that table in the Table Browser, then click the "describe table schema" button (if you know the table name and want to bypass the group/track settings, simply set group="All Tables", then select your table from the table list). On this page you can also see which other tables are related to the one currently in use and via which identifier, under the section "Connected Tables and Joining Fields". hg18.knownToLocusLink.name (via knownGene.name) hg18.knownToRefSeq.name (via knownGene.name) I hope this information is helpful to you. If you have further questions, please do not hesitate to contact us again. Regards, Archana UCSC Genome Bioiformatics Group Viswanath, Lalitha (NIH/NCI) [C] wrote: > Hi > > I am looking for detailed documentation regarding the following 3 tables > > knownGenes.txt.gz > > knownToLocusLink.txt.gz > > knownToRefSeq.txt.gz > > at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ > > I am using the first file to get chromosomal starts and stops for genes > from UCSC's Genome Browser and the latter 2 to get the mappings to > corresponding RefSeq and Entrez Gene Ids. 
> > Strangely I find that none of the UCSC Ids in the first file find a > mention in the other two files?? > > The ids in the first file are prefixed uc007xxxx while the ids in the > other two files are prefixed uc004xxx. > > > > Any inputs on understanding the above better would be helpful. > > > > Hoping to hear further > > Thanks > > Lalitha > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From archanat at soe.ucsc.edu Tue Dec 4 12:14:58 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 04 Dec 2007 12:14:58 -0800 Subject: [Genome] question about Direct MySQL access to data In-Reply-To: <200712041551.lB4FoxCl026735@indigo.cs.bgu.ac.il> References: <200712041551.lB4FoxCl026735@indigo.cs.bgu.ac.il> Message-ID: <4755B542.7050007@soe.ucsc.edu> Hello Nir, Unfortunately, MySQL is refusing to provide v4.0. It's still the fastest version of mysql. The next best thing is to go to : http://downloads.mysql.com/ and get the newest version 4.1 which is currently: http://downloads.mysql.com/archives.php?p=mysql-4.1&v=4.1.22 I hope this information is helpful to you. If you have further questions, please do not hesitate to contact us again. Regards, Archana UCSC Genome Bioinformatics Group dromi nir wrote: > hi, > > i wasn't able to download the mySQL client in the link you gave. > is there other places from were i can down load a mySQL client (for WINDOWS!!) 
> (i looked at the net and couldn't find any) > > thanks a lot, > > Nir > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rachelz807 at gmail.com Tue Dec 4 13:47:06 2007 From: rachelz807 at gmail.com (Rachel Zhang) Date: Tue, 4 Dec 2007 16:47:06 -0500 Subject: [Genome] UCSC genome browser for User input GFF files Message-ID: Dear UCSC genome browser help, Could you help suggest how to best display the user data in the GFF format? The example is in the attached text file. It contains the human chromosome #, the start # and end 3, and an experimental value associated with that segment. All the strand is +. The start and end # are estimated mapping from mouse genome to human genome based on synteny mapping. I tried the "Custom track" http://genome.ucsc.edu/goldenPath/customTracks/custTracks.html, but I could only get the graph with the segment, no experimental value can be displayed. Would you be able to tell me if it is my gff format that is incorrect, or I should use some other function to graph? Thank you so much for your help! With kind regards, Rachel Zhang Engineering Science Biomedical Option University of Toronto -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: UCSC1121A.gff Url: http://www.soe.ucsc.edu/pipermail/genome/attachments/20071204/2f1c211f/attachment-0001.pl From archanat at soe.ucsc.edu Tue Dec 4 16:35:21 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 04 Dec 2007 16:35:21 -0800 Subject: [Genome] UCSC genome browser for User input GFF files In-Reply-To: References: Message-ID: <4755F249.1010805@soe.ucsc.edu> Hello Rachel, I tried loading your custom track and it works fine for me. But, it displays only "H1" for all the items in the example on the browser. 
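The single "H1" label is consistent with GFF custom tracks linking every line that shares the same ninth (group) field into one item. A minimal sketch of that grouping follows; it is an illustration only, not the browser's actual loader, and the helper names and sample values in columns two and three are assumptions.

```python
from collections import defaultdict


def group_gff_items(gff_text):
    """Group GFF lines by their ninth field, returning {group: [(chrom, start, end), ...]}."""
    items = defaultdict(list)
    for line in gff_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip lines with fewer than nine tab-separated fields
        items[fields[8]].append((fields[0], int(fields[3]), int(fields[4])))
    return dict(items)


def gff_line(chrom, start, end, group):
    """Build one tab-separated GFF line; source, feature, score and frame are placeholders."""
    return "\t".join([chrom, "Sam", "HSP", str(start), str(end), ".", "+", ".", group])


shared_group = "\n".join([
    gff_line("chr1", 960386, 960446, "H1"),
    gff_line("chr1", 966557, 966617, "H1"),
])
distinct_groups = "\n".join([
    gff_line("chr1", 960386, 960446, "A_53_P150121"),
    gff_line("chr1", 966557, 966617, "A_53_P104941"),
])

print(len(group_gff_items(shared_group)))     # both segments fall into one item
print(len(group_gff_items(distinct_groups)))  # each segment is its own item
```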
According to the user's guide for the GFF format, all lines with the same group in the 9th field are linked together into a single item. So, if you need a separate name for each item, you could either stay with the GFF format, remove the group 'H1', and use only the name that you would like to display, or you could use a different format. Example:
chr1 Sam HSP 960386 960446 . + . Note "A_53_P150121"
chr1 Sam HSP 966557 966617 . + . Note "A_53_P104941"
I hope this information is helpful to you. If you have further questions, please do not hesitate to contact us again. Regards, Archana UCSC Genome Bioinformatics Group Rachel Zhang wrote: > Dear UCSC genome browser help, > > Could you help suggest how to best display the user data in the GFF format? > The example is in the attached text file. > > It contains the human chromosome #, the start # and end 3, and an > experimental value associated with that segment. All the strand is +. > The start and end # are estimated mapping from mouse genome to human genome > based on synteny mapping. > > I tried the "Custom track" > http://genome.ucsc.edu/goldenPath/customTracks/custTracks.html, but I could > only get the graph with the segment, no experimental value can be displayed. > > > Would you be able to tell me if it is my gff format that is incorrect, or I > should use some other function to graph? > > Thank you so much for your help! > > With kind regards, > Rachel Zhang > > Engineering Science Biomedical Option > University of Toronto > > ------------------------------------------------------------------------ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From donovan.chan at mcgill.ca Wed Dec 5 11:26:29 2007 From: donovan.chan at mcgill.ca (Donovan Chan) Date: Wed, 5 Dec 2007 14:26:29 -0500 Subject: [Genome] Rat update and Cow dataset Message-ID: Hello, I just had two quick questions. 
First, I was wondering when the latest version of the Rat genome will be uploaded. The last version (rn4) was back in Nov 2004. Second, concerning the cow genome sequence: would it be possible to obtain the data set by chromosome, instead of the full data set? I know that in bosTau2 the information is available, but the most recent version would be best. All the data sets have been of great help to my research. Thank you for all your help. Cheers, Donovan Chan From ann at soe.ucsc.edu Wed Dec 5 13:36:32 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Wed, 05 Dec 2007 13:36:32 -0800 Subject: [Genome] BAT2D1 protein In-Reply-To: References: Message-ID: <475719E0.8060402@cse.ucsc.edu> Hello John, I have taken a look at the gene/protein you are looking at. It is a little strange, but I think I have a good explanation. The protein sequence that you can get from the Known Gene detail page (click on "Protein (2817 aa)") really is 2817 amino acids. This is an exact match to the RefSeq sequence for NM_015172 (from which it was determined). See: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=115298681 The closest protein in the UniProt database has only 2701 amino acids -- it is missing the last two exons. Our Proteome Browser annotations are based on UniProt; this explains the difference. I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu John Darlow wrote: > On the page for Human BAT2D1, chr1:169,721,290-169,829,273, Mar 2006 > Assembly, > > http://genome.ucsc.edu/cgi-bin/hgGene? 
> hgg_gene=uc001ghs.1&hgg_prot=NP_055987&hgg_chrom=chr1&hgg_start=1697212 > 89&hgg_end=169829273&hgg_type=knownGene&db=hg18&hgsid=100350226 > > In the table under 'Sequence and Links to Tools and Databases' you > have 'Protein (2817 aa)', but clicking on the link to Proteome Browser > brings one to a page about a protein (Q9Y520) of 2701 amino-acids, not > 2817, and all the other links I tried to find out about domain > structure of the protein also indicated that it only has 2701 amino- > acids. > > So, I downloaded your genomic sequence of the BAT2D1 gene, first with > exons in capitals, and then with CDS in capitals to identify the start > and stop, then identified exactly which amino-acids were in each exon > (since your diagram of the protein in the Protein-Browser does not > show the exons), and labelled this on my copy. I found that it > actually has apparently 2816 amino-acids, not 2817 (perhaps you > counted the stop-codon as an amino-acid?) Then I translated it and > aligned the 2816-amino-acid sequence with the 2701-amino-acid sequence > from the Proteome Browser. When I found the place where the two > sequences diverge, I then looked at my annotated genomic sequence and > found where the difference comes. > > You have exon 32 (chr1:169,823,411-169,823,506) and then just two > nucleotides, cc, before a 5-nt exon 33 (chr1:169823509-169823513). The > 2701-amino-acid protein is made by translating the cc, which makes a > different frame with an earlier stop-codon than the one in which the > cc is spliced out of the RNA. > > The question is. Why does your page say 'Protein (2817 aa)' (which > should be 2816) and then give a link to a protein of 2701 aa, and > which is the right answer? 
> > John Darlow > National Centre for Medical Genetics > Dublin > Ireland > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From gbot at taconic.com Thu Dec 6 08:38:25 2007 From: gbot at taconic.com (Gerald Bothe) Date: Thu, 06 Dec 2007 11:38:25 -0500 Subject: [Genome] download a small sequence from mouse build 33 (May 2004) Message-ID: <4757DF31020000B200072E95@mail2.taconic.com> I have SNPs with base coordinates from mouse build 33, e.g. mm33-17-66541888. I know how to view the sequence in the browser, but how do I download it? The sequence on the screen seems to be in a bitmap format, not text that I could just copy and paste. Gerald Gerald W. Bothe, Ph. D. Senior Scientist, R&D Taconic Biotechnology 5 University Place Rensselaer, NY 12144 (518) 257 2030 ext. 12126 From morales at mpi-cbg.de Thu Dec 6 08:55:51 2007 From: morales at mpi-cbg.de (morales at mpi-cbg.de) Date: Thu, 6 Dec 2007 17:55:51 +0100 (MET) Subject: [Genome] ID composition Message-ID: <2852.10.1.42.114.1196960151.squirrel@webmail.mpi-cbg.de> Dear help, Could you explain to me the meaning of the three letters before the dot in the UCSC gene IDs (uc001ppp.1, uc001ppq.1; the ppp or the ppq in those examples)? Does the last letter indicate something about the alternatively spliced variants and, if so, what is the order for assigning them? Thank you in advance, Lucia Morales MPI-CBG Dresden From gtg894p at mail.gatech.edu Thu Dec 6 10:55:52 2007 From: gtg894p at mail.gatech.edu (Jittima Piriyapongsa) Date: Thu, 06 Dec 2007 13:55:52 -0500 Subject: [Genome] phastCons17way database Message-ID: <7.0.1.0.2.20071206134524.024ec978@mail.gatech.edu> Hi, I am trying to get the conservation scores from phastCons17way track in the table browser. How can I store and run it locally? 
I already downloaded the sql table definition from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/phastCons17way.sql. What is /gbdb/hg18/multiz17way/phastCons17way.wib? How can I get and use it for creating and querying the local database? Thank you. Jittima From kayla at soe.ucsc.edu Thu Dec 6 11:15:33 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Dec 2007 11:15:33 -0800 Subject: [Genome] question about double records in refseq database In-Reply-To: References: Message-ID: <47584A55.7060600@cse.ucsc.edu> Hello Yael, yael shemla wrote: > Hi, > > I would like to get your help about the RefSeq database: > > 1. The RefSeq database should give details about full-length transcripts. > If so, why are there sometimes several RefSeq records for the same gene > (same gene symbol)? Are these alternative splicing forms? And if so, why > don't I get all the alternative splicing forms for all genes? One reason there might be more than one RefSeq record for the same gene symbol is alternative splicing. For a full explanation of the RefSeq database, please see the RefSeq web site: http://www.ncbi.nlm.nih.gov/projects/RefSeq/ Also, the NCBI handbook has more information on RefSeq: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch18 > 2. Why are there sometimes double records with the same RefSeq name? I am not sure what you are referring to. Do you have an example? > 3. I searched for the RefSeq "NM_010699" on the mouse genome, March 2005. > When I enter "NM_010699", I get details only about the > chr7:40932461-40940836 position, > but when I search this table for all the RefSeqs on chr14, NM_010699 > appears in the list. > Why don't I get both results when I search for the gene the first > time? I'm looking into why the search is only giving you one of the two hits. If you'd like to try our development server, http://hgwdev.cse.ucsc.edu, it is giving both hits. I'll get back to you when this is fixed.
I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group From hiram at soe.ucsc.edu Thu Dec 6 11:23:31 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 06 Dec 2007 11:23:31 -0800 Subject: [Genome] phastCons17way database In-Reply-To: <7.0.1.0.2.20071206134524.024ec978@mail.gatech.edu> References: <7.0.1.0.2.20071206134524.024ec978@mail.gatech.edu> Message-ID: <47584C33.7010600@soe.ucsc.edu> Please note the discussion of how to do this in the genomewiki: http://genomewiki.ucsc.edu/index.php/Using_hgWiggle_without_a_database --Hiram Jittima Piriyapongsa wrote: > Hi, > > I am trying to get the conservation scores from phastCons17way track > in the table browser. How can I store and run it locally? > I already downloaded the sql table definition from > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/phastCons17way.sql. > What is /gbdb/hg18/multiz17way/phastCons17way.wib? How can I get and > use it for creating and querying the local database? > > Thank you. > Jittima From kayla at soe.ucsc.edu Thu Dec 6 12:04:38 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Dec 2007 12:04:38 -0800 Subject: [Genome] download a small sequence from mouse build 33 (May 2004) In-Reply-To: <4757DF31020000B200072E95@mail2.taconic.com> References: <4757DF31020000B200072E95@mail2.taconic.com> Message-ID: <475855D6.6010204@cse.ucsc.edu> Hello Gerald, Your coordinates from mouse build 33 correspond to our "mm5" mouse assembly. You'll need to lift those coordinates to our "mm8" assembly first. You can find the liftOver tool by clicking on "Utilities" on the blue bar on the left hand side of the main page, or by clicking here: http://genome.ucsc.edu/cgi-bin/hgLiftOver The reason for this is that we do not offer a SNP track for older mouse assemblies. We do offer a SNP track for both mm8 and mm9, the two most recent mouse assemblies. 
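For coordinates written in the style of the example in the question (mm33-17-66541888), a small script can turn them into BED lines, which both the liftOver page and the Custom Track page accept as input. The following is only a sketch: it assumes the identifier reads build-chromosome-position with a 1-based position, and the function name snp_id_to_bed is made up for illustration.

```python
def snp_id_to_bed(snp_id):
    """Convert an ID like 'mm33-17-66541888' into one BED line.

    Assumption (not confirmed in this thread): the ID has the form
    <build>-<chromosome>-<1-based position>.
    BED intervals are 0-based and half-open, so start = position - 1.
    """
    build, chrom, pos = snp_id.split("-")
    pos = int(pos)
    return "chr%s\t%d\t%d\t%s" % (chrom, pos - 1, pos, snp_id)


# Print one BED line per identifier.
for snp in ["mm33-17-66541888"]:
    print(snp_id_to_bed(snp))
```

The fourth BED column keeps the original identifier, so the rows can still be matched to the source list after any coordinate conversion.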
Here I would save your new set of coordinates as a Custom Track. Details on custom tracks are here: http://genome.ucsc.edu/cgi-bin/hgCustom From here, you can use the Table Browser to access the SNP table. The Table Browser is found via the "Tables" link on the blue bar on the top of the main page. Use the following settings: clade: Vertebrate genome: Mouse assembly: Feb 2006 group: Variation and Repeats track: SNPs(126) table: snp126 region: genome intersection: create group: custom tracks, and select your track output format: all fields from selected table (you could also select "sequence" here) click on "get output" I am not sure whether you have a list of RS numbers, or if you just have some coordinates. The way to retrieve data is going to be different in each case. Also, you may want flanking sequence around your snps. The answer above should get you started with our tools. I hope this is helpful to you. If you need more help, please don't hesitate to write back to us. Kayla Smith UCSC Genome Bioinformatics Group Gerald Bothe wrote: > I have SNPs with base coordinates from mouse build 33, e.g. mm33-17-66541888. I know how to view the sequence in the browser, but how do I download it? The sequence on the screen seems to be in a bitmap format, not text that I could just copy and past. > > Gerald > > > > > Gerald W. Bothe, Ph. D. > Senior Scientist, R&D > Taconic Biotechnology > 5 University Place > Rensselaer, NY 12144 > (518) 257 2030 ext. 
12126 > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Dec 6 13:14:04 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Dec 2007 13:14:04 -0800 Subject: [Genome] ID composition In-Reply-To: <2852.10.1.42.114.1196960151.squirrel@webmail.mpi-cbg.de> References: <2852.10.1.42.114.1196960151.squirrel@webmail.mpi-cbg.de> Message-ID: <4758661C.6030104@cse.ucsc.edu> Hello Lucia, There is no meaning behind the naming scheme of the UCSC genes. The letters you mention are not significant in themselves; in general, IDs are simply assigned in order, so that the first gene on chr1 is "uc001aaa.1" and the last gene on chrY is "uc004fxp.1". Here is a link to the UCSC Genes description page: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group morales at mpi-cbg.de wrote: > Dear help, > > Could you explain to me the meaning of the three letters before > the dot in the UCSC Genes IDs (uc001ppp.1, uc001ppq.1; the ppp or > the ppq in those examples)? Does the last letter indicate something about > alternatively spliced variants and, if so, in what order are the letters > assigned? > > Thank you in advance, > > Lucia Morales > MPI-CBG Dresden > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From li3 at niehs.nih.gov Thu Dec 6 13:21:49 2007 From: li3 at niehs.nih.gov (Li, Leping (NIH/NIEHS) [E]) Date: Thu, 6 Dec 2007 16:21:49 -0500 Subject: [Genome] 28-way protein sequence alignment Message-ID: <8BB67CD070A6EE479378FBE1CEB79478019FE439@NIHCESMLBX6.nih.gov> Hi, I have a short human protein sequence.
I would like to obtain the multiz alignment of the protein sequences rather than the DNA sequences. I'd appreciate your help. Thanks. Leping Li NIEHS/NIH From kayla at soe.ucsc.edu Thu Dec 6 14:43:39 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Dec 2007 14:43:39 -0800 Subject: [Genome] download a small sequence from mouse build 33 (May 2004) In-Reply-To: <475855D6.6010204@cse.ucsc.edu> References: <4757DF31020000B200072E95@mail2.taconic.com> <475855D6.6010204@cse.ucsc.edu> Message-ID: <47587B1B.8030704@cse.ucsc.edu> Hello Again Gerald, I made a mistake in my reply to you this morning. We do have a SNP track for the mm5 (NCBI build 33) assembly. You can skip the liftOver instructions, and go directly to making a Custom Track and using the Table Browser. Let me know if you need any assistance with this. Kayla Smith UCSC Genome Bioinformatics Group Kayla Smith wrote: > Hello Gerald, > > Your coordinates from mouse build 33 correspond to our "mm5" mouse > assembly. You'll need to lift those coordinates to our "mm8" assembly > first. You can find the liftOver tool by clicking on "Utilities" on the > blue bar on the left hand side of the main page, or by clicking here: > > http://genome.ucsc.edu/cgi-bin/hgLiftOver > > The reason for this is that we do not offer a SNP track for older mouse > assemblies. We do offer a SNP track for both mm8 and mm9, the two most > recent mouse assemblies. > > Here I would save your new set of coordinates as a Custom Track. > Details on custom tracks are here: > > http://genome.ucsc.edu/cgi-bin/hgCustom > > From here, you can use the Table Browser to access the SNP table. The > Table Browser is found via the "Tables" link on the blue bar on the top > of the main page. 
Use the following settings: > > clade: Vertebrate > genome: Mouse > assembly: Feb 2006 > group: Variation and Repeats > track: SNPs(126) > table: snp126 > region: genome > intersection: create > group: custom tracks, and select your track > output format: all fields from selected table (you could also select > "sequence" here) > click on "get output" > > I am not sure whether you have a list of RS numbers, or if you just have > some coordinates. The way to retrieve data is going to be different in > each case. Also, you may want flanking sequence around your snps. The > answer above should get you started with our tools. I hope this is > helpful to you. If you need more help, please don't hesitate to write > back to us. > > Kayla Smith > UCSC Genome Bioinformatics Group > > > > > > > > > > > > > Gerald Bothe wrote: >> I have SNPs with base coordinates from mouse build 33, e.g. mm33-17-66541888. I know how to view the sequence in the browser, but how do I download it? The sequence on the screen seems to be in a bitmap format, not text that I could just copy and past. >> >> Gerald >> >> >> >> >> Gerald W. Bothe, Ph. D. >> Senior Scientist, R&D >> Taconic Biotechnology >> 5 University Place >> Rensselaer, NY 12144 >> (518) 257 2030 ext. 
12126 >> >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome >> > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Dec 6 16:20:59 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 06 Dec 2007 16:20:59 -0800 Subject: [Genome] 28-way protein sequence alignment In-Reply-To: <8BB67CD070A6EE479378FBE1CEB79478019FE439@NIHCESMLBX6.nih.gov> References: <8BB67CD070A6EE479378FBE1CEB79478019FE439@NIHCESMLBX6.nih.gov> Message-ID: <475891EB.8070108@cse.ucsc.edu> Hello Leping, First you need to find genomic coordinates that correspond to your sequence. You can use BLAT to help you find where your protein sequence is located. Here is a link to BLAT: Next, use the Table Browser ("Tables", on the blue bar on the top of the main page) to get the data you're interested in. Set the following options: clade: Vertebrate genome: Human assembly: Mar2006 group: Comparative Genomics track: Conservation table: multiz28way position (paste in your genomic coordinates here!) output format: MAF- multiple alignment format. Press "get output" I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Li, Leping (NIH/NIEHS) [E] wrote: > Hi, > > > > I have a short human protein sequence. I would like to obtain the multiz > alignment of the protein sequences rather than the DNA sequences. I'd > appreciate your help. Thanks.
> > > > Leping Li > > NIEHS/NIH > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From peter.shepard at gmail.com Thu Dec 6 17:14:12 2007 From: peter.shepard at gmail.com (Pete Shepard) Date: Thu, 6 Dec 2007 17:14:12 -0800 Subject: [Genome] mutually exclusive exons Message-ID: <5c2c43620712061714g799bdd62h59347ba52100ffe4@mail.gmail.com> Dear Genome Browser Folks, I am wondering if you have a field for mutually exclusive exons in your knownAlt table. Thanks From rhead at soe.ucsc.edu Thu Dec 6 17:45:07 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 06 Dec 2007 17:45:07 -0800 Subject: [Genome] Rat update and Cow dataset In-Reply-To: References: Message-ID: <4758A5A3.6020309@soe.ucsc.edu> Hello Donovan, The Nov. 2004 (rn4) rat genome browser corresponds to the most recent sequence that is available (version 3.4 from Baylor). An updated version will be released in the future, and when it is, we will make a browser for it. The Baylor HGSC Rat Genome Project web page is located here, if you are interested: http://www.hgsc.bcm.tmc.edu/projects/rat/ Regarding the bosTau3 genome sequence, the reason we did not generate the data set by chromosome is that there are a large number of scaffolds in addition to the chromosomes. Is the problem that the file is too large to download? If so, we would consider making smaller split files. If you are able to download the file successfully but need tools to extract just the chromosomes of interest from the fasta file, you can use the "faSomeRecords" tool in the kent source tree, which is available for free for academic, non-profit and personal use. See this link for information on downloading the source tree: http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads If this solution does not work for you, please let us know. 
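If building the kent source tree is not convenient, the same kind of extraction can be approximated in a few lines of Python. This is a rough sketch and not the faSomeRecords implementation; the function name filter_fasta and the record names in the demo are placeholders chosen for illustration.

```python
import io


def filter_fasta(lines, wanted):
    """Yield only the lines of FASTA records whose name is in `wanted`.

    The record name is taken to be the first word after '>'.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


# Demo on an in-memory file; the record names here are placeholders.
demo = io.StringIO(">chr1\nACGT\n>scaffold_x\nGGCC\n>chr2\nTTAA\n")
print("".join(filter_fasta(demo, {"chr1", "chr2"})), end="")
```

For a real download, the same function can be given an open file handle in place of the in-memory demo, since it reads one line at a time and never holds the whole genome in memory.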
-- Brooke Rhead UCSC Genome Bioinformatics Group Donovan Chan wrote: > Hello, > I just had two quick questions. First, I was wondering when the latest version of the Rat genome would be uploaded. The last version (rn4) was back in Nov 2004. Second, concerning the cow genome sequence: Would it be possible to obtain the data set by chromosome, instead of the full data set? I know that for bosTau2 the information is available, but the most recent version would be best. All the data sets have been of great help to my research. Thank you for all your help. > Cheers > Donovan Chan > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From lhou at genetics.ac.cn Thu Dec 6 19:32:01 2007 From: lhou at genetics.ac.cn (Lei Hou) Date: Fri, 7 Dec 2007 11:32:01 +0800 Subject: [Genome] what about the NM accession number of Chimpanzee Message-ID: <200712071132014067633@genetics.ac.cn> Dear colleague: I am downloading the promoter sequences of chimpanzee from the Table Browser. I wonder why the NM accession number of every sequence in the FASTA output is a human one. For example, one of the chimpanzee upstream sequence names is ">panTro2_refGene_NM_002600 range=chr1:66940236-66948235 5'pad=0 3'pad=0 revComp=FALSE strand=+ repeatMasking=lower", in which NM_002600 is the NM accession No. for human instead of chimpanzee. Does it mean this gene of chimpanzee is a homolog of NM_002600 in human? Thanks! Best wishes, Lei Hou Graduate student, Jing-Dong Jackie Han's Lab Institute of Genetics and Developmental Biology Chinese Academy of Sciences Datun Road, Beijing 100101 P.R.
China TEL: +86-(0)10-6484 5797 From nickitiffin at imaginet.co.za Fri Dec 7 00:48:40 2007 From: nickitiffin at imaginet.co.za (Nicki Tiffin) Date: Fri, 7 Dec 2007 10:48:40 +0200 Subject: [Genome] Human Genome Graphs Upload Fails Message-ID: <004a01c838ad$f71efd00$3bce9e89@PCavds> Hi, I'm trying to upload a list of human SNP rs ids for display in "Human Genome Graphs". I've prepared a tab-delimited file with rs_id (tab) p_value, contains about 200 snp ids and looks like: rsid sig rs11738115 0.001 rs11739020 0.001 rs11740375 0.001 rs11747896 0.001 rs11750464 0.001 I fill in the values: clade: vertebrate genome: Human assembly: Mar.2006 I click on upload, and fill in: name of dataset: AS001 description: AS001 file format: tab delimited markers are: dbSNP rsID column labels: first row I leave the rest to default I use 'browse' to specify the filename (C:/path/AS001snps.txt) and 'submit'. I get returned: "Data Upload Complete (0bytes) These data are now available in the drop-down menus on the main page for graphing" I click on OK. My data is not listed in the graph drop-down box. I'm using: Web browser: Internet Explorer 7.0 OS: Windows XP I don't know how to start troubleshooting this failure to upload the data (I have tried using a different input file with approx 8000 snps, same result), any advice would be most welcome. Thanks, Nicki From ann at soe.ucsc.edu Fri Dec 7 13:32:54 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 07 Dec 2007 13:32:54 -0800 Subject: [Genome] mutually exclusive exons In-Reply-To: <5c2c43620712061714g799bdd62h59347ba52100ffe4@mail.gmail.com> References: <5c2c43620712061714g799bdd62h59347ba52100ffe4@mail.gmail.com> Message-ID: <4759BC06.8040101@cse.ucsc.edu> Hello Pete, We have only enumerated the following alternative splicing and other events in the knownAlt table: * Alternate Promoter (altPromoter) - Transcription starts at multiple places. * Alternate Finish Site (altFinish) - Transcription ends at multiple places. 
* Cassette Exon (cassetteExon) - Exon is present in some transcripts but not others. These are found by looking for exons that overlap an intron in the same transcript. * Retained Intron (retainedIntron) - Introns are spliced out in some transcripts but not others. In some cases, particularly when the intron is near the 3' end, this can reflect an incompletely processed transcript rather than a true alt-splicing event. * Overlapping Exon (bleedingExon) - Initial or terminal exons overlap in an intron in another transcript. These often are associated with incompletely processed transcripts. * Alternate 3' End (altThreePrime) - Variations on the 3' end of an intron. * Alternate 5' End (altFivePrime) - Variations on the 5' end of an intron. * Intron Ends have AT/AC (atacIntron) - An intron with AT/AC ends rather than the usual GT/AG. These are associated with the minor spliceosome. * Strange Intron Ends (strangeSplice) - An intron with ends that are not GT/AG, GC/AG, or AT/AC. These are usually artifacts of some sort due to sequencing error or polymorphism. Depending on exactly what output you are trying to get, you might be able to create this on your own using the Custom Track tool. You can read about it here: http://genome.cse.ucsc.edu/goldenPath/help/customTrack.html Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Pete Shepard wrote: > Dear Genome Browser Folks, > > I am wondering if you have a field for mutually exclusive exons in your > knownAlt table. 
> > Thanks > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Fri Dec 7 13:44:03 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 07 Dec 2007 13:44:03 -0800 Subject: [Genome] Rat update and Cow dataset In-Reply-To: <4758A5A3.6020309@soe.ucsc.edu> References: <4758A5A3.6020309@soe.ucsc.edu> Message-ID: <4759BEA3.5040408@soe.ucsc.edu> Hi again Donovan, We went ahead and created the per-chromosome sequence downloads for bosTau3: http://hgdownload.cse.ucsc.edu/goldenPath/bosTau3/chromosomes/ Please let us know if you have any issues using them. -- Brooke Rhead UCSC Genome Bioinformatics Group Brooke Rhead wrote: > Hello Donovan, > > The Nov. 2004 (rn4) rat genome browser corresponds to the most recent > sequence that is available (version 3.4 from Baylor). An updated > version will be released in the future, and when it is, we will make a > browser for it. The Baylor HGSC Rat Genome Project web page is located > here, if you are interested: > > http://www.hgsc.bcm.tmc.edu/projects/rat/ > > Regarding the bosTau3 genome sequence, the reason we did not generate > the data set by chromosome is that there are a large number of scaffolds > in addition to the chromosomes. Is the problem that the file is too > large to download? If so, we would consider making smaller split files. > > If you are able to download the file successfully but need tools to > extract just the chromosomes of interest from the fasta file, you can > use the "faSomeRecords" tool in the kent source tree, which is available > for free for academic, non-profit and personal use. See this link for > information on downloading the source tree: > > http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads > > If this solution does not work for you, please let us know. 
> > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Donovan Chan wrote: >> Hello, >> I just had two quick questions. First, I was wondering when the latest >> version of the Rat genome would be uploaded. The last version (rn4) was >> back in Nov 2004. Second, concerning the cow genome sequence: Would it >> be possible to obtain the data set by chromosome, instead of the full >> data set? I know that for bosTau2 the information is available, but the >> most recent version would be best. All the data sets have been of great >> help to my research. Thank you for all your help. >> Cheers >> Donovan Chan >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Fri Dec 7 13:48:38 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 07 Dec 2007 13:48:38 -0800 Subject: [Genome] what about the NM accession number of Chimpanzee In-Reply-To: <200712071132014067633@genetics.ac.cn> References: <200712071132014067633@genetics.ac.cn> Message-ID: <4759BFB6.8000206@cse.ucsc.edu> Hello Lei, Please see this previously-answered mail list question on the subject: http://www.soe.ucsc.edu/pipermail/genome/2007-November/015056.html Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Lei Hou wrote: > Dear colleague: > > I am downloading the promoter sequences of chimpanzee from the Table Browser. I wonder why the NM accession number of every sequence in the FASTA output is a human one.
For example, one of the chimpanzee upstream sequence names is ">panTro2_refGene_NM_002600 range=chr1:66940236-66948235 5'pad=0 3'pad=0 revComp=FALSE strand=+ repeatMasking=lower", in which NM_002600 is the NM accession No. for human instead of chimpanzee. Does it mean this gene of chimpanzee is a homolog of NM_002600 in human? > > Thanks! > > Best wishes, > > Lei Hou > Graduate student, Jing-Dong Jackie Han's Lab > Institute of Genetics and Developmental Biology > Chinese Academy of Sciences > Datun Road, Beijing 100101 > P.R. China > TEL: +86-(0)10-6484 5797 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Fri Dec 7 13:53:00 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 07 Dec 2007 13:53:00 -0800 Subject: [Genome] Human Genome Graphs Upload Fails In-Reply-To: <004a01c838ad$f71efd00$3bce9e89@PCavds> References: <004a01c838ad$f71efd00$3bce9e89@PCavds> Message-ID: <4759C0BC.9060506@cse.ucsc.edu> Hello Nicki, When I reproduce your steps exactly as you have stated them, the upload also fails for me. I'm not sure why that is. However, when I try the same thing without completing the following fields, it works fine: file format: markers are: column labels: Please try it this way (let the tool choose 'best guess' for those fields for you) and see if it works. If it doesn't, please write back to the list and we will troubleshoot with you. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Nicki Tiffin wrote: > Hi, > > I'm trying to upload a list of human SNP rs ids for display in "Human Genome Graphs". 
> > I've prepared a tab-delimited file with rs_id (tab) p_value, contains about 200 snp ids and looks like: > > rsid sig > rs11738115 0.001 > rs11739020 0.001 > rs11740375 0.001 > rs11747896 0.001 > rs11750464 0.001 > > I fill in the values: > clade: vertebrate > genome: Human > assembly: Mar.2006 > > I click on upload, and fill in: > name of dataset: AS001 > description: AS001 > file format: tab delimited > markers are: dbSNP rsID > column labels: first row > > I leave the rest to default > I use 'browse' to specify the filename (C:/path/AS001snps.txt) and 'submit'. > > I get returned: > "Data Upload Complete (0bytes) > These data are now available in the drop-down menus on the main page for graphing" > I click on OK. > > My data is not listed in the graph drop-down box. > > I'm using: > > Web browser: Internet Explorer 7.0 > OS: Windows XP > > I don't know how to start troubleshooting this failure to upload the data (I have tried using a different input file with approx 8000 snps, same result), any advice would be most welcome. > Thanks, > Nicki > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From jin_ma at merck.com Fri Dec 7 17:14:44 2007 From: jin_ma at merck.com (Ma, Jin) Date: Fri, 7 Dec 2007 17:14:44 -0800 Subject: [Genome] SNP update Message-ID: <9BEE7CC4462DB14997A5C8CF8F3BEB0201B70319@ussemx1100.merck.com> Hi, Is there a plan to update the SNP files to the current build, 128? Both human and mouse are still on build 126 (updated Oct 2006). Thanks. http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp126.txt.gz http://hgdownload.cse.ucsc.edu/goldenPath/mm8/database/snp126.txt.gz Jin
From bioinfor2007 at gmail.com Sat Dec 8 08:33:55 2007 From: bioinfor2007 at gmail.com (wenbing meng) Date: Sun, 9 Dec 2007 00:33:55 +0800 Subject: [Genome] Couldn't make AF_INET socket Message-ID: Hello, I'm setting up webBlat, following the instructions. When I run a BLAT query, I get this alert: Couldn't make AF_INET socket. Permission denied Sorry, the BLAT/iPCR server seems to be down. Please try again later. What does it mean? Thank you very much! From asaflev1 at post.tau.ac.il Sun Dec 9 09:12:58 2007 From: asaflev1 at post.tau.ac.il (asaf levy) Date: Sun, 9 Dec 2007 19:12:58 +0200 Subject: [Genome] Question about repeat masker table Message-ID: <20071209170654.E2005BCC04D@post.tau.ac.il> Hi, Can you please explain what the repLeft column in the RepeatMasker tables stands for?
I don't understand its description: -#bases after match (if strand is +) or start (if strand is -) in repeat sequence Regards, Asaf Levy Tel Aviv University From chris11 at helix.nih.gov Mon Dec 10 08:51:17 2007 From: chris11 at helix.nih.gov (chris11 at helix.nih.gov) Date: Mon, 10 Dec 2007 11:51:17 -0500 (EST) Subject: [Genome] gnf1m expression data Message-ID: <4944.157.99.64.13.1197305477.squirrel@helix.nih.gov> Hi, while downloading expression data associated with gnf1m probes from the ucsc genome browser, I am unable to find information on the tissue of origin for each expression value. For example, I can download what follows: #mm9.affyGnf1m.qName hgFixed.gnfMouseAtlas2All.expScores gnf1m06313_a_at 10,21,2,2,4,10,9,1,9,3,1,2,1,3,1,1,4,28,1,2,6,12,2,2,3,3,1,2,1,8,1,3,39,5,3,4,2,1,58,142,8,1,9,3,4,16,48,4,23,2,5,5,3,4,7,9,2,18,1,2,4,5,1,2,4,12,16,11,4,0,1,1,13,2,1,8,2,25,61,9,1,5,2,13,32,28,1,2,6,2,2,1,16,6,30,3,7,27,15,3,3,20,804,1488,5,1,1,4,5,5,5,4,4,0,10676,13278,10842,9280,18,5,3,4, The hgFixed.gnfMouseAtlas2All.expScores column does not state from which tissue each of the comma-separated values had originated (there should be 61 distinct tissues with 2 replicates each). Could you help me find that information? Many thanks, Chris Ottolenghi Genetics, NIA/Baltimore From KAskland at butler.org Mon Dec 10 06:23:43 2007 From: KAskland at butler.org (Kathleen Askland) Date: Mon, 10 Dec 2007 09:23:43 -0500 Subject: [Genome] (no subject) Message-ID: Hello. I'm writing with a question about downloading data with a specific combination of variables using the Tables Browser. I've been successful in downloading several combinations, but cannot seem to figure out an approach that allows me to have BOTH the probe ID corresponding to the Absolute GNF Human Atlas 2 expression data (i.e., hgFixed.gnfHumanAtlas2All, e.g., in the format '1007_s_at') AND the human gene symbol, or any other ID that can easily be associated with the gene symbol. 
Any table that I've successfully downloaded with the former, has contained no variable that is easily linkable to gene symbol, NCBI/Entrez Gene GeneID, etc.... This is puzzling as this is precisely the data that is retrieved in the Gene Sorter display when any gene or genomic region is entered. If there is no way of doing this in the Table Browser, is there a way to obtain a single downloadable file from Gene Sorter for the entire genome? Thank you, Kathleen Askland, MD From finneyr at mail.nih.gov Mon Dec 10 10:20:31 2007 From: finneyr at mail.nih.gov (Finney, Richard (NIH/NCI) [E]) Date: Mon, 10 Dec 2007 13:20:31 -0500 Subject: [Genome] protein to dna blat / protein to genome blat alignment In-Reply-To: References: Message-ID: Is there a way to get command line blat to do protein to dna (protein to genome) sequence alignment? I get the message "d and q must both be either protein or dna" when running a command like this "blat -t=prot =q=dna chr1.fa p.fsa output.psl" Normally, I'd take the error message at face value, but I'm a little stumped because the hgBlat server does do protein to genome alignment. I've probably annoyed your sysadmins before by setting up curl/wget scripts to torture your servers to get this information but would really like to do it locally via command line. Thanks for any thoughts and help on this. From kayla at soe.ucsc.edu Mon Dec 10 10:49:15 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 10 Dec 2007 10:49:15 -0800 (PST) Subject: [Genome] gnf1m expression data In-Reply-To: <4944.157.99.64.13.1197305477.squirrel@helix.nih.gov> References: <4944.157.99.64.13.1197305477.squirrel@helix.nih.gov> Message-ID: Hello Chris, The table: hgFixed.gnfMouseAtlas2MedianExps has the tissue information you are looking for. 
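Once the scores and the matching experiments table are both downloaded, pairing each value with its experiment name takes only a few lines. The sketch below rests on two assumptions that should be checked against the actual tables: that the position of a value in expScores corresponds to the integer id in the first column of the experiments table, and that the name is in the second column. The names in the demo are invented placeholders, not real rows.

```python
def parse_exps(lines):
    """Map experiment id -> name from tab-separated rows (id, name, ...)."""
    names = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        names[int(fields[0])] = fields[1]
    return names


def label_scores(exp_scores, names):
    """Pair each comma-separated score with the name for its position."""
    values = [float(v) for v in exp_scores.split(",") if v]
    return [(names.get(i, "unknown_%d" % i), v) for i, v in enumerate(values)]


# Toy data: placeholder names, not real rows from any UCSC table.
exps = ["0\ttissue_A\n", "1\ttissue_B\n", "2\ttissue_C\n"]
for name, value in label_scores("10,21,2,", parse_exps(exps)):
    print(name, value)
```

The trailing comma seen in the downloaded expScores field is tolerated, and any position without a matching id is labeled unknown_N rather than dropped, so a mismatch between the two tables shows up in the output.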
Here are two previously answered mailing list questions on the topic: http://www.soe.ucsc.edu/pipermail/genome/2005-June/007616.html http://www.soe.ucsc.edu/pipermail/genome/2007-August/014407.html Kayla Smith UCSC Genome Bioinformatics Group On Mon, 10 Dec 2007 chris11 at helix.nih.gov wrote: > Hi, > while downloading expression data associated with gnf1m probes from the > ucsc genome browser, I am unable to find information on the tissue of > origin for each expression value. > For example, I can download what follows: > #mm9.affyGnf1m.qName hgFixed.gnfMouseAtlas2All.expScores > gnf1m06313_a_at 10,21,2,2,4,10,9,1,9,3,1,2,1,3,1,1,4,28,1,2,6,12,2,2,3,3,1,2,1,8,1,3,39,5,3,4,2,1,58,142,8,1,9,3,4,16,48,4,23,2,5,5,3,4,7,9,2,18,1,2,4,5,1,2,4,12,16,11,4,0,1,1,13,2,1,8,2,25,61,9,1,5,2,13,32,28,1,2,6,2,2,1,16,6,30,3,7,27,15,3,3,20,804,1488,5,1,1,4,5,5,5,4,4,0,10676,13278,10842,9280,18,5,3,4, > > The hgFixed.gnfMouseAtlas2All.expScores column does not state from which > tissue each of the comma-separated values had originated (there should be > 61 distinct tissues with 2 replicates each). Could you help me find that > information? > Many thanks, > > Chris Ottolenghi > > Genetics, NIA/Baltimore > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From galt at soe.ucsc.edu Mon Dec 10 11:01:29 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Mon, 10 Dec 2007 11:01:29 -0800 (PST) Subject: [Genome] protein to dna blat / protein to genome blat alignment In-Reply-To: References: Message-ID: Only dna/rna to dna/rna queries can be done in nucleotide space. All other combinations of type for query and target really happen in protein space by translating either the query or the target or both into protein space. This is true for blast as well as blat.
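To make "translating the target into protein space" a bit more concrete, here is a minimal Python sketch of translating a DNA sequence in all six reading frames. It assumes the standard genetic code and is only an illustration of the idea, not how blat is actually implemented; the function names are made up for this example.

```python
# Illustration only: six-frame translation with the standard genetic code.
# This is NOT blat's implementation; all function names are hypothetical.

BASES = "TCAG"
# Amino acids for the 64 codons, ordered with T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict of the three forward and three reverse translations."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCATTGTAATGGGCCGC").items()):
        print(frame, protein)
```

A protein query can then be compared against each of the six translated frames, which is why a protein-versus-genome search takes place in protein space.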
We call the query (-q) the usually smaller thing you are searching for, and the target (-t) is the big thing you are searching, often the genome. According to your description you have protein sequences as your query and you wish to use blat to search the target genome which is given as dna. Therefore you should use

   blat -q=prot -t=dnax chr1.fa p.fsa output.psl

If you run blat at the commandline with nothing after it, you will see all the options including the ones we are discussing:

prompt> blat
blat - Standalone BLAT v. 34 fast sequence search command line tool
usage:
   blat database query [-ooc=11.ooc] output.psl
[...]
options:
   -t=type     Database type. Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna
   -q=type     Query type. Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
               The default is dna
   -prot       Synonymous with -t=prot -q=prot
[...]

-Galt

On Mon, 10 Dec 2007, Finney, Richard (NIH/NCI) [E] wrote:
> Is there a way to get command line blat to do protein to dna (protein to
> genome) sequence alignment? I get the message "d and q must both be
> either protein or dna" when running a command like this
>
> "blat -t=prot =q=dna chr1.fa p.fsa output.psl"
>
> Normally, I'd take the error message at face value, but I'm a little
> stumped
> because the hgBlat server does do protein to genome alignment.
>
> I've probably annoyed your sysadmins before by setting up curl/wget
> scripts to torture your servers to get this information but would really
> like to do
> it locally via command line.
>
> Thanks for any thoughts and help on this.
> > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From galt at soe.ucsc.edu Mon Dec 10 11:38:14 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Mon, 10 Dec 2007 11:38:14 -0800 (PST) Subject: [Genome] Couldn't make AF_INET socket In-Reply-To: References: Message-ID: Refer to the webBlat install.txt for instructions on setting up and configuring webBlat. Make sure the gfServer instances you need are running, and on the right port(s). Make sure your webBlat.cfg matches the gfServer instances you started. Make sure that you have sufficient rights to use the port(s). For some operations in some environments, you may need to be an administrator. If there is a firewall, make sure that it is configured to allow you to reach your gfServer port(s). If after checking these things you are still having trouble, please include in your next question the OS, command-line and ports if possible, and the version of webBlat that you are using. -Galt On Sun, 9 Dec 2007, wenbing meng wrote: > Hello, > I'm setting up webBlat followed the instructions. When I do a blat ,I get > the alert : > > Couldn't make AF_INET socket. > Permission denied Sorry, the BLAT/iPCR server seems to be down. Please try > again later. > > What does it mean? > > Thank you very much! > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From Ilya.Ioschikhes at osumc.edu Mon Dec 10 11:29:23 2007 From: Ilya.Ioschikhes at osumc.edu (Ioschikhes, Ilya ) Date: Mon, 10 Dec 2007 14:29:23 -0500 Subject: [Genome] CpG islands and TSS Message-ID: <6254D46E7DD84E4098725E04E5E25919B55242@msxc02.OSUMC.EDU> Hello, Please let me know how could I get following information for known CpG islands: Length; C,G content; Observed/Expected CpG ratio; Distance from nearest TSS. Thanks, Ilya Ioshikhes, Ph.D. 
Assistant Professor Department of Biomedical Informatics and Department of Molecular & Cellular Biochemistry, Associate Investigator Davis Heart and Lung Research Institute, Ohio State University 3172c Graves Hall 333 W. 10th Ave. Columbus, OH 43210 TEL: +1 (614) 292-8929 Fax: +1 (614) 688-6600 E-mail: Ilya.Ioschikhes at osumc.edu From rhead at soe.ucsc.edu Mon Dec 10 11:43:41 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 10 Dec 2007 11:43:41 -0800 Subject: [Genome] Question about repeat masker table In-Reply-To: <20071209170654.E2005BCC04D@post.tau.ac.il> References: <20071209170654.E2005BCC04D@post.tau.ac.il> Message-ID: <475D96ED.1040706@soe.ucsc.edu> Hi Asaf, Here are a couple of previously-answered questions on this topic: http://www.soe.ucsc.edu/pipermail/genome/2007-May/013526.html http://www.soe.ucsc.edu/pipermail/genome/2007-February/012756.html If, after reading these, you still have questions, please feel free to email us again at this mailing list. -- Brooke Rhead UCSC Genome Bioinformatics Group asaf levy wrote: > Hi, > > Can you please explain for me better what does the repLeft column in repeat > masker tables stand for? > > I don't understand its description: > > -#bases after match (if strand is +) or start (if strand is -) in repeat > sequence > > > > Regards, > > Asaf Levy > > Tel Aviv University > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Mon Dec 10 12:01:02 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 10 Dec 2007 12:01:02 -0800 Subject: [Genome] (no subject) In-Reply-To: References: Message-ID: <475D9AFE.7060107@soe.ucsc.edu> Hello Kathleen, The 'knownToGnfAtlas2' table contains the knownGene identifier (from the "Known Genes" or "UCSC Genes" track, depending on the database) linked to the GNF Atlas 2 probe ID. 
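Linking the two tables comes down to a join on the probe ID. Here is a minimal Python sketch, assuming both tables have already been downloaded and read in as rows. The sample rows and the function name are made up for illustration, and the human expression table is assumed to have the same name/expScores layout as the mouse one.

```python
# Sketch: attach knownGene IDs to GNF Atlas 2 probe IDs.
# The rows used below are invented sample data, not real table contents.


def link_probes_to_genes(expression_rows, known_to_gnf_rows):
    """Join expression rows to knownGene IDs on the probe ID.

    expression_rows: (probe_id, exp_scores) pairs, i.e. the name and
        expScores columns assumed for hgFixed.gnfHumanAtlas2All.
    known_to_gnf_rows: (known_gene_id, probe_id) pairs, i.e. the name and
        value columns of knownToGnfAtlas2.
    """
    genes_by_probe = {}
    for known_gene_id, probe_id in known_to_gnf_rows:
        genes_by_probe.setdefault(probe_id, []).append(known_gene_id)

    linked = []
    for probe_id, exp_scores in expression_rows:
        # A probe with no knownGene entry is simply left out of the result.
        for known_gene_id in genes_by_probe.get(probe_id, []):
            linked.append((known_gene_id, probe_id, exp_scores))
    return linked


if __name__ == "__main__":
    expression = [("1007_s_at", "10,21,2"), ("madeup_probe_at", "1,2,3")]
    mapping = [("uc001aaa.1", "1007_s_at")]  # hypothetical knownGene ID
    for row in link_probes_to_genes(expression, mapping):
        print("\t".join(row))
```

Going from the knownGene ID on to a gene symbol would be one more join of the same shape, for example against a cross-reference table such as kgXref, assuming it is available for your assembly.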
The relationship is: hgFixed.gnfHumanAtlas2All.name = hg18.knownToGnfAtlas2.value I hope this is helpful. If you have further questions, please feel free to contact us again. -- Brooke Rhead UCSC Bioinformatics Group Kathleen Askland wrote: > Hello. > I'm writing with a question about downloading data with a specific > combination of variables using the Tables Browser. > I've been successful in downloading several combinations, but cannot > seem to figure out an approach that allows me to have BOTH the probe ID > corresponding to the Absolute GNF Human Atlas 2 expression data (i.e., > hgFixed.gnfHumanAtlas2All, e.g., in the format '1007_s_at') AND the > human gene symbol, or any other ID that can easily be associated with > the gene symbol. Any table that I've successfully downloaded with the > former, has contained no variable that is easily linkable to gene > symbol, NCBI/Entrez Gene GeneID, etc.... > This is puzzling as this is precisely the data that is retrieved in the > Gene Sorter display when any gene or genomic region is entered. > If there is no way of doing this in the Table Browser, is there a way to > obtain a single downloadable file from Gene Sorter for the entire > genome? > Thank you, > Kathleen Askland, MD > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From peter.scacheri at case.edu Mon Dec 10 14:37:05 2007 From: peter.scacheri at case.edu (Peter Scacheri) Date: Mon, 10 Dec 2007 17:37:05 -0500 Subject: [Genome] Wiggle track Message-ID: How do I make a wiggle track from the data below? (Sorry, I've read the help pages, but I can't figure this out!) Also, is there a way to autoscale the data? 
Thanks,
Peter

chr1 149645162 149645217 0.09955
chr1 149576001 149576060 0.0161285
chr1 149748090 149748142 0.692
chr1 149425822 149425881 -0.083185
chr1 149781679 149781738 0.16
chr1 149506363 149506407 0.0886
chr1 149539255 149539314 -0.11105
chr1 149798247 149798306 -0.111
chr1 149607234 149607285 0.1008
chr1 149905306 149905365 -0.022781
chr1 149671528 149671587 0.2355
chr1 149683830 149683889 0
chr1 149530390 149530449 -0.02773
chr1 149679760 149679819 -0.02525
chr1 149452382 149452441 0.01005
chr1 149552708 149552762 0.04119
chr1 149546405 149546464 -0.1248
chr1 149669687 149669744 0.05535
chr1 149582350 149582397 -0.014805
chr1 149698153 149698197 0.1855
chr1 149505715 149505767 0.0663
chr1 149547068 149547120 0.06485
chr1 149427694 149427751 0.0223
chr1 149790533 149790588 -0.01355
chr1 149860334 149860393 0.217
chr1 149554891 149554936 0.06705

From rhead at soe.ucsc.edu Mon Dec 10 16:20:10 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 10 Dec 2007 16:20:10 -0800 Subject: [Genome] SNP update In-Reply-To: <9BEE7CC4462DB14997A5C8CF8F3BEB0201B70319@ussemx1100.merck.com> References: <9BEE7CC4462DB14997A5C8CF8F3BEB0201B70319@ussemx1100.merck.com> Message-ID: <475DD7BA.5000101@soe.ucsc.edu>

Hello Jin, We are currently working on SNP build 128 for the human (hg18) assembly. It will likely be out after the first of the year. Once that is done, we do plan to update mouse SNPs to build 128. However, note that dbSNP build 128 is built on the mm9 assembly, not mm8, so our snp128 track for mouse will only be present on mm9. -- Brooke Rhead UCSC Genome Bioinformatics Group Ma, Jin wrote: > Hi, > > Is there a plan to update the snp files to current build 128. Both human > and mouse are still on build126 (updated Oct 2006). Thanks.
> > http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp126.txt.gz > http://hgdownload.cse.ucsc.edu/goldenPath/mm8/database/snp126.txt.gz > > Jin > > > > ------------------------------------------------------------------------------ > Notice: This e-mail message, together with any attachments, contains > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, > New Jersey, USA 08889), and/or its affiliates (which may be known > outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD > and in Japan, as Banyu - direct contact information for affiliates is > available at http://www.merck.com/contact/contacts.html) that may be > confidential, proprietary copyrighted and/or legally privileged. It is > intended solely for the use of the individual or entity named on this > message. If you are not the intended recipient, and have received this > message in error, please notify us immediately by reply e-mail and then > delete it from your system. > > ------------------------------------------------------------------------------ > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From braney at soe.ucsc.edu Mon Dec 10 16:27:46 2007 From: braney at soe.ucsc.edu (Brian Raney) Date: Mon, 10 Dec 2007 16:27:46 -0800 Subject: [Genome] Chimpanzee self alignments In-Reply-To: References: Message-ID: Hey J.J., As you've noticed, self-chains are not part of the standard UCSC cross-species pipeline, but we have chimp self-chains generated for internal use that you're welcome to use with the warning that these have not been through the UCSC QA process. Self chains are also hard to tune since there is no one whole genome duplication point like there is in speciation, so self chains are inevitably going to favor certain percent i/d duplications over those that are older ( or more recent). 
http://genome-test.cse.ucsc.edu/goldenPath/panTro2/vsSelf/ brian On Dec 4, 2007 12:18 AM, J.J. Emerson wrote: > I notice that the self alignments at UCSC are done only for select > genomes which appear to be vertebrate, relatively "high quality" and > "important". I was wondering if the chimpanzee genome could be run > through the pipeline in order to collect the self alignments? I know > this is a big undertaking, but considering the attention to primate > genomics, especially with regard to duplication, this would seem > useful, even if the chimp genome isn't as high quality as the human > genome. It seems as if chimp meets most of these criteria well, its > assembly quality notwithstanding. It would be a really great > resource, even with the caveats. I can think of many people who would > find such a resource invaluable. > > Thanks a lot! > > Cheers, > > J.J. > > PS > These are some keywords that might help others using a search engine > to find this post and its response: > Pan troglodytes, paralogy, synteny, self alignment, duplication, > chimp, chimpanzee, duplication, CNV > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From braney at soe.ucsc.edu Mon Dec 10 16:34:48 2007 From: braney at soe.ucsc.edu (Brian Raney) Date: Mon, 10 Dec 2007 16:34:48 -0800 Subject: [Genome] non-human self alignments In-Reply-To: <00a501c83650$83686680$330ea8c0@grctrees> References: <00a501c83650$83686680$330ea8c0@grctrees> Message-ID: Hey TJ, Here are the directories containing the chimpanzee, macaque, and mouse self alignments. None of the files in these directories have been through our QA process. Use at your own risk. http://genome-test.cse.ucsc.edu/goldenPath/panTro2/vsSelf/ http://genome-test.cse.ucsc.edu/goldenPath/rheMac2/vsSelf/ http://genome-test.cse.ucsc.edu/goldenPath/mm9/vsSelf/ I hope these will be useful to you. 
brian On Dec 4, 2007 12:34 AM, Trees_Juen Chuang wrote: > Dear UCSC genome browser: > > Could your browser provide non-human self alignments (e.g., chimpanzee, > macaque, mouse self alignments) (chain format)? > Thank you very much. > > Best regards, > TJ Chuang > Academia Sinica, Taiwan > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Mon Dec 10 17:46:03 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 10 Dec 2007 17:46:03 -0800 Subject: [Genome] Wiggle track In-Reply-To: References: Message-ID: <475DEBDB.1090506@soe.ucsc.edu>

Hello Peter,

This page is probably the most useful for figuring out how to display wiggle data as a custom track (apologies if you have already looked at it): http://genome.ucsc.edu/goldenPath/help/wiggle.html

The first thing you need to do is add this line before any of the data:

   track type=wiggle_0

This is the only required track definition line for a wiggle track. If you want the data to be autoscaled by default, you can add "autoScale=on" to the track definition line, like so:

   track type=wiggle_0 autoScale=on

(You can also just make this selection with the track controls after you have made the custom track: click on the mini-button on the far left-hand side of your track in the Genome Browser.)

Another thing you need to do is sort your lines of data so that the chromosome positions are in ascending order. You will get an error if they are out of order.

Now you can simply paste the data into the box on the custom track page.
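If you would rather not sort by hand, the two preparation steps (adding the track line and sorting by position) can be scripted. This is a minimal Python sketch, assuming whitespace-separated "chrom start end value" lines like yours; the function name and the default track name are just placeholders.

```python
# Sketch: build custom track text from unsorted "chrom start end value" lines.
# The function name and default track name are placeholders for illustration.


def make_wiggle_track(lines, name="myWiggleTrack", auto_scale=True):
    """Return track text with a definition line followed by sorted data lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, start, end, value = fields
        # Keep the value as text so it is written back exactly as given.
        records.append((chrom, int(start), int(end), value))

    # Group by chromosome name, then ascending start position within each.
    records.sort(key=lambda record: (record[0], record[1]))

    header = "track type=wiggle_0 name=%s" % name
    if auto_scale:
        header += " autoScale=on"
    body = ["%s %d %d %s" % record for record in records]
    return "\n".join([header] + body)


if __name__ == "__main__":
    unsorted_lines = [
        "chr1 149645162 149645217 0.09955",
        "chr1 149576001 149576060 0.0161285",
        "chr1 149748090 149748142 0.692",
    ]
    print(make_wiggle_track(unsorted_lines))
```

The printed text can be pasted into the custom track box as it is.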
I was able to successfully load this:

track type=wiggle_0 name=myWiggleTrack autoScale=on
chr1 149425822 149425881 -0.08
chr1 149427694 149427751 0.02
chr1 149452382 149452441 0.01
chr1 149505715 149505767 0.07
chr1 149506363 149506407 0.09
chr1 149530390 149530449 -0.03
chr1 149539255 149539314 -0.11
chr1 149546405 149546464 -0.12
chr1 149547068 149547120 0.06
chr1 149552708 149552762 0.04
chr1 149554891 149554936 0.07
chr1 149576001 149576060 0.02
chr1 149582350 149582397 -0.01
chr1 149607234 149607285 0.1
chr1 149645162 149645217 0.1
chr1 149669687 149669744 0.06
chr1 149671528 149671587 0.24
chr1 149679760 149679819 -0.03
chr1 149683830 149683889 0
chr1 149698153 149698197 0.19
chr1 149748090 149748142 0.69
chr1 149781679 149781738 0.16
chr1 149790533 149790588 -0.01
chr1 149798247 149798306 -0.11
chr1 149860334 149860393 0.22
chr1 149905306 149905365 -0.02

I hope this helps you get started. If you are still having difficulties, please feel free to write back to this mailing list.

-- Brooke Rhead UCSC Genome Bioinformatics Group

Peter Scacheri wrote: > How do I make a wiggle track from the data below? (Sorry, I've read > the help pages, but I can't figure this out!) Also, is there a way > to autoscale the data?
> > Thanks, > Peter > > chr1 149645162 149645217 0.09955 > chr1 149576001 149576060 0.0161285 > chr1 149748090 149748142 0.692 > chr1 149425822 149425881 -0.083185 > chr1 149781679 149781738 0.16 > chr1 149506363 149506407 0.0886 > chr1 149539255 149539314 -0.11105 > chr1 149798247 149798306 -0.111 > chr1 149607234 149607285 0.1008 > chr1 149905306 149905365 -0.022781 > chr1 149671528 149671587 0.2355 > chr1 149683830 149683889 0 > chr1 149530390 149530449 -0.02773 > chr1 149679760 149679819 -0.02525 > chr1 149452382 149452441 0.01005 > chr1 149552708 149552762 0.04119 > chr1 149546405 149546464 -0.1248 > chr1 149669687 149669744 0.05535 > chr1 149582350 149582397 -0.014805 > chr1 149698153 149698197 0.1855 > chr1 149505715 149505767 0.0663 > chr1 149547068 149547120 0.06485 > chr1 149427694 149427751 0.0223 > chr1 149790533 149790588 -0.01355 > chr1 149860334 149860393 0.217 > chr1 149554891 149554936 0.06705 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Mon Dec 10 18:24:22 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 10 Dec 2007 18:24:22 -0800 Subject: [Genome] CpG islands and TSS In-Reply-To: <6254D46E7DD84E4098725E04E5E25919B55242@msxc02.OSUMC.EDU> References: <6254D46E7DD84E4098725E04E5E25919B55242@msxc02.OSUMC.EDU> Message-ID: <475DF4D6.702@soe.ucsc.edu> Hello Ilya, Information on CpG islands is in the table 'cpgIslandExt', which is available from the Table Browser (hit the "Tables" link in the blue bar at the top of the page and look under the Expression and Regulation group for the CpG Islands track), or from the downloads page: http://hgdownload.cse.ucsc.edu/downloads.html (go to the appropriate assembly, and then click the "annotation database" link and look for the file 'cpgIslandExt.txt.gz'). 
The island length is in the 'length' field of the table, the field 'perGc' contains the percentage of the island that is C or G (and the field 'gcNum' contains the count of C and G in the island), and the observed/expected ratio is in the field 'obsExp'. Finding the distance to the nearest transcription start site will be more difficult. Depending on the assembly you are using, there may or may not be a TSS track available. If there is, you will need to use your own tools to find the nearest TSS to a CpG Island and to calculate the distance between them. There might be a tool at the Galaxy web site (http://main.g2.bx.psu.edu/ ; this site is run by Penn State) that might be useful for this. I hope this information helps. If you have further questions, please feel free to contact us again at this mailing list address. -- Brooke Rhead UCSC Genome Bioinformatics Group Ioschikhes, Ilya wrote: > Hello, > > Please let me know how could I get following information for known CpG > islands: > > Length; C,G content; Observed/Expected CpG ratio; Distance from > nearest TSS. > > Thanks, > > > Ilya Ioshikhes, Ph.D. > > Assistant Professor > > Department of Biomedical Informatics and > > Department of Molecular & Cellular Biochemistry, > > Associate Investigator > > Davis Heart and Lung Research Institute, > Ohio State University > 3172c Graves Hall > 333 W. 10th Ave.
> Columbus, OH 43210 > TEL: +1 (614) 292-8929 > Fax: +1 (614) 688-6600 > E-mail: Ilya.Ioschikhes at osumc.edu > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From li3 at niehs.nih.gov Tue Dec 11 06:08:29 2007 From: li3 at niehs.nih.gov (Li, Leping (NIH/NIEHS) [E]) Date: Tue, 11 Dec 2007 09:08:29 -0500 Subject: [Genome] 28-way protein sequence alignment Message-ID: <8BB67CD070A6EE479378FBE1CEB79478019FE446@NIHCESMLBX6.nih.gov> Hello again, The steps outlined gave me the multiz alignment of genomic sequences rather than protein sequences I need. What I would like to obtain is the multiz alignment of protein sequences for a human protein query sequence. Thanks again. Leping Li NIEHS/NIH -----Original Message----- From: Kayla Smith [mailto:kayla at soe.ucsc.edu] Sent: Thu 12/6/2007 7:20 PM To: Li, Leping (NIH/NIEHS) [E] Cc: genome at soe.ucsc.edu Subject: Re: [Genome] 28-way protein sequence alignment Hello Leping, First you need to find genomic coordinates that correspond to your sequence. You can use BLAT to help you find where your protein sequence is located. Here is a link to BLAT: Next, use the Table Browser ("Tables", on the blue bar on the top of the main page) to get the data you're interested in. Set the following options: clade: Vertebrate genome: Human assembly: Mar2006 group: Comparative Genomics track: Conservation table: multiz28way position (paste in your genomic coordinates here!) output format: MAF- multiple alignment format. Press "get output" I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Li, Leping (NIH/NIEHS) [E] wrote: > Hi, > > > > I have a short human protein sequence. I would like to obtain the multiz > alignment of the protein sequences rather than the DNA sequences. I'd > appreciate your help. Thanks. 
> > > > Leping Li > > NIEHS/NIH > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From katleen.depreter at ugent.be Tue Dec 11 00:21:08 2007 From: katleen.depreter at ugent.be (Katleen De Preter) Date: Tue, 11 Dec 2007 09:21:08 +0100 Subject: [Genome] [Fwd: question UCSC Tables] In-Reply-To: <475DDF48.1060007@soe.ucsc.edu> References: <475D7FD8.6060707@ucsc.edu> <475DDF48.1060007@soe.ucsc.edu> Message-ID: <475E4874.2040507@ugent.be> Hello, As I have understood well from your answer, it is not possible to get the original search terms in the output table. As this is a mixture of HUGO and Alias and some other types of identifiers, it would be very helpful to check for which of the input terms Genome Browser has found a match... Best regards, Katleen De Preter Brooke Rhead schreef: > Hello Kathleen, > > Are you by any chance using the "UCSC Genes" track or the "RefSeq > Genes" track? Both of these tracks have some extra functionality in > the Table Browser that allow you to use an identifier that is NOT the > main identifier in the table you have selected. > > To see what I am referring to, look at the text at the top of the page > when you hit the "paste list" button. For UCSC Genes, the text at the > top is: > > "Please paste in the identifiers you want to include. The items must > be values of the name field of the currently selected table, > knownGene, or the alias field of the alias table kgAlias." > > What is happening when you paste in HUGO identifiers is that the Table > Browser is going through the kgAlias table to make selections from the > knownGene table, but only the fields from the knownGene table are > returned. > > To include information from the kgAlias table in your output, choose > the output format "selected fields from primary and related tables". 
> Now hit "get output", and then from the Linked Tables section, check > the box for the kgAlias table and hit the "allow selection from > checked tables" button at the bottom of the page. Be sure the > kgAlias.alias field is selected (as well as the fields you want to > retrieve from knownGene), and then hit "get output". > > You should see a new column on the end of your output that contains > the alias field from kgAlias. Note that since one UCSC Gene ID > generally corresponds to several aliases, there are several names > listed in that column. The identifier you originally entered should > be included in the list. > > I hope this information is helpful. If you have further questions, > please do not hesitate to contact us again. However, please send > future questions to genome at soe.ucsc.edu, our moderated forum for user > questions and support. (Note that this is a public mailing list, see > http://genome.ucsc.edu/contacts.html for details.) > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > >> -------- Original Message -------- >> Subject: question UCSC Tables >> Date: Mon, 10 Dec 2007 17:29:16 +0100 >> From: Katleen De Preter >> To: cbseweb at cbse.ucsc.edu >> >> >> >> Dear Mr/Mrs, >> >> I would like to search the positions of a list of gene symbols (HUGO >> names and aliases). When I perform a search using the Tables >> function, I get a large list of results. However, in this results >> file, I cannot find the original Gene Symbols I have searched for. >> How can I obtain also the original list in the output file? >> >> Thank you in advance, >> Best regards, >> >> Katleen De Preter -- dr. ir. 
Katleen De Preter Center for Medical Genetics Ghent (CMGG) Ghent University Hospital Medical Research Building (MRB), 2nd floor, room 120.038 De Pintelaan 185, B-9000 Ghent, Belgium +32 9 332 5533 (phone) | +32 9 332 6549 (fax) http://medgen.ugent.be Katleen.DePreter at UGent.be From birney at ebi.ac.uk Tue Dec 11 06:50:18 2007 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue, 11 Dec 2007 14:50:18 +0000 Subject: [Genome] 28-way protein sequence alignment In-Reply-To: <8BB67CD070A6EE479378FBE1CEB79478019FE446@NIHCESMLBX6.nih.gov> References: <8BB67CD070A6EE479378FBE1CEB79478019FE446@NIHCESMLBX6.nih.gov> Message-ID: <6D7EEF7C-58BA-407A-9477-674BF2708EF0@ebi.ac.uk> On 11 Dec 2007, at 14:08, Li, Leping (NIH/NIEHS) [E] wrote: > Hello again, > > The steps outlined gave me the multiz alignment of genomic sequences > rather than protein sequences I need. What I would like to obtain > is the > multiz alignment of protein sequences for a human protein query > sequence.

Li, this is not at UCSC, but there is a multiple protein alignment resource at the Ensembl pages.

From a gene page:
http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000160145;db=core

select "Gene Tree info." in the left-hand "yellow column":
http://www.ensembl.org/Homo_sapiens/genetreeview?db=core;gene=ENSG00000160145

You can then output the alignment: select "Export" in the pull-down menu, then select "Alignment Dump":
http://www.ensembl.org/Homo_sapiens/alignview?class=GeneTree;gene=ENSG00000160145;format=clustalw

Notice you can change formats etc. If you need any help, do email helpdesk (helpdesk at ensembl.org)

> > Thanks again. > > Leping Li > NIEHS/NIH > > -----Original Message----- > From: Kayla Smith [mailto:kayla at soe.ucsc.edu] > Sent: Thu 12/6/2007 7:20 PM > To: Li, Leping (NIH/NIEHS) [E] > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] 28-way protein sequence alignment > > > Hello Leping, > > First you need to find genomic coordinates that correspond to your > sequence.
You can use BLAT to help you find where your protein > sequence is located. > > Here is a link to BLAT: > > Next, use the Table Browser ("Tables", on the blue bar on the top > of the > > main page) to get the data you're interested in. Set the following > options: > > clade: Vertebrate > genome: Human > assembly: Mar2006 > group: Comparative Genomics > track: Conservation > table: multiz28way > position (paste in your genomic coordinates here!) > output format: MAF- multiple alignment format. > > Press "get output" > > I hope this is helpful to you. Please don't hesitate to contact us > again if you require further assistance. > > Kayla Smith > UCSC Genome Bioinformatics Group > > > Li, Leping (NIH/NIEHS) [E] wrote: >> Hi, >> >> >> >> I have a short human protein sequence. I would like to obtain the > multiz >> alignment of the protein sequences rather than the DNA sequences. I'd >> appreciate your help. Thanks. >> >> >> >> Leping Li >> >> NIEHS/NIH >> >> >> >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Tue Dec 11 08:30:17 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 08:30:17 -0800 Subject: [Genome] 28-way protein sequence alignment In-Reply-To: <8BB67CD070A6EE479378FBE1CEB79478019FE446@NIHCESMLBX6.nih.gov> References: <8BB67CD070A6EE479378FBE1CEB79478019FE446@NIHCESMLBX6.nih.gov> Message-ID: <475EBB19.9060604@soe.ucsc.edu> Hello Li, This is something that we are intending to create in the near future. Protein alignment tracks will be available for many assemblies that currently have multiple alignments in DNA-space. 
Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Li, Leping (NIH/NIEHS) [E] wrote: > Hello again, > > The steps outlined gave me the multiz alignment of genomic sequences > rather than protein sequences I need. What I would like to obtain is the > multiz alignment of protein sequences for a human protein query > sequence. > > Thanks again. > > Leping Li > NIEHS/NIH > > -----Original Message----- > From: Kayla Smith [mailto:kayla at soe.ucsc.edu] > Sent: Thu 12/6/2007 7:20 PM > To: Li, Leping (NIH/NIEHS) [E] > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] 28-way protein sequence alignment > > > Hello Leping, > > First you need to find genomic coordinates that correspond to your > sequence. You can use BLAT to help you find where your protein > sequence is located. > > Here is a link to BLAT: > > Next, use the Table Browser ("Tables", on the blue bar on the top of the > > main page) to get the data you're interested in. Set the following > options: > > clade: Vertebrate > genome: Human > assembly: Mar2006 > group: Comparative Genomics > track: Conservation > table: multiz28way > position (paste in your genomic coordinates here!) > output format: MAF- multiple alignment format. > > Press "get output" > > I hope this is helpful to you. Please don't hesitate to contact us > again if you require further assistance. > > Kayla Smith > UCSC Genome Bioinformatics Group > > > Li, Leping (NIH/NIEHS) [E] wrote: >> Hi, >> >> >> >> I have a short human protein sequence. I would like to obtain the > multiz >> alignment of the protein sequences rather than the DNA sequences. I'd >> appreciate your help. Thanks. 
>> >> >> >> Leping Li >> >> NIEHS/NIH >> >> >> >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From raimondo at crs4.it Tue Dec 11 08:34:25 2007 From: raimondo at crs4.it (Domenico Raimondo) Date: Tue, 11 Dec 2007 17:34:25 +0100 Subject: [Genome] liftOver number of conversion Message-ID: <475EBC11.6010600@crs4.it> Hi I'm trying to use the liftOver tool and one thing I don't understand is why the server (and also local installation) doesn't accept more than 74 input lines to be converted. Thanks for your help. Domenico. From ann at soe.ucsc.edu Tue Dec 11 09:29:27 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 09:29:27 -0800 Subject: [Genome] liftOver number of conversion In-Reply-To: <475EBC11.6010600@crs4.it> References: <475EBC11.6010600@crs4.it> Message-ID: <475EC8F7.9070306@soe.ucsc.edu> Hello Domenico, There is no hard limit imposed within the liftOver tool. I wonder if, perhaps, line 75 of your input file has an error. If you would like me to take a look at it, feel free to send it to me (no need to cc the entire list) and let me know what assembly you are lifting to/from. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Domenico Raimondo wrote: > Hi > > I'm trying to use the liftOver tool and one thing I don't understand is why the server (and also local installation) > doesn't accept more than 74 input lines to be converted. > Thanks for your help. > > Domenico. 
> > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Gonzales.Patrick at mayo.edu Tue Dec 11 09:59:21 2007 From: Gonzales.Patrick at mayo.edu (Gonzales, Patrick R.) Date: Tue, 11 Dec 2007 11:59:21 -0600 Subject: [Genome] UCE Custom Track Message-ID: <572057D3BDD52A46BD05BC6DA5068611EE4C75@MSGEBE22.mfad.mfroot.org> HI, I'm trying to add the data from Bejerano's UCE paper to the UCSC browser, but the link from the supplemental info is giving me this error: Your URL "http://www.cse.ucsc.edu/~jill/ultra.hg17.track" resulted in a redirect message (HTTP status code 302 Found). Sorry, redirects are not supported. Redirection location: http://www.soe.ucsc.edu/~jill/ultra.hg17.track Please help. Pat Patrick R. Gonzales, MS Development Technologist Cytogenetics Array CGH Mayo Clinic Hilton 932 (507)284-8338 gonzales.patrick at mayo.edu From ann at soe.ucsc.edu Tue Dec 11 10:11:26 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 10:11:26 -0800 Subject: [Genome] UCE Custom Track In-Reply-To: <572057D3BDD52A46BD05BC6DA5068611EE4C75@MSGEBE22.mfad.mfroot.org> References: <572057D3BDD52A46BD05BC6DA5068611EE4C75@MSGEBE22.mfad.mfroot.org> Message-ID: <475ED2CE.2040909@soe.ucsc.edu> Hello Pat, Can you please tell me what web page this "supplemental info" link is on? That way I can look into correcting the link itself. Until then, if you follow the second link in your email, you will find the data. http://www.soe.ucsc.edu/~jill/ultra.hg17.track Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Gonzales, Patrick R. 
wrote: > HI, I'm trying to add the data from Bejerano's UCE paper to the UCSC > browser, but the link from the supplemental info is giving me this > error: Your URL "http://www.cse.ucsc.edu/~jill/ultra.hg17.track" > resulted in a redirect message (HTTP status code 302 Found). > Sorry, redirects are not supported. Redirection location: > http://www.soe.ucsc.edu/~jill/ultra.hg17.track > > Please help. > > Pat > > > Patrick R. Gonzales, MS > > Development Technologist > Cytogenetics Array CGH > Mayo Clinic > Hilton 932 > (507)284-8338 > gonzales.patrick at mayo.edu > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Tue Dec 11 10:27:26 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 10:27:26 -0800 Subject: [Genome] UCE Custom Track In-Reply-To: <572057D3BDD52A46BD05BC6DA5068611EE4C76@MSGEBE22.mfad.mfroot.org> References: <572057D3BDD52A46BD05BC6DA5068611EE4C75@MSGEBE22.mfad.mfroot.org> <475ED2CE.2040909@soe.ucsc.edu> <572057D3BDD52A46BD05BC6DA5068611EE4C76@MSGEBE22.mfad.mfroot.org> Message-ID: <475ED68E.4030703@soe.ucsc.edu> Hello again, Pat, Thanks for forwarding the link. We will get that fixed. As for using the UCE data, you can create the Custom Track using the data set within the UCSC Genome Browser. I will walk you through it. If you get stuck you can read more about Custom Tracks here: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks 1. Open the Genome Browser and choose the hg17 assembly (May 2004) 2. Press the "add custom tracks" button. 3. Enter the URL of the data into the "Paste URLs" box: http://www.soe.ucsc.edu/~jill/ultra.hg17.track Then press 'submit'. 4. From the Manage Custom Tracks page, press "go to genome browser" button. The UCE track will be displayed in red near the top of the tracks image. 
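The four steps above can also be sketched as a single browser link. This is a minimal illustration, not an official recipe: it assumes the hgTracks CGI accepts the `db` and `hgt.customText` URL parameters, and the helper name `custom_track_link` is hypothetical. It uses the www.soe.ucsc.edu address because, as the error message in this thread says, redirects are not supported.

```python
from urllib.parse import urlencode

# Assumed address of the Genome Browser's main CGI.
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def custom_track_link(db, track_url, base=HGTRACKS):
    """Build a link that opens assembly `db` with `track_url` loaded as a custom track.

    `db` and `hgt.customText` are assumed parameter names; the helper itself
    is hypothetical and only assembles and percent-encodes the query string.
    """
    query = urlencode({"db": db, "hgt.customText": track_url})
    return base + "?" + query


# Use the final (non-redirecting) location of the UCE track file on hg17.
link = custom_track_link("hg17", "http://www.soe.ucsc.edu/~jill/ultra.hg17.track")
print(link)
```

Pasting the printed link into a web browser should be equivalent to steps 1 through 4, provided the parameter names assumed here are accepted by the server.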
Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Gonzales, Patrick R. wrote: > This is the link: > > http://www.soe.ucsc.edu/~jill/ultra.html > > I don't know how to format and use the data from the link you sent, > regarding entering that data as a custom track. > > -----Original Message----- > From: Ann Zweig [mailto:ann at soe.ucsc.edu] > Sent: Tuesday, December 11, 2007 12:11 PM > To: Gonzales, Patrick R. > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] UCE Custom Track > > Hello Pat, > > Can you please tell me what web page this "supplemental info" > link is on? That > way I can look into correcting the link itself. > > Until then, if you follow the second link in your email, you > will find the > data. http://www.soe.ucsc.edu/~jill/ultra.hg17.track > > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > Please feel free to search the Genome mailing list archives by visiting > our home > page, clicking on "Contact Us", then typing a word or phrase into the > search > box. On that same page > (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome > mailing > list. > > > Gonzales, Patrick R. wrote: >> HI, I'm trying to add the data from Bejerano's UCE paper to the UCSC >> browser, but the link from the supplemental info is giving me this >> error: Your URL "http://www.cse.ucsc.edu/~jill/ultra.hg17.track" >> resulted in a redirect message (HTTP status code 302 Found). >> Sorry, redirects are not supported. Redirection location: >> http://www.soe.ucsc.edu/~jill/ultra.hg17.track >> >> Please help. >> >> Pat >> >> >> Patrick R. 
Gonzales, MS >> >> Development Technologist >> Cytogenetics Array CGH >> Mayo Clinic >> Hilton 932 >> (507)284-8338 >> gonzales.patrick at mayo.edu >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome From Gonzales.Patrick at mayo.edu Tue Dec 11 10:15:42 2007 From: Gonzales.Patrick at mayo.edu (Gonzales, Patrick R.) Date: Tue, 11 Dec 2007 12:15:42 -0600 Subject: [Genome] UCE Custom Track In-Reply-To: <475ED2CE.2040909@soe.ucsc.edu> References: <572057D3BDD52A46BD05BC6DA5068611EE4C75@MSGEBE22.mfad.mfroot.org> <475ED2CE.2040909@soe.ucsc.edu> Message-ID: <572057D3BDD52A46BD05BC6DA5068611EE4C76@MSGEBE22.mfad.mfroot.org> This is the link: http://www.soe.ucsc.edu/~jill/ultra.html I don't know how to format and use the data from the link you sent, regarding entering that data as a custom track. -----Original Message----- From: Ann Zweig [mailto:ann at soe.ucsc.edu] Sent: Tuesday, December 11, 2007 12:11 PM To: Gonzales, Patrick R. Cc: genome at soe.ucsc.edu Subject: Re: [Genome] UCE Custom Track Hello Pat, Can you please tell me what web page this "supplemental info" link is on? That way I can look into correcting the link itself. Until then, if you follow the second link in your email, you will find the data. http://www.soe.ucsc.edu/~jill/ultra.hg17.track Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Gonzales, Patrick R. 
wrote: > HI, I'm trying to add the data from Bejerano's UCE paper to the UCSC > browser, but the link from the supplemental info is giving me this > error: Your URL "http://www.cse.ucsc.edu/~jill/ultra.hg17.track" > resulted in a redirect message (HTTP status code 302 Found). > Sorry, redirects are not supported. Redirection location: > http://www.soe.ucsc.edu/~jill/ultra.hg17.track > > Please help. > > Pat > > > Patrick R. Gonzales, MS > > Development Technologist > Cytogenetics Array CGH > Mayo Clinic > Hilton 932 > (507)284-8338 > gonzales.patrick at mayo.edu > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Tue Dec 11 12:00:26 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 12:00:26 -0800 Subject: [Genome] [Fwd: question UCSC Tables] In-Reply-To: <475E4874.2040507@ugent.be> References: <475D7FD8.6060707@ucsc.edu> <475DDF48.1060007@soe.ucsc.edu> <475E4874.2040507@ugent.be> Message-ID: <475EEC5A.9050500@soe.ucsc.edu> Hello Katleen, By slightly changing the order of your queries in the Table Browser, I think you will be able to get what you want. I will explain by using an example: Let's say this is your set of HUGO (HGNC) geneSymbols: ATP2B3 ABCD1 CD99L2 GABRA3 Enter that set into the 'paste list' section of the Table Browser as before. However, this time, choose the kgXref table (found in the 'table' drop-down list of the UCSC Genes 'track' section of the Table Browser). For 'output format' choose "selected fields from primary and related tables" as before. To set the 'select fields' page up, start by choosing the geneSymbol field of the hg18.kgXref table. Then choose the knownGene table from the list of linked tables, and select the following fields: chrom, txStart, txEnd (and whatever else you want) Submit your query (get output). 
The output, in this example, looks like this:
#hg18.kgXref.geneSymbol hg18.knownGene.chrom hg18.knownGene.txStart hg18.knownGene.txEnd
ATP2B3 chrX 152454773 152501581
ATP2B3 chrX 152454773 152501581
ATP2B3 chrX 152480786 152501581
ABCD1 chrX 152643529 152663374
ABCD1 chrX 152655356 152663374
ABCD1 chrX 152661852 152663374
CD99L2 chrX 149685466 149817837
CD99L2 chrX 149685466 149817837
CD99L2 chrX 149685466 149817837
GABRA3 chrX 151087185 151370486
The first column contains the HGNC geneSymbols; the other columns contain the position. You will be able to clearly see which of your symbols did not match, as they will not be listed. If you would like only the canonical gene for each of your geneSymbols (instead of all of the isoforms), you will need to include the knownIsoforms and knownCanonical tables in your Table Browser query. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Katleen De Preter wrote: > Hello, > > As I understand from your answer, it is not possible to get > the original search terms in the output table. As this is a mixture of > HUGO and Alias and some other types of identifiers, it would be very > helpful to check for which of the input terms Genome Browser has found a > match... > > Best regards, > > Katleen De Preter > > > Brooke Rhead wrote: >> Hello Katleen, >> >> Are you by any chance using the "UCSC Genes" track or the "RefSeq >> Genes" track? Both of these tracks have some extra functionality in >> the Table Browser that allows you to use an identifier that is NOT the >> main identifier in the table you have selected. >> >> To see what I am referring to, look at the text at the top of the page >> when you hit the "paste list" button.
For UCSC Genes, the text at the >> top is: >> >> "Please paste in the identifiers you want to include. The items must >> be values of the name field of the currently selected table, >> knownGene, or the alias field of the alias table kgAlias." >> >> What is happening when you paste in HUGO identifiers is that the Table >> Browser is going through the kgAlias table to make selections from the >> knownGene table, but only the fields from the knownGene table are >> returned. >> >> To include information from the kgAlias table in your output, choose >> the output format "selected fields from primary and related tables". >> Now hit "get output", and then from the Linked Tables section, check >> the box for the kgAlias table and hit the "allow selection from >> checked tables" button at the bottom of the page. Be sure the >> kgAlias.alias field is selected (as well as the fields you want to >> retrieve from knownGene), and then hit "get output". >> >> You should see a new column on the end of your output that contains >> the alias field from kgAlias. Note that since one UCSC Gene ID >> generally corresponds to several aliases, there are several names >> listed in that column. The identifier you originally entered should >> be included in the list. >> >> I hope this information is helpful. If you have further questions, >> please do not hesitate to contact us again. However, please send >> future questions to genome at soe.ucsc.edu, our moderated forum for user >> questions and support. (Note that this is a public mailing list, see >> http://genome.ucsc.edu/contacts.html for details.) >> >> -- >> Brooke Rhead >> UCSC Genome Bioinformatics Group >> >> >>> -------- Original Message -------- >>> Subject: question UCSC Tables >>> Date: Mon, 10 Dec 2007 17:29:16 +0100 >>> From: Katleen De Preter >>> To: cbseweb at cbse.ucsc.edu >>> >>> >>> >>> Dear Mr/Mrs, >>> >>> I would like to search the positions of a list of gene symbols (HUGO >>> names and aliases). 
When I perform a search using the Tables >>> function, I get a large list of results. However, in this results >>> file, I cannot find the original Gene Symbols I have searched for. >>> How can I obtain also the original list in the output file? >>> >>> Thank you in advance, >>> Best regards, >>> >>> Katleen De Preter > From ljc37 at cornell.edu Tue Dec 11 13:01:57 2007 From: ljc37 at cornell.edu (Leighton J. Core) Date: Tue, 11 Dec 2007 16:01:57 -0500 Subject: [Genome] size limit of BED files? Message-ID: <1B5124A3-BB72-42AE-A08F-E7C76C756DE1@cornell.edu> Hi, I am trying to upload a custom track to the genome browser, and I get the error message: "Can't read file". I have uploaded BED files before, and this BED file was made in the same way. The only difference is that it is much larger: it has 7.2 million lines and is 565 MB in size. Is this file just too big? Any help is appreciated. Thank you, Leighton Core From hiram at soe.ucsc.edu Tue Dec 11 13:29:11 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Tue, 11 Dec 2007 13:29:11 -0800 Subject: [Genome] size limit of BED files? In-Reply-To: <1B5124A3-BB72-42AE-A08F-E7C76C756DE1@cornell.edu> References: <1B5124A3-BB72-42AE-A08F-E7C76C756DE1@cornell.edu> Message-ID: <475F0127.3040409@soe.ucsc.edu> Good Afternoon Leighton: Can you gzip the file to reduce its transfer size ? Where does the "Can't read file" message appear ? Is that the full message ? --Hiram Leighton J. Core wrote: > Hi, > I am trying to upload a custom track to the genome browser, and I get > the error message: "Can't read file". I have uploaded BED files > before, and this BED file was made in the same way. The only > difference is that it is much larger: it has 7.2 million lines and is > 565 MB in size. Is this file just too big? Any help is appreciated. 
> > Thank you, > Leighton Core From knthatcher at ucdavis.edu Tue Dec 11 15:21:59 2007 From: knthatcher at ucdavis.edu (Karen Thatcher) Date: Tue, 11 Dec 2007 15:21:59 -0800 (PST) Subject: [Genome] Rat chromosome Y? Message-ID: <200712112321.lBBNLxTq012837@citheronia.ucdavis.edu> Hi, I've been searching for multiple genes on chromosome Y in rat, but have been unable to find them either by gene name (SRY, ZFY, HY, USP9Y, RBMY, TSPY) or by GenBank accession number. Is there an assembly for chrY in rat? Thank you, Karen From ann at soe.ucsc.edu Tue Dec 11 15:33:28 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 11 Dec 2007 15:33:28 -0800 Subject: [Genome] Rat chromosome Y? In-Reply-To: <200712112321.lBBNLxTq012837@citheronia.ucdavis.edu> References: <200712112321.lBBNLxTq012837@citheronia.ucdavis.edu> Message-ID: <475F1E48.9050304@soe.ucsc.edu> Hello Karen, The current rat assembly does not include a Y chromosome. I just checked the Baylor website and it looks like they have received funding for sequencing, among other things, the Y chromosome. http://www.hgsc.bcm.tmc.edu/projects/rat/ "The HGSC is sequencing the genome of the rat (Rattus norvegicus). The rat has been funded for a genome upgrade. This includes finishing some regions to high quality, addressing the Y chromosome, and producing an improved genome assembly." Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Karen Thatcher wrote: > Hi, > > I've been searching for multiple genes on chromosome Y in rat, but have > been unable to find them either by gene name (SRY, ZFY, HY, USP9Y, RBMY, > TSPY) or by GenBank accession number. Is there an assembly for chrY in rat?
> > Thank you, > Karen > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From jje at gate.sinica.edu.tw Tue Dec 11 19:06:24 2007 From: jje at gate.sinica.edu.tw (J.J. Emerson) Date: Wed, 12 Dec 2007 11:06:24 +0800 Subject: [Genome] Chimpanzee self alignments In-Reply-To: References: Message-ID: <8612B221-56E9-434A-BEFF-488CEA9CD8C3@gate.sinica.edu.tw> Dear Brian, Thanks so much for providing access to this resource! It will turn out to be quite useful. I'm most grateful that you've provided all of the blastz search work and subsequent chaining. I definitely would have balked at doing this myself if I were required to come up with sufficient server time to do that. I am further interested in organizing these chains into nets in the same way that human self-alignments are organized. If you also have an internal results set with these nets, would it be possible for you to provide access to it? Otherwise, could you make available the pipeline/protocol that you use to generate self-nets like you do with human? I'd be willing to do this myself, assuming that the computational time won't be prohibitive. I'm guessing that the most time-consuming part of the process is the blastz searching, with subsequent steps being rather fast in comparison. Thanks a lot for all of your help! Cheers, J.J. PS If you just happened to be willing to net the chains too, I certainly wouldn't complain! In fact, I'd be rather grateful. However, I would completely understand if you are reluctant to further process an internal results set that you hadn't planned on supporting in the first place. On Dec 11, 2007, at 8:27 AM, Brian Raney wrote: > Hey J.J., > > As you've noticed, self-chains are not part of the standard UCSC > cross-species pipeline, but we have chimp self-chains generated > for internal use that you're welcome to use with the warning that > these have not been through the UCSC QA process.
Self chains > are also hard to tune since there is no one whole genome duplication > point like > there is in speciation, so self chains are inevitably going to favor > certain percent i/d duplications over those that are older ( or more > recent). > > http://genome-test.cse.ucsc.edu/goldenPath/panTro2/vsSelf/ > > brian > > On Dec 4, 2007 12:18 AM, J.J. Emerson wrote: > I notice that the self alignments at UCSC are done only for select > genomes which appear to be vertebrate, relatively "high quality" and > "important". I was wondering if the chimpanzee genome could be run > through the pipeline in order to collect the self alignments? I know > this is a big undertaking, but considering the attention to primate > genomics, especially with regard to duplication, this would seem > useful, even if the chimp genome isn't as high quality as the human > genome. It seems as if chimp meets most of these criteria well, its > assembly quality notwithstanding. It would be a really great > resource, even with the caveats. I can think of many people who would > find such a resource invaluable. > > Thanks a lot! > > Cheers, > > J.J. > > PS > These are some keywords that might help others using a search engine > to find this post and its response: > Pan troglodytes, paralogy, synteny, self alignment, duplication, > chimp, chimpanzee, duplication, CNV > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From dpf05 at mails.tsinghua.edu.cn Tue Dec 11 19:47:19 2007 From: dpf05 at mails.tsinghua.edu.cn (Pufeng Du) Date: Wed, 12 Dec 2007 11:47:19 +0800 Subject: [Genome] Chimp ESTs Message-ID: <000601c83c71$b46889a0$25826fa6@bioinforbd891e> Hi I have found that the est sequences in the following file are human ests, not chimps. 
http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/est.fa.gz Is this an error in the UCSC database? --- Pufeng Du, Ph.D. Candidate Email: dpf05 at mails.tsinghua.edu.cn From barth at dmbr.ugent.be Wed Dec 12 06:23:18 2007 From: barth at dmbr.ugent.be (Bart Hooghe) Date: Wed, 12 Dec 2007 15:23:18 +0100 Subject: [Genome] multiz 28 way Message-ID: <475FEED6.7060400@dmbr.ugent.be> Dear UCSC staff - I was wondering how I should interpret the double human records in the 28 way at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way/maf/upstream5000.maf.gz ? I downloaded the multiz 17 way and multiz 28 way a while ago and I am now confronted with the fact that in the 28 way there is a 'species' with name NM_... and one with hg18. I assume both point to the human sequence. But why and what's the difference ? This is not the case for multiz 17 way. the output of 'grep -A25 "NM_003639" multiz28way_upstream5000.maf | more' starts with s NM_003639 0 5000 + 5000 .................................... and contains only a few hundred bases in the middle of the alignment sequence the 14th species in this alignment piece starts with s hg18 0 5000 + 5000 CCTGAACTCTTGCTGGTCTGCTTACTTGCCGTGGTTCC-CTGGCCGCAA the output of 'grep -A20 "NM_003639" multiz17way_upstream5000.maf | more' starts with s NM_003639 0 5000 + 5000 CCTGAACTCTTGCTGGTCTGCTTACTTGCCGTGGTTCCCTGGCCGCA-ATAGCACATAGG Greets, Bart --------------------------------------------------------------------------------- Bart L.
Hooghe, predoctoral fellow, Bioinformatics Core, Department for Molecular Biomedical Research (DMBR) VIB - Ghent University Fiers-Schell-Van Montagu Research Building Technologiepark 927 , B-9052 Zwijnaarde , Belgium tel ++32(0)93313693 fax ++32(0)93313609 mailto: Bart.Hooghe at dmbr.UGent.be --------------------------------------------------------------------------------- From Robert.Olinski at neuro.uu.se Wed Dec 12 03:23:27 2007 From: Robert.Olinski at neuro.uu.se (Robert Olinski) Date: Wed, 12 Dec 2007 20:23:27 +0900 Subject: [Genome] Accessing the right chained self alignment in hg18 Message-ID: <20071212202327.7vl95xihwk0gk0ow@webmail5.uu.se> Hello there at UCSC! I have a problem with identification of the desired sequence using chained self alignment option in the human Genome Browser. For example, string "fill 159921 30 chr1 + 220755888 30 id 494 score 2387 ali 30 qOver 0 qFar 220060138 qDup 0 type syn tN 0 qN 0 tR 30 qR 30 tTrf 0 qTrf 0" that came from hg18.hg18.net finds many syntenic hits for chromosome 1 with the following IDs 318; 4716115, etc. I would like to know how can I find directly self aligned segment with ID 494 as specified in the provided string. Thank you very much for your answer! With regards, /Robert Robert P Olinski, M.Sc.,PhD Graduate School of Bioscience and Biotechnology Department of Biological Sciences Tokyo Institute of Technology 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8501 Japan phone:+81-45-924-5744 fax: +81-45-924-5835 email: Robert.Olinski at neuro.uu.se email 2: robertinuppsala at hotmail.com From rhead at soe.ucsc.edu Wed Dec 12 11:17:12 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 12 Dec 2007 11:17:12 -0800 Subject: [Genome] Chimp ESTs In-Reply-To: <000601c83c71$b46889a0$25826fa6@bioinforbd891e> References: <000601c83c71$b46889a0$25826fa6@bioinforbd891e> Message-ID: <476033B8.1030508@soe.ucsc.edu> Hello Pufeng, The chimpanzee (panTro2) assembly is a special case in the Genome Browser. 
Because there are very few GenBank sequences for chimp, and because the human and chimp assemblies are so similar, we have included human sequences in the "native" alignments for chimp. This affects four tracks: RefSeq, mRNAs, ESTs, Spliced ESTs. The fact that these tracks include human sequences is noted on the track description pages (click on the track name in the Genome Browser). These two download files are also affected: est.fa.gz, xenoMrna.fa.gz. We have now updated the description on: http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/ to reflect the inclusion of human sequences in these files. -- Brooke Rhead UCSC Genome Bioinformatics Group Pufeng Du wrote: > Hi > > > > I have found that the est sequences in the following file are human ests, > not chimps. > > > > http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/est.fa.gz > > > > Is this an erro