From huntley at ebi.ac.uk Thu Nov 1 08:33:47 2007 From: huntley at ebi.ac.uk (Rachael Huntley) Date: Thu, 01 Nov 2007 15:33:47 +0000 Subject: [Genome] UCSC IDs Message-ID: <4729F1DB.4030108@ebi.ac.uk> Hi, Can you tell me whether UCSC IDs are the same as GenBank IDs and whether this is always the case? Thanks, Rachael. -- Rachael Huntley Ph.D. European Bioinformatics Institute-EMBL Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD United Kingdom Tel: +44 (0)1223 492515 Fax: +44 (0)1223 494468 E-mail: huntley at ebi.ac.uk URL: http://www.ebi.ac.uk/goa From oharis at scripps.edu Thu Nov 1 09:44:20 2007 From: oharis at scripps.edu (Olivier Harismendy) Date: Thu, 1 Nov 2007 09:44:20 -0700 Subject: [Genome] dbSNP(128) Message-ID: <71DF3AEC-D11B-4C3E-827B-C1340E7EB362@scripps.edu> Hi, I was wondering when you were planning to upload dbSNP(127) released 6 month ago or dbSNP(128) released a week ago. I would love to have the recent SNP mapped to UCSC without fishing them out from dbSNP itself. Thanks for the great Job you are doing Olivier _______________________________________________ Olivier Harismendy, PhD Scripps Genomic Medicine The Scripps Research Institute Maildrop MEM 275 10550 North Torrey Pines Road San Diego CA 92037 Phone (858) 784 2550 Email : oharis at scripps.edu From Ohlsen.Kari at scrippshealth.org Thu Nov 1 09:39:52 2007 From: Ohlsen.Kari at scrippshealth.org (Ohlsen, Kari) Date: Thu, 1 Nov 2007 09:39:52 -0700 Subject: [Genome] dbsnp 127 Message-ID: <4B1B93F6147ED647AD844250B1A6375E02E992D5@MSG04.corp.scripps.org> Hello there, I was wondering when the coordinates for dbSNP build 127 would be deposited for human build hg18. I found an old post, indicating that it was available on the test server. Should we use that? Thanks, Kari Ohlsen ========================= Kari L. Ohlsen, Ph.D. 
Bioinformatics Program Manager Scripps Genomic Medicine Scripps Health and the Scripps Research Institute The Scripps Research Institute, MEM-275 10550 North Torrey Pines Road La Jolla, California 92037 Office (858) 784-2657 "Scripps Information Security" ------------------------------------------------------------------------------ This e-mail and any files transmitted with it may contain privileged and confidential information and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any dissemination or copying of this e-mail or any of its attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sending individual or entity by e-mail and permanently delete the original e-mail and attachment(s) from your computer system. Thank you for your cooperation. ============================================================================== From cchang at medicine.umaryland.edu Thu Nov 1 10:37:56 2007 From: cchang at medicine.umaryland.edu (Christy Chang) Date: Thu, 01 Nov 2007 13:37:56 -0400 Subject: [Genome] OMIM number Message-ID: <4729D6B3.8D94.00B1.0@medicine.umaryland.edu> I am trying to find a list of all the known genes associated with human diseases and obtained such list from OMIM. But I need to translate the OMIM numbers to RefSeq numbers for downstream application. I noted that OMIM numbers are provided in your individual RefSeq gene information. Is there a table where I can get all the RefSeq-OMIM numbers at once? Thank you for your help. Christy Confidentiality Statement: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. 
If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

From rhead at soe.ucsc.edu Thu Nov 1 10:57:50 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Thu, 01 Nov 2007 10:57:50 -0700
Subject: [Genome] Query about Unigene mapping data in UCSC
In-Reply-To:
References:
Message-ID: <472A139E.4080601@soe.ucsc.edu>

Hello Lalitha,

We do not generally update the uniGene_3 table. It is created just once per build. You can check the date any table was made by going to the track controls page (by clicking on the blue track name above the drop-down menu that controls visibility) and looking for the "data last updated" line.

For the hg18 uniGene_3 table the update date is:

Data last updated: 2006-11-17

I hope this information is helpful.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Viswanath, Lalitha (NIH/NCI) [C] wrote:
> Hi
>
> I am looking to use blat alignments of Unigene sequences against build
> hg18 of the human genome provided by UCSC.
>
> I would appreciate if you could provide the frequency of updates for the
> uniGene_3 table.
>
> Thanks
>
> Lalitha
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Thu Nov 1 14:06:49 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 01 Nov 2007 14:06:49 -0700
Subject: [Genome] UCSC IDs
In-Reply-To: <4729F1DB.4030108@ebi.ac.uk>
References: <4729F1DB.4030108@ebi.ac.uk>
Message-ID: <472A3FE9.7000009@cse.ucsc.edu>

Hello Rachael,

The UCSC IDs and the GenBank IDs are not the same thing.
You will find useful correspondences in the tables hg18.kgXref and hg18.knownToRefSeq Those tables can be downloaded here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ The details page for the Known Genes track also has some information: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Rachael Huntley wrote: > Hi, > > Can you tell me whether UCSC IDs are the same as GenBank IDs and whether > this is always the case? > > Thanks, > Rachael. > From kayla at soe.ucsc.edu Thu Nov 1 15:50:14 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 01 Nov 2007 15:50:14 -0700 Subject: [Genome] OMIM number In-Reply-To: <4729D6B3.8D94.00B1.0@medicine.umaryland.edu> References: <4729D6B3.8D94.00B1.0@medicine.umaryland.edu> Message-ID: <472A5826.40400@cse.ucsc.edu> Hello Christy, The proteome.hgnc table has both refSeqId and omim fields. You can either look at this table in the Table Browser or download it here: http://hgdownload.cse.ucsc.edu/goldenPath/proteinDB/proteins070202/database/hgnc.txt.gz I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Christy Chang wrote: > I am trying to find a list of all the known genes associated with human diseases and obtained such list from OMIM. > But I need to translate the OMIM numbers to RefSeq numbers for downstream application. I noted that OMIM numbers > are provided in your individual RefSeq gene information. Is there a table where I can get all the RefSeq-OMIM numbers at once? > > Thank you for your help. > > Christy > > Confidentiality Statement: > This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Nov 1 15:57:14 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 01 Nov 2007 15:57:14 -0700 Subject: [Genome] dbsnp 127 In-Reply-To: <4B1B93F6147ED647AD844250B1A6375E02E992D5@MSG04.corp.scripps.org> References: <4B1B93F6147ED647AD844250B1A6375E02E992D5@MSG04.corp.scripps.org> Message-ID: <472A59CA.6070602@cse.ucsc.edu> Hello Kari, Yes, the dbSNP build 127 is currently available on our test server, http://genome-test.cse.ucsc.edu/ Please note that data found here has not gone through our rigorous QA process and may contain errors. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Ohlsen, Kari wrote: > Hello there, > > I was wondering when the coordinates for dbSNP build 127 would be > deposited for human build hg18. I found an old post, indicating that it > was available on the test server. Should we use that? > > Thanks, > Kari Ohlsen > > ========================= > Kari L. Ohlsen, Ph.D. > Bioinformatics Program Manager > Scripps Genomic Medicine > Scripps Health and the Scripps Research Institute > The Scripps Research Institute, MEM-275 > 10550 North Torrey Pines Road > La Jolla, California 92037 > > Office (858) 784-2657 > > > "Scripps Information Security" > ------------------------------------------------------------------------------ > This e-mail and any files transmitted with it may contain privileged and confidential information and are intended solely for the use of the individual or entity to which they are addressed. 
If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any dissemination or copying of this e-mail or any of its attachment(s) is strictly prohibited. If you have received this e-mail in error, please immediately notify the sending individual or entity by e-mail and permanently delete the original e-mail and attachment(s) from your computer system. Thank you for your cooperation. > > > ============================================================================== > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Nov 1 16:08:12 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 01 Nov 2007 16:08:12 -0700 Subject: [Genome] dbSNP(128) In-Reply-To: <71DF3AEC-D11B-4C3E-827B-C1340E7EB362@scripps.edu> References: <71DF3AEC-D11B-4C3E-827B-C1340E7EB362@scripps.edu> Message-ID: <472A5C5C.3030206@cse.ucsc.edu> Hello Olivier, We have dbSNP 127 available on our test server for now: http://genome-test.cse.ucsc.edu/ Please keep in mind that it has not yet gone through our rigorous QA process and may contain errors. We do not yet have a track for dbSNP 128. I can email you when this data is available on the Genome Browser. Kayla Smith UCSC Genome Bioinformatics Group Olivier Harismendy wrote: > Hi, > > I was wondering when you were planning to upload dbSNP(127) released > 6 month ago or dbSNP(128) released a week ago. I would love to have > the recent SNP mapped to UCSC without fishing them out from dbSNP > itself. 
> > Thanks for the great Job you are doing > > Olivier > > _______________________________________________ > Olivier Harismendy, PhD > > Scripps Genomic Medicine > The Scripps Research Institute > Maildrop MEM 275 > 10550 North Torrey Pines Road > San Diego CA 92037 > > Phone (858) 784 2550 > Email : oharis at scripps.edu > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From viswanathl at mail.nih.gov Fri Nov 2 08:06:03 2007 From: viswanathl at mail.nih.gov (Viswanath, Lalitha (NIH/NCI) [C]) Date: Fri, 2 Nov 2007 11:06:03 -0400 Subject: [Genome] Query regarding EST Blat Alignment results Message-ID: Hi NCI's caBIO (part of caBIG: http://cabig.nci.nih.gov ) exposes the chromosomal positions of ESTs as available in ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chr*_est.txt.gz This dataset does not provide the BLAT alignment scores and contains multiple mappings per EST, in some cases. Examples are AA001509, AA001126, T40080, Z99433, etc I would appreciate if you could advise whether a) a pruned data set containing only one result per EST is available or b) a dataset providing BLAT scores for the alignments is available Any input on the settings for BLAT or process to filter BLAT alignment results would be helpful in understanding the results. Thanks Lalitha Data Architect SAIC (NYSE: SAI) National Cancer Institute (NIH) From julien.roux at isb-sib.ch Fri Nov 2 02:01:28 2007 From: julien.roux at isb-sib.ch (Julien Roux) Date: Fri, 02 Nov 2007 10:01:28 +0100 Subject: [Genome] mapping of zebrafish Compugen oligonucleotides Message-ID: <472AE768.6050802@isb-sib.ch> Hello I was told a few months ago that the UCSC genome browser was in the process of mapping a set of oligonucleotides probes for zebrafish microarray, designed by Compugen. 
This is used for example in that article: http://genetics.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pgen.0010029 Was this mapping achieved? Where can I find it on the genome browser or on the ftp server? Thanks a lot for your help. Julien -- Julien Roux, PhD student SIB - Swiss Institute of Bioinformatics Evolutionary Bioinformatics Group (Marc Robinson-Rechavi) http://www.unil.ch/dee/page22707.html Biophore, University of Lausanne, 1015 Lausanne, Switzerland tel: +41 21 692 4221 fax: +41 21 692 4165 From ann at soe.ucsc.edu Fri Nov 2 11:11:10 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 02 Nov 2007 11:11:10 -0700 Subject: [Genome] mapping of zebrafish Compugen oligonucleotides In-Reply-To: <472AE768.6050802@isb-sib.ch> References: <472AE768.6050802@isb-sib.ch> Message-ID: <472B683E.30204@cse.ucsc.edu> Hello Julien, We do have that data, but it has not yet been mapped to the zebrafish genome. We are in correspondence with the authors to resolve some inconsistencies in the data set. After this is resolved, it will be available on our website. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Julien Roux wrote: > Hello > > I was told a few months ago that the UCSC genome browser was in the > process of mapping a set of oligonucleotides probes for zebrafish > microarray, designed by Compugen. > This is used for example in that article: > http://genetics.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pgen.0010029 > > Was this mapping achieved? Where can I find it on the genome browser or > on the ftp server? > > Thanks a lot for your help. > Julien > From zxu at uhnres.utoronto.ca Fri Nov 2 11:40:10 2007 From: zxu at uhnres.utoronto.ca (zxu) Date: Fri, 02 Nov 2007 14:40:10 -0400 Subject: [Genome] question about custom refseq track. Message-ID: <472B6F0A.459B06E3@uhnres.utoronto.ca> I had a question about how to customize refseq track? 
Basically, I want to show only certain RefSeq genes within a chromosome region, not all of them. For example, chr6:200000-1000000 contains the genes DUSP22, IRF4, EXOC2, and HUS1B, but I want to show only DUSP22 and IRF4, not the other two, because (let's say) DUSP22 and IRF4 are related to the IFN pathway.

Best regards
zd

From ann at soe.ucsc.edu Fri Nov 2 14:48:16 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 02 Nov 2007 14:48:16 -0700
Subject: [Genome] question about custom refseq track.
In-Reply-To: <472B6F0A.459B06E3@uhnres.utoronto.ca>
References: <472B6F0A.459B06E3@uhnres.utoronto.ca>
Message-ID: <472B9B20.9080102@cse.ucsc.edu>

Hello zxu,

It is possible to do what you are suggesting by making a Custom Track from the RefSeq Genes track. When you do this, you can limit the genes you would like to display in your track.

If you have never made a Custom Track using the Table Browser, you can read about it here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

Configure the Table Browser by choosing the RefSeq Genes track (table refGene). Then enter your list of genes into the 'identifiers (names/accessions): "paste list"' (e.g. DUSP22 IRF4). Choose "custom track" as your output format. Then follow the prompts to view your newly-created Custom Track in the Genome Browser.

I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance.

Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.

zxu wrote:
> I had a question about how to customize refseq track? Basically, I want
> to show certain refseq within certain chromosome but not all of them?
> let's say in chr6:200000-1000000, there are gene DUSP22, IRF4, EXOC2,
> HUS1B.
> but I want to only show DUSP22, IRF4, not the other two genes.
> since DUSP22, IRF4 are related to IFN pathway, let's say?
>
> Best regards
>
> zd
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From ann at soe.ucsc.edu Fri Nov 2 16:16:45 2007
From: ann at soe.ucsc.edu (Ann Zweig)
Date: Fri, 02 Nov 2007 16:16:45 -0700
Subject: [Genome] Query regarding EST Blat Alignment results
In-Reply-To:
References:
Message-ID: <472BAFDD.2090600@cse.ucsc.edu>

Hello Lalitha,

To generate the EST track, human ESTs from GenBank are aligned against the genome using BLAT. Quite often, ESTs will align to more than one location in the genome. When a single EST aligns in multiple places, the alignment having the highest base identity is identified. Other EST alignments having a base identity level within a certain amount of the best alignment (call this "X") and at minimum a certain percent base identity (call this "Y") with the genomic sequence are kept.

For ESTs in the latest human assembly (hg18), we apply the following filtering to the BLAT results:

    pslCDnaFilter -minId=0.95 -minCover=0.25 -globalNearBest=0.0025 \
        -minQSize=20 -minNonRepSize=16 -ignoreNs -bestOverlap \
        -polyASizes=ests.polya -usePolyTHead ests-in.psl ests-out.psl

Where:
    ests.polya is created by faPolyASizes
    globalNearBest=0.0025 == "X" in the previous paragraph
    minId=0.95 == "Y" in the previous paragraph

We keep and display all alignments that pass the filtering listed above. The near-best filtering is used since it can be very difficult to determine exactly which locus actually produced an EST. The polymorphisms within the genome make it difficult to distinguish between very similar loci. So keeping only one EST alignment might mean we are keeping the *wrong* alignment.
We do not have a pruned data set. However, if you want to see only one alignment for each EST, you could prune the data set yourself based on whichever parameter you think is "best".

We do not have a table that contains the BLAT score, but the EST table you are looking at does contain all of the information needed to compute it. The score used in filtering is computed by the following tool in our source tree:

    kent/src/hg/pslCDnaFilter/cDnaAligns.c

A simple score function that is sometimes used is:

    int pslScore(const struct psl *psl)
    /* Return score for psl. */
    {
    int sizeMul = pslIsProtein(psl) ? 3 : 1;

    return sizeMul * (psl->match + (psl->repMatch>>1)) -
            sizeMul * psl->misMatch - psl->qNumInsert - psl->tNumInsert;
    }

The Genome Browser and Blat software are free for academic, nonprofit, and personal use. A license is required for commercial use.

How to download the software:
http://genome.cse.ucsc.edu/FAQ/FAQlicense#license3

You can obtain the source tree either via CVS:
http://genome.ucsc.edu/admin/cvs.html
or a zip file:
http://hgdownload.cse.ucsc.edu/admin/jksrc.zip

Please note the build instructions:
http://genome.ucsc.edu/admin/jk-install.html

All of the kent utilities output their usage message and command line options when run with no arguments.

I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance.

Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list.
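
As a rough illustration of the do-it-yourself pruning suggested above, the sketch below ports the quoted pslScore function to Python and keeps the highest-scoring alignment for each EST. It is only a sketch under stated assumptions: the column order (a leading bin column followed by the standard PSL fields) is assumed for the chr*_est.txt.gz dumps, the names psl_score and best_per_est are made up for this example, and picking the single top score is just one possible meaning of "best", with the caveat given above that it may keep the wrong locus.

```python
from typing import Dict, Iterable, List, Tuple


def psl_score(match: int, mis_match: int, rep_match: int,
              q_num_insert: int, t_num_insert: int,
              is_protein: bool = False) -> int:
    """Python port of the pslScore function quoted in the message above."""
    size_mul = 3 if is_protein else 1
    return (size_mul * (match + (rep_match >> 1))
            - size_mul * mis_match - q_num_insert - t_num_insert)


def best_per_est(rows: Iterable[str]) -> Dict[str, Tuple[int, List[str]]]:
    """Keep the highest-scoring alignment per EST (qName).

    ASSUMED column layout (tab-separated, leading bin column):
    0 bin, 1 matches, 2 misMatches, 3 repMatches, 4 nCount,
    5 qNumInsert, 6 qBaseInsert, 7 tNumInsert, 8 tBaseInsert,
    9 strand, 10 qName, ...
    """
    best: Dict[str, Tuple[int, List[str]]] = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        score = psl_score(int(fields[1]), int(fields[2]), int(fields[3]),
                          int(fields[5]), int(fields[7]))
        q_name = fields[10]
        # Ties keep the first alignment seen; change this rule as needed.
        if q_name not in best or score > best[q_name][0]:
            best[q_name] = (score, fields)
    return best
```

To use it on a downloaded table, the rows could be read with gzip.open(path, "rt") and passed straight to best_per_est.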
Viswanath, Lalitha (NIH/NCI) [C] wrote: > Hi > > NCI's caBIO (part of caBIG: http://cabig.nci.nih.gov ) exposes the > chromosomal positions of ESTs as available in > > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chr*_est.txt.gz > > > > This dataset does not provide the BLAT alignment scores and contains > multiple mappings per EST, in some cases. > > > > Examples are AA001509, AA001126, T40080, Z99433, etc > > > > I would appreciate if you could advise whether > > a) a pruned data set containing only one result per EST is available or > > b) a dataset providing BLAT scores for the alignments is available > > > > Any input on the settings for BLAT or process to filter BLAT alignment > results would be helpful in understanding the results. > > > > Thanks > > Lalitha > > Data Architect > > SAIC (NYSE: SAI) > > National Cancer Institute (NIH) > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From mguttman at MIT.EDU Sat Nov 3 22:02:14 2007 From: mguttman at MIT.EDU (Mitchell Guttman) Date: Sun, 4 Nov 2007 01:02:14 -0400 Subject: [Genome] MM9 to MM8 Chain Files Message-ID: <000801c81e9f$ddcecc90$0601a8c0@mguttman> Hi, I am interested in converting MM9 coordinates to MM8, is there any way to get the appropriate chain files? Thanks, Mitchell Guttman From viswanathl at mail.nih.gov Mon Nov 5 08:25:54 2007 From: viswanathl at mail.nih.gov (Viswanath, Lalitha (NIH/NCI) [C]) Date: Mon, 5 Nov 2007 11:25:54 -0500 Subject: [Genome] Query about Genbank Accession to RefSeq Ids Message-ID: Hi Does UCSC provide an exhaustive mapping of all available RefSeq Ids to Genbank accession numbers? The documentation for the Known Genes Cross Reference table shows Mrna Id as a separate column from RefSeq ID. The data seems to indicate that mRNA id is of the same format as a Genbank accession number. Is it correct to assume that the Mrna Id is analogous to Genbank accession number? 
Thanks Lalitha From barbj at mail.nih.gov Mon Nov 5 13:08:14 2007 From: barbj at mail.nih.gov (Barb, Jennifer (NIH/CIT) [E]) Date: Mon, 5 Nov 2007 16:08:14 -0500 Subject: [Genome] refseq table question Message-ID: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> I am trying to obtain the refseq gene names along with the transcript id number from the Refseq table from the UCSC genome browser website but I only seem to find either a transcript id or a gene symbol, but no gene names/titles. Does anyone have a way to pull this info out of the tables on the website? Sincerely, Jennifer From kayla at soe.ucsc.edu Mon Nov 5 14:33:07 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Mon, 05 Nov 2007 14:33:07 -0800 Subject: [Genome] MM9 to MM8 Chain Files In-Reply-To: <000801c81e9f$ddcecc90$0601a8c0@mguttman> References: <000801c81e9f$ddcecc90$0601a8c0@mguttman> Message-ID: <472F9A23.1060004@cse.ucsc.edu> Hello Mitchell, Thank you for the suggestion. This is now on our list of things to do. I'll let you know when these files are ready. Thank you, Kayla Smith UCSC Genome Bioinformatics Group Mitchell Guttman wrote: > Hi, > > I am interested in converting MM9 coordinates to MM8, is there any way to get the appropriate chain files? > > Thanks, > Mitchell Guttman > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Mon Nov 5 14:42:28 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 05 Nov 2007 14:42:28 -0800 Subject: [Genome] refseq table question In-Reply-To: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> References: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> Message-ID: <472F9C54.70606@soe.ucsc.edu> Hello Jennifer, The 'refGene' table contains a 'name' field, which corresponds to the transcript ID, and a 'name2' field, which corresponds to the gene ID. 
You can use the Table Browser to get this information. Configure the Table Browser with the clade, genome, and assembly of interest. Then select:

    group: Genes and gene prediction tracks
    track: RefSeq Genes
    table: refGene
    region: genome
    output format: selected fields from primary and related tables

Hit "get output", then select the boxes next to "name" and "name2". Hit "get output" again. You should see two columns corresponding to the transcript ID and gene name. For example, the first several results from this Table Browser query (using the human, March 2006 assembly) look like this:

    #name           name2
    NM_024763       WDR78
    NM_207014       WDR78
    NM_145243       OMA1
    NM_012102       RERE
    NM_024503       HIVEP3
    NM_001042682    RERE
    NM_001042681    RERE
    ...

I hope this information helps. If this is not what you were looking for, or if we can clarify any of the above, please feel free to write back to this mailing list.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Barb, Jennifer (NIH/CIT) [E] wrote:
> I am trying to obtain the refseq gene names along with the transcript id
> number from the Refseq table from the UCSC genome browser website but I
> only seem to find either a transcript id or a gene symbol, but no gene
> names/titles. Does anyone have a way to pull this info out of the
> tables on the website?
> Sincerely,
> Jennifer
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From rmorin at bcgsc.ca Mon Nov 5 14:41:36 2007
From: rmorin at bcgsc.ca (Ryan Morin)
Date: Mon, 5 Nov 2007 14:41:36 -0800
Subject: [Genome] Intronic single-exon genes
Message-ID: <65DC1C4E-9269-452C-8FA8-8E603E34536D@bcgsc.ca>

I am trying to get a list of all the genes (ensembl or known genes) that meet the following criteria:

1) have only a single exon

2) reside within the intron of another gene (on either the same or opposing strand)

Does anyone know if this is possible using the table browser?
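
For anyone who prefers to apply the two criteria above to a downloaded gene table rather than in the Table Browser, here is a minimal Python sketch. It assumes each gene record carries a chromosome plus exon start and end lists (as in genePred-style tables such as knownGene); the Gene type and the function names are hypothetical, and strand is ignored on purpose because either strand is acceptable.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical record; field names only loosely follow genePred."""
    name: str
    chrom: str
    exon_starts: List[int]
    exon_ends: List[int]


def introns(gene: Gene) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons of one gene."""
    return list(zip(gene.exon_ends[:-1], gene.exon_starts[1:]))


def single_exon_in_intron(genes: List[Gene]) -> List[Tuple[str, str]]:
    """Return (single-exon gene, host gene) pairs where the single exon
    lies entirely inside one intron of the host on the same chromosome."""
    hits = []
    singles = [g for g in genes if len(g.exon_starts) == 1]
    for single in singles:
        start, end = single.exon_starts[0], single.exon_ends[0]
        for host in genes:
            if host is single or host.chrom != single.chrom:
                continue
            if any(i_start <= start and end <= i_end
                   for i_start, i_end in introns(host)):
                hits.append((single.name, host.name))
    return hits
```

The nested loop is quadratic in the number of genes, which is fine for a quick check but would want an interval index for a whole-genome run.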
Thanks,
Ryan

From kayla at soe.ucsc.edu Mon Nov 5 15:46:12 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Mon, 05 Nov 2007 15:46:12 -0800
Subject: [Genome] Intronic single-exon genes
In-Reply-To: <65DC1C4E-9269-452C-8FA8-8E603E34536D@bcgsc.ca>
References: <65DC1C4E-9269-452C-8FA8-8E603E34536D@bcgsc.ca>
Message-ID: <472FAB44.8060808@cse.ucsc.edu>

Hello Ryan,

To get this set of genes, you'll need to make three Custom Tracks:

1. hg18.knownGene filtered with exonCount=1
2. the introns of hg18.knownGene
3. the intersection of the first custom track (knownGene with exonCount=1) with the second (knownGene introns)

For your convenience, here is a session containing the three custom tracks described above:
http://genome.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Kayla&hgS_otherUserSessionName=kayla%2DkgIntrons

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Ryan Morin wrote:
> I am trying to get a list of all the genes (ensembl or known genes)
> that meet the following criteria:
>
> 1) have only a single exon
>
> 2) reside within the intron of another gene (on either the same or
> opposing strand)
>
> Does anyone know if this is possible using the table browser?
>
> Thanks,
>
> Ryan
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From rhead at soe.ucsc.edu Mon Nov 5 16:10:07 2007
From: rhead at soe.ucsc.edu (Brooke Rhead)
Date: Mon, 05 Nov 2007 16:10:07 -0800
Subject: [Genome] Query about Genbank Accession to RefSeq Ids
In-Reply-To:
References:
Message-ID: <472FB0DF.3020108@soe.ucsc.edu>

Hello Lalitha,

RefSeq IDs are a type of GenBank accession number, as I understand it.
According to the NCBI Handbook, here: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.section.GenBank_ASM "In some cases, creation of a RefSeq record involves no more than selecting a single good example from GenBank and making a copy in RefSeq, which credits the GenBank record. In other cases, NCBI in-house staff generates and annotates the records based on the existing primary data, sometimes by combining parts of several GenBank records." I'm not sure if this answers your question. Please write back to this list with further questions, and we can try to clarify the way we store data in our tables. -- Brooke Rhead UCSC Genome Bioinformatics Group Viswanath, Lalitha (NIH/NCI) [C] wrote: > Hi > > Does UCSC provide an exhaustive mapping of all available RefSeq Ids to > Genbank accession numbers? > > > > The documentation for the Known Genes Cross Reference table shows Mrna > Id as a separate column from RefSeq ID. > > The data seems to indicate that mRNA id is of the same format as a > Genbank accession number. Is it correct to assume that the Mrna Id is > analogous to Genbank accession number? > > > > > > Thanks > > Lalitha > > > > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From viswanathl at mail.nih.gov Tue Nov 6 06:54:53 2007 From: viswanathl at mail.nih.gov (Viswanath, Lalitha (NIH/NCI) [C]) Date: Tue, 6 Nov 2007 09:54:53 -0500 Subject: [Genome] Query about Ensembl Ids Message-ID: Hi Does UCSC offer a mapping of Ensembl Gene/Protein Ids to RefSeq or Genbank or Unigene Cluster Ids? 
Thanks Lalitha From barbj at mail.nih.gov Tue Nov 6 05:02:51 2007 From: barbj at mail.nih.gov (Barb, Jennifer (NIH/CIT) [E]) Date: Tue, 6 Nov 2007 08:02:51 -0500 Subject: [Genome] refseq table question In-Reply-To: <472F9C54.70606@soe.ucsc.edu> References: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> <472F9C54.70606@soe.ucsc.edu> Message-ID: <08BFEF2D7CC3104FA411B6E1991200C2826777@NIHCESMLBX3.nih.gov> Hi Brooke, Thank you, that is helpful although I was actually looking for the actual name of the gene and not just the gene symbol and id. Perhaps there is no way to obtain that directly from UCSC and I would have to go to NCBI and download Refseq and parse that for the information that I am looking for? What do you think? Jennifer -----Original Message----- From: Brooke Rhead [mailto:rhead at soe.ucsc.edu] Sent: Monday, November 05, 2007 5:42 PM To: Barb, Jennifer (NIH/CIT) [E] Cc: genome at soe.ucsc.edu Subject: Re: [Genome] refseq table question Hello Jennifer, The 'refGene' table contains a 'name' field, which corresponds to the transcript ID, and a 'name2' field, which corresponds to the gene ID. You can use the Table Browser to get this information. Configure the Table Browser with the clade, genome, and assembly of interest. Then select: group: Genes and gene prediction tracks track: RefSeq Genes table: refGene region: genome output format: selected fields from primary and related tables Hit "get output", then select the boxes next to "name" and "name2". Hit "get output" again. You should see two columns corresponding to the transcript ID and gene name. For example, the first several results from this Table Browser query (using the human, March 2006 assembly) look like this: #name name2 NM_024763 WDR78 NM_207014 WDR78 NM_145243 OMA1 NM_012102 RERE NM_024503 HIVEP3 NM_001042682 RERE NM_001042681 RERE ... I hope this information helps. 
If this is not what you were looking for, or if we can clarify any of the above, please feel free to write back to this mailing list. -- Brooke Rhead UCSC Genome Bioinformatics Group Barb, Jennifer (NIH/CIT) [E] wrote: > I am trying to obtain the refseq gene names along with the transcript id > number from the Refseq table from the UCSC genome browser website but I > only seem to find either a transcript id or a gene symbol, but no gene > names/titles. Does anyone have a way to pull this info out of the > tables on the website? > Sincerely, > Jennifer > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From yueming.ding at jax.org Tue Nov 6 08:24:51 2007 From: yueming.ding at jax.org (Yueming Ding) Date: Tue, 6 Nov 2007 11:24:51 -0500 Subject: [Genome] use liftOver to update SNP postions? In-Reply-To: <446B9DBA.1070902@cse.ucsc.edu> Message-ID: <20071106112451713.00000002264@sable> -----Original Message----- Hi Ann, Do you think if I can use liftOver to get Build 37 positions for mouse SNPs (mapped on Build 36)? Thanks. Yueming Ding The Jackson Laboratory From archanat at soe.ucsc.edu Tue Nov 6 09:53:54 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Tue, 06 Nov 2007 09:53:54 -0800 Subject: [Genome] Query about Ensembl Ids In-Reply-To: References: Message-ID: <4730AA32.9010702@soe.ucsc.edu> Hello Lalitha, This information can be obtained from the Table Browser using two tables, knownToEnsembl and kgXref. If you have the list of Ensembl Ids, you could get the corresponding Known Gene Id from the table 'knownToEnsembl' and using this kgID you could extract the corresponding RefSeq ID from the table 'kgXref'. I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. 
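
The two-table lookup described above (Ensembl ID to Known Gene ID via knownToEnsembl, then Known Gene ID to RefSeq ID via kgXref) can be sketched in Python on the downloaded table dumps. The column positions are assumptions, not something stated in this thread: knownToEnsembl is taken to be (kgID, Ensembl ID) and kgXref is taken to have kgID in the first column and the RefSeq ID in the sixth. Please check them against the table schema before relying on this, and note that the function name is made up for the example.

```python
from typing import Dict, Iterable, Set


def ensembl_to_refseq(known_to_ensembl: Iterable[str],
                      kg_xref: Iterable[str]) -> Dict[str, Set[str]]:
    """Join two tab-separated table dumps on the Known Gene ID.

    ASSUMED layouts:
      knownToEnsembl: col 0 = kgID, col 1 = Ensembl ID
      kgXref:         col 0 = kgID, col 5 = RefSeq ID
    """
    kg_to_refseq: Dict[str, str] = {}
    for line in kg_xref:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and fields[5]:
            kg_to_refseq[fields[0]] = fields[5]

    mapping: Dict[str, Set[str]] = {}
    for line in known_to_ensembl:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        kg_id, ensembl_id = fields[0], fields[1]
        # Known Genes without a RefSeq ID are skipped.
        if kg_id in kg_to_refseq:
            mapping.setdefault(ensembl_id, set()).add(kg_to_refseq[kg_id])
    return mapping
```

A set is used for the values because more than one Known Gene may point at the same Ensembl ID.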
Regards, Archana UCSC Genome Bioinformatics Group Viswanath, Lalitha (NIH/NCI) [C] wrote: > Hi > > Does UCSC offer a mapping of Ensembl Gene/Protein Ids to RefSeq or > Genbank or Unigene Cluster Ids? > > > > Thanks > > Lalitha > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From ann at soe.ucsc.edu Tue Nov 6 10:25:18 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 06 Nov 2007 10:25:18 -0800 Subject: [Genome] use liftOver to update SNP postions? In-Reply-To: <20071106112451713.00000002264@sable> References: <20071106112451713.00000002264@sable> Message-ID: <4730B18E.3030605@soe.ucsc.edu> Hello Yueming, Yes, the liftOver tool should work fine to map SNPs from mouse Build 36 (mm8) to Build 37 (mm9). I would suggest that you get the mm8 coordinate positions of the SNPs you are interested in by using the Table Browser. Table Browser User's Guide: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html If you set up the filters and then restrict the output to only the chrom, chromStart, chromEnd and name fields, you will end up with a BED file that can be easily read by the liftOver tool. Like so: #chrom chromStart chromEnd name chr12 57628160 57628161 rs29205579 chr12 57628230 57628231 rs29211117 chr12 57628357 57628358 rs29193493 chr12 57628412 57628413 rs29222836 You can feed that into the liftOver tool to map those SNPs (complete with their names) to the mm9 assembly. If you would then like to view those mapped SNPs in the mm9 browser, you can make a custom track. Custom Track User's Guide: http://genome.ucsc.edu/goldenPath/help/customTrack.html Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. 
On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Yueming Ding wrote: > > -----Original Message----- > Hi Ann, > > Do you think if I can use liftOver to get Build 37 positions for mouse SNPs (mapped on Build 36)? Thanks. > > Yueming Ding > The Jackson Laboratory > From rhead at soe.ucsc.edu Tue Nov 6 11:46:50 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Tue, 06 Nov 2007 11:46:50 -0800 Subject: [Genome] refseq table question In-Reply-To: <08BFEF2D7CC3104FA411B6E1991200C2826777@NIHCESMLBX3.nih.gov> References: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> <472F9C54.70606@soe.ucsc.edu> <08BFEF2D7CC3104FA411B6E1991200C2826777@NIHCESMLBX3.nih.gov> Message-ID: <4730C4AA.5070606@soe.ucsc.edu> Hi Jennifer, Can you be more specific about what you mean by "actual name of the gene", and maybe give an example or two? Do we display it anywhere on our site? I'm trying to figure out if this is something we store in a table here that I could point you to. -- Brooke Rhead UCSC Genome Bioinformatics Group Barb, Jennifer (NIH/CIT) [E] wrote: > Hi Brooke, > Thank you, that is helpful although I was actually looking for the > actual name of the gene and not just the gene symbol and id. Perhaps > there is no way to obtain that directly from UCSC and I would have to go > to NCBI and download Refseq and parse that for the information that I am > looking for? What do you think? > Jennifer > > > > -----Original Message----- > From: Brooke Rhead [mailto:rhead at soe.ucsc.edu] > Sent: Monday, November 05, 2007 5:42 PM > To: Barb, Jennifer (NIH/CIT) [E] > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] refseq table question > > Hello Jennifer, > > The 'refGene' table contains a 'name' field, which corresponds to the > transcript ID, and a 'name2' field, which corresponds to the gene ID. > You can use the Table Browser to get this information. 
> > Configure the Table Browser with the clade, genome, and assembly of > interest. Then select: > > group: Genes and gene prediction tracks > track: RefSeq Genes > table: refGene > region: genome > output format: selected fields from primary and related tables > > Hit "get output", then select the boxes next to "name" and "name2". Hit > > "get output" again. > > You should see two columns corresponding to the transcript ID and gene > name. For example, the first several results from this Table Browser > query (using the human, March 2006 assembly) look like this: > > #name name2 > NM_024763 WDR78 > NM_207014 WDR78 > NM_145243 OMA1 > NM_012102 RERE > NM_024503 HIVEP3 > NM_001042682 RERE > NM_001042681 RERE > ... > > I hope this information helps. If this is not what you were looking > for, or if we can clarify any of the above, please feel free to write > back to this mailing list. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > > Barb, Jennifer (NIH/CIT) [E] wrote: >> I am trying to obtain the refseq gene names along with the transcript > id >> number from the Refseq table from the UCSC genome browser website but > I >> only seem to find either a transcript id or a gene symbol, but no gene >> names/titles. Does anyone have a way to pull this info out of the >> tables on the website? 
>> Sincerely, >> Jennifer >> >> >> >> _______________________________________________ >> Genome maillist - Genome at soe.ucsc.edu >> http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Tue Nov 6 16:02:55 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Tue, 06 Nov 2007 16:02:55 -0800 Subject: [Genome] refseq table question In-Reply-To: <4730C4AA.5070606@soe.ucsc.edu> References: <08BFEF2D7CC3104FA411B6E1991200C2826776@NIHCESMLBX3.nih.gov> <472F9C54.70606@soe.ucsc.edu> <08BFEF2D7CC3104FA411B6E1991200C2826777@NIHCESMLBX3.nih.gov> <4730C4AA.5070606@soe.ucsc.edu> Message-ID: <473100AF.3040203@soe.ucsc.edu> Hello again Jennifer, You might find the 'name' field of the 'description' table useful. It contains the information from the DEFINITION line in the GenBank record. For instance, for GABRA3 (NM_000808) http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=156602646 the DEFINITION line contains the text "Homo sapiens gamma-aminobutyric acid (GABA) A receptor, alpha 3 (GABRA3), mRNA." The 'refGene' table is connected to the 'gbCdnaInfo' table, which is then linked to the 'description' table, like so: hg18.gbCdnaInfo.acc (via refGene.name) hg18.description.id (via gbCdnaInfo.description) You can use the "selected fields from primary and related tables" output format in the Table Browser to connect the fields from the refGene table to the description table. You might also find the 'product' field of the 'refLink' table useful. It contains the description of the protein product from the GenBank record (in the "FEATURES" section, under "CDS", then "/product"). The text in this field for GABRA3 is "gamma-aminobutyric acid A receptor, alpha 3 precursor". Perhaps one of these fields is what you are looking for. -- Brooke Rhead UCSC Genome Bioinformatics Group Brooke Rhead wrote: > Hi Jennifer, > > Can you be more specific about what you mean by "actual name of the > gene", and maybe give an example or two? 
Do we display it anywhere on > our site? I'm trying to figure out if this is something we store in a > table here that I could point you to. > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > Barb, Jennifer (NIH/CIT) [E] wrote: >> Hi Brooke, >> Thank you, that is helpful although I was actually looking for the >> actual name of the gene and not just the gene symbol and id. Perhaps >> there is no way to obtain that directly from UCSC and I would have to go >> to NCBI and download Refseq and parse that for the information that I am >> looking for? What do you think? >> Jennifer >> >> >> >> -----Original Message----- >> From: Brooke Rhead [mailto:rhead at soe.ucsc.edu] >> Sent: Monday, November 05, 2007 5:42 PM >> To: Barb, Jennifer (NIH/CIT) [E] >> Cc: genome at soe.ucsc.edu >> Subject: Re: [Genome] refseq table question >> >> Hello Jennifer, >> >> The 'refGene' table contains a 'name' field, which corresponds to the >> transcript ID, and a 'name2' field, which corresponds to the gene ID. >> You can use the Table Browser to get this information. >> >> Configure the Table Browser with the clade, genome, and assembly of >> interest. Then select: >> >> group: Genes and gene prediction tracks >> track: RefSeq Genes >> table: refGene >> region: genome >> output format: selected fields from primary and related tables >> >> Hit "get output", then select the boxes next to "name" and "name2". Hit >> >> "get output" again. >> >> You should see two columns corresponding to the transcript ID and gene >> name. For example, the first several results from this Table Browser >> query (using the human, March 2006 assembly) look like this: >> >> #name name2 >> NM_024763 WDR78 >> NM_207014 WDR78 >> NM_145243 OMA1 >> NM_012102 RERE >> NM_024503 HIVEP3 >> NM_001042682 RERE >> NM_001042681 RERE >> ... >> >> I hope this information helps. If this is not what you were looking >> for, or if we can clarify any of the above, please feel free to write >> back to this mailing list. 
>> >> -- >> Brooke Rhead >> UCSC Genome Bioinformatics Group >> >> >> >> Barb, Jennifer (NIH/CIT) [E] wrote: >>> I am trying to obtain the refseq gene names along with the transcript >> id >>> number from the Refseq table from the UCSC genome browser website but >> I >>> only seem to find either a transcript id or a gene symbol, but no gene >>> names/titles. Does anyone have a way to pull this info out of the >>> tables on the website? >>> Sincerely, >>> Jennifer >>> >>> >>> >>> _______________________________________________ >>> Genome maillist - Genome at soe.ucsc.edu >>> http://www.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From clement.kent at utoronto.ca Tue Nov 6 21:36:56 2007 From: clement.kent at utoronto.ca (Clement Kent) Date: Wed, 07 Nov 2007 00:36:56 -0500 Subject: [Genome] PhastCons .pp.gz files appear corrupt on Windows XP Message-ID: <20071107003656.dtcx4yk7448kcock@webmail.utoronto.ca> Hi, I downloaded the contents of the directory http://hgdownload.cse.ucsc.edu/goldenPath/dm3/phastCons15way/ several ways (ftp and firefox download) and experienced problems trying to extract the .pp file from the .gz archives, when the file had been downloaded using windows dos ftp. My extractor (7-Zip) claimed the files had mismatching crc ("broken" was the actual error message). When I downloaded the files from within Firefox, I was able to extract without a problem. Your documentation recommends using ftp; if there is a special setting for Windows users to use you might consider adding it to the docs. I tried both ascii and binary get and mget commands, with no luck. Thanks, Clement Kent Dept. 
of Biology, University of Toronto Mississauga From boconnor at ucla.edu Wed Nov 7 13:18:40 2007 From: boconnor at ucla.edu (Brian O'Connor) Date: Wed, 07 Nov 2007 13:18:40 -0800 Subject: [Genome] refGene and refSeqAli table dumps Message-ID: <47322BB0.2030306@ucla.edu> Hi, I'm trying to find the table dumps for the refGene and refSeqAli tables in hg18. I see them in the table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). However, when I tried looking for them in http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ I couldn't find them. Maybe I'm looking in the wrong place? Also, could you clarify the difference in these two tables? Thanks very much for your help!! --Brian O'Connor From rhead at soe.ucsc.edu Wed Nov 7 15:03:40 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Nov 2007 15:03:40 -0800 Subject: [Genome] refGene and refSeqAli table dumps In-Reply-To: <47322BB0.2030306@ucla.edu> References: <47322BB0.2030306@ucla.edu> Message-ID: <4732444C.4000005@soe.ucsc.edu> Hello Brian, You are looking in the correct place for the table dumps. However, there was a recent glitch on our downloads server that prevented these tables from getting dumped. They should reappear in the next several days. In the meantime, you can dump them from the Table Browser by selecting each table and choosing "region: genome" and "output format: all fields from selected table". Be sure to enter a name for the output file to save it to your computer, and choose the option to compress it to speed up the download. Regarding the difference between refGene and refSeqAli: refSeqAli is generated first. It contains alignment information from BLATing the RefSeq sequences against the genome. The table is in PSL format, described here: http://genome.ucsc.edu/FAQ/FAQformat#format2 . This table is used to display the alignments in the "mRNA/Genomic Alignments" section of the RefSeq Genes details pages. 
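For readers who dump refSeqAli and want to read it in a script, here is a minimal sketch of parsing one PSL record. It assumes the standard 21-column PSL layout described at the FAQ link above; database table dumps may carry an extra leading bin column, which the `has_bin` flag (a name invented for this sketch) skips. The example record and the accession NM_000000 are made up for illustration.

```python
# Column names of the standard 21-field PSL format, in order.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}

# Hypothetical record: two aligned blocks separated by a 50-base target gap.
EXAMPLE_LINE = "\t".join([
    "95", "5", "0", "0", "0", "0", "1", "50", "+",
    "NM_000000", "100", "0", "100",
    "chr1", "1000000", "1000", "1150",
    "2", "60,40,", "0,60,", "1000,1110,",
])


def parse_psl_line(line, has_bin=False):
    """Turn one tab-separated PSL line into a dict keyed by field name."""
    cols = line.rstrip("\n").split("\t")
    if has_bin:
        cols = cols[1:]  # drop the assumed leading bin column of a table dump
    if len(cols) != len(PSL_FIELDS):
        raise ValueError("expected %d PSL columns, got %d" % (len(PSL_FIELDS), len(cols)))
    record = {}
    for name, value in zip(PSL_FIELDS, cols):
        if name in LIST_FIELDS:
            # Comma-separated lists end with a trailing comma.
            record[name] = [int(x) for x in value.split(",") if x]
        elif name in TEXT_FIELDS:
            record[name] = value
        else:
            record[name] = int(value)
    return record


if __name__ == "__main__":
    rec = parse_psl_line(EXAMPLE_LINE)
    print(rec["qName"], rec["tName"], rec["blockSizes"], rec["tStarts"])
```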
The refGene table is generated from refSeqAli and additional information from the GenBank record, like where the coding sequence begins and ends, and the gene symbol. It is in the same format as our other gene prediction tables. The information in refGene is used to display the RefSeq Genes in the Genome Browser. I hope this information helps. If you have further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Brian O'Connor wrote: > Hi, > > I'm trying to find the table dumps for the refGene and refSeqAli tables > in hg18. I see them in the table browser > (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). However, when I > tried looking for them in > http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ I couldn't find > them. Maybe I'm looking in the wrong place? > > Also, could you clarify the difference in these two tables? > > Thanks very much for your help!! > > --Brian O'Connor > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Wed Nov 7 15:59:03 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Nov 2007 15:59:03 -0800 Subject: [Genome] looking for the table containing the exact information of a side by side Message-ID: <47325147.3060705@soe.ucsc.edu> Hello Xiaolu, If you are looking for the actual sequence alignments, and not just summary information about the alignments, you will need to download the pairwise alignment files available on our downloads server, here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm9/ (The chain and net tables do not contain any sequence.) 
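Once the aligned sequences themselves are in hand, a similarity figure can be computed directly from each pair of aligned rows. The following is only a rough sketch: it assumes two gapped strings of equal length with "-" marking gaps, and it defines identity as matching bases divided by aligned (non-gap) columns, which is one of several conventions in use. The function name `percent_identity` is made up for this example.

```python
def percent_identity(query, target, gap="-"):
    """Percent of aligned (non-gap) columns in which the two bases agree."""
    if len(query) != len(target):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    matches = 0
    for q, t in zip(query.upper(), target.upper()):
        if q == gap or t == gap:
            continue  # columns with a gap on either side are not counted
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


if __name__ == "__main__":
    # Five aligned columns, four of which match.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

Counting gap columns as mismatches instead would give a lower figure, so it is worth stating which definition was used when reporting results.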
For a good explanation of how to use the information in the chain and net tables, please see this previously answered question: http://www.soe.ucsc.edu/pipermail/genome/2006-February/009869.html Regarding your last question, I do not believe the tName (human) and qName (mouse) have been reversed in the chainMm9 and chainMm9Link tables. Is there a specific example in the tables that looks incorrect? -- Brooke Rhead UCSC Genome Bioinformatics Group > Subject: looking for the table containing the exact information of a side by side > alignment > From: Xiaolu Huang > Date: Wed, 7 Nov 2007 14:11:07 -0600 > To: genome at soe.ucsc.edu > CC: Katheleen.Gardiner at UCHSC.edu > > Dear bioinformaticians at UCSC, > > I have a side by side alignment and want to calculate the similarity. > > a smaple side by side alignment (net) (query human genome and target is > mouse genome) is as follows: > > 87486267 aaacctggagactaggaactagtacagcattggccgtgc-----------tgcagcaatg > 87486315 > >>>>>>>> || || ||| | |||| |||||| | || ||| | |||| | > >>>>>>>> > 29355972 caaagtgaagaatgggaa-tagtaccgtatcatttttgcatatccacagatatagcacta > 29356030 > > 87486316 gt-----------cgaccgtccacactctgcaac 87486338 > >>>>>>>> | || | || | ||| >>>>>>>> > 29356031 agaactgctttagcaactaacagtacattacaat 29356064 > > I have checked the netMm9 table, it does not seem to have the granuality > I need (detailed to each base and base pair). Do you have a table that > contain the detailed side by side alignment information for mouse and > human genome comparison? > > Also for the chainMm9 and chainMm9Link tables, is the target name and > query name have been reversed? > > Thanks! 
> > Xiaolu Huang > From rhead at soe.ucsc.edu Wed Nov 7 17:13:56 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Nov 2007 17:13:56 -0800 Subject: [Genome] PhastCons .pp.gz files appear corrupt on Windows XP In-Reply-To: <20071107003656.dtcx4yk7448kcock@webmail.utoronto.ca> References: <20071107003656.dtcx4yk7448kcock@webmail.utoronto.ca> Message-ID: <473262D4.5000003@soe.ucsc.edu> Hello Clement, One of our developers offers this information about DOS FTP: --- When using Windows DOS FTP, one needs to say: "bin" at the ftp command prompt before starting a transfer. This tells FTP to transfer the file and not modify it. Otherwise, DOS FTP thinks it is transferring a text file and it changes all the unix end of line markers into CR/LF pairs, which is a bad thing to do for a binary file. After transferring the files, the md5 checksum from the downloaded file can be compared to what we claim to deliver as stated in md5sum.txt --- I hope this helps explain what you are seeing. -- Brooke Rhead UCSC Genome Bioinformatics Group Clement Kent wrote: > > Hi, I downloaded the contents of the directory > http://hgdownload.cse.ucsc.edu/goldenPath/dm3/phastCons15way/ several > ways (ftp and firefox download) and experienced problems trying to > extract the .pp file from the .gz archives, when the file had been > downloaded using windows dos ftp. My extractor (7-Zip) claimed the > files had mismatching crc ("broken" was the actual error message). > > When I downloaded the files from within Firefox, I was able to > extract without a problem. > > Your documentation recommends using ftp; if there is a special > setting for Windows users to use you might consider adding it to the > docs. I tried both ascii and binary get and mget commands, with no luck. > > Thanks, > > Clement Kent > Dept.
of Biology, University of Toronto Mississauga > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From dm435 at cam.ac.uk Thu Nov 8 03:40:42 2007 From: dm435 at cam.ac.uk (Diego Miranda) Date: 08 Nov 2007 11:40:42 +0000 Subject: [Genome] Precomputed TFBS for mouse Message-ID: Hello, I am in need of the precomputed TFBS for the mouse genome (same format as they are provided for the human genome). I guess they are not available on the UCSC Genome Browser....would you have them somewhere for me, please? Thanks a lot for your help. With best wishes from Cambridge, diego -- Diego Miranda-Saavedra, PhD Bioinformatics at Gottgens Lab Cambridge Institute for Medical Research Wellcome Trust/MRC Building Addenbrooke's Hospital Hills Road Cambridge CB2 0XY United Kingdom Tel. +44 (0)1223 336822 E-mail. dm435 at cam.ac.uk Web. http://hscl.cimr.cam.ac.uk/ From paco.hulpiau at dmbr.ugent.be Thu Nov 8 01:26:43 2007 From: paco.hulpiau at dmbr.ugent.be (Paco Hulpiau) Date: Thu, 08 Nov 2007 10:26:43 +0100 Subject: [Genome] known gene IDs (kgId) refer to wrong Ensembl transcript ID (ENST) in knownToEnsembl table Message-ID: <4732D653.5020907@dmbr.ugent.be> Hi, I want to get transcripts (both refseq NM_s and ensembl ENSTs) for every gene by using the approved symbols from HGNC. The script to do the job stopped because it found a duplicate entry for a hgncId and was apparently caused by the entries below. The HGNC symbols are searched in the kgXref table to get the RefSeqs and I use the known gene ID (kgId) to get the Ensembl transcripts. In the knownToEnsembl table I get one ENST id using the kgId and then use this ENST to get the ENSG id in the ensGtp table. If I have the ENSG I can get all ENSTs for the gene again in the ensGtp table. Both ACCN3 and ABCB8 lead to ENSG00000197150 (ABCB8). I think the kgIds for ACCN3 are referring to the wrong ENST ids in the knownToEnsembl table. 
Same for ASIC3. In the Ensembl genome browser ENST00000356058 is a transcript for ABCB8 and not for ACCN3 or for ASIC3. Is this correct or am I missing something here?

kgXref table                     knownToEnsembl table  ensGtp table
[kgId]                           (using kgId)          (using ENST)
uc003win.1  NM_004769  ACCN3     ENST00000356058       ENSG00000197150
uc003wio.1  NM_020321  ACCN3     ENST00000356058       ENSG00000197150
uc003wip.1  NM_020322  ACCN3     ENST00000356058       ENSG00000197150
uc003wik.1  NM_007188  ABCB8     ENST00000358849       ENSG00000197150
uc003wiq.1  AB209421   ASIC3     ENST00000356058       ENSG00000197150
uc003wil.1  AK002018   ABCB8     ENST00000297504       ENSG00000197150
uc003wim.1  AK094005   ABCB8     ENST00000358849       ENSG00000197150
uc003wij.1  CR599833   ABCB8     ENST00000356058       ENSG00000197150
uc003wii.1  AK128129   ABCB8     ENST00000356058       ENSG00000197150

Another thing I've noticed (e.g. for PCDH11X) is that some ENSTs are deprecated in Ensembl but are still in the ensGtp table. Now I also use the Ensembl API to look if all ENSTs found for a certain gene have a stable ID or not. Maybe there is another way to do the job? Thanks for any help or comment. Regards, Paco From WistowG at NEI.NIH.GOV Thu Nov 8 06:48:23 2007 From: WistowG at NEI.NIH.GOV (Wistow, Graeme (NIH/NEI) [E]) Date: Thu, 8 Nov 2007 09:48:23 -0500 Subject: [Genome] track for duplication loci Message-ID: <509008DD30F9944A8D9B1F9B382A7454B053AB@NIHCESMLBX.nih.gov> Evan Eichler and others have identified regions of the human genome that are subject to duplication/deletion and can act as hot spots for recombination: For example: Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, She X, Pevzner PA, Eichler EE. Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet. 2007 Nov;39(11):1361-8. Cooper GM, Nickerson DA, Eichler EE. Mutational and selective effects on copy-number variants in the human genome. Nat Genet. 2007 Jul;39(7 Suppl):S22-9.
Is there a browser track that indicates the relative positions of these regions on the current build or is one on the way? Graeme Wistow Ph.D. Chief, Section on Molecular Structure and Functional Genomics, National Eye Institute, 7/201, NIH, Bethesda, MD 20892-0703 USA Tel: 301-402-3452 Fax: 301-496-0078 Email: graeme at helix.nih.gov From kayla at soe.ucsc.edu Thu Nov 8 11:53:42 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 08 Nov 2007 11:53:42 -0800 Subject: [Genome] Precomputed TFBS for mouse In-Reply-To: References: Message-ID: <47336946.3010906@cse.ucsc.edu> Hello Diego, We do not have any plans to create a TFBS track for the mouse. One thing you might try is our liftOver utility (http://genome.ucsc.edu/cgi-bin/hgLiftOver), to convert human TFBS coordinates to mouse coordinates. The details page from the human TFBS track has detailed information on the generation of this track, and a link to Biobase, which might be useful to look into: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=tfbsConsSites I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Diego Miranda wrote: > Hello, > > I am in need of the precomputed TFBS for the mouse genome (same format as > they are provided for the human genome). I guess they are not available on > the UCSC Genome Browser....would you have them somewhere for me, please? > > Thanks a lot for your help. > > With best wishes from Cambridge, > > diego > > From Rosemary.Elliott at roswellpark.org Thu Nov 8 12:45:11 2007 From: Rosemary.Elliott at roswellpark.org (Elliott, Rosemary) Date: Thu, 08 Nov 2007 16:45:11 -0400 Subject: [Genome] BLAT problem Message-ID: There is a problem in BLAT. I had been using the mouse assemblies 36 and 37, and wished to add dog. I opened a new window, pulled down the BLAT menu for dog. 
Dog showed, but the assembly list did not change to the dog assemblies, and stayed as the mouse assemblies, even though the dog was checked. I pasted in a sequence, and got the results for mouse, and the check mark switched from dog to mouse. I tried with another organism, Opossum, with the same unhelpful result. Please check this out so we can use assemblies for different organisms. Often I have four BLAT windows open for different organisms, so that I can compare results, and have been able to get appropriate responses for each organism. One of your upgrades must have caused a error that has not been picked up as yet. I use a MAC OS ver. 10.4.8 on a power MAC G5. I have been using your wonderful program for years with great success, and very few glitches. This appears to be a glitch. Rosemary Elliott Roswell Park Cancer Institute, Buffalo. NY This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you. From kayla at soe.ucsc.edu Thu Nov 8 14:20:31 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 08 Nov 2007 14:20:31 -0800 Subject: [Genome] BLAT problem In-Reply-To: References: Message-ID: <47338BAF.9080607@cse.ucsc.edu> Hello Rosemary, Please try resetting your cart by clicking on the following link: http://genome.ucsc.edu/cgi-bin/cartReset Let me know if that doesn't clear up the problem for you. Kayla Smith UCSC Genome Bioinformatics Group Elliott, Rosemary wrote: > There is a problem in BLAT. I had been using the mouse assemblies 36 and > 37, and wished to add dog. I opened a new window, pulled down the BLAT menu > for dog. 
Dog showed, but the assembly list did not change to the dog > assemblies, and stayed as the mouse assemblies, even though the dog was > checked. I pasted in a sequence, and got the results for mouse, and the > check mark switched from dog to mouse. I tried with another organism, > Opossum, with the same unhelpful result. Please check this out so we can use > assemblies for different organisms. > > Often I have four BLAT windows open for different organisms, so that I can > compare results, and have been able to get appropriate responses for each > organism. One of your upgrades must have caused a error that has not been > picked up as yet. > > I use a MAC OS ver. 10.4.8 on a power MAC G5. I have been using your > wonderful program for years with great success, and very few glitches. This > appears to be a glitch. > > Rosemary Elliott > Roswell Park Cancer Institute, > Buffalo. NY > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From galt at soe.ucsc.edu Thu Nov 8 14:22:39 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Thu, 8 Nov 2007 14:22:39 -0800 (PST) Subject: [Genome] BLAT problem In-Reply-To: References: Message-ID: Hello! Our pulldown menus rely on a tiny bit of javascript. Please check and make sure your browser has javascript enabled. -Galt On Thu, 8 Nov 2007, Elliott, Rosemary wrote: > There is a problem in BLAT.
I had been using the mouse assemblies 36 and > 37, and wished to add dog. I opened a new window, pulled down the BLAT menu > for dog. Dog showed, but the assembly list did not change to the dog > assemblies, and stayed as the mouse assemblies, even though the dog was > checked. I pasted in a sequence, and got the results for mouse, and the > check mark switched from dog to mouse. I tried with another organism, > Opossum, with the same unhelpful result. Please check this out so we can use > assemblies for different organisms. > > Often I have four BLAT windows open for different organisms, so that I can > compare results, and have been able to get appropriate responses for each > organism. One of your upgrades must have caused a error that has not been > picked up as yet. > > I use a MAC OS ver. 10.4.8 on a power MAC G5. I have been using your > wonderful program for years with great success, and very few glitches. This > appears to be a glitch. > > Rosemary Elliott > Roswell Park Cancer Institute, > Buffalo. NY > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From toleno at usc.edu Thu Nov 8 15:09:52 2007 From: toleno at usc.edu (Donna Toleno) Date: Thu, 08 Nov 2007 15:09:52 -0800 Subject: [Genome] using the knownGene table Message-ID: Hello mailing list. I am new to using the UCSC resources and only slightly more experienced with NCBI.
I am using the knownGene table to obtain the exon start and end sites for transcripts of a list of genes. The output from the table is sorted by chromosome number and position so the order of the output is different from my list order. Is there a way to do this query in a batch or scripted way so that I know which gene corresponds to each of the transcripts. I considered using the proteinID field in this table to merge together the appropriate information about my list of genes. However the FAQ for UCSC Genome Bioinformatics seems to advise against using this protein ID. I am not sure how I can translate the protein ID to the UniProt accession number. The other piece of information I have on my list of interesting genes is an accession number that begins with NM. I am equally confused by the display of RefSeqs in transcripts and products in the ncbi database. There is some text on this topic in the following link: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genefaq.html I realize that you probably can not comment on the ncbi resources. My main goal is to obtain the number of exons and the exon boundaries for a list of genes. I would appreciate any suggestions or advice. Perhaps one of the other tables will get me the information I need. Thank you, Donna From kuhn at soe.ucsc.edu Thu Nov 8 16:16:42 2007 From: kuhn at soe.ucsc.edu (Robert Kuhn) Date: Thu, 8 Nov 2007 16:16:42 -0800 Subject: [Genome] track for duplication loci Message-ID: <200711090016.QAA18011@moondance.cse.ucsc.edu> Dr. Wistow, We do have a track of data from Eichler's lab, Segmental Dups, last updated 2006-07-14. We will look into whether the data in the papers you cite are available for inclusion in the Browser. Thank you for bringing these data to our attention. 
best wishes, --b0b kuhn ucsc genome bioinformatics group > From genome at soe.ucsc.edu Thu Nov 8 09:32:57 2007 > To: > Subject: [Genome] track for duplication loci > > Evan Eichler and others have identified regions of the human genome that > are subject to duplication/deletion and can act as hot spots for > recombination: > > For example: > > Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, She X, Pevzner > PA, Eichler EE. > > Ancestral reconstruction of segmental duplications reveals punctuated > cores of human genome evolution. > > Nat Genet. 2007 Nov;39(11):1361-8. > > > > Cooper GM, Nickerson DA, Eichler EE. > > Mutational and selective effects on copy-number variants in the human > genome. > > Nat Genet. 2007 Jul;39(7 Suppl):S22-9. > > > > Is there a browser track that indicates the relative positions of these > regions on the current build or is one on the way? > > Graeme Wistow Ph.D. > Chief, Section on Molecular Structure and Functional Genomics, > National Eye Institute, > 7/201, NIH, > Bethesda, MD 20892-0703 USA > Tel: 301-402-3452 > Fax: 301-496-0078 > Email: graeme at helix.nih.gov > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From kayla at soe.ucsc.edu Thu Nov 8 17:14:59 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 08 Nov 2007 17:14:59 -0800 Subject: [Genome] using the knownGene table In-Reply-To: References: Message-ID: <4733B493.5030008@cse.ucsc.edu> Hello Donna, One quick way to get the positions of all the exons in a track is to use the Table Browser as follows: clade: Vertebrate, genome: Human, assembly: Mar. 2006 group: Genes and Gene Prediction Tracks, track: UCSC Genes table: knownGene region: genome output format: custom track click "get output" On the next page select "Create one BED record per:" "Exons" and click "get custom track in file".
This will give you a file with the positions of all the exons, and in the name of each item is the kgId and which exon it is. chr1 1736 2090 uc001aaa.1_exon_0_0_chr1_1737_f 0 + chr1 2475 2584 uc001aaa.1_exon_1_0_chr1_2476_f 0 + chr1 3083 4121 uc001aaa.1_exon_2_0_chr1_3084_f 0 + To filter the results to only show the items you're interested in, go back and click on "identifiers: paste list" and paste in your names. Another way to go about this task is to use the Table Browser, but instead of selecting "custom track", select "all fields from selected tables" then click "get output". The "exonCount" column is how many exons a given gene has, and the exonStarts and exonEnds can be used to find out the positions of the exons within a gene. I'm not sure what your question is about ncbi, but if you can rephrase it I can give it a shot. I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Donna Toleno wrote: > > Hello mailing list. > > I am new to using the UCSC resources and only slightly more experienced with NCBI. I am using the knownGene table to obtain the exon start and end sites for transcripts of a list of genes. The output from the table is sorted by chromosome number and position so the order of the output is different from my list order. Is there a way to do this query in a batch or scripted way so that I know which gene corresponds to each of the transcripts. I considered using the proteinID field in this table to merge together the appropriate information about my list of genes. However the FAQ for UCSC Genome Bioinformatics seems to advise against using this protein ID. I am not sure how I can translate the protein ID to the UniProt accession number. > > The other piece of information I have on my list of interesting genes is an accession number that begins with NM. 
I am equally confused by the display of RefSeqs in transcripts and products in the ncbi database. There is some text on this topic in the following link: > > http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genefaq.html > > I realize that you probably can not comment on the ncbi resources. > My main goal is to obtain the number of exons and the exon boundaries for a list of genes. I would appreciate any suggestions or advice. Perhaps one of the other tables will get me the information I need. > > > Thank you, > > Donna > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Nov 8 17:24:41 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 08 Nov 2007 17:24:41 -0800 Subject: [Genome] known gene IDs (kgId) refer to wrong Ensembl transcript ID (ENST) in knownToEnsembl table In-Reply-To: <4732D653.5020907@dmbr.ugent.be> References: <4732D653.5020907@dmbr.ugent.be> Message-ID: <4733B6D9.9080207@cse.ucsc.edu> Paco, The Ensembl Gene Predictions track, ensGene, is a prediction track. Notice that ACCN3 and ABCB8 are neighbors on chr7. There are a few predicted ENSTs that span both of those genes. This is why you are seeing ENSG00000197150 correspond to the two genes. Our ensGene and ensGtp tables are out of date, and updating them is on our list of things to do. I can let you know when this information is updated. I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Paco Hulpiau wrote: > Hi, > > I want to get transcripts (both refseq NM_s and ensembl ENSTs) for every > gene by using the approved symbols from HGNC. The script to do the job > stopped because it found a duplicate entry for a hgncId and was > apparently caused by the entries below.
> > The HGNC symbols are searched in the kgXref table to get the RefSeqs and > I use the known gene ID (kgId) to get the Ensembl transcripts. In the > knownToEnsembl table I get one ENST id using the kgId and then use this > ENST to get the ENSG id in the ensGtp table. If I have the ENSG I can > get all ENSTs for the gene again in the ensGtp table. > > Both ACCN3 and ABCB8 lead to ENSG00000197150 (ABCB8). I think the kgIds > for ACCN3 are referring to the wrong ENST ids in the knownToEnsembl > table. Same for ASIC3. In the Ensembl genome browser ENST00000356058 is > a transcript for ABCB8 and not for ACCN3 or for ASIC3. Is this correct > or am I missing something here? > > kgXref table knownToEnsembl table ensGtp table > [kgId] (using kgId) (using ENST) > uc003win.1 NM_004769 ACCN3 ENST00000356058 ENSG00000197150 > uc003wio.1 NM_020321 ACCN3 ENST00000356058 ENSG00000197150 > uc003wip.1 NM_020322 ACCN3 ENST00000356058 ENSG00000197150 > uc003wik.1 NM_007188 ABCB8 ENST00000358849 ENSG00000197150 > uc003wiq.1 AB209421 ASIC3 ENST00000356058 ENSG00000197150 > uc003wil.1 AK002018 ABCB8 ENST00000297504 ENSG00000197150 > uc003wim.1 AK094005 ABCB8 ENST00000358849 ENSG00000197150 > uc003wij.1 CR599833 ABCB8 ENST00000356058 ENSG00000197150 > uc003wii.1 AK128129 ABCB8 ENST00000356058 ENSG00000197150 > > Another thing I've noticed (e.g. for PCDH11X) is that some ENSTs are > deprecated in Ensembl but are still in the ensGtp table. Now I also use > the Ensembl API to look if all ENSTs found for a certain gene have a > stable ID or not. > > Maybe there is another way to do the job? Thanks for any help or comment. 
> > Regards, > > Paco > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Thu Nov 8 17:47:07 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 08 Nov 2007 17:47:07 -0800 Subject: [Genome] looking for the table containing the exact information of a side by side In-Reply-To: References: <47325147.3060705@soe.ucsc.edu> Message-ID: <4733BC1B.3010500@soe.ucsc.edu> Hi Xiaolu, I think what you are saying is that you would also like to see qName and qEnd listed in this table. You can, as you say, get it from the chainMm9 table. There is an easy way to do part of this with the Table Browser: first select the chainMm9Link table, then choose "output format: selected fields from primary and related tables". Now you can select the chainMm9 table, and select the qName and qStrand fields. The qEnd field could be generated from qStart and the information in the target fields. You would need to subtract tStart from tEnd to get the size of the alignment, then add that number to qStart. I believe the Galaxy web site run by Penn State (http://main.g2.bx.psu.edu/) has tools to manipulate columns in this manner (look under the "Text Manipulation" header on the left-hand side of the page). I hope this information is helpful. -- Brooke Rhead UCSC Genome Bioinformatics Group Xiaolu Huang wrote: > Dear Dr. Rhead, > > Thanks a lot for the prompt reply! The information you provided is very > helpful. I have found the file I am looking for. > > for the table of chainMm9Link. Here is the concern: > > At the table browser page, I have entered "human" for genome, track > "mouse chain" position "chr21". and for chainMm9Link table results > without any filters and with all attribute information.
I have got > > ************************************************************ > #bin tName tStart tEnd qStart chainId > 935 chr21 46001419 46001439 53663552 212 > 935 chr21 46001441 46001479 53663572 212 > 935 chr21 46001488 46001514 53663610 212 > ************************************************************* > > At the tName, I would rather see the mouse chromosome name, and mouse > treated as the target and human as the query. But I can also go to the > chainMm9 table and get the information. > > Thanks! > > Xiaolu > From ann at soe.ucsc.edu Fri Nov 9 08:11:12 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 09 Nov 2007 08:11:12 -0800 Subject: [Genome] there might have some bug in your liftover batch program? In-Reply-To: <47348523.4632AFFC@uhnres.utoronto.ca> References: <472B6F0A.459B06E3@uhnres.utoronto.ca> <472B9B20.9080102@cse.ucsc.edu> <47348523.4632AFFC@uhnres.utoronto.ca> Message-ID: <473486A0.8090203@soe.ucsc.edu> Hello Zxu, There is a 50,000 bp telomeric gap at the very beginning of chr11 in both assemblies. The liftOver tool skips over this gap. Gap Type: telomere Bridged: no Position: chr11:1-50000 Band: 11p15.5 Genomic Size: 50000 Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu zxu wrote: > hi, > > might have some bug in your liftover batch program? > I am trying to conver 2004 Human version to 2006 Human version. when convert > chr11:1-1000000, it give back result chr11:50001-1000000, missing 50000 bp. for > all other conversion, it seems OK. > > zd > TWRI > UHN, U of toronto > > Ann Zweig wrote: > >> Hello zxu, >> >> It is possible to do what you are suggesting by making a Custom Track >> from the refSeq track. When you do this, you can limit the genes you >> would like to display in your track. 
>> >> If you have never made a Custom Track using the Table Browser, you can >> read about it here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html >> >> Configure the Table Browser by choosing the refSeq table. Then enter >> your list of genes into the 'identifiers (names/accessions): "paste >> list"' (e.g. DUSP22 IRF4). Choose "custom track" as your output format. >> Then follow the prompts to view your newly-created Custom Track in the >> Genome Browser. >> >> I hope this information is helpful to you. Please don't hesitate to >> contact the mail list again if you require further assistance. >> >> Regards, >> >> ---------- >> Ann Zweig >> UCSC Genome Bioinformatics Group >> http://genome.ucsc.edu >> >> Please feel free to search the Genome mailing list archives by visiting >> our home page, clicking on "Contact Us", then typing a word or phrase >> into the search box. On that same page >> (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome >> mailing list. >> >> zxu wrote: >>> I had a question about how to customize refseq track? bacically, I want >>> to show certain refseq within certain chromosome but not all of them? >>> let's say in chr6:200000-1000000, there are gene DUSP22, IRF4, EXOC2, >>> HUS1B. but I want to only show DUSP22, IRF4, not the other two genes. >>> since DUSP22, IRF4 are realted to IFN pathway, let's say? >>> >>> Best regrads >>> >>> zd >>> >>> >>> >>> >>> _______________________________________________ >>> Genome maillist - Genome at soe.ucsc.edu >>> http://www.soe.ucsc.edu/mailman/listinfo/genome From zxu at uhnres.utoronto.ca Fri Nov 9 08:04:51 2007 From: zxu at uhnres.utoronto.ca (zxu) Date: Fri, 09 Nov 2007 11:04:51 -0500 Subject: [Genome] there might have some bug in your liftover batch program? References: <472B6F0A.459B06E3@uhnres.utoronto.ca> <472B9B20.9080102@cse.ucsc.edu> Message-ID: <47348523.4632AFFC@uhnres.utoronto.ca> hi, might have some bug in your liftover batch program? 
I am trying to conver 2004 Human version to 2006 Human version. when convert chr11:1-1000000, it give back result chr11:50001-1000000, missing 50000 bp. for all other conversion, it seems OK. zd TWRI UHN, U of toronto Ann Zweig wrote: > Hello zxu, > > It is possible to do what you are suggesting by making a Custom Track > from the refSeq track. When you do this, you can limit the genes you > would like to display in your track. > > If you have never made a Custom Track using the Table Browser, you can > read about it here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html > > Configure the Table Browser by choosing the refSeq table. Then enter > your list of genes into the 'identifiers (names/accessions): "paste > list"' (e.g. DUSP22 IRF4). Choose "custom track" as your output format. > Then follow the prompts to view your newly-created Custom Track in the > Genome Browser. > > I hope this information is helpful to you. Please don't hesitate to > contact the mail list again if you require further assistance. > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > Please feel free to search the Genome mailing list archives by visiting > our home page, clicking on "Contact Us", then typing a word or phrase > into the search box. On that same page > (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome > mailing list. > > zxu wrote: > > I had a question about how to customize refseq track? bacically, I want > > to show certain refseq within certain chromosome but not all of them? > > let's say in chr6:200000-1000000, there are gene DUSP22, IRF4, EXOC2, > > HUS1B. but I want to only show DUSP22, IRF4, not the other two genes. > > since DUSP22, IRF4 are realted to IFN pathway, let's say? 
> > > > Best regards > > > > zd > > > > > > > > > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome From szheng3 at jhmi.edu Fri Nov 9 14:12:11 2007 From: szheng3 at jhmi.edu (SIKA ZHENG) Date: Fri, 09 Nov 2007 14:12:11 -0800 Subject: [Genome] track on genome browser Message-ID: Hi, I have a question when reading the sequence scheme on the genome browser. What is the difference when the intron is marked as such: 1. >>>>>>>>>>>. 2. _________________ _________________ and there is hybrid between them too. i am also confused by the meaning of blue, orange, black, green exon. I look forward to the answer from experts. Thanks, Sika From ann at soe.ucsc.edu Fri Nov 9 15:09:57 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Fri, 09 Nov 2007 15:09:57 -0800 Subject: [Genome] track on genome browser In-Reply-To: References: Message-ID: <4734E8C5.7060603@soe.ucsc.edu> Hello Sika, The way the introns are marked depends on a number of things including how closely zoomed in you are, how 'crowded' the display is, which track you are looking at, etc. That said, I can give you some general information. We try to indicate the direction of transcription on the gene and gene prediction tracks, when possible. We will place arrows on the introns or exons, or both. As for the coloring of the exons, you will need to give me a bit more information. What assembly are you looking at (human, mouse), and what annotation track are you viewing (Known Genes, Ref Seq Genes, etc.)? You can always read about the details behind a track (description, methods, display, credits, references) by pressing on the 'mini-button' to the left of the actual track display, or by clicking on the hyperlinked track name in the track controls (below the display). It is likely that this page will contain the answer to your exon-color question.
If not, please don't hesitate to contact the mail list again with a more detailed question. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. SIKA ZHENG wrote: > Hi, > > I have a question when reading the sequence scheme on the genome browser. What is the difference when the intron is marked as such: > 1. >>>>>>>>>>>. > 2. _________________ > _________________ > > and there is hybrid between them too. > > i am also confused by the meaning of blue, orange, black, green exon. > > I look forward to the answer from experts. > Thanks, > Sika > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From taysk at novasprint.com Mon Nov 12 05:42:50 2007 From: taysk at novasprint.com (Tay Sen Kwan) Date: Mon, 12 Nov 2007 21:42:50 +0800 Subject: [Genome] liftOver using over.chain Message-ID: <4738585A.6030009@novasprint.com> Hello, I understand from past e-mail queries that the difference between the all.chain and the over.chain liftover files were that the latter contain "filtered results and that there are no duplicates." I am using the hg18To....over.chain files to perform liftover of individual exons of single-copy, human cDNAs to other genomes, I would like to find out if there were certain families of repeats that were not filtered out in these files; or alternatively, what are the families of repeats that are filtered out ? Many thanks. 
Regards, Sen Kwan From rhead at soe.ucsc.edu Mon Nov 12 13:44:26 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 12 Nov 2007 13:44:26 -0800 Subject: [Genome] liftOver using over.chain In-Reply-To: <4738585A.6030009@novasprint.com> References: <4738585A.6030009@novasprint.com> Message-ID: <4738C93A.4090805@soe.ucsc.edu> Hello Sen Kwan, I think you are referring to this previously-answered question: http://www.soe.ucsc.edu/pipermail/genome/2005-December/009336.html Let me clarify that answer a little bit. The over.chain file is filtered so that there are no duplicate chains in any particular region (unlike the all.chain files, which can have multiple chains in a single region). The all.chain files are the chained blastz alignments that correspond to what is displayed in the Chain tracks in the Genome Browser. The over.chain files consist of chained and netted alignments (see this previously-answered question: http://www.soe.ucsc.edu/pipermail/genome/2006-February/009717.html) in the chain file format. So, the over.chain files have been filtered to remove multiple chains, not to remove duplicate regions (repeats). Repeats are considered during the chaining process. Please see the chain description page for more information, including this paper listed in the references section: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. http://www.pnas.org/cgi/content/abstract/1932072100v1 You also might find this general description of chains and nets on our genomewiki useful: http://genomewiki.ucsc.edu/index.php/Chains_Nets . I hope this information helps. If you have further questions, please feel free to contact us again. 
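For anyone who wants to look at how chains are laid out in their own all.chain or over.chain file, the following is a minimal sketch, assuming the standard chain header line of 13 whitespace-separated fields (chain, score, tName, tSize, tStrand, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd, id). The function names and the three sample chains are made up for illustration. It only compares the target spans given in the headers, so it is a coarse check: header spans can overlap even when the aligned blocks inside the chains do not.

```python
from collections import defaultdict, namedtuple

# Hypothetical container for the fields of one chain header line.
ChainHeader = namedtuple(
    "ChainHeader",
    "score t_name t_size t_strand t_start t_end "
    "q_name q_size q_strand q_start q_end chain_id",
)


def read_chain_headers(lines):
    """Collect the header line of every chain, ignoring alignment block lines."""
    headers = []
    for line in lines:
        f = line.split()
        if len(f) == 13 and f[0] == "chain":
            headers.append(ChainHeader(
                float(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
                f[7], int(f[8]), f[9], int(f[10]), int(f[11]), int(f[12]),
            ))
    return headers


def overlapping_target_pairs(headers):
    """Return (id, id) pairs of chains whose target spans overlap."""
    by_target = defaultdict(list)
    for h in headers:
        by_target[h.t_name].append(h)
    pairs = []
    for chains in by_target.values():
        chains.sort(key=lambda h: h.t_start)
        for i, a in enumerate(chains):
            for b in chains[i + 1:]:
                if b.t_start >= a.t_end:
                    break  # sorted by start, so later chains cannot overlap a
                pairs.append((a.chain_id, b.chain_id))
    return pairs


# Made-up example data, not taken from any real chain file.
SAMPLE = """\
chain 1000 chr1 247249719 + 100 200 chr2 242951149 + 500 600 1
100

chain 900 chr1 247249719 + 150 250 chr3 199501827 - 50 150 2
100

chain 800 chr1 247249719 + 300 400 chr2 242951149 + 900 1000 3
100
"""

if __name__ == "__main__":
    headers = read_chain_headers(SAMPLE.splitlines())
    print(overlapping_target_pairs(headers))
```

Running the same two functions over an all.chain file and the matching over.chain file should give a rough picture of how much the filtering described above reduces multiple coverage of a region.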
-- Brooke Rhead UCSC Genome Bioinformatics Group Tay Sen Kwan wrote: > Hello, > > I understand from past e-mail queries that the difference between the > all.chain and the over.chain liftover files were that the latter contain > "filtered results and that there are no duplicates." I am using the > hg18To....over.chain files to perform liftover of individual exons of > single-copy, human cDNAs to other genomes, I would like to find out if > there were certain families of repeats that were not filtered out in > these files; or alternatively, what are the families of repeats that are > filtered out ? Many thanks. > > Regards, > > Sen Kwan > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From christoph at bock.name Mon Nov 12 15:35:50 2007 From: christoph at bock.name (Christoph Bock) Date: Tue, 13 Nov 2007 00:35:50 +0100 Subject: [Genome] Question regarding visualization of NimbleGen ChIP-chip data Message-ID: <003101c82584$c8d19bb0$4722138b@infno3607> Hello, I was wondering how NimbleGen ChIP-chip data should best be imported into the UCSC Genome Browser, for visualization and analysis. The very simple approach - uploading the GFF files from the NimbleGen DVDs as custom tracks - doesn't really produce the desired result (it just displays the oligomer positions rather than their values). I'm sure that you performed this task a hundred times during the ENCODE project, and I'd be grateful if you could give me a quick hint or even a set of scripts that help me prepare the standard NimbleGen DVD data (pair format or GFF format) for visualization in UCSC Genome Browser. Thanks in advance! 
Christoph From hiram at soe.ucsc.edu Mon Nov 12 16:13:46 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Mon, 12 Nov 2007 16:13:46 -0800 Subject: [Genome] Question regarding visualization of NimbleGen ChIP-chip data In-Reply-To: <003101c82584$c8d19bb0$4722138b@infno3607> References: <003101c82584$c8d19bb0$4722138b@infno3607> Message-ID: <4738EC3A.7090205@soe.ucsc.edu> Good Afternoon Christoph: Can you find some data on the DVD that has position and value ? Something like a two column format. Or perhaps something like a four column format: chrom, start, end, value If you can find the value data, that is most likely the wiggle data you need to make a graph. This is one of the standard formats for custom tracks. --Hiram Christoph Bock wrote: > Hello, > > I was wondering how NimbleGen ChIP-chip data should best be imported into > the UCSC Genome Browser, for visualization and analysis. The very simple > approach - uploading the GFF files from the NimbleGen DVDs as custom tracks > - doesn't really produce the desired result (it just displays the oligomer > positions rather than their values). > > I'm sure that you performed this task a hundred times during the ENCODE > project, and I'd be grateful if you could give me a quick hint or even a set > of scripts that help me prepare the standard NimbleGen DVD data (pair format > or GFF format) for visualization in UCSC Genome Browser. > > Thanks in advance! > Christoph > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From nuclearn at gmail.com Mon Nov 12 22:15:13 2007 From: nuclearn at gmail.com (Li Xiao) Date: Tue, 13 Nov 2007 14:15:13 +0800 Subject: [Genome] exon number for Entrez gene Message-ID: <150864390711122215m22379e9dlc14067e4a4139961@mail.gmail.com> HI, ALL I have currently a Entrez gene list. How to obtain the number of exons (or gene structure) for each gene? Thanks. 
Li Xiao -- ********************************************************************* Li Xiao Sichuan Key Laboratory of Molecular Biology and Biotechnology College of Life Science, Sichuan University Chengdu, SiChuan, P.R.China TEL:86-28-85470083 FAX:86-28-85412738 E-MAIL: nuclearn at gmail.com URL: http://scbi.scu.edu.cn ********************************************************************** From dessen at igr.fr Tue Nov 13 00:34:36 2007 From: dessen at igr.fr (Philippe DESSEN) Date: Tue, 13 Nov 2007 09:34:36 +0100 Subject: [Genome] ERVWE1 human gene Message-ID: Hi, It seems me curious that in the last hg18 build for Human genome, there is no presence of the ERVWE1 gene (NM_014590) located in 7q21. Are there some specific policies regarding the location of all Refseq NM_ entries ?? Best regards Philippe Dessen ------------------------------------------------------------------------ ------------ Philippe DESSEN Laboratoire "Génomes et Cancers" FRE 2939 CNRS Institut de Cancérologie Gustave Roussy - PR1 39 rue Camille Desmoulins 94805 VILLEJUIF Cedex - tel : 0142114490 fax : 0142116267 e-mail : dessenATigr.fr ------------------------------------------------------------------------ ------------ Soutenez le mouvement SAUVONS LA RECHERCHE : http://recherche-en-danger.apinc.org From ann at soe.ucsc.edu Tue Nov 13 09:35:04 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 13 Nov 2007 09:35:04 -0800 Subject: [Genome] ERVWE1 human gene In-Reply-To: References: Message-ID: <4739E048.40609@soe.ucsc.edu> Hello Philippe, The process that we used to create the Known Genes annotation track in the browser changed dramatically between the hg17 assembly release and the hg18 release. You can read about the two different methods used to create the tracks by pressing on the 'mini-button' to the left of the actual track display, or by clicking on the hyperlinked track name in the track controls (below the display) in each of the two assemblies.
The gene you reference, ERVWE1 is found in the Known Gene track of the hg17 assembly, but not, as you point out, in the hg18 assembly. You will note that it is missing from the RefSeq Genes track in both assemblies. When I view that area of the genome in the browser (for both assemblies) I see that there is a LTR in that location. You can turn on the Repeat Masker track to see this LTR. Although this particular gene was located by BLAT, it did not pass the pslCdnaFilter filter because of the LTR. The pslCdnaFilter tool throws out any alignment that doesn't have at least 16 bases of non-repeat-masked target sequence. This is a case of a good gene getting rejected because it sits completely within an LTR. This part of the filtering process is slated for refactoring in the future so as not to reject good genes such as this one. I hope this helps explain the missing gene. Please don't hesitate to contact the mail list again if you require further assistance. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Philippe DESSEN wrote: > Hi, > > It seems me curious that in the last hg18 build for Human genome, there > is no presence of > the ERVWE1 gene (NM_014590) located in 7q21. > > Are there some specific policies regarding the location of all Refseq > NM_ entries ?? 
> > > Best regards > > Philippe Dessen > > ------------------------------------------------------------------------ > ------------ > Philippe DESSEN > Laboratoire "Génomes et Cancers" FRE 2939 CNRS > Institut de Cancérologie Gustave Roussy - PR1 39 rue Camille > Desmoulins > 94805 VILLEJUIF Cedex - tel : 0142114490 fax : 0142116267 e-mail : > dessenATigr.fr > ------------------------------------------------------------------------ > ------------ > Soutenez le mouvement SAUVONS LA RECHERCHE : > http://recherche-en-danger.apinc.org > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From fyfe at cvm.msu.edu Tue Nov 13 09:23:47 2007 From: fyfe at cvm.msu.edu (John Fyfe) Date: Tue, 13 Nov 2007 12:23:47 -0500 Subject: [Genome] problem with browser Message-ID: <47399753020000F600044E37@mail1.cvm.msu.edu> When we try to open any chr8 area from the dog (2005 assembly), we get the following error: External file /gbdb/genbank/./data/processed/genbank.160.0/full/mrna.fa cannot be opened or has wrong size. Old size 1882991161, new size -1, error No such file or directory What to do? John C. Fyfe, D.V.M., Ph.D. Associate Professor of Microbiology and Molecular Genetics Laboratory of Comparative Medical Genetics 2209 Biomedical Physical Sciences Michigan State University East Lansing, MI 48824-4320 Voice: (517) 355-6463 ext 1559 Fax: (517) 353-8957 Email: fyfe at cvm.msu.edu http://www.mmg.msu.edu/faculty/fyfe.htm From ann at soe.ucsc.edu Tue Nov 13 09:49:17 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 13 Nov 2007 09:49:17 -0800 Subject: [Genome] problem with browser In-Reply-To: <47399753020000F600044E37@mail1.cvm.msu.edu> References: <47399753020000F600044E37@mail1.cvm.msu.edu> Message-ID: <4739E39D.4010702@soe.ucsc.edu> Hello John, We just did a big update to all GenBank-related tracks in the dog (and other) assemblies.
My suspicion is that you have had that browser window open for a long time and there is some old information cached which is clashing with the new information. My suggestion is for you to reset your cart. You can do that by visiting the gateway page and pressing "Click here to reset the browser user interface settings to their defaults", or by visiting this URL: http://genome.ucsc.edu/cgi-bin/cartReset Please be sure to let us know if this does not immediately fix the problem. Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. John Fyfe wrote: > When we try to open any chr8 area from the dog (2005 assembly), we get teh following error: > External file /gbdb/genbank/./data/processed/genbank.160.0/full/mrna.fa cannot be opened or has wrong size. Old size 1882991161, new size -1, error No such file or directory > > > What to do? > > John C. Fyfe, D.V.M., Ph.D. 
> Associate Professor of Microbiology > and Molecular Genetics > > Laboratory of Comparative Medical Genetics > 2209 Biomedical Physical Sciences > Michigan State University > East Lansing, MI 48824-4320 > > Voice: (517) 355-6463 ext 1559 > Fax: (517) 353-8957 > Email: fyfe at cvm.msu.edu > http://www.mmg.msu.edu/faculty/fyfe.htm > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From ann at soe.ucsc.edu Tue Nov 13 10:55:30 2007 From: ann at soe.ucsc.edu (Ann Zweig) Date: Tue, 13 Nov 2007 10:55:30 -0800 Subject: [Genome] exon number for Entrez gene In-Reply-To: <150864390711122215m22379e9dlc14067e4a4139961@mail.gmail.com> References: <150864390711122215m22379e9dlc14067e4a4139961@mail.gmail.com> Message-ID: <4739F322.7010006@soe.ucsc.edu> Hello Li, It depends on exactly what format your gene names are in. If they look similar to this: GABRA3 (called the "Official Symbol" on the Entrez website), then you can just search for that name directly in the UCSC Genome Browser. If your gene names are actually the "GeneID" (from the Entrez website) similar to this: 2556, then you will have to use the Table Browser on our website to convert them into UCSC Gene Names before proceeding. You can use the knownToLocusLink table to do this conversion. Once you have located the gene in the Genome Browser, click directly on the annotation to go to the details page. On this page you will see information about the number of exons and more. Or you can visit the Table Browser to get the exact location of the exons. Paste the list of gene names into the 'identifiers' section, and output 'all fields' from the knownGene table. I hope this information is helpful to you. Please don't hesitate to contact the mail list again if you require further assistance, or need more detailed instructions. 
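As a rough illustration of that last step, and assuming the "all fields" output of the knownGene table has been saved as tab-separated text with its header line (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID), the exon counts and boundaries could be pulled out along these lines. This is only a sketch: the function name is hypothetical, and the sample row is built from the uc001aaa.1 exon coordinates quoted earlier on this list, with placeholder values in the cds and ID columns.

```python
import csv
import io

# Illustrative row only; check the header line of your own download,
# since the column order is an assumption here.
SAMPLE = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\t"
    "exonCount\texonStarts\texonEnds\tproteinID\talignID\n"
    "uc001aaa.1\tchr1\t+\t1736\t4121\t1736\t1736\t"
    "3\t1736,2475,3083,\t2090,2584,4121,\t\tuc001aaa.1\n"
)


def exon_table(text):
    """Map each transcript name to a list of (exonStart, exonEnd) pairs."""
    # The Table Browser header starts with '#'; drop it so DictReader
    # sees plain column names.
    reader = csv.DictReader(io.StringIO(text.lstrip("#")), delimiter="\t")
    exons = {}
    for row in reader:
        # exonStarts and exonEnds are comma-separated with a trailing comma.
        starts = [int(x) for x in row["exonStarts"].split(",") if x]
        ends = [int(x) for x in row["exonEnds"].split(",") if x]
        if not (len(starts) == len(ends) == int(row["exonCount"])):
            raise ValueError("exon lists disagree with exonCount for " + row["name"])
        exons[row["name"]] = list(zip(starts, ends))
    return exons


if __name__ == "__main__":
    for name, pairs in exon_table(SAMPLE).items():
        print(name, len(pairs), pairs)
```

Keying the result on the transcript name keeps each set of exons tied to the identifier it came from, whatever order the Table Browser returns the rows in.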
Regards, ---------- Ann Zweig UCSC Genome Bioinformatics Group http://genome.ucsc.edu Please feel free to search the Genome mailing list archives by visiting our home page, clicking on "Contact Us", then typing a word or phrase into the search box. On that same page (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing list. Li Xiao wrote: > HI, ALL > > I have currently a Entrez gene list. How to obtain the number of exons (or > gene structure) for each gene? > Thanks. > > Li Xiao From einat at duke.edu Tue Nov 13 13:11:48 2007 From: einat at duke.edu (Einat Hazkani-Covo) Date: Tue, 13 Nov 2007 16:11:48 -0500 Subject: [Genome] ucsc test site Message-ID: <0BDD5E6B-1421-431D-AC59-4EAFAE6595D6@duke.edu> Hi, I saw that the orangutan genome is already available in your test site (genome-test.cse.ucsc.edu) and I wonder when it suppose to be available in the regular site. Also, is there a way to download the net file orangutan-human appear in the test version (and the orangutan genome sequence itself). Are nets to/from chimp and rhesus are planned soon? Many thanks, Einat From hiram at soe.ucsc.edu Tue Nov 13 13:27:13 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Tue, 13 Nov 2007 13:27:13 -0800 Subject: [Genome] ucsc test site In-Reply-To: <0BDD5E6B-1421-431D-AC59-4EAFAE6595D6@duke.edu> References: <0BDD5E6B-1421-431D-AC59-4EAFAE6595D6@duke.edu> Message-ID: <473A16B1.4040700@soe.ucsc.edu> Good Afternoon Einat: I am currently working on the Orang genome browser. I expect it to be available in a couple of weeks on the public site. --Hiram Einat Hazkani-Covo wrote: > Hi, > > I saw that the orangutan genome is already available in your test > site (genome-test.cse.ucsc.edu) and I wonder when it suppose to be > available in the regular site. > Also, is there a way to download the net file orangutan-human appear > in the test version (and the orangutan genome sequence itself). Are > nets to/from chimp and rhesus are planned soon? 
>
> Many thanks,
>
> Einat

From crabtree at jcvi.org Wed Nov 14 12:00:27 2007
From: crabtree at jcvi.org (Crabtree, Jonathan)
Date: Wed, 14 Nov 2007 15:00:27 -0500
Subject: [Genome] Citing the UCSC genome database/data in a web application.
Message-ID:

Hi-

Our (non-profit) organization is working on a web-accessible resource in which we plan to use the data from the hg18 cytoBandIdeo table to:

1. Provide an ideogram-based entry page for our site.
2. Enable searches by cytogenetic band.

From reading the UCSC documentation and FAQs I gather that we are free to do this as long as we make the source of the data clear by citing the following references:

Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J., Weber, R.J., Haussler, D. and Kent, W.J. The UCSC Genome Browser Database. Nucl. Acids Res 31(1), 51-54 (2003).
(as per http://genome.ucsc.edu/cite.html)

Furey, T.S., and Haussler, D. Integration of the cytogenetic map with the draft human genome sequence, Hum. Mol. Gen., 12(9), 1037-1044 (2003).
(as per http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr1&g=cytoBand)

Assuming this is correct, my question is whether you have any specific guidelines on _how_ to include these references in a web-based application, or any resource that is not a standard publication per se?

We are planning to include a general "credits" or "about" page that would list all of the references, including those for the UCSC data. Would this be acceptable? Thanks in advance,

Jonathan

From nicole.leahy at jax.org Wed Nov 14 12:35:56 2007
From: nicole.leahy at jax.org (nicole.leahy at jax.org)
Date: Wed, 14 Nov 2007 15:35:56 -0500 (EST)
Subject: [Genome] repetitive element names
Message-ID: <29095687.1195072556434.JavaMail.ocsadmin@jcs-mid-prod.jax.org>

Dear UCSC:

I am conducting research on repetitive elements in inbred lines of mice.
In retrieving names of repeats from the SQL server, I noticed some oddities that may or may not be mistakes. I have been accessing:

DATABASE: mm8
TABLE: chr1_rmsk
FIELD: repName

First, there are several pairs (and some triplets) of names where the second is the name of the first followed by "-int". For example:

IAPLTR2_Mm
IAPLTR2_Mm-int
IAPLTR2_Mm-int-int

Also, there are pairs where the second name is the first followed by an underscore. For example:

MER34C
MER34C_

Are these distinct repeats, typos, or the result of non-standardized nomenclature?

Sincerely,

Nicole Leahy

From donnak at soe.ucsc.edu Wed Nov 14 13:39:30 2007
From: donnak at soe.ucsc.edu (Donna Karolchik)
Date: Wed, 14 Nov 2007 13:39:30 -0800
Subject: [Genome] Citing the UCSC genome database/data in a web application.
References:
Message-ID: <048d01c82707$8f09af10$14a8a8c0@donnakLT>

Hi Jonathan,

Your citations sound fine. It should be sufficient to list them on your credits or about page. I also recommend that you indicate somewhere on your resource the genome assembly (hg18) that the cytoBandIdeo table came from, as this information may be important to your users.

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

----- Original Message -----
From: "Crabtree, Jonathan"
To:
Sent: Wednesday, November 14, 2007 12:00 PM
Subject: [Genome] Citing the UCSC genome database/data in a web application.

>
> Hi-
>
> Our (non-profit) organization is working on a web-accessible
> resource in which we plan to use the data from the hg18
> cytoBandIdeo table to:
>
> 1. Provide an ideogram-based entry page for our site.
> 2. Enable searches by cytogenetic band.
> >>From reading the UCSC documentation and FAQs I gather that we >>are free to do this as long as we make the source of the data >>clear by citing the > following references: > > Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., > Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, > C.W., Thomas, D.J., Weber, > R.J., Haussler, D. and Kent, W.J. The UCSC Genome Browser > Database. Nucl. Acids Res 31(1), 51-54 (2003). > (as per http://genome.ucsc.edu/cite.html) > > Furey, T.S., and Haussler, D. Integration of the cytogenetic map > with the draft human genome sequence, Hum. Mol. Gen., 12(9), > 1037-1044 (2003). > (as per > http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr1&g=cytoBand) > > Assuming this is correct, my question is whether you have any > specific guidelines on _how_ to include these references in a > web-based > application, or any resource that is not a standard publication > per se? > > We are planning to include a general "credits" or "about" page > that would list all of the references, including those for the > UCSC data. > Would this be acceptable? Thanks in advance, > > Jonathan > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From Hebbring.Scott at mayo.edu Wed Nov 14 14:11:47 2007 From: Hebbring.Scott at mayo.edu (Hebbring, Scott J.) Date: Wed, 14 Nov 2007 16:11:47 -0600 Subject: [Genome] illumina 1M chip Message-ID: <3E5CA71A80DB304B86D502300412165601C25260@MSGEBE12.mfad.mfroot.org> Will Illumina's 1M chip SNP positions be available part of the SNP Arrays in the Variation and Repeat section? 
Thanks Scott Hebbring From rhead at soe.ucsc.edu Wed Nov 14 16:37:53 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 14 Nov 2007 16:37:53 -0800 Subject: [Genome] repetitive element names In-Reply-To: <29095687.1195072556434.JavaMail.ocsadmin@jcs-mid-prod.jax.org> References: <29095687.1195072556434.JavaMail.ocsadmin@jcs-mid-prod.jax.org> Message-ID: <473B94E1.30600@soe.ucsc.edu> Hello Nicole, We have forwarded your question to the author of the RepeatMasker program. I'll let you know what we find out. -- Brooke Rhead UCSC Genome Bioinformatics Group nicole.leahy at jax.org wrote: > Dear USCS: > > I am conducting research on repetitive elements in inbred lines of mice. In retrieving names of repeats from the SQL server, I noticed some oddities that may or may not be mistakes. I have been accessing: > > DATABASE: mm8 > TABLE: chr1_rmsk > FIELD: repName > > First, there a several pairs (and some triplets) of names where the second is the name of the first followed by "-int." For example: > > IAPLTR2_Mm > IAPLTR2_Mm-int > IAPLTR2_Mm-int-int > > Also, there are pairs where the second name is the first followed by an underscore. For example: > > MER34C > MER34C_ > > Are these distinct repeats, typos, or the result of non-standardized nomeclature? > > Sincerely, > > Nicole Leahy > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From yaelshemla at gmail.com Thu Nov 15 01:39:23 2007 From: yaelshemla at gmail.com (yael shemla) Date: Thu, 15 Nov 2007 11:39:23 +0200 Subject: [Genome] question about double records in refseq database Message-ID: Hi, I would like to get your help about the refseq database: 1. The refseq databse should give deatils about full length transcript. If so, why sometimes for the same gene (same genesymbol name) there are several refseq records? is it alternative splicing forms? and if so, why dont i get all the alternative splicing for all genes? 2. 
Why there are double records sometimes with the same refseq name? 3. I searched for the refseq "NM_010699",on mouse genome March2005. when i enter the "NM_010699", i get details only about the chr7:40932461-40940836 position but when I search in this table with all the refseq in chr14, i get in the list the NM_010699 gene. why i don't get the two details when i search for the gene in the first time? Thanks a lot, Yael. From wby0 at CDC.GOV Thu Nov 15 08:30:55 2007 From: wby0 at CDC.GOV (Yu, Wei (CDC/CCHP/NOPHG)) Date: Thu, 15 Nov 2007 11:30:55 -0500 Subject: [Genome] Convert to chr position numbers Message-ID: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDA@LTA3VS002.ees.hhs.gov> Hi, I am interested in building a customized track for our data. I have a question regarding to how to obtain the starting and ending positions on a gene, like chr7 127471196 127472363. You help would be highly appreciated. Thanks, Wei Wei Yu PhD, MS The National Office of Public Health Genomics Centers for Disease Control and Prevention 4770 Buford Hwy, Mail Stop K89 Atlanta, GA 30341 Phone: 770-488-8435 Fax: 770-488-8336 email: WYu at cdc.gov Visit: www.hugenavigator.net From viswanathl at mail.nih.gov Thu Nov 15 09:13:08 2007 From: viswanathl at mail.nih.gov (Viswanath, Lalitha (NIH/NCI) [C]) Date: Thu, 15 Nov 2007 12:13:08 -0500 Subject: [Genome] Query regarding mRNA Alignments provided by UCSC Message-ID: Hi Does UCSC provide an alignment of only RefSeq mRNAs against the human genome build 36? Thanks Lalitha From hiram at soe.ucsc.edu Thu Nov 15 09:47:33 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 15 Nov 2007 09:47:33 -0800 Subject: [Genome] Convert to chr position numbers In-Reply-To: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDA@LTA3VS002.ees.hhs.gov> References: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDA@LTA3VS002.ees.hhs.gov> Message-ID: <473C8635.9090802@soe.ucsc.edu> Good Morning: Can you clarify what you are referring to ? Which genome assembly are you viewing ? 
What starting and ending positions do you mean? What do you mean by "chr position numbers"?

When you see a gene of interest in the genome browser, click on the gene to obtain more information about that gene. You should find all information about the gene by clicking on the gene in the genome browser display image.

--Hiram

Yu, Wei (CDC/CCHP/NOPHG) wrote:
> Hi,
>
> I am interested in building a customized track for our data. I have a
> question regarding to how to obtain the starting and ending positions on
> a gene, like chr7 127471196 127472363. You help would be highly
> appreciated.
> Thanks,
> Wei

From hiram at soe.ucsc.edu Thu Nov 15 09:49:32 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Thu, 15 Nov 2007 09:49:32 -0800
Subject: [Genome] Query regarding mRNA Alignments provided by UCSC
In-Reply-To:
References:
Message-ID: <473C86AC.5050207@soe.ucsc.edu>

Good Morning Lalitha:

Can you clarify your question? Are you looking at the RefSeq gene track on the genome browser and have a question about it? More information about any track, how it was built, and so forth, can be obtained by clicking on the button in the track on the left margin of the display.

--Hiram

Viswanath, Lalitha (NIH/NCI) [C] wrote:
> Hi
>
> Does UCSC provide an alignment of only RefSeq mRNAs against the human
> genome build 36?
>
> Thanks
>
> Lalitha

From kuhn at soe.ucsc.edu Thu Nov 15 10:10:14 2007
From: kuhn at soe.ucsc.edu (Robert Kuhn)
Date: Thu, 15 Nov 2007 10:10:14 -0800
Subject: [Genome] problem with browser
Message-ID: <200711151810.KAA27392@moondance.cse.ucsc.edu>

Hello, John,

Your question prompted us to do a bit of investigation and we discovered that the dog alignments had not actually been updated as recently as those of the other organisms. The situation has been rectified and the dog is once again being updated on a nightly basis.

Thank you for bringing this to our attention. We are sorry for any difficulty the situation may have caused.
best wishes, --b0b kuhn ucsc genome bioinformatics group > From genome at soe.ucsc.edu Tue Nov 13 09:51:23 2007 > To: John Fyfe > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] problem with browser > > Hello John, > > We just did a big update to all GenBank-related tracks in the dog (and other) > assemblies. My suspicion is that you have had that browser window open for a > long time and there is some old information cached which is clashing with the > new information. My suggestion is for you to reset your cart. You can do that > by visiting the gateway page and pressing "Click here to reset the browser user > interface settings to their defaults", or by visiting this URL: > http://genome.ucsc.edu/cgi-bin/cartReset > > Please be sure to let us know if this does not immediately fix the problem. > > > Regards, > > ---------- > Ann Zweig > UCSC Genome Bioinformatics Group > http://genome.ucsc.edu > > Please feel free to search the Genome mailing list archives by visiting our home > page, clicking on "Contact Us", then typing a word or phrase into the search > box. On that same page > (http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome mailing > list. > > > > > > John Fyfe wrote: > > When we try to open any chr8 area from the dog (2005 assembly), we get teh following error: > > External file /gbdb/genbank/./data/processed/genbank.160.0/full/mrna.fa cannot be opened or has wrong size. Old size 1882991161, new size -1, error No such file or directory > > > > > > What to do? > > > > John C. Fyfe, D.V.M., Ph.D. 
> > Associate Professor of Microbiology > > and Molecular Genetics > > > > Laboratory of Comparative Medical Genetics > > 2209 Biomedical Physical Sciences > > Michigan State University > > East Lansing, MI 48824-4320 > > > > Voice: (517) 355-6463 ext 1559 > > Fax: (517) 353-8957 > > Email: fyfe at cvm.msu.edu > > http://www.mmg.msu.edu/faculty/fyfe.htm > > > > > > _______________________________________________ > > Genome maillist - Genome at soe.ucsc.edu > > http://www.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From rhead at soe.ucsc.edu Thu Nov 15 12:08:55 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 15 Nov 2007 12:08:55 -0800 Subject: [Genome] illumina 1M chip In-Reply-To: <3E5CA71A80DB304B86D502300412165601C25260@MSGEBE12.mfad.mfroot.org> References: <3E5CA71A80DB304B86D502300412165601C25260@MSGEBE12.mfad.mfroot.org> Message-ID: <473CA757.9060206@soe.ucsc.edu> Hello Scott, This is on our project list, but we have some higher priority items that we must complete first before we get to it. -- Brooke Rhead UCSC Genome Bioinformatics Group Hebbring, Scott J. wrote: > Will Illumina's 1M chip SNP positions be available part of the SNP > Arrays in the Variation and Repeat section? > > Thanks > Scott Hebbring > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From dm435 at cam.ac.uk Thu Nov 15 13:02:40 2007 From: dm435 at cam.ac.uk (Diego Miranda) Date: 15 Nov 2007 21:02:40 +0000 Subject: [Genome] Precomputed TFBS for mouse In-Reply-To: <47336946.3010906@cse.ucsc.edu> References: <47336946.3010906@cse.ucsc.edu> Message-ID: Hello, I was using the downloadable liftOver facility under Linux--the program works perfect. However, since n input lines may map to n-m coordinates plus m mismatches, how do I reconstruct it? 
Is there a way I could make liftOver give me e.g. chromosome hg17_start hg17_end mapto:hg18_start mapto:hg18_end Many thanks, With best wishes from Cambridge, diego On Nov 8 2007, Kayla Smith wrote: > >Hello Diego, > >We do not have any plans to create a TFBS track for the mouse. One >thing you might try is our liftOver utility >(http://genome.ucsc.edu/cgi-bin/hgLiftOver), to convert human TFBS >coordinates to mouse coordinates. > >The details page from the human TFBS track has detailed information on >the generation of this track, and a link to Biobase, which might be >useful to look into: >http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=tfbsConsSites > >I hope this information is helpful to you. Please don't hesitate to >contact us again if you require further assistance. > >Kayla Smith >UCSC Genome Bioinformatics Group > > >Diego Miranda wrote: >> Hello, >> >> I am in need of the precomputed TFBS for the mouse genome (same format >> as they are provided for the human genome). I guess they are not >> available on the UCSC Genome Browser....would you have them somewhere >> for me, please? >> >> Thanks a lot for your help. >> >> With best wishes from Cambridge, >> >> diego >> >> > > -- Diego Miranda-Saavedra, PhD Bioinformatics at Gottgens Lab Cambridge Institute for Medical Research Wellcome Trust/MRC Building Addenbrooke's Hospital Hills Road Cambridge CB2 0XY United Kingdom Tel. +44 (0)1223 336822 E-mail. dm435 at cam.ac.uk Web. http://hscl.cimr.cam.ac.uk/ From wby0 at CDC.GOV Thu Nov 15 11:38:15 2007 From: wby0 at CDC.GOV (Yu, Wei (CDC/CCHP/NOPHG)) Date: Thu, 15 Nov 2007 14:38:15 -0500 Subject: [Genome] Convert to chr position numbers In-Reply-To: <473C8635.9090802@soe.ucsc.edu> References: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDA@LTA3VS002.ees.hhs.gov> <473C8635.9090802@soe.ucsc.edu> Message-ID: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDC@LTA3VS002.ees.hhs.gov> Dear Hiram, Thanks for your quick response. I am sorry I did not make my question clear. Let me try again. 
Actually, I would like to build a custom track using the Genome Browser. I used example 4 on your how-to page (http://genome.ucsc.edu/goldenPath/help/customTrack.html) as the starting point for my exercise:

browser position chr22:10000000-10020000
browser hide all
track name=clones description="Clones" visibility=2 color=0,128,0 useScore=1 url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
chr22 10000000 10004000 cloneA 960
chr22 10002000 10006000 cloneB 200
chr22 10005000 10009000 cloneC 700
chr22 10006000 10010000 cloneD 600
chr22 10011000 10015000 cloneE 300
chr22 10012000 10017000 cloneF 100

If I want to replace cloneA, cloneB, ... with GeneA, GeneB, ..., I need to know the start and end positions of each gene, so that the gene (gene symbol) is displayed in the Genome Browser. From the gene symbol labels in the annotation track window, the detail pages (our data) related to the genes can then be displayed. My question is where and how I can get the position information (e.g. 10000000, 10004000) for each gene, or whether there is another way to do this.

Thanks a lot for your help,

Wei

-----Original Message-----
From: Hiram Clawson [mailto:hiram at soe.ucsc.edu]
Sent: Thursday, November 15, 2007 12:48 PM
To: Yu, Wei (CDC/CCHP/NOPHG)
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Convert to chr position numbers

Good Morning:

Can you clarify what you are referring to ? Which genome assembly are you viewing ? What starting and ending positions do you mean ? What do you mean by "chr position numbers" ?

When you see a gene of interest in the genome browser, click on the gene to obtain more information about that gene. You should find all information about the gene by clicking on the gene in the genome browser display image.

--Hiram

Yu, Wei (CDC/CCHP/NOPHG) wrote:
> Hi,
>
> I am interested in building a customized track for our data. I have a
> question regarding to how to obtain the starting and ending positions
> on a gene, like chr7 127471196 127472363.
You help would be highly
> appreciated.
> Thanks,
> Wei

From kayla at soe.ucsc.edu Thu Nov 15 15:11:51 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Thu, 15 Nov 2007 15:11:51 -0800
Subject: [Genome] Convert to chr position numbers
In-Reply-To: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDC@LTA3VS002.ees.hhs.gov>
References: <73C02CEE8005C24F8421250C1AFB2AAB2BFBDA@LTA3VS002.ees.hhs.gov>
 <473C8635.9090802@soe.ucsc.edu>
 <73C02CEE8005C24F8421250C1AFB2AAB2BFBDC@LTA3VS002.ees.hhs.gov>
Message-ID: <473CD237.8000307@cse.ucsc.edu>

Hello Wei,

Here is an approach to your problem that uses the Table Browser ("Tables" on the blue bar at the top of the main page). Use the following settings (most of which are the defaults):

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: UCSC Genes
table: knownGene
region: genome
output format: selected fields from primary and related tables

Click "get output". Check the boxes next to chrom, txStart, and txEnd, then scroll down to hg18.kgXref and click "allow selection from checked tables". The field you want is hg18.kgXref.geneSymbol. Check that, then click "get output".

The output looks like this:

#hg18.knownGene.chrom  hg18.knownGene.txStart  hg18.knownGene.txEnd  hg18.kgXref.geneSymbol
chr1                   1736                    4121                  BC032353
chr1                   4558                    14764                 DKFZp434K1323
chr1                   5658                    7231                  FAM39B

You can then incorporate this information into your Custom Track with whatever other metadata you have.

Also, if you're just interested in a subset of genes, and you know their names, you can use the "filter" feature: filter from linked tables (use hg18.kgXref again) and set "geneSymbol DOES MATCH [paste in your list of genes here]".

I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

Yu, Wei (CDC/CCHP/NOPHG) wrote:
> Dear Hiram,
>
> Thanks for your quick response. I am sorry I did not make my question
> clear. Let me try again.
Actually I would like to try to build a custom > track using Genome Browser. I used the example 4 in your how-to page > (http://genome.ucsc.edu/goldenPath/help/customTrack.html) to start my > excise, > > browser position chr22:10000000-10020000 > browser hide all > track name=clones description="Clones" visibility=2 > color=0,128,0 useScore=1 > url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$" > chr22 10000000 10004000 cloneA 960 > chr22 10002000 10006000 cloneB 200 > chr22 10005000 10009000 cloneC 700 > chr22 10006000 10010000 cloneD 600 > chr22 10011000 10015000 cloneE 300 > chr22 10012000 10017000 cloneF 100 > > > If I want to replace CloneA, CloneB...with GeneA, GeneB.., I need to > know the starting positions and ending positions of any given genes, so > that the gene (gene symbol) can display in Genome Browser. From the gene > symbol labels in the annotation track window, the detail pages (our > data) related to the genes can be displayed. My question is where and > how I could get the positioning information (e.g. 100000000, 10004000) > for each gene or if there is other way to do this. > > Thanks a lot for your help, > > Wei > > > > -----Original Message----- > From: Hiram Clawson [mailto:hiram at soe.ucsc.edu] > Sent: Thursday, November 15, 2007 12:48 PM > To: Yu, Wei (CDC/CCHP/NOPHG) > Cc: genome at soe.ucsc.edu > Subject: Re: [Genome] Convert to chr position numbers > > Good Morning: > > Can you clarify what you are referring to ? Which genome assembly > are you viewing ? What starting and ending positions do you mean ? > What do you mean by "chr position numbers" ? > > When you see a gene of interest in the genome browser, click on the gene > to obtain more information about that gene. > You should find all information about the gene by clicking on the gene > in the genome browser display image. > > --Hiram > > Yu, Wei (CDC/CCHP/NOPHG) wrote: >> Hi, >> >> I am interested in building a customized track for our data. 
I have a >> question regarding to how to obtain the starting and ending positions >> on a gene, like chr7 127471196 127472363. You help would be highly >> appreciated. >> Thanks, >> Wei > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Thu Nov 15 15:38:35 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 15 Nov 2007 15:38:35 -0800 Subject: [Genome] Precomputed TFBS for mouse In-Reply-To: References: <47336946.3010906@cse.ucsc.edu> Message-ID: <473CD87B.3000305@cse.ucsc.edu> Hello Diego, One of our developers has recommended this method: If his files have a name column with unique values (each name is used for only one item), then the UNIX sort and join programs could do this. The steps would be: 1. Make a sorted-by-name version of the hg17 file. The UNIX command sort -k can do this. For example, in a BED file the name is in the 4th column so "sort -k4,4" would do the trick. 2. Similarly, make a sorted-by-name version of the hg18 file. 3. Run join -j , e.g. join -j 4 for BED, with -o and an output format suitable for the input files (man join to see the format and other options). I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Diego Miranda wrote: > Hello, > > I was using the downloadable liftOver facility under Linux--the program > works perfect. However, since n input lines may map to n-m coordinates plus > m mismatches, how do I reconstruct it? Is there a way I could make liftOver > give me e.g. > > chromosome hg17_start hg17_end mapto:hg18_start mapto:hg18_end > > Many thanks, > > With best wishes from Cambridge, > > diego > > On Nov 8 2007, Kayla Smith wrote: > >> Hello Diego, >> >> We do not have any plans to create a TFBS track for the mouse. 
One >> thing you might try is our liftOver utility >> (http://genome.ucsc.edu/cgi-bin/hgLiftOver), to convert human TFBS >> coordinates to mouse coordinates. >> >> The details page from the human TFBS track has detailed information on >> the generation of this track, and a link to Biobase, which might be >> useful to look into: >> http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=tfbsConsSites >> >> I hope this information is helpful to you. Please don't hesitate to >> contact us again if you require further assistance. >> >> Kayla Smith >> UCSC Genome Bioinformatics Group >> >> >> Diego Miranda wrote: >>> Hello, >>> >>> I am in need of the precomputed TFBS for the mouse genome (same format >>> as they are provided for the human genome). I guess they are not >>> available on the UCSC Genome Browser....would you have them somewhere >>> for me, please? >>> >>> Thanks a lot for your help. >>> >>> With best wishes from Cambridge, >>> >>> diego >>> >>> >> > From archanat at soe.ucsc.edu Fri Nov 16 11:02:59 2007 From: archanat at soe.ucsc.edu (Archana Thakkapallayil) Date: Fri, 16 Nov 2007 11:02:59 -0800 Subject: [Genome] Testing multiple oligoes in in-silico PCR In-Reply-To: <473DCF1A.9030500@cse.ucsc.edu> References: <473DCF1A.9030500@cse.ucsc.edu> Message-ID: <473DE963.7070404@soe.ucsc.edu> Hello Martin, Please see this previously answered mailing list question on the same topic: http://www.soe.ucsc.edu/pipermail/genome/2007-March/013015.html I hope that this helps you. Please let us know if you have further questions. Regards, Archana UCSC Genome Bioinf