From DSinicropi at genomichealth.com Thu Feb 1 12:15:45 2007 From: DSinicropi at genomichealth.com (Dominick Sinicropi) Date: Thu, 1 Feb 2007 12:15:45 -0800 Subject: [Genome] Question RE: gene sorter Message-ID: <0E87BFD3E2CC90489F5343C1207CEFAF01F5BC7B@POSTOFFICE01.genomichealth.com> Hello, I am exploring the use of the three P2P options in the gene sorter: M. Vidal P2P, E. Wanker P2P, HPRD P2P. I am unable to find any descriptive information about these databases in the Help section of the Gene Sorter. Can you please provide references or descriptive information? Thank you, Dominick Dominick Sinicropi, Ph.D. Senior Scientist Genomic Health, Inc. Redwood City, CA 94063 Tel: 650-569-2227 email: dsinicropi at genomichealth.com ______________________________________________________________________ The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmaster at genomichealth.com and delete this message, along with any attachments, from your computer. From michael.cary at ucsf.edu Thu Feb 1 12:22:10 2007 From: michael.cary at ucsf.edu (Michael P. Cary) Date: Thu, 01 Feb 2007 12:22:10 -0800 Subject: [Genome] MySQL access to genome browser question Message-ID: Hi, Is there a way to retrieve DNA sequences directly from a MySQL query? E.g., is it possible to write a query to retrieve the sequence from C. elegans chromosome IV on the positive strand from position 90,232 - 90,554? I need to do this for ~100 or so sequences at a time, so using the manual interface is not really an option for me. Also, is there a FAQ or something that provides more detail on the direct MySQL interface? 
Thanks, Mike From hiram at soe.ucsc.edu Thu Feb 1 12:52:43 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 1 Feb 2007 12:52:43 -0800 Subject: [Genome] MySQL access to genome browser question In-Reply-To: References: Message-ID: <4323b09bc3cb21b634684017868594a6@soe.ucsc.edu> Good Afternoon Mike: No, the genome sequence is not in MySQL. We keep that in files. For C. elegans you can pick up the sequence at: nib files: ftp://hgdownload.cse.ucsc.edu/gbdb/ce2/nib/ fasta files: ftp://hgdownload.cse.ucsc.edu/goldenPath/ce2/chromosomes/ And then with those files, using the kent source tree utilities faFrag or nibFrag you can extract any bits of sequence you want. This assumes you build the kent source tree: http://genome.ucsc.edu/admin/jk-install.html --Hiram On 2007 Feb 01, , at 12:22 PM, Michael P. Cary wrote: > Hi, > > Is there a way to retrieve DNA sequences directly from a MySQL query? > E.g., > is it possible to write a query to retrieve the sequence from C. > elegans > chromosome IV on the positive strand from position 90,232 - 90,554? I > need > to do this for ~100 or so sequences at a time, so using the manual > interface > is not really an option for me. > > Also, is there a FAQ or something that provides more detail on the > direct > MySQL interface? > > Thanks, > Mike From galt at soe.ucsc.edu Thu Feb 1 13:30:54 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Thu, 1 Feb 2007 13:30:54 -0800 (PST) Subject: [Genome] Question RE: gene sorter In-Reply-To: <0E87BFD3E2CC90489F5343C1207CEFAF01F5BC7B@POSTOFFICE01.genomichealth.com> References: <0E87BFD3E2CC90489F5343C1207CEFAF01F5BC7B@POSTOFFICE01.genomichealth.com> Message-ID: click on the column headers in gene sorter to see the full help page with references -Galt On Thu, 1 Feb 2007, Dominick Sinicropi wrote: > Hello, > > I am exploring the use of the three P2P options in the gene sorter: M. > Vidal P2P, E. Wanker P2P, HPRD P2P. 
I am unable to find any descriptive > information about these databases in the Help section of the Gene > Sorter. Can you please provide references or descriptive information? > Thank you, > > Dominick > > > > Dominick Sinicropi, Ph.D. > Senior Scientist > Genomic Health, Inc. > Redwood City, CA 94063 > > Tel: 650-569-2227 > email: dsinicropi at genomichealth.com > > > > ______________________________________________________________________ > The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmaster at genomichealth.com and delete this message, along with any attachments, from your computer. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From g-buehler at northwestern.edu Thu Feb 1 13:34:57 2007 From: g-buehler at northwestern.edu (Guenter Albrecht-Buehler) Date: Thu, 01 Feb 2007 15:34:57 -0600 Subject: [Genome] C.elegans genome Message-ID: <6.0.0.22.2.20070201152446.026a6ed0@casbah.it.northwestern.edu> Your C.elegans chromosome sequences (derived from 'chromFa.zip' ) contains a number of '.' (hex 0A) entries. What do they stand for? Sincerely Guenter Albrecht-Buehler Guenter Albrecht-Buehler,Ph.D. Robert Laughlin Rea Professor Department of Cell and Molecular Biology Northwestern University Medical School 303 E. 
Chicago Ave Chicago, IL 60611 Tel.:(312)-503-4261 Fax.:(312)-503-7912 e-mail: g-buehler at northwestern.edu http://www.basic.northwestern.edu/g-buehler/cellint0.htm From rhead at soe.ucsc.edu Thu Feb 1 13:52:36 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 01 Feb 2007 13:52:36 -0800 Subject: [Genome] Downloading triCas2 genome assembly In-Reply-To: <3658dd5f0701311718o5c008c9ft5be475a75238594f@mail.gmail.com> References: <3658dd5f0701311718o5c008c9ft5be475a75238594f@mail.gmail.com> Message-ID: <45C26124.90503@soe.ucsc.edu> Hello Namshin Kim, The triCas2 data set has now been added to our downloads page: http://hgdownload.cse.ucsc.edu/downloads.html#triCas Thank you for bringing this to our attention. -- Brooke Rhead UCSC Genome Bioinformatics Group Namshin Kim wrote: > Hi, > > I am working with dm2 multiz15way alignment and I tried to download > all genomes sequences for 15 species at hgdownload.cse.ucsc.edu. > There is one problem. I couldn't find triCas2 genome assembly. > I know I can download triCas2 genome sequences from Beetle > genome sequencing project or NCBI ftp genomes. > But, I do know there is slight changes when dealing with small scaffolds. > What I need is the exact genome assembly used for generating dm2 > multiz15way alignments. Would you open ftp site or upload the > genome assembly for me? > > Thanks, > Namshin Kim > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Thu Feb 1 14:26:48 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Thu, 01 Feb 2007 14:26:48 -0800 Subject: [Genome] Automatically blast and annotation tools available? 
In-Reply-To: <7.0.0.16.2.20070131105411.02566b68@llnl.gov> References: <7.0.0.16.2.20070131105411.02566b68@llnl.gov> Message-ID: <45C26928.8060909@soe.ucsc.edu> Hello Shan, We do not have any tools in place that will automatically BLAT genome sequences and return associated gene names. Also, the common gene names displayed in our gene tracks are not necessarily the most recent gene names available -- they are the names that were available when the track was made. For the most recent gene names, try the HUGO Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/ However, you may still wish to use BLAT to locate genes associated with your sequences. You can do this by BLATing your sequences, putting them into a custom track, and then intersecting your custom track with one of the gene tracks in the Genome Browser to get the gene name. If you wish to do this, BLAT your sequences against the assembly of interest, but instead of using "output type: hyperlink" use "output type: psl no header". You can then copy and paste the BLAT output into the Custom Track form and load the BLAT output as a custom track. Once this is done, go to the Table Browser and select the gene track you wish to use. Create an intersection of the gene table with your custom track, and select the option for "All [gene track] records that have any overlap with [your custom track]". Hit submit, then choose BED as the output format (you cannot get all fields or selected fields from a table that is being intersected). The resulting BED file should list the genes that intersect with your sequences. This output will have a name field, but in most cases the name will be an accession number, not a common name. You will need to get the common name from another table. If your list of genes came from the Known Genes or RefSeq Genes tracks, the table 'kgXref' can be used to convert the gene IDs to gene names (look for the 'geneSymbol' field in the 'kgXref' table). 
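As a rough illustration of that last conversion step, here is a minimal Python sketch. It assumes you have saved the 'kgXref' table as a tab-separated text file whose first line names the columns (optionally prefixed with '#'), and that the BED output from the intersection is tab- or space-separated with the accession in the fourth column. The key column name `kgID` and both function names are assumptions made for this example, so check them against the header of your own export.

```python
import csv


def load_symbol_map(handle, id_field="kgID", symbol_field="geneSymbol"):
    """Build {gene ID: gene symbol} from a tab-separated kgXref export.

    Assumes the first line is a header naming the columns, optionally
    prefixed with '#'. The default id_field is an assumption; pass the
    column that matches the IDs in your BED file.
    """
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    reader = csv.DictReader(
        handle, fieldnames=header, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[id_field]: row[symbol_field] for row in reader}


def annotate_bed_names(bed_lines, symbol_map):
    """Yield (accession, symbol) for the name column of each BED line.

    Track and browser lines, comments and blank lines are skipped.
    Accessions missing from the map get an empty symbol.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        name = line.split()[3]
        yield name, symbol_map.get(name, "")
```

Typical use would be to open both files, call `load_symbol_map` on the kgXref export, and then loop over `annotate_bed_names` to print one accession and symbol per line.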
The 'kgXref' table for the Human, March 2006 (hg18) database was created on 3-3-2006. I hope this information is helpful. Please let us know if you have any further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Shan Yang wrote: > Hi, > > Here is our situation. We have a small database of genes. We have > part of the sequence for each of them. One property of the database > is the most recent common gene name of each entry. Since the gene > names are constantly changing, we are wondering if there are any > available tools in the genome browser to automatically blast our gene > sequences and give the current gene annotation in batches? > > Thanks a lot! > > Shan > > Shan Yang, PhD > Genome Biology Division, L-452 > Chemistry, Materials & Life Sciences Directorate (CMLS) > Lawrence Livermore National Laboratory > 7000 East Ave, Livermore, CA, 94550 > > Ph: 925-422-7389 > Fax: 925-422-2099 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From hye at aecom.yu.edu Thu Feb 1 14:59:28 2007 From: hye at aecom.yu.edu (Hilda Ye) Date: Thu, 1 Feb 2007 17:59:28 -0500 Subject: [Genome] WSU cell line Message-ID: <013a01c74654$a06fd7f0$95336281@cb.aecom.yu.edu> Hi, Does anybody know the full name and nature of the "WSU" cell line on the "GNF Expression Atlas 1 Human Data on Affy U95 Chips"? There are several human lymphoma cell lines sharing the WSU prefix but representing different diseases. Thanks a lot! Hilda Ye ------------------------------------ B. Hilda Ye, Ph.D. 
Associate Professor Department of Cell Biology Chanin 302C Albert Einstein College of Medicine Jack and Pearl Resnick Campus 1300 Morris Park Avenue Bronx, New York 10461 Phone: (718)430-3339 FAX: (718)430-8574 Email: hye at aecom.yu.edu Web: http://www.aecom.yu.edu/cellbiology/BCL6/ From hiram at soe.ucsc.edu Thu Feb 1 15:19:46 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Thu, 01 Feb 2007 15:19:46 -0800 Subject: [Genome] C.elegans genome In-Reply-To: <6.0.0.22.2.20070201152446.026a6ed0@casbah.it.northwestern.edu> References: <6.0.0.22.2.20070201152446.026a6ed0@casbah.it.northwestern.edu> Message-ID: <45C27592.3020505@soe.ucsc.edu> Good Afternoon Guenter: Those would be the ASCII "line feed" character (hex 0A), which is an end-of-line marker. These ASCII files have fifty nucleotides per line. What type of editor are you using to look at these files? --Hiram Guenter Albrecht-Buehler wrote: > Your C.elegans chromosome sequences (derived from 'chromFa.zip' ) contains > a number of '.' (hex 0A) entries. What do they stand for? > > Sincerely > > Guenter Albrecht-Buehler > > > Guenter Albrecht-Buehler,Ph.D. > Robert Laughlin Rea Professor > Department of Cell and Molecular Biology > Northwestern University Medical School > 303 E. Chicago Ave > Chicago, IL 60611 > Tel.:(312)-503-4261 > Fax.:(312)-503-7912 > e-mail: g-buehler at northwestern.edu > http://www.basic.northwestern.edu/g-buehler/cellint0.htm From kayla at soe.ucsc.edu Thu Feb 1 16:14:03 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Thu, 01 Feb 2007 16:14:03 -0800 Subject: [Genome] WSU cell line In-Reply-To: <013a01c74654$a06fd7f0$95336281@cb.aecom.yu.edu> References: <013a01c74654$a06fd7f0$95336281@cb.aecom.yu.edu> Message-ID: <45C2824B.8080804@cse.ucsc.edu> Hilda, In the Gene Sorter, you can click on the column headers to get information on them. 
When using the Gene Sorter to view the GNF Atlas 1 data, one thing to look out for is that even if you use the "sort by" pulldown menu to sort by GNF Atlas 1, you might be looking at GNF Atlas 2 data (which is the default display). To view the GNF Atlas 1 data, click on the "configure" button, check the "On" box next to the GNF Atlas 1 data and then click "submit". With the GNF Atlas 1 data displayed, you can click on the column header to give you information on the column. You'll see a link to specific tissue information at the GNF: http://expression.gnf.org/human_annot.html Scroll down to the bottom to find the WSU information. It looks like they're using the "WSU-NHL" cell line. I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Hilda Ye wrote: > Hi, > > Does anybody know the full name and nature of the "WSU" cell line on the > "GNF Expression Atlas 1 Human Data on Affy U95 Chips"? There are several > human lymphoma cell lines sharing the WSU prefix but representing different > diseases. > > > > Thanks a lot! > > > > Hilda Ye > > > > ------------------------------------ > B. Hilda Ye, Ph.D. 
> Associate Professor > Department of Cell Biology > Chanin 302C > Albert Einstein College of Medicine > Jack and Pearl Resnick Campus > 1300 Morris Park Avenue > Bronx, New York 10461 > > > Phone: (718)430-3339 > FAX: (718)430-8574 > Email: hye at aecom.yu.edu > Web: http://www.aecom.yu.edu/cellbiology/BCL6/ > > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From nthiessen at bcgsc.ca Fri Feb 2 10:52:05 2007 From: nthiessen at bcgsc.ca (Nina Thiessen) Date: Fri, 02 Feb 2007 10:52:05 -0800 Subject: [Genome] including single quotes in track descriptions In-Reply-To: References: Message-ID: <45C38855.4000007@bcgsc.ca> greetings, Is there a way to have single quotes appear in a track description? When I submit a track with a description string such as track ... description="Something 'fun' to look at"... The title of the track appears in the genome browser image without the quotes, ie: Something fun to look at thank you, Nina From hiram at soe.ucsc.edu Fri Feb 2 11:14:55 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Fri, 02 Feb 2007 11:14:55 -0800 Subject: [Genome] including single quotes in track descriptions In-Reply-To: <45C38855.4000007@bcgsc.ca> References: <45C38855.4000007@bcgsc.ca> Message-ID: <45C38DAF.8050207@soe.ucsc.edu> Good Morning Nina: Sorry, you can't (or rather, can not) have any quotes, ' or ", in the descriptions or name of the track. We don't have a quote quoting mechanism in place for that business. --Hiram Nina Thiessen wrote: > greetings, > > Is there a way to have single quotes appear in a track description? > > When I submit a track with a description string such as > track ... description="Something 'fun' to look at"... 
> The title of the track appears in the genome browser image without the > quotes, ie: Something fun to look at > > thank you, > Nina From Klebig.Mitch at mayo.edu Fri Feb 2 14:57:25 2007 From: Klebig.Mitch at mayo.edu (Klebig, Mitchell L. Ph.D.) Date: Fri, 2 Feb 2007 16:57:25 -0600 Subject: [Genome] Can I get percent identities or similar scores from the conservation plots? Message-ID: <59FE3E128F4DDB4F86390A7093D4ED43124D4E@msgebe11.mfad.mfroot.org> I would like to obtain a sequence conservation score, such as the percent identity scores obtained from BLAST analysis, for a given region of the human genome that is shown as having sequence conservation with another species in the browser. If this can be done, please send me instructions on how to proceed. (Note: I am not a bioinformaticist.) Thank you. From rhead at soe.ucsc.edu Fri Feb 2 15:40:22 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 02 Feb 2007 15:40:22 -0800 Subject: [Genome] Can I get percent identities or similar scores from the conservation plots? In-Reply-To: <59FE3E128F4DDB4F86390A7093D4ED43124D4E@msgebe11.mfad.mfroot.org> References: <59FE3E128F4DDB4F86390A7093D4ED43124D4E@msgebe11.mfad.mfroot.org> Message-ID: <45C3CBE6.4000105@soe.ucsc.edu> Hello Mitchell, You could use BLAT (the BLAST-Like Alignment Tool) to align sequences from another species to the human genome; BLAT will output a percent identity for the alignment. Instructions for using BLAT are here: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign Note the limitations on using BLAT: "Only DNA sequences of 25,000 or fewer bases and protein or translated sequence of 10000 or fewer letters will be processed. Up to 25 sequences can be submitted at the same time. The total limit for multiple sequence submissions is 50,000 bases or 25,000 letters." If you have further questions, or if this method is not a good solution for you, please feel free to contact us again. 
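If you plan to submit several DNA sequences at once, a small script can check them against the limits quoted above before you paste them into the BLAT form. The following is only a sketch in Python: the three numbers are the DNA limits from the help text quoted in this message, while the function names and the FASTA parsing are my own assumptions rather than anything provided by the Genome Browser.

```python
# DNA limits as quoted from the BLAT help text in this message.
MAX_BASES_PER_SEQ = 25_000
MAX_SEQS = 25
MAX_TOTAL_BASES = 50_000


def read_fasta(lines):
    """Return a list of (name, sequence) tuples from FASTA-formatted lines."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def blat_limit_problems(records):
    """Return human-readable messages for every DNA limit that is exceeded."""
    problems = []
    if len(records) > MAX_SEQS:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQS})")
    for name, seq in records:
        if len(seq) > MAX_BASES_PER_SEQ:
            problems.append(
                f"{name}: {len(seq)} bases (limit {MAX_BASES_PER_SEQ})"
            )
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_BASES:
        problems.append(f"total {total} bases (limit {MAX_TOTAL_BASES})")
    return problems
```

An empty list from `blat_limit_problems` means the batch fits within the quoted DNA limits; protein submissions would need the separate letter limits.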
-- Brooke Rhead UCSC Genome Bioinformatics Group Klebig, Mitchell L. Ph.D. wrote: > I would like to obtain a sequence conservation score, such as the > percent identity scores obtained from BLAST analysis, for a given region > of the human genome that is shown as having sequence conservation with > another species in the browser. If this can be done, please send me > instructions on how to proceed. (Note: I am not a bioinformaticist.) > Thank you. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From manning at salk.edu Fri Feb 2 09:28:20 2007 From: manning at salk.edu (Gerard Manning) Date: Fri, 2 Feb 2007 17:28:20 +0000 Subject: [Genome] Downloading specific sequence region Message-ID: <22D2B780-7230-4524-ADCD-A9F0DB65848B@salk.edu> Hi, I have what I suspect is a very dumb question, but one that I haven't been able to answer after several trawls through your website. I'd like to be able to download the sequence from any region of any genome that you publish, ideally by pasting in the co-ordinates in the style used by the genome browser. I can find all kinds of areas where I can use those co-ordinates to get rich annotations and data, but not the raw sequence. My main use for the sequence is to run homology-based gene predictions (genewise and the like) and to search for remote regions of homology in the areas around partial gene predictions. My apologies if this is a FAQ, any help you can give would be very much appreciated. -Gerard Manning. 
From rhead at soe.ucsc.edu Fri Feb 2 18:51:27 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 02 Feb 2007 18:51:27 -0800 Subject: [Genome] Downloading specific sequence region In-Reply-To: <22D2B780-7230-4524-ADCD-A9F0DB65848B@salk.edu> References: <22D2B780-7230-4524-ADCD-A9F0DB65848B@salk.edu> Message-ID: <45C3F8AF.5070602@soe.ucsc.edu> Hello Gerard, We do have an FAQ writeup on this one: http://genome.ucsc.edu/FAQ/FAQdownloads#download32 Please let us know if you have any further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Gerard Manning wrote: > Hi, > I have what I suspect is a very dumb question, but one that I > haven't been able to answer after several trawls through your website. > I'd like to be able to download the sequence from any region of any > genome that you publish, ideally by pasting in the co-ordinates in > the style used by the genome browser. I can find all kinds of areas > where I can use those co-ordinates to get rich annotations and data, > but not the raw sequence. My main use for the sequence is to run > homology-based gene predictions (genewise and the like) and to search > for remote regions of homology in the areas around partial gene > predictions. > > My apologies if this is a FAQ, any help you can give would be very > much appreciated. > > -Gerard Manning. > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From Noel.Buckley at iop.kcl.ac.uk Sat Feb 3 15:23:05 2007 From: Noel.Buckley at iop.kcl.ac.uk (Buckley, Noel) Date: Sat, 3 Feb 2007 23:23:05 -0000 Subject: [Genome] custom track Message-ID: <7AC047D44A8CD243B7FE9305102D5FD104F6F551@MAIL.bc.iop.kcl.ac.uk> Can you tell me if and how it is possible to retain a custom track uploaded as a BED file so that it doesn't disappear each time I close down the UCSC browser (this doesn't happen with DAS files on the ensembl browser). 
Many Thanks Noel Buckley ---- Noel Buckley Professor of Molecular Neurobiology Centre for the Cellular Basis of Behaviour The James Black Centre; Rm 1-045 King's College London Institute of Psychiatry 125 Coldharbour Lane London SE5 9NU T 020 7848 0784 F 020 7848 5308 E noel.buckley at iop.kcl.ac.uk W www.iop.kcl.ac.uk/iopweb/departments/home/default.aspx?locator=624 From hiram at soe.ucsc.edu Sat Feb 3 16:08:24 2007 From: hiram at soe.ucsc.edu (Hiram Clawson) Date: Sat, 3 Feb 2007 16:08:24 -0800 Subject: [Genome] custom track In-Reply-To: <7AC047D44A8CD243B7FE9305102D5FD104F6F551@MAIL.bc.iop.kcl.ac.uk> References: <7AC047D44A8CD243B7FE9305102D5FD104F6F551@MAIL.bc.iop.kcl.ac.uk> Message-ID: <42558af3700f791b94ee606569c17234@soe.ucsc.edu> Good Afternoon Noel: The custom track lifetime is approximately two days after it was last referenced. This assumes your web browser is saving the UCSC cookie. To make your custom track more permanent, place your data on a web server under your control, and use a URL to reference your data from the UCSC genome browser. If your custom track was created in the table browser, then use the table browser to give you your track as a custom track data file, and save that locally. Load that file as a custom track again if the track has expired after the two-day limit at UCSC. --Hiram On 2007 Feb 03, , at 3:23 PM, Buckley, Noel wrote: > Can you tell me if and how it is possible to retain a custom track > uploaded as a BED file so that it doesn't disappear each time I close > down the UCSC browser (this doesn't happen with DAS files on the > ensembl browser). 
> > Many Thanks > > Noel Buckley > ---- > Noel Buckley > Professor of Molecular Neurobiology > Centre for the Cellular Basis of Behaviour > The James Black Centre; Rm 1-045 > King's College London > Institute of Psychiatry > 125 Coldharbour Lane > London SE5 9NU > > T 020 7848 0784 > F 020 7848 5308 > E noel.buckley at iop.kcl.ac.uk > W www.iop.kcl.ac.uk/iopweb/departments/home/default.aspx?locator=624 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From shalmom1 at mail.biu.ac.il Sun Feb 4 04:06:43 2007 From: shalmom1 at mail.biu.ac.il (Mali Salmon-Divon) Date: Sun, 4 Feb 2007 14:06:43 +0200 Subject: [Genome] TFBS conserved in mouse sequences Message-ID: <000101c7485c$4dec6e80$6a294684@ad.biu.ac.il> Dear Sir I have two questions: 1. I want to find specific conserved TF binding sites upstream of mouse sequences (I did the same for human sequences). However, your mouse database does not contain the TFBS Conserved track (although the data exist, since you did the comparison for the human database). Is it possible to search for conserved TFBS using mouse sequences? 2. One solution I thought about is to take the human homologs of my mouse sequences and to search for the TF binding sites on the human sequences. Please advise how I can retrieve the human homolog accession numbers from the table browser (using mouse accession numbers). Looking forward to your answer Thanks a lot Mali Mali Salmon Bar Ilan University, Israel From michael.cahill at proteosys.com Mon Feb 5 04:21:55 2007 From: michael.cahill at proteosys.com (Michael A. Cahill) Date: Mon, 5 Feb 2007 13:21:55 +0100 Subject: [Genome] Incorrectly annotated link Message-ID: Dear Sir/Madam, I would like to report an error in your bioinformatics 'UCSC Proteome Browser' server for the entry of Protein O00264 (aka PGRC1_HUMAN or PGC1_HUMAN) Membrane associated progesterone receptor component 1 (mPR). 
http://harvester.embl.de/harvester/Q6IB/Q6IB11.htm#PROTEOME The above page concerns the HGNC protein designated PGRMC1. The following link is erroneous. "Pathways: BioCarta - h_mPRPathway (http://cgap.nci.nih.gov/Pathways/BioCarta/h_mPRPathway) - How Progesterone Initiates the Oocyte Maturation" This linked 'mPR' pathway refers to a seven membrane receptor G-coupled protein also called mPR (membrane progestin receptor), which is NOT the cytochrome B5-containing Membrane Progesterone Receptor (mPR/PGRMC1). See the following text from a review I have written on PGRMC1. Cahill MA. 2007. Progesterone Receptor Membrane Component 1: an integrative review. J. Ster. Biochem. Mol. Biol. in press (probably to appear in the April 2007 edition). mPR: nomenclatural booby trap Note that another protein (in addition to mPR/PGRMC1), termed membrane progestin receptor, has also been denoted as mPR in the literature. This different gene product is a G-protein coupled seven membrane domain progestin receptor found from fish to mammals that conveys non-genomic effects of progesterone, and is otherwise not at all related to PGRMC1 or PR [16]. There are currently eleven mammalian members belonging to this separate gene family, which has been named the PAQR family, after two of the initially described ligands (progestin and adipoQ receptors) [17]. References: 16. Y. Zhu, J. Bond, P. Thomas, Identification, classification, and partial characterization of genes in humans and other vertebrates homologous to a fish membrane progestin receptor, Proc. Natl. Acad. Sci. U. S. A. 100 (5) (2003) 2237-2242. 17. Y.T. Tang, T. Hu, M. Arterburn, B. Boyle, J.M. Bright, P.C. Emtage, W.D. Funk, PAQR proteins: a novel membrane receptor family defined by an ancient 7-transmembrane pass motif, J. Mol. Evol. 61 (3) (2005) 372-380. Kind regards, Mike Cahill. ________________________________________________________ PROTEOSYS AG - Carl-Zeiss-Str 51 - 55129 Mainz - Germany. 
Michael Cahill, Chief Research Officer, tel: +49(0)61 31-50 192 25. fax: +49(0)61 31-50 192 11. www.proteosys.com. From Mike.Mitchell at cancer.org.uk Mon Feb 5 08:28:54 2007 From: Mike.Mitchell at cancer.org.uk (Mike Mitchell) Date: Mon, 05 Feb 2007 16:28:54 +0000 Subject: [Genome] Incorrectly annotated link In-Reply-To: Message-ID: Hello Mike, All the links that you posted about are external to the UCSC Genome Browser, and are therefore not within their remit to correct. I would suggest that you contact the curators of the external sites themselves with your concerns. On 5/2/07 12:21, "Michael A. Cahill" wrote: > Dear Sir/Madam, > > I would like to report an error in your bioinformatics 'UCSC Proteome > Browser' server for the entry of Protein O00264 (aka PGRC1_HUMAN or > PGC1_HUMAN) Membrane associated progesterone receptor component 1 (mPR). -- Mike Mitchell Bioinformatics & Biostatistics Service Cancer Research UK +44 (0) 207 269 3115 From msuder at MCB.McGill.CA Mon Feb 5 09:02:37 2007 From: msuder at MCB.McGill.CA (msuder@MCB.McGill.CA) Date: Mon, 5 Feb 2007 12:02:37 -0500 (EST) Subject: [Genome] RN4ToRN3 liftover file Message-ID: <52376.132.206.211.127.1170694957.squirrel@mail.mcb.mcgill.ca> Hi, I have coordinates in rn4 and would like to do a "liftover" to rn3. Unfortunately, the chain file does not appear to be available. Would it be possible for you to generate that file? Thanks! Matt From fanhsu at soe.ucsc.edu Mon Feb 5 09:29:56 2007 From: fanhsu at soe.ucsc.edu (Fan Hsu) Date: Mon, 5 Feb 2007 09:29:56 -0800 Subject: [Genome] Incorrectly annotated link In-Reply-To: Message-ID: Hi Michael, Thanks for pointing this out. I looked into it and found that the pathway link of this gene, PGRMC1 (NM_006667/O00264), was based on data downloaded from CGAP, ftp://ftp1.nci.nih.gov/pub/CGAP/Hs_GeneData.dat I believe CGAP/NCI gets this data from BioCarta. As Mike Mitchell points out, we do not manually curate data that we obtain from third parties. 
Please contact the original data source, CGAP (and/or BioCarta) directly to correct their downloadable data file(s). We are in the process of building our next release of UCSC Known Genes (KG). If CGAP updates their file soon, the correction might make it to our next KG release. Otherwise, it should show up on the following release. Fan. -----Original Message----- From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On Behalf Of Michael A. Cahill Sent: Monday, February 05, 2007 4:22 AM To: genome at soe.ucsc.edu Subject: [Genome] Incorrectly annotated link Dear Sir/Madam, I would like to report an error in your bioinformatics 'UCSC Proteome Browser' server for the entry of Protein O00264 (aka PGRC1_HUMAN or PGC1_HUMAN) Membrane associated progesterone receptor component 1 (mPR). http://harvester.embl.de/harvester/Q6IB/Q6IB11.htm#PROTEOME The above page concerns the HGNC protein designated PGRMC1. The following link is erroneous. "Pathways: BioCarta - h_mPRPathway (http://cgap.nci.nih.gov/Pathways/BioCarta/h_mPRPathway) - How Progesterone Initiates the Oocyte Maturation" This linked 'mPR' pathway refers to a seven membrane receptor G-coupled protein also called mPR (membrane progestin receptor), which is NOT the cytochrome B5-containing Membrane Progesterone Receptor (mPR/PGRMC1). See the following text from a review I have written on PGRMC1. Cahill MA. 2007. Progesterone Receptor Membrane Component 1: an integrative review. J. Ster. Biochem. Mol. Biol. in press (probably to appear in the April 2007 edition). mPR: nomenclatural booby trap Note that another protein (in addition to mPR/PGRMC1), termed membrane progestin receptor, has also been denoted as mPR in the literature. This different gene product is a G-protein coupled seven membrane domain progestin receptor found from fish to mammals that conveys non-genomic effects of progesterone, and is otherwise not at all related to PGRMC1 or PR [16]. 
There are currently eleven mammalian members belonging to this separate gene family, which has been named the PAQR family, after two of the initially described ligands (progestin and adipoQ receptors) [17]. References: 16. Y. Zhu, J. Bond, P. Thomas, Identification, classification, and partial characterization of genes in humans and other vertebrates homologous to a fish membrane progestin receptor, Proc. Natl. Acad. Sci. U. S. A. 100 (5) (2003) 2237-2242. 17. Y.T. Tang, T. Hu, M. Arterburn, B. Boyle, J.M. Bright, P.C. Emtage, W.D. Funk, PAQR proteins: a novel membrane receptor family defined by an ancient 7-transmembrane pass motif, J. Mol. Evol. 61 (3) (2005) 372-380. Kind regards, Mike Cahill. ________________________________________________________ PROTEOSYS AG - Carl-Zeiss-Str 51 - 55129 Mainz - Germany. Michael Cahill, Chief Research Officer, tel: +49(0)61 31-50 192 25. fax: +49(0)61 31-50 192 11. www.proteosys.com. _______________________________________________ Genome maillist - Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From jtlee at microbio.ucla.edu Mon Feb 5 13:49:28 2007 From: jtlee at microbio.ucla.edu (Lee, Jason Thanh) Date: Mon, 5 Feb 2007 13:49:28 -0800 Subject: [Genome] Data download Message-ID: Dear UCSC Genome Browser staff, I have two questions, the first having higher priority. 1. After finding genes in a particular chromosome region, is there a simple way to download the list of genes found (eg. *.txt format)? 2. Is there an automated way to search using a list of chromosome positions as input, and download the list of genes found per search item as output, keeping the chromosome_location-to-gene correlation? Thank you in advance, Jason Lee UCLA/HHMI 310-825-0169 From rhead at soe.ucsc.edu Mon Feb 5 15:24:21 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 05 Feb 2007 15:24:21 -0800 Subject: [Genome] Data download In-Reply-To: References: Message-ID: <45C7BCA5.8090104@soe.ucsc.edu> Hello Jason, 1. 
The Table Browser can be used to download the genes in a particular region. To use it, go to the Table Browser (the blue "Tables" link at the top of the page) and select the clade, genome, and assembly of interest. Then choose "group: Genes and Gene Prediction Tracks", and the gene track you wish to use (Known Genes, RefSeq Genes, etc.). Use the first table listed in the drop-down menu (the main table for a track is always listed first). Be sure that "region: position" is selected, and that your region of interest is entered in the position box. Choose "output format: selected fields from primary and related tables". Enter a name for your output file if you want to download the results. Hit "get output" and select the 'name' field, plus any additional fields you would like to output. Hit "get output" again. You should get a list of all of the genes in the region you specified. 2. To search the Genome Browser using a list of chromosome regions, first create a custom track of the regions. You will then be able to intersect your custom track with the gene track of interest using the Table Browser. Help on creating a custom track is located here: http://genome.ucsc.edu/goldenPath/help/customTrack.html For a simple list of chromosome positions, use BED format, described here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED Once the custom track has been created, go to the Table Browser and select the gene track from which you wish to retrieve the gene names. Hit the "intersection: create" button and select your new custom track. Choose one of the intersection options listed under "These combinations will maintain the gene/alignment structure (if any) of [the gene track]". Hit "submit", then choose "output format: BED" and get the output. You should get all records from the gene table you selected (including their positions) that intersect with your custom track. I hope this information is helpful.
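For the custom track in step 2, a minimal sketch of turning a list of browser-style positions into BED lines might look like the following. It assumes positions are written as chrom:start-end with 1-based inclusive coordinates (as shown in the browser position box), and converts them to BED's 0-based, half-open coordinates. The function names, the track name "myRegions", and the region labels are made up for illustration and are not part of any UCSC tool.

```python
def position_to_bed(position, name):
    """Convert 'chr4:90,232-90,554' (1-based, inclusive) to one BED line.

    BED uses a 0-based start and an exclusive end, so only the start shifts by one.
    """
    chrom, span = position.replace(",", "").split(":")
    start, end = (int(x) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def make_custom_track(positions, track_name="myRegions"):
    """Build custom track text: one 'track' line followed by one BED line per position.

    The track name and the region1, region2, ... labels are hypothetical.
    """
    lines = [f'track name={track_name} description="Regions of interest"']
    for i, pos in enumerate(positions, start=1):
        lines.append(position_to_bed(pos, f"region{i}"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_custom_track(["chr13:116,080,957-116,222,842", "chr4:90,232-90,554"]), end="")
```

The resulting text can be pasted into the custom track submission box; keeping a name in the fourth column makes it easier to match each input region to the genes returned by the intersection.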
Please let us know if we can clarify any of the above, or if you have further questions. -- Brooke Rhead UCSC Genome Bioinformatics Group Lee, Jason Thanh wrote: > Dear UCSC Genome Browser staff, > > > > I have two questions, the first having higher priority. > > > > 1. After finding genes in a particular chromosome region, is there a > simple way to download the list of genes found (eg. *.txt format)? > 2. Is there an automated way to search using a list of chromosome > positions as input, and download the list of genes found per search item as > output, keeping the chromosome_location-to-gene correlation? > > > > Thank you in advance, > > > > Jason Lee > > UCLA/HHMI > > 310-825-0169 > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Mon Feb 5 17:47:41 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Mon, 05 Feb 2007 17:47:41 -0800 Subject: [Genome] TFBS conserved in mouse sequences In-Reply-To: <000101c7485c$4dec6e80$6a294684@ad.biu.ac.il> References: <000101c7485c$4dec6e80$6a294684@ad.biu.ac.il> Message-ID: <45C7DE3D.3030505@soe.ucsc.edu> Hello Mali, You are correct that we do not have a TFBS Conserved track for any mouse database. A previously answered question on this topic is here: http://www.cse.ucsc.edu/pipermail/genome/2005-July/008024.html . . . this answer describes how to use the Conservation track to find the syntenic regions between human and mouse. If you would prefer to find the human homologs of your mouse accession numbers and search that way, I can also tell you how to do that. Go to the Table Browser and select the mouse assembly for which you have accession numbers. Select "group: All Tables" and "table: hgBlastTab". (This is a table of alignments between mouse proteins and human proteins.) Hit the "filter: create" button and paste the mouse accession numbers in the "query does match" box.
(This box will actually accept a list of numbers.) Hit "submit" and then "get output". The 'query' field in the result will be the mouse accession that you entered, and the 'target' field will be the homologous human accession. I hope this information is helpful. If you have further questions, please do not hesitate to contact us again. -- Brooke Rhead UCSC Genome Bioinformatics Group Mali Salmon-Divon wrote: > Dear Sir > > I have two questions: > > 1. I want to find specific conserved TF binding sites upstream to > mouse sequences (I did the same for human sequences). However, your > mouse database does not contain the TFBS conserved track (although > the data exist - you did the comparison for the human database). Is > it possible to search for TFBS conserved using mouse sequences? > > > > 2. One solution I thought about is to take the human homologs of my > mouse sequences and to search the TF binding sites on the human > sequences. Please advise how I can retrieve the human homologous > accession numbers from the table browser (using mouse accession > numbers). > > Looking forward to your answer > > Thanks a lot > > Mali > > > > Mali Salmon Bar Ilan University, Israel > _______________________________________________ Genome maillist - > Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From asli_ay at gmx.de Tue Feb 6 03:19:21 2007 From: asli_ay at gmx.de (asli_ay@gmx.de) Date: Tue, 06 Feb 2007 12:19:21 +0100 Subject: [Genome] metylationstatus of cpG-islands Message-ID: <20070206111921.85430@gmx.net> Hello Team of UCSC Human Genome Browser, my name is Aslihan Gerhold-Ay and I am from the Institute for Medical Biometrics, Epidemiology and Informatics in Mainz (Germany). I searched your database for the CpG islands of the human genome; that was very easy :-)) But I would like to know how I can automatically find out the methylation status of the CpG islands. It would be very nice if you could help me. Thank you!
Best regards Aslihan Gerhold-Ay Aslihan Gerhold-Ay Obere Zahlbacher Str. 69 55131 Mainz, Germany Institutsgebäude 902 Tel: 0 049 6131 / 173120 From isabelle.dupanloup at zoo.unibe.ch Tue Feb 6 05:04:23 2007 From: isabelle.dupanloup at zoo.unibe.ch (Isabelle Dupanloup) Date: Tue, 6 Feb 2007 14:04:23 +0100 Subject: [Genome] Expression for human genes Message-ID: <002201c749ef$52bb3380$8f8a5c82@cmpg.unibe.ch> Hi there, I would like to study the expression of human genes using ESTs/cDNAs. I saw that you make available on your website the coordinates of ESTs/cDNAs mapping to the human genome (defined as blat hits). My question is about the best way to retrieve this info using your table browser Group: mRNA and EST track What are the optimal choices for 1) track ? 2) table ? There are many possibilities in there. What I need is the best mapping of ESTs/cDNAs to the human genome. Could you help me to find the way to get it ? Thank you very much in advance. And thank you also for your very nice tool. Isabelle ------------------- Isabelle Dupanloup PhD Computational and Molecular Population Genetics Group Baltzerstrasse 6 3012 Bern Switzerland Tel: +41 31 631 45 49 Fax: +41 31 631 48 88 From leong at wehi.EDU.AU Mon Feb 5 19:12:14 2007 From: leong at wehi.EDU.AU (Dillon Leong) Date: Tue, 6 Feb 2007 14:12:14 +1100 Subject: [Genome] Query: Restriction enzyme digest displays Message-ID: Dear Sir/Mdm, I've been using genome browser (mouse) to design DNA digests for Southern Blots. Recently, I noticed a discrepancy which I hope you can help to clarify. I downloaded the Refseq genomic sequence for mouse itga1 gene into a Microsoft Word file and searched for cleavage sites for restriction enzyme ClaI. I was sure to remove paragraph marks from the sequence. When I compared the cleavage pattern with that in genome browser, I found differences.
The number of cleavage sites remained the same (=5), but the position of the sites and the resulting DNA fragment sizes were different. I can't figure out why this is so, and I've redone my Word document to verify that I haven't made mistakes while transferring the sequence from Refseq. Would you please give it a go and see if you get the same result? Best regards, Dillon From m.gilchrist at gurdon.cam.ac.uk Tue Feb 6 04:47:17 2007 From: m.gilchrist at gurdon.cam.ac.uk (Mike Gilchrist) Date: Tue, 06 Feb 2007 12:47:17 +0000 Subject: [Genome] technical question on "&hgt.customText=" Message-ID: <7.0.1.0.0.20070206112415.01fcdc38@gurdon.cam.ac.uk> Hi, I am working very happily with the custom track display mechanism, but now that I've started to use the &hgt.customText method, I have a couple of rather detailed queries that may affect how I implement it. The method works fine, but it's hard to know how best to use it, as I'm not sure exactly how it works. As far as I can see, these things are not touched on in the help pages, and may be useful for other people... (a) the 'browser position' line in the track data file. If this is left in (as it would be for separate loading via your browser pages) then it seems to override the &position= information earlier in the URL. Is this the intended behaviour? Obviously it's a simple matter to leave the 'browser position' line out of the data file, but then I need separate versions of the data file for uploading via the URL and uploading separately. It would certainly work better for me if the URL position= overrode the 'browser position' line. (b) reloading every time? If a user is given a list of URLs all containing the &hgt.customText instruction to load the (same) custom track data file, obviously the track data gets loaded when the first URL is clicked - but does it then get RE-loaded for every other URL the user clicks on? This would presumably have performance implications, both at the user end, but also for your servers. 
Typically I'm using a track file with ~10,000 data points, and each URL takes about 8 seconds to load including the first one. Without the &hgt.customText tag the same link takes only ~2 seconds. Superficially this indicates that the track data is reloaded each time, which might make the mechanism problematic for larger track data sets. One could work around this with (say) chromosome specific sets when the number of data points gets too large... Anyway, I'd be interested in your thoughts on this. (c) reload if data changed? This may well be wishful thinking, but it would be nice if the track data was reloaded if (and only if) the track data had changed. I think I've confirmed that the track data is reloaded for each click by changing the colour of a track in the track data between clicks - but of course maybe you already check to see if the data file has changed! So obviously the non-reloading mechanism I'm advocating above would only be useful if newer versions of the data were reloaded in place of older versions. Apologies if I'm too demanding - for me it's a sign that you have a great browser and I'd like it to do even more! Thanks for your help. Yours, mike Dr. Mike Gilchrist Core Bioinformatics Group Room 112 The Wellcome Trust/Cancer Research UK Gurdon Institute Cambridge CB2 1QN tel: (44-1223-7) 67210 fax: (44-1223-3) 34089 m.gilchrist at gurdon.cam.ac.uk From vlb2 at cornell.edu Mon Feb 5 18:28:04 2007 From: vlb2 at cornell.edu (Vanessa Bauer) Date: Mon, 5 Feb 2007 21:28:04 -0500 Subject: [Genome] GFF files Message-ID: Hello, We have downloaded intron multi-species alignments for a set of Drosophila loci. We now would like to link these alignments to their corresponding loci. We have deduced that your D. melanogaster coordinates are from release v. 3.2.0. We have not had success retrieving GFF files for version 3.2.0 from FlyBase.
We were hoping you could send us in the right direction with regard to linking downloaded intron alignments back to their corresponding CDS. thanks Vanessa Bauer DuMont From marek.szubert at gmail.com Tue Feb 6 00:10:32 2007 From: marek.szubert at gmail.com (Marek Szubert) Date: Tue, 6 Feb 2007 18:10:32 +1000 Subject: [Genome] elliotsEncode.mod for running phastCons Message-ID: Hi I would like to run phastCons on a subset of the hg18 multiz17way multiple sequence alignments that I have FTPed and wondered if you could send me the contents of the phyloHMM model file : /cluster/store5/gs.18/build35/bed/multiz17way.2005-12-20/cons/elliotsEncode.mod that is used later in running phastCons to produce the conservation scores (named in /kent/src/hg/makeDb/doc/hg18.txt ). It would be good for me to see the phylogenetic tree used and what phastCons rate parameters were used as a basis for my own runs. If possible I would also like to request the two equivalent noncons and cons files for the dm3 multiz15way model files : /cluster/data/dm3/bed/multiz15way/phastCons/run.estimate/ave.noncons.mod and ave.cons.mod Thank you for your help Jan Szubert, University of Queensland. From kate at soe.ucsc.edu Tue Feb 6 09:38:08 2007 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Tue, 6 Feb 2007 09:38:08 -0800 Subject: [Genome] elliotsEncode.mod for running phastCons In-Reply-To: References: Message-ID: <573a851b81af6f7190c922fe1e10ee0a@soe.ucsc.edu> Hello Jan, The model files you've requested are now posted to the relevant phastCons downloads areas on our test server and will soon be available on our public download server.
For now, you can obtain them at: http://genome-test.cse.ucsc.edu/goldenPath/hg18/phastCons17way/ http://genome-test.cse.ucsc.edu/goldenPath/dm3/phastCons15way/ For the hg18 conservation track, the following phastCons rate parameters were used: expected length 14 target coverage .008 rho .28 The full set of command-line options used are documented in the doc/hg18.txt file. Cheers, Kate On Feb 6, 2007, at 12:10 AM, Marek Szubert wrote: > Hi > > I would like to run phastCons on a subset of the hg18 multiz17way > multiple > sequence alignments that I have FTPed and wondered if you could send > me the > contents of the phyloHMM model file : > > /cluster/store5/gs.18/build35/bed/multiz17way.2005-12-20/cons/ > elliotsEnco > de.mod > > that is used later in running phastCons to produce the conservation > scores > (named in /kent/src/hg/makeDb/doc/hg18.txt ). It would be good for me > to see > the phylogenetic tree used and what phastCons rate parameters were > used as a > basis for my own runs. > > If possible I would also like to request the two equivalent noncons > and cons > files for the dm3 multiz15way model files : > /cluster/data/dm3/bed/multiz15way/phastCons/run.estimate/ > ave.noncons.mod > and ave.cons.mod > > Thank you for your help > > Jan Szubert, University of Queensland. > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Feb 6 10:36:17 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 10:36:17 -0800 Subject: [Genome] repStart, repEnd, repLeft in chrN_rmsk table In-Reply-To: <9955EE7A-4872-4BA5-805E-69A2D8C74566@manchester.ac.uk> References: <9955EE7A-4872-4BA5-805E-69A2D8C74566@manchester.ac.uk> Message-ID: <45C8CAA1.4010003@cse.ucsc.edu> Casey, You are correct that the information in the repStart and repLeft fields of the rmsk table is interpreted differently based on which strand is involved. 
It is true that the fields could be better named. The two fields behave as mirror images of each other. Allow me to describe how these fields are related. Of the three coordinates involved (repStart, repEnd, repLeft), repEnd can be thought of as the "middle" coordinate of the three. It is always a positive integer regardless of the strand the repeat element hits. It represents the end coordinate of the matching part of the repeat element in "repeat element coordinates". The "beginning" coordinate of the matching part of a repeat element is repStart (for + oriented hits) or repLeft (for - oriented hits). For + oriented hits, repStart & repEnd are the proper coordinates of the matching part of the element. RepLeft in this case is a numerical value which may be used (via the equation repEnd-repLeft) to obtain the size of the repeat element ("Left" in the sense of "repeat remaining" unaligned). For - oriented hits, repLeft & repEnd are the proper coordinates of the matching part of the element. RepLeft will be upstream from (smaller than) repEnd for these neg-strand alignments. RepStart in this case is a numerical value which may be used (via the equation repEnd-repStart) to obtain the size of the repeat element. I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group Casey Bergman wrote: > Hi - > > Following an old thread posted at genome/2003-November/003500.html> I have a query about the repStart, > repEnd, repLeft fields in the UCSC RepeatMasker tables. > > My concern is that the parsing of RepeatMasker coordinates on the > repeat query sequence may not be consistent for matches on positive > and negative strands of the genome.
As can be seen in the sample rows > at db=hg18&hgta_doSchema=describe+table+schema> and throughout the > genome browser and download files, matches on the negative strand > have a negative repStart value which seems not to be possible, > whereas matches on the positive strand have interpretable integer > coordinates. > > On investigating a few matches on both positive and negative strands, > it appears that start and end coordinates of the query repeat are > stored at UCSC as repStart & repEnd for positive strand matches, but > stored as repLeft and repEnd for negative strand matches. This > appears to be related to differences in the format of RepeatMasker > output for positive and negative strand matches (see below, + vs C > rows). If this interpretation of the situation is correct, the > meaning of repStart, repEnd, repLeft fields changes for positive and > negative strand matches. It would be great to get a second opinion > on this, and if this situation might be flagged for review since the > current format is not terribly intuitive and may not be desired. > > All the best, > Casey > > ************* > > From > Example: >
>  1306 15.6 6.2 0.0 HSU08988 6563 6781 (22462) C MER7A   DNA/MER2_type (0)    336  103
> 12204 10.0 2.4 1.8 HSU08988 6782 7714 (21529) C TIGGER1 DNA/MER2_type (0)    2418 1493
>   279  3.0 0.0 0.0 HSU08988 7719 7751 (21492) + (TTTTA)n Simple_repeat 1     33   (0)
>  1765 13.4 6.5 1.8 HSU08988 7752 8022 (21221) C AluSx   SINE/Alu      (23)   289  1
> 12204 10.0 2.4 1.8 HSU08988 8023 8694 (20549) C TIGGER1 DNA/MER2_type (925)  1493 827
>  1984 11.1 0.3 0.7 HSU08988 8695 9000 (20243) C AluSg   SINE/Alu      (5)    305  1
> 12204 10.0 2.4 1.8 HSU08988 9001 9695 (19548) C TIGGER1 DNA/MER2_type (1591) 827  2
>   711 21.2 1.4 0.0 HSU08988 9696 9816 (19427) C MER7A   DNA/MER2_type (224)  122  2
> This is a sequence in which a Tigger1 DNA transposon has integrated > into a MER7 DNA transposon copy. Subsequently two Alus integrated in > the Tigger1 sequence.
The simple repeat is derived from the poly A of > the Alu element. The first line is interpreted like this: > > 1306 = Smith-Waterman score of the match, usually complexity > adjusted > The SW scores are not always directly comparable. Sometimes > the complexity adjustment has been turned off, and a variety of > scoring-matrices are used. > 15.6 = % substitutions in matching region compared to the > consensus > 6.2 = % of bases opposite a gap in the query sequence (deleted > bp) > 0.0 = % of bases opposite a gap in the repeat consensus > (inserted bp) > HSU08988 = name of query sequence > 6563 = starting position of match in query sequence > 7714 = ending position of match in query sequence > (22462) = no. of bases in query sequence past the ending position > of match > C = match is with the Complement of the consensus sequence > in the database > MER7A = name of the matching interspersed repeat > DNA/MER2_type = the class of the repeat, in this case a DNA > transposon > fossil of the MER2 group (see below for list and > references) > (0) = no. of bases in (complement of) the repeat consensus > sequence > prior to beginning of the match (so 0 means that the > match extended > all the way to the end of the repeat consensus sequence) > 2418 = starting position of match in database sequence (using > top-strand numbering) > 1465 = ending position of match in database sequence > > ************* > > Casey Bergman, Ph.D. 
> Faculty of Life Sciences > University of Manchester > Michael Smith Building > Oxford Road, M13 9PT > Manchester, UK > > Tel: +44-(0)161-275-1713 > Fax: +44-(0)161-275-5082 > skype: caseymbergman > > Email: casey.bergman at manchester.ac.uk > Web: http://www.bioinf.manchester.ac.uk/bergman/ > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From christopher.x.moy at gsk.com Tue Feb 6 08:36:20 2007 From: christopher.x.moy at gsk.com (christopher.x.moy@gsk.com) Date: Tue, 6 Feb 2007 11:36:20 -0500 Subject: [Genome] Firefox problems with Table Browser Message-ID: Hello, I like to use the table browser link in the genome browser but I have found that when I use Firefox, the tables will display the results for whatever gene I may have previously been working with rather than the standard form for initial entry into the table browser. This happens even if I clear the cache on Firefox. This is not a problem with IE and only happens with Firefox/Mozilla. Chris From kalexiev at bnl.gov Tue Feb 6 10:23:03 2007 From: kalexiev at bnl.gov (Alexieva-Botcheva, Krassimira) Date: Tue, 6 Feb 2007 13:23:03 -0500 Subject: [Genome] Singapore PETs Message-ID: <8AF9A8BFBBF0194F988AB05A59F8AFD56E3321@exchangemb4.bnl.gov> Hi, I was wondering why I can't upload Singapore data (p53 ChIP-PET); they are no longer present as an option for customizing the tracks. Thanks Krassi Botcheva Krassimira Alexieva-Botcheva, PhD Biology Department Brookhaven National Laboratory Upton, NY 11973-5000 phone: (631) 344 4216 fax: (631) 344 3407 e-mail: kalexiev at bnl.gov From varia at umdnj.edu Tue Feb 6 08:29:27 2007 From: varia at umdnj.edu (Smita Varia) Date: Tue, 06 Feb 2007 11:29:27 -0500 Subject: [Genome] flanking sequences Message-ID: <45C8ACE7.6070405@umdnj.edu> hello, I would like flanking sequences for the mouse VGF gene. Please help me. I am trying to construct a loxP construct.
Thank you Smita -- Smita Thakker-Varia, PhD. UMDNJ-RWJ Medical School Department of Neuroscience and Cell Biology 683 Hoes Lane, RWJ-SPH 357A Piscataway, NJ 08854 Email: Varia at UMDNJ.edu Tel.:732-235-5393 Fax.:732-235-4990 From kate at soe.ucsc.edu Tue Feb 6 11:12:43 2007 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Tue, 6 Feb 2007 11:12:43 -0800 Subject: [Genome] Singapore PETs In-Reply-To: <8AF9A8BFBBF0194F988AB05A59F8AFD56E3321@exchangemb4.bnl.gov> References: <8AF9A8BFBBF0194F988AB05A59F8AFD56E3321@exchangemb4.bnl.gov> Message-ID: <0c0ea0187ebf39f2726124b7105cfa2f@soe.ucsc.edu> Hello Krassi, The Genome Institute of Singapore ChIP-PET track has been expanded to include additional data, and so has been renamed more generically, to 'GIS ChIP-PET'. You'll find the p53 data available as a subtrack of this track. Cheers, Kate On Feb 6, 2007, at 10:23 AM, Alexieva-Botcheva, Krassimira wrote: > Hi, > > I was wondering why I can't upload Singapore data (p53 ChIP-PET), they > are not present anymore as option for customizing the tracks. Thanks > > Krassi Botcheva > > > > > > Krassimira Alexieva-Botcheva, PhD > > Biology Department > > Brookhaven National Laboratory > > Upton, NY 11973-5000 > > phone: (631) 344 4216 > > fax: (631) 344 3407 > > e-mail: kalexiev at bnl.gov > > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Feb 6 11:38:22 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 11:38:22 -0800 Subject: [Genome] Query: Restriction enzyme digest displays In-Reply-To: References: Message-ID: <45C8D92E.1020108@cse.ucsc.edu> Dillon, We have a few different assemblies of the mouse genome, which will have different coordinates from each other. Our most recent, mm8, corresponds to NCBI's Build 36, while mm7 corresponds to NCBI's Build 35. Could you be comparing DNA from two different builds? 
The position of the itga1 gene in mm8 is chr13:116,080,957-116,222,842. I see in mm8 that ClaI hits itga1 at the following positions:

chr13 116092758 116092764 ClaI 1000 +
chr13 116109387 116109393 ClaI 1000 +
chr13 116120677 116120683 ClaI 1000 +
chr13 116152058 116152064 ClaI 1000 +
chr13 116197694 116197700 ClaI 1000 +

The position of the itga1 gene in mm7 is chr13:112,132,199-112,273,942. And these are the positions where ClaI hits itga1 in mm7:

chr13 112144000 112144006 ClaI 1000 +
chr13 112160370 112160376 ClaI 1000 +
chr13 112171660 112171666 ClaI 1000 +
chr13 112202969 112202975 ClaI 1000 +
chr13 112248561 112248567 ClaI 1000 +

I obtained this data using the "Restr Enzymes" track in the Genome Browser. There is an option, when clicking into an individual restriction enzyme's details page to "Download BED of enzymes in this browser range", which you may find useful. I hope this information is useful to you. Please don't hesitate to write back if this didn't clear up your concerns about the discrepancy you are seeing. Kayla Smith UCSC Genome Bioinformatics Group Dillon Leong wrote: > Dear Sir/Mdm, > > I've been using genome browser (mouse) to design DNA digests for > Southern Blots. Recently, I noticed a discrepancy which I hope you > can help to clarify. > > I downloaded the Refseq genomic sequence for mouse itga1 gene into a > Microsoft Word file and searched for cleavage sites for restriction > enzyme ClaI. I was sure to remove paragraph marks from the sequence. > When I compared the cleavage pattern with that in genome browser, I > found differences. The number of cleavage sites remained the same > (=5), but the position of the sites and the resulting DNA fragment > sizes were different. > > I can't figure out why this is so, and I've redone my Word document > to verify that I haven't made mistakes while transferring the > sequence from Refseq. Would you please give it a go and see if you > get the same result?
> > Best regards, > Dillon > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Feb 6 12:08:26 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 12:08:26 -0800 Subject: [Genome] flanking sequences In-Reply-To: <45C8ACE7.6070405@umdnj.edu> References: <45C8ACE7.6070405@umdnj.edu> Message-ID: <45C8E03A.6000706@cse.ucsc.edu> Smita, The easiest way to do this is to open the mouse genome browser and search for the VGF gene. Once you have the VGF gene displayed in the browser, click on the "DNA" button on the blue bar on the top of the page. You will see "Sequence Retrieval Region Options:" and two text fields where you can input how many extra bases upstream and downstream of your gene that you would like. Click on "get DNA" to retrieve your sequence. I hope this helps you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Smita Varia wrote: > hello, > I would like flanking sequences to mouse VGF gene. Please help me. I > am trying to construct a loxP construct. > Thank you > Smita > From kate at soe.ucsc.edu Tue Feb 6 12:11:56 2007 From: kate at soe.ucsc.edu (Kate Rosenbloom) Date: Tue, 6 Feb 2007 12:11:56 -0800 Subject: [Genome] technical question on "&hgt.customText=" In-Reply-To: <7.0.1.0.0.20070206112415.01fcdc38@gurdon.cam.ac.uk> References: <7.0.1.0.0.20070206112415.01fcdc38@gurdon.cam.ac.uk> Message-ID: Hi Mike, It's good to hear you're finding our custom tracks feature useful! Changing the precedence of custom track file settings vs. CGI variables as you request would change the behavior of users' saved URL's and would require some internal discussion within our group to see if it is warranted. I'm not completely clear on the reloading scenarios you describe.
A URL to hgTracks with the hgt.customText variable set does indeed always initiate custom track loading from the URL or text assigned to the CGI variable. It sounds as if you want to suppress track reloading under certain circumstances, but it is not obvious why. For your scenario (a), what are you trying to accomplish with the multiple URL's you are providing to users? If the goal is to reposition the browser window or display different tracks while maintaining the existing custom tracks, this can be done with a URL that doesn't reload the custom track. Custom tracks now persist for 2 days unless explicitly deleted. Your scenario (b) sounds as if the server would somehow know when client side data has changed -- please clarify what you are requesting here. Cheers, Kate On Feb 6, 2007, at 4:47 AM, Mike Gilchrist wrote: > > Hi, > > I am working very happily with the custom track display mechanism, > but now that I've started to use the &hgt.customText method, I have a > couple of rather detailed queries that may affect how I implement it. > The method works fine, but it's hard to know how best to use it, as > I'm not sure exactly how it works. As far as I can see, these things > are not touched on in the help pages, and may be useful for other > people... > > (a) the 'browser position' line in the track data file. > If this is left in (as it would be for separate loading via your > browser pages) then it seems to override the &position= information > earlier in the URL. Is this the intended behaviour? Obviously it's a > simple matter to leave the 'browser position' line out of the data > file, but then I need separate versions of the data file for > uploading via the URL and uploading separately. It would certainly > work better for me if the URL position= overrode the 'browser > position' line. > > (b) reloading every time? 
> If a user is given a list of URLs all containing the &hgt.customText > instruction to load the (same) custom track data file, obviously the > track data gets loaded when the first URL is clicked - but does it > then get RE-loaded for every other URL the user clicks on? This would > presumably have performance implications, both at the user end, but > also for your servers. Typically I'm using a track file with ~10,000 > data points, and each URL takes about 8 seconds to load including the > first one. WIthout the &hgt.customText tag the same link takes only > ~2 seconds. Superficially this indicates that the track data is > reloaded each time, which might make the mechanisms problematic for > larger track data sets. One could work around this with (say) > chromosome specific sets when the number of data points gets too > large... Anyway, I'd be interested in your thoughts on this. > > (c) reload if data changed? > This may well be wishful thinking, but it would be nice if the track > data was reloaded if (and only if) the track data had changed. I > think I've confirmed that the track data is reloaded for each click > by changing the colour of a track in the track data between clicks - > but of course maybe you already check to see if the data file has > changed! > So obviously the non-reloading mechanism I'm advocating above would > only be useful if newer version of the data were reloaded in place of > older versions. > > Apologies if I'm too demanding - for me it's a sign that you have a > great browser and I'd like it to do even more! > > Thanks for your help. > Yours, > mike > > > > > Dr. 
Mike Gilchrist > Core Bioinformatics Group > Room 112 > The Wellcome Trust/Cancer Research UK Gurdon Institute > Cambridge CB2 1QN > tel: (44-1223-7) 67210 > fax: (44-1223-3) 34089 > m.gilchrist at gurdon.cam.ac.uk > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Feb 6 12:13:54 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 12:13:54 -0800 Subject: [Genome] Firefox problems with Table Browser In-Reply-To: References: Message-ID: <45C8E182.2080405@cse.ucsc.edu> Chris, Try resetting your cart on our website by clicking on the "Click here to reset" link from the Genome Browser Gateway page or type this URL into your browser: http://genome.ucsc.edu/cgi-bin/cartReset I hope this solves your problem. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group christopher.x.moy at gsk.com wrote: > Hello, > > I like to use the table browser link in the genome browser but I have > found that when I use Firefox, the tables will display the results for > whatever gene I may have previously been working with rather than the > standard form for initial entry into the table browser. This happens even > if I clear the cache on Firefox. This is not a problem with IE and only > happens with Firefox/Mozilla. > > Chris > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From jaimiesixsmith at hotmail.co.uk Tue Feb 6 13:40:58 2007 From: jaimiesixsmith at hotmail.co.uk (Jaimie Sixsmith) Date: Tue, 06 Feb 2007 21:40:58 +0000 Subject: [Genome] Genome Browser query Message-ID: Hi I have been using your Genome Browser and I was wondering if it is possible to reverse sequences in the system. For instance, if the sequence was AGGGCCTTAGG is there any way for it to be displayed as GGATTCCGGGA?
This would prove most helpful for our studies. I hope that you can help Regards Dr Jaimie Sixsmith, PhD MCB Brown University _________________________________________________________________ Get Hotmail, News, Sport and Entertainment from MSN on your mobile. http://www.msn.txt4content.com/ From natemiller at jhu.edu Tue Feb 6 12:38:18 2007 From: natemiller at jhu.edu (Nate Miller) Date: Tue, 6 Feb 2007 15:38:18 -0500 Subject: [Genome] hg18 fasta files Message-ID: <68c02a3b0702061238w340a6bfeke94b1afac07f15dc@mail.gmail.com> Hello, I had a question about what the lower case versus upper case sequence means in the hg18 fasta files. I believe the lower case bases represent repetitive dna, but I'm not sure how this was determined. If you have any of this information, could you let me know? Thanks, Nate Miller From kayla at soe.ucsc.edu Tue Feb 6 14:18:54 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 14:18:54 -0800 Subject: [Genome] hg18 fasta files In-Reply-To: <68c02a3b0702061238w340a6bfeke94b1afac07f15dc@mail.gmail.com> References: <68c02a3b0702061238w340a6bfeke94b1afac07f15dc@mail.gmail.com> Message-ID: <45C8FECE.5090907@cse.ucsc.edu> Nate, You are correct, the lower case bases represent repeats. There is a previously answered mailing list question on this topic here: http://www.cse.ucsc.edu/pipermail/genome/2006-December/012267.html I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group Nate Miller wrote: > Hello, > > I had a question about what the lower case versus upper case sequence means > in the hg18 fasta files. I believe the lower case bases represent repetitive > dna, but I'm not sure how this was determined. If you have any of this > information, could you let me know? 
>
> Thanks,
> Nate Miller
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

From kayla at soe.ucsc.edu Tue Feb 6 15:20:39 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Tue, 06 Feb 2007 15:20:39 -0800
Subject: [Genome] Expression for human genes
In-Reply-To: <002201c749ef$52bb3380$8f8a5c82@cmpg.unibe.ch>
References: <002201c749ef$52bb3380$8f8a5c82@cmpg.unibe.ch>
Message-ID: <45C90D47.5070102@cse.ucsc.edu>

Isabelle,

Use the following settings in the table browser to retrieve the data you are interested in:

clade: Vertebrate
genome: Human
assembly: Mar. 2006
group: mRNA and EST Tracks
track: Spliced ESTs
table: intronEst
region: (specify the genomic position you are interested in here. Note that there are about 4 million items in this table, so choosing "genome" here may be overwhelming)
output format: all fields from selected table

Click "get output" for your results.

You may also download tables by clicking on the downloads link ("downloads" on the blue bar on the left-hand side of the main page), clicking on Human, and then clicking on "Annotation database". You are looking for the file:

intronEst.txt.gz

Note that in the methods section of the details page for ESTs, we mention: "When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept."

Additionally, since you mention that you are studying gene expression, I want to point out that we have some gene expression data available in our browser which may be of use to you. In the Human Genome Browser, we have GNF Atlas data and also some Affymetrix Human Exon array data. You can find these tracks under the "Expression and Regulation" section of the track controls of the browser.

I hope this information is helpful to you.
Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group Isabelle Dupanloup wrote: > Hi there, > > I would like to study the expression of human genes using ESTs/cDNAs. > I saw that you make available on your website the coordinates of > ESTs/cDNAs mapping to the human genome (defined as blat hits). > My question is about the best way to retrieve this info using your table > browser > Group: mRNA and EST track > What are the optimal choices for > 1) track ? > 2) table ? > There are many possibilities in there. > What I need is the best mapping of ESTs/cDNAs to the human genome. > Could you help me to find the way to get it ? > > Thank you very much in advance. > And thank you also for your very nice tool. > > Isabelle > > ------------------- > Isabelle Dupanloup PhD > Computational and Molecular Population Genetics Group Baltzerstrasse 6 > 3012 Bern Switzerland > Tel: +41 31 631 45 49 > Fax: +41 31 631 48 88 > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Tue Feb 6 15:42:06 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Tue, 06 Feb 2007 15:42:06 -0800 Subject: [Genome] metylationstatus of cpG-islands In-Reply-To: <20070206111921.85430@gmx.net> References: <20070206111921.85430@gmx.net> Message-ID: <45C9124E.7090904@cse.ucsc.edu> Aslihan, While we don't have a genome-wide track of methylation data, there is an ENCODE track that has some methylation data. The ENCODE project attempts to provide detailed information for 1% of the genome. You could see this data on either the hg16 or hg17 Genome Browsers; the track is found under the "ENCODE Chromosome, Chromatin and DNA Structure" section and is called "Stanf Meth". I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. 
Kayla Smith
UCSC Genome Bioinformatics Group

asli_ay at gmx.de wrote:
> Hello Team of UCSC Human Genome Browser,
>
> my name is Aslihan Gerhold-Ay and I am from the Institute for Medical Biometrics, Epidemiology and Informatics in Mainz (Germany).
>
> I search at your Database for the cpG-islands of the human genome, that was very easy :-))
> But I would like to know how I can automatically find out the methylationstatus of the cpG-islands.
>
> It would be very nice if you can help me.
>
> Thank you!
>
>
> Best regards
>
> Aslihan Gerhold-Ay
>
>
> Aslihan Gerhold-Ay
>
> Obere Zahlbacher Str. 69
> 55131 Mainz, Germany
> Institutsgebäude 902
>
> Tel: 0 049 6131 / 173120
>
>

From kayla at soe.ucsc.edu Tue Feb 6 17:10:36 2007
From: kayla at soe.ucsc.edu (Kayla Smith)
Date: Tue, 06 Feb 2007 17:10:36 -0800
Subject: [Genome] Genome Browser query
In-Reply-To:
References:
Message-ID: <45C9270C.10304@cse.ucsc.edu>

Jaimie,

It is not possible to view the reverse of a sequence in our genome browser. Here is a related previously answered mailing list question:

http://www.cse.ucsc.edu/pipermail/genome/2006-February/009883.html

If you have additional questions about the genome browser, please don't hesitate to contact us again.

Kayla Smith
UCSC Genome Bioinformatics Group

Jaimie Sixsmith wrote:
> Hi
> I have been using your Genome Browser and I was wondering if it is possible
> to reverse sequences in the system. For insatnce if the sequence was
> AGGGCCTTAGG is there anyway for it to be displayed as GGATTCCGGGA? This
> would prove most helpful for our studies.
> I hope that you can help
>
> Regards
>
> Dr Jaimie Sixsmith, PhD
> MCB Brown University
>
> _________________________________________________________________
> Get Hotmail, News, Sport and Entertainment from MSN on your mobile.
> http://www.msn.txt4content.com/ > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Tue Feb 6 17:44:46 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Tue, 06 Feb 2007 17:44:46 -0800 Subject: [Genome] RN4ToRN3 liftover file In-Reply-To: <52376.132.206.211.127.1170694957.squirrel@mail.mcb.mcgill.ca> References: <52376.132.206.211.127.1170694957.squirrel@mail.mcb.mcgill.ca> Message-ID: <45C92F0E.703@soe.ucsc.edu> Hi Matt, Yes, we will generate the rn4 to rn3 liftovers. This should be ready in a few days. I will let you know when you can access it. -- Brooke Rhead UCSC Genome Bioinformatics Group msuder at MCB.McGill.CA wrote: > Hi, > > I have coordinates in rn4 and would like to do a "liftover" to rn3. > Unfortunately, the chain file does not appear to be available. Would it > be possible for you to generate that file. > > Thanks! > Matt > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From nit at nanzando.com Tue Feb 6 17:45:16 2007 From: nit at nanzando.com (Naomi ITOH) Date: Wed, 7 Feb 2007 10:45:16 +0900 Subject: [Genome] Permission Request (ASAP) Message-ID: <007301c74a59$af5ed140$75c8a8c0@nanzando.net> Rights and Permission UCSC Genome Bioinformatics Pardon my informality to send an e-mail to you suddenly. I hope that my contacting you is not an imposition in any way? Please send this E-mail to whom it may concern. We are now preparing Japanese book titled "Human Genetics"(written in Japanese and edited by Katsushi Tokunaga, Professor in Department of Human Genetics, Graduate School of Medicine, The University of Tokyo). This book will be printed 1500 copies each of whichi will cost about 40 US?. 
In this connection, we would like to include the following material from your site into our book thereof; Sample page of UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=85679492&clade=vertebrate&org =Human&db=hg18&position=chrX%3A151%2C073%2C054-151%2C383%2C976&pix=620&Submi t=submit&hgsid=85679492) . The usual acknowledgement will of course be given to the page where the material appears. If the other permission is necessary, please let me know his/her recent address. We should be most grateful if you would grant us your permission to reprint the above requested page from your site. I'm looking forward to hearing from you on this matter at your earliest convenience. We are planning the publication above next month. If you have any questions, please send me E-mail. Sincerely yours. Naomi ITOH ------------------------------------- NANZANDO Co., Ltd. 1-11 Yushima 4-chome Bunkyo-ku Tokyo, JAPAN 113-0034 Tel. 81-3-5689-7850 Fax. 81-3-5689-7851 E-mail nit at nanzando.com URL http://www.nanzando.com ------------------------------------- From thefferon at mail.nih.gov Wed Feb 7 09:28:30 2007 From: thefferon at mail.nih.gov (Tim Hefferon) Date: Wed, 7 Feb 2007 11:28:30 -0600 Subject: [Genome] underlay Message-ID: Hi, Is there any such thing as creating an underlay for the genome browser, such that when looking at large amounts of data, all the exons could be shaded on the browser from top to bottom, for example? Thanks, Tim From rhead at soe.ucsc.edu Wed Feb 7 10:56:50 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Feb 2007 10:56:50 -0800 Subject: [Genome] Permission Request (ASAP) In-Reply-To: <007301c74a59$af5ed140$75c8a8c0@nanzando.net> References: <007301c74a59$af5ed140$75c8a8c0@nanzando.net> Message-ID: <45CA20F2.9020002@soe.ucsc.edu> To Whom It May Concern: Screenshots of the Genome Browser may be used in published materials if you cite the source. 
Please see our website for citation guidelines: http://genome.ucsc.edu/cite.html Screen shot guidelines are at the bottom of the page. -- Brooke Rhead UCSC Genome Bioinformatics Group Naomi ITOH wrote: > Rights and Permission > UCSC Genome Bioinformatics > > Pardon my informality to send an e-mail to you suddenly. > > I hope that my contacting you is not an imposition in any way? > Please send this E-mail to whom it may concern. > > We are now preparing Japanese book titled "Human Genetics"(written in > Japanese and edited by Katsushi Tokunaga, Professor in Department of Human > Genetics, Graduate School of Medicine, The University of Tokyo). This book > will be printed 1500 copies each of whichi will cost about 40 US?. > > In this connection, we would like to include the following material from > your site into our book thereof; > > Sample page of UCSC Genome Browser > (http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=85679492&clade=vertebrate&org > =Human&db=hg18&position=chrX%3A151%2C073%2C054-151%2C383%2C976&pix=620&Submi > t=submit&hgsid=85679492) > . > The usual acknowledgement will of course be given to the page where the > material appears. If the other permission is necessary, please let me > know his/her recent address. > > We should be most grateful if you would grant us your permission to reprint > the above requested page from your site. > > I'm looking forward to hearing from you on this matter at your earliest > convenience. > We are planning the publication above next month. > > If you have any questions, please send me E-mail. > > > Sincerely yours. > > Naomi ITOH > > ------------------------------------- > NANZANDO Co., Ltd. > 1-11 Yushima 4-chome > Bunkyo-ku Tokyo, JAPAN 113-0034 > Tel. 81-3-5689-7850 Fax. 
81-3-5689-7851 > E-mail nit at nanzando.com > URL http://www.nanzando.com > ------------------------------------- > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From tanasa at scripps.edu Wed Feb 7 11:39:58 2007 From: tanasa at scripps.edu (Bogdan Tanasa) Date: Wed, 7 Feb 2007 11:39:58 -0800 (PST) Subject: [Genome] download GNF expression data Message-ID: <50031.137.131.48.118.1170877198.squirrel@webmail.scripps.edu> Hi UCSC genome browser, would like to know the way in which I could set up the Table Browser in order to download a set of GNF expression data for a list of genes of interest. I would appreciate your suggestions. Thanks, - Bogdan -- Bogdan Tanasa, MD Kellogg School of Science and Technology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037 email: tanasa at scripps.edu From rhead at soe.ucsc.edu Wed Feb 7 13:25:48 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Feb 2007 13:25:48 -0800 Subject: [Genome] Permission Request (ASAP) In-Reply-To: <007301c74a59$af5ed140$75c8a8c0@nanzando.net> References: <007301c74a59$af5ed140$75c8a8c0@nanzando.net> Message-ID: <45CA43DC.5030905@soe.ucsc.edu> Hello again Naomi, I have another tip for publishing data from the UCSC Genome Browser: if you publish a url (like the one you list in your message), please remove the "hgsid=85679492" portion from it. "hgsid" is a temporary identifier that will expire after about a day. -- Brooke Rhead UCSC Genome Bioinformatics Group Naomi ITOH wrote: > Rights and Permission > UCSC Genome Bioinformatics > > Pardon my informality to send an e-mail to you suddenly. > > I hope that my contacting you is not an imposition in any way? > Please send this E-mail to whom it may concern. 
> > We are now preparing Japanese book titled "Human Genetics"(written in > Japanese and edited by Katsushi Tokunaga, Professor in Department of Human > Genetics, Graduate School of Medicine, The University of Tokyo). This book > will be printed 1500 copies each of whichi will cost about 40 US?. > > In this connection, we would like to include the following material from > your site into our book thereof; > > Sample page of UCSC Genome Browser > (http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=85679492&clade=vertebrate&org > =Human&db=hg18&position=chrX%3A151%2C073%2C054-151%2C383%2C976&pix=620&Submi > t=submit&hgsid=85679492) > . > The usual acknowledgement will of course be given to the page where the > material appears. If the other permission is necessary, please let me > know his/her recent address. > > We should be most grateful if you would grant us your permission to reprint > the above requested page from your site. > > I'm looking forward to hearing from you on this matter at your earliest > convenience. > We are planning the publication above next month. > > If you have any questions, please send me E-mail. > > > Sincerely yours. > > Naomi ITOH > > ------------------------------------- > NANZANDO Co., Ltd. > 1-11 Yushima 4-chome > Bunkyo-ku Tokyo, JAPAN 113-0034 > Tel. 81-3-5689-7850 Fax. 81-3-5689-7851 > E-mail nit at nanzando.com > URL http://www.nanzando.com > ------------------------------------- > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From QLIU at intra.nida.nih.gov Wed Feb 7 13:45:40 2007 From: QLIU at intra.nida.nih.gov (Liu, Qing-Rong (NIH/NIDA/IRP) [E]) Date: Wed, 7 Feb 2007 16:45:40 -0500 Subject: [Genome] Extract exon sequences Message-ID: <2A577D0CDDE6E943B249D3FC1C3CE0F3C570DD@NIHCESMLBX8.nih.gov> Dear Sir, How do I extract all the exon sequences from "get DNA" with extended case/color options without copy/paste? 
Thanks, Qing-Rong (Tim) Liu, Ph.D. Staff Scientist Molecular Neurobiology Branch NIDA/NIH Suite 3510 Triad Building 333 Cassell Drive Baltimore, MD 21224 e-mail: qliu at intra.nida.nih.gov Tel: (410) 550-2843 x 157 Fax: (410) 550-4-1535 From rhead at soe.ucsc.edu Wed Feb 7 14:03:13 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Feb 2007 14:03:13 -0800 Subject: [Genome] Extract exon sequences In-Reply-To: <2A577D0CDDE6E943B249D3FC1C3CE0F3C570DD@NIHCESMLBX8.nih.gov> References: <2A577D0CDDE6E943B249D3FC1C3CE0F3C570DD@NIHCESMLBX8.nih.gov> Message-ID: <45CA4CA1.5050003@soe.ucsc.edu> Hi Tim, I assume you are referring to using the Table Browser to get DNA using the "output format: sequence" option. If you enter a filename in the "output file:" box *before* hitting the "get output" button and setting the sequence retrieval and formatting options, the output should be saved to a file rather than displayed in your browser. If this does not answer your question, or if I have misunderstood your question, please let us know. -- Brooke Rhead UCSC Genome Bioinformatics Group Liu, Qing-Rong (NIH/NIDA/IRP) [E] wrote: > Dear Sir, > > How do I extract all the exon sequences from "get DNA" with extended > case/color options without copy/paste? > > Thanks, > > > Qing-Rong (Tim) Liu, Ph.D. > > Staff Scientist > > Molecular Neurobiology Branch > > NIDA/NIH > > Suite 3510 > > Triad Building > > 333 Cassell Drive > > Baltimore, MD 21224 > > e-mail: qliu at intra.nida.nih.gov > > Tel: (410) 550-2843 x 157 > > Fax: (410) 550-4-1535 > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Wed Feb 7 14:14:13 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Feb 2007 14:14:13 -0800 Subject: [Genome] underlay In-Reply-To: References: Message-ID: <45CA4F35.3060808@soe.ucsc.edu> Hello Tim, We do not currently have a feature like this. 
However, it has been requested by users in the past, and we are currently considering ways the feature or something like it could be implemented. I do not have an estimated completion date at this time. Thank you for your input on improving the Genome Browser. -- Brooke Rhead UCSC Genome Bioinformatics Group Tim Hefferon wrote: > Hi, > > Is there any such thing as creating an underlay for the genome > browser, such that when looking at large amounts of data, all the > exons could be shaded on the browser from top to bottom, for example? > > Thanks, > Tim > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From rhead at soe.ucsc.edu Wed Feb 7 15:25:37 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Wed, 07 Feb 2007 15:25:37 -0800 Subject: [Genome] download GNF expression data In-Reply-To: <50031.137.131.48.118.1170877198.squirrel@webmail.scripps.edu> References: <50031.137.131.48.118.1170877198.squirrel@webmail.scripps.edu> Message-ID: <45CA5FF1.30307@soe.ucsc.edu> Hello Bogdan, To get GNF expression data for a set of genes, go to the Table Browser and select the clade, genome, and assembly of interest. Then select "group: Expression and Regulation" and "track: GNF Atlas 2". Choose "table: knownToGnfAtlas2". This is a table of UCSC Known Gene names linked to GNF Atlas 2 values. Paste your list of gene identifiers in the "identifiers (names/accessions):" section. These must be UCSC Known Gene identifiers. To see their general format, click on "describe table schema" and look at the 'name' field. If you need to convert from some other type of identifier (like the gene symbol) use the 'kgXref' table (look for the 'kgID' field). Once you have pasted the list of UCSC Known Gene identifiers into the box, select "output format: selected fields from primary and related tables". 
Now you can select the table you are interested in retrieving expression data from, and your results will be limited to the genes in your list.

I hope this is helpful. Please let us know if you have any further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Bogdan Tanasa wrote:
> Hi UCSC genome browser,
>
> would like to know the way in which I could set up the Table Browser in
> order to download a set of GNF expression data for a list of genes of
> interest. I would appreciate your suggestions. Thanks,
>
> - Bogdan
>

From sanghyuk at ewha.ac.kr Wed Feb 7 19:47:07 2007
From: sanghyuk at ewha.ac.kr (Sanghyuk Lee)
Date: Thu, 08 Feb 2007 12:47:07 +0900
Subject: [Genome] Track control in creating links to the UCSC genome browser
Message-ID: <01d701c74b33$ced43ee0$10bfffcb@your9ea8c53aff>

Hi,

I wonder if it is possible to control tracks when creating links to the UCSC genome browser.
I would like to control tracks in the URL line, not in the custom track file.
I found that org, db, position are three parameters allowed in the URL address.
Please let me know what parameters are allowed in the URL line.
Thank you very much for your help in advance.

Sincerely,

Sanghyuk

From hiram at soe.ucsc.edu Wed Feb 7 20:50:04 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Wed, 7 Feb 2007 20:50:04 -0800
Subject: [Genome] Track control in creating links to the UCSC genome browser
In-Reply-To: <01d701c74b33$ced43ee0$10bfffcb@your9ea8c53aff>
References: <01d701c74b33$ced43ee0$10bfffcb@your9ea8c53aff>
Message-ID: <227cb81d05fd88891d45051e9b2adceb@soe.ucsc.edu>

Good Evening Sanghyuk:

To see all the variables you could change with specifications in the URL, when you are viewing a browser image with the settings you would like, change the URL in your WEB browser to read:

http://genome.ucsc.edu/cgi-bin/cartDump

and note the listing of variables and their values. You can specify any of these in the URL.
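
As a rough illustration of such a link, here is a minimal sketch in Python. It assumes only the standard library; the helper name browser_link is hypothetical, and the db and position values are simply the ones from the sample hgTracks URL posted earlier on this list, not a recommended set of variables:

```python
from urllib.parse import urlencode

# hgTracks CGI address as it appears in links posted on this list.
BASE = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(settings):
    """Hypothetical helper: join cart variables into one encoded URL."""
    # urlencode percent-encodes characters such as ':' and ',' in the values.
    return BASE + "?" + urlencode(settings)


link = browser_link({"db": "hg18", "position": "chrX:151,073,054-151,383,976"})
print(link)
```

Any other variable listed by cartDump could, under the same assumption, be added to the dictionary in the same way.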
Remember to properly encode special characters if necessary for the URL. --Hiram On 2007 Feb 07, , at 7:47 PM, Sanghyuk Lee wrote: > Hi, > > I wonder if it is possible to control tracks when creating links to > the UCSC genome browser. > I would like to control tracks in the URL line, not in the custom > track file. > I found that org, db, position are three parameters allowed in the URL > address. > Please let me know what parameters are allowed in the URL line. > Thank you very much for your help in advance. > > Sincerely, > > Sanghyuk > > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From deepsilence at gmail.com Wed Feb 7 23:30:43 2007 From: deepsilence at gmail.com (ChangHee Lee) Date: Thu, 8 Feb 2007 02:30:43 -0500 Subject: [Genome] [UCSC Genome Browser] Downloading raw conservation scores Message-ID: <264a05e60702072330y51b0d26cvb86f4cf70182a31e@mail.gmail.com> I'm wondering whether it is possible to download raw conservation scores of certain chromosomal region. I can see on the genome browser the beautiful conservation graph but every attempt to list the values in actual values failed. Could you please give some references how to get there? Sincerely yours, ChangHee Lee 2007. 02. 08. (Thu) From gerald.quon at utoronto.ca Wed Feb 7 19:38:23 2007 From: gerald.quon at utoronto.ca (Gerald Quon) Date: Wed, 07 Feb 2007 22:38:23 -0500 Subject: [Genome] compiling ucsc genome browser on mac os x Message-ID: <45CA9B2F.3040207@utoronto.ca> Hi, I'm trying to compile the genome browser on my Mac OS X on my Mac Pro (intel core machines). 
I can't seem to compile the individual programs, I get a lot of errors like the following:

------------------------------------------
G:~/kent/src/parasol gerald$ make
cd lib && make
gcc -O -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_ppc -DJK_WARN -Wall -Werror -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -o broadData.o -c broadData.c
cc1: warnings being treated as errors
broadData.c: In function 'bdReceive':
broadData.c:54: warning: pointer targets in passing argument 6 of 'recvfrom' differ in signedness
broadData.c: In function 'bdParseSectionQueryMessage':
broadData.c:215: warning: pointer targets in assignment differ in signedness
make[1]: *** [broadData.o] Error 1
make: *** [all] Error 2
-----------------------------------------

I've set MACHTYPE to be 'ppc'. any ideas?

Gerald

From wes.barris at csiro.au Wed Feb 7 20:07:08 2007
From: wes.barris at csiro.au (Wes Barris)
Date: Thu, 08 Feb 2007 14:07:08 +1000
Subject: [Genome] Btau3 liftover file
Message-ID: <45CAA1EC.90805@csiro.au>

Hi,

I was wondering when you might have a bosTau3ToHg18 liftover file?

--
Wes Barris

From davide.cittaro at ifom-ieo-campus.it Thu Feb 8 01:49:36 2007
From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro)
Date: Thu, 8 Feb 2007 10:49:36 +0100
Subject: [Genome] compiling ucsc genome browser on mac os x
In-Reply-To: <45CA9B2F.3040207@utoronto.ca>
References: <45CA9B2F.3040207@utoronto.ca>
Message-ID:

On Feb 8, 2007, at 4:38 AM, Gerald Quon wrote:

> Hi,
>
> I'm trying to compile the genome browser on my Mac OS X on my Mac Pro
> (intel core machines).
> I can't seem to compile the individual
> programs,
> I get a lot of errors like the following:

Hi, take a look here:

http://www.soe.ucsc.edu/pipermail/genome-mirror/2006-November/000250.html

you should remove -Werror flag in your kent/src/inc/common.mk

d

/*
Davide Cittaro
HPC and Bioinformatics Systems @ Informatics Core
IFOM - Istituto FIRC di Oncologia Molecolare
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303355
e-mail: davide.cittaro at ifom-ieo-campus.it
*/

From pfsulliv at med.unc.edu Thu Feb 8 05:58:52 2007
From: pfsulliv at med.unc.edu (Patrick Sullivan)
Date: Thu, 08 Feb 2007 08:58:52 -0500
Subject: [Genome] Genome Graphs
Message-ID: <45CB2C9C.3060705@med.unc.edu>

UCSC genome mavens:

First of all, this is an EXCELLENT tool. I have already circulated it to my colleagues and I would urge you to publicize it widely. It more or less solves one of the most pressing issues in the GWAS area (visualizing results in genomic context), and is far superior to any of the other paltry tools out there.

Second, as requested, I have a couple of suggestions and/or wishes.

1. Pls allow the user to upload and plot qualitative data. In trying to understand the results of a genomewide association study, it is very useful to overlay external data from other studies that are often qualitative not quantitative. For example, a user might want simply to overlay the positions of linkage regions implicated in other studies or a candidate gene list. For these, one might only wish to note where they are qualitatively. Would note these with a bar. See Slide 1 in attached .ppt for example.

Data input would be brilliant if there were several forms. Input might be the following:

a) List of standard HUGO gene names. Could match against knownGene and obtain chromosome and txsMin and txsMax (where txsMin is the minimum txStart over all isoforms and txsMax is the maximum txEnd over all isoforms).
Example:

NRG1
DTNBP1
DISC1
COMT

For the gene NRG1 on chr8 (has multiple isoforms), txsMin=31616809 and txsMax=32741615. These values could be pre-computed for all knownGenes for efficiency.

b) Regions in from-to format where these could be chrN:x-y, SNP IDs, or STS markers.

Example

chr8:31616809-32741615
rs1234 rs5678
D19S123 D19S654

c) Coding suggestion - add an indicator flag in the first column for which sort of data are on that line. This would allow all types of data to be in one file.

TYPE  Field1                  Field2
1     NRG1
1     DTNBP1
2     chr8:31616809-32741615
2     chr9:22616809-22741615
3     rs1234                  rs5678
3     rs2222                  rs3333
4     D19S123                 D19S654
4     D1S111                  D1S222

Type=1 for single standard gene names, Type=2 for chrN:from-to, Type=3 for two SNP IDs, and Type=4 for two STS markers. Could then split these into separate files, merge with the appropriate UCSC table to get the coordinates, and then concatenate for plotting.

2. When the user selects an area on the genome overview page, goes to the genome browser set on that area. Some suggestions for the user-defined tracks at the top.

a) Show the baseline for each user track (see Slide 2).

b) If user wants the scores to be connected, put a little tick at the location of each marker. Otherwise is hard to know the marker density in a region. See Slide 2.

c) Allow the user to select if he/she wants the points connected or indicated by a vertical line (i.e., the -log(pvalue) for a SNP at that point). See Slide 3.

d) If full display is selected, display the SNP name on the user track. Clicking on that SNP goes to the appropriate page about that SNP (as is done under the current SNP track).

3. An exceptionally useful feature is the merging that occurs (for rs numbers or STS markers). In the output that describes the matching of results, pls list those that failed to match so we can trouble-shoot.

Again, many thanks for writing this tool. It will get a lot of use on our side.

-- Pat

---- Patrick Sullivan, MD, FRANZCP
---- Ray M.
Hayworth & Family Distinguished Professor
---- UNC/Genetics & Carolina Center for Genome Sciences
---- CB#7264, 103 Mason Farm Road
---- Neuroscience Research Building, Room 4109D
---- Chapel Hill, NC, 27599-7264, USA
---- V: +919-966-3358 F: +919-966-3630

From y.itan at ucl.ac.uk Thu Feb 8 06:25:43 2007
From: y.itan at ucl.ac.uk (Yuval Itan)
Date: Thu, 8 Feb 2007 14:25:43 +0000
Subject: [Genome] psl score calculation
Message-ID:

Hello,

I need to calculate psl score for my Blat hits, and I was trying to get the logic behind the calculation. I understand that this is the function for score calculation:

int pslScore(const struct psl *psl)
/* Return score for psl. */
{
int sizeMul = pslIsProtein(psl) ? 3 : 1;

return sizeMul * (psl->match + ( psl->repMatch>>1)) -
    sizeMul * psl->misMatch - psl->qNumInsert - psl->tNumInsert;
}

I was wondering what's the reason for psl->repMatch>>1 , isn't it like dividing repMatch by 2? Also, is the sizeMul element essential for the calculation if I use DNA sequences?

Thank you very much,
Yuval

From jimmy.lin at jhmi.edu Thu Feb 8 08:25:49 2007
From: jimmy.lin at jhmi.edu (JIMMY LIN)
Date: Thu, 08 Feb 2007 11:25:49 -0500
Subject: [Genome] Exon Track?
Message-ID:

Hi:

quick question: is there a predefined exon track?
I am interested in the average conservation level of the CCDS exons and am trying to use the table browser.
I was able to derive summary statistics but they include all CCDS introns and exons.
Is there a way to define regions of only the exons?
Thank you,
Jimmy

From alessandro.guffanti at itb.cnr.it Thu Feb 8 08:52:08 2007
From: alessandro.guffanti at itb.cnr.it (Alessandro Guffanti)
Date: Thu, 08 Feb 2007 17:52:08 +0100
Subject: [Genome] Overlap between refSeq, known_gene and all_mRNA tables
Message-ID: <45CB5538.5080402@itb.cnr.it>

Dear Genome support:

I would like to know what the extent of overlap is between the following UCSC db tables:

knownGene
all_mrna
refGene

I understand that RefSeq sequences are NOT included in all_mrna:

mysql> select * from all_mrna where qName LIKE 'NM_%';
Empty set (0.00 sec)

Q1) are all the refGene sequences contained in knownGene ?

Q2) to obtain the total coverage of the human genome in terms of (redundant) transcripts, shall I query all_mrna AND refGene ? or all_mrna and knownGene ?

Best wishes and thanks in advance,

Alessandro G

--
Alessandro Guffanti - Nanotechnologies Group
CNR - Institute of Biomedical Technologies
c/o CISI - Via Fantoli, 16/15 - 20138 Milano, Italy
Ph.: +39-0250320918
Fax: +39-0250320919
Email: alessandro.guffanti at itb.cnr.it

From hiram at soe.ucsc.edu Thu Feb 8 09:40:21 2007
From: hiram at soe.ucsc.edu (Hiram Clawson)
Date: Thu, 8 Feb 2007 09:40:21 -0800
Subject: [Genome] compiling ucsc genome browser on mac os x
In-Reply-To: <45CA9B2F.3040207@utoronto.ca>
References: <45CA9B2F.3040207@utoronto.ca>
Message-ID:

Good Morning Gerald:

For this parasol build, you will need to create a directory:

mkdir parasol/lib/ppc

(actually for an Intel Mac, you probably want i686 instead of ppc.)

Then, to get your compiler to by-pass the warnings as errors, fixup one of the ifeq statements in src/inc/common.mk to read:

ifeq (darwin,$(findstring darwin,${OSTYPE}))
    HG_WARN_ERR = -DJK_WARN -Wall -Werror -Wno-unused-variable
else
  ifeq (solaris,$(findstring solaris,${OSTYPE}))
    HG_WARN_ERR = -DJK_WARN -Wall
  else
    HG_WARN_ERR = -DJK_WARN -Wall -Werror
  endif
endif

I just happened to fixup this ifeq darwin line yesterday.
I discovered that OSTYPE was in my environment as darwin7.0 but it had not been exported. So, make sure your OSTYPE is exported. I'm curious, does the Intel Mac indicate an OSTYPE of darwinX.x ? Although, it looks like you would also need to eliminate the -Werror since your new compiler is actually complaining about something else. Usually we only have warnings about unused variables. I'll need to find someone here with an Intel Mac to see how well it builds the source tree. How did you happen to get into the parasol directory to be creating it ? Are you planning on using parasol to operate a cluster ? --Hiram On 2007 Feb 07, , at 7:38 PM, Gerald Quon wrote: > Hi, > > I'm trying to compile the genome browser on my Mac OS X on my Mac Pro > (intel core machines). I can't seem to compile the individual > programs, > I get a lot of errors like the following: > > > ------------------------------------------ > G:~/kent/src/parasol gerald$ make > cd lib && make > gcc -O -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE > -DMACHTYPE_ppc -DJK_WARN -Wall -Werror -I../inc -I../../inc > -I../../../inc -I../../../../inc -I../../../../../inc -o broadData.o > -c > broadData.c > cc1: warnings being treated as errors > broadData.c: In function 'bdReceive': > broadData.c:54: warning: pointer targets in passing argument 6 of > 'recvfrom' differ in signedness > broadData.c: In function 'bdParseSectionQueryMessage': > broadData.c:215: warning: pointer targets in assignment differ in > signedness > make[1]: *** [broadData.o] Error 1 > make: *** [all] Error 2 > ----------------------------------------- > > I've set MACHTYPE to be 'ppc'. any ideas? > > Gerald From jimmy.lin at jhmi.edu Thu Feb 8 10:00:06 2007 From: jimmy.lin at jhmi.edu (JIMMY LIN) Date: Thu, 08 Feb 2007 13:00:06 -0500 Subject: [Genome] Exon Track? In-Reply-To: References: Message-ID: I found the answer on the mailing list. No need to reply. 
Thanks, Jimmy ----- Original Message ----- From: JIMMY LIN Date: Thursday, February 8, 2007 11:25 am Subject: Exon Track? To: genome at soe.ucsc.edu > Hi: > > quick question: is there a predefined exon track? > I am interested in the average conservation level of the CCDS exons > and am trying to use the table browser. > I was able to derive summary statistics but they include all CCDS > introns and exons. > Is there a way to define regions of only the exons? > > Thank you, Jimmy From MAG at stowers-institute.org Thu Feb 8 09:44:03 2007 From: MAG at stowers-institute.org (Goel, Manisha) Date: Thu, 8 Feb 2007 11:44:03 -0600 Subject: [Genome] Downloading annotations for parts of genome Message-ID: Hello, I have a large number of D. melanogaster genome coordinates for which I want to download the annotations. Or, more specifically, I just need to know if the region denoted by these coordinates is annotated as "repeats". I have tried looking through the table browser, but am unable to figure out if I can do this using the table browser. Could you please suggest the most straightforward way of getting this information from the UCSC Genome Browser? Thanks a lot for your help. -Manisha From vlb2 at cornell.edu Thu Feb 8 09:38:00 2007 From: vlb2 at cornell.edu (Vanessa Bauer) Date: Thu, 8 Feb 2007 12:38:00 -0500 Subject: [Genome] possible reasons for sequence masking Message-ID: Hello, Sorry to bother you, but I was unsuccessful in answering the following question from browsing your web site. In short, I am curious whether there are various reasons for sequences to be masked in an alignment. We have downloaded introns for a specific set of loci (roughly 8500) for Drosophila genomes from the Comparative Genomics "group" (multiz15way alignments). We are now attempting to get this data in the format that we want (i.e., each alignment block linked to its corresponding transcript and to mask any part of an intron that is also, at times, coding sequence) using the dm2 annotation.
We have noticed upper and lower case letters in the alignments. While I did notice that repeats are masked on the web site I was also wondering if there is any other reason for masking. More specifically, have intron sequence that are also coding (due to alternative splicing or coding regions within introns of other coding regions) been masked? thanks, Vanessa From galt at soe.ucsc.edu Thu Feb 8 11:10:59 2007 From: galt at soe.ucsc.edu (Galt Barber) Date: Thu, 8 Feb 2007 11:10:59 -0800 (PST) Subject: [Genome] psl score calculation In-Reply-To: References: Message-ID: If it is dna, pslIsProtein() returns false, so sizeMul=1. (If it were protein, sizeMul=3). To keep the scores more comparable, scaling by 3 for protein makes sense as there are 3 bases per amino acid codon. If you know you are using only dna, you can just treat sizeMul as 1, or just factor it out. return psl->match + (psl->repMatch/2) - psl->misMatch - psl->qNumInsert - psl->tNumInsert; The repMatch is counted, but only worth half as much as a regular match that is not in a repeat-masked area. The >> operator shifts by the number of bits specified, dropping bits off the right least significant end. This is usually like dividing by two and ignoring any remainder. 14 >> 1 == 7. 15 >> 1 == 7. -Galt On Thu, 8 Feb 2007, Yuval Itan wrote: > Hello, > > I need to calculate psl score for my Blat hits, and I was trying to get > the logic behind the calculation. I understand that this is the > function for score calculation: > > int pslScore(const struct psl *psl) > /* Return score for psl. */ > { > int sizeMul = pslIsProtein(psl) ? 3 : 1; > > return sizeMul * (psl->match + ( psl->repMatch>>1)) - > sizeMul * psl->misMatch - psl->qNumInsert - psl->tNumInsert; > } > > I was wondering what's the reason for psl->repMatch>>1 , isn't it like > dividing repMatch by 2? > Also, is the sizeMul element essential for the calculation if I use DNA > sequences? 
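A note on the pslScore() calculation discussed in this thread: for anyone scoring Blat hits outside the kent source tree, the same arithmetic can be sketched in Python. This is only a minimal sketch, not the kent implementation. The function names psl_score and psl_score_from_line are made up for illustration, and the is_protein flag is an assumption that stands in for pslIsProtein(), which the C code derives from the alignment itself. The column positions assume a standard tab-separated PSL line (match, misMatch, repMatch, nCount, qNumInsert, qBaseInsert, tNumInsert, ...).

```python
def psl_score(match, mis_match, rep_match, q_num_insert, t_num_insert,
              is_protein=False):
    """Mirror the arithmetic of pslScore() quoted in this thread.

    is_protein is a caller-supplied assumption; the C code calls
    pslIsProtein(psl) instead of taking a flag.
    """
    size_mul = 3 if is_protein else 1
    # rep_match >> 1 halves the repeat matches and drops any remainder,
    # so a match inside a repeat counts half as much as a regular match.
    return (size_mul * (match + (rep_match >> 1))
            - size_mul * mis_match
            - q_num_insert
            - t_num_insert)


def psl_score_from_line(line, is_protein=False):
    """Score one tab-separated PSL line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    match = int(fields[0])
    mis_match = int(fields[1])
    rep_match = int(fields[2])
    q_num_insert = int(fields[4])  # fields[3] is nCount, unused here
    t_num_insert = int(fields[6])  # fields[5] is qBaseInsert, unused here
    return psl_score(match, mis_match, rep_match,
                     q_num_insert, t_num_insert, is_protein)
```

For DNA hits the flag can be left at its default, which matches the point made above that sizeMul is simply 1 in that case.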
> > Thank you very much, > > Yuval > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome > From jim_kent at pacbell.net Thu Feb 8 11:39:27 2007 From: jim_kent at pacbell.net (Jim Kent) Date: Thu, 8 Feb 2007 11:39:27 -0800 Subject: [Genome] Genome Graphs In-Reply-To: <45CB2C9C.3060705@med.unc.edu> References: <45CB2C9C.3060705@med.unc.edu> Message-ID: <2C774C60-38AF-4C35-8D23-6FA1B79987EB@pacbell.net> Thanks for the kind words on Genome Graphs. You've got a lot of good ideas for extending it. We'll keep them in mind. I'm not seeing the powerpoint attachment. Could you resend it? On Feb 8, 2007, at 5:58 AM, Patrick Sullivan wrote: > UCSC genome mavens: > > First of all, this is an EXCELLENT tool. I have already circulated > it to my colleagues and I would urge you to publicize it widely. It > more or less solves one of the most pressing issues in the GWAS > area (visualizing results in genomic context), and is far superior > to any of the other paltry tools out there. > > > Second, as requested, I have a couple of suggestions and/or wishes. > > > 1. Pls allow the user to upload and plot qualitative data. In > trying to understand the results of a genomewide association study, > it is very useful to overlay external data from other studies that > are often qualitative not quantitative. > > For example, a user might want simply to overlay the positions of > linkage regions implicated in other studies or a candidate gene > list. For these, one might only wish to note where they are > qualitatively. Would note these with a bar. See Slide 1 in > attached .ppt for example. > > Data input would be brilliant if there were several forms. Input > might be the following: > > a) List of standard HUGO gene names. 
Could match against knownGene > and obtain chromosome and txsMin and txsMax (where txsMin is the > minimum txStart over all isoforms and txsMax is the maximum txEnd > over all isoforms). Example: > > NRG1 > DTNBP1 > DISC1 > COMT > > For the gene NRG1 on chr8 (has multiple isoforms), txsMin=31616809 > and txsMax=32741615. These values could be pre-computed for all > knownGenes for efficiency. > > b) Regions in from-to format where these could be chrN:x-y, SNP > IDs, or STS markers. Example > > chr8:31616809-32741615 > rs1234 rs5678 > D19S123 D19S654 > > c) Coding suggestion - add an indicator flag in the first column > for which sort of data are on that line. This would allow all types > of data to be in one file. > > TYPE Field1 Field2 > 1 NRG1 > 1 DTNBP1 > 2 chr8:31616809-32741615 > 2 chr9:22616809-22741615 > 3 rs1234 rs5678 > 3 rs2222 rs3333 > 4 D19S123 D19S654 > 4 D1S111 D1S222 > > Type=1 for single standard gene names, Type=2 for chrN:from-to, > Type=3 for two SNP IDs, and Type=4 for two STS markers. > > Could then split these into separate files, merge with the > appropriate UCSC table to get the coordinates, and then concatenate > for plotting. > > > 2. When the user selects an area on the genome overview page, goes > to the genome browser set on that area. Some suggestions for the > user-defined tracks at the top. > > a) Show the baseline for each user track (see Slide 2). > > b) If user wants the scores to be connected, put a little tick at > the location of each marker. Otherwise is hard to know the marker > density in a region. See Slide 2. > > c) Allow the user to select of he/she wants the points connected or > indicated by a vertical line (i.e., the -log(pvalue) for a SNP at > that point). See Slide 3. > > d) If full display is selected, display the SNP name on the user > track. Clicking on that SNP goes to the appropriate page about that > SNP (as is done under the current SNP track). > > > 3. 
An exceptionally useful feature is the merging that occurs (for > rs numbers or STS markers). In the output that describes the > matching of results, pls list those that failed to match so we can > trouble-shoot. > > > Again, many thanks for writing this tool. It will get a lot of use > on our side. > > -- > Pat > > ---- Patrick Sullivan, MD, FRANZCP > ---- Ray M. Hayworth & Family Distinguished Professor > ---- UNC/Genetics & Carolina Center for Genome Sciences > ---- CB#7264, 103 Mason Farm Road > ---- Neuroscience Research Building, Room 4109D > ---- Chapel Hill, NC, 27599-7264, USA > ---- V: +919-966-3358 F: +919-966-3630 > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From khan at cshl.edu Fri Feb 9 08:07:32 2007 From: khan at cshl.edu (Khan, Sohail) Date: Fri, 9 Feb 2007 11:07:32 -0500 Subject: [Genome] Upload a coordinate file to retrieve the seq Message-ID: Dear UCSC team, I was wondering if it is possible to upload a file containing the start and stop coordinates for each chromosome and retrieve the sequences? I have one file per chromosome, ~250,000 coordinates altogether. Thanks. Khan. From liaojy at mail2.sysu.edu.cn Fri Feb 9 02:55:17 2007 From: liaojy at mail2.sysu.edu.cn (liaojy) Date: Fri, 9 Feb 2007 18:55:17 +0800 Subject: [Genome] A question about UCSC genome brower Message-ID: <000001c74c38$c9672c50$880ca8c0@leocomputer> Hi, I encountered a problem when using the UCSC Genome Browser. I searched for the ZHX3 gene in the human genome, and then opened the "Human Gene ZHX3 Description and Page Index" page (URL: http://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene=NM_015035&hgg_prot=ZHX3_HUMAN&hgg_chrom=chr20&hgg_start=39240502&hgg_end=39362153&hgg_type=knownGene&db=hg18&hgsid=85816915%20TITLE=). Under "Gene Ontology (GO) Annotations with Structured Vocabulary", it lists 12 GO terms.
When I searched for ZHX3 in the GO database (http://www.godatabase.org), I got only 4 terms: "nucleus", "protein binding", "transcription factor activity", and "negative regulation of transcription, DNA-dependent". Why is the GO database search result inconsistent with the UCSC Genome Browser result? Sincerely, Liaojianyou From rich at thevillas.eclipse.co.uk Fri Feb 9 10:07:34 2007 From: rich at thevillas.eclipse.co.uk (richard) Date: Fri, 09 Feb 2007 18:07:34 +0000 Subject: [Genome] conservation tracks for peptide sequences Message-ID: <45CCB866.4050906@thevillas.eclipse.co.uk> Hi, is there a way, using either the genome browser or table browser, to obtain multiple alignments for peptide sequences in the same way as is possible for a genomic region by clicking on the conservation track? cheers Rich From kayla at soe.ucsc.edu Fri Feb 9 10:21:41 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 09 Feb 2007 10:21:41 -0800 Subject: [Genome] [UCSC Genome Browser] Downloading raw conservation scores In-Reply-To: <264a05e60702072330y51b0d26cvb86f4cf70182a31e@mail.gmail.com> References: <264a05e60702072330y51b0d26cvb86f4cf70182a31e@mail.gmail.com> Message-ID: <45CCBBB5.2020003@cse.ucsc.edu> ChangHee, The raw scores are here: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/ I hope this is helpful to you. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group ChangHee Lee wrote: > I'm wondering whether it is possible to download raw conservation scores of > a certain chromosomal region. I can see on the genome browser the beautiful > conservation graph but every attempt to list the actual values > failed. Could you please give some references on how to get there? > > Sincerely yours, > ChangHee Lee > 2007. 02. 08.
(Thu) > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From fanhsu at soe.ucsc.edu Fri Feb 9 10:36:51 2007 From: fanhsu at soe.ucsc.edu (Fan Hsu) Date: Fri, 9 Feb 2007 10:36:51 -0800 Subject: [Genome] A question about UCSC genome brower In-Reply-To: <000001c74c38$c9672c50$880ca8c0@leocomputer> Message-ID: Hi Jian You, We generate the GO-related annotation for UCSC Known Genes from GO/UniProt association data downloaded from: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz which seems to provide more comprehensive results. We have recently updated our GO database and it will be released to our public servers soon. For this particular gene (NM_015035/ZHX3_HUMAN), the new GO data indicate 13 related GO terms: GO:0003676 GO:0003677 GO:0003700 GO:0005515 GO:0005622 GO:0005634 GO:0006350 GO:0006355 GO:0008270 GO:0043565 GO:0045449 GO:0045892 GO:0046872 Fan. -----Original Message----- From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On Behalf Of liaojy Sent: Friday, February 09, 2007 2:55 AM To: genome at soe.ucsc.edu Subject: [Genome] A question about UCSC genome brower Hi, I encountered a problem when using the UCSC Genome Browser. I searched for the ZHX3 gene in the human genome, and then opened the "Human Gene ZHX3 Description and Page Index" page (URL: http://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene=NM_015035&hgg_prot=ZHX3_HUMAN&hgg_chrom=chr20&hgg_start=39240502&hgg_end=39362153&hgg_type=knownGene&db=hg18&hgsid=85816915%20TITLE=). Under "Gene Ontology (GO) Annotations with Structured Vocabulary", it lists 12 GO terms. When I searched for ZHX3 in the GO database (http://www.godatabase.org), I got only 4 terms: "nucleus", "protein binding", "transcription factor activity", and "negative regulation of transcription, DNA-dependent". Why is the GO database search result inconsistent with the UCSC Genome Browser result?
Sincerely, Liaojianyou _______________________________________________ Genome maillist - Genome at soe.ucsc.edu http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Fri Feb 9 11:12:34 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 09 Feb 2007 11:12:34 -0800 Subject: [Genome] Overlap between refSeq, known_gene and all_mRNA tables In-Reply-To: <45CB5538.5080402@itb.cnr.it> References: <45CB5538.5080402@itb.cnr.it> Message-ID: <45CCC7A2.5020301@cse.ucsc.edu> Alessandro, The Known Genes details page has information on how it is created. Look in the "Methods" section. Here is a link: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene Q1.) No, not all refGene sequences are contained in knownGene. Here is a link to a previously answered mailinglist question on that topic: http://www.cse.ucsc.edu/pipermail/genome/2006-November/012041.html Q2.)Of the two table combinations you mention, all_mrna and refGene would cover more of the genome. I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Alessandro Guffanti wrote: > Dear Genome support: > > I would like to know which is the extent of overlap between the following > UCSC db tables: > > knownGene > all_mrna > refGene > > I understand that RefSeq sequences are NOT included in all_mrna: > > mysql> select * from all_mrna where qName LIKE 'NM_%'; > Empty set (0.00 sec) > > Q1) are all the refGene sequences contained in knwonGene ? > Q2) to obtain the total coverage of the human genome in terms of > (redundant) transcripts, shall I query > all_mrna AND refGene ? or all_mrna and knownGene ? 
> > Best wishes and thanks in advance, > > Alessandro G > From kpalaniappan at lbl.gov Fri Feb 9 11:29:37 2007 From: kpalaniappan at lbl.gov (Krishna Palaniappan) Date: Fri, 09 Feb 2007 11:29:37 -0800 Subject: [Genome] linking to ucsc genome browser Message-ID: <45CCCBA1.2010706@lbl.gov> Hello, Given an Entrez Gene ID, a gi number of a RefSeq protein, or a RefSeq protein accession ID, how does one link to the UCSC Genome Browser for the model eukaryotic organisms, specifically: human, mouse, rat, zebrafish, worm, fly, arabidopsis? Could you point out the base URL template that we can use and the parameters? I tried digging through the various help pages but could not find specific information on this. thanks krishna LBNL, Berkeley From rhead at soe.ucsc.edu Fri Feb 9 14:03:42 2007 From: rhead at soe.ucsc.edu (Brooke Rhead) Date: Fri, 09 Feb 2007 14:03:42 -0800 Subject: [Genome] conservation tracks for peptide sequences In-Reply-To: <45CCB866.4050906@thevillas.eclipse.co.uk> References: <45CCB866.4050906@thevillas.eclipse.co.uk> Message-ID: <45CCEFBE.5010702@soe.ucsc.edu> Hello Rich, Unfortunately, we do not have multiple alignment files of peptide sequences. However (if you haven't done this already), you may be interested in viewing the conservation track in the Genome Browser and turning on the codon translation feature. Do this by going to the track controls page, either by clicking the track name link right above its drop-down box, or by clicking the mini-button to the left of the track (the tall blue or gray box on the far left-hand side of the display), and then scrolling down to the "Codon Translation" section. The "color track by codons" feature is also available on several alignment tracks, such as Other RefSeq, Other mRNA and Other EST, which you may also be interested in viewing. Please let us know if you have any further questions.
-- Brooke Rhead UCSC Genome Bioinformatics Group richard wrote: > Hi, > > is there a way using either the genome browser or table browser to > obtain multiple alignments for peptide sequences in a similar manner as > it is possible for a genomic region by clicking on the conservation track? > > cheers > Rich > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Fri Feb 9 14:15:00 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 09 Feb 2007 14:15:00 -0800 Subject: [Genome] Downloading annotations for parts of genome In-Reply-To: References: Message-ID: <45CCF264.4060404@cse.ucsc.edu> Manisha, If you are looking to download sequence, here is a link to our FAQ on how to download sequence in batch from an assembly: http://genome.ucsc.edu/FAQ/FAQdownloads#download32 When using the table browser to output sequence, there is an option under "Sequence Formatting Options" to mask repeats either to lower case or to N. If you select this option before clicking on the "get sequence" button, then the repeat-masked sequence will show up in your results as lower case, and everything else will be upper case. Additionally, you could intersect your genomic coordinates with repeat-masked data to see if there is any overlap between your coordinates and repeats. To do this in the table browser, you could:
1. Save your genomic coordinates as a custom track
2. In the Table Browser, set:
   clade: insect
   genome: D. melanogaster
   assembly: Apr. 2004
   group: Custom Tracks
   track: User Track
   table: (whatever you had named your custom track)
   click on create intersection
   group: Variation and Repeats
   track: RepeatMasker
   table: RepeatMasker (rmsk)
   Select the radio button that says "All User Track records that have any overlap with RepeatMasker"
   Click "submit"
   output format: BED
   click "get output"
3.
Your results will be the items in your custom track that have any overlap with RepeatMasker. I hope this is helpful to you. Please don't hesitate to contact us again if you require further assistance. Kayla Smith UCSC Genome Bioinformatics Group Goel, Manisha wrote: > Hello, > I have a large number of D. melanogaster genome coordinates for which I > want to download the annotations. > Or, more specifically, I just need to know if the region denoted by these > coordinates is annotated as "repeats". > I have tried looking through the table browser, but am unable to figure > out if I can do this using the table browser. > Could you please suggest the most straightforward way of getting this > information from the UCSC Genome Browser? > > Thanks a lot for your help. > -Manisha > > _______________________________________________ > Genome maillist - Genome at soe.ucsc.edu > http://www.soe.ucsc.edu/mailman/listinfo/genome From kayla at soe.ucsc.edu Fri Feb 9 14:36:44 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 09 Feb 2007 14:36:44 -0800 Subject: [Genome] Btau3 liftover file In-Reply-To: <45CAA1EC.90805@csiro.au> References: <45CAA1EC.90805@csiro.au> Message-ID: <45CCF77C.4040903@cse.ucsc.edu> Wes, We won't have a bosTau3 to human liftover for another couple of months. I can email you directly when this data is available. In the meantime there is bosTau2ToHg18.over.chain.gz available here: http://hgdownload.cse.ucsc.edu/goldenPath/bosTau2/liftOver/ Please don't hesitate to contact us again if you have any other questions about the Genome Browser. Kayla Smith UCSC Genome Bioinformatics Group Wes Barris wrote: > Hi, > > I was wondering when you might have a bosTau3ToHg18 liftover file? From sladunga2 at unlnotes.unl.edu Fri Feb 9 14:56:37 2007 From: sladunga2 at unlnotes.unl.edu (Steve Ladunga) Date: Fri, 9 Feb 2007 16:56:37 -0600 Subject: [Genome] How can I find out the citation for a table/track?
Message-ID: Hello, Could you please help me to find out the citation (literature reference) for your tables/tracks? I found SOME info on: http://genome.ucsc.edu/google/goldenPath/help/trackDescriptions.html#T_tracks and http://genome.ucsc.edu/google/goldenPath/gbdDescriptions.html but for several tables finding the publication would take quite a bit of time, and you may have them already. do you have the references for hg17 tiling-array-related tables? For example, what is the reference for encode_UCSD_ChIP_RNAP_HCT116 Thank you so much! Steve Steve Ladunga UNL Center for Biotechnology and Department of Statistics E204 Beadle Center, University of Nebraska-Lincoln 1901 Vine St., Lincoln, NE 68588-0665 Phone: (402) 472-6074 Fax: (402) 472-3139 sladunga at unl.edu From kayla at soe.ucsc.edu Fri Feb 9 15:14:36 2007 From: kayla at soe.ucsc.edu (Kayla Smith) Date: Fri, 09 Feb 2007 15:14:36 -0800 Subject: [Genome] possible reasons for sequence masking In-Reply-To: References: Message-ID: <45CD005C.9060308@cse.ucsc.edu> Vanessa, In the assembly sequence that we make available for download, we mask repeats only. However, if you click to get sequence for a gene/transcript, you have the option of having coding regions in upper case and introns in lower case -- and in that case it's not masking, it's just the use of case for a different purpose. And if there are different splice forms of a gene, the sequences returned will have upper and lower case in different places. I hope that helps to clear up the "masking" you might be seeing. Please don't hesitate to contact us again if you require more assistance. Kayla Smith UCSC Genome Bioinformatics Group Vanessa Bauer wrote: > Hello, > > Sorry to bother you but I was unsuccessful answering the following > question from browsing your web site. In short, I am curious if > there are various reasons for sequences to be masked in an alignment. 
> We have downloaded introns for a specific set of loci (roughly 8500) > for Drosophila genomes from the Comparative Genomics "group" > (multiz15way alignments). We our now attempting to get this data in > the format that we want (i.e., each alignment block linked to its > corresponding transcript and to mask any part of a