[Genome] How can I get gene names out from positions of probes in a BED file?

John Cumbers johncumbers at gmail.com
Thu May 17 13:57:23 PDT 2007


Hi Rachel,

Many thanks for the predictors; I will investigate the options.
Below are snippets of the two files that I'm importing.  I think
that I do want to do an intersect with each of them against the gene
track.  This is what I was trying to do before, but it was not
working.  On pasting them now, I see that the two files are slightly
different.  They were created by different analysis programs, which is
presumably the reason, but I'm not sure whether this affects the results.
Best,
John


chr4	522051	522720	CWO2_1	52.27
chr4	528169	528828	CWO2_2	52.83
chr4	529288	530007	CWO2_3	54.48
chr4	577460	578115	CWO2_4	52.68
chr2L	108229	109517	CWO2_5	97.94
chr2L	124682	126386	CWO2_6	105.53
chr2L	127841	129218	CWO2_7	111.71
chr2L	131814	133016	CWO2_8	73.76
chr2L	140553	142303	CWO2_9	80.55
chr2L	165150	166328	CWO2_10	77.85
chr2L	244984	245608	CWO2_11	50.66
chr2L	246004	248078	CWO2_12	300.97
chr2L	248079	248744	CWO2_13	57.47
chr2L	249032	249692	CWO2_14	54.11
chr2L	272833	273705	CWO2_15	55.62
chr2L	274055	274679	CWO2_16	50.83


chr2L	108251	110286	target	999	+
chr2L	119671	121087	target	999	+
chr2L	123464	125698	target	999	+
chr2L	125807	126206	target	999	+
chr2L	127630	129148	target	999	+
chr2L	132047	133073	target	999	+
chr2L	136302	136375	target	506	+
chr2L	136666	137077	target	973	+
chr2L	140590	142555	target	999	+
chr2L	165301	166365	target	999	+
chr2L	221975	224058	target	999	+
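
The two snippets above differ in column count: the first has five columns (chrom, start, end, name, score) and the second has six (the same plus strand). The following is a minimal sketch, not a replacement for the Table Browser intersection, of how probes in either layout could be matched to overlapping or nearest genes. The gene coordinates and the names `geneA` and `geneB` are hypothetical placeholders, and all function names are illustrative; a real run would load gene intervals exported from a gene track.

```python
from typing import List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int              # 0-based start (BED convention)
    end: int                # end, exclusive (BED convention)
    name: str
    score: float
    strand: Optional[str]   # None when the line has only 5 columns


def parse_bed_line(line: str) -> Interval:
    """Parse one whitespace-separated BED line with 5 or 6 columns."""
    f = line.split()
    strand = f[5] if len(f) > 5 else None
    return Interval(f[0], int(f[1]), int(f[2]), f[3], float(f[4]), strand)


def overlapping(probe: Interval, genes: List[Interval]) -> List[str]:
    """Names of genes on the probe's chromosome sharing at least one base."""
    return [g.name for g in genes
            if g.chrom == probe.chrom
            and g.start < probe.end and probe.start < g.end]


def nearest(probe: Interval, genes: List[Interval]) -> Optional[Tuple[str, int]]:
    """(name, gap in bases) of the closest gene on the same chromosome.

    The gap is 0 for an overlapping gene; returns None if the chromosome
    has no genes in the list.
    """
    best = None
    for g in genes:
        if g.chrom != probe.chrom:
            continue
        gap = max(0, g.start - probe.end, probe.start - g.end)
        if best is None or gap < best[1]:
            best = (g.name, gap)
    return best


# Two probe lines copied from the snippets above (one per layout).
probe_a = parse_bed_line("chr2L\t108229\t109517\tCWO2_5\t97.94")
probe_b = parse_bed_line("chr2L\t136302\t136375\ttarget\t506\t+")

# HYPOTHETICAL gene annotation, invented purely for illustration.
genes = [
    Interval("chr2L", 107000, 110000, "geneA", 0.0, "+"),
    Interval("chr2L", 140000, 141000, "geneB", 0.0, "-"),
]

print(overlapping(probe_a, genes))  # genes overlapping the first probe
print(nearest(probe_b, genes))      # closest gene to an intergenic probe
```

Because the parser treats the strand column as optional, both files can go through the same code path, so the extra column by itself should not change an overlap result that depends only on chromosome, start and end.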



On 5/17/07, Rachel Harte <hartera at soe.ucsc.edu> wrote:
>
> Hello John,
>
> Regarding the problem that you are having with the affy tiled data, please
> would you send me an example of the data that you are loading. Just a few
> lines will suffice and then I will be able to help you more easily. If you
> don't have gene names in the custom track then that would be the problem.
> You may need to do an intersection with one of the gene tracks to obtain
> gene names.
>
> I have several suggestions from other engineers of transcription factor
> binding site predictors that you could try:
>
> 1) There is a program in the Genome Browser source code called
> dnaMotifFind (src/hg/geneBounds/dnaMotifFind).
> The motif input is described as follows:
>
> table dnaMotif
> "A gapless DNA motif"
>      (
>      string name;                        "Motif name."
>      int columnCount;                    "Count of columns in motif."
>      float[columnCount] aProb;           "Probability of A's in each column."
>      float[columnCount] cProb;           "Probability of C's in each column."
>      float[columnCount] gProb;           "Probability of G's in each column."
>      float[columnCount] tProb;           "Probability of T's in each column."
>      )
> It requires tabs between fields, and commas between the elements of the
> arrays. The source code is free for personal, academic and non-profit use
> and details about obtaining it are here:
>
> http://genome.ucsc.edu/FAQ/FAQlicense#license3
>
> 2) If you want to do de novo searching (i.e. not search for matches to a
> position weight matrix-type model), the program BEST was recommended:
>
> http://www.cs.uga.edu/~che/BEST
>
> It automatically runs AlignACE, BioProspector, CONSENSUS and MEME,
> combines their output, and does some nice optimizations (e.g. merging
> results and expanding motifs).
>
> It is only available for Linux.  You also can't run it from the
> command line so you need to do things manually, which can
> be a problem if you are planning on looking at many sets of genes.
> You will also need to do some parsing of the output in order to create a
> file in BED format.
>
> 3) For searching for matches to known binding site profiles, rVista is a
> good program to use:
>
> http://rvista.dcode.org/
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On Thu, 17 May 2007, John Cumbers wrote:
>
> > Hello,
> > I have Drosophila Affy tiled array data and two BED files.
> >
> > I am having trouble pulling out gene names for the corresponding probes.
> > I import the BED file as a custom track, then go to the Table Browser and
> > click "get output" at the bottom.  I've tried many of the settings, changing
> > the group from custom to all tables, but still I don't get any gene names,
> > just an output of positions.  This may be because no genes are present in
> > my sample (i.e. I'm just in intergenic regions), but I doubt it.  Am I doing
> > something wrong?  If it is because I don't have any genes there, then how
> > can I find which genes are nearest to these sites?
> >
> > As a second question, do you know of a computational transcription factor
> > binding site predictor that outputs predicted binding sites as a BED file,
> > or a similar file that can be intersected with the tiled array data above,
> > to test how good the predictions are?
> >
> > Any help much appreciated,
> > John
> >
> >
> >
> > --
> > John Cumbers,  Graduate Student
> > Biology and Medicine
> > Brown University, Box G-W
> > Providence, Rhode Island, 02912, USA
> > Tel USA: +1 401 523 8190,  Fax: +1 401 863-2166
> > UK to USA: 0207 617 7824
> > _______________________________________________
> > Genome maillist  -  Genome at soe.ucsc.edu
> > http://www.soe.ucsc.edu/mailman/listinfo/genome
> >
>
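
For the dnaMotif input quoted in point 1 above, the message states only that fields are separated by tabs and array elements by commas. The following is a minimal sketch of writing one motif line under those two rules. The function name `format_dna_motif`, the motif name `exampleMotif` and the probabilities are hypothetical, and whether dnaMotifFind expects a trailing comma after each array is not stated in the message, so that should be checked against the source before use.

```python
from typing import Sequence


def format_dna_motif(name: str,
                     a: Sequence[float], c: Sequence[float],
                     g: Sequence[float], t: Sequence[float]) -> str:
    """Build one dnaMotif record: tab-separated fields, comma-separated arrays.

    Field order follows the quoted table definition:
    name, columnCount, aProb, cProb, gProb, tProb.
    """
    columns = len(a)
    if not (len(c) == len(g) == len(t) == columns):
        raise ValueError("all four probability arrays must have the same length")

    def arr(values: Sequence[float]) -> str:
        # Assumption: no trailing comma; the message does not say either way.
        return ",".join(f"{p:g}" for p in values)

    return "\t".join([name, str(columns), arr(a), arr(c), arr(g), arr(t)])


# HYPOTHETICAL three-column motif; each column's four probabilities sum to 1.
line = format_dna_motif(
    "exampleMotif",
    a=[0.7, 0.1, 0.1],
    c=[0.1, 0.7, 0.1],
    g=[0.1, 0.1, 0.1],
    t=[0.1, 0.1, 0.7],
)
print(line)
```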



-- 
John Cumbers,  Graduate Student
Biology and Medicine
Brown University, Box G-W
Providence, Rhode Island, 02912, USA
Tel USA: +1 401 523 8190,  Fax: +1 401 863-2166
UK to USA: 0207 617 7824

