[Genome] How can I get gene names out from positions of probes in a BED file?
John Cumbers
johncumbers at gmail.com
Thu May 17 13:57:23 PDT 2007
Hi Rachel,
Many thanks for the predictors; I will investigate the options.
Here are snippets of the two files that I'm importing. I think
that I do want to intersect each of them with the gene track.
This is what I was trying to do before, but it was not
working. On pasting them now, I see that the two files are slightly
different: the first has five columns with a decimal score, and the
second has six columns with an integer score and a strand. They were
created by different analysis programs, which explains the difference,
but I am not sure whether it affects the results.
Best,
John
chr4 522051 522720 CWO2_1 52.27
chr4 528169 528828 CWO2_2 52.83
chr4 529288 530007 CWO2_3 54.48
chr4 577460 578115 CWO2_4 52.68
chr2L 108229 109517 CWO2_5 97.94
chr2L 124682 126386 CWO2_6 105.53
chr2L 127841 129218 CWO2_7 111.71
chr2L 131814 133016 CWO2_8 73.76
chr2L 140553 142303 CWO2_9 80.55
chr2L 165150 166328 CWO2_10 77.85
chr2L 244984 245608 CWO2_11 50.66
chr2L 246004 248078 CWO2_12 300.97
chr2L 248079 248744 CWO2_13 57.47
chr2L 249032 249692 CWO2_14 54.11
chr2L 272833 273705 CWO2_15 55.62
chr2L 274055 274679 CWO2_16 50.83
chr2L 108251 110286 target 999 +
chr2L 119671 121087 target 999 +
chr2L 123464 125698 target 999 +
chr2L 125807 126206 target 999 +
chr2L 127630 129148 target 999 +
chr2L 132047 133073 target 999 +
chr2L 136302 136375 target 506 +
chr2L 136666 137077 target 973 +
chr2L 140590 142555 target 999 +
chr2L 165301 166365 target 999 +
chr2L 221975 224058 target 999 +
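To make the intersect idea concrete, here is a rough Python sketch of what I mean; it is not what the Table Browser does internally. It treats both inputs as BED (0-based, half-open intervals), reports every gene that overlaps a probe, and falls back to the nearest gene on the same chromosome when nothing overlaps. The probe lines are taken from the first file above. The gene records (geneA, geneB) and their coordinates are made-up placeholders, not real annotations; a real run would read a gene track exported as BED.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed(text: str) -> List[Interval]:
    """Parse whitespace-separated BED lines, keeping the first four fields."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        records.append(Interval(fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return records


def genes_for_probe(probe: Interval, genes: List[Interval]) -> List[Tuple[str, int]]:
    """Return (gene name, distance) pairs for one probe.

    All overlapping genes are returned with distance 0. If none overlap,
    the single nearest gene on the same chromosome is returned with the
    size of the gap in bases. An empty list means the chromosome has no
    genes in the annotation at all.
    """
    same_chrom = [g for g in genes if g.chrom == probe.chrom]
    overlapping = [
        (g.name, 0)
        for g in same_chrom
        if g.start < probe.end and probe.start < g.end
    ]
    if overlapping or not same_chrom:
        return overlapping

    def gap(g: Interval) -> int:
        # For non-overlapping intervals exactly one of these terms is >= 0.
        return max(g.start - probe.end, probe.start - g.end)

    nearest = min(same_chrom, key=gap)
    return [(nearest.name, gap(nearest))]


# Probe lines copied from the first file in this message.
PROBES_BED = """
chr4 522051 522720 CWO2_1 52.27
chr2L 108229 109517 CWO2_5 97.94
chr2L 124682 126386 CWO2_6 105.53
"""

# Hypothetical gene track: names and coordinates are placeholders only.
GENES_BED = """
chr2L 100000 109000 geneA
chr2L 130000 131000 geneB
"""

probes = parse_bed(PROBES_BED)
genes = parse_bed(GENES_BED)
results = {p.name: genes_for_probe(p, genes) for p in probes}

for probe_name, hits in results.items():
    print(probe_name, hits)
```

With the placeholder genes, CWO2_5 overlaps geneA, CWO2_6 falls back to geneB at a gap of 3614 bases, and CWO2_1 gets an empty list because no chr4 gene is listed. The scan is linear per probe, which is fine for a few thousand probes; a sorted index would be the next step for larger files.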
On 5/17/07, Rachel Harte <hartera at soe.ucsc.edu> wrote:
>
> Hello John,
>
> Regarding the problem that you are having with the affy tiled data, please
> would you send me an example of the data that you are loading. Just a few
> lines will suffice and then I will be able to help you more easily. If you
> don't have gene names in the custom track then that would be the problem.
> You may need to do an intersection with one of the gene tracks to obtain
> gene names.
>
> Other engineers have suggested several transcription factor binding site
> predictors that you could try:
>
> 1) There is a program in the Genome Browser source code called
> dnaMotifFind (src/hg/geneBounds/dnaMotifFind).
> The motif input is described as follows:
>
> table dnaMotif
> "A gapless DNA motif"
>     (
>     string name;                "Motif name."
>     int columnCount;            "Count of columns in motif."
>     float[columnCount] aProb;   "Probability of A's in each column."
>     float[columnCount] cProb;   "Probability of C's in each column."
>     float[columnCount] gProb;   "Probability of G's in each column."
>     float[columnCount] tProb;   "Probability of T's in each column."
>     )
> It requires tabs between fields, and commas between the elements of the
> arrays. The source code is free for personal, academic and non-profit use
> and details about obtaining it are here:
>
> http://genome.ucsc.edu/FAQ/FAQlicense#license3
>
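As a note on the motif format quoted above, here is a minimal Python sketch of writing one motif record with tabs between fields and commas between array elements, in the field order of the table definition. The function name format_dna_motif and the example motif are hypothetical. Whether dnaMotifFind expects one record per line, or a trailing comma after each array, is an assumption I have not checked against the source.

```python
from typing import Sequence


def format_dna_motif(
    name: str,
    a_prob: Sequence[float],
    c_prob: Sequence[float],
    g_prob: Sequence[float],
    t_prob: Sequence[float],
) -> str:
    """Build one tab-separated dnaMotif record.

    Field order follows the table definition: name, columnCount,
    aProb, cProb, gProb, tProb. Array elements are comma-separated.
    """
    column_count = len(a_prob)
    if not (len(c_prob) == len(g_prob) == len(t_prob) == column_count):
        raise ValueError("all four probability arrays must have the same length")

    # Sanity check: the four base probabilities in each column should sum to 1.
    for i, column in enumerate(zip(a_prob, c_prob, g_prob, t_prob)):
        if abs(sum(column) - 1.0) > 1e-6:
            raise ValueError(f"column {i} does not sum to 1")

    def join(values: Sequence[float]) -> str:
        return ",".join(f"{v:g}" for v in values)

    return "\t".join(
        [name, str(column_count), join(a_prob), join(c_prob), join(g_prob), join(t_prob)]
    )


# Hypothetical three-column motif, for illustration only.
line = format_dna_motif(
    "exampleMotif",
    a_prob=[0.7, 0.1, 0.25],
    c_prob=[0.1, 0.6, 0.25],
    g_prob=[0.1, 0.2, 0.25],
    t_prob=[0.1, 0.1, 0.25],
)
print(line)
```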
> 2) If you want to do de novo searching (i.e. not search for matches to a
> position weight matrix-type model), the program BEST was recommended:
>
> http://www.cs.uga.edu/~che/BEST
>
> It automatically runs AlignACE, BioProspector, CONSENSUS and MEME,
> combines their output, and does some nice optimizations (e.g. merging
> results and expanding motifs).
>
> It is only available for Linux. You also can't run it from the
> command line so you need to do things manually, which can
> be a problem if you are planning on looking at many sets of genes.
> You will also need to do some parsing of the output in order to create a
> BED file.
>
> 3) For searching for matches to known binding site profiles, rVista is a
> good program to use:
>
> http://rvista.dcode.org/
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On Thu, 17 May 2007, John Cumbers wrote:
>
> > Hello,
> > I have Drosophila Affy tiled array data and two BED files.
> >
> > I am having trouble pulling out gene names for the corresponding probes.
> > I import the BED file as a custom track, then go to the Table Browser and
> > click output at the bottom. I've tried many of the settings, changing the
> > group from custom to all tables, but I still don't get any gene names, just
> > an output of positions. This may be because no genes are present in my
> > sample (i.e. I'm just in intergenic regions), but I doubt it. Am I doing
> > something wrong? If it is because I don't have any genes there, then how
> > can I find which genes are nearest to these sites?
> >
> > As a second question, do you know of a computational transcription factor
> > binding site predictor that outputs predicted binding sites as a BED file,
> > or a similar file that can be intersected with the tiled array data above
> > to test how good the predictions are?
> >
> > Any help much appreciated,
> > John
> >
> >
> >
> > --
> > John Cumbers, Graduate Student
> > Biology and Medicine
> > Brown University, Box G-W
> > Providence, Rhode Island, 02912, USA
> > Tel USA: +1 401 523 8190, Fax: +1 401 863-2166
> > UK to USA: 0207 617 7824
> > _______________________________________________
> > Genome maillist - Genome at soe.ucsc.edu
> > http://www.soe.ucsc.edu/mailman/listinfo/genome
> >
>
--
John Cumbers, Graduate Student
Biology and Medicine
Brown University, Box G-W
Providence, Rhode Island, 02912, USA
Tel USA: +1 401 523 8190, Fax: +1 401 863-2166
UK to USA: 0207 617 7824