[Genome] How can I get gene names out from positions of probes in a BED file?

Rachel Harte hartera at soe.ucsc.edu
Thu May 17 13:25:59 PDT 2007


Hello John,

Regarding the problem that you are having with the Affy tiled data, could
you please send me an example of the data that you are loading? Just a few
lines will suffice, and then I will be able to help you more easily. If you
don't have gene names in the custom track, then that would be the problem.
You may need to do an intersection with one of the gene tracks to obtain
gene names.
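
As a rough illustration of what such an intersection does, here is a minimal
Python sketch. It is not the Table Browser's implementation; the coordinates,
probe names and gene names are invented for the example, and the helper
functions (parse_bed, overlapping_genes, nearest_gene) are hypothetical. It
assumes BED coordinates are 0-based and half-open, and it also shows one
simple way to report the nearest gene when a probe lies in an intergenic
region.

```python
from typing import List, Optional, Tuple

# (chrom, start, end, name); BED coordinates are 0-based, half-open.
Interval = Tuple[str, int, int, str]

# Invented example data, NOT real annotations.
PROBES = """\
track name=exampleProbes
chr2L 1000 1025 probe1
chr2L 5000 5025 probe2
"""
GENES = """\
chr2L 900 2000 geneA
chr2L 6000 7000 geneB
"""

def parse_bed(text: str) -> List[Interval]:
    """Parse BED lines, skipping blank, comment, track and browser lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else "."
        rows.append((fields[0], int(fields[1]), int(fields[2]), name))
    return rows

def overlapping_genes(probe: Interval, genes: List[Interval]) -> List[str]:
    """Names of genes on the same chromosome that overlap the probe."""
    chrom, start, end, _ = probe
    return [g[3] for g in genes if g[0] == chrom and g[1] < end and start < g[2]]

def nearest_gene(probe: Interval, genes: List[Interval]) -> Optional[Tuple[str, int]]:
    """(gene name, distance in bases) of the closest gene; distance 0 if overlapping."""
    chrom, start, end, _ = probe
    best = None
    for g_chrom, g_start, g_end, g_name in genes:
        if g_chrom != chrom:
            continue
        # Gap between two half-open intervals, or 0 when they overlap.
        dist = max(g_start - end, start - g_end, 0)
        if best is None or dist < best[1]:
            best = (g_name, dist)
    return best

if __name__ == "__main__":
    genes = parse_bed(GENES)
    for probe in parse_bed(PROBES):
        hits = overlapping_genes(probe, genes)
        print(probe[3], hits if hits else nearest_gene(probe, genes))
```

The linear scan over all genes is fine for a few lines of test data; for a
whole tiled array you would want to sort the intervals or use an existing
interval tool instead.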

Other engineers here have suggested several transcription factor binding
site predictors that you could try:

1) There is a program in the Genome Browser source code called
dnaMotifFind (src/hg/geneBounds/dnaMotifFind).
The motif input format is described as follows:

table dnaMotif
"A gapless DNA motif"
     (
     string name;                "Motif name."
     int columnCount;            "Count of columns in motif."
     float[columnCount] aProb;   "Probability of A's in each column."
     float[columnCount] cProb;   "Probability of C's in each column."
     float[columnCount] gProb;   "Probability of G's in each column."
     float[columnCount] tProb;   "Probability of T's in each column."
     )
It requires tabs between fields, and commas between the elements of the
arrays. The source code is free for personal, academic and non-profit use
and details about obtaining it are here:

http://genome.ucsc.edu/FAQ/FAQlicense#license3

2) If you want to do de novo searching (i.e. not search for matches to a
position weight matrix-type model), the program BEST was recommended:

http://www.cs.uga.edu/~che/BEST

It automatically runs AlignACE, BioProspector, CONSENSUS and MEME,
combines their output, and does some nice optimizations (e.g. merging
results, expanding motifs, etc.)

It is only available for Linux.  You also can't run it from the
command line, so you need to do things manually, which can
be a problem if you are planning on looking at many sets of genes.
You will also need to do some parsing of the output in order to create a
BED file.

3) For searching for matches to known binding site profiles, rVista is a
good program to use:

http://rvista.dcode.org/
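
For the dnaMotif input described in (1), the following Python sketch shows
one way to write a record with tabs between the fields and commas between
the array elements. The function name format_dna_motif, the motif name and
the probabilities are made up for illustration, and the check that each
column sums to 1 is an added sanity test rather than something taken from
the dnaMotifFind documentation.

```python
from typing import Sequence

def format_dna_motif(name: str,
                     a: Sequence[float], c: Sequence[float],
                     g: Sequence[float], t: Sequence[float]) -> str:
    """Return one dnaMotif record: tab-separated fields, comma-separated arrays."""
    columns = len(a)
    if not (len(c) == len(g) == len(t) == columns):
        raise ValueError("all four probability arrays must have the same length")
    # Assumed sanity check: the four base probabilities in a column sum to 1.
    for i in range(columns):
        total = a[i] + c[i] + g[i] + t[i]
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"column {i} sums to {total}, expected 1")

    def join(probs: Sequence[float]) -> str:
        return ",".join(f"{p:g}" for p in probs)

    fields = [name, str(columns), join(a), join(c), join(g), join(t)]
    return "\t".join(fields)

if __name__ == "__main__":
    # Invented three-column motif, not a real binding site profile.
    print(format_dna_motif(
        "exampleMotif",
        a=[0.7, 0.1, 0.25],
        c=[0.1, 0.1, 0.25],
        g=[0.1, 0.7, 0.25],
        t=[0.1, 0.1, 0.25],
    ))
```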

I hope that this helps you. Please let us know if you have further
questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Thu, 17 May 2007, John Cumbers wrote:

> Hello,
> I have Drosophila Affy tiled array data and two BED files.
>
> I am having trouble pulling out gene names for the corresponding probes.
> I import the BED file as a custom track, then go to the Table Browser and
> click output at the bottom.  I've tried many of the settings, changing the
> group from custom to all tables, but still I don't get any gene names, just
> an output of position.   This may be because no gene names are present in my
> sample, (i.e. I'm just in intergenic regions) but I doubt it.  Am I doing
> something wrong?  If it is because I don't have any genes there, then how
> can I find which genes are nearest to these sites?
>
> As a second question, do you know of a computational transcription factor
> binding site predictor that outputs predicted binding sites as a bed file,
> or similar file that can be intersected with the tiled array data above, to
> test how good the predictions are?
>
> Any help much appreciated,
> John
>
>
>
> --
> John Cumbers,  Graduate Student
> Biology and Medicine
> Brown University, Box G-W
> Providence, Rhode Island, 02912, USA
> Tel USA: +1 401 523 8190,  Fax: +1 401 863-2166
> UK to USA: 0207 617 7824
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

