[Genome] needLargeMem: Out of memory

Hiram Clawson hiram at soe.ucsc.edu
Wed Apr 16 09:24:00 PDT 2008


Good Morning Helder:

Your "program" to run blat on each chromosome is something
like the following shell script:

for C in 1 2 3 4 5 6   # ... etc for all chrom names ...
do
     blat hg18.2bit:chr${C} multi_test.fa chr${C}.psl
done

There isn't a big problem here.  Simply run blat for
each chromosome.  It won't even take very much run-time,
even on your laptop.
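
A slightly fuller sketch of the same loop follows, in case it helps. The function name run_per_chrom and the BLAT variable are made up for illustration and are not part of blat. The -noHead option is blat's own switch for suppressing the psl header. The chromosome names are passed in by you, so nothing is assumed about which sequences your 2bit file contains.

```shell
#!/bin/sh
# Run one blat job per target sequence, so that only one chromosome
# has to be held in memory at a time.
# BLAT is a hypothetical override hook: set it to another command
# name to print or log the jobs instead of running blat.
BLAT="${BLAT:-blat}"

# usage: run_per_chrom file.2bit query.fa seqName [seqName ...]
run_per_chrom() {
    twobit="$1"
    query="$2"
    shift 2
    for chrom in "$@"
    do
        # file.2bit:seqName restricts the target to a single sequence.
        # -noHead leaves out the psl header lines in each output file.
        "$BLAT" -noHead "${twobit}:${chrom}" "$query" "${chrom}.psl" || return 1
    done
}

# Example call (the names listed here are only an illustration):
#   run_per_chrom hg18.2bit multi_test.fa chr1 chr2 chrM
```

Because the headers are suppressed, the per-chromosome psl files can afterwards be concatenated with cat before filtering with pslReps or pslCDnaFilter, as Galt suggested earlier in this thread.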

--Hiram

Helder Nakaya wrote:
> Hello Hiram,
> 
> So, do I need to run blat for each chromosome?
> I can write a program to do this routine but isn't there an easy way?
> 
> Maybe I need to buy a laptop with more RAM memory : )
> 
> Best,
> 
> Helder
> 
> 2008/4/15, Hiram Clawson <hiram at soe.ucsc.edu>:
>> Good Afternoon Helder:
>>
>> The genome is already split by chromosome in fasta files at:
>>
>> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
>>
>> Or, you can use the specification:
>>        file.2bit:seqid
>> to specify only one chromosome to be used from the 2bit file.
>>
>> For example:
>>        blat hg18.2bit:chr1 multi_test.fa output.psl
>>
>> Please note the output help message from the blat command when
>> used with no arguments.  You are running stand-alone blat by
>> doing what you are already trying to do.
>>
>> --Hiram
>>
>>
>> Helder Nakaya wrote:
>>> Hello Galt and Hiram,
>>>
>>> I am not trying to align two genomes. Here is what I have done:
>>>
>>> I want to get the genomic coordinates from sequences of a multifasta
>>> file. This multifasta file contains only 2 sequences smaller than
>>> 2,000 nt (it's a test. Later, I will run with a multifasta containing
>>> 244k sequences with 60-mers).
>>>
>>> So, I am trying to run blat using this command line:
>>> blat /home/database/hg18.2bit multi_test.fa output
>>>
>>> I used "top" to see if gfServer was running and it was not.
>>>
>>> I would like to know how to split the genome by chromosome and run
>>> stand-alone blat.
>>>
>>> All the best,
>>>
>>> Helder
>>>
>>> 2008/4/15, Galt Barber <galt at soe.ucsc.edu>:
>>>
>>>> I don't think Helder is trying to align two entire genomes
>>>> (human and mouse)
>>>> to each other.  If that were so, then he needs blastz anyways.
>>>>
>>>> Furthermore, though blat targets can be huge genomes,
>>>> blat queries cannot exceed a certain size, so the queries would have
>>>> to be broken up into chunks of 5-10k and then the final
>>>> results would need chaining with tools.  Somewhat complex.
>>>> As you pointed out, most users would be better off just
>>>> using our chains.
>>>>
>>>> Perhaps Helder can tell us more about what he's
>>>> really trying to do?
>>>>
>>>> -Galt
>>>>
>>>>
>>>> On Tue, 15 Apr 2008, Hiram Clawson wrote:
>>>>
>>>>
>>>>> Good Morning Helder:
>>>>>
>>>>> I would guess trying to use both the mm9 and hg18 genomes
>>>>> together would require about 16 GB of memory, or thereabouts.
>>>>>
>>>>> You *may* be able to do one chromosome vs. one chromosome
>>>>> at a time.  For example chr1 vs. chr1 which would be the
>>>>> worst case.  Even that may be too much for a 2 GB memory
>>>>> machine, I don't know.  Try chrM vs chrM and work up from
>>>>> there.
>>>>>
>>>>> Your input to blat can be the chr1.fa fasta files.  You do
>>>>> not need to convert each fasta file to a 2bit file unless
>>>>> you are trying to save disk space.
>>>>>
>>>>> You can also use the genome browser chain and net tracks
>>>>> between hg18 and mm9.  Those are better alignments than blat
>>>>> can produce all by itself.
>>>>>
>>>>> --Hiram
>>>>>
>>>>> Helder Nakaya wrote:
>>>>>
>>>>>> Hey Galt, thanks.
>>>>>>
>>>>>> I'm trying to load hg18 and mm9. I have already converted all
>>>>>> chromosome .fa files to one single hg18.2bit file (with faToTwoBit).
>>>>>>
>>>>>> I'm running stand-alone blat. But I have tried to load the genome
>>>>>> sequence with gfServer before running blat.
>>>>>>
>>>>>> If I split the genome by chromosomes, will the database be the
>>>>>> directory containing the chr.2bit files?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Helder
>>>>>>
>>>>>> 2008/4/15, Galt Barber <galt at soe.ucsc.edu>:
>>>>>>
>>>>>>> It means you are out of memory.
>>>>>>>  #define ENOMEM 12
>>>>>>>
>>>>>>> Which genome are you trying to load?
>>>>>>> How big is it?
>>>>>>> Are you running stand-alone blat? or gfServer/gfClient?
>>>>>>> What does your command-line look like?
>>>>>>>
>>>>>>> Typically when people don't have enough RAM,
>>>>>>> we advise them to split the database of the genome
>>>>>>> by chromosome, and do one blat run per chrom.
>>>>>>>
>>>>>>> One then needs to use utilities like
>>>>>>> pslReps or pslCDnaFilter
>>>>>>> to combine the resulting psl result files.
>>>>>>>
>>>>>>> -Galt
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 15 Apr 2008, Helder Nakaya wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm new on this list and maybe this is not a novel question.
>>>>>>>>
>>>>>>>> I have installed blatSrc34 on my laptop (Ubuntu, system type:
>>>>>>>> 32-bit, processor: Intel(R) Core2 Duo CPU, T7250, 2GHz) with 2038MB
>>>>>>>> of RAM.
>>>>>>>>
>>>>>>>> When I try to run BLAT, it gives me this error message:
>>>>>>>> "needLargeMem: Out of memory - request size 158821425 bytes, errno: 12"
>>>>>>>> I would like to know if someone has also had this problem and how
>>>>>>>> to solve it.
>>>>>>>> I think I need to allocate dynamic memory in the memalloc.c program, but I
>>>>>>>> don't know how to do this.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Helder
>>>>>>>>
> 
> 

