[Genome] More help with upstream .maf file

Hiram Clawson hiram at soe.ucsc.edu
Mon Jan 8 15:59:41 PST 2007


Good Afternoon Jeffrey:

It is going to be tough to get this file into an editor.  It is over 600 MB
and almost 450,000 lines.  It will load into the vi editor on a Linux
system, but the running vi process occupies over 1 GB of memory
as it works.

It would probably be more useful to work with this file programmatically
rather than reading it in an editor.  If you can locate Windows equivalents
of typical Linux commands such as grep, awk, sed, and perl, you would be able
to parse out the bits of interest from this file for your use.  The kent source
tree has a variety of useful commands that can transform maf files
in various ways, although that would require something such as
Cygwin to give you a Linux-like environment on top of Windows.
That is a lot of work, though, to get all of that up and running.
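
If installing those tools is too much, the same line-at-a-time idea can be
sketched in a scripting language that runs natively on Windows.  The example
below uses Python purely for illustration; the function names head and
matching_lines are made up for this sketch and are not part of any UCSC tool.
It assumes the file is plain ASCII text.  Neither function loads the whole
file, so memory use should stay small regardless of file size.

```python
from itertools import islice


def head(path, n=20):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def matching_lines(path, needle):
    """Yield each line that contains needle, reading one line at a time.

    This is a rough stand-in for a plain `grep needle file`.
    """
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if needle in line:
                yield line.rstrip("\n")
```

For example, head("upstream2000.maf", 40) would return the first 40 lines for
inspection, and looping over matching_lines("upstream2000.maf", "hg18") would
visit only the lines that mention hg18 (a hypothetical search term).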

You could probably make do with some Perl scripts to parse the file if you
are into Perl programming.
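
As a rough idea of what such a parsing script might look like, here is a
minimal sketch, written in Python rather than Perl only for illustration.  It
relies on the documented maf layout: each alignment block starts with an "a"
line, and each sequence in the block is an "s" line with the fields src,
start, size, strand, srcSize, and text.  The names MafRow and read_maf_blocks
are hypothetical, and other line types (comments, "i", "e", "q") are simply
skipped.

```python
from typing import Iterator, List, NamedTuple


class MafRow(NamedTuple):
    """One 's' line of a maf alignment block."""
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    text: str


def read_maf_blocks(path: str) -> Iterator[List[MafRow]]:
    """Yield alignment blocks one at a time as lists of MafRow.

    Only one block is held in memory at once, so the size of the
    file does not matter much.
    """
    block: List[MafRow] = []
    in_block = False
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # blank line between blocks
            if fields[0] == "a":
                # A new block begins; hand back the previous one first.
                if in_block:
                    yield block
                block = []
                in_block = True
            elif fields[0] == "s" and in_block and len(fields) >= 7:
                block.append(MafRow(fields[1], int(fields[2]), int(fields[3]),
                                    fields[4], int(fields[5]), fields[6]))
    if in_block:
        yield block
```

A loop such as "for block in read_maf_blocks('upstream2000.maf')" could then
pick out only the blocks or species of interest and write them to a much
smaller file that an ordinary editor can open.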

--Hiram

calhoujd at notes.udayton.edu wrote:
> To Whom it May Concern,
> 
> Feel free to disregard my previous email seeking help.  I figured out my 
> previous issue, but have instead run into a new problem trying to use the 
> upstream2000.maf file (Multiple alignments of 16 vertebrate genomes with 
> Human).  I have tried using at least five text editors including notepad, 
> notepad++, wordpad, and the MEGA3.1 text editor to open the file with 
> little success.  I am assuming this is partially due to the size of the 
> file (>600 MB).  However, two of the text editors were able to begin to 
> open the file, but stopped way short of the entire file.  Enclosed is a 
> screen shot of how far the text editor was able to get (not quite through 
> the first alignment).  Any help as to how I can open this file in a text 
> editor would be greatly appreciated.
> 
> Sincerely,
> 
> Jeffrey Calhoun

