[Genome] mirroring from BSC

Brooke Rhead rhead at soe.ucsc.edu
Wed Aug 8 13:40:38 PDT 2007


Hello Alexis,

Thank you for your interest in mirroring the Genome Browser.  The 
instructions on our mirror site 
(http://genome.ucsc.edu/admin/mirror.html) should be complete and 
up-to-date.  Once you have downloaded our source tree, be sure to look 
in the directory "src/product" for various README files.  These files 
contain detailed instructions on setting up and using various parts of 
the Browser.

 > Do you have in mind future changes on naming protocol for genome
 > files? full name, abbreviation based on directory name, uniformity
 > maske vs hardmask, etc...

The differences you see in our naming conventions are mostly historical.
We originally had only chrom*.zip files, because at one point we only 
had assemblies that were chromosome-based (not scaffold-based), and the 
zip format was considered more universally usable.  We then switched 
to the .gz (or .tar.gz) format once we decided that it was 
well-supported enough on various platforms and provided better
compression than .zip.

The "chrom" name vs. the "database" name differences started when we 
began displaying genomes that had not yet been assembled into 
chromosomes.  Usually these genomes are assembled into scaffolds, as is 
the case with bosTau2 and anoCar1.  Naming the files "chrom" does not 
make much sense in those cases, so we substitute the database name instead.

In the future, we generally expect to use the .gz compression instead of 
.zip (although sometimes .zip is still used on assemblies whose previous 
versions used this format).  We expect scaffold-based assemblies to have 
names like "database.*.gz" and chromosome-based assemblies to have names 
like "chrom*.gz".
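
For an automated update process, the naming pattern described above could be 
checked with something like the following.  This is only a rough sketch, not 
part of our source tree: the function name "classify" and the labels it 
returns are made up for illustration, and it assumes you already know the 
database name (e.g. anoCar1) for the directory you are reading.

```python
# Hypothetical helper (not UCSC code): classify a bigZips file name
# using the naming pattern described in this message.

# Longest suffix first, so ".tar.gz" is not reported as plain ".gz".
ARCHIVE_SUFFIXES = (".tar.gz", ".gz", ".zip")


def classify(filename, database):
    """Return (kind, compression) for a download file name.

    kind is "chromosome-based" for names starting with "chrom",
    "scaffold-based" for names starting with "<database>.",
    and "unknown" otherwise.  compression is the matching archive
    suffix, or None if the name ends in none of the known suffixes.
    """
    compression = next(
        (suffix for suffix in ARCHIVE_SUFFIXES if filename.endswith(suffix)),
        None,
    )
    if filename.startswith("chrom"):
        kind = "chromosome-based"
    elif filename.startswith(database + "."):
        kind = "scaffold-based"
    else:
        kind = "unknown"
    return kind, compression


# File names taken from the original question:
print(classify("anoCar1.fa.masked.gz", "anoCar1"))    # ('scaffold-based', '.gz')
print(classify("bosTau2.hardmask.fa.gz", "bosTau2"))  # ('scaffold-based', '.gz')
print(classify("chromFaMasked.zip", "anoCar1"))       # ('chromosome-based', '.zip')
```

Older assemblies may not follow the pattern, so a script like this should 
probably treat "unknown" as a prompt to look at the directory by hand rather 
than as an error.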

 > About mirroring UCSC genomes and offer such mirroring
 > through our website, there is some needed constraint to accomplish,
 > something we must know? may we mirror only the databases without
 > Genome-Browser?

The full Genome Browser is freely available to mirror for non-commercial 
organizations.  You can mirror as much or as little as you would like. 
Our FAQs on the topic are located here: 
http://genome.ucsc.edu/FAQ/FAQlicense.

Note that for future mirror-related questions, we have a different 
mailing list specific to mirroring the Browser, at 
genome-mirror at soe.ucsc.edu.

I hope this information helps.  Good luck with your work.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Alexis Torrano wrote:
> 
> 
> 
> Hello
> 
> I am Alexis Torrano. I am mailing you from INB-BSC (National Institute in
> BioInformatics-Barcelona Supercomputing Center) asking for some advice.
> Our researchers make an important use of your databases. And we are
> interested in keeping a mirroring of such genome databases for them.
> We plan to add an update process of your databases to our automatic
> database update process.
> Mainly we will follow the indications appearing on your website about
> rsync. We'd like to know if we should take into account some other issue
> which could make easier the update.
> 
> 
> We have observed that some files are of the kind *fa.gz, fa.masked.gz
> and others like *Fa.zip, *FaMasked.zip, hardmask.fa.gz.
> 
> Also, there are some files that match in some way their species
> directory, some others do not.
> 
> 	Anolis_carolinensis/bigZips/anoCar1.fa.masked.gz
> 	Anopheles_gambiae/bigZips/chromFaMasked.zip
> 	Canis_familiaris/bigZips/chromFaMasked.tar.gz
> 	Bos_taurus/bigZips/bosTau2.hardmask.fa.gz
> 
> Do you have in
> mind future changes on naming protocol for genome files? full name,
> abbreviation based on directory name, uniformity maske vs hardmask,
> etc...
> 
> About mirroring UCSC genomes and offer such mirroring
> through our website, there is some needed constraint to accomplish,
> something we must know? may we mirror only the databases without
> Genome-Browser?
> 
> 
>            thank you very much.
> 
>         Alexis Torrano.-- 
> -----------------------------------------------------
> Alexis
> Torrano Martinez
> e-mail: atorrano at bsc.es, atorrano at lsi.upc.edu
> 
> Nodo Computacional GNHC-1
> (inb.bsc.es)
> Instituto
> Nacional de Bioinformatica (www.inab.org)
> Barcelona Supercomputing
> Center Node (www.bsc.es)
> BSC-CNS (www.bsc.es)
> c/. Jordi Girona
> 29
> Edifici Nexus II, despatx 1B  Tel:  (+34) 93 413 7605
> E-08034
> Barcelona              Fax:  (+34) 93
> Catalunya (Spain)
> Team
> info:
> http://inb.lsi.upc.edu/
> -----------------------------------------------------
> Berlin's Law
> of Computing - Computers don't do what you ask them to do, they do what
> you tell them to do. Named for Dean Berlin, noted observer of reality.
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

