[Genome] Quality scores

Hiram Clawson hiram at soe.ucsc.edu
Thu Oct 11 10:07:51 PDT 2007


Good Morning James:

It would be a lot of work to collect this data in one location.
In most cases the original quality scores are in qual fasta files from
the sequencing centers and are in scaffold or contig coordinate systems.
We convert those to our own in-house condensed format (qac) and lift
them to chrom coordinates; the lifted data then become the source for
the wiggle format conversion.

I haven't proven this, but I suspect that the table browser output
for these numbers doesn't actually lose any information.  The input
numbers were integers in the range of 0 to 100.  The wiggle conversion
has a histogram range of 0 to 127.  So if you take the output of the
table browser and round each value to the nearest integer, I think you
have the original data.

Here is a sample from anoCar1:

Original qac data:
 >scaffold_0
  21 20 48 31 24 36 29 20 50 21 21 28 36 41 50 49 26 24 49 37
  17 18 50 20 12 43 38 50 42 50 30 21 49 43 30 40 42 50 50 49

Table browser output:
variableStep chrom=scaffold_0 span=1
1	20.9764
2	19.7795
3	47.9055
4	30.8504
5	23.9685
6	35.937
7	28.7559
8	19.7795
9	50
10	20.9764
11	20.9764
12	27.8583
13	35.937
14	40.7244
15	50
16	48.8032
17	25.7638
18	23.9685
19	48.8032
20	36.8346
... etc ...
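
A minimal Python sketch of that check, using only the 20 values shown
above: it rounds each table browser value to the nearest integer and
compares the result with the qac scores. The names recover_scores,
QAC_SCORES and TABLE_BROWSER_OUTPUT are illustrative, not part of any
UCSC tool.

```python
# Sketch: recover integer quality scores from table browser wiggle
# output by rounding to the nearest integer. Names are illustrative.

# First 20 qac scores for scaffold_0, copied from the sample.
QAC_SCORES = [21, 20, 48, 31, 24, 36, 29, 20, 50, 21,
              21, 28, 36, 41, 50, 49, 26, 24, 49, 37]

# Matching table browser output (split() accepts tabs or spaces).
TABLE_BROWSER_OUTPUT = """\
variableStep chrom=scaffold_0 span=1
1 20.9764
2 19.7795
3 47.9055
4 30.8504
5 23.9685
6 35.937
7 28.7559
8 19.7795
9 50
10 20.9764
11 20.9764
12 27.8583
13 35.937
14 40.7244
15 50
16 48.8032
17 25.7638
18 23.9685
19 48.8032
20 36.8346
"""


def recover_scores(wiggle_text):
    """Return (position, rounded score) pairs from variableStep text."""
    pairs = []
    for line in wiggle_text.splitlines():
        fields = line.split()
        # Data lines are "position value"; skip declaration lines.
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        pairs.append((int(fields[0]), int(round(float(fields[1])))))
    return pairs


recovered = [score for _, score in recover_scores(TABLE_BROWSER_OUTPUT)]
print("all scores recovered:", recovered == QAC_SCORES)
```

The same comparison could be run over a whole scaffold wherever both
the qac data and the table browser output are on hand.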

I'm pretty sure this rounding rule will apply to quality data in
the range of 0 to 100.  Be wary if you find a data set with numbers > 100.
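
One way to see why the range matters is to model the conversion. The
sketch below is an assumption, not taken from the UCSC source: it
supposes each block of data stores a lower limit and a data range, and
maps every score to one of the bins 0 to 127 by truncating downward.
The lower limit of 12 and range of 38 are inferred from the sample
values, not stated anywhere in this message, and the function names
are made up for illustration.

```python
# Assumed model (not from the UCSC source): a score is stored as a bin
# index 0..127 between a per-block lower limit and lower limit + range,
# with the index truncated downward.

MAX_BIN = 127  # highest bin index in the assumed model


def to_bin(score, lower, data_range):
    """Bin index for an integer score, using exact integer arithmetic."""
    return (score - lower) * MAX_BIN // data_range


def from_bin(bin_index, lower, data_range):
    """Value reported for a bin under the assumed model."""
    return lower + bin_index * data_range / MAX_BIN


def survives_rounding(lower, data_range):
    """True if round() recovers every integer score in the block's range."""
    return all(
        round(from_bin(to_bin(s, lower, data_range), lower, data_range)) == s
        for s in range(lower, lower + data_range + 1)
    )


# Inferred parameters for the sample block: lower limit 12, range 38.
for score in (21, 20, 48, 31, 24):
    print(score, "%.4f" % from_bin(to_bin(score, 12, 38), 12, 38))

print("sample block safe:", survives_rounding(12, 38))
print("full 0 to 100 block safe:", survives_rounding(0, 100))
```

Under this assumed model the reported value is never above the
original and falls short by less than one step (range / 127). For the
sample block the step is about 0.30, so nearest-integer rounding is
safe. A block spanning the full 0 to 100 would have a step of about
0.79, so it would be worth checking the rule against the original
scores there before relying on it.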

--Hiram

James Taylor wrote:
> Hi Kayla,
> 
> Is there any way to get the original qa files that were used to  
> generate the quality tables in these assemblies? These appear to be  
> available for panTro2 [1] but not for most of the assemblies listed  
> below. We would prefer not to use the data from the table browser,  
> since it has been converted to the lossy "wib" format and we are  
> concerned about accumulated error.
> 
> Thanks,
> James
> 
> .. [1]: http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/chromQuals.tar.gz
> 
> On Oct 8, 2007, at 4:44 PM, Kayla Smith wrote:
> 
>> Hello Guru,
>>
>> We have quality scores for some other assemblies.  The table is named
>> "quality" in the respective assembly:
>>
>> anoCar1
>> bosTau1
>> bosTau2
>> bosTau3
>> caePb1
>> canFam1
>> canFam2
>> equCab1
>> felCat3
>> galGal2
>> gasAcu1
>> monDom1
>> monDom4
>> oryLat1
>> panTro1
>> panTro2
>> priPac1
>> rheMac2
>> strPur1
>>
>> For assemblies which do not have a quality table, you might be
>> interested in checking out the PHRED program:
>> http://www.phrap.com/phred/
>>
>> I hope this information is helpful to you.  Please don't hesitate to
>> contact us again if you require further assistance.
>>
>> Kayla Smith
>> UCSC Genome Bioinformatics Group
>>
>> On Mon, 8 Oct 2007, Guruprasad Ananda wrote:
>>
>>> Hi,
>>>
>>> I noticed that on the UCSC genome browser, quality scores are
>>> available for download for chimpanzee and rhesus genomes. I was
>>> wondering if you had quality scores for other genomes. If not, could
>>> you suggest an alternative place where I can obtain quality
>>> scores for other genomes?
>>>
>>> Regards,
>>> Guru.
>>>
>>> Guruprasad Ananda
>>> Graduate Student
>>> Bioinformatics and Genomics
>>> The Pennsylvania State University

