[Genome] Quality scores
Hiram Clawson
hiram at soe.ucsc.edu
Thu Oct 11 10:07:51 PDT 2007
Good Morning James:
Collecting this data together in one location would be a lot of work.
In most cases the original quality scores come from the sequencing centers
as qual FASTA files in scaffold or contig coordinate systems.
We convert those to our own in-house condensed format (qac) and lift
them to chromosome coordinates; the lifted data then becomes the source
for the wiggle format conversion.
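For reference, a qual FASTA file is plain text: a '>' header line naming the scaffold or contig, followed by whitespace-separated integer quality scores that may wrap over several lines. Below is a minimal Python sketch of a reader for that layout. The function name read_qual_fasta is hypothetical. The sketch assumes the first word of the header is the sequence name, and it does not handle the binary qac format.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_qual_fasta(handle: TextIO) -> Dict[str, List[int]]:
    """Read a qual FASTA stream into {sequence name: [scores]} (hypothetical helper)."""
    records: Dict[str, List[int]] = {}
    current: Optional[List[int]] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the first word after '>' is the sequence name.
            name = line[1:].split()[0]
            current = records.setdefault(name, [])
        elif current is None:
            raise ValueError("quality scores found before any '>' header")
        else:
            # Scores may wrap over many lines; keep appending to the same record.
            current.extend(int(tok) for tok in line.split())
    return records


# Usage on the two lines of the anoCar1 sample in this message.
sample = """>scaffold_0
21 20 48 31 24 36 29 20 50 21 21 28 36 41 50 49 26 24 49 37
17 18 50 20 12 43 38 50 42 50 30 21 49 43 30 40 42 50 50 49
"""
quals = read_qual_fasta(io.StringIO(sample))
scores = quals["scaffold_0"]
print(len(scores), min(scores), max(scores))  # 40 12 50
```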
I haven't proven this, but I suspect that the table browser output
for these numbers doesn't actually lose any information. The input
numbers were integers in the range 0 to 100, and the wiggle conversion
stores each value as one of 128 byte levels (0 to 127). So if you take
the table browser output and round each value to the nearest integer,
I think you get the original data back.
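One way to see why rounding can recover the scores is a small model of the encoding. It assumes each block of values is scaled between its lower limit L and data range R and then truncated to an integer byte b. The anoCar1 sample below is consistent with this, with L = 12 and R = 38, but the model is an inference from that sample rather than a description of the actual conversion code:

```latex
\begin{aligned}
b &= \left\lfloor \frac{127\,(v - L)}{R} \right\rfloor, &
\hat{v} &= L + \frac{R\,b}{127}, \\
127\,(v - L) &= R\,b + r, \quad 0 \le r \le R - 1 &
\Longrightarrow\; v - \hat{v} &= \frac{r}{127} \in \left[0, \frac{R - 1}{127}\right].
\end{aligned}
```

Under this model, rounding \hat{v} to the nearest integer returns v whenever (R - 1)/127 < 1/2, that is R <= 64, and rounding up returns v whenever R <= 127. For example, the score 29 gives 127 * 17 = 2159 = 38 * 56 + 31, so b = 56 and \hat{v} = 29 - 31/127, or about 28.7559, which matches position 7 of the table browser output.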
Here is a sample from anoCar1:
Original qac data:
>scaffold_0
21 20 48 31 24 36 29 20 50 21 21 28 36 41 50 49 26 24 49 37
17 18 50 20 12 43 38 50 42 50 30 21 49 43 30 40 42 50 50 49
Table browser output:
variableStep chrom=scaffold_0 span=1
1 20.9764
2 19.7795
3 47.9055
4 30.8504
5 23.9685
6 35.937
7 28.7559
8 19.7795
9 50
10 20.9764
11 20.9764
12 27.8583
13 35.937
14 40.7244
15 50
16 48.8032
17 25.7638
18 23.9685
19 48.8032
20 36.8346
... etc ...
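The Python sketch below is a quick check of the rounding rule on this sample. It parses the variableStep lines above, rounds each value, and compares the result with the first 20 qac values. It also compares the table browser values with a truncating quantization model using L = 12 and R = 38. Those are the minimum and span of the sample, inferred rather than read from the database. The helper name parse_variable_step is hypothetical. The model check uses a 1e-3 tolerance to allow for float and print precision.

```python
from typing import List, Tuple

# First 20 qac values for scaffold_0 from the sample above.
QAC = [21, 20, 48, 31, 24, 36, 29, 20, 50, 21,
       21, 28, 36, 41, 50, 49, 26, 24, 49, 37]

# Table browser output from the sample above (the "... etc ..." tail omitted).
TABLE_BROWSER = """variableStep chrom=scaffold_0 span=1
1 20.9764
2 19.7795
3 47.9055
4 30.8504
5 23.9685
6 35.937
7 28.7559
8 19.7795
9 50
10 20.9764
11 20.9764
12 27.8583
13 35.937
14 40.7244
15 50
16 48.8032
17 25.7638
18 23.9685
19 48.8032
20 36.8346
"""


def parse_variable_step(text: str) -> List[Tuple[int, float]]:
    """Return (position, value) pairs from variableStep data lines (hypothetical helper)."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, declaration lines and comments.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] in ("track", "variableStep", "fixedStep"):
            continue
        pairs.append((int(fields[0]), float(fields[1])))
    return pairs


pairs = parse_variable_step(TABLE_BROWSER)
recovered = [round(value) for _, value in pairs]
print(recovered == QAC)  # True

# Truncating model with L = 12, R = 38 (inferred from the sample, an assumption).
L, R = 12, 38
model = [L + R * ((q - L) * 127 // R) / 127 for q in QAC]
print(max(abs(m - v) for m, (_, v) in zip(model, pairs)) < 1e-3)  # True
```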
I'm pretty sure this rounding rule applies to quality data in
the range 0 to 100. Be wary if you find a data set with numbers > 100.
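The caveat about wider data can be explored with a small sweep. This Python sketch assumes a truncating quantization: each value in a block is scaled by 127 over the block's span (max minus min) and floored to a byte. The anoCar1 sample is consistent with that, but it is an inference. For each span, the sketch encodes every integer in [0, span], decodes it, and reports the first span at which a recovery rule returns a wrong score. All function names are hypothetical.

```python
import math
from typing import Callable

MAX_WIG_VALUE = 127  # data byte levels 0..127


def encode(value: int, lower: int, span: int) -> int:
    """Quantize by truncation (assumed; consistent with the anoCar1 sample)."""
    return (value - lower) * MAX_WIG_VALUE // span if span else 0


def decode(byte: int, lower: int, span: int) -> float:
    return lower + span * byte / MAX_WIG_VALUE


def ceil_recover(x: float) -> int:
    # Round up. The small tolerance keeps exactly decoded integers from being
    # pushed up by float noise; inexact decodes sit at least 1/127 below the
    # original score, so 1e-3 does not change their result.
    return math.ceil(x - 1e-3)


def first_failure(recover: Callable[[float], int], max_span: int = 300) -> int:
    """Smallest block span at which some integer in [0, span] is not
    recovered exactly after encode/decode; -1 if none up to max_span."""
    for span in range(1, max_span + 1):
        for v in range(span + 1):
            if recover(decode(encode(v, 0, span), 0, span)) != v:
                return span
    return -1


print(first_failure(round))         # 65
print(first_failure(ceil_recover))  # 128
```

Under this model, rounding to the nearest integer can return a wrong score once a block's values span more than 64 units. Rounding up stays exact up to a span of 127. Past that, two scores can share a byte level and the data really is lossy.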
--Hiram
James Taylor wrote:
> Hi Kayla,
>
> Is there any way to get the original qa files that were used to
> generate the quality tables in these assemblies? These appear to be
> available for panTro2 [1] but not for most of the assemblies listed
> below. We would prefer not to use the data from the table browser,
> since it has been converted to the lossy "wib" format and we are
> concerned about accumulated error.
>
> Thanks,
> James
>
> .. [1]: http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/chromQuals.tar.gz
>
> On Oct 8, 2007, at 4:44 PM, Kayla Smith wrote:
>
>> Hello Guru,
>>
>> We have quality scores for some other assemblies. The table is named
>> "quality" in the respective assembly:
>>
>> anoCar1
>> bosTau1
>> bosTau2
>> bosTau3
>> caePb1
>> canFam1
>> canFam2
>> equCab1
>> felCat3
>> galGal2
>> gasAcu1
>> monDom1
>> monDom4
>> oryLat1
>> panTro1
>> panTro2
>> priPac1
>> rheMac2
>> strPur1
>>
>> For assemblies which do not have a quality table, you might be
>> interested in checking out the PHRED program:
>> http://www.phrap.com/phred/
>>
>> I hope this information is helpful to you. Please don't hesitate to
>> contact us again if you require further assistance.
>>
>> Kayla Smith
>> UCSC Genome Bioinformatics Group
>>
>> On Mon, 8 Oct 2007, Guruprasad Ananda wrote:
>>
>>> Hi,
>>>
>>> I noticed that on the UCSC genome browser, quality scores are
>>> available for download for chimpanzee and rhesus genomes. I was
>>> wondering if you had quality scores for other genomes. If not, could
>>> you suggest an alternative place where I can obtain quality
>>> scores for other genomes?
>>>
>>> Regards,
>>> Guru.
>>>
>>> Guruprasad Ananda
>>> Graduate Student
>>> Bioinformatics and Genomics
>>> The Pennsylvania State University