[Genome] information about "phastcons" score

hong sun hong.sun at esat.kuleuven.be
Sun Apr 6 23:58:16 PDT 2008


Hi Ann,
Thanks for your immediate reply. It helps a lot!
I have also embedded my further questions (colored blue) within your answers 
below.  :-) 


We are interested in the pairwise alignment between the intergenic regions of 
50 mouse genes and the corresponding intergenic regions of human.
The 50 mouse intergenic regions are listed below in /*Data1*/. 
What we are doing now is:
1 use the UCSC genome browser to browse the chr region of our data, 
selecting only human for the pairwise alignment with mouse on the 
Conservation Track Settings page.

I have two comments to this part of your question.  If you are not 
already doing it, I would suggest that you create a Custom Track with 
your 50 intergenic mouse regions.  They will be displayed in the mouse 
genome browser and will be easier to navigate to.  Read about creating a 
custom track here: 
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

/I will try it next time./
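
As a rough illustration of the custom track suggestion above, here is a minimal Python sketch that turns browser-style positions from Data1 into BED lines that could be pasted into the custom track form. The function names, the track name "intergenic50" and the region labels are made up for this example. It assumes the positions are 1-based browser coordinates, so the start is shifted by one to get the 0-based BED start.

```python
def position_to_bed(position, name):
    """Convert a 1-based 'chrom:start-end' browser position to a BED line."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    # BED starts are 0-based and ends are exclusive, so only the start shifts.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def make_custom_track(positions, track_name="intergenic50"):
    """Build custom track text: one track line followed by one BED line per region."""
    lines = [f'track name="{track_name}" description="Mouse intergenic regions"']
    for i, pos in enumerate(positions, start=1):
        lines.append(position_to_bed(pos, f"region{i}"))
    return "\n".join(lines)


# First three regions from Data1; the rest would be added the same way.
regions = [
    "chr12:30523186-30524385",
    "chr3:95366249-95367448",
    "chr12:87772800-87773999",
]
track_text = make_custom_track(regions)
print(track_text)
```

The printed text is only a sketch of the input; the custom track help page linked above remains the reference for the accepted format.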

Please note that although you are viewing only the human pairwise 
alignment, the phastCons wiggle values do not change correspondingly.  
This is a common misconception.

/OK, so even if I select only human, it will not affect the result.
And what are the phastCons wiggle values? Are they the *dataValue* from 
item 4 below (filter: dataValue is >= 0.9; our fourth question: is this 
"dataValue" the threshold for the PhastCons conservation score?)?/

> 2 then click on the blue conservation area on the genome browser 
> page, which gives alignments in the /*Result1*/ format (shown below),
>    *our first question: *is this the alignment between the mouse 
> intergenic region and the corresponding intergenic region of human?

This is the alignment between your mouse coordinates (whether they are 
intergenic or not) and the corresponding human coordinates (which may or 
may not be intergenic).

/Our coordinates are from intergenic sequence.
But for the results given by the UCSC database, how is it decided which 
part is aligned with the given input data?/

>    *our second question: *can we download the whole alignment at once 
> instead of downloading each block?

Yes, instead of downloading from this page in a block-by-block fashion, 
I would suggest using the Human Net track on the mouse browser.  This is 
a pairwise alignment between mouse and human.
Take your third region from your Data1 file, chr12:87772800-87773999.  
In the Conservation details page, you will see these block-by-block 
alignments (as you have noted):

B D  Mouse  gctgggatttctgtatgtgtgacac-aggggattagagaagg-gattagc-gggggtgg-a-ggactgat
B D  Human  gctcgcgtgtc--aatatgtaacacaaggggattaaagaagg-aattacagtttgggat-g-gagaggat

However, from the Human Net details page (click on the "View alignment 
details of parts of net within browser window" link), you will see a 
base-by-base alignment (human on top, mouse on bottom):

 75922219 gctcgcgtgtcaatatgtaacaca-aggggattaaagaaggaattacagtttgggatgga 75922277
 >>>>>>>> ||| |  | ||  |||||   ||| ||||||||| |||||| ||||  |   ||| |||| >>>>>>>>
 87772800 gctgggatttctgtatgtgtgacacaggggattagagaagggattagcg---ggggtgga 87772856

...and so on.

/OK, I see. And is this the alignment between the mouse intergenic sequence 
and the corresponding intergenic sequence of human?
If not, the same question: how is it decided which part is aligned with the 
given input data?/

> 3 besides the pairwise alignment between the intergenic regions of mouse and 
> human, we are also interested in the conserved regions of the pairwise 
> alignment, and here we would like to use the PhastCons
>    conservation score, *our third question: *as we know, PhastCons is 
> for multiple species alignments, but we do a pairwise alignment; can we 
> also get/use the PhastCons score to select the conserved regions?

There is no pairwise PhastCons score computed or displayed on our 
website.  You are certainly welcome to use high-scoring regions to 
decide what you think is "conserved".  I would suggest using the items 
in the "Most Conserved" track.

/Good idea! From the "Most Conserved" track I can get conserved regions (I 
think I should choose the /*"PhastCons Vertebrate Conserved Elements, 
30-way Multiz Alignment"*/); maybe I can use these as our conserved regions.
My question is, how are these conserved regions defined? Between mouse and 
human (can the species be chosen by the user)?  Or between mouse and many 
other species (does 30-way Multiz mean 30 species aligned with Multiz)?
/
This genomewiki page might also be helpful to you in understanding the 
intricacies of the mm9 Conservation track: 
http://genomewiki.ucsc.edu/index.php/Mm9_multiple_alignment
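
To make the "Most Conserved" suggestion concrete, here is a minimal Python sketch of how conserved elements could be clipped to one of the query regions once the element coordinates have been exported. The function names are made up for this example, and the element coordinates below are hypothetical values, not real entries from the track. It assumes BED-style coordinates (0-based start, exclusive end) and elements that do not overlap each other.

```python
def clip_elements(region, elements):
    """Return the parts of conserved elements that fall inside region.

    region and each element are (chrom, start, end) tuples in BED-style
    coordinates: 0-based start, exclusive end.
    """
    chrom, r_start, r_end = region
    clipped = []
    for e_chrom, e_start, e_end in elements:
        if e_chrom != chrom:
            continue
        start, end = max(r_start, e_start), min(r_end, e_end)
        if start < end:  # keep only non-empty overlaps
            clipped.append((chrom, start, end))
    return clipped


def conserved_fraction(region, elements):
    """Fraction of region bases covered, assuming elements do not overlap each other."""
    covered = sum(end - start for _, start, end in clip_elements(region, elements))
    return covered / (region[2] - region[1])


# The first Data1 region in BED coordinates, with made-up elements.
region = ("chr12", 30523185, 30524385)
elements = [
    ("chr12", 30523000, 30523200),  # hypothetical, partly outside the region
    ("chr12", 30523500, 30523560),  # hypothetical, fully inside the region
    ("chr3", 100, 200),             # hypothetical, on another chromosome
]
print(clip_elements(region, elements))
print(conserved_fraction(region, elements))
```

This only reports which parts of a region are covered by conserved elements. It says nothing about how the elements themselves were defined.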


> 4 Suppose we can use the PhastCons score. Here is the procedure we 
> followed to get the conserved regions of the pairwise alignment.
>    We click Table Browser on the alignment page, and we choose 
> parameters like:
>    *group: Comparative Genomics
>    *track: Conservation
>    *table: phastCons17way
>    *region: position chr12:30523186-30524385
>    *filter: dataValue is >= 0.9  (our fourth question: is this 
> "dataValue" the threshold for the PhastCons conservation score?)
>    *output format: BED format
>    With all of these, we get /*Result2*/, shown below.
>

If you decide to use the phastCons17way (or 30way), I would recommend 
you use the raw data available from our download server here: 
http://hgdownload.cse.ucsc.edu/downloads.html#mouse  Find the assembly 
you want and choose this link: "Conservation scores for alignments of XX 
vertebrate genomes with Mouse".
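
For the raw score files, here is a minimal Python sketch of applying a 0.9 threshold outside the Table Browser. It assumes the downloaded scores are fixed-step wiggle text (a "fixedStep chrom=... start=... step=..." header followed by one score per line) with step 1. The function names and the sample values are made up for this example and are not real phastCons scores.

```python
def parse_fixed_step(lines):
    """Yield (chrom, pos0, score) from fixedStep wiggle text; pos0 is 0-based."""
    chrom, pos, step = None, None, 1
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"]) - 1  # wiggle starts are 1-based
            step = int(fields.get("step", 1))
            continue
        yield chrom, pos, float(line)
        pos += step


def high_scoring_runs(scores, threshold=0.9):
    """Merge adjacent bases with score >= threshold into BED-style intervals."""
    runs = []
    for chrom, pos, score in scores:
        if score < threshold:
            continue
        if runs and runs[-1][0] == chrom and runs[-1][2] == pos:
            # This base directly follows the previous run, so extend it.
            runs[-1] = (chrom, runs[-1][1], pos + 1)
        else:
            runs.append((chrom, pos, pos + 1))
    return runs


# Made-up sample in fixedStep format, for illustration only.
sample = """fixedStep chrom=chr12 start=101 step=1
0.10
0.95
0.97
0.20
0.91
""".splitlines()

runs = high_scoring_runs(parse_fixed_step(sample))
print(runs)
```

The intervals use 0-based starts and exclusive ends, like BED output. Note that runs of single high-scoring bases are not the same thing as the predicted elements in the "Most Conserved" track.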

/So if I want to use the PhastCons score to select conserved regions (can 
PhastCons be applied to two species?), we had better use your data rather 
than our data./
/I downloaded the pairwise alignment between mm9 and hg18 from  
ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/vsHg18, but it is only the 
sequence alignment, without PhastCons conservation scores. 
Maybe I should download the phastCons scores from here? 
ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/vertebrate/

Or does the UCSC database provide a smarter pairwise alignment tool? For 
example, one with a conservation score that the user can choose to 
define conserved regions, so that I can input the two intergenic regions 
and align them./


Ann, thanks very much for your great help! :-)


Many greetings!

Hong Sun






Ann Zweig wrote:
> Hello Hong Sun,
>
>     Since there are so many parts to your question, I have embedded my 
> answers within your questions below.  I am assuming that you are 
> working with the latest mouse assembly (mm9) and human assembly (hg18).
>
> hong sun wrote:
>> Hello,
>> We are interested in the pairwise alignment between the intergenic regions 
>> of 50 mouse genes and the corresponding intergenic regions of human.
>> The 50 mouse intergenic regions are listed below in 
>> /*Data1*/. What we are doing now is:
>> 1 use the UCSC genome browser to browse the chr region of our data, 
>> selecting only human for the pairwise alignment with mouse on the 
>> Conservation Track Settings page.
>
> I have two comments to this part of your question.  If you are not 
> already doing it, I would suggest that you create a Custom Track with 
> your 50 intergenic mouse regions.  They will be displayed in the mouse 
> genome browser and will be easier to navigate to.  Read about creating 
> a custom track here: 
> http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
>
> Please note that although you are viewing only the human pairwise 
> alignment, the phastCons wiggle values do not change correspondingly.  
> This is a common misconception.
>
>
>> 2 then click on the blue conservation area on the genome browser 
>> page, which gives alignments in the /*Result1*/ format (shown below),
>>    *our first question: *is this the alignment between the 
>> mouse intergenic region and the corresponding intergenic region of 
>> human?
>
> This is the alignment between your mouse coordinates (whether they are 
> intergenic or not) and the corresponding human coordinates (which may 
> or may not be intergenic).
>
>
>>    *our second question: *can we download the whole alignment at once 
>> instead of downloading each block?
>
> Yes, instead of downloading from this page in a block-by-block 
> fashion, I would suggest using the Human Net track on the mouse 
> browser.  This is a pairwise alignment between mouse and human.
>
> Take your third region from your Data1 file, chr12:87772800-87773999.  
> In the Conservation details page, you will see these block-by-block 
> alignments (as you have noted):
>
> B D  Mouse  gctgggatttctgtatgtgtgacac-aggggattagagaagg-gattagc-gggggtgg-a-ggactgat
> B D  Human  gctcgcgtgtc--aatatgtaacacaaggggattaaagaagg-aattacagtttgggat-g-gagaggat
>
> However, from the Human Net details page (click on the "View alignment 
> details of parts of net within browser window" link), you will see a 
> base-by-base alignment (human on top, mouse on bottom):
>
> 75922219 gctcgcgtgtcaatatgtaacaca-aggggattaaagaaggaattacagtttgggatgga 75922277
> >>>>>>>> ||| |  | ||  |||||   ||| ||||||||| |||||| ||||  |   ||| |||| >>>>>>>>
> 87772800 gctgggatttctgtatgtgtgacacaggggattagagaagggattagcg---ggggtgga 87772856
>
> ...and so on.
>
>
>> 3 besides the pairwise alignment between the intergenic regions of mouse 
>> and human, we are also interested in the conserved regions of the 
>> pairwise alignment, and here we would like to use the PhastCons
>>    conservation score, *our third question: *as we know, PhastCons is 
>> for multiple species alignments, but we do a pairwise alignment; can we 
>> also get/use the PhastCons score to select the conserved regions?
>
> There is no pairwise PhastCons score computed or displayed on our 
> website.  You are certainly welcome to use high-scoring regions to 
> decide what you think is "conserved".  I would suggest using the items 
> in the "Most Conserved" track.
>
> This genomewiki page might also be helpful to you in understanding the 
> intricacies of the mm9 Conservation track: 
> http://genomewiki.ucsc.edu/index.php/Mm9_multiple_alignment
>
>
>> 4 Suppose we can use the PhastCons score. Here is the procedure we 
>> followed to get the conserved regions of the pairwise alignment.
>>    We click Table Browser on the alignment page, and we choose 
>> parameters like:
>>    *group: Comparative Genomics
>>    *track: Conservation
>>    *table: phastCons17way
>>    *region: position chr12:30523186-30524385
>>    *filter: dataValue is >= 0.9  (our fourth question: is this 
>> "dataValue" the threshold for the PhastCons conservation score?)
>>    *output format: BED format
>>    With all of these, we get /*Result2*/, shown below.
>>
>
> If you decide to use the phastCons17way (or 30way), I would recommend 
> you use the raw data available from our download server here: 
> http://hgdownload.cse.ucsc.edu/downloads.html#mouse  Find the assembly 
> you want and choose this link: "Conservation scores for alignments of 
> XX vertebrate genomes with Mouse".
>
> That said, I suggest that you instead use the "Most Conserved" track 
> (and corresponding table: phastConsElements17way (or 30way)).
>
>     This should be enough to point you in the right direction.  Please 
> don't hesitate to contact the mail list again if you require further 
> assistance.
>
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
>> We would like to know whether this procedure is correct for our goals. If 
>> not, would you be so kind as to help us out? Thanks in advance! :-)
>>
>>
>>
>> Many greetings,
>>
>> Hong Sun
>>
>> *Data1:*
>> chr12:30523186-30524385
>> chr3:95366249-95367448
>> chr12:87772800-87773999
>> chr14:68894360-68895559
>> chr2:121139669-121140868
>> chr19:53192853-53194052
>> chr11:45726131-45727330
>> chr12:29260496-29261695
>> chr5:121854003-121855202
>> chr2:52246683-52247882
>> chr11:60353173-60354372
>> chr15:11850199-11851398
>> chr8:27250554-27251753
>> chr5:125729944-125731143
>> chr17:31365675-31366874
>> chr15:103067650-103068849
>> chr9:35200747-35201946
>> chr4:134544636-134545835
>> chr19:29714158-29715357
>> chr4:144158764-144159963
>> chr17:26235861-26237060
>> chr14:120236192-120237391
>> chr17:78322989-78324188
>> chr13:115579408-115580607
>> chr10:41964957-41966156
>> chr19:12699353-12700552
>> chr14:45739159-45740358
>> chr19:60944139-60945338
>> chr11:98856334-98857533
>> chr7:125355803-125357002
>> chr13:41349800-41350999
>> chr4:146828776-146829975
>> chr1:62636710-62637909
>> chr12:9599505-9600704
>> chr7:101294530-101295729
>> chr14:68912552-68913751
>> chr6:115960383-115961582
>> chr14:49776557-49777756
>> chr4:62045208-62046407
>> chr13:95385780-95386979
>> chr15:81187829-81189028
>> chr6:112912491-112913690
>> chr11:67781653-67782852
>> chr18:69468890-69470089
>> chr5:118287854-118289053
>> chr2:157834425-157835624
>> chr10:79751314-79752513
>> chr2:152023876-152025075
>> chr8:110324527-110325726
>> chr16:43151087-43152286
>>
>> *Results1:*
>> Conservation score statistics 
>> <http://genome.ucsc.edu/cgi-bin/hgc?hgsid=105549388&g=phastCons17way&i=phastCons17way&c=chr12&l=30523185&r=30524385&o=30523185&db=mm8&parentWigMaf=multiz17way> 
>>
>> Capitalize exons based on show bases
>>
>> Place cursor over species for alignment detail. Click on 'B' to link 
>> to browser for aligned species, click on 'D' to get DNA for aligned 
>> species.
>>
>> *Components not displayed:* X. tropicalis Elephant Cow Dog Armadillo 
>> Chicken Opossum Tetraodon Tenrec Chimp Rhesus Rabbit Zebrafish Rat
>> *Alignment block 1 of 9 in window, 30523186 - 30523549, 364 bps *
>> B 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm8&ct=&position=chr12%3A30523186-30523549> 
>> D 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=30523185&g=getDna&i=chr12&c=chr12&l=30523185&r=30523549&db=mm8>  
>> Mouse  
>> agttgagttttatactctcctaggtgctcagtccaatcaagttgagaatcaggatcaactgtcacacctg
>> B 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&ct=&position=chr2%3A1727829-1735185> 
>> D 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=1727828&g=getDna&i=chr2&c=chr2&l=1727828&r=1735185&db=hg18&hgSeq.revComp=on>  
>> Human  
>> ======================================================================
>>
>>      Mouse  
>> ggctccagttccaaacctcacatttaagacctctgctcccttggttgtattgcctaacctggccttcctg
>>      Human  
>> ======================================================================
>>
>>      Mouse  
>> gctgaagaatggagagactggaaccccagggagaatcagagaactgtataaagtgtcagcattcaatctt
>>      Human  
>> ======================================================================
>>
>>      Mouse  
>> gcagagtacactctgatgttaacctcagggcttcccttgtcttaacgctgtccacgcaaaagccatccca
>>      Human  
>> ======================================================================
>>
>>      Mouse  
>> tcttccccacaagggttcctcattggcggtgaatgttggagacctcaggaatctctcgctagggagcttc
>>      Human  
>> ======================================================================
>>
>>      Mouse  tatttctgcagcac
>>      Human  ==============
>>
>> ................................................
>>
>>
>> *Results2:*
>> track name="Conservation" description="Vertebrate Multiz Alignment & Conservation"
>> # db: 'mm8', track: 'phastCons17way', output date: 2008-04-02 08:37:49 UTC
>> # chrom specified: chr12
>> # position specified: 30523186-30524385
>> # data values >= 0.9
>> chr12 30524026 30524047 chr12.1
>> chr12 30524281 30524382 chr12.2
>>
>>
>>
>>
>>
>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>



