[Genome] differences between MAF and 2bit

Brooke Rhead rhead at soe.ucsc.edu
Thu Mar 8 10:55:51 PST 2007


Hello Pavel,

I see that the mouse regions you list consist only of Ns in the mm8.2bit 
file (viewed as FASTA with the twoBitToFa program).  I checked 
a few of the regions in the mm8 Genome Browser, and they consisted only 
of Ns there as well.  They also corresponded to gaps in the sequence, 
and did not correspond to any human nets or chains.
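If you want to repeat this check for many regions at once, one option is to count the Ns in the FASTA that twoBitToFa writes. Below is a minimal sketch that assumes the FASTA text is already in hand. The helper names read_fasta and n_fraction are hypothetical, and the input is a toy string, not real mm8 sequence.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text (e.g. twoBitToFa output) into {name: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def n_fraction(seq: str) -> float:
    """Fraction of bases that are N/n (unknown sequence, e.g. assembly gaps)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "Nn") / len(seq)


if __name__ == "__main__":
    toy = ">example\nNNNNNNNNNN\nNNNNN\n"  # toy input, not real data
    for name, seq in read_fasta(toy).items():
        print(name, n_fraction(seq))
```

A region whose fraction is 1.0 contains no real sequence in the 2bit file, which is what the regions above showed.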

Are these regions by any chance on the negative strand?  If so, then the 
start coordinate listed in the MAF file is relative to the 
reverse-complemented source sequence.  To get forward-strand coordinates, 
the end is the length of the entire chromosome minus the start, and the 
start is that value minus the size.  The chromosome length is listed 
right after the strand in the MAF file.  (A more detailed 
description of MAF files is located here: 
http://genome.ucsc.edu/FAQ/FAQformat#format5 .)
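As a rough illustration, here is a sketch of that conversion applied to a MAF "s" line (fields: s, src, start, size, strand, srcSize, text, with zero-based starts as described in the format page). The class and function names are hypothetical, and in the usage line the "mm8.chr10" source name and the ACGT alignment text are placeholders; only the numbers come from the chr10 example below.

```python
from typing import NamedTuple, Tuple


class MafComponent(NamedTuple):
    """One 's' line of a MAF block: s src start size strand srcSize text."""
    src: str
    start: int      # zero-based; relative to the reverse complement if strand is '-'
    size: int       # number of non-gap bases in the aligning region
    strand: str     # '+' or '-'
    src_size: int   # length of the entire source sequence (e.g. chromosome)
    text: str       # aligned bases, with '-' for gaps


def parse_s_line(line: str) -> MafComponent:
    """Split a MAF 's' line into its seven whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    return MafComponent(src, int(start), int(size), strand, int(src_size), text)


def forward_coords(comp: MafComponent) -> Tuple[int, int]:
    """Return the zero-based, half-open (start, end) on the forward strand."""
    if comp.strand == "+":
        return comp.start, comp.start + comp.size
    if comp.strand == "-":
        # Count back from the end of the chromosome: the reverse-strand
        # start becomes the forward-strand end.
        end = comp.src_size - comp.start
        return end - comp.size, end
    raise ValueError(f"unexpected strand: {comp.strand!r}")


if __name__ == "__main__":
    comp = parse_s_line("s mm8.chr10 1580222 15000 - 129959148 ACGT")
    print(forward_coords(comp))  # (128363926, 128378926)
```

Plus-strand lines pass through unchanged, so the same function can be applied to every mouse line without first sorting them by strand.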

In the case of the first region you list, chr10:1580222-1595222, if the 
MAF file looked like this for mouse:

chr10 1580222 15000 - 129959148

Then the aligning region would be calculated as follows:

end = 129959148 - 1580222 = 128378926
start = 129959148 - 1580222 - 15000 = 128363926

So the mm8 aligning region would actually be chr10:128363926-128378926.
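The same arithmetic can be applied to a list of regions written as chrom:start-end, like the list below. This sketch assumes every region was taken from a negative-strand MAF line, which is exactly what needs to be confirmed first. Only the chr10 length (129959148) is given here; lengths for other chromosomes would have to come from the srcSize field of their own MAF lines. The names REGION_RE, parse_region and flip_region are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches regions written as chrom:start-end, e.g. "chr10:1580222-1595222".
REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chrom:start-end' into (chrom, start, end)."""
    m = REGION_RE.match(region.strip())
    if m is None:
        raise ValueError(f"cannot parse region: {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if end < start:
        raise ValueError(f"end before start: {region!r}")
    return chrom, start, end


def flip_region(region: str, chrom_sizes: Dict[str, int]) -> str:
    """Map a '-'-strand region onto the '+' strand (the mapping is its own inverse).

    Forward end = length - start; forward start = length - end,
    where end = start + size.
    """
    chrom, start, end = parse_region(region)
    length = chrom_sizes[chrom]
    if end > length:
        raise ValueError(f"{region!r} extends past the end of {chrom}")
    return f"{chrom}:{length - end}-{length - start}"


if __name__ == "__main__":
    sizes = {"chr10": 129959148}  # the only length given in this thread
    for r in ["chr10:1580222-1595222", "chr10:1617696-1632696"]:
        print(r, "->", flip_region(r, sizes))
```

For the first region this prints chr10:128363926-128378926, matching the calculation above.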

I hope this information is helpful.  Please let us know if you have 
further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Pavel Sumazin wrote:
> I obtained a few coordinates from the mouse genome from the 17 
> vertebrate MAF files (mm8 is in the alignment) and attempted to retrieve 
> flanking sequence from mm8.2bit.
> While the sequence information exists in the MAF files, it appears to be 
> missing from the 2bit file.  Is this a known phenomenon?  Is there a 
> reason for this?
> I am including the problematic regions below.
>
> Thanks,
> Pavel
>
> ----------------------------------
>
> chr10:1580222-1595222
> chr10:1617696-1632696
> chr10:1622371-1637371
> chr10:1628386-1643386
> chr10:1719750-1734750
> chr10:1720947-1735947
> chr10:1723502-1738502
> chr10:1836299-1851299
> chr10:1845002-1860002
> chr10:1867826-1882826
> chr10:1914649-1929649
> chr10:1975021-1990021
> chr10:1975369-1990369
> chr10:1993795-2008795
> chr10:2015417-2030417
> chr10:2042729-2057729
> chr10:2043929-2058929
> chr10:2047326-2062326
> chr10:2047391-2062391
> chr10:2051919-2066919
> chr10:2101210-2116210
> chr10:2131846-2146846
> chr10:2141109-2156109
> chr10:2178724-2193724
> chr10:2214138-2229138
> chr10:2220258-2235258
> chr10:2243283-2258283
> chr10:2248703-2263703
> chr10:2288675-2303675
> chr10:2309814-2324814
> chr10:2329785-2344785
> chr10:2331482-2346482
> chr10:2450934-2465934
> chr10:2492728-2507728
> chr10:2511309-2526309
> chr10:260893-275893
> chr10:2680359-2695359
> chr10:2794156-2809156
> chr10:2809478-2824478
> chr10:2874457-2889457
> chr10:2874934-2889934
> chr10:2880224-2895224
> chr10:2920094-2935094
> chr10:377837-392837
> chr10:932728-947728
> chr1:177616-192616
> chr1:1984125-1999125
> chr1:2010530-2025530
> chr1:2081239-2096239
> chr1:2111890-2126890
> chr12:1152309-1167309
> chr1:2184304-2199304
> chr1:2498693-2513693
> chr1:2848074-2863074
> chr13:0-9435
> chr13:1092860-1107860
> chr13:11089088-11104088
> chr13:1501341-1516341
> chr13:1503859-1518859
> chr13:2206584-2221584
> chr13:22310-37310
> chr13:2454751-2469751
> chr13:691196-706196
> chr1:393253-408253
> chr1:537375-552375
> chr1:541851-556851
> chr17:2039786-2054786
> chr18:1317598-1332598
> chr18:1456739-1471739
> chr18:1693472-1708472
> chr18:44383-59383
> chr2:176101878-176116878
> chr2:176330068-176345068
> chr3:0-1684
> chr3:1568344-1583344
> chr3:15714271-15729271
> chr3:1854827-1869827
> chr3:1885230-1900230
> chr3:1915118-1930118
> chr4:1025555-1040555
> chr4:1084758-1099758
> chr4:1097824-1112824
> chr4:1208593-1223593
> chr4:121881029-121896029
> chr4:1272011-1287011
> chr4:1503025-1518025
> chr4:1534100-1549100
> chr4:154290-169290
> chr4:160073-175073
> chr4:160278-175278
> chr4:178279-193279
> chr4:1879861-1894861
> chr4:212494-227494
> chr4:2162246-2177246
> chr4:2163680-2178680
> chr4:2195773-2210773
> chr4:2209048-2224048
> chr4:2688384-2703384
> chr4:279208-294208
> chr4:282080-297080
> chr4:301592-316592
> chr4:311723-326723
> chr4:31762137-31777137
> chr4:326964-341964
> chr4:396249-411249
> chr4:436992-451992
> chr4:517689-532689
> chr4:521117-536117
> chr4:611893-626893
> chr4:809490-824490
> chr4:825296-840296
> chr5:68503-83503
> chr7:0-2353
> chr7:1461315-1476315
> chr7:1486950-1501950
> chr7:25797-40797
> chr7:369833-384833
> chr7:408919-423919
> chr7:466426-481426
> chr7:727175-742175
> chr7:883844-898844
> chr7:901622-916622
> chr8:1573442-1588442
> chr8:20223847-20238847
> chr8:21592558-21607558
> chr8:457261-472261
> chr8:58243751-58258751
> chr8:58248362-58263362
> chr8:58277430-58292430
> chr8:58301692-58316692
> chrX:1010247-1025247
> chrX:1523464-1538464
> chrX:1524856-1539856
> chrX:2814671-2829671
> chrX:2866971-2881971
> chrX:2933212-2948212
> chrX:3611925-3626925
> chrX:3808426-3823426
> chrX:3811419-3826419
> chrX:877279-892279
> chrX:893084-908084
> chrX:896646-911646
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>   

