EXCEL获取SNP的bin物理位置

前人的研究,已经将每条染色体分成了10个左右的bin区域,用于粗定位基因的物理位置。本工具用于将GWAS或QTL Mapping的位置信息转换成bin位置区域。

【我写好的excel下载】

https://dataholdcn.cn/excel/B73%20RefGen_v2%20bin%20physical%20position_tool_za.xlsx

【使用方法】

在 “Reslut”工作表中,替换SNP名称即可。

【实现过程】

首先,我们需要bin区域的物理信息,这里以B73 RefGen v2为例。将下表数据保存到excel文件的“Clean Table“工作表。Bin列表示物理位置bin区段,Chr是染色体号,Size是区段大小,Start是起始位置,End是结束位置,Index是用于查找的信息。

BinChrSize (Mbp)StartEndIndex
1.00Chr12.0412039901S1_1
1.01Chr110.36203990112398949S1_2039901
1.02Chr116.861239894929258944S1_12398949
1.03Chr122.892925894452152777S1_29258944
1.04Chr130.425215277782574898S1_52152777
1.05Chr193.0982584786175670048S1_82584786
1.06Chr123.22175670048198887570S1_175670048
1.07Chr129.4198854443228254156S1_198854443
1.08Chr121.84228254156250093155S1_228254156
1.09Chr117.77250093155267860952S1_250093155
1.10Chr115.33267860952283188047S1_267860952
1.11Chr114.78283259156298039872S1_283259156
1.12Chr13.39297960525301354135S1_297960525
2.00Chr21.5511551442S2_1
2.01Chr22.6415514424187615S2_1551442
2.02Chr210.82418761515005045S2_4187615
2.03Chr213.751501629028771109S2_15016290
2.04Chr242.972877110971742767S2_28771109
2.05Chr281.471742767153140073S2_71742767
2.06Chr234.04153140073187179001S2_153140073
2.07Chr217.72187179001204896812S2_187179001
2.08Chr218.42204896812223320002S2_204896812
2.09Chr212.15223320002235471439S2_223320002
2.10Chr22.42234646685237068873S2_234646685
3.00Chr31.7311729470S3_1
3.01Chr32.1617294703888321S3_1729470
3.02Chr34.7138883318599890S3_3888331
3.03Chr34.5859877213102315S3_8598772
3.04Chr3113.1513109892126255823S3_13109892
3.05Chr342.15126255823168409685S3_126255823
3.06Chr322.7168409685191113733S3_168409685
3.07Chr314.26191113733205370271S3_191113733
3.08Chr310.5205370271215869148S3_205370271
3.09Chr316.29215846541232140174S3_215846541
3.10Chr30.07232074985232140174S3_232074985
4.00Chr40.691694004S4_1
4.01Chr44.356940045045991S4_694004
4.02Chr46.36504599111403953S4_5045991
4.03Chr410.21140395321605708S4_11403953
4.04Chr410.672160570832280422S4_21605708
4.05Chr4118.8332251206151082840S4_32251206
4.06Chr419.96151110020171066103S4_151110020
4.07Chr48.77171066103179834540S4_171066103
4.08Chr425.3179834540205136045S4_179834540
4.09Chr432.11205136045237246237S4_205136045
4.10Chr42.74237246237239987981S4_237246237
4.11Chr42.04239987981242029974S4_239987981
5.00Chr53.3213323880S5_1
5.01Chr54.6733238807997002S5_3323880
5.02Chr56.86800806014870573S5_8008060
5.03Chr565.951485361980804839S5_14853619
5.04Chr591.5980804839172395871S5_80804839
5.05Chr522.92172438770195358319S5_172438770
5.06Chr59.3195305700204605587S5_195305700
5.07Chr57.12204659857211775643S5_204659857
5.08Chr53.75211775643215521293S5_211775643
5.09Chr52.41215465694217872852S5_215465694
6.00Chr68.2718274025S6_1
6.01Chr678.67827681386946680S6_8276813
6.02Chr69.898694668096840186S6_86946680
6.03Chr67.7896840186104619429S6_96840186
6.04Chr616.41104619429121033444S6_104619429
6.05Chr632.92121033444153956114S6_121033444
6.06Chr67.37153762257161129826S6_153762257
6.07Chr65.76161129826166892211S6_161129826
6.08Chr62.29166885247169174353S6_166885247
7.00Chr74.7114707470S7_1
7.01Chr79.15471235313861507S7_4712353
7.02Chr7114.3113861507128175453S7_13861507
7.03Chr727.88128175453156050470S7_128175453
7.04Chr712.28156050470168334968S7_156050470
7.05Chr76.07168334968174407842S7_168334968
7.06Chr72.4174362351176764762S7_174362351
8.00Chr81.8311831964S8_1
8.01Chr88.24183196410072434S8_1831964
8.02Chr811.081007243421153183S8_10072434
8.03Chr887.5421153183108690693S8_21153183
8.04Chr813.79108690693122478285S8_108690693
8.05Chr823.95122478285146426603S8_122478285
8.06Chr818.84146426603165267974S8_146426603
8.07Chr84.07165267974169336170S8_165267974
8.08Chr83.63169336170172961960S8_169336170
8.09Chr82.39172961960175347686S8_172961960
9.00Chr93.3113312268S9_1
9.01Chr98.47331226811781894S9_3312268
9.02Chr911.491178189423268159S9_11781894
9.03Chr978.4823268159101747879S9_23268159
9.04Chr925.44101747879127192550S9_101747879
9.05Chr99.61127192550136806801S9_127192550
9.06Chr911.43136806801148241346S9_136806801
9.07Chr96.63148241346154869700S9_148241346
9.08Chr92.15154599322156750706S9_154599322
10.00Chr102.6512650397S10_1
10.01Chr102.3626503975008368S10_2650397
10.02Chr108.55500836813559627S10_5008368
10.03Chr1074.791355962788348445S10_13559627
10.04Chr1039.6188348445127958713S10_88348445
10.05Chr109.35127936624137286408S10_127936624
10.06Chr105.5137311961142812228S10_137311961
10.07Chr106.82142812228149627545S10_142812228

新建“Reslut”工作表,第一行的前4列分别命名为:

SNPChrPositionBin

对应的公式为:

S1_191524962=”Chr”&MID(A2,2,FIND(“_”,A2)-2)=MID(A2,FIND(“_”,A2)+1,LEN(A2))=XLOOKUP(A2,’Clean Table’!F:F,’Clean Table’!A:A,,-1,1)

第一列用于填写SNP名称,第二列用于获得染色体信息,第三列获取物理位置信息,第四列通过查找获取bin位置。

接下来只需要修改第一列的名称,其他列下拉就可获得每个SNP的bin位置。

若要修改参考基因组,替换“Clean Table”工作表即可。

评论

发表评论

了解 数据控|突破是我们的每一步 的更多信息

立即订阅以继续阅读并访问完整档案。

继续阅读