Methods
Data sources
This report analyses genome variation data from the Malaria Vector Genome Observatory. See Table 1 below for a complete list of the sample sets used in the current analysis version, with information about the corresponding contributors, data releases and citations. These sample sets provide data for a total of 4,878 mosquitoes sampled from 25 countries.
Sample metadata, unphased SNP calls, and phased SNP haplotypes were retrieved from the Malaria Vector Genome Observatory cloud data repository hosted in Google Cloud Storage (GCS) via the MalariaGEN Python API version 15.0.1.
Sample inclusion and grouping into cohorts
Samples were considered for inclusion if they met the following criteria:
Sex assigned as female via comparison of sequence coverage on autosomes and sex chromosomes.
Taxon assigned as gambiae, coluzzii, arabiensis or bissau via principal component analysis of genomic data from chromosome 3 and comparison with reference samples with known taxon assignments.
After filtering according to these inclusion criteria, samples were grouped into cohorts by taxon, location of sampling and date of sampling. Samples were grouped spatially if their collection locations were within the same level 2 administrative unit, according to geoBoundaries version 5.0.0. Samples were grouped temporally if their collection dates were within the same quarter (three-month period) where possible. In a small number of cases where only the year of collection was available in the metadata, samples were grouped by year instead.
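The grouping rule can be illustrated with a minimal sketch. The record fields (`country`, `admin2`, `taxon`, `year`, `month`) and the helper names are hypothetical, chosen for illustration only; they are not the column names or functions of the MalariaGEN Python API. A missing `month` stands for samples where only the year of collection is known, and applying the fallback per sample is an assumption.

```python
from collections import defaultdict


def cohort_label(country, admin2, taxon, year, month=None):
    """Build a cohort label; the quarter is omitted when the month is unknown."""
    parts = [country, admin2, taxon, str(year)]
    if month is not None:
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, ..., months 10-12 -> Q4
        parts.append(f"Q{quarter}")
    return " / ".join(parts)


def group_into_cohorts(samples):
    """Map each cohort label to the list of sample IDs assigned to it."""
    cohorts = defaultdict(list)
    for s in samples:
        label = cohort_label(
            s["country"], s["admin2"], s["taxon"], s["year"], s.get("month")
        )
        cohorts[label].append(s["sample_id"])
    return dict(cohorts)


# Toy records, made up for illustration (not real samples).
samples = [
    {"sample_id": "S1", "country": "Mali", "admin2": "Kati",
     "taxon": "gambiae", "year": 2014, "month": 8},
    {"sample_id": "S2", "country": "Mali", "admin2": "Kati",
     "taxon": "gambiae", "year": 2014, "month": 9},
    {"sample_id": "S3", "country": "Mali", "admin2": "Kati",
     "taxon": "coluzzii", "year": 2014, "month": 8},
    {"sample_id": "S4", "country": "Cameroon", "admin2": "Mayo-Kani",
     "taxon": "gambiae", "year": 2005},  # only the year is known
]
cohorts = group_into_cohorts(samples)
```

The labels produced this way follow the same "country / district / taxon / year / quarter" pattern as the cohort names used in this report.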
Cohorts were excluded from the analysis if the sample size was less than 15. Cohorts with more than 100 samples were randomly downsampled for computational efficiency. Cohorts were also excluded from the analysis if they failed H12 or G123 window size calibration (see below). After applying these filters, a total of 61 cohorts were retained for analysis (Table 2).
Cohort | Country | Region | District | Taxon | Year | Quarter | Sample Size |
---|---|---|---|---|---|---|---|
Angola / Luanda / coluzzii / 2009 / Q2 | Angola | Luanda | Luanda | coluzzii | 2009 | 2 | 77 |
Burkina Faso / Comoe / coluzzii / 2011 | Burkina Faso | Cascades | Comoe | coluzzii | 2011 | | 18 |
Burkina Faso / Comoe / coluzzii / 2012 | Burkina Faso | Cascades | Comoe | coluzzii | 2012 | | 63 |
Burkina Faso / Comoe / coluzzii / 2015 | Burkina Faso | Cascades | Comoe | coluzzii | 2015 | | 33 |
Burkina Faso / Comoe / coluzzii / 2016 | Burkina Faso | Cascades | Comoe | coluzzii | 2016 | | 53 |
Burkina Faso / Houet / coluzzii / 2012 / Q3 | Burkina Faso | Hauts-Bassins | Houet | coluzzii | 2012 | 3 | 78 |
Burkina Faso / Houet / coluzzii / 2014 / Q3 | Burkina Faso | Hauts-Bassins | Houet | coluzzii | 2014 | 3 | 32 |
Burkina Faso / Houet / gambiae / 2012 / Q3 | Burkina Faso | Hauts-Bassins | Houet | gambiae | 2012 | 3 | 73 |
Burkina Faso / Houet / gambiae / 2014 / Q3 | Burkina Faso | Hauts-Bassins | Houet | gambiae | 2014 | 3 | 41 |
Benin / Djougou / coluzzii / 2017 / Q2 | Benin | Donga | Djougou | coluzzii | 2017 | 2 | 78 |
Benin / Djougou / gambiae / 2017 / Q2 | Benin | Donga | Djougou | gambiae | 2017 | 2 | 30 |
Benin / Djougou / gambiae / 2017 / Q3 | Benin | Donga | Djougou | gambiae | 2017 | 3 | 34 |
Benin / Avrankou / coluzzii / 2017 / Q3 | Benin | Oueme | Avrankou | coluzzii | 2017 | 3 | 88 |
Democratic Republic of the Congo / Gbadolite / gambiae / 2015 / Q3 | Democratic Republic of the Congo | Nord-Ubangi | Gbadolite | gambiae | 2015 | 3 | 44 |
Central African Republic / Bangui / gambiae / 1994 / Q1 | Central African Republic | Bangui | Bangui | gambiae | 1994 | 1 | 53 |
Cote d'Ivoire / Sud-Comoe / gambiae / 2017 / Q3 | Cote d'Ivoire | Comoe | Sud-Comoe | gambiae | 2017 | 3 | 37 |
Cote d'Ivoire / Agneby-Tiassa / coluzzii / 2012 | Cote d'Ivoire | Lagunes | Agneby-Tiassa | coluzzii | 2012 | | 80 |
Cameroon / Mayo-Kani / gambiae / 2005 | Cameroon | Far North | Mayo-Kani | gambiae | 2005 | | 18 |
Cameroon / Haut-Nyong / gambiae / 2009 / Q3 | Cameroon | East | Haut-Nyong | gambiae | 2009 | 3 | 95 |
Cameroon / Lom-Et-Djerem / gambiae / 2009 / Q3 | Cameroon | East | Lom-Et-Djerem | gambiae | 2009 | 3 | 163 |
Gabon / Libreville / gambiae / 2000 / Q4 | Gabon | Estuaire | Libreville | gambiae | 2000 | 4 | 69 |
Ghana / Ablekuma Central Municipal / coluzzii / 2018 / Q1 | Ghana | Greater Accra Region | Ablekuma Central Municipal | coluzzii | 2018 | 1 | 266 |
Ghana / La-Nkwantanang-Madina / gambiae / 2017 / Q4 | Ghana | Greater Accra Region | La-Nkwantanang-Madina | gambiae | 2017 | 4 | 200 |
Ghana / Adansi Akrofuom / coluzzii / 2018 / Q4 | Ghana | Ashanti Region | Adansi Akrofuom | coluzzii | 2018 | 4 | 64 |
Ghana / Adansi South / coluzzii / 2018 / Q4 | Ghana | Ashanti Region | Adansi South | coluzzii | 2018 | 4 | 36 |
Ghana / Adansi South / gambiae / 2018 / Q4 | Ghana | Ashanti Region | Adansi South | gambiae | 2018 | 4 | 29 |
Ghana / Amansie Central / coluzzii / 2018 / Q4 | Ghana | Ashanti Region | Amansie Central | coluzzii | 2018 | 4 | 69 |
Ghana / Bekwai Municipal / coluzzii / 2018 / Q4 | Ghana | Ashanti Region | Bekwai Municipal | coluzzii | 2018 | 4 | 53 |
Ghana / Twifo Atti-Morkwa / coluzzii / 2012 / Q3 | Ghana | Central Region | Twifo Atti-Morkwa | coluzzii | 2012 | 3 | 25 |
Ghana / Upper Denkyira East Municipal / coluzzii / 2018 / Q4 | Ghana | Central Region | Upper Denkyira East Municipal | coluzzii | 2018 | 4 | 23 |
Ghana / Upper Denkyira West / coluzzii / 2018 / Q4 | Ghana | Central Region | Upper Denkyira West | coluzzii | 2018 | 4 | 118 |
Ghana / New Juaben South Municipal / gambiae / 2012 / Q4 | Ghana | Eastern Region | New Juaben South Municipal | gambiae | 2012 | 4 | 23 |
Ghana / Effia Kwesimintsim Municipal / coluzzii / 2012 / Q3 | Ghana | Western Region | Effia Kwesimintsim Municipal | coluzzii | 2012 | 3 | 24 |
Gambia, The / Lower Fuladu West / coluzzii / 2012 / Q4 | Gambia, The | Janjanbureh | Lower Fuladu West | coluzzii | 2012 | 4 | 172 |
Gambia, The / Central Badibu / bissau / 2011 / Q3 | Gambia, The | Kerewan | Central Badibu | bissau | 2011 | 3 | 52 |
Guinea / Kissidougou / gambiae / 2012 / Q4 | Guinea | Faranah | Kissidougou | gambiae | 2012 | 4 | 51 |
Guinea / Macenta / gambiae / 2012 / Q4 | Guinea | Nzerekore | Macenta | gambiae | 2012 | 4 | 51 |
Guinea-Bissau / Setor De Safim / bissau / 2010 | Guinea-Bissau | Biombo | Setor de Safim | bissau | 2010 | | 33 |
Guinea-Bissau / Bissau Autonomous Sector / bissau / 2010 | Guinea-Bissau | Bissau | Bissau Autonomous Sector | bissau | 2010 | | 60 |
Mali / Kangaba / gambiae / 2004 / Q3 | Mali | Koulikouro | Kangaba | gambiae | 2004 | 3 | 23 |
Mali / Kati / coluzzii / 2014 / Q3 | Mali | Koulikouro | Kati | coluzzii | 2014 | 3 | 27 |
Mali / Kati / gambiae / 2014 / Q3 | Mali | Koulikouro | Kati | gambiae | 2014 | 3 | 24 |
Mali / Yanfolila / coluzzii / 2012 / Q4 | Mali | Sikasso | Yanfolila | coluzzii | 2012 | 4 | 23 |
Mali / Yanfolila / gambiae / 2012 / Q4 | Mali | Sikasso | Yanfolila | gambiae | 2012 | 4 | 53 |
Mali / Bla / coluzzii / 2004 / Q3 | Mali | Segou | Bla | coluzzii | 2004 | 3 | 19 |
Malawi / Chikwawa / arabiensis / 2015 / Q2 | Malawi | Southern Region | Chikwawa | arabiensis | 2015 | 2 | 41 |
Mozambique / Morrumbene / gambiae / 2004 / Q1 | Mozambique | Inhambane | Morrumbene | gambiae | 2004 | 1 | 49 |
Mozambique / Morrumbene / gambiae / 2004 / Q2 | Mozambique | Inhambane | Morrumbene | gambiae | 2004 | 2 | 22 |
Togo / Lome Commune / gambiae / 2017 / Q4 | Togo | Maritime Region | Lome Commune | gambiae | 2017 | 4 | 179 |
Tanzania / Muleba / arabiensis / 2015 / Q1 | Tanzania | Kagera | Muleba | arabiensis | 2015 | 1 | 39 |
Tanzania / Muleba / arabiensis / 2015 / Q2 | Tanzania | Kagera | Muleba | arabiensis | 2015 | 2 | 98 |
Tanzania / Muleba / gambiae / 2015 / Q2 | Tanzania | Kagera | Muleba | gambiae | 2015 | 2 | 18 |
Tanzania / Tarime / arabiensis / 2012 / Q3 | Tanzania | Mara | Tarime | arabiensis | 2012 | 3 | 47 |
Tanzania / Muheza / gambiae / 2013 / Q1 | Tanzania | Tanga | Muheza | gambiae | 2013 | 1 | 32 |
Tanzania / Moshi / arabiensis / 2012 / Q3 | Tanzania | Manyara | Moshi | arabiensis | 2012 | 3 | 39 |
Uganda / Kalangala / gambiae / 2015 / Q2 | Uganda | Central Region | Kalangala | gambiae | 2015 | 2 | 60 |
Uganda / Busia / gambiae / 2016 / Q2 | Uganda | Eastern Region | Busia | gambiae | 2016 | 2 | 24 |
Uganda / Mayuge / gambiae / 2017 / Q2 | Uganda | Eastern Region | Mayuge | gambiae | 2017 | 2 | 21 |
Uganda / Tororo / arabiensis / 2012 / Q4 | Uganda | Eastern Region | Tororo | arabiensis | 2012 | 4 | 81 |
Uganda / Tororo / gambiae / 2012 / Q4 | Uganda | Eastern Region | Tororo | gambiae | 2012 | 4 | 112 |
Uganda / Kanungu / gambiae / 2012 / Q4 | Uganda | Western Region | Kanungu | gambiae | 2012 | 4 | 95 |
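The cohort size filters described before Table 2 can be sketched as follows. This is an illustrative sketch only: the function name and the fixed random seed are hypothetical, and the size to which large cohorts were reduced is not stated in the text, so a target of 100 samples is assumed here.

```python
import random

MIN_COHORT_SIZE = 15   # from the text: cohorts smaller than this are excluded
MAX_COHORT_SIZE = 100  # from the text: larger cohorts are downsampled
                       # (assumption: they are reduced to exactly this size)


def filter_and_downsample(cohorts, seed=0):
    """Drop cohorts that are too small and randomly thin cohorts that are too large.

    `cohorts` maps a cohort label to a list of sample IDs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = {}
    for label, sample_ids in cohorts.items():
        if len(sample_ids) < MIN_COHORT_SIZE:
            continue  # too few samples
        if len(sample_ids) > MAX_COHORT_SIZE:
            sample_ids = rng.sample(sample_ids, MAX_COHORT_SIZE)
        kept[label] = sorted(sample_ids)
    return kept


# Toy cohorts with made-up sample IDs.
toy_cohorts = {
    "small": [f"A{i:03d}" for i in range(10)],
    "medium": [f"B{i:03d}" for i in range(40)],
    "large": [f"C{i:03d}" for i in range(250)],
}
kept = filter_and_downsample(toy_cohorts)
```

Cohorts that pass this step are still subject to the window size calibration filter described in the next section.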
H12 and G123 window size calibration
Both H12 (Garud et al. 2015) and G123 (Harris et al. 2018) are statistical methods for performing genome-wide selection scans which rely on dividing data into windows along the genome. Typically the size of these windows is set to a fixed number of polymorphic sites (SNPs), i.e., all windows contain data (either phased haplotypes or unphased genotypes) for the same number of SNPs. In order to detect recent selective sweeps, the size of these windows needs to be chosen so that windows are generally larger than the normal genetic distance over which linkage disequilibrium (LD) decays to background levels in the absence of recent positive selection. Therefore, in windows which are unaffected by recent selective sweeps, genetic diversity will be high and thus the values of the selection statistics will be low. Conversely, in windows affected by recent selective sweeps, LD will extend over a longer genetic distance spanning multiple windows, so that genetic diversity within those windows is low and thus values of the selection statistics will be high. In other words, the choice of window size affects the signal-to-noise ratio for selection scans using the H12 and G123 statistics. If windows are too small, results are dominated by background noise. If windows are too large, noise is minimal but power to detect recent selection signals is reduced.
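For reference, H12 is the haplotype homozygosity computed after pooling the two most frequent haplotypes in a window, H12 = (p1 + p2)^2 + p3^2 + p4^2 + ..., where p_i is the frequency of the i-th most common haplotype (Garud et al. 2015). G123 is the analogous statistic computed from unphased multilocus genotypes, pooling the three most frequent (Harris et al. 2018). The following minimal sketch computes both statistics over windows containing a fixed number of SNPs. The function names, the string encoding of the data and the use of consecutive non-overlapping windows are assumptions made for illustration; this is not the implementation used to produce this report.

```python
from collections import Counter


def pooled_homozygosity(items, n_pool):
    """Homozygosity after pooling the n_pool most frequent distinct items.

    n_pool=2 on haplotypes gives H12; n_pool=3 on multilocus genotypes gives G123.
    """
    n = len(items)
    freqs = sorted((c / n for c in Counter(items).values()), reverse=True)
    pooled = sum(freqs[:n_pool])
    return pooled ** 2 + sum(f ** 2 for f in freqs[n_pool:])


def windowed_stat(rows, size, n_pool):
    """Compute the statistic in consecutive windows of `size` SNPs.

    `rows` holds one equal-length string per haplotype (for H12) or per
    individual's unphased genotypes (for G123), one character per SNP.
    Trailing SNPs that do not fill a whole window are ignored.
    """
    n_sites = len(rows[0])
    values = []
    for start in range(0, n_sites - size + 1, size):
        window = [row[start:start + size] for row in rows]
        values.append(pooled_homozygosity(window, n_pool))
    return values


# Toy phased haplotypes: 6 haplotypes x 4 SNPs (0 = reference, 1 = alternate).
haplotypes = ["0000", "0000", "0000", "0011", "0011", "0110"]
h12 = windowed_stat(haplotypes, size=2, n_pool=2)

# Toy unphased genotypes: 6 individuals x 4 SNPs (alternate allele counts 0/1/2).
genotypes = ["0102", "0102", "0102", "0011", "2001", "1100"]
g123 = windowed_stat(genotypes, size=4, n_pool=3)
```

In the toy H12 example, the first window contains only two distinct haplotypes, so pooling them gives the maximum value of 1, whereas the second window contains three distinct haplotypes and gives a lower value. This mirrors the behaviour described above: low diversity within a window yields a high statistic.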
The decision regarding an appropriate window size needs to be made independently for each cohort of samples over which a selection scan will be performed. This is because different source populations may have different demographic histories, and this in turn may alter the genetic distance over which LD decays in the absence of positive selection. Previous studies have used various demographic inference methods to try to infer key demographic parameters for each cohort being analysed, then used these parameters to inform the choice of window size. In practice, this approach presents a number of challenges. Firstly, inference of demographic parameters is difficult, and even state-of-the-art inference methods may reach inaccurate conclusions. Secondly, running demographic inference methods can be computationally demanding, and this becomes impractical for large numbers of cohorts.
For these reasons, we have taken an empirical approach to window size calibration for H12 and G123 scans, designed to reach a good signal-to-noise ratio.
For each cohort, we computed H12 over contig 3RL for each of the following window sizes: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000 and 10000 SNPs. For each window size, we then computed the 95th percentile of the H12 values over all windows. We chose the smallest window size for which the 95th percentile was below 0.08. With this choice, any window with an H12 value above 0.08 is in the top 5% of windows.
Similarly, we computed G123 over contig 3RL for each of the following window sizes: 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000 and 10000 SNPs. For each window size, we then computed the 95th percentile of the G123 values over all windows, and chose the smallest window size for which the 95th percentile was below 0.08.
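The selection rule can be expressed as a short sketch. The candidate window sizes and the 0.08 threshold are taken from the text; the function names, the input layout (a mapping from window size to the list of per-window statistic values) and the linear-interpolation percentile are assumptions made for illustration.

```python
CANDIDATE_SIZES = [
    100, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
    1200, 1400, 1600, 1800, 2000,
    2500, 3000, 3500, 4000, 4500, 5000,
    6000, 7000, 8000, 9000, 10000,
]
THRESHOLD = 0.08


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty list")
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def calibrate_window_size(stat_by_size, candidate_sizes=CANDIDATE_SIZES,
                          threshold=THRESHOLD, pct=95):
    """Return the smallest candidate size whose pct-th percentile is below
    the threshold, or None if no candidate size qualifies."""
    for size in sorted(candidate_sizes):
        if size not in stat_by_size:
            continue  # no scan available for this size
        if percentile(stat_by_size[size], pct) < threshold:
            return size
    return None


# Toy per-window values (made up): background falls as windows grow,
# with one sweep-like outlier window at the largest size shown.
toy_values = {
    100: [0.20] * 100,
    200: [0.10] * 100,
    400: [0.05] * 99 + [0.90],
}
chosen = calibrate_window_size(toy_values)
failed = calibrate_window_size({100: [0.20] * 100})
```

In this toy example the 95th percentile only drops below the threshold at 400 SNPs, so that size is chosen even though one window at that size has a very high value; such outlier windows are the signals a scan is intended to detect. A return value of `None` corresponds to a cohort for which no candidate window size qualifies.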
Cohorts for which no candidate window size met this criterion, for either H12 or G123, failed calibration and were excluded from the analysis.
H12 genome-wide selection scans
TODO
G123 genome-wide selection scans
TODO
IHS genome-wide selection scans
TODO
Automated detection of selection signals
TODO
Identification of selection alerts
TODO
Web report generation
TODO