
Workshop 1 - Training course in data analysis for genomic surveillance of African malaria vectors


Module 2 - Accessing and exploring Anopheles genomic data#

Theme: Data

This module provides an introduction to accessing and exploring data about Anopheles mosquito specimens collected in the field and submitted for whole-genome sequencing by MalariaGEN.

Learning objectives#

After completing this module, you will be able to:

  • Explain how Anopheles genomic data are generated.

  • Explain what types of data are available from MalariaGEN.

  • Explain where data from MalariaGEN are stored.

  • Use the malariagen_data Python package to access Ag3.0 data in Google Cloud.

  • Explore the Ag3.0 data release and summarise the mosquito samples for which genomic data are available using pivot tables and maps.

Lecture#

English#

Français#

Please note that the code in the cells below might differ from that shown in the video. This can happen because Python packages and their dependencies change due to updates, necessitating tweaks to the code.

Where do the data come from?#

  • The data we’ll be analysing in this training course were generated by multiple research groups collaborating as part of the Malaria Genomic Epidemiology Network (MalariaGEN).

  • MalariaGEN is a collaborative programme providing access to genome sequencing and data processing services to support surveillance of malaria parasites and vectors.

  • Through this programme, members of research groups and disease control programmes in malaria-endemic countries work in partnership with the Wellcome Sanger Institute.

  • The basic workflow involves collecting mosquitoes, shipping them to sequencing facilities, preparing DNA samples and performing Illumina whole-genome sequencing, then processing the resulting data so they are ready for analysis, as shown below.

%%html
<img width="50%" height="50%" src="https://vobs-resources.cog.sanger.ac.uk/training/img/workshop-1/w1m2-1.png"/>
  • Note that raw genome sequence data is not particularly useful by itself, and so the sequence reads are processed through variant-calling pipelines which identify different types of genetic variation between individual mosquitoes.

  • The results of variant-calling pipelines are then passed through a number of quality control, filtering and annotation steps to ensure data quality. We call this process data curation.

  • The analysis-ready genome variation data is then made available to all partners in the collaboration. This data can then be analysed to answer questions about the surveillance of mosquito populations, such as whether new forms of insecticide resistance are emerging and spreading.

What types of analysis-ready genomic data are available?#

  • When DNA is passed from one generation of mosquitoes to the next, it undergoes mutations, which are errors in the DNA copying process. There are different types of mutations that can occur. These include:

    • Single Nucleotide Polymorphisms (SNPs) - substitutions of a single letter in the DNA sequence

    • Copy Number Variants (CNVs) - duplications or deletions of sections of a DNA sequence

  • Different variant calling pipelines are used to identify these different types of mutations.

  • It is also very useful to know whether combinations of mutations occur together in the same DNA sequence. In order to reconstruct this information, another pipeline is used to produce phased haplotypes.

  • To help make sense of the genomic data, we also need some data about the mosquitoes which were sequenced, such as the time and place of collection. This data is known as sample metadata.

We will revisit CNVs and haplotypes in future workshops. For this workshop, we are only interested in SNPs and sample metadata.
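To make the idea of a SNP concrete, here is a toy sketch that compares two short, already-aligned DNA strings and reports the positions where a single letter differs. The sequences and the `find_snps` helper are invented for illustration only; real variant-calling pipelines work from many sequence reads per sample and are far more involved than this.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Hypothetical helper for illustration; assumes both sequences are
    already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Made-up sequences that differ at two positions (0-based positions 4 and 9).
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

snps = find_snps(reference, sample)
print(snps)
```

Each tuple in the result gives the 0-based position of a substitution together with the reference and sample letters at that position.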

Where are the data stored?#

  • To make access as simple as possible, the analysis-ready data are stored in Google Cloud using a service called Google Cloud Storage (GCS). These data can then be downloaded to any computer, or analysed within the cloud using cloud computing services such as Colab.

  • If you are using Colab to access and analyse these data, then you don’t need to download any data to your own computer or install any special software. You access Colab through a web browser, and the code you run is executed on a different computer (a “virtual machine”) which sits alongside the data in Google Cloud.

Accessing the Ag3.0 data resource#

In this workshop we’ll be accessing and analysing data from the Anopheles gambiae 1000 Genomes Project phase 3 data resource, also known as “Ag3.0” for short. This includes data from whole-genome sequencing of 3,081 mosquitoes from 19 African countries.

To set up your notebook to access these data, first install the malariagen_data package.

%pip install -q --no-warn-conflicts malariagen_data

Then import packages and set up access to Anopheles gambiae genomic data.

Note that authentication is required to access data through the package. Please follow the instructions here.

import malariagen_data
import plotly.express as px
ag3 = malariagen_data.Ag3()
ag3
MalariaGEN Ag3 API client
Please note that data are subject to terms of use, for more information see the MalariaGEN website or contact support@malariagen.net. See also the Ag3 API docs.
Storage URL               gs://vo_agam_release/
Data releases available   3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9
Results cache             None
Cohorts analysis          20240418
AIM analysis              20220528
Site filters analysis     dt_20200416
Software version          malariagen_data 9.0.0
Client location           unknown

You can now access a number of different types of data through the ag3 object. The full list of functions is available from the Ag3 API docs. For the rest of this module, we are just going to look at sample metadata.

Loading sample metadata#

We can use the sample_metadata() function to retrieve a pandas DataFrame containing metadata about all 3,081 samples in the Ag3.0 resource. In this DataFrame, each row represents one mosquito sample, and columns such as country and year provide information about where and when the mosquito was originally collected.

df_samples = ag3.sample_metadata(sample_sets="3.0")
df_samples
                                     
sample_id partner_sample_id contributor country location year month latitude longitude sex_call ... admin1_name admin1_iso admin2_name taxon cohort_admin1_year cohort_admin1_month cohort_admin1_quarter cohort_admin2_year cohort_admin2_month cohort_admin2_quarter
0 AR0047-C LUA047 Joao Pinto Angola Luanda 2009 4 -8.884 13.302 F ... Luanda AO-LUA Luanda coluzzii AO-LUA_colu_2009 AO-LUA_colu_2009_04 AO-LUA_colu_2009_Q2 AO-LUA_Luanda_colu_2009 AO-LUA_Luanda_colu_2009_04 AO-LUA_Luanda_colu_2009_Q2
1 AR0049-C LUA049 Joao Pinto Angola Luanda 2009 4 -8.884 13.302 F ... Luanda AO-LUA Luanda coluzzii AO-LUA_colu_2009 AO-LUA_colu_2009_04 AO-LUA_colu_2009_Q2 AO-LUA_Luanda_colu_2009 AO-LUA_Luanda_colu_2009_04 AO-LUA_Luanda_colu_2009_Q2
2 AR0051-C LUA051 Joao Pinto Angola Luanda 2009 4 -8.884 13.302 F ... Luanda AO-LUA Luanda coluzzii AO-LUA_colu_2009 AO-LUA_colu_2009_04 AO-LUA_colu_2009_Q2 AO-LUA_Luanda_colu_2009 AO-LUA_Luanda_colu_2009_04 AO-LUA_Luanda_colu_2009_Q2
3 AR0061-C LUA061 Joao Pinto Angola Luanda 2009 4 -8.884 13.302 F ... Luanda AO-LUA Luanda coluzzii AO-LUA_colu_2009 AO-LUA_colu_2009_04 AO-LUA_colu_2009_Q2 AO-LUA_Luanda_colu_2009 AO-LUA_Luanda_colu_2009_04 AO-LUA_Luanda_colu_2009_Q2
4 AR0078-C LUA078 Joao Pinto Angola Luanda 2009 4 -8.884 13.302 F ... Luanda AO-LUA Luanda coluzzii AO-LUA_colu_2009 AO-LUA_colu_2009_04 AO-LUA_colu_2009_Q2 AO-LUA_Luanda_colu_2009 AO-LUA_Luanda_colu_2009_04 AO-LUA_Luanda_colu_2009_Q2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3076 AD0494-C 80-2-o-16 Martin Donnelly Lab Cross LSTM -1 -1 53.409 -2.969 F ... NaN NaN NaN unassigned NaN NaN NaN NaN NaN NaN
3077 AD0495-C 80-2-o-17 Martin Donnelly Lab Cross LSTM -1 -1 53.409 -2.969 M ... NaN NaN NaN unassigned NaN NaN NaN NaN NaN NaN
3078 AD0496-C 80-2-o-18 Martin Donnelly Lab Cross LSTM -1 -1 53.409 -2.969 M ... NaN NaN NaN unassigned NaN NaN NaN NaN NaN NaN
3079 AD0497-C 80-2-o-19 Martin Donnelly Lab Cross LSTM -1 -1 53.409 -2.969 F ... NaN NaN NaN unassigned NaN NaN NaN NaN NaN NaN
3080 AD0498-C 80-2-o-20 Martin Donnelly Lab Cross LSTM -1 -1 53.409 -2.969 M ... NaN NaN NaN unassigned NaN NaN NaN NaN NaN NaN

3081 rows × 32 columns
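In the preview above, the lab cross samples have year and month set to -1, which appears to act as a placeholder for a missing collection date. The sketch below uses a tiny hand-made DataFrame (`df_toy`, modelled loosely on three of the rows shown above, with most columns omitted) to show one way of excluding such rows before summarising by year. It assumes that a positive year always indicates a real collection date.

```python
import pandas as pd

# Toy stand-in for df_samples with only a few columns.
df_toy = pd.DataFrame(
    {
        "sample_id": ["AR0047-C", "AB0085-Cx", "AD0494-C"],
        "country": ["Angola", "Burkina Faso", "Lab Cross"],
        "year": [2009, 2012, -1],
        "taxon": ["coluzzii", "gambiae", "unassigned"],
    }
)

# Keep only rows with a real collection year (assumption: -1 means missing).
df_wild = df_toy.query("year > 0")
print(df_wild)
```

The same query string could be applied to the full df_samples DataFrame.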

Exploring sample metadata#

Let’s use some pandas features such as groupby() and query() to explore the sample metadata.

For example, let’s first find out a bit more information about the different countries represented.

df_samples.groupby("country").size()
country
Angola                               81
Burkina Faso                        296
Cameroon                            444
Central African Republic             73
Cote d'Ivoire                        80
Democratic Republic of the Congo     76
Equatorial Guinea                    10
Gabon                                69
Gambia, The                         279
Ghana                               100
Guinea                              136
Guinea-Bissau                       101
Kenya                                86
Lab Cross                           297
Malawi                               41
Mali                                225
Mayotte                              23
Mozambique                           74
Tanzania                            300
Uganda                              290
dtype: int64

We can then use the pandas query() function to select all samples from a given country. E.g., find all samples from Burkina Faso.

df_samples.query("country == 'Burkina Faso'")
sample_id partner_sample_id contributor country location year month latitude longitude sex_call ... admin1_name admin1_iso admin2_name taxon cohort_admin1_year cohort_admin1_month cohort_admin1_quarter cohort_admin2_year cohort_admin2_month cohort_admin2_quarter
81 AB0085-Cx BF2-4 Austin Burt Burkina Faso Pala 2012 7 11.151 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2012 BF-09_gamb_2012_07 BF-09_gamb_2012_Q3 BF-09_Houet_gamb_2012 BF-09_Houet_gamb_2012_07 BF-09_Houet_gamb_2012_Q3
82 AB0086-Cx BF2-6 Austin Burt Burkina Faso Pala 2012 7 11.151 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2012 BF-09_gamb_2012_07 BF-09_gamb_2012_Q3 BF-09_Houet_gamb_2012 BF-09_Houet_gamb_2012_07 BF-09_Houet_gamb_2012_Q3
83 AB0087-C BF3-3 Austin Burt Burkina Faso Bana Village 2012 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2012 BF-09_colu_2012_07 BF-09_colu_2012_Q3 BF-09_Houet_colu_2012 BF-09_Houet_colu_2012_07 BF-09_Houet_colu_2012_Q3
84 AB0088-C BF3-5 Austin Burt Burkina Faso Bana Village 2012 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2012 BF-09_colu_2012_07 BF-09_colu_2012_Q3 BF-09_Houet_colu_2012 BF-09_Houet_colu_2012_07 BF-09_Houet_colu_2012_Q3
85 AB0089-Cx BF3-8 Austin Burt Burkina Faso Bana Village 2012 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2012 BF-09_colu_2012_07 BF-09_colu_2012_Q3 BF-09_Houet_colu_2012 BF-09_Houet_colu_2012_07 BF-09_Houet_colu_2012_Q3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
372 AB0314-C 6775 Nora Besansky Burkina Faso Monomtenga 2004 8 12.060 -1.170 F ... Centre-Sud BF-07 Bazega gambiae BF-07_gamb_2004 BF-07_gamb_2004_08 BF-07_gamb_2004_Q3 BF-07_Bazega_gamb_2004 BF-07_Bazega_gamb_2004_08 BF-07_Bazega_gamb_2004_Q3
373 AB0315-C 6777 Nora Besansky Burkina Faso Monomtenga 2004 8 12.060 -1.170 F ... Centre-Sud BF-07 Bazega gambiae BF-07_gamb_2004 BF-07_gamb_2004_08 BF-07_gamb_2004_Q3 BF-07_Bazega_gamb_2004 BF-07_Bazega_gamb_2004_08 BF-07_Bazega_gamb_2004_Q3
374 AB0316-C 6779 Nora Besansky Burkina Faso Monomtenga 2004 8 12.060 -1.170 F ... Centre-Sud BF-07 Bazega gambiae BF-07_gamb_2004 BF-07_gamb_2004_08 BF-07_gamb_2004_Q3 BF-07_Bazega_gamb_2004 BF-07_Bazega_gamb_2004_08 BF-07_Bazega_gamb_2004_Q3
375 AB0318-C 5072 Nora Besansky Burkina Faso Monomtenga 2004 7 12.060 -1.170 F ... Centre-Sud BF-07 Bazega gambiae BF-07_gamb_2004 BF-07_gamb_2004_07 BF-07_gamb_2004_Q3 BF-07_Bazega_gamb_2004 BF-07_Bazega_gamb_2004_07 BF-07_Bazega_gamb_2004_Q3
376 AB0325-C 1403 Nora Besansky Burkina Faso Monomtenga 2004 6 12.060 -1.170 F ... Centre-Sud BF-07 Bazega gambiae BF-07_gamb_2004 BF-07_gamb_2004_06 BF-07_gamb_2004_Q2 BF-07_Bazega_gamb_2004 BF-07_Bazega_gamb_2004_06 BF-07_Bazega_gamb_2004_Q2

296 rows × 32 columns

From a quick glance at the preview above, we can see there are samples collected in different years. Let’s summarise that.

df_samples.query("country == 'Burkina Faso'").groupby("year").size()
year
2004     13
2012    181
2014    102
dtype: int64

If we wanted to now inspect the samples collected from Burkina Faso in 2014, we could combine these conditions in a query.

df_samples.query("country == 'Burkina Faso' and year == 2014")
sample_id partner_sample_id contributor country location year month latitude longitude sex_call ... admin1_name admin1_iso admin2_name taxon cohort_admin1_year cohort_admin1_month cohort_admin1_quarter cohort_admin2_year cohort_admin2_month cohort_admin2_quarter
262 AB0326-C BF18-1 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3
263 AB0327-C BF18-3 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3
264 AB0328-C BF18-4 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3
265 AB0329-C BF18-5 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3
266 AB0330-C BF18-6 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
359 AB0533-C BF13-18 Austin Burt Burkina Faso Souroukoudinga 2014 7 11.238 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2014 BF-09_gamb_2014_07 BF-09_gamb_2014_Q3 BF-09_Houet_gamb_2014 BF-09_Houet_gamb_2014_07 BF-09_Houet_gamb_2014_Q3
360 AB0536-C BF13-31 Austin Burt Burkina Faso Souroukoudinga 2014 7 11.238 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2014 BF-09_gamb_2014_07 BF-09_gamb_2014_Q3 BF-09_Houet_gamb_2014 BF-09_Houet_gamb_2014_07 BF-09_Houet_gamb_2014_Q3
361 AB0537-C BF13-32 Austin Burt Burkina Faso Souroukoudinga 2014 7 11.238 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2014 BF-09_gamb_2014_07 BF-09_gamb_2014_Q3 BF-09_Houet_gamb_2014 BF-09_Houet_gamb_2014_07 BF-09_Houet_gamb_2014_Q3
362 AB0538-C BF13-33 Austin Burt Burkina Faso Souroukoudinga 2014 7 11.238 -4.235 F ... Hauts-Bassins BF-09 Houet gambiae BF-09_gamb_2014 BF-09_gamb_2014_07 BF-09_gamb_2014_Q3 BF-09_Houet_gamb_2014 BF-09_Houet_gamb_2014_07 BF-09_Houet_gamb_2014_Q3
363 AB0408-C BF14-20 Austin Burt Burkina Faso Bana Village 2014 7 11.233 -4.472 F ... Hauts-Bassins BF-09 Houet coluzzii BF-09_colu_2014 BF-09_colu_2014_07 BF-09_colu_2014_Q3 BF-09_Houet_colu_2014 BF-09_Houet_colu_2014_07 BF-09_Houet_colu_2014_Q3

102 rows × 32 columns
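A query string can also refer to Python variables by prefixing them with `@`, which is convenient when the country or year of interest changes. The sketch below uses a small invented DataFrame (the sample identifiers and the variable names `target_country` and `target_year` are hypothetical) and checks that the query gives the same rows as an explicit boolean mask.

```python
import pandas as pd

# Toy data with made-up sample identifiers.
df_toy = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3", "s4"],
        "country": ["Burkina Faso", "Burkina Faso", "Mali", "Burkina Faso"],
        "year": [2012, 2014, 2014, 2014],
    }
)

target_country = "Burkina Faso"
target_year = 2014

# Two equivalent ways of selecting rows that match both conditions.
via_query = df_toy.query("country == @target_country and year == @target_year")
via_mask = df_toy[
    (df_toy["country"] == target_country) & (df_toy["year"] == target_year)
]

assert via_query.equals(via_mask)
print(via_query)
```

The query form is usually easier to read, while the mask form makes the underlying row-by-row comparison explicit.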

Finally, let’s break this down by mosquito species.

df_samples.query("country == 'Burkina Faso' and year == 2014").groupby("taxon").size()
taxon
arabiensis     3
coluzzii      53
gambiae       46
dtype: int64
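Instead of running one query per year, groupby() also accepts a list of columns, giving counts for every combination in a single call. This is a minimal sketch on invented data, not output from the Ag3.0 resource.

```python
import pandas as pd

# Toy data: five made-up samples from two years.
df_toy = pd.DataFrame(
    {
        "year": [2012, 2012, 2014, 2014, 2014],
        "taxon": ["gambiae", "coluzzii", "gambiae", "gambiae", "arabiensis"],
    }
)

# Count samples for each (year, taxon) combination that occurs in the data.
counts = df_toy.groupby(["year", "taxon"]).size()
print(counts)
```

The result is a pandas Series indexed by both year and taxon. Combinations that never occur are simply absent rather than shown as zero.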

Summarising sample metadata with pivot tables#

In the examples above we explored a part of the sample metadata, but it can also be useful to get an overall summary of how many samples have been sequenced, broken down by time and place of collection and mosquito species. For that kind of summary the pivot_table() function is useful.

Let’s start by summarising the number of mosquitoes by country and species (taxon).

pivot_country_taxon = (
    df_samples
    .pivot_table(
        index="country", 
        columns="taxon", 
        values="sample_id",
        aggfunc="count",
        fill_value=0
    )
)
pivot_country_taxon
taxon arabiensis coluzzii gambiae gcx1 gcx3 unassigned
country
Angola 0 81 0 0 0 0
Burkina Faso 3 135 158 0 0 0
Cameroon 2 26 416 0 0 0
Central African Republic 0 18 55 0 0 0
Cote d'Ivoire 0 80 0 0 0 0
Democratic Republic of the Congo 0 0 76 0 0 0
Equatorial Guinea 0 0 10 0 0 0
Gabon 0 0 69 0 0 0
Gambia, The 0 200 2 77 0 0
Ghana 0 64 36 0 0 0
Guinea 0 11 124 0 0 1
Guinea-Bissau 0 0 7 93 0 1
Kenya 13 0 19 0 54 0
Lab Cross 0 0 0 0 0 297
Malawi 41 0 0 0 0 0
Mali 2 90 131 0 0 2
Mayotte 0 0 23 0 0 0
Mozambique 0 0 74 0 0 0
Tanzania 225 0 64 0 11 0
Uganda 82 0 207 0 0 1

We could also turn this into a bar chart.

fig = px.bar(pivot_country_taxon, height=600, width=800)
fig.update_layout(
    title="Ag3.0 genomes sequenced",
    yaxis_title="no. genomes",
)
fig.show()

Mosquitoes were also sampled in different years. Let’s make a new pivot table, breaking down by country, year and taxon.

pivot_country_year_taxon = (
    df_samples
    .pivot_table(
        index=["country", "year"], 
        columns=["taxon"], 
        values="sample_id",
        aggfunc="count",
        fill_value=0
    )
)
pivot_country_year_taxon
taxon arabiensis coluzzii gambiae gcx1 gcx3 unassigned
country year
Angola 2009 0 81 0 0 0 0
Burkina Faso 2004 0 0 13 0 0 0
2012 0 82 99 0 0 0
2014 3 53 46 0 0 0
Cameroon 2005 0 7 90 0 0 0
2009 0 0 303 0 0 0
2013 2 19 23 0 0 0
Central African Republic 1993 0 5 2 0 0 0
1994 0 13 53 0 0 0
Cote d'Ivoire 2012 0 80 0 0 0 0
Democratic Republic of the Congo 2015 0 0 76 0 0 0
Equatorial Guinea 2002 0 0 10 0 0 0
Gabon 2000 0 0 69 0 0 0
Gambia, The 2006 0 22 0 9 0 0
2011 0 6 0 68 0 0
2012 0 172 2 0 0 0
Ghana 2012 0 64 36 0 0 0
Guinea 2012 0 11 124 0 0 1
Guinea-Bissau 2010 0 0 7 93 0 1
Kenya 2000 0 0 19 0 0 0
2007 3 0 0 0 0 0
2012 10 0 0 0 54 0
Lab Cross -1 0 0 0 0 0 297
Malawi 2015 41 0 0 0 0 0
Mali 2004 2 36 33 0 0 0
2012 0 27 65 0 0 2
2014 0 27 33 0 0 0
Mayotte 2011 0 0 23 0 0 0
Mozambique 2003 0 0 3 0 0 0
2004 0 0 71 0 0 0
Tanzania 2012 87 0 0 0 0 0
2013 1 0 32 0 10 0
2015 137 0 32 0 1 0
Uganda 2012 82 0 207 0 0 1
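A pivot table with two index columns has a hierarchical (multi-level) row index. The sketch below, on invented data, shows two standard pandas ways to pull out part of such a table: `.loc` to select one country, and `.xs` to select one year across all countries.

```python
import pandas as pd

# Toy data with made-up sample identifiers.
df_toy = pd.DataFrame(
    {
        "sample_id": ["s1", "s2", "s3", "s4"],
        "country": ["Mali", "Mali", "Mali", "Kenya"],
        "year": [2004, 2012, 2012, 2012],
        "taxon": ["gambiae", "coluzzii", "gambiae", "arabiensis"],
    }
)

pivot_toy = df_toy.pivot_table(
    index=["country", "year"],
    columns="taxon",
    values="sample_id",
    aggfunc="count",
    fill_value=0,
)

# Rows for a single country; the result is indexed by year.
mali = pivot_toy.loc["Mali"]

# Rows for a single year; the result is indexed by country.
in_2012 = pivot_toy.xs(2012, level="year")

print(mali)
print(in_2012)
```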

For some countries there are data from multiple collection sites. Let’s inspect that for Burkina Faso by applying a query then creating a pivot table.

pivot_location_year_taxon_bf = (
    df_samples
    .query("country == 'Burkina Faso'")
    .pivot_table(
        index=["country", "admin1_name", "admin2_name", "location", "year"], 
        columns=["taxon"], 
        values="sample_id",
        aggfunc="count",
        fill_value=0
    )
)
pivot_location_year_taxon_bf
taxon arabiensis coluzzii gambiae
country admin1_name admin2_name location year
Burkina Faso Centre-Sud Bazega Monomtenga 2004 0 0 13
Hauts-Bassins Houet Bana Village 2012 0 42 23
2014 1 47 15
Pala 2012 0 11 48
2014 2 0 16
Souroukoudinga 2012 0 29 28
2014 0 6 15

We can see there are four collection sites in Burkina Faso.
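Rather than counting sites by eye, the number of distinct locations can be computed directly. This sketch uses a small hand-made DataFrame containing the four Burkina Faso site names and coordinates shown in this module, with some rows repeated to mimic several samples per site.

```python
import pandas as pd

# Toy data: six rows standing in for samples from four sites.
df_toy = pd.DataFrame(
    {
        "location": [
            "Pala", "Pala", "Bana Village",
            "Souroukoudinga", "Monomtenga", "Bana Village",
        ],
        "latitude": [11.151, 11.151, 11.233, 11.238, 12.060, 11.233],
        "longitude": [-4.235, -4.235, -4.472, -4.235, -1.170, -4.472],
    }
)

# Number of distinct location names.
n_sites = df_toy["location"].nunique()

# One row per distinct site, sorted by name.
sites = df_toy.drop_duplicates().sort_values("location").reset_index(drop=True)

print(n_sites)
print(sites)
```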

Plotting maps of sampling locations#

To explore the different mosquito collection locations it can also be useful to plot some maps. You can plot maps within a notebook using various packages such as ipyleaflet. Let’s install the ipyleaflet package.

%pip install -qq ipyleaflet
Note: you may need to restart the kernel to use updated packages.

Now import the ipyleaflet package.

import ipyleaflet

Creating an interactive map is very straightforward, using the Map() function. Here is a world map centered on Africa. Note that this is an interactive map, so you can pan and zoom.

m = ipyleaflet.Map(
    basemap=ipyleaflet.basemaps.OpenStreetMap.Mapnik, 
    center=[0, 20], 
    zoom=3,
)
m

Let’s now plot a map, adding in markers for all of the locations where we have mosquitoes. First create a pivot table with the location data we need.

pivot_location_taxon = (
    df_samples
    .pivot_table(
        index=["country", "location", "latitude", "longitude"], 
        columns=["taxon"], 
        values="sample_id",
        aggfunc="count",
        fill_value=0,
    )
)

pivot_location_taxon
taxon arabiensis coluzzii gambiae gcx1 gcx3 unassigned
country location latitude longitude
Angola Luanda -8.884 13.302 0 81 0 0 0 0
Burkina Faso Bana Village 11.233 -4.472 1 89 38 0 0 0
Monomtenga 12.060 -1.170 0 0 13 0 0 0
Pala 11.151 -4.235 2 11 64 0 0 0
Souroukoudinga 11.238 -4.235 0 35 43 0 0 0
... ... ... ... ... ... ... ... ... ...
Tanzania Muheza -4.940 38.948 1 0 32 0 10 0
Muleba -1.962 31.621 137 0 32 0 1 0
Tarime -1.431 34.199 47 0 0 0 0 0
Uganda Kihihi -0.751 29.701 1 0 95 0 0 0
Nagongera 0.770 34.026 81 0 112 0 0 1

127 rows × 6 columns

Now create a map with markers.

# create a map
m = ipyleaflet.Map(
    basemap=ipyleaflet.basemaps.OpenStreetMap.Mapnik, 
    center=[0, 20], 
    zoom=3,
)

# add markers for sampling locations
for row in pivot_location_taxon.reset_index().itertuples():
    title = (
        f"{row.location}, {row.country} ({row.latitude:.3f}, {row.longitude:.3f})\n"
        f"{row.gambiae} gambiae, {row.coluzzii} coluzzii, {row.arabiensis} arabiensis"
    )
    marker = ipyleaflet.Marker(
        location=(row.latitude, row.longitude), 
        draggable=False,
        title=title,
    )
    m.add_layer(marker)

# add a scale bar
m.add_control(ipyleaflet.ScaleControl(position="bottomleft"))

# display the map
m

Try hovering over the markers. You should see some text with a summary of how many samples are available by species.
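When centering a map on a particular country, one simple option is to average the coordinates of its sampling locations and pass the result as the center parameter. The sketch below does this for the two Uganda sites listed in the location table above. The `map_center` helper is hypothetical, and a plain arithmetic mean is only a rough approximation that assumes the sites are reasonably close together.

```python
from statistics import mean


def map_center(coords):
    """Return [mean latitude, mean longitude] for a list of (lat, lon) pairs.

    Hypothetical helper; a simple average, adequate only for nearby points.
    """
    lats, lons = zip(*coords)
    return [mean(lats), mean(lons)]


# Coordinates of Kihihi and Nagongera, taken from the table above.
uganda_sites = [(-0.751, 29.701), (0.770, 34.026)]

center = map_center(uganda_sites)
print(center)
```

The resulting list has the same [latitude, longitude] form as the center=[0, 20] value used when creating the map.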

Practical exercises#

English#

  1. Open this notebook in Google Colab and run it for yourself from top to bottom. Hint: click the rocket icon at the top of the page, then select “Colab” from the drop-down menu. When Colab opens, click the “Edit” menu, then select “Clear all outputs”, then begin running the cells.

  2. Find out how many mosquito specimens are available for each of the different Anopheles species represented. Hint: try grouping the sample metadata dataframe by the “taxon” column, then calling the size() method.

  3. Make a pivot table that shows how many samples are available in the Ag3.0 resource that were collected in Mali, summarised by year, location and taxon. Now try Cameroon, or any other country of interest.

  4. How many countries are there for which we have some samples of Anopheles coluzzii? What about Anopheles arabiensis and Anopheles gambiae? Hint: Make a pivot table by country and taxon, and then query it.

  5. Plot a map of all sampling locations, changing the basemap parameter to show a different background map. Hint: see the ipyleaflet basemaps documentation for a list of available options.

  6. Plot a map that starts centered and zoomed in to Uganda, or any other country of interest. Hint: change the center and zoom parameters when calling the ipyleaflet Map() function.

  7. Plot a map showing only locations where we have samples of Anopheles coluzzii. Now try Anopheles arabiensis or Anopheles gambiae.

  8. If you feel like a challenge, plot a map with markers for sampling locations, and add a popup to each marker showing a pivot table of how many samples were collected by year and species.

Français#

  1. Ouvrir ce notebook dans Google Colab et l’exécuter vous-même du début à la fin. Indice: cliquer sur l’icône fusée au sommet de la page et sélectionner “Colab” dans le menu déroulant. Quand Colab s’ouvre, cliquer sur le menu “Edit” et sélectionner “Clear all outputs”, commencer ensuite à exécuter les cellules.

  2. Trouver combien de moustiques sont disponibles pour chacune des différentes espèces d’Anophèles représentées. Indice: essayer de grouper le dataframe des métadonnées des échantillons selon la colonne “taxon”, ensuite utiliser la méthode size().

  3. Créer une table à pivot qui montre combien de moustiques capturés au Mali sont présents dans Ag3.0, résumés par année, lieu de capture et taxon. Essayer ensuite le Cameroun ou un autre pays de votre choix.

  4. Pour combien de pays avons-nous des Anophèles coluzzii? Même question pour Anophèles arabiensis et Anophèles gambiae. Indice: créer une table à pivot par pays et taxon et utiliser une requête.

  5. Créer une carte de tous les lieux de capture utilisant une autre basemap pour avoir un fond différent. Indice: regarder la documentation d’ipyleaflet basemaps pour une liste des options disponibles.

  6. Créer une carte centrée et zoomée sur l’Ouganda ou un autre pays de votre choix. Indice: modifier les paramètres center et zoom quand vous utilisez la fonction Map() d’ipyleaflet.

  7. Créer une carte ne montrant que les lieux de capture où des Anophèles coluzzii ont été capturés. Faire la même chose pour les Anophèles arabiensis ou les Anophèles gambiae.

  8. Si vous souhaitez un défi, créer une carte avec un marqueur pour chaque lieu de capture et ajouter un pop-up à chaque marqueur montrant une table à pivot donnant le nombre de moustiques par année et taxon.