Interactive plots with plotly recitation solutions
Introduction
Today we are going to work with microbiome data. In this recitation we are going to provide microbiome data as the result of the shotgun sequencing of the pig gut microbiome.
Pig gut microbiome data
The goal of this recitation is to replicate the following plot, which expresses the relationship between the Bacteroidetes and Firmicutes while the rest of the Phyla levels were assigned to others.
Load the packages we will use
library(tidyverse)
library(plotly)
How many rows and columns does the data have?
<- read_csv("data/Phyla_RelAbund_Final_Filtered_WithMetadata.csv")
pig_micro
dim(pig_micro)
[1] 60 50
The data contains 60 rows and 50 columns
How many phyla does the data contains and how many columns represents metadata of the experiment?
glimpse(pig_micro)
Rows: 60
Columns: 50
$ Sample_Name <chr> "ShotgunWGS-ControlPig6G…
$ Pig <dbl> 6, 8, 3, 14, 5, 18, 16, …
$ Diet <chr> "Control", "Control", "C…
$ Time_Point <chr> "Day 14", "Day 0", "Day …
$ Diet_By_Time_Point <chr> "Control Day 14", "Contr…
$ Acidobacteria <dbl> 0.000741000, 0.000885000…
$ Actinobacteria <dbl> 0.04813358, 0.06598073, …
$ Apicomplexa <dbl> 5.95e-05, 9.14e-05, 4.92…
$ Aquificae <dbl> 0.000503, 0.000537, 0.00…
$ Ascomycota <dbl> 0.000384000, 0.000523000…
$ Bacillariophyta <dbl> 2.71e-05, 4.38e-05, 3.34…
$ Bacteroidetes <dbl> 0.3670956, 0.3312767, 0.…
$ Basidiomycota <dbl> 6.18e-05, 9.64e-05, 5.13…
$ `Candidatus Poribacteria` <dbl> 6.70e-06, 1.17e-05, 5.44…
$ Chlamydiae <dbl> 0.000142, 0.000197, 0.00…
$ Chlorobi <dbl> 0.001259844, 0.001445895…
$ Chloroflexi <dbl> 0.002020802, 0.002288481…
$ Chlorophyta <dbl> 9.53e-05, 1.36e-04, 9.38…
$ Chrysiogenetes <dbl> 8.53e-05, 9.90e-05, 7.02…
$ Crenarchaeota <dbl> 0.000167, 0.000215, 0.00…
$ Cyanobacteria <dbl> 0.002277461, 0.003240824…
$ Deferribacteres <dbl> 0.000385, 0.000475, 0.00…
$ `Deinococcus-Thermus` <dbl> 0.000639000, 0.000854000…
$ Dictyoglomi <dbl> 0.000314000, 0.000370000…
$ Elusimicrobia <dbl> 0.000163, 0.000240, 0.00…
$ Euryarchaeota <dbl> 0.003395060, 0.004565534…
$ Fibrobacteres <dbl> 0.001228664, 0.001657792…
$ Firmicutes <dbl> 0.5308271, 0.5293436, 0.…
$ Fusobacteria <dbl> 0.003955535, 0.005057419…
$ Gemmatimonadetes <dbl> 5.44e-05, 8.09e-05, 6.24…
$ Hemichordata <dbl> 6.70e-06, 4.29e-06, 4.14…
$ Korarchaeota <dbl> 1.93e-05, 2.31e-05, 1.35…
$ Lentisphaerae <dbl> 0.000197000, 0.000436000…
$ Microsporidia <dbl> 1.26e-05, 2.10e-05, 1.45…
$ Nanoarchaeota <dbl> 1.03e-06, 1.19e-06, 1.04…
$ Nematoda <dbl> 4.59e-05, 6.31e-05, 3.78…
$ Nitrospirae <dbl> 0.000142, 0.000190, 0.00…
$ Placozoa <dbl> 1.75e-05, 1.62e-05, 8.03…
$ Planctomycetes <dbl> 0.00039200, 0.00062700, …
$ Proteobacteria <dbl> 0.02713703, 0.03683140, …
$ Spirochaetes <dbl> 0.002968326, 0.004157693…
$ Synergistetes <dbl> 0.001147492, 0.001783025…
$ Tenericutes <dbl> 0.000712000, 0.000888000…
$ Thaumarchaeota <dbl> 1.44e-05, 1.90e-05, 1.53…
$ Thermotogae <dbl> 0.001292056, 0.001691600…
$ Verrucomicrobia <dbl> 0.000827000, 0.001495655…
$ `unclassified (derived from Bacteria)` <dbl> 0.000308, 0.000410, 0.00…
$ `unclassified (derived from Eukaryota)` <dbl> 0.000325000, 0.000997000…
$ `unclassified (derived from Viruses)` <dbl> 0.000398000, 0.000625000…
$ `unclassified (derived from other sequences)` <dbl> 1.24e-05, 7.86e-06, 1.61…
The first 5 rows represents metadata:
Sample_Name
Pig
Diet
Time_Point
Diet_By_Time_Point
The remainder of the columns represent the different phyla level annotation.
Computing the cumulative abundance for other phylum
Create a new column with a new phyla assignation
Keep the phyla level when they are Firmicutes or Bacteroidetes, otherwise assign Phyla to “Other level”.
Hint: You may need to pivot the data to evaluate the column names as observations
Here, we are creating a new column called Simplified_Phyla
where if the Phyla
is Bacteriodetes or Firmicutes, we specify the value that currently exists in Phyla
, and if its not, then we specify to give the value “Other phyla”.
We can do this with if_else()
.
|>
pig_micro pivot_longer(
# select all the phyla columns
cols = Acidobacteria:`unclassified (derived from other sequences)`,
values_to = "Abundance",
names_to = "Phyla") |>
mutate(Simplified_Phyla = if_else(condition = Phyla %in% c("Bacteroidetes", "Firmicutes"),
true = Phyla,
false = "Other phyla")) |>
head()
# A tibble: 6 Ă— 8
Sample_Name Pig Diet Time_Point Diet_By_Time_Point Phyla Abundance
<chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
1 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Acid… 0.000741
2 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Acti… 0.0481
3 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Apic… 0.0000595
4 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Aqui… 0.000503
5 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Asco… 0.000384
6 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Baci… 0.0000271
# â„ą 1 more variable: Simplified_Phyla <chr>
Or with case_when()
.
|>
pig_micro pivot_longer(
# select all the phyla columns
cols = Acidobacteria:`unclassified (derived from other sequences)`,
values_to = "Abundance",
names_to = "Phyla") |>
mutate(Simplified_Phyla = case_when(Phyla == "Bacteroidetes" ~ Phyla,
== "Firmicutes" ~ Phyla,
Phyla # .default = "Other phyla")) |> # use this if you have a newer version of R
TRUE ~ "Other phyla")) |>
head()
# A tibble: 6 Ă— 8
Sample_Name Pig Diet Time_Point Diet_By_Time_Point Phyla Abundance
<chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
1 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Acid… 0.000741
2 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Acti… 0.0481
3 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Apic… 0.0000595
4 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Aqui… 0.000503
5 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Asco… 0.000384
6 ShotgunWGS-ControlP… 6 Cont… Day 14 Control Day 14 Baci… 0.0000271
# â„ą 1 more variable: Simplified_Phyla <chr>
Compute the cumulative abundance by the new Simplified_Phyla
levels that you created
We want to have a summary that has the relative abundance of each of the Simplified_Phyla
for each pig.
<- pig_micro |>
pig_clean pivot_longer(
# select all the phyla columns
cols = Acidobacteria:`unclassified (derived from other sequences)`,
values_to = "Abundance",
names_to = "Phyla") |>
mutate(Simplified_Phyla = if_else(condition = Phyla %in% c("Bacteroidetes", "Firmicutes"),
true = Phyla,
false = "Other phyla")) |>
group_by(Sample_Name, Simplified_Phyla, Time_Point, Pig) |>
summarise(Abundance = sum(Abundance))
head(pig_clean)
# A tibble: 6 Ă— 5
# Groups: Sample_Name, Simplified_Phyla, Time_Point [6]
Sample_Name Simplified_Phyla Time_Point Pig Abundance
<chr> <chr> <chr> <dbl> <dbl>
1 ShotgunWGS-ControlPig10GutMicrobi… Bacteroidetes Day 0 10 0.481
2 ShotgunWGS-ControlPig10GutMicrobi… Firmicutes Day 0 10 0.435
3 ShotgunWGS-ControlPig10GutMicrobi… Other phyla Day 0 10 0.0839
4 ShotgunWGS-ControlPig10GutMicrobi… Bacteroidetes Day 14 10 0.383
5 ShotgunWGS-ControlPig10GutMicrobi… Firmicutes Day 14 10 0.513
6 ShotgunWGS-ControlPig10GutMicrobi… Other phyla Day 14 10 0.104
Create the base bar plot with ggplot
First some wrangling.
# what are the time points called again?
unique(pig_clean$Time_Point)
[1] "Day 0" "Day 14" "Day 7"
# make Time_Point a factor so the faceting goes Day 0, Day 7, Day 14
# instead of putting Day 14 in the middle
$Time_Point <- as.factor(pig_clean$Time_Point)
pig_clean
# set levels to be the order we want
$Time_Point <- factor(pig_clean$Time_Point,
pig_cleanlevels = c("Day 0", "Day 7", "Day 14"))
Then plot.
<- pig_clean |>
stack_barplot ggplot(aes(x = as.numeric(Pig), y = Abundance, fill = Simplified_Phyla,
# set text to specify hover test
text = glue::glue("{Simplified_Phyla}: {round(Abundance, 3)*100}%"))) +
geom_col() +
scale_fill_brewer(palette = "GnBu") +
facet_wrap(vars(Time_Point)) +
theme_classic() +
labs(y = "Relative Abundance",
fill = "Phyla",
x = "Pig Number",
title = "Bacteriodetes, Firmicutes, and Other Phyla from \nPig Microbiome Sequencing using Shotgun Metagenomics") +
theme(panel.grid = element_blank(),
axis.text = element_text(color = "black", size = 11),
strip.text = element_text(color = "black", size = 10),
strip.background = element_blank())
stack_barplot
Make the plot interactive
ggplotly(stack_barplot,
tooltip = "text") |>
layout(margin = list(t = 100)) # for increasing the padding around the title