Principal Components Analysis Recitation
Week 10
Introduction
Today is the first recitation for Module 4, where we put together much of the material we've learned in the first 3 modules of this course. Today's material covers conducting principal components analysis (PCA) in R and visualizing the results with some tools we've already learned to use, plus some new wrangling and viz tips along the way.
library(tidyverse) # everything
library(readxl) # reading in excel sheets
library(factoextra) # easy PCA plotting
library(glue) # easy pasting
library(ggrepel) # repelling labels away from their points
library(patchwork) # for combining and arranging plots
Read in data
We will be using data about pizza: nutritional information for 300 different grocery store pizzas from 10 brands, compiled by f-imp and posted to GitHub.
pizza <- read_csv(file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv")
Rows: 300 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): brand
dbl (8): id, mois, prot, fat, ash, sodium, carb, cal
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
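Before moving on, it can help to confirm that the import matches the description above (300 pizzas, 10 brands). The following is a minimal sketch, not part of the original recitation: `check_import()` is a hypothetical helper name, and it assumes only that the data frame has a `brand` column, as the column specification shows.

```r
library(dplyr)

# Hypothetical helper (an illustration, not from the recitation):
# summarises a data frame shaped like `pizza`, i.e. one row per pizza
# with a `brand` column.
check_import <- function(df) {
  list(
    n_rows    = nrow(df),                      # total number of rows (pizzas)
    n_brands  = n_distinct(df$brand),          # number of distinct brands
    per_brand = count(df, brand, name = "n"),  # number of pizzas per brand
    n_missing = sum(is.na(df))                 # missing values across all columns
  )
}

# Usage, assuming `pizza` was read in as shown above:
# check_import(pizza)
```

If the file was read correctly, `n_rows` and `n_brands` should agree with the 300 pizzas and 10 brands described earlier. A nonzero `n_missing` would be worth resolving before running PCA, since `prcomp()` stops with an error on missing values by default.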
How different are the pizza brands from one another overall?