Module 4 Assignment

Due 12-03-2024

Introduction

This is your assignment for Module 4 Putting It All Together, focused on the material you learned in the lectures and recitation activities on PCA, Manhattan plots, interactive plots, and the leftovers.

Submission info:

  • Please submit this assignment by uploading a knitted .html to Carmen
  • Your headers should be logical and your report and code annotated with descriptions of what you’re doing. Starting on this assignment, I will be considering for overall format and readability of your assignment as part of your grade. I am doing this because the format of your report will be considered for your final capstone assignment. This means you should have reasonable headers and header levels, understandable flow between plots and code, and use Markdown language when appropriate.
  • Make sure you include the Code Download button so that I can see your code as well
  • Customize the YAML and the document so you like how it looks

Remember there are often many ways to reach the same end product. I have showed you many ways in class to achieve a similar end product, you only need to show me one of them. As long as your answer is reasonable, you will get full credit even if its different than what I intended.

This assignment will be due on Wednesday, December 3, 2024, at 11:59pm.

Data

The data we will be using is the same we used in the ggplot102 recitation that includes information about dog breed trait information from the American Kennel Club.

Download the data using the code below. Don’t use the code from week 5 recitation.

breed_traits <- readr::read_csv('https://raw.githubusercontent.com/jcooperstone/jcooperstone.github.io/main/assignments/modules/module4/data/breed_traits_fixed.csv')

trait_description <- readr::read_csv('https://raw.githubusercontent.com/jcooperstone/jcooperstone.github.io/main/assignments/modules/module4/data/trait_description.csv')

breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/jcooperstone/jcooperstone.github.io/main/assignments/modules/module4/data/breed_rank_all.csv')

For a little hint, here are the packages I used to complete this task. Yours might not be exactly the same.

library(tidyverse)
library(factoextra)
library(glue)
library(patchwork)
library(ggrepel)
library(plotly)
library(gghighlight)

1. Principal components analysis (PCA) of American Kennel Club dog bred trait data (6 pts)

Run a PCA on breed_traits for all of the numeric data present in that dataset. Create the following plots and make them of publication quality:

  1. A scree plot
  2. A scores plot
  3. A loadings plot
  4. A two panel plot that has the scores plot and the scree plot together

2. Make your PCA plot interactive (2 pts)

Make your PCA scores plot interactive, and so that when you hover each point, you can see what the name of that dog breed is (and only the breed of that dog).

Back to top