We will be using data from the National Health and Nutrition Examination Survey (NHANES), which assesses the health and nutritional status of people in the United States. Data from the 2009-2012 surveys have been compiled in an R package called NHANES.
```r
# install.packages("NHANES")  # run once if the package is not yet installed
library(NHANES)

# functionality and correlation packages
library(tidyverse)
library(corrplot)
library(ggcorrplot)
library(GGally)
library(Hmisc)
library(reshape2)
library(scales)

# preview the first six rows of the dataset
knitr::kable(head(NHANES))
```
| Variable | Row 1 | Row 2 | Row 3 | Row 4 | Row 5 | Row 6 |
|---|---|---|---|---|---|---|
| ID | 51624 | 51624 | 51624 | 51625 | 51630 | 51638 |
| SurveyYr | 2009_10 | 2009_10 | 2009_10 | 2009_10 | 2009_10 | 2009_10 |
| Gender | male | male | male | male | female | male |
| Age | 34 | 34 | 34 | 4 | 49 | 9 |
| AgeDecade | 30-39 | 30-39 | 30-39 | 0-9 | 40-49 | 0-9 |
| AgeMonths | 409 | 409 | 409 | 49 | 596 | 115 |
| Race1 | White | White | White | Other | White | White |
| Race3 | NA | NA | NA | NA | NA | NA |
| Education | High School | High School | High School | NA | Some College | NA |
| MaritalStatus | Married | Married | Married | NA | LivePartner | NA |
| HHIncome | 25000-34999 | 25000-34999 | 25000-34999 | 20000-24999 | 35000-44999 | 75000-99999 |
| HHIncomeMid | 30000 | 30000 | 30000 | 22500 | 40000 | 87500 |
| Poverty | 1.36 | 1.36 | 1.36 | 1.07 | 1.91 | 1.84 |
| HomeRooms | 6 | 6 | 6 | 9 | 5 | 6 |
| HomeOwn | Own | Own | Own | Own | Rent | Rent |
| Work | NotWorking | NotWorking | NotWorking | NA | NotWorking | NA |
| Weight | 87.4 | 87.4 | 87.4 | 17.0 | 86.7 | 29.8 |
| Length | NA | NA | NA | NA | NA | NA |
| HeadCirc | NA | NA | NA | NA | NA | NA |
| Height | 164.7 | 164.7 | 164.7 | 105.4 | 168.4 | 133.1 |
| BMI | 32.22 | 32.22 | 32.22 | 15.30 | 30.57 | 16.82 |
| BMICatUnder20yrs | NA | NA | NA | NA | NA | NA |
| BMI_WHO | 30.0_plus | 30.0_plus | 30.0_plus | 12.0_18.5 | 30.0_plus | 12.0_18.5 |
| Pulse | 70 | 70 | 70 | NA | 86 | 82 |
| BPSysAve | 113 | 113 | 113 | NA | 112 | 86 |
| BPDiaAve | 85 | 85 | 85 | NA | 75 | 47 |
| BPSys1 | 114 | 114 | 114 | NA | 118 | 84 |
| BPDia1 | 88 | 88 | 88 | NA | 82 | 50 |
| BPSys2 | 114 | 114 | 114 | NA | 108 | 84 |
| BPDia2 | 88 | 88 | 88 | NA | 74 | 50 |
| BPSys3 | 112 | 112 | 112 | NA | 116 | 88 |
| BPDia3 | 82 | 82 | 82 | NA | 76 | 44 |
| Testosterone | NA | NA | NA | NA | NA | NA |
| DirectChol | 1.29 | 1.29 | 1.29 | NA | 1.16 | 1.34 |
| TotChol | 3.49 | 3.49 | 3.49 | NA | 6.70 | 4.86 |
| UrineVol1 | 352 | 352 | 352 | NA | 77 | 123 |
| UrineFlow1 | NA | NA | NA | NA | 0.094 | 1.538 |
| UrineVol2 | NA | NA | NA | NA | NA | NA |
| UrineFlow2 | NA | NA | NA | NA | NA | NA |
| Diabetes | No | No | No | No | No | No |
| DiabetesAge | NA | NA | NA | NA | NA | NA |
| HealthGen | Good | Good | Good | NA | Good | NA |
| DaysPhysHlthBad | 0 | 0 | 0 | NA | 0 | NA |
| DaysMentHlthBad | 15 | 15 | 15 | NA | 10 | NA |
| LittleInterest | Most | Most | Most | NA | Several | NA |
| Depressed | Several | Several | Several | NA | Several | NA |
| nPregnancies | NA | NA | NA | NA | 2 | NA |
| nBabies | NA | NA | NA | NA | 2 | NA |
| Age1stBaby | NA | NA | NA | NA | 27 | NA |
| SleepHrsNight | 4 | 4 | 4 | NA | 8 | NA |
| SleepTrouble | Yes | Yes | Yes | NA | Yes | NA |
| PhysActive | No | No | No | NA | No | NA |
| PhysActiveDays | NA | NA | NA | NA | NA | NA |
| TVHrsDay | NA | NA | NA | NA | NA | NA |
| CompHrsDay | NA | NA | NA | NA | NA | NA |
| TVHrsDayChild | NA | NA | NA | 4 | NA | 5 |
| CompHrsDayChild | NA | NA | NA | 1 | NA | 0 |
| Alcohol12PlusYr | Yes | Yes | Yes | NA | Yes | NA |
| AlcoholDay | NA | NA | NA | NA | 2 | NA |
| AlcoholYear | 0 | 0 | 0 | NA | 20 | NA |
| SmokeNow | No | No | No | NA | Yes | NA |
| Smoke100 | Yes | Yes | Yes | NA | Yes | NA |
| Smoke100n | Smoker | Smoker | Smoker | NA | Smoker | NA |
| SmokeAge | 18 | 18 | 18 | NA | 38 | NA |
| Marijuana | Yes | Yes | Yes | NA | Yes | NA |
| AgeFirstMarij | 17 | 17 | 17 | NA | 18 | NA |
| RegularMarij | No | No | No | NA | No | NA |
| AgeRegMarij | NA | NA | NA | NA | NA | NA |
| HardDrugs | Yes | Yes | Yes | NA | Yes | NA |
| SexEver | Yes | Yes | Yes | NA | Yes | NA |
| SexAge | 16 | 16 | 16 | NA | 12 | NA |
| SexNumPartnLife | 8 | 8 | 8 | NA | 10 | NA |
| SexNumPartYear | 1 | 1 | 1 | NA | 1 | NA |
| SameSex | No | No | No | NA | Yes | NA |
| SexOrientation | Heterosexual | Heterosexual | Heterosexual | NA | Heterosexual | NA |
| PregnantNow | NA | NA | NA | NA | NA | NA |
1. How correlated are different measures of blood pressure?
The NHANES dataset contains three readings each of systolic (the first/top number) and diastolic (the second/bottom number) blood pressure, along with a combined average of each (BPSys1, BPSys2, BPSys3 and BPSysAve; BPDia1, BPDia2, BPDia3 and BPDiaAve). How reproducible is each type of blood pressure measurement across the three readings? Make visualizations to convey your findings.
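A minimal sketch of one way to start on this question, assuming only that the `NHANES` package is installed and using base R (`cor()`, `summary()` and `pairs()`) rather than the plotting packages loaded above. It computes Pearson correlations among the three readings and the average for each blood pressure type, summarizes the difference between the first and second readings, and draws scatterplot matrices. Handling missing values with `use = "pairwise.complete.obs"` is a choice made here for illustration, not something the question specifies.

```r
library(NHANES)

# Column names for the three readings plus the combined average
sys_cols <- c("BPSys1", "BPSys2", "BPSys3", "BPSysAve")
dia_cols <- c("BPDia1", "BPDia2", "BPDia3", "BPDiaAve")

# Pearson correlations; "pairwise.complete.obs" uses every row in which
# both columns of a given pair are non-missing
sys_cor <- cor(NHANES[, sys_cols], use = "pairwise.complete.obs")
dia_cor <- cor(NHANES[, dia_cols], use = "pairwise.complete.obs")
print(round(sys_cor, 3))
print(round(dia_cor, 3))

# Spread of the difference between the first and second reading
print(summary(NHANES$BPSys1 - NHANES$BPSys2))
print(summary(NHANES$BPDia1 - NHANES$BPDia2))

# Scatterplot matrices of the three individual readings
pairs(NHANES[, sys_cols[1:3]], pch = ".", main = "Systolic readings")
pairs(NHANES[, dia_cols[1:3]], pch = ".", main = "Diastolic readings")
```

High off-diagonal correlations and differences centred near zero would suggest that repeated readings agree closely; the scatterplot matrices make any outlying readings visible.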
2. How correlated are different physical measurements, health, and lifestyle variables?
The NHANES dataset includes each subject's BMI, Pulse, BPSysAve, BPDiaAve, and TotChol (total cholesterol).
Create one or more plots showing how these variables relate to one another.
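One possible starting point, again a sketch that assumes only the `NHANES` package and base R. It keeps subjects with all five variables recorded, prints their correlation matrix, and draws a scatterplot matrix. Dropping incomplete rows with `na.omit()` is an assumption made here so that every correlation is computed on the same subjects.

```r
library(NHANES)

vars <- c("BMI", "Pulse", "BPSysAve", "BPDiaAve", "TotChol")

# Keep only subjects with all five variables recorded
phys <- na.omit(as.data.frame(NHANES[, vars]))

# Pearson correlation matrix on the complete cases
phys_cor <- cor(phys)
print(round(phys_cor, 2))

# Scatterplot matrix of every pair of variables
pairs(phys, pch = ".", main = "BMI, pulse, blood pressure and cholesterol")
```

The matrix `phys_cor` can also be passed to `corrplot::corrplot()` or `ggcorrplot::ggcorrplot()` (both loaded at the top of this document) for a heatmap-style view, and `GGally::ggpairs(phys)` gives a ggplot2 version of the scatterplot matrix.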