library(retroharmonize)
Use the labelled_spss_survey() helper function to create vectors of class retroharmonize_labelled_spss_survey.
sl1 <- labelled_spss_survey(
  x = c(1, 1, 0, 8, 8, 8),
  labels = c("yes" = 1, "no" = 0, "declined" = 8),
  label = "Do you agree?",
  na_values = 8,
  id = "survey1"
)
print(sl1)
#> [1] 1 1 0 8 8 8
#> attr(,"labels")
#>      yes       no declined
#>        1        0        8
#> attr(,"label")
#> [1] "Do you agree?"
#> attr(,"na_values")
#> [1] 8
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"
#> [3] "haven_labelled"
#> attr(,"survey1_name")
#> [1] "c(1, 1, 0, 8, 8, 8)"
#> attr(,"survey1_values")
#> 0 1 8
#> 0 1 8
#> attr(,"survey1_label")
#> [1] "Do you agree?"
#> attr(,"survey1_labels")
#>      yes       no declined
#>        1        0        8
#> attr(,"survey1_na_values")
#> [1] 8
#> attr(,"id")
#> [1] "survey1"
You can check the type:
is.labelled_spss_survey(sl1)
#> [1] TRUE
The labelled_spss_survey class inherits some properties from haven::labelled(), which can be manipulated with the labelled package (see particularly the vignette "Introduction to labelled" by Joseph Larmarange).
haven::is.labelled(sl1)
#> [1] TRUE
labelled::val_labels(sl1)
#>      yes       no declined
#>        1        0        8
labelled::na_values(sl1)
#> [1] 8
It can also be subsetted:
sl1[3:4]
#> [1] 0 8
#> attr(,"labels")
#>      yes       no declined
#>        1        0        8
#> attr(,"label")
#> [1] "Do you agree?"
#> attr(,"na_values")
#> [1] 8
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"
#> [3] "haven_labelled"
#> attr(,"survey1_name")
#> [1] "c(1, 1, 0, 8, 8, 8)"
#> attr(,"survey1_values")
#> 0 1 8
#> 0 1 8
#> attr(,"survey1_label")
#> [1] "Do you agree?"
#> attr(,"survey1_labels")
#>      yes       no declined
#>        1        0        8
#> attr(,"survey1_na_values")
#> [1] 8
#> attr(,"id")
#> [1] "survey1"
When used within the modernized version of data.frame, tibble::tibble(), the summary of the variable content prints in an informative way.
df <- tibble::tibble(v1 = sl1)
## Use tibble instead of data.frame(v1=sl1) ...
print(df)
#> # A tibble: 6 x 1
#>   v1
#>   <retroh_dbl>
#> 1 1 [yes]
#> 2 1 [yes]
#> 3 0 [no]
#> 4 8 (NA) [declined]
#> 5 8 (NA) [declined]
#> 6 8 (NA) [declined]
## ... which inherits the methods of a data.frame
subset(df, v1 == 1)
#> # A tibble: 2 x 1
#>   v1
#>   <retroh_dbl>
#> 1 1 [yes]
#> 2 1 [yes]
To avoid any confusion with mis-labelled surveys, combining a labelled_spss_survey vector with a double or integer vector returns a plain double or integer vector. Using vctrs::vec_c is generally safer than base R c().
# double
c(sl1, 1/7)
#> [1] 1.0000000 1.0000000 0.0000000 8.0000000 8.0000000 8.0000000 0.1428571
vctrs::vec_c(sl1, 1/7)
#> [1] 1.0000000 1.0000000 0.0000000 8.0000000 8.0000000 8.0000000 0.1428571
c(sl1, 1:3)
#> [1] 1 1 0 8 8 8 1 2 3
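One reason to treat base c() with care is that its default method drops every attribute except names. The following is a minimal base R sketch of that behaviour. It uses a plain numeric vector with a hypothetical "label" attribute as a stand-in, not a real labelled_spss_survey object, so it only illustrates the general mechanism.

```r
# Hypothetical stand-in: a numeric vector carrying a "label" attribute.
x <- structure(c(1, 1, 0), label = "Do you agree?")
attr(x, "label")

# The default c() method keeps the values but discards the attribute.
combined <- c(x, 1/7)
attributes(combined)   # NULL: the label is gone
is.double(combined)    # TRUE: the result is a plain double vector
```

After combining, any metadata that should survive has to be reattached explicitly.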
Conversion to character works as expected:
as.character(sl1)
#> [1] "1" "1" "0" "8" "8" "8"
The base as.factor converts to integer and uses the integers as levels, because base R factors are integers with a levels attribute.
as.factor(sl1)
#> [1] 1 1 0 8 8 8
#> Levels: 0 1 8
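To see what "integers with a levels attribute" means in practice, the sketch below builds a factor from the same raw codes in plain base R, without any labelled class involved, and inspects its two parts. The object name f_codes is chosen here for illustration only.

```r
# A base R factor built from the raw survey codes.
f_codes <- factor(c(1, 1, 0, 8, 8, 8))

levels(f_codes)       # the distinct values, stored as character: "0" "1" "8"
as.integer(f_codes)   # positions in the levels vector: 2 2 1 3 3 3

# Recovering the original codes requires going through the level strings.
as.numeric(as.character(f_codes))
```

Note that as.integer() returns the internal positions, not the survey codes, which is a common source of errors when factors are converted back to numbers.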
Conversion to factor with as_factor converts the value labels to factor levels:
as_factor(sl1)
#> [1] yes      yes      no       declined declined declined
#> Levels: no yes declined
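For comparison, a similar result can be built by hand in base R by passing the codes as levels and the value labels as labels. This is only an illustrative sketch on plain vectors and is not how as_factor is implemented; the names value_labels, sorted_labels, and f_labels are assumptions made for this example.

```r
x <- c(1, 1, 0, 8, 8, 8)
value_labels <- c("yes" = 1, "no" = 0, "declined" = 8)

# Order the labels by their numeric code so the levels follow 0, 1, 8.
sorted_labels <- sort(value_labels)

f_labels <- factor(
  x,
  levels = unname(sorted_labels),  # the codes 0, 1, 8
  labels = names(sorted_labels)    # "no", "yes", "declined"
)

f_labels
levels(f_labels)
```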
Similarly, when converting to numeric types, we have to convert the user-defined missing values to the NA values used in the R language. For numerical analysis, convert with as_numeric, because the base as.numeric keeps the user-defined missing codes as ordinary numbers.
as.numeric(sl1)
#> [1] 1 1 0 8 8 8
as_numeric(sl1)
#> [1] 1 1 0 NA NA NA
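The recoding step can be pictured with plain vectors. The sketch below replaces a user-defined missing code with NA in base R. It is an illustration of the idea under the assumption that 8 is the only missing code, not the package's implementation of as_numeric, and the names user_na and x_num are hypothetical.

```r
x <- c(1, 1, 0, 8, 8, 8)
user_na <- 8                 # user-defined missing code (assumed for this example)

# Replace every occurrence of a user-defined missing code with NA.
x_num <- x
x_num[x_num %in% user_na] <- NA
x_num

# Statistics on the recoded vector use only the valid answers.
mean(x, na.rm = TRUE)        # treats the code 8 as a real value
mean(x_num, na.rm = TRUE)    # uses only 1, 1, 0
```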
The median value is correctly displayed, because user-defined missing values are removed from the calculation. Only a few arithmetic methods are implemented, such as median(), quantile(), mean(), weighted.mean(), and sum():
median(as.numeric(sl1))
#> [1] 4.5
median(sl1)
#> [1] 4.5
quantile(as.numeric(sl1), 0.9)
#> 90%
#>   8
quantile(sl1, 0.9)
#> 90%
#>   1
mean(as.numeric(sl1))
#> [1] 4.333333
mean(sl1)
#> [1] 4.333333
mean(sl1, na.rm = TRUE)
#> [1] 0.6666667
weights1 <- runif(n = 6, min = 0, max = 1)
weighted.mean(as.numeric(sl1), weights1)
#> [1] 4.857558
weighted.mean(sl1, weights1)
#> [1] 4.857558
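Because the weights are drawn with runif(), the weighted means change on every run unless a seed is set. The base R sketch below fixes an arbitrary seed and shows how weighted.mean() treats NA values when na.rm = TRUE: it drops the missing observations together with their weights. It works on plain vectors, and the seed and object names are assumptions for illustration.

```r
set.seed(2024)                        # arbitrary seed, for reproducibility only
w <- runif(n = 6, min = 0, max = 1)

x    <- c(1, 1, 0, 8, 8, 8)           # raw codes, 8 treated as a real value
x_na <- c(1, 1, 0, NA, NA, NA)        # same data with the missing code recoded

# With the raw codes, every observation contributes.
wm_raw    <- weighted.mean(x, w)
wm_manual <- sum(x * w) / sum(w)

# With na.rm = TRUE, only the valid answers and their weights are used.
wm_valid        <- weighted.mean(x_na, w, na.rm = TRUE)
wm_valid_manual <- sum(x[1:3] * w[1:3]) / sum(w[1:3])

c(raw = wm_raw, valid = wm_valid)
```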
sum(as.numeric(sl1))
#> [1] 26
sum(sl1, na.rm = TRUE)
#> [1] 26
The result of the conversion to numeric can be used with other mathematical or statistical functions.
as_numeric(sl1)
#> [1] 1 1 0 NA NA NA
min(as_numeric(sl1))
#> [1] NA
min(as_numeric(sl1), na.rm = TRUE)
#> [1] 0