AmericasBarometer Labels Guide
lapop-r-labels.Rmd
Introduction
This concise guide demonstrates how R users can work with AmericasBarometer datasets to access variable metadata directly, eliminating the need for external codebooks. While these features are readily visible in Stata, they often require specific approaches in R. The guide outlines:
The multilingual structure and storage of AmericasBarometer data (including variable labels and value labels).
How to implement our custom LAPOP Lab functions for seamless dataset analysis.
Installing the LAPOP Lab package
The lapop package contains custom functions to analyze data and plot figures following the LAPOP Lab’s guidelines. If you do not have the LAPOP Lab package already installed on your machine, you can install it using the devtools package as shown below:
# Install the LAPOP Lab package:
devtools::install_github("https://github.com/lapop-central/lapop",
                         force = TRUE,
                         build_vignettes = TRUE)
After installation succeeds, load the package into your R session with library(lapop).
How does LAPOP Lab store variable metadata?
LAPOP Lab’s AmericasBarometer stores variable labels and value labels in parallel languages (i.e., Spanish/English/Portuguese) directly within the dataset structure. This allows researchers to access metadata without external codebooks by embedding language-specific labels as attributes in Stata-formatted files and by using standardized variable naming conventions that have remained consistent across survey waves since 2004. The metadata is therefore structured to facilitate longitudinal analysis. The project distributes Stata-formatted datasets (.dta) with embedded metadata, preserving:
Variable labels (i.e., question wording).
Value labels (i.e., response options).
Missing value codes (i.e., Not Applicable, Don’t Know, Refused).
A key advantage of this design is that the metadata is actively preserved: even when the data is distributed through third-party platforms like ICPSR or NORC, the Stata-formatted metadata remains intact, so R users can still access the labels via the readstata13 package.
How do I load AmericasBarometer data in R?
The preferred method for loading LAPOP Lab datasets is to use the read.dta13() function from the readstata13 package. Below, I highlight a key difference between this recommended approach and two alternative methods commonly used by researchers: haven::read_dta() and rio::import().
In the R programming language, attributes are additional metadata that can be attached to R objects. Attributes provide a way to store extra information about an object beyond its basic data structure. To access the attributes of an object, you can use the attr() or attributes() base functions.
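As a minimal sketch of how attributes behave, the example below uses a made-up vector rather than AmericasBarometer data. The attribute name qwording_en is only illustrative here.

```r
# A toy numeric vector standing in for a survey variable (values are made up).
ing4 <- c(7, 5, 1, 6)

# Attach a custom attribute; the name "qwording_en" is illustrative.
attr(ing4, "qwording_en") <- "Support for democracy"

# Read one attribute by name, or list the names of all attributes on the object.
attr(ing4, "qwording_en")
names(attributes(ing4))

# Data frames carry attributes too, including "names", "class", and "row.names".
df <- data.frame(ing4 = ing4)
names(attributes(df))
```

The same two functions are what the rest of this guide relies on: attr() for a single named attribute and attributes() for everything attached to an object.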
# READSTATA13 PACKAGE (RECOMMENDED FOR LABELS)
data1 <- readstata13::read.dta13("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data1))
# HAVEN PACKAGE (RECOMMENDED FOR PLOTTING)
data2 <- haven::read_dta("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data2))
# RIO PACKAGE
data3 <- rio::import("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data3))
The haven and rio packages do not recover all the metadata from .dta files. We therefore proceed with the dataset loaded with the readstata13 package.
AmericasBarometer Variable Labels (Question Wording)
AmericasBarometer variable labels are stored as notes in Stata .dta files. We can find them inside the “expansion.fields” attribute.
The varlabels object created below is a nested list (much like a JSON object) that contains multilingual notes with the question wording of each variable. For instance, the variable ing4 in the Brazil AB 2023 dataset has notes with the question wording and lead-in in English, Spanish, and Portuguese.
data(bra23)
# Extracting attributes from expansion.fields
names(attributes(bra23))
## [1] "row.names" "names" "datalabel" "time.stamp"
## [5] "formats" "types" "val.labels" "var.labels"
## [9] "version" "label.table" "expansion.fields" "byteorder"
## [13] "orig.dim" "data.label" "class"
varlabels <- attr(bra23, "expansion.fields")
# Printing notes for variable "ing4"
varlabels[grepl("ing4", varlabels) & grepl("note", varlabels)]
## [[1]]
## [1] "ing4" "note0" "7"
##
## [[2]]
## [1] "ing4"
## [2] "note7"
## [3] "Ahora, vamos a usar una escalera en donde el número 1 representa \"muy en desacuerdo\" y el número 7 representa \"muy de acuerdo\". Un número entre el 1 y el 7, representa un puntaje intermedio. Le voy a leer algunas frases. Por favor dígame hasta qué punto está de acuerdo o en desacuerdo con ellas."
##
## [[3]]
## [1] "ing4"
## [2] "note6"
## [3] "Now we will use a ladder where 1 means \"strongly disagree\" and 7 means \"strongly agree.\" A number in between 1 and 7 represents an intermediate score. Now I am going to read some statements. Please tell me to what extent you agree or disagree with them."
##
## [[4]]
## [1] "ing4"
## [2] "note5"
## [3] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"
##
## [[5]]
## [1] "ing4"
## [2] "note4"
## [3] "Agora, vamos usar um cartão em que o ponto 1 representa \"discorda muito\" e o ponto 7 representa \"concorda muito\". Um número entre 1 e 7, representa uma pontuação intermediária. Agora vou ler algumas frases. Por favor, me diga o quanto concorda ou discorda delas."
##
## [[6]]
## [1] "ing4"
## [2] "note3"
## [3] "A democracia tem seus problemas, mas é melhor que qualquer outra forma de governo. Até que ponto concorda ou discorda desta frase?"
##
## [[7]]
## [1] "ing4"
## [2] "note1"
## [3] "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?"
##
## [[8]]
## [1] "ing4"
## [2] "note2"
## [3] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"
How do I find the correct language labels?
Keep in mind that note numbers may change across country datasets and years, depending on the language in which the survey was fielded. For Brazil that language is Brazilian Portuguese, whereas most other countries are fielded in Spanish, and some in English, such as Jamaica.
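Because note numbers are not stable, one option is to look up the note ID by a fragment of wording you already know. The sketch below runs on a toy list that mimics the three-element structure of “expansion.fields”. The helper find_note_id() is hypothetical (it is not part of the lapop package), and the note numbers in the toy data are invented.

```r
# Toy stand-in for attr(data, "expansion.fields"): each entry is
# c(variable_name, note_id, note_value). Note numbers here are invented.
fields <- list(
  c("ing4", "note0", "2"),
  c("ing4", "note1", "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno."),
  c("ing4", "note2", "Democracy may have problems, but it is better than any other form of government."),
  c("q1tb", "note1", "Gender")
)

# Hypothetical helper: return the note IDs of `variable` whose text
# contains the literal string `pattern`.
find_note_id <- function(fields, variable, pattern) {
  keep <- vapply(
    fields,
    function(x) {
      length(x) >= 3 && x[1] == variable &&
        grepl("^note", x[2]) && grepl(pattern, x[3], fixed = TRUE)
    },
    logical(1)
  )
  vapply(fields[keep], function(x) x[2], character(1))
}

find_note_id(fields, "ing4", "Democracy may have")       # English wording
find_note_id(fields, "ing4", "Puede que la democracia")  # Spanish wording
```

With a real dataset, the first argument would be attr(bra23, "expansion.fields"), and the returned ID is the value to use wherever a note ID is required.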
A streamlined approach for users who want to quickly inspect the notes holding variable labels, without assigning them to the dataset, is to access those notes directly with a combination of the lapply() and Filter() functions from base R, or with the map_dfr() function from the purrr package.
Both approaches extract the second and third elements (x[2] and x[3], the note ID and the note text) from each entry in the “expansion.fields” attribute of the dataset, but only if the first element (x[1]) equals “ing4” (i.e., the variable of interest). The base R version returns a list of the matching entries, while the purrr version returns them as a data frame.
# Base R approach:
# The base R version uses lapply to iterate over the list and returns NULL when
# the condition isn't met, then Filter(Negate(is.null), ...) removes those NULLs.
head(
  Filter(Negate(is.null), lapply(attr(bra23, "expansion.fields"),
    function(x) {
      if (x[1] == "ing4") list(note_id = x[2], note_value = x[3])
    }))
)
## [[1]]
## [[1]]$note_id
## [1] "_lang_l_en"
##
## [[1]]$note_value
## [1] "ing4_en"
##
##
## [[2]]
## [[2]]$note_id
## [1] "_lang_v_en"
##
## [[2]]$note_value
## [1] "Support for democracy"
##
##
## [[3]]
## [[3]]$note_id
## [1] "note0"
##
## [[3]]$note_value
## [1] "7"
##
##
## [[4]]
## [[4]]$note_id
## [1] "note7"
##
## [[4]]$note_value
## [1] "Ahora, vamos a usar una escalera en donde el número 1 representa \"muy en desacuerdo\" y el número 7 representa \"muy de acuerdo\". Un número entre el 1 y el 7, representa un puntaje intermedio. Le voy a leer algunas frases. Por favor dígame hasta qué punto está de acuerdo o en desacuerdo con ellas."
##
##
## [[5]]
## [[5]]$note_id
## [1] "note6"
##
## [[5]]$note_value
## [1] "Now we will use a ladder where 1 means \"strongly disagree\" and 7 means \"strongly agree.\" A number in between 1 and 7 represents an intermediate score. Now I am going to read some statements. Please tell me to what extent you agree or disagree with them."
##
##
## [[6]]
## [[6]]$note_id
## [1] "note5"
##
## [[6]]$note_value
## [1] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"
# purrr package approach
# The purrr version uses map_dfr() to do the same; it row-binds the results
# into a data frame and drops the NULLs.
tail(
  purrr::map_dfr(attr(bra23, "expansion.fields"), ~ {
    if (.x[1] == "ing4")
      data.frame(
        note_id = .x[2],
        note_value = .x[3],
        stringsAsFactors = FALSE
      )
  })
)
## note_id
## 9 _lang_l_default
## 10 _lang_v_default
## 11 note1
## 12 note2
## 13 _lang_v_pt
## 14 _lang_l_pt
## note_value
## 9 labels2670
## 10 ING4. A democracia tem seus problemas, mas é melhor que qualquer outra forma de
## 11 Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?
## 12 Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?
## 13 Apoio à democracia
## 14 ing4_pt
To make researchers’ and analysts’ work easier, LAPOP Lab developed custom functions to deal with these labels in R. First, we will use lpr_extract_notes() to transform the nested list into a long data frame with three columns: variable_name, note_id, and note_value.
# Extract AB notes
notes <- lpr_extract_notes(varlabels)
head(notes, n = 10) # column names and first rows
## variable_name note_id note_value
## 1 strata _lang_l_en strata_en
## 2 strata _lang_v_en Survey strata
## 3 q1n _lang_l_en q1n_en
## 4 q1n _lang_v_en Gender determined by interviewer
## 5 q1tb _lang_l_en q1tb_en
## 6 q1tb _lang_v_en Gender
## 7 q1tcb _lang_l_en q1tcb_en
## 8 q1tcb _lang_v_en Gender
## 9 q1tca _lang_l_en q1tca_en
## 10 q1tca _lang_v_en Gender
table(notes$note_id) # note information available!
##
## _lang_l_default _lang_l_en _lang_l_pt _lang_v_default
## 190 1332 1332 202
## _lang_v_en _lang_v_pt destring destring_cmd
## 1321 1320 7 7
## note0 note1 note10 note11
## 861 846 7 6
## note12 note13 note2 note3
## 1 1 833 265
## note4 note5 note6 note7
## 189 172 134 91
## note8 note9 spss_variable_label
## 68 25 3
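To make the long format concrete, here is a base R sketch of the same kind of reshaping on a toy list. It is not the implementation of lpr_extract_notes(); it only assumes that each entry is a character vector of the form c(variable_name, note_id, note_value), as in the output shown earlier.

```r
# Toy stand-in for attr(data, "expansion.fields"); contents are illustrative.
fields <- list(
  c("strata", "_lang_v_en", "Survey strata"),
  c("ing4", "note0", "2"),
  c("ing4", "note2", "Democracy may have problems, but it is better than any other form of government."),
  c("ing4", "_lang_v_en", "Support for democracy")
)

# Keep only well-formed triplets, then build one row per note.
triplets <- Filter(function(x) length(x) == 3, fields)
notes_sketch <- data.frame(
  variable_name = vapply(triplets, `[`, character(1), 1),
  note_id       = vapply(triplets, `[`, character(1), 2),
  note_value    = vapply(triplets, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# Inspect the notes of one variable and count the note IDs available.
notes_sketch[notes_sketch$variable_name == "ing4", ]
table(notes_sketch$note_id)
```

A long data frame like this is convenient because ordinary subsetting and table() are enough to see which note IDs exist for which variables.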
After extracting the notes, you can assign the variable labels in your preferred language as attributes of the dataset using another LAPOP Lab function, lpr_set_attr(), and then access them during data analysis without needing an accompanying codebook.
# English
bra23 <- lpr_set_attr(bra23, notes,
                      noteid = "note2",
                      attribute_name = "qwording_en")
# Spanish
bra23 <- lpr_set_attr(bra23, notes,
                      noteid = "note1",
                      attribute_name = "qwording_es")
# Portuguese
bra23 <- lpr_set_attr(bra23, notes,
                      noteid = "note3",
                      attribute_name = "qwording_pt")
# Printing languages
attr(bra23$ing4, "qwording_en") # English
## [1] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"
attr(bra23$ing4, "qwording_es") # Spanish
## [1] "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?"
attr(bra23$ing4, "qwording_pt") # Portuguese
## [1] "A democracia tem seus problemas, mas é melhor que qualquer outra forma de governo. Até que ponto concorda ou discorda desta frase?"
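For readers curious about what this step amounts to, the sketch below attaches note text to matching columns of a toy data frame using base R only. The function set_attr_sketch() is a hypothetical stand-in written for this illustration; it mirrors the arguments used above but is not the lapop implementation, and the data are made up.

```r
# Toy dataset and toy long-format notes (variable_name, note_id, note_value).
dat <- data.frame(ing4 = c(7, 5, 1), q1tb = c(1, 2, 2))
notes_toy <- data.frame(
  variable_name = c("ing4", "ing4", "q1tb"),
  note_id       = c("note1", "note2", "note2"),
  note_value    = c(
    "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno.",
    "Democracy may have problems, but it is better than any other form of government.",
    "Gender"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every variable that has a note with the given ID,
# store the note text as a named attribute on that column.
set_attr_sketch <- function(data, notes, noteid, attribute_name) {
  picked <- notes[notes$note_id == noteid &
                    notes$variable_name %in% names(data), ]
  for (i in seq_len(nrow(picked))) {
    attr(data[[picked$variable_name[i]]], attribute_name) <- picked$note_value[i]
  }
  data
}

dat <- set_attr_sketch(dat, notes_toy, noteid = "note2",
                       attribute_name = "qwording_en")
attr(dat$ing4, "qwording_en")
attr(dat$q1tb, "qwording_en")
```

Storing the wording on the column itself means it travels with the variable when the column is selected on its own.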
AmericasBarometer Value Labels (Response Options)
Unlike the variable labels, AmericasBarometer’s value labels (i.e., response options) are stored inside the “label.table” attribute. To extract them, we will use another LAPOP Lab custom function, lpr_set_ros(), which sets the response options as an attribute in the chosen language. The default language is English and the default attribute name is roslabel, but the function allows custom attribute names, as shown below:
bra23 <- lpr_set_ros(bra23) # Default English
bra23 <- lpr_set_ros(bra23, lang_id = "es",
attribute_name = "respuestas") # Spanish
bra23 <- lpr_set_ros(bra23, lang_id = "pt",
attribute_name = "ROsLabels_pt") # Portuguese
# Printing ROs
attr(bra23$ing4, "roslabel") # English
## [1] "Response Options: (1) Strongly disagree (7) Strongly agree"
attr(bra23$ing4, "respuestas") # Spanish
## [1] "Opciones de Respuesta: (1) Muy en desacuerdo (7) Muy de acuerdo"
attr(bra23$ing4, "ROsLabels_pt") # Portuguese
## [1] "Alternativas de Resposta: (1) Discorda muito (7) Concorda muito"
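The sketch below shows, on toy data, how a set of value labels can be collapsed into a single response-options string like the ones printed above. It assumes that each element of “label.table” is a named vector whose names are the labels and whose values are the numeric codes. The helper format_ros() is hypothetical and is not the lapop implementation.

```r
# Toy stand-in for attr(data, "label.table"): names are labels, values are
# codes. The label-set names and contents are illustrative.
label_table <- list(
  ing4_en = c("Strongly disagree" = 1L, "Strongly agree" = 7L),
  ing4_es = c("Muy en desacuerdo" = 1L, "Muy de acuerdo" = 7L)
)

# Hypothetical helper: turn one label set into "(code) label" pairs.
format_ros <- function(labels, prefix = "Response Options:") {
  labels <- sort(labels)  # order by numeric code; names are kept
  paste(prefix, paste0("(", labels, ") ", names(labels), collapse = " "))
}

# Attach the formatted string to a toy variable as an attribute.
dat <- data.frame(ing4 = c(7, 5, 1))
attr(dat$ing4, "roslabel") <- format_ros(label_table[["ing4_en"]])
attr(dat$ing4, "respuestas") <- format_ros(label_table[["ing4_es"]],
                                           prefix = "Opciones de Respuesta:")

attr(dat$ing4, "roslabel")
attr(dat$ing4, "respuestas")
```

Keeping the codes next to the labels in one string makes the scale endpoints visible at a glance when printing a variable’s attributes.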
Takeaways (TLDR)
Data Structure
AmericasBarometer datasets are distributed in Stata .dta format with multilingual metadata (question wording and response options) embedded as attributes. These support cross-national and longitudinal comparability.
Preferred Loading Method
Use the readstata13::read.dta13() function to preserve the full metadata structure. Other methods such as haven::read_dta() or rio::import() often fail to import these critical attributes.
Variable Labels (Question Wording):
Stored in the “expansion.fields” attribute. Use the custom lpr_extract_notes() function to convert this into a tidy data frame.
Assign preferred language labels to each variable with lpr_set_attr() using the appropriate noteid.
Value Labels (Response Options):
Stored in the “label.table” attribute. Use lpr_set_ros() to assign these response labels as attributes in English, Spanish, or Portuguese.
Quick Access Options
For advanced users, the guide also demonstrates streamlined methods using lapply() and purrr::map_dfr() to quickly view variable labels without modifying the dataset.