AmericasBarometer Labels Guide • lapop

Introduction

This concise guide demonstrates how R users can effectively work with AmericasBarometer datasets to access variable metadata directly eliminating the need for external codebooks. While these features are more readily visible in Stata, they often require specific approaches in R. The guide outlines:

The multilingual structure and storage of AmericasBarometer data (including variable labels and value labels).
How to implement our custom LAPOP Lab functions for seamless dataset analysis.

Installing LAPOP Lab package

The lapop package contains custom functions to analyze data and plot figures following the LAPOP Lab’s guidelines.

If you do not have the LAPOP Lab package already installed in your machine you can install it using the devtools package below:

# Install the LAPOP Lab package:
devtools::install_github("https://github.com/lapop-central/lapop", 
                         force=TRUE, 
                         build_vignettes = TRUE)

After installation is successful, you will need to load the package into your library:

library(lapop)
# Note: no need for lapop_fonts() 
# because we are not plotting figures

How LAPOP Lab stores variables metadata

LAPOP Lab’s AmericasBarometer stores variable labels and value labels in parallel languages (i.e., Spanish/English/Portuguese) directly within the dataset structure. This allows researchers to access metadata without external codebooks by:

Embedding language-specific labels as attributes in Stata-formatted files and to use standardized variable naming conventions that maintain consistency across survey waves since 2004. Therefore, Metadata is structured to facilitate longitudinal analysis. The project employs Stata-formatted datasets (.dta) with embedded metadata, preserving:

Variable labels (i.e., question wording).
Value labels (i.e., responses options).
Missing value codes (i.e., Not Applicable, Don’t Know, Refused).

As a result, a key advantage is LAPOP Lab’s active metadata preservation even when distributing data through third-party platforms like ICPSR or NORC, the Stata-formatted metadata remains intact, allowing R/python users to still access labels via the readstata13 package.

How to load AmericasBarometer Data in R

The preferred method for loading LAPOP Lab datasets is to use the read.dta13() function from the readstata13 package. Below, I show a key difference between this recommended approach and two alternative methods commonly used by researchers: haven::read_dta() and rio::import().

In the R programming language, attributes are additional metadata that can be attached to R objects. Attributes provide a way to store extra information about an object beyond its basic data structure. To access the attributes of an object, you can use the attr() or attributes() base functions.

# READSTATA13 PACKAGE (RECOMMENDED FOR LABELS)
data1 <- read.dta13("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data1))

# HAVEN PACKAGE (RECOMMENDED FOR PLOTTING)
data2 <- read_dta("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data2))

# RIO PACKAGE
data3 <- import("./BRA 2023 LAPOP AmericasBarometer (v1.0s).dta")
names(attributes(data3))

The methods haven and rio packages do not recover all the metadata from dta files. Thereby, we proceed with the dataset loaded with the readstata13 package.

AmericasBarometer Variables Labels (Question Wording)

AmericasBarometer variable labels are stored as notes in Stata .dta files. We can find them inside the “expansion.fields” attribute.

The varlabels object created is a nested list (much like a JSON object) that will contain multilingual notes with the question wording of each variable. For instance, for the variable ing4 in the Brazil AB 2023 dataset will contain notes with the question wording and lead-in in English, Spanish, and Portuguese.

data(bra23)

# Extracting attributes from expasion.fields
names(attributes(bra23))

##  [1] "row.names"        "names"            "datalabel"        "time.stamp"      
##  [5] "formats"          "types"            "val.labels"       "var.labels"      
##  [9] "version"          "label.table"      "expansion.fields" "byteorder"       
## [13] "orig.dim"         "data.label"       "class"

varlabels = attr(bra23, "expansion.fields")

# Printing notes for varibale "ing4"
varlabels[grepl("ing4", varlabels) & grepl("note", varlabels)]

## [[1]]
## [1] "ing4"  "note0" "7"    
## 
## [[2]]
## [1] "ing4"                                                                                                                                                                                                                                                                                                        
## [2] "note7"                                                                                                                                                                                                                                                                                                       
## [3] "Ahora, vamos a usar una escalera en donde el número 1 representa \"muy en desacuerdo\" y el número 7 representa \"muy de acuerdo\". Un número entre el 1 y el 7, representa un puntaje intermedio. Le voy a leer algunas frases. Por favor dígame hasta qué punto está de acuerdo o en desacuerdo con ellas."
## 
## [[3]]
## [1] "ing4"                                                                                                                                                                                                                                                             
## [2] "note6"                                                                                                                                                                                                                                                            
## [3] "Now we will use a ladder where 1 means \"strongly disagree\" and 7 means \"strongly agree.\" A number in between 1 and 7 represents an intermediate score. Now I am going to read some statements. Please tell me to what extent you agree or disagree with them."
## 
## [[4]]
## [1] "ing4"                                                                                                                                         
## [2] "note5"                                                                                                                                        
## [3] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"
## 
## [[5]]
## [1] "ing4"                                                                                                                                                                                                                                                                      
## [2] "note4"                                                                                                                                                                                                                                                                     
## [3] "Agora, vamos usar um cartão em que o ponto 1 representa \"discorda muito\" e o ponto 7 representa \"concorda muito\". Um número entre 1 e 7, representa uma pontuação intermediária. Agora vou ler algumas frases. Por favor, me diga o quanto concorda ou discorda delas."
## 
## [[6]]
## [1] "ing4"                                                                                                                              
## [2] "note3"                                                                                                                             
## [3] "A democracia tem seus problemas, mas é melhor que qualquer outra forma de governo. Até que ponto concorda ou discorda desta frase?"
## 
## [[7]]
## [1] "ing4"                                                                                                                                                         
## [2] "note1"                                                                                                                                                        
## [3] "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?"
## 
## [[8]]
## [1] "ing4"                                                                                                                                          
## [2] "note2"                                                                                                                                         
## [3] "Democracy may have problems, but it is better than any other form of government.  To what extent do you agree or disagree with this statement?"

How do I find the correct language labels?

It is important to keep in mind that note numbers may change across countries datasets and years depending on the language the survey was fielded. For Brazil, it is Brazilian-Portuguese, whereas in most other countries it is in Spanish but also in English like in Jamaica.

A streamlined approach for users who want to quickly find the values of the notes on variable labels without assigning to the dataset would be assessing those notes directly with a combination of lapply() and Filter() functions from base R, or also compact() and map() functions from the purrr package.

Both approaches extract the third element (x[3]) from each list in the “expansion.fields” attribute of the dataset, but only if the first element (x[1]) equals “ing4” (i.e., the variable of interest). Both produce a semi-structured list of matching third nested elements.

# Base R approach:
# The base R version uses lapply to iterate over the list and returns NULL when 
# the condition isn't met, then Filter(Negate(is.null), ...) removes those NULLs.
head(
  Filter(Negate(is.null), lapply(attr(bra23, "expansion.fields"), 
  function(x) {
  if (x[1] == "ing4") list(note_id = x[2], note_value = x[3])
}))
)

## [[1]]
## [[1]]$note_id
## [1] "_lang_l_en"
## 
## [[1]]$note_value
## [1] "ing4_en"
## 
## 
## [[2]]
## [[2]]$note_id
## [1] "_lang_v_en"
## 
## [[2]]$note_value
## [1] "Support for democracy"
## 
## 
## [[3]]
## [[3]]$note_id
## [1] "note0"
## 
## [[3]]$note_value
## [1] "7"
## 
## 
## [[4]]
## [[4]]$note_id
## [1] "note7"
## 
## [[4]]$note_value
## [1] "Ahora, vamos a usar una escalera en donde el número 1 representa \"muy en desacuerdo\" y el número 7 representa \"muy de acuerdo\". Un número entre el 1 y el 7, representa un puntaje intermedio. Le voy a leer algunas frases. Por favor dígame hasta qué punto está de acuerdo o en desacuerdo con ellas."
## 
## 
## [[5]]
## [[5]]$note_id
## [1] "note6"
## 
## [[5]]$note_value
## [1] "Now we will use a ladder where 1 means \"strongly disagree\" and 7 means \"strongly agree.\" A number in between 1 and 7 represents an intermediate score. Now I am going to read some statements. Please tell me to what extent you agree or disagree with them."
## 
## 
## [[6]]
## [[6]]$note_id
## [1] "note5"
## 
## [[6]]$note_value
## [1] "Democracy may have problems, but it is better than any other form of government. To what extent do you agree or disagree with this statement?"

# purrr package approach
#  The purrr version uses map() to do the same, and also remove NULL results. 
tail(
  purrr::map_dfr(attr(bra23, "expansion.fields"), ~ {
  if (.x[1] == "ing4") 
    data.frame(
      note_id = .x[2],
      note_value = .x[3],
      stringsAsFactors = FALSE
    )
})
)

##            note_id
## 9  _lang_l_default
## 10 _lang_v_default
## 11           note1
## 12           note2
## 13      _lang_v_pt
## 14      _lang_l_pt
##                                                                                                                                                       note_value
## 9                                                                                                                                                     labels2670
## 10                                                                               ING4. A democracia tem seus problemas, mas é melhor que qualquer outra forma de
## 11 Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?
## 12                Democracy may have problems, but it is better than any other form of government.  To what extent do you agree or disagree with this statement?
## 13                                                                                                                                            Apoio à democracia
## 14                                                                                                                                                       ing4_pt

To ease the researchers and analysts job, LAPOP Lab developed custom functions to deal with those labels in R. First, we will use lpr_extract_notes to transform the nested lists into a long dataframe format that will contain 3 columns: variable_name, note_id, and note_value

# Extract AB notes
notes <- lpr_extract_notes(varlabels)
head(notes, n = 10) # columns names

##    variable_name    note_id                       note_value
## 1         strata _lang_l_en                        strata_en
## 2         strata _lang_v_en                    Survey strata
## 3            q1n _lang_l_en                           q1n_en
## 4            q1n _lang_v_en Gender determined by interviewer
## 5           q1tb _lang_l_en                          q1tb_en
## 6           q1tb _lang_v_en                           Gender
## 7          q1tcb _lang_l_en                         q1tcb_en
## 8          q1tcb _lang_v_en                           Gender
## 9          q1tca _lang_l_en                         q1tca_en
## 10         q1tca _lang_v_en                           Gender

table(notes$note_id) # note information available!

## 
##     _lang_l_default          _lang_l_en          _lang_l_pt     _lang_v_default 
##                 190                1332                1332                 202 
##          _lang_v_en          _lang_v_pt            destring        destring_cmd 
##                1321                1320                   7                   7 
##               note0               note1              note10              note11 
##                 861                 846                   7                   6 
##              note12              note13               note2               note3 
##                   1                   1                 833                 265 
##               note4               note5               note6               note7 
##                 189                 172                 134                  91 
##               note8               note9 spss_variable_label 
##                  68                  25                   3

After extracting the notes, one can assign variable labels to their preferred language as attributes to the dataset using another LAPOP Lab function called lpr_set_attr() and than access it during data analysis without the need of an accompanying codebook.

# English
bra23<-lpr_set_attr(bra23, notes, 
                    noteid = "note2", 
                    attribute_name = "qwording_en") 
# Spanish
bra23<-lpr_set_attr(bra23, notes, 
                    noteid = "note1", 
                    attribute_name = "qwording_es")

# Portuguese
bra23<-lpr_set_attr(bra23, notes, 
                    noteid = "note3", 
                    attribute_name = "qwording_pt") 

# Printing languages
attr(bra23$ing4, "qwording_en") # English

## [1] "Democracy may have problems, but it is better than any other form of government.  To what extent do you agree or disagree with this statement?"

attr(bra23$ing4, "qwording_es") # Spanish

## [1] "Puede que la democracia tenga problemas, pero es mejor que cualquier otra forma de gobierno. ¿Hasta qué punto está de acuerdo o en desacuerdo con esta frase?"

attr(bra23$ing4, "qwording_pt") # Portuguese

## [1] "A democracia tem seus problemas, mas é melhor que qualquer outra forma de governo. Até que ponto concorda ou discorda desta frase?"

AmericasBarometer Values Labels (Response Options)

AmericasBarometer’s values labels (i.e., response options) differently from the variable labels are stored inside the “label.table” attribute. To extract them we will use another LAPOP Lab custom function called lpr_format_labels that will set up the response options as an attribute across languages. Default language is English and default attribute name is roslabel but the function allows custom attribute names as shown below:

bra23 <- lpr_set_ros(bra23) # Default English

bra23 <- lpr_set_ros(bra23, lang_id = "es", 
                     attribute_name = "respuestas") # Spanish

bra23 <- lpr_set_ros(bra23, lang_id = "pt", 
                     attribute_name = "ROsLabels_pt") # Portuguese

# Printing ROs
attr(bra23$ing4, "roslabel") # English

## [1] "Response Options: (1) Strongly disagree (7) Strongly agree"

attr(bra23$ing4, "respuestas") # Spanish

## [1] "Opciones de Respuesta: (1) Muy en desacuerdo (7) Muy de acuerdo"

attr(bra23$ing4, "ROsLabels_pt") # Portuguese

## [1] "Alternativas de Resposta: (1) Discorda muito (7) Concorda muito"

Takeaways (TLDR)

Data Structure

AmericasBarometer datasets are distributed in Stata .dta format with multilingual metadata (question wording and response options) embedded as attributes. These support cross-national and longitudinal comparability.

Preferred Loading Method

Use the readstata13::read.dta13() function to preserve full metadata structure. Other methods such as haven::read_dta() or rio::import() often fail to import these critical attributes.

Variable Labels (Question Wording):

Stored in the “expansion.fields” attribute. Use the custom lpr_extract_notes() function to convert this into a tidy data frame.

Assign preferred language labels to each variable with lpr_set_attr() using the appropriate noteid.

Value Labels (Response Options):

Stored in the “label.table” attribute. Use lpr_set_ros() to assign these response labels as attributes in English, Spanish, or Portuguese.

Quick Access Options

For advanced users, the guide also demonstrates streamlined methods using lapply() and purrr::map() to quickly view variable labels without modifying the dataset.