Working with Statistics Canada Data in R, Part 4: Canadian Census Data – cancensus Package Setup

Back to Working with Statistics Canada Data in R, Part 3.

What is cancensus

In the Introduction to the “Working with Statistics Canada Data in R” series, I discussed the three main types of data available from Statistics Canada. It is now time to move on to the second of those data types – the Canadian National Census data.

cancensus is a specialized R package that allows you to retrieve Statistics Canada census data and geography. It follows the tidy approach to data processing. The package’s authors are Jens von Bergmann, Dmitry Shkolnik, and Aaron Jacobs. I am not associated with the authors; I just use this great package in my work on a regular basis. The package’s code repository is hosted on GitHub. There is also a tutorial by the package’s authors, which I recommend taking a look at before proceeding.

Further in this series, I will provide an in-depth real-life example of working with Canadian Census data using the cancensus package. But first, let’s install cancensus:

install.packages("cancensus")
library(cancensus)

Setting up cancensus: Adding API Key and Cache Path to .Rprofile

cancensus relies on queries to the CensusMapper API, which requires a CensusMapper API key. The key can be obtained for free by following the package authors’ instructions. Once you have the key, you can set it for your current R session:

options(cancensus.api_key = "CensusMapper_your_api_key")
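To confirm that the key is visible in the current session, you can read the option back. This is a minimal base R sketch; the helper name has_api_key is my own and is not part of cancensus, and the key is the same placeholder string as above.

```r
# Set the key for the current session (placeholder value, replace with your own key)
options(cancensus.api_key = "CensusMapper_your_api_key")

# Hypothetical helper (not part of cancensus): TRUE if a non-empty key is set
has_api_key <- function() {
  key <- getOption("cancensus.api_key", default = "")
  is.character(key) && length(key) == 1 && nzchar(key)
}

has_api_key()
```

Note that this only checks that something is stored in the option; it does not verify the key with CensusMapper.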

Note that although the authors warn that API requests are limited in volume, I have significantly exceeded my API quota on some occasions and still had no issues retrieving data.

That said, depending on how much data you need, you can draw down your quota very quickly, and this is where the local cache comes to the rescue. cancensus caches data every time you retrieve it, but by default the cache does not persist between sessions. To make it persistent, and to avoid entering your API key every time you use cancensus, you can add both the API key and the cache path to your .Rprofile file.

If you’d like to learn in-depth what .Rprofile is, what it is for, and how to edit it, consider taking a look at sections 2.4.2 to 2.4.5 in “Efficient R Programming” by Colin Gillespie and Robin Lovelace. For a quick and simple edit, just keep reading.

Editing .Rprofile on Linux

First, find your R home directory with R.home(). Most likely, it will be in /usr/lib/R. If that is where your R home is, in the Linux Terminal (not in R), run:

sudo nano /usr/lib/R/library/base/R/.Rprofile # edit path if needed

On some systems, the file is named Rprofile, without the leading dot. If the above command opens an empty file instead of the existing profile, try removing the dot before ‘Rprofile’:

sudo nano /usr/lib/R/library/base/R/Rprofile
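If you are not sure which of the two file names your system uses, you can check from inside R before opening the editor. This is a small sketch that assumes the directory layout described above (library/base/R under the R home directory); the variable names are my own.

```r
# Build the two candidate paths mentioned above from the R home directory
base_r_dir <- file.path(R.home(), "library", "base", "R")
candidates <- file.path(base_r_dir, c(".Rprofile", "Rprofile"))

# Named logical vector: which of the candidate files exist on this system
setNames(file.exists(candidates), candidates)
```

Whichever path comes back as TRUE is the one to pass to nano.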

(!) Note that this will edit the system .Rprofile file, which will always run on startup and will apply to all your R projects. The file itself will warn you that “it is a bad idea to use this file as a template for personal startup files”. You can safely ignore this warning as long as the only edit you are making is the one shown below, i.e. adding cancensus cache path and API key.

In the “options” section, add these two lines:

options(cancensus.api_key = "CensusMapper_your_api_key")
options(cancensus.cache_path = "/home/your_username/path_to_your_R_directory/.cancensus_cache")

Then hit Ctrl+X, press Y for “yes”, and press Enter to save changes. When you first retrieve data with cancensus, R will create the .cancensus_cache directory for you.
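After saving the file, restart R and check that the cache path was picked up. The sketch below uses base R only; cache_status is a hypothetical helper of mine, not a cancensus function.

```r
# Hypothetical helper (not part of cancensus): describe the state of the cache path
cache_status <- function(path = getOption("cancensus.cache_path")) {
  if (is.null(path)) {
    "not set"               # the profile file was probably not read
  } else if (dir.exists(path)) {
    "exists"                # the cache directory has already been created
  } else {
    "set, not created yet"  # expected before the first data retrieval
  }
}

cache_status()
```

If the result is "not set", the most likely cause is that the lines were added to a file that R does not read on startup.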

Editing .Rprofile on Windows

Editing .Rprofile on Windows is a bit tricky. It is best not to touch the Windows system .Rprofile, or you risk weird errors and crashes (honestly, I was not able to figure out where they come from).

Instead, set up a project-specific .Rprofile. The downside is that you may need to set it up separately for every project in which you are going to use cancensus. The upside is that the contents of the .Rprofile file should be exactly the same every time. In R or RStudio (not in the command line), run:

file.edit(".Rprofile")

Then, add these two lines to the “options” section, and save the file:

options(cancensus.api_key = "CensusMapper_your_api_key")
options(cancensus.cache_path = "C:\\Users\\Home\\Documents\\R\\cancensus_cache")

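Note that inside an R string, every \ symbol in a Windows file path must be escaped with another \ symbol, as in the path above; a single \ will either cause an error or be silently misread. If you would rather avoid escaping, R on Windows also accepts forward slashes, e.g. "C:/Users/Home/Documents/R/cancensus_cache".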
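R on Windows also accepts forward slashes in file paths, which avoids escaping altogether. Here is a short sketch showing that the two spellings of the example path above refer to the same location; the variable names are my own.

```r
# The same Windows path written in two ways that R accepts
escaped <- "C:\\Users\\Home\\Documents\\R\\cancensus_cache"  # escaped backslashes
forward <- "C:/Users/Home/Documents/R/cancensus_cache"       # forward slashes

# Converting backslashes to forward slashes shows both strings name the same path
identical(gsub("\\", "/", escaped, fixed = TRUE), forward)
```

Either form can be used as the value of cancensus.cache_path.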

At this point you should be ready to start retrieving data with cancensus, which will be addressed in detail in my next post.

Working with Statistics Canada Data in R, Introduction

Forward to Working with Statistics Canada Data in R, Part 1.

This is the Introduction to the series on working with Statistics Canada data in the R language. The goal of the series is to provide some examples (accompanied by detailed explanations) of working with Statistics Canada data in R. Besides, I’d love to see more economists, policy analysts, and social scientists using R in their work, so I’ll be doing my best to make this easy for people without STEM degrees.

Data Types
The Tools You Need

Data Types

Statistics Canada data is routinely used for economic and policy analysis, as well as for social science research, journalism, and many other applications. It is expected that the reader has some basic R skills.

For the purposes of this series, let’s assume that there are three main types of StatCan data:

  • Statistics Canada Data, previously known as Canadian Socio-economic Information Management System (CANSIM),
  • National census data, and
  • Geographic data provided in a multitude of formats that can be used by GIS software: ArcGIS shapefiles (.shp), Geography Markup Language files (.gml), MapInfo files (.tab), etc.

The “Working with Statistics Canada Data in R” series will follow these data types, and will consist of this Introduction and the following parts: parts 1, 2, and 3 about working with CANSIM data; parts 4, 5, and 6 about Canadian Census data; and parts 7 and 8 about working with StatCan geospatial data.

This is not an official classification of data types available from Statistics Canada. The classification into CANSIM, census, and geographic data is for convenience only, and is loosely based on the key tools used for StatCan data retrieval and processing in R.

The Tools You Need

To be more specific, cansim is the package designed to retrieve CANSIM data, and cancensus is the package for retrieving census data. Further data processing will be done with the tidyverse meta-package (a collection of packages that is itself a package), which provides some of the most powerful data manipulation tools currently available. GIS data is a more complex matter, but at the very minimum you will need the sf, tmap, and units packages. Just like the R language itself, all of these are completely free and open source. I am not in any way associated with the authors of any of the above packages; I just use them a lot in my work.

Note that although CANSIM was recently renamed to Statistics Canada Data, I will be using the historical name CANSIM throughout this series in order to distinguish the data obtained from Statistics Canada Data proper from other kinds of StatCan data, i.e. census and geographic data (see how confusing this can get?).

Finally, here’s the code that installs the minimum suite of packages required to run the examples from this series. Note that you might be unable to install sf and units right now, since they depend on system libraries that are usually not available “out of the box”. More on sf and units installation in the upcoming “Working with Statistics Canada Geospatial Data” post.

install.packages(c("cansim", "cancensus", "tidyverse", "tmap"))
# install.packages(c("sf", "units"))
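If some of these packages are already installed, you may prefer to install only the missing ones. This is a minimal sketch in base R; missing_pkgs is a hypothetical helper name of mine, and the install call is left commented out so that nothing is downloaded until you decide to.

```r
# Packages used in this series (sf and units are left out, as above)
series_pkgs <- c("cansim", "cancensus", "tidyverse", "tmap")

# Hypothetical helper: return the packages that are not installed yet
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_pkgs(series_pkgs)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```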

Continue to Working with Statistics Canada Data in R, Part 1.