
Basic Data Overview

Emsi’s Canada Data Process Overview

Emsi’s Canadian dataset incorporates and harmonizes labour market data from SEPH, LFS, CBP, Census, and PSIS, providing it in a format that is easy to understand, easy to access, and easy to use. By combining these disparate datasets into one master set, the strengths of each can compensate for the weaknesses of the others. The data reaches back to 2001 and is projected forward to 2028.

The Canada Analyst tool is updated twice a year with new data from various sources, giving our users access to the most current information. Emsi data provides information for 304 detailed industries, classified using the NAICS 2017 system; 502 detailed occupations, classified using the NOC 2016 system; and 388 educational programs within the CIP classification system. All of these classifications are provided for 5,162 detailed geographical areas.

Current Data Sources

Canadian Business Patterns (CBP)

  • Establishment Counts by Industry, CSD
  • Location Counts by Industry, CSD


Census and National Household Survey (2001, 2006, 2011, 2016)

  • Workplace-based: Earnings by Class of Worker, Industry, CD
  • Workplace-based: Employment by Class of Worker, Industry, CSD
  • Workplace-based: Employment by Class of Worker, Industry, Occupation, Province


Survey of Employment, Payroll, and Hours (SEPH)

  • Annual Employment by Industry, Province/Territory
  • Annual Weekly Earnings by Industry, Province/Territory
  • Monthly Employment by Industry, Province/Territory
  • Monthly Weekly Earnings by Industry, Province/Territory


Labour Force Survey (LFS)

  • Annual Employment by Occupation, Class of Worker, Economic Region
  • Annual Employment by Industry, Economic Region
  • Annual Employment/Earnings (two-year rolling averages), Occupation, Employees, Economic Region



Canadian Occupation Projection System (COPS)

  • Industry Employment Projections, Canada
  • Occupation Employment Projections, Canada


Statistics Canada Demographic Tables (CANSIM)

  • Cansim 17-10-0084-01 Historic Age/Gender, CD
  • Cansim 17-10-0085-01 Historic Population Components, CD
  • Cansim 17-10-0057-01 Projected Age/Gender, Province/Territory
  • Cansim 13-10-0418-01 Fertility Rates
  • Cansim 13-10-0710-01 Death Rates 


Postsecondary Student Information System (PSIS)

  • Enrollments and Completions by Award Level, Program, Institution, CSD


Data Classification Systems

North American Industry Classification System (NAICS) 2017
Emsi currently uses NAICS 2017 because it matches the NAICS version used by SEPH.

National Occupation Classification (NOC) 2016
Emsi currently uses NOC 2016 because it is the version being adopted by the Census.

Emsi Industry Data

Industry Location Counts:
Industry location counts come directly from Canadian Business Patterns, without modification.

Industry Employee Counts:
There are multiple sources of employment data by industry available in Canada, but Emsi considers SEPH to be the best source of employee counts and employee earnings by industry. Therefore, although other sources are incorporated, SEPH is considered the primary source and other figures are adjusted to it. At its most detailed, SEPH provides 4-digit NAICS by Province/Territory. Because some values in the SEPH dataset are suppressed (undisclosed by the government to protect confidentiality), Emsi uses a proprietary process to unsuppress the data.

Supplementing SEPH:
SEPH does not cover all employees in Canada, nor does it provide data at the desired level of geographic detail. Data from the Census and CBP are combined with the SEPH data to fill in employees in agriculture, fishing and trapping, private household services, religious organizations, and military personnel of defence services. These datasets are also used to disaggregate SEPH data down to the census subdivision (CSD) level for all industries.
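The disaggregation step can be pictured as a proportional allocation. This is a minimal sketch under assumptions, not Emsi's method: the function name `disaggregate`, the CSD identifiers and the counts are hypothetical, and the even-split fallback is an invented detail. Each SEPH province × industry total is split across CSDs according to each CSD's share of that industry in Census or CBP data, so the CSD values always add back up to the SEPH figure.

```python
# Hypothetical sketch: split one SEPH province x industry total across CSDs
# in proportion to Census/CBP counts. The even-split fallback is an assumption.


def disaggregate(total, weights):
    """Allocate `total` across areas in proportion to `weights`.

    total:   control total for one province x industry cell (e.g. from SEPH)
    weights: {csd_id: Census or CBP count for the same industry}
    """
    if not weights:
        return {}
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        # No Census/CBP signal for this industry: spread evenly (assumption).
        even = total / len(weights)
        return {area: even for area in weights}
    return {area: total * w / weight_sum for area, w in weights.items()}


if __name__ == "__main__":
    census_counts = {"CSD_A": 300, "CSD_B": 100}  # made-up numbers
    print(disaggregate(2000, census_counts))  # {'CSD_A': 1500.0, 'CSD_B': 500.0}
```

Because the shares always sum to one, the CSD results stay consistent with SEPH, which is the property the article describes (SEPH as the primary source that other figures are adjusted to).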

Employee Earnings:
SEPH contains employee earnings for all industries by province and territory. Industry employee earnings are further regionalized to the CSD/CD level using Census data.
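The article does not say how Census data is used to regionalize earnings. One plausible approach, shown here purely as an assumption, keeps each area's Census earnings relative to the others and rescales them so that the employment-weighted average equals the provincial SEPH figure. The function name and all numbers are hypothetical.

```python
# Hypothetical sketch: regionalize a provincial SEPH average for one industry
# using Census earnings as relative weights. The rescaling rule is an assumption.


def regionalize_earnings(province_avg, census_avg, employment):
    """Return {area: estimated average earnings} for one industry.

    province_avg: SEPH average earnings for the province x industry cell
    census_avg:   {area: Census average earnings for the same industry}
    employment:   {area: employee count for the same industry}

    Areas keep their Census earnings relative to one another, and the result
    is rescaled so the employment-weighted mean equals the SEPH figure.
    """
    total_emp = sum(employment.values())
    census_mean = sum(census_avg[a] * employment[a] for a in employment) / total_emp
    return {a: province_avg * census_avg[a] / census_mean for a in employment}


if __name__ == "__main__":
    seph_weekly = 1000.0                        # made-up SEPH weekly earnings
    census = {"CSD_A": 60000, "CSD_B": 40000}   # made-up Census annual earnings
    jobs = {"CSD_A": 100, "CSD_B": 100}
    print(regionalize_earnings(seph_weekly, census, jobs))
    # {'CSD_A': 1200.0, 'CSD_B': 800.0}
```

Using Census only for relative differences means the different earnings concepts in the two sources (weekly SEPH earnings versus annual Census earnings) do not need to match.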

Employee Projections:
To create industry employment projections, Emsi fits three linear regressions to historic employee counts for each geography, using the most recent 3, 5, and 10 years of data respectively. The three regressions are averaged, and the results are damped to curb excessive growth and decline. All trends are then adjusted to the trends of higher geography levels (CSD adjusted to CD, CD to Province, Province to Nation). This trend is considered our base projection. After we create the base projection, we adjust the annual growth rate for each industry to the projections produced by COPS. This completes our industry employee count process, creating CSD-level data for 2001-2026.
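The base projection can be sketched in Python. This is an illustration under stated assumptions, not Emsi's implementation: the article does not say how damping works, so the sketch simply caps each year's change at a hypothetical `max_rate` of 3%, and the function names are made up. The later adjustments (to higher geographies and to COPS) are left out.

```python
# Hypothetical sketch of the base projection: three least-squares trend lines
# (last 3, 5 and 10 years), averaged, then damped. The damping rule used here
# (cap the year-over-year change at +/- max_rate) is an assumption.


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def base_projection(years, counts, horizon, windows=(3, 5, 10), max_rate=0.03):
    """Project `counts` (one value per year in `years`) `horizon` years ahead.

    Needs at least max(windows) years of history.
    """
    if len(years) < max(windows):
        raise ValueError("not enough history for the longest window")
    # One trend line per look-back window, fitted to the most recent years.
    fits = [linear_fit(years[-w:], counts[-w:]) for w in windows]
    previous = counts[-1]
    projected = []
    for step in range(1, horizon + 1):
        year = years[-1] + step
        # Average the three trend lines evaluated at this year.
        raw = sum(slope * year + intercept for slope, intercept in fits) / len(fits)
        # Damp: keep the change from the previous year within +/- max_rate.
        value = min(max(raw, previous * (1 - max_rate)), previous * (1 + max_rate))
        projected.append(value)
        previous = value
    return projected


if __name__ == "__main__":
    history_years = list(range(2010, 2020))
    history = [1000 + 10 * (y - 2010) for y in history_years]  # made-up, linear
    print(base_projection(history_years, history, horizon=3))
    # [1100.0, 1110.0, 1120.0]
```

On a perfectly linear history all three windows agree and the cap is never reached; on a volatile or fast-growing series the short and long windows diverge, and averaging plus damping pulls the projection toward a more moderate path.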

Industry Self-Employment Counts:
Data for the self-employed is less readily available than employee data. SEPH and CBP contain no data on self-employed persons, so Emsi gathers this data from the Labour Force Survey (LFS) and the Census. LFS is the benchmark dataset in this case, because the Census, by the nature of the questions it asks, undercounts the self-employed.* Emsi provides only worker counts for the self-employed; no earnings data is available.

Emsi projects self-employed counts the same way it projects employee data, except for the adjustments: the only adjustment made to self-employed data is to the projected growth rate of the economy as a whole. This completes Emsi’s self-employment data process, which provides worker counts (not earnings) at the CSD level for 2001-2026.
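The growth-rate adjustment for self-employed counts can be sketched as a proportional rescaling. This is a minimal sketch that assumes "adjusting to the overall projected growth rate" means scaling every projected series by one common factor so that their total grows at the economy-wide rate. The article does not specify the rule, and the names and figures are hypothetical.

```python
# Hypothetical sketch: rescale self-employment projections so their total
# grows at an economy-wide rate. Proportional scaling is an assumption.


def adjust_to_economy_growth(current, projected, economy_growth):
    """Scale projected counts so their total equals current total * (1 + growth).

    current:        {series: current self-employed count}
    projected:      {series: base projected count for the target year}
    economy_growth: overall projected growth for the period, e.g. 0.05 for 5%
    """
    target_total = sum(current.values()) * (1 + economy_growth)
    factor = target_total / sum(projected.values())
    return {k: v * factor for k, v in projected.items()}


if __name__ == "__main__":
    now = {"construction": 100, "retail": 100}   # made-up counts
    base = {"construction": 120, "retail": 100}
    adjusted = adjust_to_economy_growth(now, base, 0.05)
    print(round(sum(adjusted.values()), 6))  # 210.0
```

A single common factor preserves the relative differences between series from the base projection while forcing the aggregate to the economy-wide growth rate.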


Emsi Occupation Data

Occupation data is generally less reliable than industry data. Industry data can be tied to business registers and to the businesses themselves, which tend to classify their own industry accurately. Occupation data, by contrast, is usually collected from individuals and is more prone to error. For these reasons, Emsi considers industry data more reliable than occupation data and adjusts occupation data to it.

Geographic Occupation Counts:
Occupation data is a combination of two processes. The first is the establishment of fixed occupation counts at the higher geography levels. The second is the formation of staffing patterns for industries at these same geographic levels. These staffing patterns, in combination with the industrial mix at lower geography levels, then determine the occupational makeup of lower-level geographies (e.g. CSDs).

Emsi begins with 4-digit NOC Labour Force Survey employment and earnings figures at the Economic Region geographical level. This dataset contains undisclosed values (suppressions), which Emsi fills in using Census data as an initial estimate. The undisclosed values for earnings are filled in using a separate process that incorporates industry earnings and occupation earnings from a higher level of geography. This process yields a full-series 4-digit NOC breakout at the economic region level. These estimates are then disaggregated to the CSD level using Census, smoothed to account for volatility present in LFS, and adjusted to SEPH totals so that occupation job counts and earnings match industry job counts and earnings.
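One common way to use Census counts as the initial estimate for suppressed cells is to share the undisclosed remainder of a known total in proportion to Census. The sketch below assumes such a parent total is available (the article does not say so explicitly). It covers job counts only, not the separate earnings process, and the function name, occupation codes and numbers are hypothetical.

```python
# Hypothetical sketch: fill suppressed LFS cells with Census-based initial
# estimates, constrained so all cells add up to a known published total.


def fill_suppressed(total, published, census):
    """Return a copy of `published` with suppressed cells (None) filled in.

    total:     published total for the parent cell (assumed to be known)
    published: {occupation: value, or None if suppressed}
    census:    {occupation: Census count}, used as the initial estimate
    The undisclosed remainder is shared among suppressed cells in proportion
    to their Census counts.
    """
    disclosed = sum(v for v in published.values() if v is not None)
    remainder = max(total - disclosed, 0)
    suppressed = [k for k, v in published.items() if v is None]
    census_sum = sum(census.get(k, 0) for k in suppressed)
    filled = dict(published)
    for k in suppressed:
        if census_sum > 0:
            filled[k] = remainder * census.get(k, 0) / census_sum
        else:
            filled[k] = remainder / len(suppressed)  # no Census signal: even split
    return filled


if __name__ == "__main__":
    lfs = {"NOC_A": 600, "NOC_B": None, "NOC_C": None}  # made-up codes and values
    census = {"NOC_A": 500, "NOC_B": 150, "NOC_C": 50}
    print(fill_suppressed(1000, lfs, census))
    # {'NOC_A': 600, 'NOC_B': 300.0, 'NOC_C': 100.0}
```

Disclosed LFS values are left untouched; Census only decides how the missing remainder is divided among the suppressed cells.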

Occupation job counts are then projected using the same methodology described above for industry employees. After this base projection is created, its annual growth rate for each occupation is adjusted to the occupation projections produced by COPS. These projections are then adjusted so that the projected occupation totals match the projected industry employment totals. The result of these processes is Emsi occupation employment and earnings data by Economic Region.
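The article does not specify how growth rates are "adjusted to" COPS. As one possible reading, labelled here as an assumption, the sketch shifts every region's rate for an occupation by the same amount so that the employment-weighted national rate equals the COPS rate. Region codes, rates and the function name are hypothetical.

```python
# Hypothetical sketch: shift regional growth rates for one occupation so the
# employment-weighted national rate matches a COPS rate. The additive shift
# is an assumption; the article does not specify the adjustment rule.


def align_growth_to_cops(regional_rates, base_employment, cops_rate):
    """Return {region: adjusted annual growth rate}.

    regional_rates:  {region: base-projection growth rate, e.g. 0.02}
    base_employment: {region: employment in the base year}, used as weights
    cops_rate:       COPS projected national growth rate for the occupation
    """
    total = sum(base_employment.values())
    national = sum(regional_rates[r] * base_employment[r] for r in regional_rates) / total
    shift = cops_rate - national
    return {r: rate + shift for r, rate in regional_rates.items()}


if __name__ == "__main__":
    rates = {"ER_1": 0.02, "ER_2": 0.04}    # made-up base growth rates
    jobs = {"ER_1": 100, "ER_2": 300}
    adjusted = align_growth_to_cops(rates, jobs, 0.03)
    print({r: round(g, 4) for r, g in adjusted.items()})
    # {'ER_1': 0.015, 'ER_2': 0.035}
```

An additive shift keeps the gap between regions' growth rates intact. A multiplicative scaling would be an equally reasonable alternative.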

Occupation Staffing Patterns:
The second part of the occupation process creates staffing patterns for each economic region. After the staffing patterns are formed, CSD-level industry data is “staffed” into occupations at the CSD level using the staffing patterns created for the higher-level geography. Average hourly earnings at the Economic Region level by occupation are then applied to CSD-level data (earnings data by occupation is problematic below the Economic Region level). This forms Emsi’s occupation employee dataset at the CSD level.
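The staffing step can be sketched as follows. Economic Region industry × occupation counts are turned into occupation shares per industry, CSD industry jobs are multiplied by those shares, and ER-level hourly earnings are attached to each occupation. The function names, industries, occupations and wage figures are hypothetical illustrations, not Emsi's data structures.

```python
# Hypothetical sketch of "staffing" CSD industry jobs into occupations with
# Economic Region staffing patterns, then attaching ER-level hourly earnings.
from collections import defaultdict


def staffing_patterns(er_matrix):
    """Turn ER industry x occupation job counts into occupation shares per industry."""
    patterns = {}
    for industry, occupations in er_matrix.items():
        total = sum(occupations.values())
        if total == 0:
            continue  # no staffing information for this industry
        patterns[industry] = {occ: jobs / total for occ, jobs in occupations.items()}
    return patterns


def staff_csd(csd_industry_jobs, patterns, er_hourly_earnings):
    """Return ({occupation: jobs}, {occupation: hourly earnings}) for one CSD."""
    occ_jobs = defaultdict(float)
    for industry, jobs in csd_industry_jobs.items():
        for occ, share in patterns.get(industry, {}).items():
            occ_jobs[occ] += jobs * share
    # Earnings are not estimated below the ER level; the ER averages are reused.
    earnings = {occ: er_hourly_earnings[occ] for occ in occ_jobs}
    return dict(occ_jobs), earnings


if __name__ == "__main__":
    er = {  # made-up ER industry x occupation job counts
        "retail": {"sales": 80, "managers": 20},
        "construction": {"trades": 90, "managers": 10},
    }
    csd_jobs = {"retail": 50, "construction": 200}
    wages = {"sales": 18.0, "managers": 40.0, "trades": 32.0}  # made-up $/hour
    jobs, pay = staff_csd(csd_jobs, staffing_patterns(er), wages)
    print(jobs)  # about 40 sales, 30 managers, 180 trades
```

Because each industry's shares sum to one, the occupation total for a CSD equals its industry total, which keeps occupation counts consistent with industry counts.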

Occupation Self-Employment Process:
The self-employment occupation process follows the employee occupation process very closely, with a few minor alterations. First, self-employment occupation data margins are established at the Province level rather than at the Economic Region level, as the data is highly suppressed at the Economic Region level. Second, the self-employment occupation data is not adjusted to COPS occupation projections. Third, staffing patterns are created at the Province level rather than at the Economic Region level. Finally, earnings figures are unavailable for self-employed workers by occupation.


Demographics Data

Emsi provides population counts by age and gender at the Census Division level. All historic period data is published by Emsi as delivered by StatCan. For projected years, Emsi uses a traditional cohort model, which accounts for births, deaths, in-migration and out-migration at the Census Division level. The results of the cohort model are adjusted to provincial population projection estimates published by StatCan.
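A single year of a cohort model like the one described can be sketched as: survivors age up one year, births are computed from age-specific fertility rates, and net migration is added. This is a simplified illustration with assumed inputs: the function name, the open-ended top age group, the female share of births (about 100 of every 205 births) and the choice to ignore infant deaths are all assumptions, not Emsi's parameters. The final step the article mentions, adjusting to StatCan's provincial projections, is not shown.

```python
# Hypothetical one-year step of a cohort-component projection for one Census
# Division. Rates, the birth sex ratio and the open-ended top age group are
# simplifying assumptions, not Emsi's parameters.


def project_one_year(pop, survival, fertility, net_migration, female_birth_share=0.488):
    """Advance a population by one year.

    pop:           {"F": [...], "M": [...]} counts by single year of age
    survival:      same shape; probability of surviving from age a to a + 1
    fertility:     births per woman at each age (same length as pop["F"])
    net_migration: same shape as pop; in-migrants minus out-migrants by age
    female_birth_share: share of births that are female
    """
    births = sum(rate * women for rate, women in zip(fertility, pop["F"]))
    result = {}
    for sex, counts in pop.items():
        rates = survival[sex]
        aged = [0.0] * len(counts)
        for age in range(len(counts) - 1):
            aged[age + 1] += counts[age] * rates[age]  # survivors move up one age
        aged[-1] += counts[-1] * rates[-1]             # open-ended top group stays
        share = female_birth_share if sex == "F" else 1 - female_birth_share
        aged[0] = births * share                       # newborns (infant deaths ignored)
        result[sex] = [p + m for p, m in zip(aged, net_migration[sex])]
    return result


if __name__ == "__main__":
    pop = {"F": [100, 100, 50], "M": [100, 100, 50]}    # made-up, ages 0, 1, 2+
    survival = {"F": [1.0, 1.0, 1.0], "M": [1.0, 1.0, 1.0]}
    no_migration = {"F": [0, 0, 0], "M": [0, 0, 0]}
    print(project_one_year(pop, survival, [0.0, 0.5, 0.0], no_migration))
```

Running the step repeatedly, once per projected year, produces the projected age/gender profile for each Census Division.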


Education Data

Emsi provides completion counts by institution and educational program. Completions data is produced by unsuppressing the PSIS enrollment and graduate datasets separately and then combining them into one set.

*LFS estimates for self-employed workers in 2016 exceed Census counts of the same by 1.2 million.
