Which of the following is an example of secondary data you might use while researching a report?

What does each and every research project need to get results? Data – or information – to help answer questions, understand a specific issue or test a hypothesis.

Researchers in the health and social sciences can obtain their data by getting it directly from the subjects they’re interested in. This data they collect is called primary data. Another type of data that may help researchers is the data that has already been gathered by someone else. This is called secondary data.

What are the advantages of using these two types of data? Which tends to take longer to process and which is more expensive? This column will help to explain the differences between primary and secondary data.

Primary data

An advantage of using primary data is that researchers are collecting information for the specific purposes of their study. In essence, the questions the researchers ask are tailored to elicit the data that will help them with their study. Researchers collect the data themselves, using surveys, interviews and direct observations.

In the field of workplace health research, for example, direct observations may involve a researcher watching people at work. The researcher could count and code the number of times she sees practices or behaviours relevant to her interest; e.g. instances of improper lifting posture or the number of hostile or disrespectful interactions workers engage in with clients and customers over a period of time.

To take another example, let’s say a research team wants to find out about workers’ experiences in return to work after a work-related injury. Part of the research may involve interviewing workers by telephone about how long they were off work and about their experiences with the return-to-work process. The workers’ answers–considered primary data–will provide the researchers with specific information about the return-to-work process; e.g. they may learn about the frequency of work accommodation offers, and the reasons some workers refused such offers.

Secondary data

There are several types of secondary data. They can include information from the national population census and other government information collected by Statistics Canada. One type of secondary data that’s used increasingly is administrative data. This term refers to data that is collected routinely as part of the day-to-day operations of an organization, institution or agency. There are any number of examples: motor vehicle registrations, hospital intake and discharge records, workers’ compensation claims records, and more.

Compared to primary data, secondary data tends to be readily available and inexpensive to obtain. In addition, administrative data tends to have large samples, because the data collection is comprehensive and routine. What’s more, administrative data (and many types of secondary data) are collected over a long period. That allows researchers to detect change over time.

Going back to the return-to-work study mentioned above, the researchers could also examine secondary data in addition to the information provided by their primary data (i.e. survey results). They could look at workers’ compensation lost-time claims data to determine the amount of time workers were receiving wage replacement benefits. With a combination of these two data sources, the researchers may be able to determine which factors predict a shorter work absence among injured workers. This information could then help improve return to work for other injured workers.
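The combination of the two data sources described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the worker IDs, the `accommodation_offered` field and the day counts are all hypothetical, and a real study would use proper record linkage and statistical modelling rather than a simple comparison of averages.

```python
from statistics import mean

# Hypothetical primary data: telephone survey answers keyed by worker ID.
survey = {
    "w1": {"accommodation_offered": True},
    "w2": {"accommodation_offered": False},
    "w3": {"accommodation_offered": True},
    "w4": {"accommodation_offered": False},
}

# Hypothetical secondary data: days of wage replacement taken from
# workers' compensation lost-time claims records.
claims = {"w1": 20, "w2": 45, "w3": 30, "w4": 55, "w5": 10}


def mean_days_by_accommodation(survey, claims):
    """Link the two sources on worker ID and average the days on benefits
    separately for workers who were and were not offered accommodation."""
    groups = {True: [], False: []}
    for worker_id, answers in survey.items():
        if worker_id in claims:  # keep only workers present in both sources
            groups[answers["accommodation_offered"]].append(claims[worker_id])
    return {offered: mean(days) for offered, days in groups.items() if days}


result = mean_days_by_accommodation(survey, claims)
print(result)
```

Worker `w5` appears only in the claims records, so the linkage step drops that worker. This is a reminder that the two sources rarely cover exactly the same people.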

The type of data researchers choose can depend on many things including the research question, their budget, their skills and available resources. Based on these and other factors, they may choose to use primary data, secondary data–or both.

Source: At Work, Issue 82, Fall 2015: Institute for Work & Health, Toronto [This column updates a previous column describing the same term, originally published in 2008.]

Introduction

What are secondary data? Secondary data refer to data that are collected by someone other than the user or are used for an additional purpose than the original one. A wide range of sources can be used as secondary data: censuses, information collected by government departments, organizational records and data that were originally collected for other research purposes [1-3]. Yee and Niemeier [4] discuss the benefits of longitudinal data as compared to repeated cross-sectional information.

Use of repeated cross-sectional or longitudinal secondary data to explore social and health issues makes it possible to provide comparative information about important environmental issues. For example, social or health-related information could be examined before, during and after the current COVID-19 pandemic to gain some understanding of the course and impact of the outbreak and to inform resource allocation. Using secondary analyses of survey data collected by the China CDC, Gao, et al. [5] were able to provide timely information on geographical differences and the duration of coronavirus in health care workers in China.

Secondary data can answer two types of questions: descriptive and analytical. Hence, the information can be used to describe events or trends, or it can be used to examine relationships among variables cross-sectionally or longitudinally. Numerous secondary databases exist and many are available online (e.g., The European Bioinformatics Institute database [6] provides a searchable database of biologic sources that can be linked to survey data). The Centre for Addiction and Mental Health (CAMH) conducts repeated cross-sectional surveys of adults in Ontario, Canada (the CAMH Monitor). The Monitor has been used both descriptively and analytically and has provided important information on a multitude of health behaviors and policies.

Examples

An analysis of CAMH Monitor data from 1996-2006 provided important descriptive information about quitting smoking among individuals who were categorized as regular or occasional smokers. We found that the prevalence of having quit smoking for at least one year increased over time. In addition, females were more likely to show this increase than males, and older individuals more likely than younger ones [7]. These results provide us with the backdrop for examining additional questions in future research about why people quit, what programs might help people quit, and whether those who do quit are using new products that have become available such as e-cigarettes, waterpipes, smokeless tobacco and bidis. In addition, future research could be undertaken to explore whether methods of quitting have changed over time. Either survey questions could be developed to examine these issues or qualitative interviews could be used to supplement the information from the survey.

CAMH Monitor data have also been used descriptively to analyze effects of new legislation or policies by examining trends before and after the introduction of the legislation or policy, such as the potential impact of legislation on motor vehicle collisions in Ontario among smokers and nonsmokers. Legislation was enacted in Ontario in 2006 to prohibit smoking in vehicles when children and adolescents were present. We found that before the law was enacted the rate of reported collisions was higher among smokers than nonsmokers. Following the enactment of the legislation the rate among smokers decreased and there was no statistically significant difference between smokers and nonsmokers [8]. What is not known is whether drivers were in fact smoking while driving, whether they were aware of the legislation, and whether their smoking-while-driving patterns changed because of the legislation. Another study that examined cross-sectional CAMH data over time to assess legislative effects found that texting and driving declined after the introduction of more severe penalties [9].
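A before-and-after comparison of this kind can be sketched as follows. The records below are invented for illustration and are not CAMH Monitor data; the field names (`smoker`, `period`, `collision`) are assumptions, and the sketch only computes descriptive rates without the significance testing a published analysis would require.

```python
def make_group(smoker, period, n, collisions):
    """Build n hypothetical respondents, the first `collisions` of whom
    reported a collision."""
    return [
        {"smoker": smoker, "period": period, "collision": i < collisions}
        for i in range(n)
    ]


# Invented repeated cross-sectional records: separate samples drawn
# before and after the legislation, not the same people followed over time.
records = (
    make_group(True, "before", 10, 3)
    + make_group(False, "before", 10, 1)
    + make_group(True, "after", 10, 1)
    + make_group(False, "after", 10, 1)
)


def collision_rate(records, smoker, period):
    """Share of respondents in a group and period who reported a collision."""
    group = [r for r in records if r["smoker"] == smoker and r["period"] == period]
    if not group:
        return None  # no respondents in this cell
    return sum(r["collision"] for r in group) / len(group)


rates = {
    (period, smoker): collision_rate(records, smoker, period)
    for period in ("before", "after")
    for smoker in (True, False)
}

for (period, smoker), rate in rates.items():
    label = "smokers" if smoker else "nonsmokers"
    print(f"{period:>6} {label:<10} {rate:.2f}")
```

Because each period is a separate sample, the comparison describes how group-level rates changed. It cannot show that any individual driver changed behaviour.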

Other examples of the use of CAMH Monitor data to evaluate policy interventions include Wickens, et al. [9], who assessed the impact of legislation to increase penalties for distracted driving on rates of texting and driving, and Mann, et al. [10], who evaluated the impact of legislation introducing administrative sanctions for impaired driving on rates of driving after drinking in the province. These secondary analyses can also be supplemented with qualitative interviews to provide some explanation and background for the original findings.

Other types of secondary databases are longitudinal, where large samples of individuals are followed over a number of years. For example, Wiesenthal and Vingilis [11] analyzed the Canadian National Population Health Survey (NPHS) descriptively and analytically to examine trends over time and relationships among variables. Specifically, they examined trajectories of distress in participants after they reported being injured in a motor vehicle collision. The NPHS, a Statistics Canada survey, is a repeated-measures longitudinal survey to monitor the health and wellbeing of 20,000 Canadians. Participants were interviewed biennially from 1994/95 to 2002/03 (5 waves of interviews over a 9-year span). Because of the longitudinal nature of the secondary database, hierarchical linear modelling was used to identify within-person trends: men experienced greater overall distress than women and a greater increase in distress over time. Moreover, the level of pre-injury distress predicted post-injury distress. This study revealed more complex and nuanced relationships among the variables that predict psychological distress after a motor vehicle injury.

This secondary database provided numerous benefits. First, motor vehicle injuries are rare events; however, a sample of 20,000 individuals interviewed over 9 years provided enough cases of motor vehicle injury to examine the effects of injuries on distress. Additionally, evidence was mixed on whether pre-morbid distress predicted post-injury distress, as all previous studies had only retrospective data on pre-injury distress levels. The use of a longitudinal secondary database provided information on distress levels before the injury occurred. The large sample size of injured individuals in this secondary database allowed for examination of mediators and moderators of the effects.

Finally, secondary data can be administrative data, that is, official records such as hospital or police records. For example, an evaluation of the impact of new stunt driving legislation, using stunt driving charges and collision casualty statistics, identified a decrease in charges and collision casualties among young males after the 2007 street racing legislation was introduced [12,13]. In addition, different types of secondary data can complement each other. Hospital and police records can identify cases where individuals were apprehended or injured severely enough to go to hospital, while self-report data identify cases that might be missed by these more official sources.

Discussion

Of course, there are some important factors that need to be considered in the use of secondary data.

Pros: First, there is much information available that has been collected in the past. This information can be used to make important contributions to knowledge, provide recommendations for policy, and provide the backdrop for future research.

Second, because the information is already available, subsequent research can be conducted in a timely manner, without the longer timelines for submitting proposals for funding and collecting original data. This is particularly salient because events such as the introduction of policies, or historical events such as the current COVID-19 pandemic, often happen before researchers have any opportunity to prepare to collect the information needed to evaluate their impact. Third, large sample sizes are often available with secondary datasets, which is particularly important when investigating rare events. Moreover, certain types of secondary data have added benefits. For example, longitudinal secondary datasets have increased statistical power and can estimate a greater range of conditional probabilities compared to repeated cross-sectional secondary datasets [4].

The use of secondary data also gives researchers who have conducted the original surveys additional information that they can use to justify continuation of their original research. For example, there is strong epidemiological evidence connecting cannabis use to collision risk [13-16] that has spurred and informed experimental simulation studies examining precisely how cannabis affects driving [18,19].

Cons: As noted, secondary data may not provide all of the information of interest. Questions may not be worded precisely enough to answer the specific questions of interest. Analyses become more complicated if the question wording or methods of administration vary. In these cases, it is particularly difficult to decide how information from a range of years can be considered together. It is also critical to understand how the information was originally collected. Response rates to surveys have decreased over time, calling into question how representative the responses might be, which must be considered in the interpretation of secondary analyses. However, many well-designed surveys include sampling weights to counter the biases that may occur from non-representative sampling. Longitudinal secondary datasets can suffer from attrition, although this is sometimes addressed by replacing lost respondents [4].
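To illustrate how sampling weights change an estimate, here is a minimal sketch of a weighted prevalence calculation. The responses and weights are invented, and the function name `weighted_prevalence` is hypothetical; real survey datasets document how their own weights should be applied, and variance estimation needs further design information.

```python
def weighted_prevalence(responses):
    """Estimate the share of 'yes' answers, counting each respondent
    in proportion to their sampling weight.

    `responses` is a list of (answered_yes, weight) pairs.
    """
    total_weight = sum(weight for _, weight in responses)
    if total_weight == 0:
        raise ValueError("weights must sum to a positive number")
    yes_weight = sum(weight for answered_yes, weight in responses if answered_yes)
    return yes_weight / total_weight


# Invented example: respondents who said "no" belong to groups that were
# under-sampled, so they carry larger weights.
responses = [(True, 1.0), (False, 3.0), (True, 2.0), (False, 2.0)]

unweighted = sum(answered_yes for answered_yes, _ in responses) / len(responses)
weighted = weighted_prevalence(responses)
print(unweighted, weighted)
```

In this invented sample the unweighted share is 0.5 while the weighted share is 0.375, which shows how ignoring the weights could overstate prevalence.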

Online surveys are limited to those with access to the technology, may target subgroups that are not the groups of interest in a secondary analysis, and are correlational, precluding cause-and-effect conclusions. Finally, ethics approval may be required if the information is being used for a purpose not originally proposed.

Conclusion

It is important to make note of the limitations when presenting the information from secondary data and what the potential impact on the interpretation of the results can be. Nevertheless, secondary analysis can make important contributions to knowledge as well as provide directions for future research and programs. Tripathy (2013) [20] notes that while secondary data analysis can make important contributions to knowledge, it is important to follow specific guidelines in the use of such information, one of the most important being anonymization of the information.

What are examples of secondary data?

Popular examples of secondary data include tax records and social security data, census data (the U.S. Census Bureau is oft-referenced, as well as our favorite, the U.S. Bureau of Labor Statistics), and electoral statistics.

What are 4 examples of secondary data sources?

Sources of secondary data include books, personal sources, journals, newspapers, websites, government records, etc. Secondary data are generally more readily available than primary data.

What is secondary data as used in research?

Secondary data is research data that has previously been gathered and can be accessed by researchers. The term contrasts with primary data, which is data collected directly from its source.

What are the 3 main sources of secondary data collection?

Sources of secondary data include censuses and government departments (such as housing, social security, electoral statistics and tax records), internet searches and libraries, and GPS and remote sensing.