Volume 6, No. 2, Art. 36 – May 2005

Acquiring Qualitative Data for Secondary Analysis

Louise Corti & Gill Backhouse

Abstract: Qualidata was launched in 1994 as a proactive service for the location, documentation and preservation of qualitative social science research data. Over the course of its relatively short history, Qualidata has succeeded in gaining acceptance for the deposit and re-use of qualitative data material amongst the academic community. Having initially concentrated on acquiring important early social studies, and then worked with the UK's Economic and Social Research Council (ESRC) to operate a "datasets policy," ESDS Qualidata, now merged into the UK Data Archive, is in a good position to look back and review the progress made in acquiring data. This paper draws on ESDS Qualidata's pioneering experiences in acquiring and making available qualitative data. It reflects on strategies that have proved successful and comments on others that have been less productive, developing these into guidance for others wishing to embark on the acquisition and deposit of qualitative social science research data.

Key words: social science data archives, qualitative data archives, creating data, depositing data, acquisitions policy, data sharing policy, secondary analysis of qualitative data, consent, confidentiality

Table of Contents

1. Introduction

2. The Early Years of Qualidata—Starting to Acquire Qualitative Data

3. The Acquisition of ESRC Funded Data

4. Consolidation of Qualidata—Continuing the Acquisition Process

5. The Evolution and Comparison of Qualidata's Acquisition Policy

6. Criteria Used in Evaluating Qualitative Data for Archiving

7. Ethical Issues in Archiving Data

8. Summary of Main Points for a Successful Qualitative Data Acquisition Strategy







1. Introduction

The ESRC Data Archive was established in 1967 to preserve the most significant machine-readable data from research funded by the Economic and Social Research Council (ESRC). Until recently, most machine-readable data was statistical, based on surveys. Qualitative research was paper based and therefore only a small proportion of qualitative research data funded by the ESRC was archived. In 1991 the ESRC commissioned Paul THOMPSON (1998) to carry out a small pilot study to find out what was happening to qualitative data from projects that it had funded. The results of the survey showed that 90% of social science qualitative research material from projects funded by the ESRC was at risk or already lost. [1]

As a consequence of the survey report, the ESRC established the ESRC Qualitative Data Archival Resource Centre (Qualidata), with additional resources for accommodation and office expenses provided by the University of Essex. Paul THOMPSON was appointed as the Director, Louise CORTI (a scientist and social scientist by training) as Senior Administrator, and Janet FOSTER (an archivist) as Senior Research Officer. This team embarked on Qualidata's remit, which was twofold: first, to undertake a salvage operation to rescue the most significant material created by research from previous years; and secondly, to work with the ESRC and the ESRC Data Archive to ensure that, for current and future projects, the unnecessary waste of the past did not continue (THOMPSON 1998). [2]

It is important to explain here that Qualidata was not set up as an archive, but to act as a broker between researchers and existing archives. Qualidata's role was to locate and evaluate data, process and catalogue the data and arrange for their deposit in an appropriate archive. The existence of archived research materials available for re-use would then be made known to the research and teaching community. [3]

2. The Early Years of Qualidata—Starting to Acquire Qualitative Data

As Qualidata is not an archive, the Center's staff needed to survey existing repositories across the UK that would be suitable to hold qualitative research material. Not only did this survey identify potential archives to work with, it also revealed a significant amount of qualitative data that had already been archived. [4]

The main challenge, however, was to establish how much significant qualitative research material had survived from earlier research projects. Three different surveys were conducted by contacting researchers who had collected qualitative data, sometimes as far back as 1945. ESRC-funded projects that had generated qualitative data, including earlier SSRC projects, were surveyed, and produced records extending back to 1970. From these surveys, details of each individual research project were entered into a database, making a total of 2,565 records. [5]

There was an urgency concerning the acquisition of material from earlier social studies, as it was discovered that many important datasets were already lost. One such dataset was the research material on child-rearing which John and Elizabeth NEWSON had been collecting in Nottingham since the early 1960s, consisting of over 3,000 high quality in-depth interviews with parents and children. Only weeks before the inauguration of Qualidata, the Newsons decided that their lifetime's research collection should be destroyed (THOMPSON 1998; BACKHOUSE & THOMPSON 2000). The results of the early acquisition work can be seen in Table 1, with 4 datasets archived initially and 29 in the subsequent year.


[Table 1; cell values missing from the source text]

Columns (reporting periods): Mar 95-Sept 95; Oct 95-Sept 96; Oct 96-Sept 97; Oct 97-Sept 98; Oct 98-Sept 99; Oct 99-Sept 2000

Rows (measures): Number of datasets evaluated; Number of datasets archived; Number of datasets cataloged (note 1); % of staff effort devoted to "classic studies"

Table 1: Performance measurement statistics 1995 – 2000 (note 2) [6]

One of the most significant data collections archived in the first two years was the lifetime's research materials of Peter TOWNSEND. These include data from "The Family Life of Old People" (1957), "The Last Refuge" (1962), "Poverty in the UK" (1979), and an unused set of interviews carried out in the 1950s with residents of Katherine Buildings in East London, which had been a housing experiment in the 1880s. No archive was able to take large paper-based collections on social policy and social change and therefore Qualidata set up the National Social Policy and Social Change Archive (NSPSCA) at Essex in 1996, with funding from the Joseph Rowntree Foundation. Without this action, an extremely rich source of material on poverty and old age, drawn from large scale and in-depth studies, would have been lost to social science researchers and historians. [7]

Paul THOMPSON's interviews, which formed the basis for his book "The Edwardians" (1975), were archived at Essex and have been re-used by numerous researchers, leading to many publications on a wide range of topics (THOMPSON 2000). 450 in-depth interviews, from a quota sample of men and women across the UK born between 1870 and 1906, were collected by the research team for the project. This rich and unique dataset includes themes on gender issues, childhood, work and family life, and continues to provide scope for further exploration by researchers. [8]

Additional significant material that was archived includes Dennis MARSDEN's interviews from "Mothers Alone: Poverty and the Fatherless Family" (1969), and "Workless: some Unemployed Men and their Families" (1975). [9]

3. The Acquisition of ESRC Funded Data

The ESRC Datasets Policy was established in 1995 and reinforces the ESRC's stated position relating to the acquisition and use of datasets, the requirements of which are now a condition of ESRC research funding (ESRC Guide to Research Funding Section 17). The ESRC requires all award-holders to offer for deposit copies of both machine-readable quantitative data, and machine- and non-machine-readable qualitative data, within three months of the end of the award. This relates not only to datasets arising as a result of primary data collection, but also to derived datasets resulting from ESRC-funded work. [10]

The Datasets Policy requires that datasets must be deposited to a standard that would enable the data to be used by a third party, including the provision of adequate documentation. Depositors are advised to contact the two Resource Centers at the earliest opportunity should the nature of the data be such that it may be difficult to archive. The earlier in the research process these discussions occur, the more likely researchers are to create datasets which are well-documented, free of confidentiality or license constraints, and usable for secondary analysis. [11]

Certainly the ESRC is far ahead of other funders in the UK in terms of its support for archiving data: through its Datasets Policy, and by funding the UK Data Archive and Qualidata, it makes data available to other researchers for secondary analysis. However, further action is needed by the ESRC to enable this Policy to be fully implemented. CORTI and WRIGHT (2002) have called for "a set of changes to the operational procedures that would create a more robust, systematic and accountable policy." They make the following recommendations:

  • Qualidata and the UK Data Archive should have an input at the grant application selection stage.

  • The ESRC needs a fully co-ordinated strategy in-house, with dedicated staff to ensure the smooth running and auditing of the policy.

  • Data creators should be required to submit a formalized data management plan at the application or short-listing stage.

  • The ESRC needs to take a more stringent view towards the length of time allowed for data embargo. [12]

These changes would considerably advance the progress made in pursuing the aims of the Datasets Policy. [13]

4. Consolidation of Qualidata—Continuing the Acquisition Process

The period from October 1996 to 2000 was one of consolidation for Qualidata. With a comprehensive database of research projects established, and procedures in place for evaluating datasets and depositing them in appropriate archives, this period saw a broadening of the range of disciplines of the datasets acquired. The post of Researcher Support Officer was established in 1998 to provide advice and information to ESRC grant applicants and award holders on archiving data and to undertake outreach work with the ESRC Research Programmes, reflecting a proactive approach to the acquisition of data. [14]

Qualidata has contacted other major UK funders about their archival policies. The Joseph Rowntree Foundation has no formal datasets policy but does encourage both quantitative and qualitative data to be offered for archiving. The Nuffield Foundation has adopted an archival policy for social science datasets for its research. The Leverhulme Trust leaves the decision on archiving data with the researchers but does send information on awards to the Data Archive. Qualidata has been in discussion with other funders of data to explore the possibilities of working together on archival policies. [15]

As Table 1 shows, from 1996 to 1997 there was a perhaps surprising fall in the number of datasets archived. This can be explained partly by staff changes, partly by less time being spent on acquiring earlier studies, and partly by the fact that it was too early for the Datasets Policy to have had an effect. Table 2 gives figures for datasets identified for archiving, and for those that have been archived, with numbers for each discipline. The large number of datasets for sociology reflects the fact that social studies were surveyed in the pilot work and again in the researchers' surveys.


[Table 2; cell values missing from the source text]

Columns: Total no. collections identified for acquisition; Awaiting response from investigator; Data destroyed; Investigator not willing; Collections acquired

Rows (disciplines, as far as preserved): Area Studies / Environmental Planning; Economic and Social History; Management and Business Studies; Political Science and Industrial Relations; Social Psychology; Social Administration / Social Policy (note 3); Social Anthropology; Socio-legal Studies

Table 2: Researcher survey progress: datasets selected for acquisition and outcomes by discipline at October 2000 (note 5) [16]

The range of disciplines for which Qualidata evaluated and archived data widened from October 1998 as contact was made with all ESRC Research Programmes to provide advice and information on archiving data. What is clear from Table 2 is the very low response rate from researchers to requests for details on datasets for potential deposit. To achieve an archived dataset, not only does the researcher have to be in favor of the principle of archiving data, but the confidentiality and consent agreements with interviewees must also be satisfactory. In addition, researchers must be willing to prepare and, in some cases, resurrect their data for archiving. Archiving retrospectively is time consuming and often a low priority in an academic's workload. [17]

Significant datasets archived between 1996 and 2000 include Stan COHEN's "Folk Devils and Moral Panics: The Creation of the Mods and Rockers" (1971); "The Affluent Worker in the Class Structure" (1969), the study undertaken by John GOLDTHORPE, David LOCKWOOD, Frank BECHHOFER and Jennifer PLATT; and Tony COXON's SIGMA Sexual Diaries research (1982-1998). [18]

By September 2000, Qualidata had deposited 137 qualitative datasets and added details of a further 140 existing collections to its catalogue. During this period a large proportion of Qualidata's staff were engaged in data processing. The majority of acquisitions, until 2000, were paper-based, and some of the larger collections amounted to thousands of transcript pages. Paul THOMPSON's dataset for "The Edwardians" comprises 34,172 pages; the Affluent Worker collection contains 28,479 transcript pages. Most datasets are smaller in size, but all require cataloging to prescribed standards and careful listing and labelling to enable the user to identify and make full use of the material. Anonymization may also be required, involving specialist processing work which can be very time-consuming. Nor does the processing task for machine-readable data necessarily take less time than for paper transcripts. Processing machine-readable texts may vary according to the individual requirements of each dataset, including differing demands for anonymization (CORTI 2002; CORTI & THOMPSON 1999). [19]

5. The Evolution and Comparison of Qualidata's Acquisition Policy

A policy for the acquisition of qualitative data must, first, take account of the current thematic priorities of funders and researchers and cater for current user demand for the re-use of existing datasets, which may encompass wider themes. Secondly, an acquisition policy must adopt a long-term perspective on preserving important datasets that are not currently a preoccupation, but may be in the future. Potential acquisitions are thus always evaluated in an attempt to predict future research trends and the long-term needs of researchers. [20]

During the development of the service, and in response both to changing formats for data collection and storage and to researchers' demands for archived material for secondary analysis, Qualidata has formulated a number of criteria for evaluating qualitative data with potential for secondary analysis (THOMPSON 1998). Not all the criteria are essential for a dataset to be accepted for archiving, and some criteria are more important than others. For example, the life's work of a significant researcher, even though in a problematic format, may be deemed worthy of additional resources to convert the data into an acceptable format for preservation and re-use. [21]

6. Criteria Used in Evaluating Qualitative Data for Archiving

  • A high potential for re-analysis or comparative use

    In-depth interview materials are the best source of data for secondary analysis, providing the interviews have been skillfully conducted by an experienced interviewer and cover a wide range of topics.

  • Accompanying documentation is sufficient to enable informed re-use

    The type of documentation varies with each dataset and includes necessary contextual material which gives meaning to the data. It is essential that a dataset is well-documented.

  • High quality research influential in its field

    It is important to have good quality research data to withstand the scrutiny of secondary analysis and provide exemplar datasets for teaching purposes.

  • The life's work of a significant researcher

    For example, the research collection of Peter TOWNSEND which encompasses his life's work as a social researcher with research papers, correspondence, notes, memos and observations on the research process.

  • Complements existing archived datasets

    An archive may specialize in particular study topics, for example, health, politics or social policy, and may welcome datasets in these topics in order to enhance their collections.

  • Can be made freely available for research use

    A dataset that can be offered for re-use immediately after deposit is more acceptable than one with restrictions.

  • Copyright, data protection and confidentiality conditions are satisfactory

    This refers to the agreement made by the researcher with the interviewee, which must comply with copyright, data protection and confidentiality laws in each country. Copyright assignment must be clear, and confidentiality agreements must allow access for academic research.

  • In a suitable format for archiving and dissemination

    Each archive stipulates acceptable formats for data acquisition. The UK Data Archive accepts only machine-readable material and not audio-cassettes or video tapes.

  • The material is legible or audible and in good physical condition

    A self-evident practical consideration. It may be possible to scan and enhance hand-written notes but this can be expensive and resource intensive.

  • The dataset is complete

    It is recognized that an incomplete dataset can have potential for re-use. For example, it is not always possible to archive audiotapes with anonymized interview transcripts. However, having both tapes and transcripts provides a dataset of greater value for re-use. Further, if a number of interview transcripts are withheld from a dataset for archiving, their significance for the dataset needs to be established, as they may be crucial to the dataset as a whole. [22]

It is interesting to compare these criteria with Qualidata's current acquisition priorities. Qualidata's priorities in 2002 center on:

  • the relative importance or impact of the study e.g. research recognized to have had a major influence in its field and/or representing the working life of a significant researcher,

  • data being complementary to existing data holdings,

  • the popularity of the study topic (health, criminology, social policy),

  • data being based on national samples,

  • mixed methods data,

  • data that have analytic potential beyond the original investigation (CORTI & WRIGHT 2002). [23]

Certain criteria are deemed to be essential: that data are documented to a minimum standard, are in appropriate formats, are complete, and that confidentiality, data protection and copyright issues have been addressed. More emphasis is now placed on data in particular study topics, and on data collected using both qualitative and quantitative research methods. Experience of data archiving and re-use suggests that data based on a national sample, and datasets that were not fully explored in the original study, offer the greatest potential for secondary analysis. [24]
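
The distinction between essential criteria and acquisition priorities can be sketched as a simple triage routine. This is an illustration only, not a description of any Qualidata system: the class `CandidateDataset`, its field names and the function `triage` are hypothetical, and the criteria are paraphrased from the lists above.

```python
from dataclasses import dataclass


@dataclass
class CandidateDataset:
    """Hypothetical record for a dataset offered for deposit (illustrative only)."""
    title: str
    # Essential criteria named in the text
    documented_to_minimum_standard: bool
    in_appropriate_format: bool
    complete: bool
    rights_and_confidentiality_addressed: bool
    # Priority features named in the text (desirable rather than essential)
    influential_study: bool = False
    complements_existing_holdings: bool = False
    popular_topic: bool = False
    national_sample: bool = False
    mixed_methods: bool = False
    further_analytic_potential: bool = False


ESSENTIAL = (
    "documented_to_minimum_standard",
    "in_appropriate_format",
    "complete",
    "rights_and_confidentiality_addressed",
)

PRIORITIES = (
    "influential_study",
    "complements_existing_holdings",
    "popular_topic",
    "national_sample",
    "mixed_methods",
    "further_analytic_potential",
)


def triage(ds: CandidateDataset) -> tuple[list[str], list[str]]:
    """Return (unmet essential criteria, priority features present)."""
    unmet = [name for name in ESSENTIAL if not getattr(ds, name)]
    met = [name for name in PRIORITIES if getattr(ds, name)]
    return unmet, met


# Hypothetical example: a national, mixed-methods study in an unsuitable format.
example = CandidateDataset(
    title="Example study",
    documented_to_minimum_standard=True,
    in_appropriate_format=False,
    complete=True,
    rights_and_confidentiality_addressed=True,
    national_sample=True,
    mixed_methods=True,
)
unmet, met = triage(example)
print("Unmet essential criteria:", unmet)
print("Priority features present:", met)
```

The sketch reports unmet criteria rather than rejecting a dataset outright, since, as noted earlier, a significant collection in a problematic format may still be judged worth the resources needed to convert it.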

THOMPSON, reflecting on his own personal experience of re-using qualitative data, argues that: "the most valuable qualitative datasets for future re-analysis are likely to have three qualities: firstly, the interviewees have been chosen on a convincing sample base; secondly, the interviews are free-flowing but follow a life story form, rather than focusing narrowly on the researcher's immediate themes; thirdly, when practicable re-contact is not ruled out" (THOMPSON 2000, para 41). [25]

In comparison, the US national qualitative data archive, the Murray Research Center, based at Harvard University, has a number of different collection priorities:

  • data that have not been exhaustively analyzed,

  • data which are longitudinal in design,

  • the possibility of further follow-up of the sample,

  • large scale, national sample studies,

  • the historical value of the study,

  • studies that include a wide range of measures,

  • measures with well-documented reliability and validity (CORTI & WRIGHT 2002). [26]

Founded in 1976, the Murray Research Center holds both quantitative and qualitative data and emphasizes the in-depth multi-disciplinary study of individual lives. Of particular note in its collection priorities are the longitudinal design of the data and the possibility of further follow-up of the sample by new researchers. [27]

Recent consultancy work carried out by the UK Data Archive for the Medical Research Council (CORTI & WRIGHT 2002) reported on the features of data that have a high potential for longevity and secondary analysis. Longitudinal studies have obvious long term potential. They also have the advantage of addressing broader sets of questions and thereby providing excellent opportunities to ask new questions. For many investigators, studies with the greatest value had a long-term follow-up design. Studies should be of high quality and well documented. For secondary analysis value, a dataset requires focus and breadth of investigation, complexity and high data quality. [28]

There is considerable consensus among Qualidata, the Murray Research Center and the Medical Research Council on the most important features of a dataset for preservation and re-use. Together these provide valuable guidance for others newly embarking on the acquisition of qualitative data. There is agreement that the most desirable qualities in qualitative data acquired for secondary analysis are a longitudinal design with the possibility of follow-up of subjects, a national sample base, high quality and important data with a breadth of investigation, and data that have not been fully explored in the original study. [29]

7. Ethical Issues in Archiving Data

It is essential that the issues of confidentiality and consent are resolved prior to the acquisition of data (CORTI, DAY & BACKHOUSE 2000; ESRC 1999). These are often the major obstacles to archiving a dataset when they have not been clarified and no agreements have been made with interviewees. At the end of a research project, confidentiality can be preserved through anonymization, and consent can be obtained by the researcher re-contacting informants, but both measures are resource intensive. Researchers often do not fully appreciate the complexity of these issues, and many assume that confidentiality is an imperative without fully exploring the consequences with interviewees (GRINYER 2000). In a number of studies, especially in oral history, research participants have been very willing for their interview material to be archived without anonymization. Full consideration must be given to these issues by the researcher in consultation with the research participants. [30]

8. Summary of Main Points for a Successful Qualitative Data Acquisition Strategy

  • Establish the existence of qualitative research data through surveys of researchers working in, or retired from, relevant disciplines. It is important to concentrate first on capturing data from significant earlier studies. Personal contact is probably the most successful approach for acquiring these studies. Surveys, by post and electronic mail, can be conducted for the larger scale task of contacting researchers about recently collected data.

  • Formulate a collections policy to acquire qualitative research data with the highest potential for secondary analysis within the chosen remit of the archive. Establish criteria for evaluating qualitative datasets for acquisition.

  • Promote a positive attitude to preserving and sharing qualitative research data within the research community. Placing data archiving on the agenda for researchers enhances the prospect of acquiring data from current projects, and reminds researchers to prepare for archiving from the start of the research project. A researcher's positive attitude towards archiving has been one of the most important factors in acquiring qualitative data. Researchers with such an attitude have successfully addressed ethical issues concerning confidentiality and consent, whilst for others opposed to archiving, similar ethical problems have been a barrier to depositing data.

  • Address researchers' ethical concerns. Problems of confidentiality and consent, as expected, are the major obstacles to archiving qualitative data. These need to be brought to the forefront and explored with researchers, to determine the necessary actions to overcome these problems.

  • Establish an appropriate service and facilities for acquiring, processing, storing and accessing qualitative data.

  • Work closely with research funders to preserve important qualitative data resources with a high potential for secondary analysis. Not only does such a policy avoid the unnecessary waste of costly resources, it also adds value to funds already expended. [31]


Qualidata is the Qualitative Data Service of the UK Data Archive, which is jointly funded by the Economic and Social Research Council, the Joint Information Systems Committee and the University of Essex. Material for this paper has been drawn from Qualidata internal reports. For more information on ESDS Qualidata (this name has superseded "Qualidata"), see the website: http://www.esds.ac.uk/qualidata/.


1) Includes qualitative research material already archived in public repositories but not handled by Qualidata.

2) Taken from Report of the ESRC Qualitative Data Archival Resource Centre (Qualidata) October 1999 – October 2000. Figures are not presented for October 2000 onwards (the service's core ESRC funding was drastically reduced while the ESRC undertook a holistic review of their longer-term data archiving strategy).

3) Some studies in this discipline were primarily classified under Sociology.

4) Sociology includes the disciplines of cultural, gender and media studies.

5) From Annual Report to the ESRC of the ESRC Qualitative Data Archival Resource Centre (Qualidata) Oct 1999 – Oct 2000.


Louise CORTI

Present position: Associate Director of the UK Data Archive at the University of Essex, Economic and Social Data Service (ESDS), and Head of ESDS Qualidata (formerly the Qualitative Data Service), the Outreach & Training and Acquisitions Sections of the ESDS. Past position: Deputy Director Qualidata, Department of Sociology, University of Essex.

Major research areas: statistical literacy/using data in teaching; qualitative data archiving and secondary analysis of qualitative data; mixed methods data analysis.


Louise Corti

ESDS Qualidata
University of Essex
Colchester CO4 3SQ, UK

E-mail: corti@essex.ac.uk
URL: http://www.esds.ac.uk/



Gill BACKHOUSE

Present position: Senior Acquisitions and Advice Officer at the UK Data Archive, Economic and Social Data Service (ESDS), University of Essex.

Major research areas: qualitative data archiving; informed consent and confidentiality of qualitative data.


Gill Backhouse

UKDA Acquisitions
University of Essex
Colchester CO4 3SQ, UK

E-mail: acquisitions@esds.ac.uk
URL: http://www.esds.ac.uk/


Corti, Louise & Backhouse, Gill (2005). Acquiring Qualitative Data for Secondary Analysis [31 paragraphs]. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 6(2), Art. 36, http://nbn-resolving.de/urn:nbn:de:0114-fqs0502361.

Revised 3/2007

Copyright (c) 2005 Louise Corti, Gill Backhouse

