Archiving Longitudinal Data for Future Research: Why Qualitative Data Add to a Study's Usefulness1)

Jacquelyn B. James & Annemette Sørensen

Abstract: In this paper we discuss the special challenges that data archives face when archiving longitudinal studies with a large qualitative component and preparing them for new research. We discuss issues of confidentiality, how best to organize longitudinal data for future use, including ways in which to permit future follow-ups without compromising confidentiality, and ways to teach investigators how to plan for the archiving of their longitudinal research. The core of the paper, however, is an examination of the strengths that qualitative data lend to longitudinal studies for future researchers. Our main argument is that qualitative data, to a much greater extent than quantitative data, permit new investigators to look at the data in new ways. We present examples of this based on re-analyses of data archived at the Murray Research Center.

Key words: archival data, longitudinal data, qualitative data, secondary analysis

Table of Contents

1. Introduction

2. The Murray Research Center: A Center for the Study of Lives

3. The Protection of Confidentiality

4. Agreements with Contributors

5. The Selection Process

6. Important Uses of Qualitative Data from the Murray Center's Holdings

7. Conclusion







1. Introduction

Founded in 1976, the Murray Research Center: A Center for the Study of Lives is a national repository for social and behavioral science data on human development and social change, with special emphasis on the lives of American women. Data housed at the Murray Center are made available to qualified scholars and researchers for secondary analysis, replication, and sometimes follow-up studies. Presently, our archive includes more than 270 data sets with a wide range of topics, samples, and designs. Many of these studies include in-depth interviews or, at the very least, some open-ended survey questions. We make it a priority to acquire data for our collection that have not been exhaustively analyzed, that contain qualitative or interview data, or that are longitudinal in design. [1]

In our view, the sharing of qualitative data is more unusual and more complicated than the sharing of quantitative data. Therefore, the purposes of this paper are: (1) to document the ways that the center selects and archives qualitative data sets, makes different agreements with data contributors as to their use, and protects the confidentiality of respondents; and (2) to provide examples of creative uses of such data. In the end, we hope to bring into bold relief the usefulness of qualitative data for restructuring old data for new questions. Using examples from several studies conducted using data from the Murray Center archive, we will argue that qualitative data permit new investigators to look at the data in innovative ways to a much greater extent than do quantitative data. [2]

2. The Murray Research Center: A Center for the Study of Lives

The issue of underuse (or waste) of data has been a concern of funders and research administrators for some time and has recently received renewed attention (McARDLE, 2000; SCHAIE, 2000; JAMES & ZARRETT, 2000; FERRARO & McNALLY, 2000). As KOZLOWSKI (1993) [cited in COLBY, JAMES & HART, 1998, p.ix] has observed, the "further one gets from a project, the greater is the chance of loss. Those details which once seemed too obvious to note become fragmented or lost." For this reason, and because grants tend not to fully cover thorough and complete analyses of complex data sets, projects are often left with a great deal of valuable data unanalyzed. A central purpose of the Murray Center is to turn data that might otherwise be wasted into a rich and accessible resource for new research. [3]

In order to be an effective resource for new research and contribute to minimizing the waste of data, it is important not only to preserve and document data, but also to let the research community know about the availability of the data and to provide some training in how to use them. The latter is especially important, because methods for secondary analysis, especially secondary analysis of qualitative data, are unfamiliar to many researchers and not taught in many graduate study programs (COLBY, JAMES & HART, 1998). Thus, the Murray Center serves as both a repository and a research center that provides opportunities for training in the use of existing data. [4]

At least in the United States, the Murray Center's archive is unique in several ways. First, the Murray Center is the only archive that preserves the original subject records as well as coded, machine-readable data. We have also made a few carefully selected sets of videotaped data available for reuse. The availability of these raw data, such as transcripts of in-depth interviews, behavioral observations, and responses to projective tests, is especially valuable for secondary analysis, allowing the application of different perspectives and new scoring procedures to the original data. As we will show, such data make possible the radical restructuring of the subject records and mitigate the degree to which one is locked into the theoretical assumptions under which the data were collected. As far as we know, the Murray Research Center is the only repository in the U.S. that is designed to offer a wide range of data sets with original, qualitative records, many of which are longitudinal. [5]

The Center is also unique in that samples from many of the studies it holds are available for further follow-up by a new investigator. The potential to conduct a follow-up study is very valuable in that it allows a new investigator to design the outcome measures used. Moreover, by allowing follow-up studies to be conducted, a study that was not longitudinal can become so. In general, the availability of multiple longitudinal data sets within the center's holdings makes it possible to add a new cohort to a single-cohort longitudinal study or to integrate two data sets into a single multi-cohort study (COLBY & PHELPS, 1990). [6]

The Murray Center currently holds over 270 data sets, approximately 75 of which are longitudinal. The center adds new studies to the archive each year and has, over the years, received funding for special acquisition initiatives. One such initiative, funded by the U.S. National Institute of Mental Health and the John D. and Catherine T. MacArthur Foundation, was to develop a major collection of longitudinal studies of mental health including, for example, BAUMRIND's Family Socialization and Developmental Competence Project, BRUNSWICK's Harlem Longitudinal Study, GLUECK and GLUECK's Crime Causation Study, the Institute of Human Development's Intergenerational Studies, TERMAN's Life Cycle Study of Children with High Ability, and VAILLANT's Study of Adult Development (The Grant Study). With additional funding from NIMH along with a grant from the National Science Foundation, Murray Center staff established another major archive, this one designed to enhance the racial and ethnic diversity of the archive. A few examples of these data sets include: BRUNSWICK's Harlem Longitudinal Study; ECCLES' Prince George's County Study of Adolescent Development in Multiple Contexts; and SUAREZ-OROZCO and SUAREZ-OROZCO's Immigration, Family Life and Achievement Motivation Among Latino Adolescents. Ultimately, all special collection initiatives are folded into the archive. [7]

Because the Murray Center makes raw paper materials and videotaped data from selected studies available for new research, we must pay particular attention to the protection of study participants. To handle this, we make many agreements with our contributors with respect to subject confidentiality, ways the data may be used and whether follow-up is possible. [8]

3. The Protection of Confidentiality

Maintaining confidentiality of records is necessary in order to protect the privacy of research participants and is a high priority at the Murray Center. Data are stored in a climate-controlled, fireproof, locked vault with access given only to authorized staff members. In addition, at least up to now, almost all data sets are deidentified. Through this process, names and other information that uniquely identify subjects are censored. Currently, we are rethinking this policy as we acquire more and larger data sets and as some contributors feel that such measures damage the data. If deidentification is not possible or would severely limit the usefulness of the data, we can protect confidentiality by limiting access to the data by users. One way of doing this is to impose a more stringent screening of applicants than is normally required. For example, sometimes the approval of all members of a screening committee, appointed by the director and the contributor, is necessary to gain access to a data set. Decisions as to when a screening committee is called for are made by the Acquisitions Committee and the contributor. [9]
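
As an illustration only, and not a description of the center's actual procedures, the basic idea of deidentification combined with a separately stored names-to-numbers list can be sketched in a few lines of Python. The function name, the field names ("name", "transcript"), and the "[NAME]" placeholder are all hypothetical. The sketch censors only each participant's own name; real deidentification must also deal with other identifying details such as places, employers, and relatives.

```python
import re


def deidentify(records, name_field="name", text_field="transcript"):
    """Replace each participant's name with a study number.

    Returns (cleaned_records, names_to_numbers). The second value is the
    key that would be stored apart from the data, under restricted access,
    so that an authorized follow-up study remains possible.
    All field names here are illustrative assumptions.
    """
    names_to_numbers = {}
    cleaned = []
    for record in records:
        name = record[name_field]
        # Assign the next sequential number the first time a name is seen;
        # later records for the same person reuse the same number.
        number = names_to_numbers.setdefault(name, len(names_to_numbers) + 1)
        # Censor the participant's own name inside the open-ended text.
        text = re.sub(re.escape(name), "[NAME]", record[text_field],
                      flags=re.IGNORECASE)
        # The cleaned record carries only the number, never the name.
        cleaned.append({"id": number, text_field: text})
    return cleaned, names_to_numbers
```

Keeping the key outside the data set is what allows both goals at once: secondary analysts see only numbered records, while the list itself can be released, or withheld, according to the contributor's wishes.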

4. Agreements with Contributors

Contributors are guaranteed access to their own data sets and the data sets of others as long as no restrictions are violated. All agreements with contributors are made within the guidelines of the Memorandum of Agreement (MOA), which can be seen in Appendix A. In the MOA, the contributor may specify what kinds of information should be removed for the protection of confidentiality. S/he may specify certain variables or sections of the materials that are still being analyzed by current research teams as off-limits to new research for a specified time period. S/he may or may not provide a names-to-numbers list so that follow-up studies can be conducted; s/he may also say that follow-up is possible only in collaboration with the contributor. S/he may or may not agree to allow the publication of quotations taken directly from the data. Contributors may limit the use of the data in other ways as well. [10]

The center agrees to list the data set in the Guide to the Data Resources of the Murray Research Center, to require approval by the director of the center before applications to use the data are granted, and to honor all restrictions specified in the MOA. The center also makes an effort to require users of the data to give credit to the contributor of the data through citation in publications using the data and to furnish copies of any published reports based directly or indirectly on the use of the material to the center. [11]

5. The Selection Process

In determining whether or not to acquire a new data set for the archive, several kinds of criteria are used. The criteria can be roughly grouped into five general categories: content of the study, methodology (instrumentation and design), previous analysis and publication, historical value, and cost of acquiring and processing the data. Each of these is described below with an indication of the relative importance of each criterion, where possible. [12]

Content: The information in a data set must be relevant to the study of American women and/or basic issues in human development. Topic areas include: women and work; education; the family; psychological development and psychological processes; stress and coping; mental health; physical health, reproduction, and sexuality; political participation; and social policy. In addition to these ongoing topics of importance to the archive, smaller initiatives develop around a special topic as a focus for new acquisitions. The topics are chosen to reflect issues of current and future research importance. For example, in the recent past, studies of divorce and remarriage were identified as being of particular interest. When a target area is chosen, staff members make a special effort to locate relevant studies by reading journals, attending conferences, and contacting researchers who have worked in the area. Finally, as we mentioned, during the last decade or so, we have secured grants for large topic-oriented acquisition initiatives. [13]

Instrumentation: The Murray Center is unusual in that it acquires and makes available original subject records as well as coded computer data. The availability of extensive open-ended material offers the possibility of recasting the material in new terms, recoding for new variables, and addressing questions that are very different from those of the original investigators. Since the acquisition of qualitative material is a unique and valuable feature of the Murray Center's archive, emphasis is placed on acquiring studies that include such material. Studies that consist solely of coded computer data are de-emphasized since such data sets are readily available through other archives. Videotapes, which also allow for great flexibility in recoding, are also desirable acquisitions, assuming that funding for processing is available and that issues of informed consent and confidentiality can be resolved satisfactorily. [14]

Because they offer greater opportunities for secondary analysis, studies that include a wide range of measures are more valuable than those with fewer measures. Measures with well-documented reliability and validity are preferred. [15]

Design: Because of their special value for secondary analysis, data generated using certain kinds of research designs are preferred. Longitudinal studies have the highest priority. The possibility of further longitudinal follow-up by new investigators increases the value of a study for acquisition if a follow-up appears to be feasible and appropriate. High attrition rates seriously affect the quality of longitudinal research and are evaluated carefully when longitudinal studies are considered. Likewise, the extent of missing data is a factor in evaluating a study for acquisition. In addition to longitudinal designs, replications, surveys, and case studies may provide useful opportunities for secondary analysis. If the topic is one for which replication would be valuable, particularly for examining social change, a data set may be acquired with the objective of providing the base-line for a replication study. Similarly, studies with one or two high-quality replications already completed are valuable for archiving. [16]

Finally, large surveys with national samples have a high priority if they are of particular content relevance, of very high quality, and not available elsewhere. Smaller-sample surveys are sometimes acceptable as well, particularly if they include responses to open-ended as well as closed-ended questionnaire or interview items. Cross-sectional studies that sample systematically across a wide age range are of moderate priority. Intervention and experimental studies will rarely be acquired because they usually are not well suited to secondary analysis. Experimental studies in particular are often designed to test very specific hypotheses, leaving manipulated samples and little room for reanalysis. [17]

Sample: At the present time, it is not feasible for the Murray Center to maintain a broad international focus. Most of the data sets acquired for the archive will include only North American samples. Occasionally, exceptions are made in cases where the non-U.S. data directly parallel or complement specific holdings on U.S. samples. [18]

The sample must be representative of the group to which the investigator wishes to generalize the findings and must be large enough in relation to the design and variables to make meaningful analysis possible. [19]

For survey data, a high response rate is an important determinant of quality, although some assessment of nonresponse bias may offset difficulties with a low response rate. Studies of racial and ethnic minorities are currently a high priority, and studies of Radcliffe students or alumnae are given special consideration. [20]

Previous Analyses and Publication: As an indication of the quality of the research, the distinction of the study and the investigator should be exhibited in a productive publishing record for the study. This may be waived in cases where the lack of publications can be attributed to factors other than low quality of data. In any case, the data must not be so exhaustively analyzed as to preclude extensive further analysis. [21]

Historical Value of the Data: All studies will be scrutinized from a long-range perspective in an attempt to predict whether or not the data will remain valuable in the future. Classic or historically important data sets may occasionally be acquired even if they fall short in terms of sample, measures, or other criteria. Data collected at earlier historical periods may be useful to historians looking at social change or at the history of social science. [22]

Cost of Archiving: The cost of archiving a data set will affect the assessment of its desirability for acquisition. Each data set will be evaluated in terms of how much organization, deidentification, duplication, documentation, vault space, and computer programming are necessary to archive it and make it available to users. In some cases, a somewhat less attractive data set will be acquired if it is easy and inexpensive to do so. [23]

6. Important Uses of Qualitative Data from the Murray Center's Holdings

The users of Murray Center data include sociologists, psychologists, criminologists, historians, educators, political scientists and economists. For purposes of this paper, we have drawn examples from a variety of disciplinary perspectives to show the ways that qualitative data can be radically restructured for new research.2) We will describe in some detail several creative approaches to the use of existing qualitative data starting with seldom-used methods that have potential for strong contributions: (1) two examples of new prospective studies (new issue to observe) created out of existing prospective studies; (2) two examples of the use of multiple data sets for multi-cohort designs; and (3) a few examples of follow-up studies. We will conclude with (4) a few examples of classic reanalyses, chosen from numerous such examples from users of the center's data, conducted by representatives of three different disciplines, and highlighting both long and short-term projects: (a) a reanalysis of a single data set conducted by two psychologists, a one-paper study; (b) a book project by an historian, and (c) another set of reanalyses conducted by two criminologists that has largely been the basis for building a research career, and is ongoing. [24]

New Prospective Studies from Old: One important function of longitudinal studies is to examine attitudes, behaviors, beliefs and events as they unfold as opposed to relying on retrospective accounts, with their well-known limitations (see, for example, MENARD, 1991). Longitudinal designs thus allow us to study factors involved in certain outcomes, such as drug use (SHEDLER & BLOCK, 1990), adult behavior and self concept in light of parents' child-rearing practices (KOESTNER, FRANZ, & WEINBERGER, 1990), and bereavement (IDE, TOBIAS, KAY & GUERNSEY DE ZAPIEN, 1992; IDE, TOBIAS, KAY, MONK, & GUERNSEY DE ZAPIEN, 1990), examining predictors and consequences as they emerge over many years of data collection. Ideally, such issues should be studied prospectively by randomly assigning people to control and experimental groups. Of course, ethical constraints prevent such research, and longitudinal research has long been considered the best method for making causal inferences when random assignment is not possible. The existence of longitudinal studies available for secondary analysis thus makes it possible to use data collected for one set of issues to examine other issues as they present themselves in the data. This is especially valuable in that the new investigator does not have to wait years for the study to be conducted. [25]

We have few examples of new prospective studies from old ones at the center. It is our hope that the following two examples inspire greater use of this method for secondary analysis in order to shed light on numerous research questions/issues that because of ethical constraints cannot be handled by random assignment. [26]

Psychologists BLOCK, BLOCK, and GJERDE (1986), for example, used a longitudinal study of personality and cognitive development that was begun in 1968 and is still active today (BLOCK & BLOCK, 1980). For these analyses, the personalities of children from intact families at ages 3, 4, and 7 were reliably assessed by independent raters, using a variety of resources within the data: observations by teachers, clinicians and researchers. The parents of a number of these children subsequently divorced. According to the authors, the behavior of boys was found, as early as 11 years prior to parental separation or formal dissolution of marriage, to be affected by predivorce familial stress. The boys' behavior was characterized by lack of impulse control, aggression, and excessive energy prior to parental divorce. The behavior of girls from subsequently divorcing families was found to be notably less affected.

The prospective relationships afforded by the longitudinal analyses suggest that the behavior of conflicting, inaccessible parents during the pre-separation period may have serious consequences for personality development, especially for boys. Hence, some characteristics of children commonly seen to be a consequence of divorce may be present prior to marital dissolution (BLOCK, BLOCK & GJERDE, 1986, p.827). [27]

The degree of precision in identifying relevant personality characteristics for this study would not have been possible without the availability of the original open-ended recordings of observations. [28]

A second, albeit older, example of a new prospective study from an old one involves the prediction of suicide (SHNEIDMAN, 1971).3) Using data from the TERMAN, SEARS, CRONBACH and SEARS study, the TERMAN Life Cycle Study of Children with High Ability (1922-present), SHNEIDMAN identified a subgroup of 30 men for whom longitudinal personality data were available from 1921 to 1960. By examining the data, the author could see that there were five respondents who had committed suicide during their middle years (ages 40-58), 10 matched controls who had died natural deaths, and 15 who were still living; he could not, however, identify the cases. These data presented the possibility of a natural experiment with blind analyses. The case materials, derived from the archive (provided by executors of the data set at Stanford University), were prepared for Dr. SHNEIDMAN in a way that prevented his knowing any information about whether or how the respondent had died. Using life history data from these voluminous qualitative materials in the TERMAN study, SHNEIDMAN coded each respondent for the level of "perturbation", i.e., how upset (disturbed, agitated, sane-insane, etc.) he was, and rated each one on a 1-9 scale of lethality, i.e., how likely the respondent was to take his own life. "For each of the 30 cases a rough chart of the individual's perturbation in early childhood, adolescence, high school, college, early marriage, and middle life was made" (SHNEIDMAN, 1971, p.24). Results indicated that four of the five cases assessed as the most suicidal had, in fact, committed suicide. Discussion centered around several factors implicated in suicidality. SHNEIDMAN also discussed the extent to which suicide can be seen as a "discernible part of a life style and as a predictable outcome, in a person of 50, by age 30". SHNEIDMAN, in appreciating the value of the longitudinal study and the richness of the materials embedded within it, points out how seldom case histories of people who have actually committed suicide are available. This study has proven useful to clinicians and social workers trying to develop diagnostic acumen. [29]

As we have mentioned, we believe that the creation of new prospective studies out of old ones, as a method of secondary analysis, has been under-used in social science research. It could be used, for example, to study antecedents and consequences of drug usage, eating disorders, optimism versus depressive tendencies, women's employment patterns and/or atypical careers, midlife development, unusual accomplishments, longevity, and so on. [30]

Multi-cohort designs: PARKER and ALDWIN (1997) used two longitudinal data sets, both of which included multiple cohorts, to study the extent to which personality change is developmental (age-related) or cohort specific. They examined age, cohort, and period effects in both personality (gender identity) and values. The data sets were chosen on the basis of the variety of comparisons they permitted, and the availability of relevant measures and open-ended material for content coding. Their analyses provided consistent support for differential impacts of age, cohort, and period on personality and values. "Whereas personality change (in this case 'masculinity' and 'femininity') is clearly an effect of developmental processes, changes in value orientation are clearly the result of changing sociohistorical norms and opportunities" (p.102). While this study made use mostly of pre-coded computer data, some content coding was made possible by open-ended questions about life values. It is described here primarily as a good example of combining data sets to create multi-cohort designs. [31]

Another remarkable example of a multi-cohort study made possible by combining data sets from the Murray Center archive is provided by STEWART and HEALY (1989), Linking Individual Development and Social Changes. STEWART and HEALY used five data sets with information about respondents spanning 40 years and including birth cohorts ranging from World War I to the baby boom to reveal connections between the experience of different social histories and personality development. The data consisted of questionnaires with unanalyzed closed-ended questions, along with sufficient open-ended material for developing constructs such as employment patterns, atypical careers, regrets about giving up/not giving up work when children were born, the presence/absence of female role models, internal conflicts, world views, and so on. Masterfully moving among these data sets, the authors show the ways in which social changes such as the expanded work role for women in World War II and the contracted one in the postwar period, the women's movement, and the Vietnam War during the early 1970s affected personality development quite differently depending on the age and life stage one was in when they occurred. Specifically, they argue that:

"... social experiences, in interaction with individual development, have consequences for individuals' world views when they are experienced in childhood, for their identities when they are experienced in late adolescence and the transition to adulthood, and for their behavior when they are experienced in mature adulthood" (STEWART & HEALY, 1989, p.40). [32]

Follow-up studies: As we have mentioned, the Murray Center is the only data archive that we know of that makes names-to-numbers lists available for further follow-up of some of the studies within its holdings. We will provide three examples of these, only two of which made use of qualitative data. [33]

First, Karen ROBERTO, professor of psychology at the Virginia Polytechnic Institute and State University, conducted a follow-up of the participants in TRAUPMANN's McBeath Institute Aging Women Project. The original study, conducted in 1978-79, examined many aspects of the lives of older women. Interview questions covered family status, health, work history, relationship equity, major life changes, life satisfaction, organizational affiliations, and more specific topics such as "winter as a life stress" and "the meaning of aging." In ROBERTO's follow-up, Friendships of Older Women: Changes Over Time, she recontacted 109 of the original participants to examine stability and change in interaction patterns between older women and their close friends over time. Using telephone interviews and a mailed questionnaire, she also examined the extent to which the life situations of older women (income, health, psychological well-being) influence their relationships with close friends. Dr. ROBERTO gathered data on 78 friendships mentioned in the 1978-79 data and still in existence in 1992. [34]

ROBERTO (1993; ROBERTO & STANIS, 1994) found a pattern of stability in the emotional qualities of the friendships from 1978 to 1992, but change in other aspects of the relationships, such as recreational activities, lifestyles, and ways to connect. Health changes proved to have a significant impact on changes in relationships. Without the open-ended material in the original study, ROBERTO would not have been able to examine such personal data. [35]

A second example of a follow-up study made possible by the Murray Center is a study by Todd HEATHERTON, professor of psychology at Dartmouth College.4) HEATHERTON conducted both a follow-up and a replication of a study, the Prevalence of Bulimia Among College Students (COLBY, WARE, & ZUCKERMAN, 1982-1984), to examine the prevalence, correlates, and consequences of eating-related difficulties over time. The original sample, contacted in 1982, consisted of 900 undergraduates who were queried about eating-disordered behaviors and attitudes. The purpose of the replication was to examine changes in the prevalence of dieting behavior and eating disorder symptoms from 1982 to 1992; the follow-up was designed to assess whether any change in eating behaviors had occurred during the transition to early adulthood. HEATHERTON, NICHOLS, MAHAMEDI, and KEEL (1995) reported, on the basis of the replication, that there were "significant reductions of problematic eating behaviors and disordered attitudes about body, weight, and shape from 1982 to 1992 ... Significantly fewer women and men reported chronic dieting in 1992 than in 1982" (p.1623). [36]

Moreover, the women in the follow-up study, as reported by HEATHERTON, MAHAMEDI, STRIEPE, FIELD, and KEEL (1997), had substantial declines in disordered eating behavior as well as increased body satisfaction, even though body dissatisfaction and desires to lose weight remained at relatively high levels. Men, who rarely dieted or had eating problems in college, were prone to weight gain in the 10-year period following college, and many of them reported increased dieting or disordered eating. While these data provided little in the way of open-ended materials, they have attracted other secondary analysts who have recoded the quantitative data (see, for example, JOINER & HEATHERTON, 1998; JOINER, HEATHERTON, & KEEL, 1997; JOINER, HEATHERTON, RUDD, & SCHMIDT, 1997). [37]

Finally, the late David McCLELLAND, then professor emeritus of psychology at Harvard University, and Carol FRANZ, then postdoctoral fellow at Boston University, conducted a follow-up study of SEARS, MACCOBY, and LEVIN's (1951-1958) Patterns of Childrearing Study. In 1951, mothers of 5-year-old children in the Boston area were interviewed about their own and their husbands' parenting practices. The children from this sample were re-interviewed at ages 12, 22, 31, and 41. McCLELLAND and FRANZ, who conducted the most recent wave of data collection, analyzed these longitudinal data with a focus on predictors of four types of adjustment: social and work accomplishments and psychological and physical health (FRANZ, McCLELLAND, & WEINBERGER, 1991; KOESTNER, FRANZ, & WEINBERGER, 1990; McCLELLAND, 1989). [38]

They examined family origins of empathy in adulthood and found that empathic adults were more likely to have had mothers who were confident and happy, who encouraged affection and discouraged aggression; these were not necessarily the mothers who expressed the most love for their child. Paternal firmness and involvement in child care were also associated with empathy in adulthood. All of these constructs were derived from lengthy open-ended interviews. [39]

Analyses also indicated that motive patterns (derived from stories written in response to pictures) appear to play a causal role in health outcomes between the ages of 31 and 41. Affiliative trust (optimism about relationships with people) at age 31 was positively associated with good health, which is consistent with the findings of others (SCHEIER & CARVER, 1987; PETERSON & SELIGMAN, 1987) that people who are optimistic tend to be healthier. Also, people who were high in agency motivation (the need for or concern with power and/or achievement) at age 31 were significantly healthier over time, but only when life stress was low or managed well (McCLELLAND, 1989).5) [40]

Recoding/Reanalyzing a single data set: MacDERMID and FRANZ (1998), both psychologists, used the Q-sort technique, a method of rating respondents on some psychological characteristic (BLOCK, 1978), to study "generative realization" (a concept developed by Erik ERIKSON having to do with individuals' concern for bringing along the next generation or leaving a legacy). The Q-sort technique has been hailed as a useful method for secondary analysis in that it provides the new investigator with a tool for reliably and validly rendering different materials (interviews, questionnaires, psychological assessments) comparable, through the common language of its 100 personality items, ipsatively arranged. [41]

Using data from the Intergenerational Studies (which span a 50-year period, 1932-1982), contributed to the center by the Institute of Human Development, MacDERMID and FRANZ created generativity scores from a variety of materials within the data set. As they explain, generative realization (GR) index scores were constructed using items judged to represent the "ideally generative person" from the list of 100 personality characteristics included in BLOCK's (1978) version of the Q-sort. The authors then asked judges to read through the case files and rate each subject on GR using the pre-selected characteristics. Thus, GR became a new observation of the study participants' personality, based on re-coding conducted by at least two trained judges. The Q-sort has also been used extensively to assess other aspects of personality (BLOCK, 1971; CLAUSEN, 1993; EICHORN, 1981; HAAN, MILLSAP, & HARTKA, 1986). [42]
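The scoring step described above can be pictured with a minimal Python sketch. It assumes that each judge's Q-sort is recorded as a mapping from item number to a placement on a 1-9 scale, that the judges' placements are averaged item by item, and that the GR index is the mean placement of the pre-selected items. The item numbers, the scale, and the averaging rule are illustrative assumptions, not the procedure reported by MacDERMID and FRANZ.

```python
from statistics import mean


def composite_sort(judge_sorts):
    """Average each item's placement across judges.

    Each element of judge_sorts maps an item number to that judge's
    placement of the item (assumed here to be an integer from 1 to 9).
    """
    items = judge_sorts[0].keys()
    return {item: mean(sort[item] for sort in judge_sorts) for item in items}


def gr_index(judge_sorts, generative_items):
    """Mean composite placement of the items chosen to mark generativity.

    generative_items is a hypothetical list of item numbers; the real
    selection came from ratings of the "ideally generative person".
    """
    if len(judge_sorts) < 2:
        raise ValueError("at least two judges are required")
    composite = composite_sort(judge_sorts)
    return mean(composite[item] for item in generative_items)


# Two judges rate one participant on four (hypothetical) items.
judge_a = {2: 8, 17: 7, 35: 9, 50: 3}
judge_b = {2: 6, 17: 7, 35: 5, 50: 5}
score = gr_index([judge_a, judge_b], generative_items=[2, 17, 35])
```

Averaging across judges before scoring is one simple way to combine independent ratings; a real application would also check inter-judge agreement before pooling.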

According to MacDERMID and FRANZ (1998):

"The Q-sort is distinctive as a research instrument in that it can reliably and validly render different materials (narratives, questionnaires) comparable, through the common language of its 100 personality items, ipsatively arranged (see Block, 1978). Problems associated with having somewhat different data collected at different assessments, and variations in the questions asked within time to the different subsamples can be reduced by "translating" the data in to the language of the Q-sort. Thus, the Q-sort approach opens the door to many avenues of research, especially in secondary analysis of longitudinal data by rendering the data sets equivalent over time through carefully trained, reliable judges" (p.211). [43]

Using the Q-sorted data, MacDERMID and FRANZ were able to explore previously unexamined issues related to generativity, such as gender and cohort differences, long-term trajectories, and developmental trends in the components of agency, communion and insight. Without the availability of the qualitative materials from the data set, such coding for this study could not have been accomplished. [44]

Elaine Tyler MAY, an historian, used the KELLY Longitudinal Study (1935-1955), a study of 300 couples who were surveyed every few years throughout two decades6), to gain insight into a unique historical era, the period of intense domestic focus that followed World War II. In her classic book, Homeward Bound: American Families in the Cold War Era, MAY (1988) used a variety of sources along with the KELLY data to shed light on the following question: "What accounted for the endorsement of 'traditional' family roles by young adults in the postwar years and the wide-spread challenge of those roles by their children?" (pp.9-10). Answering this question, MAY asserted, required "entering the minds of the women and men who married and raised children during these years. The families they formed were shaped by the historical circumstances that framed their lives" (p.10). [45]

The original investigator had been interested in long-term personality development among married persons, as assessed by numerous psychological tests, mostly questionnaires. MAY, however, was interested in the respondents' own testimonies in which they "wrote about their lives, the decisions they made concerning their careers and children, the quality of their marriages, their family values, their sexual relationships, their physical and emotional health, and their major hopes and worries. In these open-ended responses, freed from Kelly's categories and concerns, they poured out their stories" (p.12). Using these and other materials from the era, MAY documents in nine chapters the ways that the Cold War affected all aspects of family life, including different aspects of the quality of marriage, sexuality and birth control decisions, and consumer practices, all evidence used to challenge the assumptions of the happy housewife of the 1950s and to suggest reasons for the upheavals in family life that followed in the late 1960s and 1970s. Quotations from the KELLY data, sprinkled liberally throughout each chapter, illuminate and validate her claims in ways that closed-ended answers to the survey questions never could. [46]

Sometimes the restructuring/recoding of data can constitute decades of work. LAUB and SAMPSON (1998), both criminologists, have spent the last fifteen years working with data from another longitudinal study, Unraveling Juvenile Delinquency (UJD), conducted by Sheldon and Eleanor GLUECK of the Harvard Law School, 1949-1969 (GLUECK & GLUECK, 1950, 1968). LAUB and SAMPSON have used constructs and computer-coded data developed by the GLUECKs. Mostly, however, they have examined the data with new lenses, making use of statistical techniques unheard of in the GLUECKs' day. Without the materials that the GLUECKs collected, and the availability of the raw paper data, these new analyses would not have been possible. LAUB and SAMPSON have drawn on the voluminous materials in the case records (including, for example, detailed handwritten interviews with the respondents and their families, interviewer narratives created at the end of each interviewing session, interviews with teachers, criminal justice officials, and employers, and miscellaneous notes and correspondence relating to family and school experiences, employment histories, military service and so on) to recode the data into a format that would be more useful for their research interests: a longitudinal framework for studying criminal careers.

"Specifically, over a 15-month period, we reconstructed a complete criminal history for each respondent in the study from first arrest to age 32 years. During this time period, the Gluecks' men respondents generated more than 6,000 arrests. For each arrest event, we coded the date, the specific type of charge or charges, the exact sequence of arrests, and the dates and types of all criminal justice interventions including the actual dates of incarceration (if any). Such a scheme captures all the richness of the Gluecks' longitudinal data (for more details, see Sampson & Laub, 1993)" (LAUB & SAMPSON, 1998, p.220). [47]
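As a rough illustration of what such an event-level coding scheme might look like in machine-readable form, the following Python sketch records each arrest with its date, charges, and any criminal justice interventions, and derives the arrest sequence by sorting on date. The class names, field names, and example values are hypothetical and are not taken from LAUB and SAMPSON's actual coding files.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Intervention:
    """One criminal justice intervention, e.g. probation or incarceration."""
    kind: str
    start: date
    end: Optional[date] = None  # None when no end date was recorded


@dataclass
class ArrestEvent:
    """One arrest, with one or more charges and any resulting interventions."""
    arrest_date: date
    charges: List[str]
    interventions: List[Intervention] = field(default_factory=list)


def build_history(events):
    """Return (sequence number, event) pairs in chronological order."""
    ordered = sorted(events, key=lambda event: event.arrest_date)
    return list(enumerate(ordered, start=1))


# Invented example records for a single respondent, entered out of order.
events = [
    ArrestEvent(date(1946, 3, 2), ["larceny"],
                [Intervention("incarceration", date(1946, 4, 1), date(1946, 10, 1))]),
    ArrestEvent(date(1941, 6, 15), ["truancy", "trespassing"]),
]
history = build_history(events)
```

Keeping one record per arrest, rather than one summary row per person, is what allows the timing and ordering of events to be analyzed later.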

For their book, Crime in the Making, SAMPSON and LAUB (1993) used these reconstructions to portray, in statistical terms, the major predictors of desistance from criminal careers over time. "Drawing on a theoretical model of turning points through the life course, we found that job stability and marital attachment in adulthood had significant negative effect on later crime independent of early childhood experiences" (LAUB & SAMPSON, 1998, p.220). [48]

Merging qualitative and quantitative methods, these authors used the results from quantitative analyses to identify important cases for in-depth qualitative analyses. Given that their quantitative analyses showed that job stability was an important mechanism fostering desistance from crime, they selected cases that displayed high job stability in combination with no arrest experiences as adults; similarly, they selected cases that displayed low job stability and arrest experiences. They also identified cases that were clearly inconsistent with their quantitative findings. In total, they reconstructed and then examined in detail 70 life histories from the GLUECKs' delinquent sample. As LAUB and SAMPSON (1998) explain:

"... the quantitative findings are enhanced by the analysis of qualitative data, resulting in more illumination of the complex processes underlying persistence and desistance from crime (Karins, 1986; Jick, 1979; Kidder & Fine, 1987; Magnusson & Bergman, 1990). For example integrating divergent sources of life history data (e.g., narratives, interviews), our qualitative analysis was consistent with the hypothesis that the major turning points in the life course for men who refrained from crime and deviance in adulthood were stable employment and good marriages. At the same time, we found that persistence in criminal behavior in adulthood often was the result of a developmental process of 'cumulative disadvantage' in which the negative influence of structural disadvantages (e.g., dropping out of school, having a criminal record or a dishonorable discharge from the military) persist throughout adult development (see also Sampson & Laub, 1997)". [49]

SAMPSON and LAUB (1993) took this approach one step further by examining residual cases that did not fit with the empirical results. In so doing, they learned, for example, that alcohol abuse can counteract the expected positive effects of strong marital attachment or strong job stability on later criminal activity. [50]
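The case-selection logic described in the preceding paragraphs can be sketched in Python as follows. The sketch assumes each case carries a numeric job-stability score and a flag for adult arrests, and that "high" and "low" stability are defined by cutoffs chosen by the analyst. The score, the cutoffs, and the group labels are assumptions made for illustration, not the authors' actual measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    job_stability: float      # hypothetical composite score
    arrested_as_adult: bool


def select_cases(cases, high_cutoff, low_cutoff):
    """Group cases for qualitative review based on the quantitative pattern.

    Cases with middling job stability fall into no group and are skipped.
    """
    groups = {"stable_no_arrest": [], "unstable_arrested": [], "off_pattern": []}
    for case in cases:
        high = case.job_stability >= high_cutoff
        low = case.job_stability <= low_cutoff
        if high and not case.arrested_as_adult:
            groups["stable_no_arrest"].append(case)    # fits the finding
        elif low and case.arrested_as_adult:
            groups["unstable_arrested"].append(case)   # fits the finding
        elif high or low:
            groups["off_pattern"].append(case)         # inconsistent with it
    return groups


# Invented example cases.
cases = [
    Case("a", 0.9, False),
    Case("b", 0.1, True),
    Case("c", 0.9, True),
    Case("d", 0.5, False),
    Case("e", 0.1, False),
]
groups = select_cases(cases, high_cutoff=0.8, low_cutoff=0.2)
```

The off-pattern group corresponds to the residual cases that the quantitative model does not explain, which are the ones most likely to reveal additional mechanisms when read in depth.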

Not incidentally, Crime in the Making has received numerous awards, including the prestigious Michael J. Hindelang Book Award from the American Society of Criminology. In addition, LAUB and SAMPSON have published 25 articles and book chapters based on their reconstruction of this classic data set, very little of which would have been possible without the availability of the case materials: the 60 boxes of raw paper data stored first in the Harvard Law School basement and later relocated to the Murray Center. In the last year or so, LAUB and SAMPSON have also completed a follow-up study of the GLUECK men. [51]

These are but a few examples of possibilities for research using data that already exist. These examples show the importance of preserving and making available qualitative as well as quantitative data. As we have seen, archival qualitative data afford the possibility of creating new prospective studies from old without having to wait for the issue of interest to emerge as the data are collected over many years. This approach also makes it possible to study topics that might not have been brought to the attention of researchers and/or policy makers when the original study was initiated. Childhood sexual abuse, for example, was not brought to the attention of the public until the early 1980s, long after many of today's important longitudinal studies were launched. With the presence of interview material or open-ended questions about family-related experiences, such studies might be created with existing longitudinal data sets. [52]

We have also seen the value of adding new waves of data to existing longitudinal data or making a cross-sectional study into a longitudinal one. Such approaches, while adding new findings and new causal inferences, also render the data more useful for new research. There are at least 135 studies at the center for which further follow-up is possible (see Appendix B). All require approval, usually from a screening committee, but well-developed proposals that honor the restrictions placed by the contributor are rarely turned down. [53]

In addition to creating new prospective studies out of old ones, and adding to existing longitudinal studies, we have advocated the use of multi-cohort designs. Such designs can be created by adding data sets from the archive to a data set that one has of a single cohort, or by combining several data sets obtained from the center's archive. While there are few good examples, the value of these for illuminating developmental patterns cannot be overestimated. As the center's holdings grow, we hope that more and more researchers take advantage of the opportunity to examine age, period and cohort effects by using different data sets with similar variables of interest. [54]

Finally, we have shown several ways that old data sets can be radically restructured to develop new studies when there is enough information for the new investigator's quite different theoretical lenses. The Intergenerational Studies data, for example, was already a venerable old longitudinal study before the Q-sort technique, employed by MacDERMID and FRANZ (1998), was developed. Similarly, the Cold War Era had not begun when KELLY began his study of couples engaged to be married. That the Cold War Era began and ended as a longitudinal study continued to grow, along with the richness of participants' own words that the data included, made this particular study rich for analysis by historian Elaine Tyler MAY (1988). Similarly, several methods that LAUB and SAMPSON have used to analyze the data collected by the GLUECKs were only recently developed. Without event history analysis, for example, LAUB and SAMPSON could not have isolated the determinants of the various patterns of crime. Without the data that the GLUECKs collected, dates and other subject records that were very unusual for criminologists of their day, these new approaches could not be applied to the data. [55]

In sum, and quite simply, qualitative data provide more information for subsequent researchers to use for new studies. Such data make it possible to apply new statistical techniques to old data. Such data allow us to study constructs and concepts that were unheard of when the original study was conducted. They allow us to examine lives during a historical period that looks very different in hindsight than it did while it was being experienced. Most importantly, such data preserve the perspectives of the respondents. Thus, we believe the preservation and sharing of qualitative data to be an important avenue for data centers and archives to travel, both here and abroad. [56]

7. Conclusion

In this paper we have made the case that the data that social and behavioral scientists collect and house in their offices or basements can be better stored and cared for elsewhere, that other social scientists might find archival data very useful for unanswered research questions of their own, and better still, that archival data can provide valuable research resources for generations to come. Moreover, we hope we have been convincing in our steadfast belief that we can provide rigorous and sustained protection of human subjects whose data are turned over to archival storage and made available for re-use. Finally, through our examples, we have argued that qualitative data materials, preserved and shared, are in many ways more useful for important and valuable new research than quantitative data. [57]

Appendix A

Memorandum of Agreement between
The Henry A. Murray Research Center of
The Radcliffe Institute for Advanced Study, Harvard University
and Data Contributors

This Agreement is made between ____________________ (the "Contributor") and The Radcliffe Institute for Advanced Study, Harvard University regarding the data set entitled ____________________. The Henry A. Murray Research Center (the "Center") is a division of The Radcliffe Institute for Advanced Study, Harvard University. The Contributor hereby irrevocably deposits into the Center the following materials (the "Materials"):

Names and addresses of participants and the name-to-number list

(  ) are

(  ) are not

being transferred to the Center.

1. The Center will pay up to ....$ of the costs involved in acquiring the materials specified above including the costs of removing the names and such other identifying information as determined by the Center or the Contributor.

2. The Contributor (check one)

  • A. (  ) believes there is reason to maintain the confidentiality of each individual respondent.

  • B. (  ) believes there is no reason to maintain the confidentiality of each individual respondent.

3. If Box A in paragraph 2 is checked, the following information shall be deleted from the Materials: (Check all that apply)

  • (  ) last names only

  • (  ) first and last names

  • (  ) addresses

  • (  ) other, please specify:

The Center may delete such identifying information as it deems appropriate in addition to the deletions, if any, specified above by the Contributor.

4. The Center will not violate any prior agreements between the Contributor and research participants that are made known in writing to the Center. The Contributor warrants that he/she has attached any agreements or consent forms or copies thereof that he/she or others obtained from the participants of the Materials, and to which he/she has access; that he/she has made reasonable efforts to obtain a sample of any agreement or of any consent form given to participants, in the study; and that he/she has on Exhibit A attached hereto described to the best of his/her recollection the information given to the participants as to the future use of the Materials, and the conditions under which the participants agreed to participate in the studies. The Center will obtain an undertaking in writing from users that they will not knowingly divulge any information that could be used to identify individual participants in the data set, except to the extent necessary for permitted follow-up studies or where the Contributor indicates in writing that there is no reason for anonymity and the Center concurs in such conclusion.

5. The Materials are subject to the following restrictions in their usage:

(Check all that apply)

  • (  ) Only researchers specifically approved by the director of the Center for access to the "Restricted Materials" specified below will have access to those Materials. Researchers who wish to have access to Restricted Materials must make clear in their proposals why their research requires these Materials. (List all Restricted Materials):

  • (  ) The Contributor wishes to place the following restrictions on the Materials: (Check all that apply and specify in the space provided)

    • (  ) The Materials cannot be used by the following types of researchers (If this box is left blank, the Center will make them available to qualified users):

    • (  ) The data cannot be used for the following areas of inquiry, which are reserved for the Contributor and his/her students (If this box is left blank, secondary analysis may be performed in any domain):

    • (  ) Other limitations, if any, on use of the Materials:

6. The following provisions relate to the follow-up of the sample. (Check box A or B)

  • (  ) The Contributor will allow the sample to be followed up by researchers affiliated with the Center subject to the following conditions:

    • (  ) A follow-up study may only be performed with the collaboration of the Contributor.

    • (  ) A follow-up study may be performed with

      • (  ) no further restriction. Names and addresses of study participants may be made available to affiliated researchers who may be permitted to make contacts with participants at the discretion of the Center.

      • (  ) the restriction that subject identifiers may be used only by the Center's staff.

      • (  ) the restriction that any contacts of the participants must be made through the Contributor unless he/she gives written permission to the researcher to make such contacts.

  • (  ) The Contributor will not allow the sample to be followed up by researchers affiliated with the Center.

  • (  ) Follow-up restrictions do not apply since this study is a follow-up of a data set held by the Center. The Contributor is required to provide the Center  with the names and addresses of subjects.

7. The restrictions, if any, contained in paragraphs 5 and 6 of this Agreement shall last for:

  • (  ) 5 years; (  ) 10 years; (  ) 20 years;

  • (  ) Other (subject to approval by Center):

Explanation of the background for paragraph 8.

Over the years, literary rights (copyright) can create problems for a data archive. A scholar who wishes to reproduce or utilize a questionnaire, a code book or other material from a data set after its Contributor has died may not be able to locate the heirs. There may be other difficulties, including illness or incompetence of heirs, which will make it impossible for a social scientist to obtain consent to use the material. Therefore, the Center seeks to have the copyrights transferred to The Henry A. Murray Research Center of The Radcliffe Institute for Advanced Study, Harvard University.

A Contributor can only transfer copyright for those materials in the data set (e.g., questionnaire, code book, scoring manual, the compilation, etc.) that were personally created by the Contributor(s), that were created for the Contributor(s) as a work for hire, or in which copyright was transferred to the Contributor. If some of the material contributed is material in which persons other than the Contributor have copyright, the Center must be informed so as not to violate the rights of others.

Under the copyright law which took effect in January, 1978, a written transfer of copyright is needed in addition to the transfer of the physical property. Copyright lasts for an author's life and fifty years.

8. The Contributor makes the following undertakings as to copyright:

  • The Contributor warrants that the Materials are his/her own, except for those Materials bearing a legible copyright notice in the name of a person other than the Contributor, and that they do not infringe upon the copyrights of others. The only exceptions are those as set forth herein:

  • The Contributor agrees that the Materials contributed to the Center shall become the sole property of The Radcliffe Institute for Advanced Study, Harvard University to which all rights, including copyright, of the Contributor are hereby assigned, it being understood that the Contributor shall have the right to use his/her data in any future research or publication. The only reservation of rights by the Contributor are those set forth herein:

9. The Contributor will be deemed to have an affiliation with the Center, and will have access to any data set held by the Center, subject to the restrictions specified by its Contributor or the Center.

10. Each applicant who wishes to use the Materials will have to summarize in writing the proposed area of inquiry.

11. The Center will require a written undertaking by each user of the Materials to:

  • A. give credit to the Contributor through citation in manuscripts and publications using his/her data.

  • B. furnish two copies of any manuscripts (if not published) and two copies of any published report(s) based directly or indirectly on use of the Materials to the Center.

12. The Center shall give to the Contributor one copy of any manuscript or published report referred to in paragraph 11.B upon his/her request.

13. The Center may make a charge to users for use of the data to cover the expenses and costs of operating the Center.

14. Notices to the Contributor will be deemed sufficient if mailed to the Contributor at the following address until the Center has received written notification of a change of such address:

  • Address: ____________________________

  • ____________________________________

  • ____________________________________

  • Phone: ____________________

  • Fax: ______________________

15. Decisions specified in this Agreement to be exercised by the Contributor regarding use of the Materials will, after the death or incapacitation of the Contributor, be made by the director of the Center, or the director's designee.

If the Contributor is an institution, decisions regarding use of the data set will be made by the director of that institution. In the event that the institution ceases to exist, decisions will be made by the director of the Murray Center, or the director's designee.

16. This Agreement is executed under seal and shall be binding on the Contributor, his/her heirs and assigns and on Radcliffe and its assigns.


  • (Signature):____________________________Date:_______________________

  • Murray Center

  • By: ____________________________Date:_____________________________

  • The Radcliffe Institute for Advanced Study, Harvard University

  • By: ____________________________Date:_____________________________


Exhibit A
To Memorandum of Agreement


1. To the best of my recollection, (check one)

  • A. (  ) no consent form or other agreement was given to participants.

  • B. (  ) a consent form (or other agreement) was given to participants.

2. If Box B was checked, fill in the following:

  • (  ) a sample consent form (or copy of an agreement) is attached hereto.

  • (  ) no consent form (or agreement) is available to me despite diligent efforts on my part to obtain such form (or agreement).

3. To the best of my recollection, the following information was given to participants as to the future use of the Materials:

  • (  ) none

  • (  ) other (please specify)

4. To the best of my recollection the following are the conditions under which the participants agreed to participate in the study:

  • Executed under the penalties of perjury.

  • (Signature): Date:

(Name Printed):

1) This paper was prepared for presentation at the Fifth International Conference on Social Science Methodology, October 3-6, 2000. The authors gratefully acknowledge the support of Evelyn LIBERATORE and Alison BETTER in the preparation of this manuscript.

2) While we do represent several disciplines through our examples, we have, because of the nature of the archive and because of the first author's expertise as a psychologist, drawn more heavily from psychological research.

3) This prospective study did make use of materials that are now among the Murray Center's holdings, but the reanalysis was, of course, conducted before the Murray Center was founded. It is provided here as a heuristic example.

4) Even though the HEATHERTON study presented here did not require the use of qualitative materials, it is provided here as an excellent example of the value of making such data available for follow-up studies.

5) The mothers of this study have also been followed up in recent years (see MALLEY & JAMES, 1997; MALLEY, 1999; JAMES, 1999).

6) Subsequent follow-up studies have been conducted by John CONNOLLY (1979-1981).


Jacquelyn JAMES is associate director of the Henry A. Murray Research Center at the Radcliffe Institute for Advanced Study at Harvard University. James, who received her doctorate from Boston University in personality psychology, has focused her research on the meaning and complexity of gender, adult development, and motivation. Most of her work has involved using existing data within the archives of the Murray Center, restructuring qualitative data to create new constructs and variables for a variety of research questions.

While at the Murray Research Center, Dr. James has administered two three-year research programs sponsored by the MacArthur Foundation: "The Midlife Research Program" and "The Character and Competence Research Program." These programs have attracted leading scholars to the Murray Research Center to use both quantitative and qualitative data to conduct research on a range of topics relevant to each theme. The results of the analyses from both of these programs were published as edited volumes. The first, "Multiple Paths of Midlife Development" (University of Chicago Press) with coauthor Margie Lachman of Brandeis University, is an edited volume compiling twelve studies examining different aspects of midlife development. The second, similarly compiled, came out in 1998 and is entitled "Competence and Character Through Life," co-edited with Anne Colby and Daniel Hart (University of Chicago Press).

In 1995, Dr. James organized a conference to address the controversy regarding gender differences, "Beyond Difference as Model for Studying Gender: In Search of New Stories to Tell." The issues addressed in this conference were published in a book, "The Significance of Gender: Theory and Research about Difference," which was published last summer as a special issue of the Journal of Social Issues. Dr. James has presented her work conducted in collaboration with Dr. Rosalind Barnett, "The Effects of Disagreements About Gender Role Beliefs in Dual Earner Couples," at an international conference on Women, Family and the Labor Market in Doha, Qatar. Dr. James is a public lecturer and the author of numerous articles on midlife, personality and gender issues, and the use of archival data for new research.


Jacquelyn James

Harvard University
Radcliffe Institute for Advanced Study
Murray Research Center: A Center for the Study of Lives
10 Garden Street,
Cambridge, MA, USA

E-mail: Jacquelyn B. James


Annemette SORENSEN, Director of the Murray Center, is a sociologist specializing in the study of gender stratification in North America and Europe. The sociology of the family, the life course of women and men in modern society, and the impact of public policy on gender relations are other research interests. Her current research is a comparative study of economic inequality between men and women in Europe and the US towards the end of the 20th century. This study is a continuation of her earlier work on married women's economic dependency. A major focus in the current research is to examine how men are affected by women's increasing earnings. Other recent research has focused on the risks associated with the post-nuclear family system with high divorce rates, high rates of childbearing outside marriage, and high employment rates for both parents and on the life course of women and men in the former German Democratic Republic. She received her Ph.D. in sociology from the University of Wisconsin-Madison and has taught there as well as at Harvard University and Boston University.


Annemette Sorensen

Harvard University
Radcliffe Institute for Advanced Study
Murray Research Center: A Center for the Study of Lives
10 Garden Street,
Cambridge, MA, USA


