Volume 9, No. 3, Art. 39 – September 2008

Using Video for a Sequential and Multimodal Analysis of Social Interaction: Videotaping Institutional Telephone Calls

Lorenza Mondada

Abstract: The paper aims at demonstrating some analytical potentialities of video data for the study of social interaction. It is based on video recordings of situated activities in their ordinary settings—producing "naturally occurring data" within a naturalistic perspective developed by Harvey SACKS and subsequent research within ethnomethodology, conversation analysis, interactional linguistics and workplace studies. Analysis focuses on a particular kind of video recording, produced during fieldwork in call centers: it shows the payoffs of videotaping telephone calls in professional and institutional contexts. In the previous literature, phone calls have been treated as a case in which audio recordings were adequate for the resources mutually available to the participants themselves. Video documentation of phone calls in work settings shows that they involve, on the part of professional operators, more than talk at work or than talk as work: they make it possible to observe the complex work activities running simultaneously with the call and in the service of the call, i.e. the multi-activity the call taker is engaged in. In this paper, I analyze temporal and structural features of professional multi-activity in three sequential positions: in pre-beginnings, during the call while Internet searches are initiated, and in post-closings. These positions show the finely tuned coordination between the call and the other activities of the operator, as well as the continuity between the call, subsequent calls in a series and the continuous flow of work in the call center.

Key words: conversation analysis; video; telephone calls; sequentiality; multimodality; multi-activity; pre-beginnings; post-closings

Table of Contents

1. Introduction

2. Audio and Video Recordings in a Naturalistic Perspective

3. Audio and Video Recordings of Phone Calls

3.1 A rich and inspiring literature on phone calls

3.2 Beyond talk: from sound to visual dimensions of calls

3.3 Video recording of calls

4. Multi-activities during Calls: Revisiting Boundaries of the Calls

4.1 Between one call and another: Call's pre-beginning

4.2 During the call: activation of Internet searches

4.3 Post-closings

5. Conclusion: Beyond the Unity and Autonomy of the Call


Appendix 1: Transcript Conventions






1. Introduction

The aim of this paper is to demonstrate some analytical potentialities of video data for the study of social interaction. It is based on video recordings of situated activities in their ordinary settings—producing "naturally occurring data" within a naturalistic perspective developed by Harvey SACKS and subsequent research within ethnomethodology, conversation analysis, interactional linguistics and workplace studies. [1]

The paper presents analyses that focus on a particular kind of video recording, produced during fieldwork in call centers. In this way, I hope to demonstrate the payoffs of videotaping telephone calls in professional and institutional contexts. In the previous literature phone calls have been treated as a case in which audio recordings were adequate for the resources mutually available to the participants themselves. Video documentation of phone calls in work settings shows that they involve, on the part of professional operators, more than talk at work or than talk as work: they make it possible to observe the complex work activities running simultaneously with the call and in the service of the call, i.e. the multi-activity the call taker is engaged in. In this paper, I analyze temporal and sequential features of the work accomplished in call centers revealed by this multi-activity and documented by video recordings. [2]

2. Audio and Video Recordings in a Naturalistic Perspective

Video data are currently being produced and used within a diversity of epistemological perspectives, ranging from experimental settings to ethnographic studies. [3]

Ethnomethodologically inspired conversation analysis constitutes one of the early attempts to use audio and video data for the study of human and social behavior. From 1963 onwards, Harvey SACKS carried out fieldwork at the Suicide Prevention Center in Los Angeles. He began using audio tapes of calls for help that the institution itself was recording (SCHEGLOFF, 1992, p.xvi). At the same time, SCHEGLOFF was working at the Law and Society Center on tapes of psychiatric and criminal insanity examinations (1992, p.xvii, n.8). As early as 1970, in Philadelphia, Chuck and Candy GOODWIN were carrying out video recordings of everyday dinner conversations and other social encounters, which were used from 1973 on by JEFFERSON, SACKS and SCHEGLOFF in research seminars and then in published papers. In 1975, SCHEGLOFF presented a paper co-authored with SACKS—killed a few weeks earlier in a car accident—at the Annual Meeting of the American Anthropological Association (published much later on, see SACKS & SCHEGLOFF, 2002) on "home position," an early attempt to describe bodily action systematically. In 1977, Charles GOODWIN presented his dissertation at the Annenberg School of Communications of Philadelphia (afterwards published as GOODWIN, 1981), on the basis of about 50 hours of videotaped conversations in various settings (1981, p.33). [4]

The interest in video recordings of "naturally occurring data" emerges from a naturalistic perspective largely developed by Harvey SACKS (1963, 1984), but also inspired by GOFFMAN's ethnographic observations (1963), as well as by KENDON's early work on glances and gestures (1967, 1990), SCHEFLEN's context analysis (1972) and BIRDWHISTELL's kinesic analysis (1970). All were directly or indirectly influenced by the Natural History of an Interview, a project begun in 1955, offering a pluridisciplinary analysis of communication, language, paralanguage and kinesics, as well as of psychiatric issues, focusing on a very early video corpus, constituted by the recording of a psychiatric interview between BATESON and one of his patients, Doris (cf. McQUOWN, 1957, 1971), and transcribed by HOCKETT, BIRDWHISTELL and McQUOWN (McQUOWN, 1971, ch.6). The title of this project refers significantly to an analysis of human behavior that recognizes the importance of "spontaneous conversational materials" in "a variety of contexts" (1971, ch.10, pp.9 and 11). [5]

From the 1960s onwards, conversation analysis (CA) has built a tradition that has been influential in developing reference to "naturally occurring data," as data collected in situ, documenting various forms of conduct that are not orchestrated or provoked by the researcher and which occur ordinarily and routinely in that setting (MONDADA, 2006a). This emphasis within CA is strongly associated with the analytical aims motivating the documentation of talk-in-interaction: it refers to the fact that talk and other social practices are organized in a locally situated way, orienting and adjusting to the peculiarities of the context in which they unfold—thus making it impossible to transpose or to simulate these practices in an experimental context1). It therefore refers to an analytical agenda that aims to describe organizational patterns of behavior which exploit in an indexical and systematic way various multimodal resources in their detail: grammatical, prosodic, gestural, visual resources are studied as being mobilized, arranged and possibly reconfigured by participants in the local organization of their action, sensitive to the contingencies of context. This agenda is based on rejecting the projection of etic models on participants' actions (what GARFINKEL, 1967 calls an "ironic stance" treating them as "cultural dopes") and, consequently, on the attempt to construct an emic account of the ordered character of these situated practices. Emic analysis takes into account participants' perspectives in the reconstruction of the local order of talk, of action and of reasoning as methodical achievements. These aspects are central to the project of describing talk and other actions as "naturally organized" (GARFINKEL, 1996)2). [6]

The consequences of this naturalistic perspective for the description of the linguistic, social and cultural order as in situ arrangements of participants in the organization of their everyday affairs are theoretically important: language is seen not as an abstract set of potentialities but as situated action, organized in the temporal and sequential unfolding of its uses, mobilized with other multimodal resources such as glances, gestures, bodily postures and body movements. [7]

3. Audio and Video Recordings of Phone Calls

The naturalistic perspective has produced a range of analytical uses of video. On the one hand, video data have stimulated the sequential analysis of talk-in-interaction as being strongly coordinated with multimodal resources (cf. GOODWIN, 1981, 2000a; SCHEGLOFF, 1984; HEATH, 1986; STREECK, 1993, 1996; HAYASHI, 2005; STIVERS & SIDNELL, 2005; MONDADA, 2007a). On the other hand, they have made possible the analysis of professional activities in complex settings, taking into consideration talk and action mobilizing technologies and objects in material and spatial environments (cf. GOODWIN, 1996, 2000b; SUCHMAN, 1996; HEATH & LUFF, 2000; LUFF, HINDMARSH & HEATH, 2000; MONDADA, 2007d). The former have been based largely on everyday conversations, and the latter have highlighted specific features of professional practices and institutional contexts. [8]

In the following section, I show that phone calls have been a classical topic for the sequential analysis of audio data in CA and that their study, on the basis of video data, opens up new insights, beyond the understanding of the organization of calls, about the organization of the work done for and by the calls. [9]

3.1 A rich and inspiring literature on phone calls

Phone calls have been studied in historical and seminal analytic work within CA—although not prompted by any direct interest in calls themselves (SCHEGLOFF, 2002, p.287; SACKS, 1984). SACKS's doctoral dissertation (1966) was based on his fieldwork at the Los Angeles Suicide Prevention Center and on the calls for help that were answered there, and SCHEGLOFF's dissertation on telephone openings (1967) was based on calls to the police and on informal phone calls. Nevertheless, phone conversations are a type of practice largely analyzed in CA "by accident" (SCHEGLOFF, 2002, p.287), of interest because: (a) they are "naturalistic data" (2002, p.287) available for close analysis without having been produced for scientific inspection; (b) their organization is based on what both parties can hear and not on embodied resources for interaction. This lack of visual cues makes audio recordings a legitimate and adequate way of recording phone conversations:

"For studying co-present interaction with sound recording alone risked missing embodied resources for interaction (gesture, posture, facial expression, physically implemented ongoing activities, and the like), which we knew the interactants wove into both the production and the interpretation of conduct, but which we as analysts would have no access to. With the telephone data, the participants did not have access to one another's bodies either, and this disparity was no longer an issue" (SCHEGLOFF, 2002, p.288). [10]

While analyses carried out on these data focused on the organization of talk-in-interaction in general, they revealed specific features of telephone conversation—such as the summons/answer sequence (SCHEGLOFF, 1968). The interest was not in technology per se, but in technology as a device refracting more general phenomena (SCHEGLOFF, 2002, p.290). This has produced fundamental work in CA: on membership categorization analysis (SACKS, 1972, 1992), on initial thoughts about sequencing organization (SACKS, 1992), on identifications (SCHEGLOFF, 1979), openings (SCHEGLOFF, 1968, 2002), closings (SCHEGLOFF & SACKS, 1973), adjacency pairs (SCHEGLOFF & SACKS, 1973, see SCHEGLOFF, 2007) and other key concepts. [11]

Later studies have researched the specificities of phone conversations (see HOPPER, 1992): some authors have been interested in the degree to which telephone calls differ from face-to-face interaction (HUTCHBY, 2001, p.81); others have focused on the technology mediating interaction, contrasting interaction via mobile phones with interaction via fixed phones (see the controversy between ARMINEN & LEINONEN, 2006, and HUTCHBY & BARNETT, 2005; cf. ARMINEN, 2005)3). [12]

Similarly, SCHEGLOFF's analysis of openings was more interested in the general sequential constraints organizing the openings than in the contextual specificities of the call—related to its degrees of (in)formality or to its embeddedness in an institutional context. Other analyses, such as those by WHALEN and ZIMMERMAN (1987) and WAKIN and ZIMMERMAN (1999) deal with the specific sequential formats of institutional calls (cf. FELE, 2007)—characterized by a specialization and a reduction of the openings. The former feature refers to the selection of identification-oriented over recognition-oriented response to the summons; the latter to the absence of recognitionals, greetings and "howareyous," and to the fact that the first topic is uttered in the second turn, immediately after completion of the summons/response pair, in a manner that permits a very early presentation of the reason for calling. [13]

3.2 Beyond talk: From sound to visual dimensions of calls

Classic analyses of calls mention various dimensions relevant to their organization that are not only about talk but concern other aspects, showing that participants do orient to the environment of the call, to its material, social, praxeological context. [14]

Even an apparently mechanical event, such as the sound of the phone ringing, is the subject of endogenous analysis and of practical reasonings on the part of the participants. As SCHEGLOFF (1986) demonstrates, participants can topicalize the number of ringing tones (e.g. a quick response by the called or a delayed one) and formulate a guess about the location of the phone or about the current activity of the called. Similarly, elements of the context can be inferred on the basis of common knowledge or of what transpires not just from talk but from the sound environment (the caller may wonder if they have disturbed the called watching TV: this can be inferred from their knowledge about the time at which soap operas are broadcast and/or from the sounds that can be heard in the TV room). In this way, callers orient to elements other than talk; ones they can hear but which are taken into account not as mere sounds but as activities, events, and so on. [15]

With regard to institutional calls, ZIMMERMAN (1992, p.423) speaks about "keyboard activity" as hearable from "keyboard sounds," interpretable in terms of dispatch activities. He transcribes these sounds (attributed to "kb," the keyboard) and synchronizes them to the ongoing talk (1992, p.424)—thereby showing that typing has been activated in the surroundings of talk dealing with address information, thus allowing the inference that this typing is related to dispatch operations. Moreover, the complete text that has been typed is offered too (1992, p.425), inferred from the dispatch package and its saved messages. In this way, ZIMMERMAN not only attributes sense to the keyboard activity but (a) shows how participants orient their talk to this feature and (b) uses it to relate, in an important way, the call activity to the dispatching activity of the operator. The relation between ongoing call talk and ongoing keyboard activities is consequential for talk: the forms that have to be completed generate some of the operator's questions and constrain the order in which operators use the verbal information and insert it into the system.

"The dispatch package consists of slots to fill with particular classes of information, and the activity of providing/eliciting the required items constitutes a potent contingency of the interaction. Moreover, acquiring, processing, and entering this information are largely parallel activities taking place over the actual course of the call. Thus, while the point may be obvious, it is nevertheless important to exhibit the framework of nonvocal activities—listening for, coding, and entering information—within which CTs orient to what Cs say in the course of the call" (1992, p.431). [16]

Interestingly, not only does CT (call taker) process the information of C (caller) in this way, therefore orienting to and eventually shaping their adequacy as conforming with or, even, as constrained by the system, C also orients to what CT is doing, i.e. to what is transpiring as a result of her keyboard noises. In this sense, keyboard activity is publicly available to the caller, who can adjust the features of their direction to it. This aspect can be seen in the following transcript, where C is phoning a technical service to complain that the lift in his building is out of order and to ask for a technician to be sent:

Transcript 1 (jean/rec16); for transcript conventions see Appendix 1 [17]

In this excerpt, both the call taker (CT) and caller (C) are orienting to the keyboard activity as something relevant to the conversation and therefore to its sequential organization: in line 4, the caller does not use the "non talking" slot as an occasion to speak, thereby orienting to the keyboard sound as a call-taker activity which is related to the registration of his request for assistance. Moreover, the call taker is also orienting to the accountability of what happens as neither "silence" nor "inactivity" but as an activity related to the case and to the immediately previous turn: this is displayed by his mumbling (5), which repeats the last location information and thereby ties the typing activity to the last relevant information. In this case, we can observe (a) that CT displays the accountable dimension of his action and (b) that C orients to keyboard sounds as related to an accountable action by CT and as an action relevant to solving the problem that has produced the call. [18]

The sequential format acknowledged by both seems to be as follows:

  • CT asks a question,

  • C gives an answer,

  • CT types the relevant information into the system. [19]

The next sequence is not initiated until the post-second pair part activity is achieved. [20]
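The three-step format above can be restated as a simple check on a coded sequence of events. The following is only an illustrative sketch: the event coding (participant and action labels) and the function name follows_format are hypothetical and are not part of the transcripts or of any established coding scheme. It merely tests whether a list of coded events consists of complete question/answer/typing cycles, so that no new question appears before the typing that closes the previous cycle.

```python
from typing import List, Tuple

# Hypothetical coding of a call as ordered (participant, action) events.
Event = Tuple[str, str]

# The cycle described in the text: CT asks, C answers, CT types.
EXPECTED_CYCLE: List[Event] = [
    ("CT", "question"),
    ("C", "answer"),
    ("CT", "typing"),
]


def follows_format(events: List[Event]) -> bool:
    """Return True if events are made up only of complete
    question/answer/typing cycles, in that order."""
    cycle_length = len(EXPECTED_CYCLE)
    # An incomplete final cycle does not fit the format.
    if len(events) % cycle_length != 0:
        return False
    # Each event must match its position within the repeating cycle.
    for index, event in enumerate(events):
        if event != EXPECTED_CYCLE[index % cycle_length]:
            return False
    return True
```

A sequence in which CT asks a second question before typing would fail this check; such cases correspond to the divergences between participants rather than to the aligned format.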

Here both participants are aligned in dealing with keyboard sounds; in other cases, they can diverge: for example, C can pursue their talk or CT can minimize the noises from their environment and pursue more "private" parallel activities. [21]

As can be seen from this example and from the literature, a certain number of relevant features can be reconstructed from keyboard sounds—as well as from other sounds—as they are dealt with by the participants in their common interaction on the phone. Nevertheless, this reconstruction remains limited: the quality of the recording often means that continuous and homogeneous access to the CT's activities is denied (e.g. it is difficult to hear the keyboard sounds when both parties are talking, and therefore to transcribe exactly when the CT's activity begins and ends, and thus their exact synchronization with talk); it is also not guaranteed that what we hear as analysts on the basis of the audio recording is what the participants hear or even could hear. Even from the restricted perspective of the participants, this reconstruction is not perfect. [22]

Moreover, if we want to document not only the mutual access during the call, but also CT's work and the task of dispatching help, this reconstruction is even poorer. In particular, we have access to "keyboard sounds" but not to "CT's activity" in its specificity—such as searching on the Internet, filling in a form, writing a report. Regarding this latter point, a video recording can make the CT's mundane work of talking and writing observable, as in the following example:

Transcript 2 (bike / 2501-15.15-5.51E=5.58O; current call is finishing (1–4) and CT immediately takes the next one, which had been suspended as soon as it arrived, during the previous call)4) [23]

After closing the previous call, line 5, CT activates the other line and takes the next call (8). After aligning with the greetings, C agrees with some hesitation to give her reference number (10–11): this offer is accepted by CT (13), and the number is given by C (14–18). CT asks the reason for calling (20) and C provides an extended description of her problem (22–31). [24]

It can be seen that C, in proposing to give her reference number just after the greeting's second pair part and even before articulating the reason for calling (10), orients immediately to the fact that CT has the opportunity to retrieve some information about her case based on her number. C is thus orienting to the fact that CT works at the computer. The exact way in which C utters the number, in two parts (14 and 18), also demonstrates their orientation to the fact that CT is expected to write the numbers down (C does not just "tell" the number, but "dictates" it). [25]

It can be assumed that this orientation to CT's computerized task is also sensitive to the keyboard sounds made by CT at the beginning of the call. In this particular case, it can be observed that the online treatment by C of these "sounds" is problematic. C can hear keyboard sounds from the very beginning of the call, well before she has offered to give her number: CT is typing as soon as she says "patienté" at the end of her very first turn-constructional unit (TCU) (8): the keyboard sound makes it apparent that she is typing text—and not that she is, for example, restarting the computer or opening a new screen. C appears to have difficulty in interpreting this sound, as can be seen from the long 0.9 second pause (9) between the first and the second greetings pair part and from the hesitation marking her offer (a long "EUH:::m" and a false start, used as if C is delaying her contribution and waiting for CT's attention, in a way similar to that described by GOODWIN (1981)—here the keyboard sounds operate as the absent glances in GOODWIN's face-to-face interactions). Moreover, after CT's go ahead (13), when C gives her number (14 and 18) she adopts a format that (a) is adjusted to the action of dictating numbers (she splits the number in two and waits for confirmation after the first part) and (b) seems to orient to the absence of keyboard sounds (this is apparent from the long pause after CT's confirmation, 16, as well as from the harrumph and the pause after this second part, 18–19). [26]

From the perspective of what CT is making publicly available of her activity and what is being heard by C, there is a puzzling situation: CT types when she is not supposed to do so, and does not type when she is supposed to. [27]

Video recording allows the call to be recast within the call taker's work as a continuous flow of actions. First, it is relevant that the call has been suspended while another call was ongoing; CT comes back to the suspended call as soon as she finishes with the previous one (1–4). However, the work generated by this previous call is not finished: CT activates the telephone button to take the next call and, as soon as her hand returns to the keyboard, she continues to type the report she had begun to write about the preceding case. The hearable computer activities at the very beginning of the next call are, therefore, still related to the previous one. Second, when she finishes the report and saves it, she turns back to the initial page of the client management software (11) and clicks several times to activate the relevant page to investigate the folder on the basis of the customer's identification number (11, 13). A detailed analysis of the synchronization between the ongoing talk and the activation of the pages shows that the page on which she should insert the number is not displayed when the customer starts to dictate the number. Anticipating that the time the system needs to display the screen is too long for a direct insertion of the number in the computerized form, CT lets go of the mouse and uses a pen to write down the number (14). The customer cannot hear this action, in contrast to when CT copies the number on the computer (19), which is clearly hearable by C. [28]

What the video record makes apparent is the skilled way in which CT adjusts to the sequentiality of talk and ensures the transition between one call and the next, and the progressivity of the activity; it also shows the asymmetric perspectives between C orienting to CT's work as it is publicly displayed on the phone and CT's work as it is achieved within the material and technological environment of the call center. This asymmetry is also related to the fact that while for the customer this is a unique call, for the call taker it is just another call in a series (cf. infra). [29]

It can be seen that this phenomenon is observable not by the use of video in general, but by a specific recording device, using two cameras, focused on two types of phenomena, characterized by two different scales: one is centered on CT's body and her work environment; the other is centered on the details of her screen. Moreover, the observability of the phenomenon is made possible by a specific transcript system which captures not only what CT is doing but, most importantly, the synchronization of her gestures in the ongoing talk, thanks to a fine-grained transcript—enhanced by alignment software such as CLAN, ELAN and ANVIL (MONDADA, 2007b). [30]
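The kind of synchronization that a time-aligned transcript makes available can be illustrated with a minimal sketch. It assumes, purely for illustration, that talk and typing have each been annotated as (start, end) intervals in seconds on separate tiers, and that the intervals within one tier do not overlap each other. The names overlap and typing_during_talk are hypothetical; this is not the export format or API of CLAN, ELAN or ANVIL, and the values in the usage note are invented.

```python
from typing import List, Tuple

# One annotated stretch of activity: (start, end) in seconds.
Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Duration (in seconds) during which intervals a and b co-occur."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def typing_during_talk(talk: List[Interval], typing: List[Interval]) -> float:
    """Total time during which typing co-occurs with talk.

    Assumes intervals within each tier do not overlap one another,
    so that no stretch of time is counted twice.
    """
    return sum(overlap(t, k) for t in talk for k in typing)
```

For instance, with invented talk intervals [(0.0, 2.0), (3.0, 5.0)] and a single typing interval [(1.5, 3.5)], the function reports 1.0 second of typing that co-occurs with talk, while the remaining second of typing falls in the gap between the two turns.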

3.3 Video recording of calls

Video recording of phone calls has become more and more relevant as the mediating technology and the surrounding material environment have been investigated. [31]

First, technology itself has evolved, progressively integrating the visual dimension: videophone uses have been studied by DE FORNEL (1994), who demonstrates how co-participants mutually adjust their visual frames and how this affects the sequential organization of openings. Other video-conferencing devices have been studied by RELIEU (2007), BONU (2007) and MONDADA (2007c, 2007d), with the aims of describing specific adjustments of sequential formats to technologies and studying general features of interaction highlighted by the constraints and potentialities of these visual mediations. Other studies have focused on the effects of visual technological specificities on the sequentiality of talk: studying video monitors in offices and work spaces, LUFF and HEATH (1993) show that although mutual visual accessibility has been increased by this device, ordinary glances and gestural resources are not as effective in establishing mutual engagement and mutual awareness as they are in face-to-face co-present interaction. [32]

Second, more and more studies have addressed the question of the environment in which calls are made. This is particularly the case for ordinary mobile phone calls, for the spatial environment as it is both used by callers and questioned by the called (RELIEU, 2002; ARMINEN, 2002); however, this is also the case for other phone calls displaying a finely tuned coordination between the call and the surrounding activities (RELIEU, 2005). This coordination is also noticeable in professional contexts such as control rooms: calls are available to and even exhibited for colleagues to overhear them and collaboratively adjust their own ongoing and future work to the nature of the call (HEATH & LUFF, 1992); calls are also used to reorient current perspectives and to instruct how to view a problem to be solved at a distance (GOODWIN, 1996). [33]

Third, institutional and professional calls have been investigated within workplaces such as call centers: emergency calls, calls for help or customers' calls are part of the ordinary work of call takers, defined not only by the turn-by-turn interchange with callers but also by other concurrent activities, involving other technologies and artifacts. Call centers thus constitute "centers of coordination" (SUCHMAN, 1993) characterized by a complex environment, where various call takers often work side by side, and where they are not only connected with callers but also with other professionals who are dispatched to provide help or assistance. The work of call takers, that is securing an efficient and hearably competent call, can be described as an "improvisational choreography" (WHALEN, WHALEN & HENDERSON, 2002) which is both adjusted to the contingencies of the call and based on systematic arrangements of the technological practices co-occurring with it and sustaining it. Video data allow the work of call takers to be documented as involving the call as well as other, often computer-based, but also document-based, activities (WHALEN, 1995; BOWERS & MARTIN, 2000; LICOPPE & RELIEU, 2005). [34]

In this sense, calls are not autonomous events, clearly separated from other ongoing activities and contexts; on the contrary, they are events that are finely coordinated with other current events, instructing parties in how to see them and how to manage them, producing inferences, information, and, finally, requests that are consequential for their organization. In this way, the introduction of video to study phone conversations has a theoretical impact, allowing a larger activity system to be taken into consideration, comprised of various streams of parallel but not separate activities. [35]

In the text below, I offer evidence and analyses of these activities, thereby characterizing what I call multi-activity (MONDADA, 2006b, in press), and I describe the way in which parties resituate the call in a series of calls and in a stream of multi-layered actions. [36]

4. Multi-activities during Calls: Revisiting Boundaries of the Calls

To explore in more detail some possible payoffs of using videos of calls for analysis, I focus on the embeddedness of the incoming calls in the praxeological, spatial and material environment of the call taker. I explore three sequential positions, located at the opening, the closing and the middle of the call. This embeddedness is especially visible at—and actually questions—the boundaries of the call. I deal not only with the beginning but also the pre-beginning of the call, and, not only with the closing but also the post-closing of it. I also show that not only the boundaries of the call are made fuzzy, but also the core of the phone interaction, showing that it is permeated by other activities inserted into the call. All these examples deal with calls in a series and with multi-activity; the aim of the analysis is to show how they put into question the autonomy of the calls and that they focus on other units made relevant by call takers in their ordinary work. [37]

4.1 Between one call and another: Call's pre-beginning

ZIMMERMAN (1992, p.432) highlights various ways in which calls are not only characterized by a proper "opening" but also by a "pre-beginning": on the part of the caller, he refers to the act of dialing; on the part of the call taker, he points to the fact that the latter orients to the incoming call as a virtual emergency even when the caller is not able to speak or the connection is broken. There are several other ways in which call takers orient to pre-beginnings: for incoming calls, they can see the number displayed and possibly recognize previous callers; for outgoing calls, they can be seen to prepare the call, searching for and reading information on their PC screen. Moreover, an incoming call does not happen in a praxeological vacuum but often in the middle of another call or activity. What happens during the ringtone can be observed and documented on video recordings, which reveal the various ways in which a call is received even before the phone is picked up (cf. SCHEGLOFF, 2002, p.295 on the intrusive and interruptive character of incoming calls on the current activity). [38]

In the case analyzed, I focus on what happens during the ringtone. [39]

The following excerpt was recorded in a European call center situated in France that offers help to Spanish drivers experiencing various problems. In this excerpt, we join the action when the operator (OPE) closes a call (1–10) with a customer (C); after 9 seconds, she responds to another incoming call, coming from a colleague (SYL) (12–21):

Transcript 3 (ass 10126D1-50.23, audio) [40]

The audio recording displays the sequential features of the closing. It is characterized by a summary of the decision taken (1–3), followed by the pre-closing (5–6, 8–9), which is not used by any participant, either to recycle previous topics or to introduce new ones, and then by the final greetings (9–10) (cf. SCHEGLOFF & SACKS, 1973). After a lapse of 9 seconds (11), a new incoming call is taken, displaying the sequential features of the opening: the summons in the form of various ringtones (12), the answer, which consists of a pre-recorded message (13) and then the operator's turn, again with the identification of the service and the greetings (14). The incoming call is a second call from a professional whom the operator has just called. [41]

These sequential features are related both to generic aspects of the interaction's organization and to specific aspects of this institutional context (such as the pre-recorded message of the service, 13; the double identification by the operator as representative of the service and with her personal name; the puzzle this provokes for the caller, who is not a customer but a partner, visible in the long gap before her greetings, 15–16). In this sense, the record displays the institutional context of the call and its recognizability for different callers (customers, partners, colleagues, etc.). [42]

The audio record allows a detailed analysis of the interactions between operator and callers. It also displays the ways in which the operator and her co-participants refer to past actions and instruct future actions, showing the embeddedness of the call in a series of calls (BUTTON, 1991). The audio records also show the rhythm of the calls, characterizing the work of the operator—here there are 9 seconds between two calls; sometimes this interval is longer, sometimes it is shorter and incoming calls can be inserted in current calls. In this sense, the audio record—depending on the way it is technically realized and by whom—is generally not limited to isolated calls but includes a number of subsequent calls. This can provide a sense of the succession of calls that are characteristic of the operator's work. [43]

While the audio record offers relevant access to the interaction between operator and caller and their mutual hearable resources, it provides only limited access to the work of the operator, which cannot be reduced to talking on the phone. Video records allow analysis not only of the phone calls but of the work activities of which these calls are part, from the perspective of the operator. In this sense, video records achieve a shift of the analytical attention from phone calls to the operator's professional activity. [44]

Video data documenting the same excerpt, recorded by a camera focused on the operator's desk, provide for other observations:

Transcript 4 (ass 10126D1-50.23, video) [45]

Video recording allows documentation of the operator's work as a continuous flow of activities, without any time out. A few observations can be made. [46]

First, while on the audio recording the call is bounded by the end of the telephone connection, a careful observation of the video data shows that for the operator its closing is extended in various actions. The end of the connection is accomplished by the operator pressing the telephone button: the way she does this, with a significant emphatic gesture in trajectory and then by pressing insistently on the button, displays her retrospective stance towards the call. Moreover, this action is carried out with a particular facial expression, a smile, and is followed by a long sigh (12), both contributing to the expression of this stance. As well as being displayed, this stance is made explicit in the formulation following the sigh ("end of the story" 13, done while hitting the table with the palm), which retrospectively categorizes the case as long and complex. These actions constitute what I call a "post-closing" sequence, which is achieved by verbal, sound and mimo-gestural resources (expressing both the aspectual dimension of the achievement and the stance related to it—i.e. when she hits the table with her hand, 13, and rubs her hands, 14). This sequence is oriented to two different retrospective temporal and praxeological units, the call which has just been closed and the case which has been solved. In this latter sense, it refers to a range of activities and tasks which comprise the call but go beyond it, including reportings and registrations in apposite computer forms, Internet searches, calls to other partners, dispatch of messages and faxes, etc. The operator's bodily posture displays these two orientations: by leaning over the phone while giving instructions to the caller about what to do next with another branch of the enterprise, and initiating the closing of the call (1–10); then by standing up and looking away, projecting the end of the call before its actual closing, and already displaying the stance of relief corresponding with the closing of the case. [47]

Second, the video record shows that as soon as the post-closing of the call is achieved, the operator turns to something else, to other activities: she stands up, projecting some other action, and she is promptly addressed by a colleague, Vic, who asks for help (14, 16, 18). This opens up another dimension of the operator's work, which is not an isolated activity but which takes place in a complex open space, where other colleagues are also working. The fact that Vic speaks to her at that precise moment demonstrates how colleagues mutually monitor their respective work and how the end of the call is publicly recognized as opening up a moment of availability of the operator, i.e. a moment where it is relevant and adequate to ask her for help on another task. [48]

Third, as soon as the operator engages in resolving Vic's problem, the phone starts to ring again (19, 20)—being explicitly treated as an intrusion by the assessment of line 21. From that moment on, the operator's action unfolds as the telephone continues to ring: this does not suspend the action but accelerates it and projects its imminent suspension. Vic too orients to the normative constraint represented by the ringtone (see line 34). [49]

Video recordings permit the documentation of the continuity and embeddedness of these various courses of action beyond that of the phone call; their analysis relativizes the autonomy of the call—which is enhanced by the audio recording plugged directly into the telephone line (vs an audio recording done in the room, which would then miss the caller's speech). As a consequence, the video record allows for a detailed analysis of aspects which are otherwise left to an ethnographic description and which are then considered as constituting the "background" of the work (cf. the ethnographic description of ZIMMERMAN, 1992, pp.41-42 preceding the proper analysis of the calls). This "background" becomes a "foreground" when it is videotaped, allowing for a close inspection of the fine-grained temporal features of the activities in context. Again, this close inspection depends heavily on the recording device and its disposition: in the excerpt analyzed above, the interaction between the operator and Vic remains out of the camera angle, since the latter has been placed on a fixed tripod in a position which is focused on the operator's desk and does not encompass the entire room. This choice favors the activities of the operator and not the interactions she may have with her colleagues. As a result, if the post-closing phase is well documented and allows for a multimodal analysis of facial expressions, gestures, bodily postures, the exchange with Vic is not visible, out of the frame, and is only hearable, thanks to the camera's microphone. In this sense, it is important to remember that the camera itself—similar to any other recording device—does not constitute a transparent window upon the call center, but expresses interpretative choices of the researcher about the focus of the future analysis and the delimitation of action and background (cf. MACBETH, 1999; MONDADA, 2003, 2006a, 2007d). [50]

4.2 During the call: Activation of Internet searches

As seen above, operators' work is characterized by a constant flow of calls and by other activities between calls, which relativize the importance, the autonomy and the boundaries of calls. This is observable not only between one call and another, but also during the call itself, characterized by juxtaposed calls and activities within one single call, as is shown in the following excerpt. [51]

In this excerpt, the operator is dealing with the case of a Spanish tourist stranded in a village near Paris. She calls the French national train system (SNCF) to book a ticket from there and back to Spain. During the call, she also carries out an Internet search for a taxi to take the tourist and his family to the train station. The call and the search unfold in a parallel fashion and, for a while, independently from one another. During the call, a problem arises when the SNCF call taker cannot find the station mentioned by the operator, Aires sur Marne. He suspends the call in order to investigate further. At this point, the operator completes her previously initiated Internet search and discovers that the name of the village is actually Vaires sur Marne. At that point, the call and the search become interdependent, converging and providing mutual resources to resolve the problem. [52]

This excerpt highlights central features of multi-activity in call centers: parallel activities can be either autonomous or interrelated, and their status is not always given a priori and definitively. Instead, their status is acquired during the unfolding of these activities and thanks to their temporal and functional coordination. Therefore, detailed study of this coordination and of the temporal features of these simultaneous activities is central. [53]

We join the action as the operator (OPE) dials the train company number: the call is first responded to by a pre-recorded message (SNCF) and then by the call taker, Sébastien (SEB):

Transcript 5 (ass 0912-1.11.02 D1) [54]

The operator dials the number of the train company (1) and this is followed by one ringtone (2). The answer to this summons comes not from a human agent, but from the SNCF pre-recorded message (3). This response results in both another answer, by an agent, and a time lapse: this can end at any moment (in some systems, its length is announced and measured although this is not the case here) and is used by the operator to do something else. Between the ringtone and the automatic message, the attention of the operator switches from her papers—where she had taken notes about the case—to the screen (2–3); she initiates a PC-oriented activity as soon as the first word of the automatic message is uttered (3)—thus projecting a moment's wait. At the end of the first word, the operator moves her mouse to a page behind the current foregrounded page and clicks on a page in the background, the French yellow pages, which had been used during the previous call. This switch of attention and activation of a new screen environment launches a parallel activity to the waiting call, which is finely articulated to the waiting time: the operator scrolls the yellow page, stops on the form to be completed (at the bottom of the page), erases the information inserted in the previous search and inserts the new information ("TAXI," 5) in the slot where the searched professional activity has to be provided. Then she moves to the next slot, where a location has to be inserted: this move is complex, and relies on the notes she had previously written about the location of the customer (7); the previous location is highlighted, then erased (10) and completed with the new search ("AIRES SUR MARNE," 10). Interestingly, inserting this information fits perfectly with the remaining waiting time: the operator finishes writing exactly at the end of the ringtone (11), launching an imminent answer by an SNCF agent. [55]

At the beginning of this call, there are two activities: the opening of the call, which takes time, and the Internet search. When Sébastien answers, the operator turns her attention to the call—embodied by her glances at the phone. After she gives the location, Aires sur Marne, Sébastien has a problem finding the station in his computer system and asks more questions. Even if the operator turns her attention promptly back to the screen, the information she gives to the agent involves another search, on the basis of another source, which is not the web page but her notes, in front of the keyboard. Thus, her attention alternates between the screen, the phone and her notes. This alternation is precisely coordinated with the sequential organization of the call, the questions asked by the agent and the moments in which an action of the agent is launched, used by the operator to come back to her Internet search. [56]

These activities can be followed in the transcript of the call:

Transcript 6 (continuation of transcript 5) [57]

After the exchange of greetings, the operator's reason for calling is first articulated as a ticket order (15), then repaired (18) and reframed as a preliminary question about the existence of a station in the village of Aires sur Marne. This question projects the agent's answer, as well as the expectation that this answer will be the result of a computer search: the operator orients to the fact that it will take some time, by putting both hands on the keyboard during the following pause (19) and by switching her glance from the phone to the screen (20). This switch of attention ends up in her pressing the ENTER key (22), achieving the Internet search prepared before the beginning of the call. A new page appears immediately (22), but the operator does not have time to read it, since Sébastien does not answer her question but instead inserts an extra question about the village's name (24), asking for its spelling. [58]

The operator has already oriented to possible problems raised by the location, by offering an extension of her question (20) about the department, and by collaboratively achieving Sébastien's repetition of the first part of the name of the village (22), providing for the second part (25). [59]

When Sébastien asks for the spelling (24), she displays a double orientation to the multi-activity she is engaged in: her right hand is on the mouse (25), projecting further activities on the screen, but her attention goes to her written notes (27) while she prefaces and delays her answer (27). The spelling (29) is provided on the basis of reading her notes, as she bends over them and points to the written name with her left hand. As soon as she has completed the spelling, she looks back at the screen (29): once again, finishing her turn projects the other's next turn, and she uses the available time slot to return to the screen. She scrolls through the yellow page, as Sébastien continues to have problems in finding the location. [60]

Her use of the computer converges with the search for a solution to the problem raised in the call: she offers to give Sébastien the postal code and simultaneously initiates an Internet search. While Sébastien confidently communicates that Aires sur Marne does not exist on his system (34), she reads the postal code from her notes (37), erases the previous code on the Internet form and substitutes the new one (39). Both are engaged in computer searches for the resolution of the problem—and these searches are actually in parallel, as shown by the fact that Sébastien responds only minimally ("°°ouais°°") to the operator's offer (37), and skip-connects line 40 with his own previous turn, line 31. The operator on her side neglects Sébastien's turn and, by reading the postal code (37–38), skip-connects with her own offer (32). The operator is still focusing on the postal code of the village, while Sébastien is trying to identify the closest station in the nearby region (40), making the assumption that Aires does not have a station. At this stage, neither of them is questioning the existence or the name of the village—this being a case of "documentary method of interpretation" (GARFINKEL, 1967)—and both of them are engaged in parallel in the search for a solution. [61]

At this point, no solution has been formulated; Sébastien suspends the call to continue his searches offline, and in this way accepts responsibility for solving the problem alone, no longer engaging in a collaborative search with his partner. On her part, the operator continues her own Internet activity during this waiting interval and promptly finds the solution:

Transcript 7 (continuation of transcript 6) [62]

During the following 6 seconds (44), she performs a series of operations on the Internet site: a new page appears and while scrolling and reading it she identifies the problem: the name of the village is Vaires and not Aires (45). Her "discovery" takes the verbal form of a change-of-state token (HERITAGE, 1984), followed by the pronunciation of the correct name with an emphasis on the first syllable, followed by an apology to the absent party, some laughter and a formulation of what happened as her mistake. Her discovery is embodied too, since she withdraws from the computer, opens her mouth and shakes her hand. The mistake is corrected at its source, the written notes; the phonetic emphasis on the first syllable ("vaires" 45, 49) corresponds to the repeated marking of the V on her notes. [63]

This discovery will be communicated to Sébastien as soon as he comes back to her without having found the solution—in the meantime the operator answers other incoming calls and still does not have time to complete her search and call for a taxi. [64]

This excerpt reveals various multi-activity regimes and their potential alternation: the Internet search begins as a parallel autonomous activity, disconnected from the call, and ends as a search converging with the call's objectives, in service of the call. [65]

Moreover, the excerpt displays some forms and effects of the fragmentation of the activities in multi-activity: parallel activities are discontinuous, being suspended, retrieved; their temporality depends on what is going on in the "main" publicly-displayed activity, i.e. the talk. [66]

Parallel activities—even when autonomous—are intertwined in the sense that the organization of the sequentiality of the main activity constitutes a kind of guideline, requesting more or less attention. These alternate states of attention are related to normative expectations depending on sequential organization and on projections: high attention seems to coincide with moments where the speaker is selected and speaks, low attention coincides with the projection of an expected action from the partner, with moments where this action is expected although delayed. The implicatedness of sequential organization is active even for parallel activities that are autonomous from it; it is even more active when simultaneous activities are initiated in service of the ongoing call. [67]

4.3 Post-closings

Multiple simultaneous activities are not always to the advantage of the same goal, but can converge; they can be simultaneous without being related, but can also be closely articulated although not sharing the same temporality. [68]

In the last case analyzed, the closing of the call does not coincide with the closing of the Internet activity which was initiated during the call, in a way that is sensitive to its sequential organization and to the emerging problems. The excerpt is taken from an outgoing call initiated by an operator searching for a taxi that could drive a family back to Spain. This ride seems problematic for the taxi driver as early as the initial formulation of the offer:

Transcript 8 (ass 1012 E1-D1 / 8.45) [69]

The operator's offer (01) projects, but is not followed by a second pair part (acceptance or refusal); what comes next is a significant lapse (02) and then an inserted question about the location (03), which formulates the destination in a rather general way (the country) and with a tone of surprise. Other questions about the localization follow, further delaying the second pair part. [70]

In the context of increasing reluctance on the part of the taxi driver, the operator begins to change her body orientation, moving from the phone, on which she is leaning, towards her PC, mouse and screen:

Transcript 9 (later on, within the same call as transcript 8) [71]

In the context in which the taxi driver is not accepting the fare to Spain that she is proposing, the operator initiates an Internet search, using a website providing maps and itineraries (http://www.mappy.fr/). This search begins when she again formulates the destination ("to: (.) near San Sebastian" 1). This place formulation is produced with a self-repair: "to:" projects the name of the town, but it is repaired with "near," which projects a more general landmark. This self-repair aligns with the level of reference adopted by the taxi driver himself, who has been using landmarks like "in Spain" and "near the border," but has never referred to a particular town or place. Place formulations are recipient-designed in the context of a negotiation—in this case, accepting or refusing the fare (SCHEGLOFF, 1972). [72]

The very fact that in this context the operator turns to her computer and initiates an Internet search is both retrospectively sensitive to the difficulties emerging during the call about the final destination, and prospectively sensitive to an imminent refusal, transpiring from the constant delay of the taxi driver's answer. Turning to her computer, the operator introduces an extra temporal dimension into her work: the activation of her screen, the transition from the current window to Internet Explorer, the selection of the www.mappy.fr web site, which is typed and not bookmarked, and the typing of the request, up to the appearance of the requested map, are operations which not only take time but are also embedded within other talking activities. Thus, the temporality of the Internet search—which is in the service of the ongoing call—is both concurrent with and distinct from the time of the unfolding talk. The various steps of the Internet search do not fit with the sequences of talk and the search will eventually be achieved after the end of the call. In fact, the call is closed while the operator is pointing her mouse at the field "starting point" of the mappy itinerary, without having yet typed the name of the target place (23ff). [73]

Turning to her PC and engaging in another activity on the screen means a split of attention, favored by the long pauses delaying the taxi driver's answers. Before the excerpt analyzed here, the operator had already initiated another parallel activity—beginning to write the report about the ongoing call—during a suspension introduced by the taxi driver asking for a moment in order to look at a map. At the beginning of the excerpt, she turns to the computer while she formulates the destination of the fare. She engages in an activity which absorbs her attention during the pause (2) preceding the taxi driver's answer—which offers a first account for not accepting the fare (3)—and during the pauses delaying her own answers (5, 8): the latter are configured not only by the sequential organization of talk, but also by this parallel computer activity. In line 6, she initiates a possible repair of the term "outillage," but is overlapped by the taxi driver expanding his account (7); she visibly turns to the phone in line 8, before producing an objection in line 9, not attending to the PC until line 10 (see Figures A and B).

Figure A: Operator looking at the screen, Figure B: Operator looking at the phone [74]

In lines 11–12, during the taxi driver's explanation, the operator looks at her screen and completes typing the website address. When she replies with "yes::" (14), she presses the "ENTER" key, still looking at the screen. Her attention switches from the screen to the phone as the taxi driver produces what increasingly resembles a refusal, in the form of describing the impossibility of crossing the border (16): this announcement refocuses the operator's attention on the conversation—as materialized in the form of the telephone. Thus, when the operator answers with a change-of-state token "oh" and a repetition of the modal verb in the negative form (18), she no longer looks at the screen, and continues not to look at it until the confirmation by the taxi driver (20) that he is unable to take the fare. After this refusal has become clear and a last account is proposed, the operator's attention turns again to her mouse and screen, as well as to her documents, projecting the search for an alternative solution. [75]

These analytical details allow for a fine-grained description of the complexity of the operator's work, characterized by her multi-activity. Multi-activity refers to the simultaneous management of the conversation and of the Internet search, i.e. of two activities which have different temporalities, which unfold in a parallel manner, but which nevertheless are finely attuned to one another. Not only is the Internet search initiated at a precise sequential moment within the conversation, but the way in which it unfolds registers and recycles details made relevant in the course of the conversation, such as selecting the information to search for. The temporality of the conversation can be accelerated or slowed down (as in the flat prosody and low pace of the continuer "oui::" 14) to cope with the rhythm of the Internet search (and the power of the computer); inversely, the Internet search can either be momentarily disregarded as attention is entirely focused on the conversation, or managed in a parallel manner. [76]

More radically, multi-activity allows us to consider the non-autonomy of the phone conversation, whose development and outcome are intertwined with other courses of action and other artifacts in the work environment. This non-autonomy is also manifested in what happens after the phone call, where activity is expanded after the closing (cf. 4.1 supra), while the Internet search is still going on: the search on Mappy is neither finished nor abandoned at the end of the call, but is continued. This invites us to look at the post-closing environment of the call:

Transcript 10 (continuation of the activity after the closing of the call; operator's name is Elena [cf. line 68]; Lea and Sara are colleagues working in the same open space) [77]

This last excerpt documents the continuation of the operator's activity, after the call has been closed. Thus, the end of the call does not coincide with the end of the activity: as soon as the phone connection is closed, a critical and ironic comment recycles an element of the previous conversation—the account motivating the refusal (31)—and is followed by laughter (33). After this post-closing comment (cf. excerpt 4 above), the operator formulates her orientation to the pursuit of the activity (35), literally formulating the "what's next" principle on which sequentiality is based (SCHEGLOFF & SACKS, 1973). In response to this question, the operator refocuses her attention on the screen and pursues her search on Mappy (37). She first types the departure point (39–47), with various self-repairs (both oral and written), which cannot be analyzed here in detail, and then the arrival point (49). [78]

This computer activity is again embedded in a multi-activity, not related to a phone conversation but to an exchange with a nearby colleague, Lea, who occupies the next desk and is involved in file backup (50ff) (cf. Excerpt 4 above for a similar situation). Lea formulates her current activity in the form of an announcement (52), whose repair is initiated by the operator after a lapse (54) and received by her with a change-of-state token (56) preceding a series of assessments (55–59). Lea makes her announcement (50) at a moment which orients to the fact that Elena is no longer talking on the phone, treating her as being available. Interestingly, this shows how colleagues orient to silences following calls—and not to the fact that the operator is pursuing her activity, publicly displayed by typing on the keyboard and talking aloud or mumbling. This activity seems not to be taken into account by her colleague, and is not treated as an obstacle to their conversation. This takes place in a situation of "continuing state of incipient talk" (GOFFMAN, 1963; SCHEGLOFF & SACKS, 1973), typical of open-plan workplaces, which favor various types of multi-activity (such as working while chatting with a colleague, helping somebody while continuing to work on the computer, etc.). [79]

As the exchange with Lea goes on, the operator continues her search: she carries on looking at her PC, while the requested map progressively appears on the screen (59–67). Her discovery of the result is publicly displayed in a noticing, introduced by a change-of-state token ("oh yes:, it's four hundred kilometers eh," 67). This manifests a change in her geographical knowledge, formulated by the "ah," quantification of the distance and highlighting the itinerary with the mouse (Figure C).

Figure C: Mappy web site [80]

This announcement is not oriented to by the other colleagues co-present, who continue to attend to their own work. Sara, another colleague, announces she has found a solution to a problem that had previously arisen (68–69) and engages in an exchange with Elena (70–87), closed by her "okay" (87). This exchange suspends Elena's activities on the screen; as soon as it is closed, her attention comes back to the same web page which is scrolled while she formulates her next question ("wait a minute, what are they telling me about the distance" 88): this produces a new discovery and a new expression of surprise ("ah oui:::" 90), followed by an evaluation ("four hundred kilometers, it's huge." 92) uttered while she again picks up the preceding call's report. The "discovery" of the exact mileage proposed to the taxi driver retrospectively occasions a reinterpretation of his refusal—and this aspect is recycled as an explicative account when the customer calls back later on. Thus, the post-call achievement in the search for the exact geographical information about the itinerary and the distance enables both a re-evaluation of the output of the previous call and the organization of the next calls, leading to the resolution of the problem. [81]

5. Conclusion: Beyond the Unity and Autonomy of the Call

In this paper, I have outlined the rich tradition of telephone call analysis and its recent developments taking into account new technological mediations as well as new recording devices. The aim of the analyses pursued here is to contribute to this ongoing research, identifying the potentialities and difficulties of using video to explore a domain that has been studied for a long time on the basis of audio data. [82]

The analyses developed in this paper have focused on three key sequential positions within the overall structure of calls: the pre-beginning, the middle and the post-closing of the call. For these positions, I have analyzed various ways in which activities related to the call develop beyond the call itself: between calls, and before the opening of the subsequent one, the operator engages in other activities, which can be interrupted, suspended, accelerated or perturbed by the incoming ringtones (4.1); during calls, either the operator or another co-participant can initiate suspension of the call and engage in other activities, autonomous or in the service of the task pursued in the call (4.2); after the end of calls, activities related to the call are pursued and produce a retrospective revision of what happened in the call as well as a prospective reorientation of the next activities (4.3). [83]

These analyses are made possible by video documentation of the activity in the call center: video data contribute less to the analysis of the interactions between caller and operator, which are organized on the basis of mutually available hearable resources, than to the analysis of the activities of the operator, which comprise the call but are not reducible to it. Thus, video shifts the analytical focus from talk-in-interaction within the call to talk-and-other-conduct within the workplace activities. [84]

This new focus of analytical interest deals with topics that cannot be studied without video recordings, such as multi-activity. Multi-activity emerges as a pervasive feature of contemporary workplaces, characterized by simultaneous flows of action which:

  • can be either independent one from another or one in the service of another;

  • are temporally related in a fine-grained way although involving different paces, different time units, different temporal organizations;

  • can be distributed in a dynamic way between a "main" and a "secondary" activity, between a "background" and a "foreground";

  • can be sensitive to the sequential relevancies of the foregrounded talk, although having different sequential implications and different temporal boundaries. [85]

In order to account for these activities, their unfolding, their coordination, their particular modes of intertwinement, a detailed transcript is necessary, taking into consideration various temporal organizations and synchronizations of different actions. [86]

In this context, the phone call, as seen from the operator's point of view, and from the perspective of the achievement of her work activity, is questioned in its autonomy and unity:

  • the call's boundaries are expanded, showing the importance of the call's pre-beginnings (where contents, tasks, problems and the caller's identity are possibly guessed and projected) and of the call's post-closings (where retrospective assessments, accounts, stances and interpretations are expressed, where the activity initiated within the call is pursued and expanded, and where other activities are prospectively launched);

  • calls are characterized by a pervasive multi-activity;

  • calls are not linearly distributed: operators are often faced with several incoming calls at the same time and have to suspend, prioritize and juxtapose them;

  • calls are not isolated and acquire their sense from being part of a flow of calls-in-a-series, comprising follow-up calls, second calls, people calling back, new calls arising from previous ones, etc.;

  • from the perspective of the operator, the relevant unit of her working activity is the "case" to be handled and solved, made up of different calls and other activities, rather than the single call;

  • in this sense, the operator is not just a "call taker" but also a "dispatcher," a "partner," a "customer" of other institutions and services, and manages different tasks, with different co-participants, for the resolution of the "case"—acquiring different membership categories, related to different category-bound activities (SACKS, 1972). [87]

While ethnography can produce a sense of the complexity of these multiple activities, video is the only way of documenting them for a sequential analysis accounting for the temporally fine-grained coordination between the mobilization of multimodal resources (talk, facial expressions, gestures, glances, bodily postures, object manipulations, etc.), the timed use of artefacts and technologies, the constant rearrangement of participant frameworks and the changing foci of attention. [88]


Acknowledgments

This paper is written in the context of the project ICA, "Interactions in call centers/Interactions dans des centres d'appel", focused on the analysis of video-recorded activities in call centers. I thank all operators for their collaboration, as well as Isabel COLON, Clémentine HUGOL-GENTIAL, Camila Diana AMAYA, Vicky MARKAKI and Sara MERLINO, who contributed to the project. The support of PHC / Galileo Program 2008 is gratefully acknowledged.

Appendix 1: Transcript Conventions

For talk, data have been transcribed according to conventions developed by Gail JEFFERSON.

[ ]          overlapping talk
(.)          micro pause
(0.0)        timed pause
:            extension of the sound or the syllable it follows
.            stopping fall in tone
,            continuing intonation
?            rising inflection
CAPITALS     louder voice
° °          quieter fragment than its surrounding talk
h            out breath
(( ))        described phenomena
< >          delimitation of described phenomena
( )          string of talk for which no audio could be achieved

An indicative translation is provided line by line; its primary aim is to help with reading the original transcript.

Multimodal details have been transcribed according to the following conventions:

actions are described in the following line, in italics, and are synchronized with talk thanks to a series of landmarks:

* *          delimit the descriptions of one participant's actions
+ +          delimit the descriptions of another participant's actions
| |          delimit the descriptions of changes on the screen
*--->        gesture or action described continues across subsequent lines
--->>        gesture or action described continues until and after the excerpt's end
--->*        gesture or action described continues until the same symbol is reached
>>           gesture or action described begins before the excerpt's beginning
....         gesture's preparation
----         gesture's apex is reached and maintained
,,,,,        gesture's retraction

The participant doing the gesture is identified when (s)he is not the current speaker. A separate line describes what appears on the PC's screen. A specific sign indicates the exact point at which a screen shot (figure) has been taken, showing its position within the turn at talk.


Notes

1) The fact that participants can orient to the camera and to the very fact that they are being recorded is not considered as a "bias," but as a phenomenon which can be treated as a topic of analysis; it is not considered as a general feature that can be imagined, but a local feature that can be empirically identified and described (cf. HEATH, 1986; LOMAX & CASEY, 1998).

2) This reference to "natural" is deeply rooted in SACKS's references to early natural sciences and to the possibility of natural observations and natural descriptions of social life (LYNCH & BOGEN, 1994). However, this point has often been misunderstood and made an object of debate (see SPEER, 2002): as noted by LYNCH (2002), the term is not used in order to oppose "social" vs. "natural" conducts, but refers more to the distinction between "natural" vs. "artificial" language and to what SCHUTZ and phenomenology call the "natural attitude" (1962), i.e. a pre-reflexive posture that characterizes ordinary life practices. Thus, "natural" refers, prior to the collected data, to the practices themselves: the term used by GARFINKEL, "'naturally organized', in this context means an ordering of activity that is spontaneous, local, autochthonous, temporal, embodied, endogenously produced and performed as a matter of course" (LYNCH, 2002, p.534).

3) On the one hand, HUTCHBY and BARNETT (2005) relativize the impact of technologies on interactional structure, showing the permanence of interactional practices that emerged with other, older, technological devices. On the other hand, ARMINEN and LEINONEN (2006) show that mobile phone conversations exploit features of the technology and adjust their sequential format to their "affordances" (defined as opportunities offered by technology to action).

4) Each transcript is an excerpt from longer audio and video files; the name of the corpus/of the tape is indicated, permitting a precise identification of the fragment, as well as the temporal location of the excerpt within the complete file.


References

Arminen, Ilkka (2002). Emergentes, divergentes? Les cultures mobiles. Réseaux, 20(112-113), 79-106.

Arminen, Ilkka (2005). Sequential order and sequence structure: The case of incommensurable studies on mobile phone calls. Discourse Studies, 7, 649-662.

Arminen, Ilkka & Leinonen, Minna (2006). Mobile phone call openings—Tailoring answers to personalized summons. Discourse Studies, 8(3), 339-368.

Birdwhistell, Ray L. (1970). Kinesics and context: Essays on body motion communication. New York: Ballantine Books.

Bonu, Bruno (2007). Connexion continue et interaction ouverte en réunion visiophonique. Réseaux, 144, 25-57.

Bowers, John & Martin, David (2000). Machinery in the new factories: Interaction and technology in a bank's telephone call center. In Wendy Kellogg & Steve Whittaker (Eds.), Proceedings of the 2000 ACM conference on Computer supported cooperative work (pp.49-58). Philadelphia: ACM.

Button, Graham (1991). Conversation-in-a-series. In Deirdre Boden & Don H. Zimmerman (Eds.), Talk and social structure: Studies in ethnomethodology and conversation analysis (pp.251-77). Cambridge: Polity.

Fele, Giolo (2007). La communication dans l'urgence. Les appels au secours téléphoniques. Revue Française de Linguistique Appliquée, XI(2), 33-51.

Fornel, Michel de (1994). Le cadre interactionnel de l'échange visiophonique. Réseaux, 64, 107-132.

Garfinkel, Harold (1967). Studies in ethnomethodology. Englewood Cliffs, N.J.: Prentice-Hall.

Garfinkel, Harold (1996). An overview of ethnomethodology's program. Social Psychology Quarterly, 59(1), 5-21.

Goffman, Erving (1963). Behavior in public places: Notes on the social organization of gatherings. New York: Free Press.

Goodwin, Charles (1981). Conversational organization: Interaction between speakers and hearers. New York: Academic Press.

Goodwin, Charles (1996). Transparent vision. In Elinor Ochs, Emanuel A. Schegloff & Sandra A. Thompson (Eds.), Interaction and grammar (pp.370-404). Cambridge: Cambridge University Press.

Goodwin, Charles (2000a). Action and embodiment within situated human interaction. Journal of Pragmatics, 32, 1489-522.

Goodwin, Charles (2000b). Practices of color classification. Mind, Culture and Activity, 7(1-2), 19-36.

Hayashi, Makoto (2005). Joint turn construction through language and the body: Notes on embodiment in coordinated participation in situated activities. Semiotica, 156(1/4), 21-53.

Heath, Christian (1986). Body movement and speech in medical interaction. Cambridge: Cambridge University Press.

Heath, Christian & Luff, Paul (1992). Collaboration and control: Crisis management and multimedia technology in London Underground Line Control Rooms. Journal of Computer Supported Cooperative Work, 1(1-2), 69-94.

Heath, Christian & Luff, Paul (2000). Technology in action. Cambridge: Cambridge University Press.

Heritage, John C. (1984). A change-of-state token and aspects of its sequential placement. In John Maxwell Atkinson & John Heritage (Eds.), Structures of social action (pp.299-345). Cambridge: Cambridge University Press.

Hopper, Robert (1992). Telephone conversation. Bloomington: Indiana University Press.

Hutchby, Ian (2001). Conversation and technology: From the telephone to the Internet. Cambridge: Polity Press.

Hutchby, Ian & Barnett, S. (2005). Aspects of the sequential organization of mobile phone conversation. Discourse Studies, 7(2), 147-171.

Kendon, Adam (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22-63.

Kendon, Adam (1990). Conducting interaction. Patterns of behavior in focused encounters. Cambridge: Cambridge University Press.

Licoppe, Christian & Relieu, Marc (2005). Entre système et conversation. Une approche située de la compétence des téléopérateurs dans les services d'assistance technique. In Emmanuel Kessous & Jean-Luc Metzger (Eds.), Le travail avec les technologies de l'information (pp.177-199). Paris: Hermès.

Lomax, Helen & Casey, Neil (1998). Recording social life: Reflexivity and video methodology. Sociological Research Online, 3(2), http://www.socresonline.org.uk/3/2/1.html [August 12, 2008].

Luff, Paul & Heath, Christian (1993). System use and social organisation: observations on human-computer interaction in an architectural practice. In Graham Button (Ed.), Technology in working order: studies of work, interaction and technology (pp.184-210). London: Routledge.

Luff, Paul; Hindmarsh, Jon & Heath, Christian (Eds.) (2000). Workplace studies. Recovering work practice and informing system design. Cambridge: Cambridge University Press.

Lynch, Michael (2002). From naturally occurring data to naturally organized ordinary activities: comment on Speer. Discourse Studies, 4, 531-37.

Lynch, Michael & Bogen, David (1994). Harvey Sacks's primitive natural science. Theory, Culture & Society, 11(4), 65-104.

Macbeth, Douglas (1999). Glances, trances, and their relevance for a visual sociology. In Paul L. Jalbert (Ed.), Media studies: Ethnomethodological approaches (pp.135-170). Lanham: University Press of America.

McQuown, Norman A. (1957). Linguistic transcription and specification of psychiatric interview materials. Psychiatry, 20(1), 79-86.

McQuown, Norman A. (Ed.) (1971). The natural history of an interview. Chicago: University of Chicago Library. [Microfilm collection of manuscripts on cultural anthropology, 95(xv)]

Mondada, Lorenza (2006a). Video recording as the reflexive preservation-configuration of phenomenal features for analysis. In Hubert Knoblauch, Bernt Schnettler, Jürgen Raab & Hans-Georg Soeffner (Eds.), Video analysis. Methodology and methods. Qualitative audiovisual data analysis (pp.51-68). Bern: Lang.

Mondada, Lorenza (2006b). Multiactivité, multimodalité et séquentialité : l'initiation de cours d'action parallèles en contexte scolaire. In Marie-Cecile Guernier, Viviane Durand-Guerrier & Jean-Pierre Sautot (Eds.), Interactions verbales, didactiques et apprentissage (pp.45-72). Besançon: Presses Universitaires de Franche Comté.

Mondada, Lorenza (2007a). Multimodal resources for turn-taking: Pointing and the emergence of possible next speakers. Discourse Studies, 9(2), 195-226.

Mondada, Lorenza (2007b). Transcript variations and the indexicality of transcribing practices. Discourse Studies, 9(6), 809-821.

Mondada, Lorenza (2007c). L'imbrication de la technologie et de l'ordre interactionnel. L'organisation de vérifications et d'identifications de problèmes pendant la visioconférence. Réseaux, 144, 141-182.

Mondada, Lorenza (2007d). Operating together through videoconference: Members' procedures accomplishing a common space of action. In Stephen Hester & David Francis (Eds.), Orders of ordinary action (pp.51-68). Aldershot: Ashgate.

Mondada, Lorenza (in press). The systematic organization of concurrent courses of action within multi-activity. In Charles Goodwin, Curtis LeBaron & Jürgen Streeck (Eds.), Multimodality. Cambridge: Cambridge University Press.

Relieu, Marc (2002). Ouvrir la boîte noire: identification et localisation dans les conversations mobiles. Réseaux, 112-113, 19-48.

Relieu, Marc (2005). Les usages des TIC en situation naturelle : une approche ethnométhodologique de l'hybridation des espaces d'activité. Intellectica, 41-42(2-3), 139-162.

Relieu, Marc (2007). La téléprésence, ou l'autre visiophonie. Réseaux, 144, 183-223.

Sacks, Harvey (1963). Sociological description. Berkeley Journal of Sociology, 8, 1-16.

Sacks, Harvey (1966). The search for help: no one to turn to. Unpublished PhD dissertation, University of California, Berkeley.

Sacks, Harvey (1972). An initial investigation of the usability of conversational data for doing sociology. In David Sudnow (Ed.), Studies in social interaction (pp.31-74). New York: Free Press.

Sacks, Harvey (1984). Notes on methodology. In John Maxwell Atkinson & John Heritage (Eds.), Structures of social action (pp.21-27). Cambridge: Cambridge University Press.

Sacks, Harvey (1992). Lectures on conversation. Vol. I+II. Oxford: Blackwell.

Sacks, Harvey & Schegloff, Emanuel A. (1971/2002). Home position. Gesture, 2, 133-146.

Scheflen, Albert E. (1972). Body language and social order: Communications as behavioral control. Englewood Cliffs: Prentice Hall.

Schegloff, Emanuel A. (1967). The first five seconds: The order of conversational openings. Unpublished PhD dissertation, University of California, Berkeley.

Schegloff, Emanuel A. (1968). Sequencing in conversational openings. American Anthropologist, 70, 1075-1095.

Schegloff, Emanuel A. (1972). Notes on a conversational practice: Formulating place. In David Sudnow (Ed.), Studies in social interaction (pp.75-119). New York: Free Press.

Schegloff, Emanuel A. (1979). Identification and recognition in telephone openings. In George Psathas (Ed.), Everyday language (pp.23-78). New York: Erlbaum.

Schegloff, Emanuel A. (1984). On some gestures' relation to talk. In John Maxwell Atkinson & John Heritage (Eds.), Structures of social action (pp.266-296). Cambridge: Cambridge University Press.

Schegloff, Emanuel A. (1986). The routine as achievement. Human Studies, 9, 111-52.

Schegloff, Emanuel A. (1992). Introduction. In Gail Jefferson (Ed.), Sacks, H. Lectures on conversation, vol. I (pp.ix-lxv). Oxford: Blackwell.

Schegloff, Emanuel A. (2002). Beginnings in the telephone. In James E. Katz & Mark Aakhus (Eds.), Perpetual contact: mobile communication, private talk, public performance (pp.284-300). Cambridge: Cambridge University Press.

Schegloff, Emanuel A. (2007). Sequence organization in interaction: A primer in conversation analysis. Vol. 1. Cambridge: Cambridge University Press.

Schegloff, Emanuel A. & Sacks, Harvey (1973). Opening up closings. Semiotica, 8(4), 289-327.

Schutz, Alfred (1962). Collected Papers, Vol. 1: The Problem of Social Reality. The Hague: Martinus Nijhoff.

Speer, Susan A. (2002). "Natural" and "contrived" data: A sustainable distinction? Discourse Studies, 4, 511-525.

Stivers, Tanya & Sidnell, Jack (2005). Multi-modal interaction. Semiotica, 156(1/4), 1-20.

Streeck, Jürgen (1993). Gesture as communication I: its coordination with gaze and speech. Communication Monographs, 60, 275-299.

Streeck, Jürgen (1996). How to do things with things: Objets trouvés and symbolization. Human Studies, 19(4), 365-384.

Suchman, Lucy (1993). Technologies of accountability: Of lizards and airplanes. In Graham Button (Ed.), Technology in working order: Studies of work, interaction and technology (pp.113-126). London: Routledge.

Suchman, Lucy (1996). Constituting shared workspaces. In David Middleton & Yrjö Engeström (Eds.), Cognition and communication at work (pp.35-60). Cambridge: Cambridge University Press.

Wakin, Michele A. & Zimmerman, Don H. (1999). Reduction and specialization in emergency calls. Research on Language and Social Interaction, 32(4), 409-437.

Whalen, Jack (1995). A technology of order production: computer-aided dispatch in public safety communications. In Paul ten Have & George Psathas (Eds.), Situated order (pp.187-230). Washington: University Press of America.

Whalen, Marilyn & Zimmerman, Don H. (1987). Sequential and institutional contexts in calls for help. Social Psychology Quarterly, 50, 172-185.

Whalen, Jack; Whalen, Marilyn & Henderson, Kathryn (2002). Improvisational choreography in teleservice work. British Journal of Sociology, 53(2), 239-258.

Zimmerman, Don H. (1992). The interactional organization of calls for emergency assistance. In Paul Drew & John Heritage (Eds.), Talk at work (pp.418-469). Cambridge: Cambridge University Press.


Lorenza MONDADA, ICAR research lab (CNRS & University of Lyon). Lorenza MONDADA is Professor at the Department for Linguistics, University of Lyon 2, and Director of the ICAR research Lab (CNRS). Her research deals with the organization of grammatical and multimodal practices in interaction. Current research is carried out on video-recordings from various institutional and professional settings, and on ordinary conversations, focusing on the ways in which participants sequentially and multimodally organize their (often multiple) courses of action. Recent publications include: (2003). Working with video. Visual Studies, 18(1), 58-72; (2004). Ways of doing "being plurilingual" in international work meetings. In Rod Gardner & Johannes Wagner (Eds.), Second language conversations. London: Continuum; (2007). Multimodal resources for turn-taking: Pointing and the emergence of possible next speakers. Discourse Studies, 9(2), 195-226; (2007). Operating together through videoconference. In Stephen Hester & David Francis (Eds.), Orders of ordinary action (pp.51-67). Aldershot: Ashgate; (2007). Bilingualism and the analysis of talk at work. In Monica Heller (Ed.), Bilingualism (pp.297-318). New York: Palgrave.


Lorenza Mondada

F-69342 Lyon Cedex

E-mail: lorenza.mondada@univ-lyon2.fr
URL: http://icar.univ-lyon2.fr/membres/lorenza/


Mondada, Lorenza (2008). Using Video for a Sequential and Multimodal Analysis of Social Interaction: Videotaping Institutional Telephone Calls [88 paragraphs]. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 9(3), Art. 39, http://nbn-resolving.de/urn:nbn:de:0114-fqs0803390.

Revised 10/2008

Copyright (c) 2008 Lorenza Mondada

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.