1 Introduction

Face-to-face conversation between humans implies a moment-by-moment organization of turns at talk, artifact use and body postures that provide for the accountability of what is going on, what has been done, and what one could expect to be done. By ‘accountability’ we mean that social practices carry their own intelligibility as they occur in the here-and-now of activities. Analysts can rely on this situated intelligibility to describe and analyze how social activities are organized [17]. Social activity is structured as an emergent product of interrelations between the sequential organization of talk, gestures, cognition and objects from the environment [22, 34, 43]. Social activity therefore forms a system and becomes a unit for analysts because it is accountably treated as one by the social actors in the first place. Following Charles Goodwin’s definition of symbiosis [22], we will consider that through social interaction, humans organize ‘wholes’ that are both different from and greater than their parts, and that are constructed through the mutual interdependence of unlike elements.

There are many ways to describe the social ballet that is set in motion each time people are co-present, and that could inspire research in Human-Machine Interaction. In this paper we present data collected from an experiment in which epistemic balance was at stake. In the frame of a musical quiz, we created situations where the robot would, on delimited occasions, step into a person’s epistemic territory. The intended encroachment was triggered by a reflexive utterance produced by the robot during the interaction, that is, a turn-at-talk that reflects traces of the person’s own activity [4]. The reflexive turns relied on the participants’ physiological signals, measured with a connected Empatica E4 wristband [11, 16]. Recipients regularly responded to this epistemic encroachment. A close analysis of the sequential environment ‘Reflexive Turn - Response’ will lead us to account for a collection of resources used by the participants to create a familiar social solidarity/synergy.

2 CA, Quantified-Self and HRI

Drawing on human-human (or animal) interactions to support the design of conversational agents or robots is backed by a robust literature [5, 10, 13, 14, 28]. With the development of social robots, or socially interactive robots [13], concepts drawn from sociology such as connection, co-presence and cooperation figure in definitions of engagement as ways to describe commitment or partnership in Human-Machine/Robot interactions (HRI, HCI) [36]. However, such concepts are rarely scrutinized in their practical and sequential achievement. For the few researchers in HRI or HCI who have used it, Conversation Analysis (CA), a descriptive and naturalistic approach that uses interaction itself as a resource for analysis [45], provides analytical and methodological tools either to account for the moment-by-moment accomplishment of situated Human-Machine interactions [40, 41] or to design systems and thereby anticipate further interactions [37]. The present work contributes to this approach: it aims to account for the sociality of robots as a practical accomplishment and therefore proposes an interactionist perspective on symbiosis. We assumed that giving the robot access to something belonging to an individual’s personal territory, such as physiological data, could be examined as something that helps, or somehow affects, the relation between this human and that robot.

The concepts of Quantified Self and self-monitoring are new to the field of HRI, and we are not aware of any experiment in which they have been used in human-robot interaction. There are, however, several studies in human-computer interaction where the concept of Quantified Self has been employed. Li, Dey and Forlizzi [31] discuss how we can develop tools for analyzing the data that we collect about ourselves and how we can better perform self-reflection using this data. Human-computer interaction researchers have applied the concepts of Quantified Self and self-tracking to diverse domains: electricity consumption [39], transportation habits [15], eating habits [46], and exercise habits [6]. As the use of smartphones and wearable devices continues to increase, new self-monitoring uses are emerging for these devices, such as measuring physical activity [12, 48], tracking sleep patterns [1] and even following changes in mood over time [50], some of which have become commercial successes. The E4 sensor and its predecessors have been used in a variety of experiments. Pieper and Laugero [38] used features collected with the Q sensor (the predecessor of the E4) in a study on preschool children and their emotional eating habits. Hernandez et al. [27] used the E4 to study stress during driving. In our experiment, we used only three of the E4’s measures (galvanic skin response [3, 7], pulse rate and peripheral skin temperature) and two derived measures (the slope of the galvanic skin response and the change in pulse rate).
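The two derived measures can be illustrated with a minimal sketch. The paper does not state how the slope or the change in pulse rate was computed, so the least-squares fit over a fixed window, the function names, the sampling period and the sample values below are all assumptions made for illustration only.

```python
from statistics import mean


def gsr_slope(samples, sample_period_s):
    """Least-squares slope of a GSR window, in signal units per second.

    `samples` is a window of evenly spaced readings. The window length and
    the sampling period are assumptions, not values taken from the study.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a slope")
    times = [i * sample_period_s for i in range(len(samples))]
    t_mean, y_mean = mean(times), mean(samples)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, samples))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


def pulse_rate_change(previous_bpm, current_bpm):
    """Change in pulse rate between two consecutive readings, in bpm."""
    return current_bpm - previous_bpm


# Hypothetical window: GSR rising steadily, one sample every 0.25 s.
window = [0.5, 0.6, 0.7, 0.8]
print(gsr_slope(window, sample_period_s=0.25))
print(pulse_rate_change(72.0, 80.0))
```

A positive slope or a positive change would correspond to the kind of variation that the operator could pick up on; what counted as a significant variation in the experiment is not specified here.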

3 Experiment’s Set up

3.1 Woz Set up

The experiment was carried out at LIMSI-CNRS. In a Wizard of Oz (WoZ) setup, the robot Nao (Aldebaran Robotics) is remotely controlled by a human who observes the course of the conversation and reacts accordingly. The Wizard of Oz system used was an adaptation of previous systems developed at LIMSI-CNRS [8, 49]. The content of each scenario is predefined, and Nao (that is, the human wizard) follows a conversation tree to perform the next action (uttering a text, giving reflexive information, playing a song). The operator was responsible for making the robot utter turns from a repository available through a graphical user interface. The operator also had access to real-time physiological parameters measured by the E4 wristband (see Fig. 1). On the basis of the changes in the physiological signals and of the chosen profile (see Sect. 3.3), the operator decided to make the robot utter reflexive turns related to changes in physiological states either only at the start and end of the quiz or each time a change was detected.

Fig. 1. Woz set up scheme
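The operator's decision rule described in Sect. 3.1 can be restated as a short sketch. In the experiment this decision was taken by a human wizard, not by software; the profile labels, the boolean flags and the function name below are hypothetical, and the rule shown is only one possible reading of the text.

```python
from enum import Enum


class Profile(Enum):
    # Hypothetical labels for the two behaviours described in the text.
    START_AND_END_ONLY = "start_and_end_only"
    EACH_CHANGE = "each_change"


def should_utter_reflexive_turn(profile, at_quiz_boundary, change_detected):
    """Return True when a reflexive turn would be produced.

    at_quiz_boundary: True at the start or at the end of the quiz.
    change_detected: True when the wizard notices a change in the
        physiological signals. How a change is judged is not specified
        in the paper, so it is passed in as a plain flag (assumption).
    """
    if profile is Profile.START_AND_END_ONLY:
        return at_quiz_boundary
    if profile is Profile.EACH_CHANGE:
        return change_detected
    raise ValueError(f"unknown profile: {profile!r}")


# Mid-quiz change under each profile.
print(should_utter_reflexive_turn(Profile.START_AND_END_ONLY, False, True))
print(should_utter_reflexive_turn(Profile.EACH_CHANGE, False, True))
```

Keeping the profile as an explicit parameter mirrors the design of the experiment, where the same physiological change could lead to a reflexive turn under one profile and to none under the other.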

3.2 Activity’s Scenario: The Musical Quiz

The interaction was based on a straightforward musical quiz. After a short greeting sequence, the game started. We asked participants to play at least 3 rounds of the game, so that we could use different profiles (see below). Each round of the quiz contained 4 short extracts of music selected randomly from a library covering various genres. After the extract was played (through Nao’s own integrated speakers), the participant had 30 s to guess the artist and the title of the song. During this search, the robot could interrupt the participant with a reflexive turn if significant physiological variations were detected. In addition, at the beginning of the game and at the end of each round, the robot produced a quantitative reflexive turn by uttering the measurements.
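The structure of the quiz can be sketched as follows. The numbers of rounds and extracts and the answer time come from the description above; the library contents, the fixed random seed and the choice to avoid repeats within a round are assumptions introduced for the sake of the example.

```python
import random

EXTRACTS_PER_ROUND = 4   # from the text
ANSWER_TIME_S = 30       # from the text
MIN_ROUNDS = 3           # from the text


def draw_round(library, rng):
    """Pick the extracts for one round, without repeats within the round.

    Avoiding repeats is an assumption; the paper only says the extracts
    were selected randomly.
    """
    if len(library) < EXTRACTS_PER_ROUND:
        raise ValueError("library too small for one round")
    return rng.sample(library, EXTRACTS_PER_ROUND)


# Hypothetical library entries; the actual titles are not given in the paper.
library = [f"extract_{i:02d}" for i in range(20)]
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
rounds = [draw_round(library, rng) for _ in range(MIN_ROUNDS)]
for number, extracts in enumerate(rounds, start=1):
    print(f"round {number}: {extracts} ({ANSWER_TIME_S} s per extract)")
```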

3.3 Turn Design and Robot’s Profile

The design of the robot’s reflexive utterances relies, first, on an exploration of pragmatic dimensions. That is, we wanted to cover a spectrum of occurrences ranging from «straightforwardly giving a piece of information» to «providing the participant with a warning or a piece of advice». Second, we paid attention to the distribution of the turns along an epistemic gradient [24, 30]. That is, some utterances would display primary access to knowledge from the robot’s point of view, whereas others would be more balanced towards the participant. Third, we drew on the Warmth/Competence pair, which is used to study the believability of AI [9], to build the robot’s profile. That is, some utterances were meant to display a Competence feature (e.g. «your heart beat is X»; «you are stressed»; «your heart beat is rising»), while others would rather display some empathy or care (e.g. «are you stressed because you can’t handle it?»).

3.4 Data Collected

In the first session of our experiment, the robot operator was visible to the participants, but the participants did not know whether the operator was controlling the robot or merely observing how it functioned. In the 2 following sessions, the operator was hidden from the participants, and the experiment therefore followed the Wizard of Oz paradigm. There were 12 subjects (5 men and 7 women), recorded over 3 sessions (3 participants in the first session, 5 in the second, and 4 in the third), making up around 3 h of data. In the third session we invited one participant to take part in the interaction together with her friend and colleague. 97 reflexive sequences were selected, of which 28 have been transcribed in detail following CA methods.

4 Interactional Symbiosis as a Moment-by-Moment Achievement

Overall, we observed that reflexive turns recurrently had an effect on participants interacting with the robot. That is to say, when the robot claims, one way or another, access to the participant’s personal territory, the latter displays some reaction to it. We intend to show that those reactions account for socio-organic solidarities as evidence of interactional symbiosis. First, there are behaviors that are general to social interactions, namely co-presence management issues and preference for agreement. Second, there are more specific practices related to the epistemic encroachment that participants undergo in the experiment.

4.1 Working Out Social Synergy (1): Co-presence Management

As the robot gives precise measurements, we observed in the data that participants would regularly display body postures that could be described as ‘on-rendering-process faces’. Here is an example with an accurate multimodal transcription (see footnote 1):

There are indeed ways to display that one is taking information into account without disturbing the ongoing accomplishment of the speaker’s turn. What is striking here is that the participant is not only displaying this, but is also managing a basic co-presence problem [19]: through a meticulous to-and-fro eye (and head) movement (L03), she exploits the possible actions enabled by the organization of turn constructional units [21, 35, 45] to keep track of both the participation frame and the information delivered about herself. In other words, she treats the robot’s turn as a component of a larger organizational process: we can see here how body posture, eye movements and speech are entangled in order to structure an intricate event like ‘receiving information about oneself’. Moreover, such methods, which consist in shifting body orientation from the speaker’s ‘face’ to an alternative (imaginary) space where one can accountably think things over (i.e. show a process of thinking), illustrate how treating the robot as a socio-interactional partner can be achieved in the here-and-now of interaction.

This phenomenon can be even more intricate. In the following extract, two participants interact with the robot: P2 is attending the quiz, and P is the one officially ‘connected’ to the robot:

Broadly, as the participants are provided several times with ‘their’ measures during the game, we observed that one thing they could do is order them in a temporal frame. As in this extract (2), participants can use the new measure to produce an update or summary. Let us focus on P2’s behavior. As N provides P with her measures, P2 draws a curve in the air (Fig. 2). This curve accounts for an analysis of the evolution of P’s heart rate. This metaphoric gesture [33] is achieved during the stream of N’s speech, which is directed towards P. Therefore, insofar as the primary exchange is the one between P and N, P2’s conduct is configured as a secondary scene or byplay [20]. This byplay functions as a co-expressive way to deal both with her friend’s reflexive turn and with the specific participation frame [21].

Fig. 2. Multimodal byplay from the participant’s friend

As we’ve seen that different kinds of semiotic resources are used in order to manage co-presence and to structure ‘wholes’, we can now turn to other phenomena, more specific to the experiment, for they have to do with epistemic (re)configuration.

4.2 Working Out Social Synergy (2): Epistemic Balance and Preference for Agreement

We found that participants display more elaborate reactions in response to reflexive turns that, instead of giving a precise measurement, produce other kinds of actions such as interpretations, assessments or warnings. Extracts (3) and (4) present two different methods that participants can use to respond.

In extract (3), P could not find the name of the music played. N produces a declarative turn that points out the participant’s emotional state. This is a kind of turn that calls for agreement or disagreement [2, 47, 51]. P, in return, displays affiliation with the reflexive turn with «ouais» (“yeah”) and follows it with an account explaining why she may indeed be stressed, namely because she does not know much about classical music. As we can see in the transcript, this turn is accomplished simultaneously with a «no» head shake (L03). In this way P associates a negative valence with her turn. While we cannot say for sure what object this valence refers to, it is at least a conduct that contributes to the participant’s affiliation with the reflexive turn. Moreover, this way of building a turn is congruent with the way two persons preferentially behave in agreement sequences: first you refer to what is projected in the previous turn, and then you deliver your position [26, 42, 44]. In extract (4), the participant failed to find the song title (and was quite sorry about that). N produces a warning that refers to an analysis of her increasing heart beat. Here again, we can observe affiliation with the robot’s turn through an assessment, namely that being stressed is bad news [32].

Information exchange is not only a matter of input-output mechanisms. Information of the kind we are concerned with lies in a domain, or territory, to which social actors have stratified access. That is, during the interaction they occupy an epistemic position on a gradient that extends from more knowledgeable to less (or not) knowledgeable [24, 25, 30]. During social interactions, participants may claim, negotiate, confirm or discard epistemic positions. The ways participants react to Nao’s turns demonstrate that they configure those turns as components of a specific piece of work, namely dealing ‘together’ with epistemic configurations.

Finally, whereas people consider that they have privileged access to their own experience, and therefore specific rights to say something about it [2, 26], disagreement or confrontation may be problematic for the participants’ face, in Goffman’s sense:

In line 01, N produces a polar question that embodies a candidate explanation for being stressed. This candidate is rejected by P (L03), but the rejection is followed by an assessment (L06) that displays an interesting analysis of the candidate itself: by showing a moral perspective on the emotional state mentioned in N’s turn, she analyzes it as an account of shame. Hence, without denying N’s epistemic authority to claim that she might be stressed, P challenges the robot’s candidate account in order to justify an epistemic re-balancing [24]. This challenge is allowed by the very format of N’s turn, a polar question in which the recipient is entrusted with a knower stance. Moreover, what is striking here is the way the participant manages to combine the balancing of the epistemic configuration with the problem of the preference for agreement: it is in fact not trivial that the «no» on line 03 is followed by a mitigation marker, «pas trop» (“not really”), and by an account (L06) that justifies the disagreement, as a way to preserve both her own and the robot’s face [18, 29]. That is to say, P behaves as if the robot were indeed a ritually delicate object. This particular extract therefore shows that the notion of symbiosis may encompass a moral dimension.

5 Conclusion

In this experiment we used a quantified-self device to provide the robot with the presupposition of a specific epistemic authority vis-à-vis the participant, and we tested this authority through reflexive sequences. Overall, we found (1) that reflexivity taken on by the robot has an effect on the participants’ behavior, as a step into their personal epistemic territory, and (2) that participants display practices showing a commitment analogous to that found in human-human interactions regarding the preferential organization of turns-at-talk in terms of adjacency, agreement, and epistemic balance. Even if they are aware of the robot’s limitations, participants attend to organizing a participation framework (with rights and obligations) in which the robot is treated as a participant in its own right. Organization is what binds elements, events or individuals in a symbiotic relationship, that is, a potential synergy in which different sign systems work together to build relevant action and accomplish consequential meaning. Therefore, the scope of turn design must not be limited to stream-of-speech phenomena (grammar, speech acts), but must encompass structures providing for the organization of the endogenous activity systems within which strips of talk are embedded [23, p. 370].

As Foster [14] put it, detailed analysis of interactional behaviors offers opportunities to improve the design of socially interactive robots, that is, to identify and reproduce ordinary human skills (perception, practical reasoning, gesture…) in order to make machines more adaptable to interactional situations (assistive robots in the home environment, companion robots). We observed in the data that the participants treat the robot as a body, a presence. This shows that social interaction, as Goffman identified decades ago, is first, before talking, a matter of hand-to-hand management: entering a mutual perceptive field and focusing on a common object, depending on the participation footing of the participant. Moreover, we introduced the problem of epistemic balance and observed some occurrences of plasticity in the epistemic configurations. The analysis of epistemic balance phenomena points out that social synergy is a dynamic process: statuses in the interaction have to be defined and may be negotiated. Epistemic configurations also play a fundamental role in the way higher-order classes of action such as suggestions, proposals, or offers are dealt with in the interaction. We believe this is a whole issue to explore in HRI.

Hence, we observed all sorts of practices that are grist to the interactional mill and that provide accountable arguments for a view of humans working at establishing and maintaining socio-organic solidarities with a robot. This was done over a short period and in a rather artificial setting. Going forward, we would need a larger amount of data to extend the analysis towards (a) more natural interactions, (b) more differentiation in the impact of reflexive turn design, and (c) a better understanding of the usefulness of physiological measurements as resources for situated Human-Machine symbiosis.