Information Processing and Management 48 (2012) 32–46
A study of awareness in multimedia search
Robert Villa ⇑, Joemon M. Jose
School of Computing Science, University of Glasgow, Glasgow, UK
Article info
Article history: Received 2 June 2009; received in revised form 3 March 2011; accepted 15 March 2011; available online 5 April 2011
Keywords: Collaborative retrieval; Awareness; Video retrieval
Abstract
Awareness of another’s activity is an important aspect of facilitating collaboration between users, enabling an “understanding of the activities of others” (Dourish & Bellotti, 1992). In this paper we investigate the role of awareness and its effect on search performance and behaviour in collaborative multimedia retrieval. We focus on the scenario where two users are searching at the same time on the same task, and via an interface, can see the activity of the other user. The main research question asks: does awareness of another searcher aid a user when carrying out a multimedia search session? To encourage awareness, an experimental study was designed where two users were asked to compete to find as many relevant video shots as possible under different awareness conditions. These were individual search (no awareness), Mutual awareness (where both users could see the other’s search screen), and unbalanced awareness (where one user is able to see the other’s screen, but not vice-versa). Twelve pairs of users were recruited, and the four worst performing TRECVID 2006 search topics were used as search tasks, under four different awareness conditions. We present the results of this study, followed by a discussion of the implications for multimedia information retrieval systems.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
The quantity of online multimedia resources continues to grow, with web sites such as YouTube¹ and Flickr² allowing users access to increasing quantities of multimedia data. The problem of accurately and quickly searching multimedia documents in such systems has accordingly grown in significance: users now have an increased ability to obtain and store multimedia files, but often have no effective facilities available to search the data. One possible approach to the searching of such multimedia collections is the use of collaborative search, where two or more individuals search together to find new material (Adcock et al., 2007; Adcock & Pickens, 2008; Adcock, Cooper, & Pickens, 2008). While search is often considered an activity carried out alone, recent research has suggested that collaborating on search tasks is not unusual. The survey carried out by Morris (2008) found that 53.4% of the respondents answered “yes” to the question “Have you ever cooperated with other people to search the Web?”.
One important aspect of collaboration is awareness, which enables “an understanding of the activities of others, providing a context for your own activity” (Dourish & Bellotti, 1992). When engaged on a collaborative task, awareness of others allows a user to know who is doing what, enabling the coordination and sharing of information. In this paper, we are interested in how awareness of another user’s searching may or may not aid a different user searching at the same time, but in a different place. Specifically, we are interested in investigating three main questions: (1) does having awareness of another user alter a
⇑ Corresponding author. Tel.: +44 (0)141 330 1638. E-mail addresses: [email protected] (R. Villa), [email protected] (J.M. Jose).
1 www.youtube.com.
2 www.flickr.com.
0306-4573/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.ipm.2011.03.005
user’s search performance? (2) does awareness alter a user’s searching behaviour? and (3) what parts of a user’s searching process are most useful to another user engaged on the same task?
In order to investigate these questions, an experiment was carried out which used a game scenario, where pairs of users competed against each other to find as many relevant video shots as possible, using a video retrieval system.³ With this setup, the aim is to provide an incentive for users to be aware of the other user, so that we can investigate how awareness may or may not aid searching in an environment which encourages its use.
Motivating this research is the continuing need for methods of improving retrieval within the context of a community of users. We theorize that creating community-based systems where users can be automatically connected together depending on their immediate information needs may provide opportunities for richer search collaborations between users. Such collaborations are of particular interest in multimedia searching, where the performance of video and image search engines is currently very low, hampered by the problems inherent in the semantic gap (Smeulders, Worring, Santini, Gupta, & Jain, 2000). The semantic gap is the difference between the low-level visual features which can be extracted and used by content-based search systems (such as color and texture), and the high-level concepts in which users work and search. Given this, encouraging and facilitating collaboration between users is one way of attempting to mitigate the current problems in multimedia retrieval.
The information retrieval domain which is considered here is that of multimedia video retrieval, as exemplified by the TRECVID effort (Smeaton, Over, & Kraaij, 2006c). The problems inherent in video retrieval, and in particular the problem of the semantic gap, mean that this is a challenging research area.
Current state-of-the-art systems, such as those taking part in TRECVID-2006 (Smeaton & Over, 2006), still show a performance considerably below that of their text-based cousins.
We are principally concerned with two types of awareness: Mutual awareness, as considered by Schmidt (1998) and Simone and Bandini (2002), where two (or more) users are both aware of each other in a joint collaboration; and Watching awareness, where user A is able to observe the searching of user B, while user B cannot observe A. This latter type of awareness leads to a further type, which we call Watched awareness, where user A knows user B is able to watch their searching, but cannot watch user B. The latter two types of awareness are, of course, two different sides of an unbalanced awareness between two users. Our interest in such cases stems from their potential occurrence in a large application: at any point in time, there may be many potential watchers, and many others being watched, but neither of these situations need be balanced; unbalanced awareness situations may well take place. The potential effect of such unbalanced awareness on search is an open question.
One aspect which will not be dealt with in this paper is that of privacy, an issue which will undoubtedly be a problem for real-world systems. Privacy, however, does not preclude studies of the potential advantages, if any, of synchronous awareness in multimedia information retrieval. In the study reported here, we are interested in investigating the potential benefits of awareness, within a scenario which is conducive to awareness itself.
The remainder of this paper is structured as follows. Section 1.1 outlines the research questions considered in this paper, followed by Section 2 which describes related work. Section 3 describes the experiment, outlining the experimental design, the data collection used (Section 4), the experimental procedure (Section 5), and the interfaces (Section 6).
Section 7 presents the results, followed by a discussion in Section 8. The paper ends with conclusions and future work.
1.1. Research questions
Four main research questions are considered in this paper:
RQ1: Does awareness alter a user’s search performance?
RQ2: Does awareness alter a user’s searching behaviour?
RQ3: Does having awareness of another alter the effort a user must put in to searching?
RQ4: What parts of a user’s searching process are most useful to another user engaged on the same task?
As shorthand, in this paper we will use the term “awareness” when meaning “awareness of another user engaged on the same task”. For RQ1, we are most interested in a user’s ability to find more relevant material, while for RQ2 the emphasis is on how user searching behaviour changes, e.g. does the number of queries executed change? RQ3 is concerned with quantifying the effort users put into searching, and in particular, for the video retrieval interfaces considered in this paper, the effort put into browsing videos. The final research question is broader than the other three, being a more open-ended investigation into which parts of the remote user’s search screen are of most use to another user, e.g. whether search terms, search results, etc. are copied and used by a local user.
2. Previous work
The importance of awareness in collaboration has been studied in a range of different academic areas, including Human Factors (Adams, Tenney, & Pew, 1995; Endsley, 1995), and Computer Supported Cooperative Work (Dourish & Bellotti, 1992;
3 This paper combines and extends the analysis in Villa, Gildea, and Jose (2008b) and Villa, Gildea, and Jose (2008a); additions include Sections 7.7 and 7.10 among others.
Hutchins, 1995; Gutwin & Greenberg, 2004). At an informal level, awareness can be considered as “knowing what is going on” (Endsley, 1995; Gutwin & Greenberg, 2004), and can be related to “Situational Awareness” (Adams et al., 1995; Endsley, 1995). In Endsley (1995), a model of situational awareness is defined, where it is considered as an active decision making process with three levels: the perception of elements in the current situation; the comprehension of the current situation; and the projection of a future status. Such a concept is not restricted to collaborations and the workings of teams, also being important in more general situations, such as the flying of an aircraft. Endsley (1995) explicitly extends the concept of an individual’s situational awareness to that of a team, where a team working together may have an overall awareness which is composed of the awareness of the individuals in the team.
Based on the definitions from Adams et al. (1995) and Endsley (1995), the work of Gutwin and Greenberg (2004) considers workspace awareness in groupware systems, which is defined as the continual understanding of the interactions of other users with a shared workspace. In this work awareness has moved from the awareness of a general situation, which can include location, environment, and controlled machinery such as an aircraft, to specifically consider the importance of the awareness of another user in remote communication, such as the shared workspace provided by a computer-supported collaboration tool. Possible approaches to awareness in collaborative awareness tools are discussed in Dourish and Bellotti (1992).
In Schmidt (1998), two types of awareness are defined: Mutual awareness (also simply called awareness), which is defined as the awareness of a cooperative activity where two or more actors A and B are both aware of each other’s activity; and reciprocal awareness, which is defined as “A’s awareness of B’s awareness of A” (emphasis from Schmidt (1998)). Following on from Schmidt (1998), Simone and Bandini (2002) classified awareness into two classes: by-product awareness, and add-on awareness. By-product awareness refers to the information which is communicated implicitly in the course of a user’s work. Add-on awareness is defined as additional activities which a user may engage in to promote awareness of their actions. Examples given in Simone and Bandini (2002) include the act of annotating a form, thereby notifying others of future situations. As pointed out in Simone and Bandini (2002), the extra communication cost (the extra effort required of an individual) differs between the two types. In by-product awareness, the cost is borne by those who must acquire and maintain awareness, while for add-on awareness, the cost is borne more by the individual who is communicating the awareness information. In this paper, the principal interest is in by-product awareness, both mutual and reciprocal, where there is no explicit communication (such as talking) between the users engaged on the task.
The role and impact of awareness in collaborative information systems has been considered in previous work, both in a textual and multimedia context. Smeaton, Foley, Gurrin, Lee, and McGivney (2006b) introduced the Fischlar-Diamond Touch (Fischlar-DT) system, describing the typical user interaction process and specifically how the Fischlar-DT system supports a collaborative search process between two users. The Fischlar-DT system makes use of a large touch-screen display surface, allowing users to share the same display and so allowing for greater Mutual awareness.
Smeaton et al. explore the issues associated with designing and implementing a collaborative multimedia retrieval system; of particular interest is the description of user interface design elements targeted specifically at addressing awareness issues. Continuing the work, Smeaton, Lee, Foley, and McGivney (2006a) present a user evaluation of the previously developed Fischlar-DT system by a pair of users in a collaborative video retrieval task. The aim of the user evaluation was to compare two user interfaces: the first designed to maximise individual search efficiency at the expense of awareness, the second to maximise awareness of the other user’s actions at the expense of individual efficiency. Analysis yielded a surprising result: the interface designed for awareness is shown to clearly outperform the interface designed for efficiency (the assessed measures being mean average precision, precision after 10 retrieved shots, and recall). It is suggested that the awareness version of the interface encourages greater coordination between users, for example being able to more effectively split the work load to avoid repetition. Despite the Fischlar-DT system making use of an unorthodox user interface, the strength of the results presented suggests that awareness plays an integral role in search performance in collaborative video retrieval systems and that collaboration can have a positive impact upon search performance.
There have also been studies looking at the effectiveness of collaboration in web-based textual information retrieval tasks. Using the results of a survey of over 200 knowledge workers, Morris (2007) describes a set of recommendations for collaborative search systems to follow. Following on from this work, Morris et al. (2007) describe a collaborative retrieval system which aims to provide all the functionality that users need to effectively search the web together, e.g.
a messaging system, the ability to recommend a page to the other user, awareness of the other user’s queries, and so on. The paper describes a user evaluation performed to assess the system. Users were invited to assess the system on the three previously identified key areas: awareness of the other user, division of labor between the users and persistent storage of the generated results. The participants of the study found that “awareness was the most valuable aspect of SearchTogether’s design”, with features which supported awareness being among the most utilised and highly-rated in the system. While the conclusions reached by Morris are derived from a user evaluation on a text-based retrieval system, they can be applied to any collaborative document retrieval system.
While this body of work makes clear the importance of awareness in collaboration within the information retrieval domain, another aspect of the work reported in this paper is the element of competition: in work such as Smeaton et al. (2006a) and Adcock et al. (2007), for example, the collaboration between the individuals is constructive rather than competitive. As a tool for experimenters, the use of competitive games has, however, a long history in psychological research (Pruitt & Kimmel, 1977; Colman, 1982). Such work, based on game theory and perhaps exemplified by the prisoner’s dilemma game (Nemeth, 1972), involved the study of how people behave in different situations, where an individual’s decisions partially
coincide and conflict with the interests of the parties involved. More recently, the rise of computer games has resulted in numerous efforts to study how users behave when playing games, e.g. Salen and Zimmerman (2004), Brown and Bell (2004), although the importance of the “play element of culture” has long been recognised (Huizinga, 1955).
In the image retrieval domain, online games have been used in the image annotation task (von Ahn & Dabbish, 2004; von Ahn & Dabbish, 2008). In von Ahn and Dabbish (2004), the “ESP” game was introduced, in which pairs of users are asked to guess what the other user is typing when presented with the same image. When both users type the same thing, for example the word “purse” when a picture of a purse is displayed, the players gain points, and then move onto a different image in which the process repeats. The result of the game is a set of image annotations which are agreed on by two users. Commercial versions which operate in similar ways, such as the Google Image Labeler,⁴ also exist. The idea has also been extended to gain finer grained information about images; for example, Peekaboom (von Ahn, Liu, & Blum, 2006) is designed to aid in the location of objects in an image, using a similar game scenario to that used in von Ahn and Dabbish (2004).
The search process itself has also been used as a game. Halttunen and Sormunen (2000) used a standard information retrieval test collection to create a game which aims to aid users in learning about searching. The relevance assessments of the test collection were used to show students the performance of the queries which they enter, allowing teachers and students to see and compare the performance of their queries. This work is closer to that reported in this paper, albeit in the text domain, because the act of searching is core to the task of the game.
In both Halttunen and Sormunen (2000) and the present work a test collection is used, and the relevance judgements provided by the test collection are used to judge (or score) the efforts of the users taking part in the game. In the work of Halttunen and Sormunen (2000), however, there is not the explicit element of competition with another, which is present in this work.
In this work, we use a game scenario in order to simulate a situation which aims to encourage awareness of the other competitor. Unlike previous collaborative work, the aim is not to simulate collaboration directly but rather place users in a game situation which enables us to investigate the importance of one aspect of collaboration, that of awareness. The “game” is the search process itself, with game scoring provided by the relevance judgements of a test collection, in the manner of Halttunen and Sormunen (2000).
3. Experiment
A competitive game scenario was used in order to test the research questions outlined in Section 1.1, where two users were asked to perform the same search task on two different computers at the same time. From the point of view of studying the effect of awareness, using a game scenario has the advantage of encouraging a user to be aware of the other remote user’s searching, since to do so may provide a competitive advantage. A competition provides a way of motivating the participants to gather as much material as possible, and be aware of their competitor, from whom they may copy material. The game-like nature of the experiment was reinforced by rewarding the user who found the most relevant shots for each search task with an extra monetary prize. The aim of this setup is to enable the study of awareness under controlled conditions in which it is encouraged, to provide results from a “best case” scenario, in order to guide future, more realistic, investigations of collaboration and awareness.
Since a synthetic experimental scenario is used, the results of the experiment do not necessarily reflect real-world usage. However, if users do not or cannot take advantage of another user’s searching in such a conducive environment, awareness is also unlikely to provide an advantage in more realistic scenarios.
Four different awareness conditions were used, as follows:
1. Mutual: Mutual awareness, where both users could see each other’s search screen.
2. Watching: the user would be able to see the other remote user’s screen, but not vice-versa.
3. Watched: the inverse of the previous condition, where the user cannot see the remote user, but knows they can see him or her.
4. Independent: there was no awareness, and both users searched independently.
From the point of view of each individual user, conditions (1) and (4) are straightforward, but the unbalanced awareness conditions (2) and (3) result in each user being in one of two different conditions, that of being aware of the other user (and not being watched), or of being watched by the other user (and having no awareness of that user’s actions). We also roughly categorise the Mutual and Watching conditions as being “aware” conditions, with the Watched and Independent conditions being “individual” conditions (no awareness). We will often distinguish between a local and a remote user: the local user is the user of interest, and the remote user is their partner.
Three hypotheses were defined:
Hypothesis A: The performance of a user on a search task differs when the user is aware of the search interface of the remote user.
Hypothesis B: The searching behaviour of a user differs on the two aware conditions when compared to the individual conditions.
4 http://images.google.com/imagelabeler/.
Table 1 The Latin square design used in the experiment, for the first four pairs of users. The conditions were I (Independent), A (Awareness, i.e. the Watching condition), W (Watched) and MA (Mutual awareness). When a user was in the Awareness condition, the opposing user was in the Watched condition.

Odd user
User | Topic 0173 | Topic 0175 | Topic 0189 | Topic 0192
1    | I          | A          | MA         | W
3    | MA         | W          | I          | A
5    | A          | MA         | W          | I
7    | W          | I          | A          | MA

Even user
User | Topic 0173 | Topic 0175 | Topic 0189 | Topic 0192
2    | I          | W          | MA         | A
4    | MA         | A          | I          | W
6    | W          | MA         | A          | I
8    | A          | I          | W          | MA
Hypothesis C: The effort expended by the user playing and browsing videos during searching differs on the aware conditions when compared to the two individual conditions.
Each of these hypotheses relates directly to the research questions RQ1–RQ3, from Section 1.1. Hypothesis A is straightforward: users who are able to watch another user engaged on the same task are likely to perform better, since they can copy and take advantage of the searching behaviour of the user being watched. It is proposed to measure search performance using three measures: the number of shots found, the number of relevant shots found, and mean average precision (MAP). Hypotheses B and C both make the assumption that when a user is aware of another user, there will be a decrease in the necessary work carried out, due to copying. For Hypothesis B, we measure searching by the number of queries executed, while for Hypothesis C, effort is measured by the number of video shots played, and forward and backwards interaction with the video browser. Video retrieval systems currently have low performance when compared to text, as can be seen by the average MAP scores in TRECVID (e.g. Smeaton & Over, 2006; MAP for the four worst performing topics is shown in Table 2). By using video browsing functionality, users are able to search manually for relevant shots in videos, and we expect this manual searching to decrease in the awareness conditions.
A Latin square design was used, with users and topics as blocking factors (Table 1). Since the experiment was collaborative, users were run in pairs, where each block of four runs (four pairs of users) on the four topics resulted in two complete Latin squares where Watching and Watched conditions mirror each other. In the next section, we describe the data collection which was used.
4. Collection
The TRECVID 2006 collection, which provides 258 h of video data from the end of 2005, was used for the evaluation.
This particular collection was chosen for this study due to its relatively large size and its included set of topics and relevance judgements which could be used as-is. The vast majority of the collection is news video, although there are also some music and entertainment programs. The collection is multilingual, broadcast in three different languages (English, Chinese, and Arabic), and automatic speech recognition transcripts of all videos are provided as part of the data set. English translations of the Chinese and Arabic transcripts are also provided, generated automatically using standard machine translation software.
For practical reasons, it was not possible to use all 24 TRECVID topics; instead, the 4 worst performing interactive search topics were chosen, shown in Table 2. The four worst performing topics were chosen for two reasons: partly to follow the positive results of Adcock et al. (2007), which found difficult search tasks were more conducive to collaboration, and partly because we wanted to challenge our users to induce an awareness of the remote user. The TRECVID relevance judgments associated with each of these topics, created as part of the TRECVID 2006 effort, were used as the ground truth for each topic. Table 2 also shows the median mean average precision (MAP) for the systems which took part in TRECVID 2006, from Smeaton and Over (2006), showing the overall low performance for these topics.
5. Procedure
Twenty-four users were recruited for the experiment, through an email campaign at our university. Users ranged in age from 22 to 36 years, with a median of 26. All were either native English speakers, or considered themselves fluent in English. No user had any knowledge of Arabic, and one user was a native Chinese speaker, all others having no knowledge of Chinese.

Table 2 The four TRECVID-2006 topics used in the evaluation; the median MAP values are from Smeaton and Over (2006).
Topic | Median MAP | Topic description
0189  | 0.038      | Find shots of a group including at least four people dressed in suits, seated, and with at least one flag
0173  | 0.037      | Find shots with one or more emergency vehicles in motion (e.g., ambulance, police car, fire truck, etc.)
0175  | 0.034      | Find shots with one or more people leaving or entering a vehicle
0192  | 0.030      | Find shots of a greeting by at least one kiss on the cheek
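The performance measures used in this study (the number of shots found, the number of relevant shots found, and MAP, as reported in Table 2) can be illustrated with a minimal sketch. This is not the evaluation code used in the experiment: the function names and the example shot identifiers are hypothetical, the standard non-interpolated definition of average precision is assumed, and each shot is assumed to appear at most once in a ranked list.

```python
from typing import Dict, Iterable, Sequence, Set


def count_relevant(marked: Iterable[str], relevant: Set[str]) -> int:
    """Number of distinct marked shots that appear in the ground truth."""
    return len(set(marked) & relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of a ranked list of shot ids.

    Assumes each shot id appears at most once in `ranked`.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], judgements: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all judged topics."""
    scores = [average_precision(runs.get(t, []), judgements[t]) for t in judgements]
    return sum(scores) / len(scores)


# Hypothetical example: one topic, four returned shots, three relevant shots.
ranked_shots = ["shotA", "shotB", "shotC", "shotD"]
ground_truth = {"shotA", "shotC", "shotX"}
print(count_relevant(ranked_shots, ground_truth))
print(average_precision(ranked_shots, ground_truth))
```

In this example two of the four returned shots are relevant, found at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 3. Dividing by the total number of relevant shots, rather than the number retrieved, penalises relevant shots that were never found.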
Since the experiment was collaborative in nature, users were scheduled to arrive together. As a practical convenience, a number of users came with their friends to take part in the experiment together. Out of the 12 pairs of users, 4 knew each other well, while the other 8 pairs of users were either strangers or not well known to each other.
The experimental setup and procedure were as follows: two users were provided with dual-screen computers in the same room, where neither user could see the screen of the other user directly, and both users were asked not to communicate during the course of the tasks. When the users arrived, they were asked to fill in a consent form and short entry questionnaire, before a short (10 min) training demonstration of the fully aware system was carried out. After the demonstration, a further 10–15 min training period followed, when users could use and become familiar with the search system. Before each search task was started, the search system for both users was initialised by the experimenter, along with a browser window showing the search topic. Both users then had 15 min to carry out the search task. A 1 min warning was announced before the search time ran out, after which a post-search questionnaire was administered. This procedure was then repeated for each of the four tasks.
At the end of the experiment, the winner was announced. The number of relevant shots found by each user was taken as a score: the user who found the greater number of relevant shots was the winner of the task, and the user who won the most tasks was the winner of the experiment. In the case of a draw the prize money (5 pounds) would be split evenly between both users. All users, whether they won, lost or drew, also received a 10 pound payment for taking part in the experiment.
6. Interface design
A screenshot of the video retrieval system is shown in Fig. 1, showing the interface in the awareness mode.
On the left hand side of the screen is the user’s search screen, the remote user’s screen is shown on the right, and each video shot is represented by a keyframe image. Working from top to bottom, on the left hand screen first, the main interface elements are:
1. The final results area, where users drag shots they consider relevant to the topic.
2. The search box and button, allowing the user to enter a textual query and start a search.
3. Immediately below the text search box, a search history pull down menu, which gives a list of the text queries previously executed by the user. The user can re-execute an old query by selecting an item on this list. The history button will display any shots deleted from the relevant shots area.
4. The list of relevant shots, which will be used in relevance feedback.
5. The list of search results generated by a search.
The interface is web based, and makes extensive use of drag and drop: to mark a shot as relevant to a query, for example, a shot can be clicked from the results list, and dragged onto the “relevant shots” area. On the right hand side of the screen in the awareness mode is the search display of the remote user, mirroring that of the local user. Shots can be dragged and dropped from this screen onto the local side of the display, enabling the copying of shots for either use in relevance feedback or for
Fig. 1. A screenshot of the collaborative interface, with the remote user’s screen on the right hand side.
selection as final results. The main difference in this right hand display is that the final results of the remote user cannot be seen; instead, only a count of the number of shots marked by the remote user is shown. This ensures that in the awareness conditions users could not simply copy the final results of the other user. The counter aimed to continue the game-style metaphor used in the experiment, allowing the user some method of judging how well they were doing relative to their competitor. Note that the counter only displayed the total number of shots selected, and did not give information about the number of relevant shots.
The remote display varies slightly from the local display in minor ways: the search and history buttons are disabled and the Clear button is replaced with a “Refresh” button. Clicking this button will update the remote display to reflect the latest state of the remote user’s search screen. An explicit button was used to enable us to record when users “look” at the remote user’s screen, without requiring more complex apparatus such as eye tracking equipment. It was thought that the game design, with scores and a prize, would encourage users to not only find material, but to also use this refresh functionality, since it is to their benefit to do so.
Each keyframe contains a small play button on the bottom left of the image, which will play the corresponding shot in a pop-up video player. Keyframes of the shots temporally before and after the shot being played are displayed on the left and right of the pop-up; clicking on either keyframe will move that shot to the centre of the player, where it will play, enabling the browsing of the video backwards or forwards.
Both the collaborative and non-collaborative versions of the interface make use of the same underlying retrieval system. This system is responsible for indexing the videos which make up the collection, and uses the same underlying technologies described in Urban et al. (2006).
For search, the system uses both text information (the TRECVID-2006 transcripts) and the visual content of the keyframes which represent shots: query terms are used to search against the text transcripts as in a traditional textual IR system, and visual features from the relevance feedback examples are used to find other shots with similar visual characteristics. Two versions of this interface were developed for the experiment, the “collaborative” version described above, and an “individual” version, which is identical to the local screen on the left hand side of Fig. 1. With this particular interface, the user is unable to see the search screen of the remote user, although under the unbalanced awareness conditions, the remote user may still see them. 7. Results The 12 competitive runs of the experiment resulted in 11 wins and one draw, as shown in Table 3. This table gives the number of relevant shots marked by each user in each run, for each of the four conditions. For the two unbalanced conditions, the user with awareness of his/her partner won on 12 out of the 24 occasions, with 2 draws; in the other 10 situations, the user being watched won. One issue which immediately became apparent was that user 5 marked considerably more shots over the four conditions than all other users. This particular user realized at the start of the experiment that there was no penalty for selecting irrelevant shots, and his behaviour took advantage of this fact. In total user 5 marked 603 shots over the four conditions, compared to only 86 shots for the next highest user. For the analysis of shot marking in the following sections, this user’s results were removed before the analysis was carried out. The results will be presented as follows: the next two sections, 7.1 and 7.2, report results looking at the degree of monitoring of the remote user and the user perceptions of this. This is followed by Sections 7.3, 7.4 and 7.5, which report results for Hypotheses A–C (RQ1–RQ3).
Sections 7.6, 7.7, 7.8, 7.9 consider RQ4, reporting results and user perceptions concerning the copying of remote information. Finally in Section 7.10 we consider whether users knowing each other better resulted in differences in search behaviour. 7.1. Monitoring of the remote user The interface shown in Fig. 1 required the user to explicitly press a “Refresh” button to update their view of the remote screen. This allowed us to log when a user updated their view of the remote user. Over all users, the median number of

Table 3 Number of relevant shots found by each pair of users when competing against each other. A bold value indicates a winning score. For the unbalanced awareness, A indicates the Watching condition, W Watched.

Pairs of users   Mutual    Unbalanced (A W)   Unbalanced (W A)   Individual   Winning user
1/2              11   7     9  11              1   4              4   9       2
3/4              14   1     2   0              0   3              2   1       3
5/6              10   5     6   4              6   1              1   2       5
7/8               3   3     6  10              4   4              7   7       8
9/10             15   7     9   4              6   2              2   0       9
11/12             3   6    25   3              2   1              3   6       Draw
13/14             1   2     1   1              3   6              0   3       14
15/16             1   2     2   8              2   4              1   5       16
17/18             3   2     0   3              0   1              4   5       18
19/20             5   9     1   3              2   8              2   6       20
21/22             0   1     5   8              1   3              4   1       22
23/24             6   5    12   1              6   2              1   0       23
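The tally reported for the two unbalanced conditions (12 wins for the user with awareness, 10 for the user being watched, 2 draws) can be checked directly against the Table 3 scores. The following is a minimal sketch in Python, assuming that the columns read Mutual, Unbalanced (A W), Unbalanced (W A), Individual, and that within each condition the odd-numbered user of the pair is listed first; the names `TABLE_3` and `tally_unbalanced` are illustrative only.

```python
# Table 3 scores, one row per pair of users, as (odd user, even user) tuples.
# Assumed column order: Mutual, Unbalanced (A W), Unbalanced (W A), Individual.
TABLE_3 = [
    ((11, 7), (9, 11), (1, 4), (4, 9)),   # pair 1/2
    ((14, 1), (2, 0), (0, 3), (2, 1)),    # pair 3/4
    ((10, 5), (6, 4), (6, 1), (1, 2)),    # pair 5/6
    ((3, 3), (6, 10), (4, 4), (7, 7)),    # pair 7/8
    ((15, 7), (9, 4), (6, 2), (2, 0)),    # pair 9/10
    ((3, 6), (25, 3), (2, 1), (3, 6)),    # pair 11/12
    ((1, 2), (1, 1), (3, 6), (0, 3)),     # pair 13/14
    ((1, 2), (2, 8), (2, 4), (1, 5)),     # pair 15/16
    ((3, 2), (0, 3), (0, 1), (4, 5)),     # pair 17/18
    ((5, 9), (1, 3), (2, 8), (2, 6)),     # pair 19/20
    ((0, 1), (5, 8), (1, 3), (4, 1)),     # pair 21/22
    ((6, 5), (12, 1), (6, 2), (1, 0)),    # pair 23/24
]


def tally_unbalanced(rows):
    """Count (watching-user wins, watched-user wins, draws) over both unbalanced columns."""
    watching_wins = watched_wins = draws = 0
    for _mutual, aw, wa, _individual in rows:
        # "A W": the odd user is watching; "W A": the even user is watching.
        for watching, watched in ((aw[0], aw[1]), (wa[1], wa[0])):
            if watching > watched:
                watching_wins += 1
            elif watching < watched:
                watched_wins += 1
            else:
                draws += 1
    return watching_wins, watched_wins, draws


print(tally_unbalanced(TABLE_3))  # (12, 10, 2)
```

Under the stated column assumption the counts agree with the figures given in the text, which supports this reading of the A/W labels.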
refresh events occurring in the Watching condition was 8.5 (with minimum 0, maximum 43) and in the Mutual condition 7.5 (minimum 0, maximum 26). Fig. 2 shows the cumulative number of refreshes carried out by each user over time, with error bars. This graph was constructed by segmenting each user log into 1 min chunks, and counting the number of refreshes which had occurred up until that point in time. As can be seen, the cumulative number of refreshes grows roughly linearly for both the Watching and Mutual conditions, with users generally pressing refresh more often in the Watching condition. For both conditions there is a very large variation between users, especially for Watching. 7.2. User perceptions of awareness User perceptions of task stressfulness, the degree of distraction caused by the other user, and the utility of awareness were elicited in post task questionnaires. The questions are shown in Table 4, along with the median responses and interquartile ranges for each of the four conditions. A five point Likert scale was used for all questions, 1 indicating agreement with the statement, and 5 disagreement. Twenty-four responses for each factor were gathered, with the awareness question being asked to only those who had carried out either the Watching or Mutual conditions. The question “I was distracted by the remote user” was asked for all conditions, and no significant differences were found; the median response on every condition was 5 (disagreement). For the question “I was stressed while carrying out the task” there was a slight increase in stress on the Mutual condition (not statistically significant). The last of these three questions, “I found that awareness of the other user was useful”, asked the users about the utility of being able to watch the remote user. In this case, there was a trend for a preference for awareness in the Watching condition, possibly due to the advantage gained. 7.3.
Search performance In this section we present results for Hypothesis A: the performance of a user on a search task differs when the user is aware of the searching of the remote user. Three different measures are used for performance: the number of shots marked, the number of relevant shots marked, and the mean average precision (MAP) of the marked results. Shots marked is a simple count of the shots in the ‘‘final results’’ section of the interface, at the end of each search session, while the number of relevant shots is a count of those marked shots which are also considered relevant according to the TRECVID-2006 relevance judgements (these counts are also reported per user in Table 3). In both cases we assume that users who find more shots have carried out the task better. A MAP score can also be calculated as an alternative to using counts, in the standard way. For consistency, we only use the relevant shots in the calculation of MAP, rather than the arbitrary shot order which
Fig. 2. Mean number of cumulative refreshes over time.
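The construction of Fig. 2 described in Section 7.1 (1 min chunks, with a running count of refreshes up to each point in time) can be sketched as below. This is only an illustration: it assumes that each user log has been reduced to a list of refresh timestamps in seconds from the start of the session, and that a session lasts 15 min (the number of minutes listed in Table 9). The function names are hypothetical.

```python
from bisect import bisect_right


def cumulative_refreshes(refresh_times_s, session_minutes=15):
    """Cumulative number of refreshes at the end of each 1 min chunk."""
    times = sorted(refresh_times_s)
    # bisect_right counts how many timestamps are <= the chunk boundary.
    return [bisect_right(times, 60 * minute)
            for minute in range(1, session_minutes + 1)]


def mean_cumulative(per_user_times, session_minutes=15):
    """Mean cumulative curve over users, one value per minute."""
    curves = [cumulative_refreshes(times, session_minutes)
              for times in per_user_times]
    return [sum(column) / len(curves) for column in zip(*curves)]


# Two made-up users over a 3 min window: one refreshes three times, one never.
print(cumulative_refreshes([30, 70, 130], session_minutes=3))      # [1, 2, 3]
print(mean_cumulative([[30, 70, 130], []], session_minutes=3))     # [0.5, 1.0, 1.5]
```

Because each user contributes one curve, the spread between curves at a given minute is what the error bars in such a plot would summarise.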
Table 4 User responses to three post-task questions where 1 = agree, 5 = disagree; median (IQR).

Question                                             Independent   Watched    Watching   Mutual
I was stressed while carrying out the task           5 (1.00)      5 (2.00)   5 (1.00)   4.5 (1.25)
I was distracted by the remote user                  5 (0)         5 (0.25)   5 (1.00)   5 (1.00)
I found that awareness of the other user was useful  n/a           n/a        2 (1.00)   3 (2.00)
results from the interface. Each shot added to the “final result” area is appended to the end of the list, and the user is unable to reorder the results, which makes it unsuitable for use as a ranking. Table 5 shows the results for these three measures over the four conditions. In order to analyse this data, a negative binomial generalized linear model was used for the shots marked, and relevant shots marked (Venables & Ripley, 2002; Crawley, 2007), since both of these measures are count data. Post-hoc analysis of this data was carried out using a Tukey general linear hypothesis (Hothorn, Bretz, & Westfall, 2008). For MAP, a standard ANOVA analysis was carried out. A significant interaction was found for shots marked and relevant shots marked (χ² = 138.7, df = 28, p ≤ 0.05, and χ² = 104.2, df = 29, p ≤ 0.05 respectively). In the post-hoc analysis, for the shots marked it was found that the Independent and Watching conditions differed significantly (z = 3.156, p = 0.009); for the case of relevant shots marked, the Independent and Mutual conditions differed significantly (z = 2.584, p = 0.0478). For MAP, where an ANOVA was used, no significant interactions were found. In Table 5, a trend can be seen for performance to increase from the Independent condition to the Mutual condition, for both the relevant shots marked, and MAP. For all three measures, there is a trend for performance to increase on the awareness conditions when compared to Independent and Watched, although only two of the comparisons were significant. The MAP results were low, as is typical in video retrieval (the median MAP scores for these topics from TRECVID 2006 are shown in Table 2 for comparison). It should be noted, however, that the absolute performance of the retrieval system was not of interest in this study; rather, we are interested in the effect of the awareness conditions. 7.4.
Searching Results for Hypothesis B (the searching behaviour of a user differs with awareness) are presented in this section. The number of searches carried out was taken to be the measure for this hypothesis, and Table 6 shows the mean number of the searches carried out in each of the four conditions. This is a count, similar to shots marked from the previous section, and therefore a negative binomial generalized linear model was used to fit the data, and a significant interaction was found (χ² = 143.13, df = 29, p ≤ 0.05). A Tukey general linear hypothesis was run, with significant differences found between the Independent and Watched (z = 2.622, p = 0.0430), and the Independent and Watching conditions (z = 3.083, p = 0.0112). No significant differences were found between the Mutual and Independent conditions. As can be seen in Table 6, the trend is for fewer searches to be executed on the Watched and Watching conditions, with a larger number of searches executed on Mutual, and larger still on Independent. This trend is quite different from that found in the previous section, Section 7.3, where the trend was for an increase in performance for each condition in turn. 7.5. Video browsing effort This section concerns Hypothesis C, which hypothesised that the effort expended by a user playing and browsing videos during searching differs with awareness. To this end, three measures were used: the number of times a user

Table 5 User performance for the given three measures, mean (SD).

Measure                 Independent     Watched         Watching        Mutual
Shots marked            6.91 (3.75)     8.04 (7.23)     11.09 (9.74)*   10.17 (8.39)
Relevant shots marked   3.17 (2.53)     3.71 (3.14)     4.88 (5.29)     5.08 (4.17)*
MAP                     0.017 (0.015)   0.020 (0.017)   0.022 (0.017)   0.024 (0.020)

* Indicates a significant difference with respect to the Independent condition (p ≤ 0.05).
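For reference, the negative binomial generalized linear model used for the count measures in Sections 7.3 to 7.5 has the following standard form. The paper does not give the model specification, so this is only a sketch under the assumption of a log link with the awareness condition as a categorical predictor; the symbols $\beta_0$, $\beta_c$, $x_{ic}$ and $\theta$ are introduced here for illustration, and the fitted models may well have included further terms.

```latex
% Negative binomial GLM with a log link (standard NB2 parameterisation).
% y_i    : observed count for observation i (e.g. shots marked in one session)
% x_{ic} : 1 if observation i was made under awareness condition c, else 0
% theta  : dispersion parameter; Var(y_i) approaches the Poisson variance
%          mu_i as theta grows large
\begin{align}
  y_i &\sim \mathrm{NegBin}(\mu_i, \theta), \\
  \log \mu_i &= \beta_0 + \sum_{c} \beta_c \, x_{ic}, \\
  \operatorname{Var}(y_i) &= \mu_i + \frac{\mu_i^{2}}{\theta}.
\end{align}
```

The extra term $\mu_i^{2}/\theta$ is what allows the variance to exceed the mean, which matters for the measures reported here, where the standard deviations in Tables 5 to 7 are large relative to the means.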
Table 6 Number of searches carried out on each condition, mean (SD).

Measure                          Independent     Watched          Watching        Mutual
Number of searches carried out   25.21 (14.70)   20.79 (13.61)*   19.71 (9.34)*   23.75 (13.35)

* Indicates a significant difference with the Independent condition (p ≤ 0.05).
Table 7 Video browsing as measured by the shots played, and next/previous shot browsing, mean (SD).

Measure         Independent       Watched          Watching        Mutual
Shots played    49.83 (23.02)     42.46 (16.73)    51.88 (19.41)   59.79 (26.44)*,†
Next shot       114.04 (107.29)   122.08 (98.45)   97.04 (96.35)   104.12 (107.01)
Previous shot   16.88 (23.90)     20.71 (22.20)    13.33 (18.77)   12.29 (15.86)

* Indicates a significant difference with the Independent condition (p ≤ 0.05). † A significant difference with the Watched condition (p ≤ 0.05).
Table 8 The number of shots marked by users from different sources (median and IQR).

Source                       Independent   Watched       Watching      Mutual
From video player            2 (5)         0.50 (6.25)   2.00 (6.25)   2.00 (9.25)
From local relevance list    1 (3)         1.00 (2.75)   1.00 (3.00)   1.50 (3.00)
From local search results    0 (1)         0.00 (2.00)   0.00 (3.00)   1.00 (2.00)
From remote relevance list   n/a           n/a           0.00 (1.00)   0.00 (0.00)
From remote search results   n/a           n/a           0.00 (0.00)   0.00 (0.00)
played a video shot (footnote 5); the number of times the previous video was played in the video browser (i.e. the user browses backwards in the video); and the number of times the next video was played in the video browser (i.e. the user browses forwards in the video). Results are shown in Table 7. Again, a negative binomial generalized linear model was used to fit the count data, and significant interactions were found for each of the measures and awareness (χ² = 193.55, df = 29, p < 0.05, χ² = 174.85, df = 29, p < 0.05 and χ² = 114.28, df = 29, p < 0.05 respectively). For shots played, using a Tukey general linear hypothesis, it was found that the Mutual condition was significantly different from both the Independent and Watched conditions (z = 3.271, p = 0.0058 and z = 5.776, p < 0.001). The Watching condition was also found to be significantly different from Watched (z = 3.589, p = 0.0019). For next video shot events, Watching was found to differ significantly from Watched (z = 2.704, p = 0.0349), while for previous video shot events, both Mutual and Watching differed significantly from Watched (z = 2.827, p = 0.0244, and z = 2.640, p = 0.0414 respectively). For the Next and Previous shot results in Table 7, it can be seen that there is a trend for these functions to be used less in Watching and Mutual, indicating that users are browsing forwards and backwards less in these conditions. For the Play events, the trend is the opposite, where Playing videos increases in Watching and Mutual. This difference may be due to users playing video shots with the video browser as a way of checking their relevance or lack of relevance to the topic before marking the shot. Unfortunately, from the logs, we are not able to determine which panel of the interface the played shot was from (i.e.
whether the shot was one found on the remote or local screen), but this playing as checking may also have occurred with shots found from the remote user, the keyframe alone not being informative enough to make a relevance judgement. 7.6. Were shots copied from the remote user? From the search logs generated by the system, it was also possible to estimate the source of each shot marked by the user, i.e. given the list of shots marked as final by the user, where did these shots come from? Table 8 shows the median number of shots from each of the following five sources which made up a user’s final list of results for a task:
From video player: shots which were copied from the video player directly onto the final results.
From local relevance list: shots which were copied from the relevance list of the user.
From local search results: shots which came directly from the user’s search results.
From remote relevance list: shots which came from the remote user’s list of relevant shots.
From remote search results: shots which came from the remote user’s search results.
These results consider the “source” for a final result to be the last interaction which added the shot to the final result list; for example, if a shot was added from the video player, then deleted, and then added from the search results, it will be counted as being from the search results. They also do not take account of shots which the user may have copied onto their own relevance list, before copying onto their own final results, although this number is very small. This latter action (copying a shot from the relevance list to the final result list) occurred only 22 times across all users. Due to the relative sparsity of the data, we report medians and the interquartile range, and have again removed the results for user 5; results are shown in Table 8 and Fig. 3 as box plots. It can be seen that most shots are sourced from a user’s own efforts, and in particular, from the video player. Few shots were copied from the remote user. In total only 26 shots were copied from the remote user across the search sessions, the majority of which (18 shots) were copied from the remote user’s relevance feedback area, rather than from the remote user’s search results. 7.7. When during sessions did users copy shots from the other user? From the logs generated by the users it was possible to find out when during the sessions shots were copied from the remote user. For example, it was possible for a user to wait until the end of a session before copying shots, rather than be aware of their competitor during the search session itself. Table 9 shows the total number of shots copied, with and
5 This does not include video shots played using the previous and next functions, which differs from the results given in Villa et al. (2008b).
Fig. 3. The number of shots marked by users from different sources. Box plots in five panels (From video player, From local relevance list, From local search results, From remote relevance list, From remote search results), each showing the Independent, Watched, Watching and Mutual conditions on a vertical axis running from 0 to 30.
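The attribution rule used in Section 7.6, where the source of a final result is the last interaction that added the shot to the final result list, can be sketched as follows. The log format is an assumption made for illustration (chronological `(action, shot_id, source)` tuples with actions `"add"` and `"remove"`); the actual format of the system logs is not described in the paper, and the function name is hypothetical.

```python
from collections import Counter


def final_result_sources(events):
    """Count final results per source, using the last 'add' for each shot.

    events: chronological (action, shot_id, source) tuples. For "remove"
    events the source field is ignored.
    """
    current = {}  # shot_id -> source of the interaction that last added it
    for action, shot_id, source in events:
        if action == "add":
            current[shot_id] = source
        elif action == "remove":
            current.pop(shot_id, None)
    return Counter(current.values())


# The example from the text: added from the video player, deleted, then
# added again from the search results, so it counts as a search result.
log = [
    ("add", "shot_17", "video player"),
    ("remove", "shot_17", None),
    ("add", "shot_17", "local search results"),
    ("add", "shot_42", "remote relevance list"),
]
print(final_result_sources(log))
```

Shots that were removed and never added again do not appear in the counts, which matches the description of the final list as the basis for Table 8.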
Table 9 The total number of shots copied from the remote user by all users in each minute of the session, with and without user 5.

Minute   Total shots copied (with user 5)   Total shots copied (without user 5)
1        48                                 46
2        4                                  1
3        1                                  1
4        0                                  0
5        0                                  0
6        2                                  0
7        7                                  2
8        6                                  0
9        17                                 3
10       1                                  0
11       5                                  3
12       6                                  1
13       2                                  0
14       0                                  0
15       6                                  3
Table 10 The unique terms used and estimated number of terms copied, mean (SD).

Measure                        Independent    Watched        Watching       Mutual
Unique terms used per user     17.38 (9.76)   14.33 (9.39)   13.33 (7.18)   16.17 (9.26)
Unique terms copied per user   n/a            n/a            1.83 (1.93)    1.67 (1.63)
Percentage of terms copied     n/a            n/a            14%            10%
without user 5, by all users. Eighteen of the users copied a shot from the remote user during the first minute of the search session. During the last minute of the session, only users 5, 7 and 9 copied shots, totaling 6 shots (last column of Table 9). As can be seen, apart from the start of the session, relatively few shots are copied during the rest of the sessions, especially when user 5 is removed from the analysis. 7.8. Did users copy search terms from the other user? One of the aspects under study is the degree to which awareness of another user influences search behaviour. One possible way awareness may help is by providing a user with new terms with which to use in queries, allowing them to ‘‘borrow’’ the terms of the remote user. Table 10 shows the number of unique terms used in each condition, plus the estimated number of copied terms in the two awareness conditions. The copying estimates were derived from the interaction logs, which record each search initiated by a user, plus when a user presses the ‘‘Refresh’’ button to update their view of the remote user. Copied terms were identified using the following procedure:
1. A list of common terms was constructed for each pair of users. A stop word list was used to filter out very common words.
2. For each common term, the time at which the term was first used by each user was found. The user who had used the term first was assumed to be the originator of the term.
3. The log file was then checked to see if the other user had pressed “refresh” between the first user’s first use of the term and the second user’s. If this was the case, the term was counted as having been copied by this second user.
While this does not allow us to know for sure whether a term was copied, it provides an estimate of the potential terms which may have been copied and reused during the awareness conditions. It can be seen from Table 10 that roughly between 10% and 14% of terms used by a user in an awareness condition were potentially copied from the other user.

Table 11 User perceptions of remote information, where 1 = agree, 5 = disagree; median (IQR).

Question                                                               Watching   Mutual
I found it useful to copy the other user’s words from the search box   4 (2.75)   4 (2.25)
I found it useful to copy the other user’s shots marked as relevant    3 (2)      3 (2)
I found it useful to copy the other user’s search results              2 (2)      3 (2)

7.9. User perceptions of remote information use After the Watching and Mutual conditions, three questions were used to elicit the degree of usefulness of the different types of remote information available to the user for copying. The three questions were: “I found it useful to copy the other user’s words from the search box”, “I found it useful to copy the other user’s shots marked as relevant” and “I found it useful to copy the other user’s search results”. The responses to these questions are shown in Table 11. It can be seen that users generally disagreed with the statement that copying terms from the other user’s query was useful, despite the results from Section 7.8. Responses to the second question suggest that users were generally neutral concerning the copying of relevance information, possibly reflecting the low level of copying found in Section 7.6. Users responded more positively to the question “I found it useful to copy the other user’s search results” on the Watching condition.
The greater perceived utility of remote search results in Watching is consistent with users behaving less pro-actively in this condition, although from the log data it was found that search results were copied less than shots marked relevant. 7.10. Did the performance of users alter based on familiarity? While not explicitly controlled in the experimental design, one potential influence on the performance and behaviour of users is their relationship with their competitor, i.e. how well the two users know each other may affect how the user approaches the game scenario. In order to investigate whether this variable could have an effect, the question ‘‘How well do you know the other person involved in this experiment?’’ was asked in each entry questionnaire, to both users, before starting each study. A five point scale was used, ranging between 1 (‘‘not at all’’) and 5 (‘‘very well’’); results are shown in Table 12. Users were asked this question individually, and so inevitably there are differences between users’ perceptions of each other (shown by the differences between the even and odd numbered users in Table 12, although in only two cases are the differences between users larger than one). Four pairs of users reported that they knew each other ‘‘not at all’’, two pairs knew each other ‘‘very well’’ (pairs 5 and 9), while the mean score for the other pairs of users was between 2.5 and 3.5, indicating that they knew their opposing partner to some degree. Overall, for all pairs of users, the mean response is 2.708 (SD 1.422). Based on this feedback, we can consider the performance of each user based on that user’s familiarity with their opponent. Table 13 shows the mean performance, for the three performance measures used in Section 7.3: shots marked, relevant shots marked, and MAP. No significant interactions were found, when using multiple t-tests (for MAP) and the Mann-Whitney U (shots marked and relevant shots marked), along with the Bonferroni correction.
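As an illustration of the kind of test used for the count measures in Section 7.10, the sketch below computes a two-sided Mann-Whitney U test and applies the Bonferroni correction to a set of p-values. It is not the authors' analysis code: it uses the normal approximation with a tie correction and no continuity correction, which is only rough for small groups, and the function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up shot counts for two familiarity groups.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
print(bonferroni([0.01, 0.04, 0.5]))
```

With five familiarity levels there are up to ten pairwise comparisons per measure, so the correction raises the bar for significance considerably, which is in keeping with no significant interactions being found.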
Table 12 Responses to the question “How well do you know the other person involved in this experiment?” from the entry questionnaire. 1 = “not at all”, 5 = “very well”.

Pair   Odd user   Even user   Mean   SD
1      4          3           3.5    0.53
2      2          3           2.5    0.53
3      3          3           3      0
4      5          2           3.5    1.6
5      4          5           4.5    0.53
6      3          4           3.5    0.53
7      4          2           3      1.07
8      1          1           1      0
9      5          5           5      0
10     1          1           1      0
11     1          1           1      0
12     1          1           1      0

Table 13 Mean performance for different levels of partner familiarity.

Knowledge of opponent   1 (none)        2               3                4               5 (very well)
Shots marked            7.312 (4.935)   9.917 (9.876)   10.8 (11.728)    8.938 (7.094)   7.562 (4.32)
Relevant shots marked   3.656 (3.001)   4.5 (4.101)     4.05 (5.817)     4.875 (4.303)   3.188 (2.316)
MAP                     0.019 (0.015)   0.024 (0.026)   0.025 (0.019)    0.022 (0.016)   0.014 (0.01)
8. Discussion Looking first at Hypothesis A and the results reported in Section 7.3, there is a trend for users to find more shots, and find more relevant shots in the awareness conditions. For shots marked and relevant shots marked, there are significant differences between Watching and Independent, and Mutual and Independent respectively, with a general trend for greater numbers of shots to be found on Watching and Mutual. MAP results show a trend without statistical significance. Both relevant shots marked and MAP show the same trend, with users performing poorest in the Independent condition, increasing on Watched, further increasing on Watching, with the best performance on the Mutual condition. This gradual increase indicates a possible relationship between the conditions. For shots marked, a similar trend is found, but with the number of shots found decreasing in Mutual when compared to Watching, which suggests that while users were marking more shots relevant in Watching, there was not an increase in the accuracy of those results. The results in Section 7.4 for Hypothesis B show that the number of searches executed on both the Watched and Watching conditions is lower than on the Independent condition. Both are also lower than Mutual, although this difference is not statistically significant at the 5% level. The relative lack of searching on both conditions may be a result of users performing more passively under these conditions. This possible passivity in Watching might be expected to show in the number of shots copied from the remote user, i.e. a passive user on the Watching condition would spend more time watching and copying shots found by the remote user. However, the difficulty of finding relevant shots means this is not clear from the results in Section 7.6: the median number of shots copied is zero for all conditions, with only the range for the Watching condition varying (Fig. 3).
This reflects the low numbers of relevant shots found by users in Table 3, which makes it difficult to determine the contribution of shots copied from a remote user. Awareness is obviously rendered less useful when the person being watched is unable to complete the task or has difficulty doing so. Hypothesis C (results presented in Section 7.5) concerned the effort exerted in playing and browsing videos. From Table 7, a trend can be seen for fewer Next and Previous shot events to be executed on Watching and Mutual. The shots played measure, however, shows almost the opposite behaviour, with a trend for users to Play more shots on Watching and Mutual. This may be due to users checking the relevance of shots by playing them before marking them as relevant. Taking the results from Sections 7.3 and 7.5 together, the generally better relative performance of users on Watching and Mutual, combined with the trend for less video browsing (albeit with more Playing of videos), suggests that awareness enables a user to perform better with less effort than working independently. One result of this study is the relative lack of copying by users (Section 7.6). One possible outcome of the experimental procedure was for users to simply copy shots from each other and therefore, in the Mutual awareness condition, for both users to end up with the same results. This did not materialize, with the degree of copying being low (with the exception of User 5). As previously mentioned, it is likely that the difficulty in finding relevant shots impacted on the usefulness of awareness. It should be emphasized that the study took place with the four worst performing tasks from TRECVID 2006; choosing less difficult topics may have resulted in a larger difference between the different conditions. Additionally, as shown in Section 7.7, most copying of shots occurred at the start of a session, the numbers copied by users being low for the rest of the session.
There does not appear to be evidence that users waited until the end of a session, and simply copied the other user’s shots onto their screen, although the design of the interface (in which the final result area of the remote user was hidden) would tend to militate against this behaviour. One other possible reason for the lack of copying was the design of the interface, which required users to press refresh to update the remote screen, although it was thought that the competitive nature of the experiment would encourage the use of this feature. Users pressed refresh on average 9.88 times per task (SD = 7.93), which equates to less than once a minute, and this may have contributed to the lack of copying. Informal feedback during the experiment also made clear to us that some users were using the final results area, which could not be viewed by the remote user, as a way of “hiding” shots. For example, a number of users stated that they avoided using relevance feedback, since placing a shot in the relevant shot area would make that shot visible to the other user. Instead, they would drag any found shots directly to the final shots area, where they could not be seen by the remote user. The results of term copying suggest that between 10% and 14% of terms have been potentially copied. Analysis based on these figures must be tentative due to the way in which these estimates were derived from the log data, and with the absolute number of potentially copied terms being low (10). Interestingly, user perceptions of which remote data is most useful (Section 7.9) are different from the low-level analysis, suggesting that users consider copying results and shots marked relevant to be more useful than search terms. This is perhaps unsurprising, given that a shot can be directly copied without further work.
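The log-based estimate of term copying discussed here (the three-step procedure of Section 7.8) can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: searches are taken to be `(time, query)` tuples and refreshes a list of times, terms are obtained by lower-casing and splitting on whitespace, a refresh exactly at either boundary time counts as falling in between, and terms first used by both users at the same instant are skipped. The function name and the toy log are hypothetical.

```python
def estimate_copied_terms(searches_a, searches_b, refreshes_a, refreshes_b,
                          stop_words=frozenset()):
    """Return {"a": terms possibly copied by A, "b": terms possibly copied by B}."""

    def first_use(searches):
        # Map each non-stop-word term to the time it was first used.
        first = {}
        for time, query in sorted(searches):
            for term in query.lower().split():
                if term not in stop_words and term not in first:
                    first[term] = time
        return first

    first_a, first_b = first_use(searches_a), first_use(searches_b)
    copied = {"a": set(), "b": set()}
    # Step 1: only terms used by both users of the pair are candidates.
    for term in first_a.keys() & first_b.keys():
        t_a, t_b = first_a[term], first_b[term]
        if t_a == t_b:
            continue  # no clear originator (assumption of this sketch)
        # Step 2: the earlier user is taken to be the originator.
        # Step 3: the later user must have refreshed between the two uses.
        if t_a < t_b:
            if any(t_a <= r <= t_b for r in refreshes_b):
                copied["b"].add(term)
        elif any(t_b <= r <= t_a for r in refreshes_a):
            copied["a"].add(term)
    return copied


# Toy log: B refreshes at t=20 after A searched "red car" at t=10, then
# searches "car race" at t=30. A never refreshes, so "boat" is not counted.
result = estimate_copied_terms(
    searches_a=[(10, "red car"), (40, "boat")],
    searches_b=[(5, "boat"), (30, "car race")],
    refreshes_a=[],
    refreshes_b=[20],
)
print(result)
```

As the text notes, a refresh between the two uses shows only that copying was possible, so counts obtained this way are an upper estimate of the terms actually copied.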
One possible influence on performance and searcher behaviour is how well users know each other: two users who are good friends may act and perform differently from two strangers. This was not explicitly controlled in the design of the
experiment although we did, in the entry questionnaire, ask each user how well they knew their opposing partner. These results are given in Section 7.10, and show that more pairs of users judged each other as strangers than as good friends. No significant interaction between familiarity and performance was found, although this is undoubtedly an area for future research. Lastly, the results of Section 7.2 suggest that neither being watched nor being the watcher is significantly more stressful or distracting than searching alone, although this result should be considered within the context of the controlled environment of the experiment. Users’ perceptions of the utility of awareness, while generally neutral, did become more positive on the Watching condition. This was not a major effect, however, and no user simply copied the material found by the other user without doing any searching work themselves. This could be due to the low effort required to search; carrying out a search is straightforward in the interface, and does not take much effort. 9. Conclusions and future work In this paper a study into the role of awareness in multimedia retrieval has been presented. Awareness is an important enabler of communication and collaboration within collaborative environments. The study was designed to investigate the potential benefits, if any, of awareness within an environment which encouraged users to be aware of another’s activity, achieved through the use of a competitive game-like scenario. From the results and discussion sections, some broad conclusions can be drawn, notably that awareness would appear to help improve searcher performance, and simultaneously decrease the searcher effort required to achieve that performance. Searching appears to be influenced by the unbalanced awareness conditions Watched and Watching, where the number of searches executed decreased.
Perhaps most importantly, copying of shots from the remote user was not carried out often despite the game scenario, with the exception of a single user. This result suggests that users may have to be encouraged, via interface support, to view and use remote information, such as through a more natural and easy form of awareness (e.g. one where remote information is available without the need to explicitly update a display). The importance of sharing search terms is also suggested by the results, although given the generally negative perception of search terms as a source of useful information (Section 7.9), more work would have to be carried out to investigate the ways of improving remote use of such information. The study suggests a number of avenues for investigation, in particular further research into how awareness information (query terms, shot information, etc.) is used by users. The study reported here suggests that query terms appear to be reused between 10% and 14% of the time. How and why these terms are copied and used is still an open question, as is the issue of how this may or may not change if the study were carried out in a text based search environment. Making shot information more attractive to users is also an issue, for which content based or collaborative filtering techniques could be used. Another interesting direction of study is to apply the game-based search methodology described here to a text based search environment, or other multimedia search environment. Indeed, the competitive aspect of the experiment may in fact prove advantageous when considering some scenarios in which search systems are used, such as that of students involved in a project. Competition is a natural part of such situations, where students compete to get the best grades, and therefore a game-like scenario may be appropriate for their study.
Acknowledgment
This research was supported by the European Commission under the projects SALERO (FP-027122-SALERO) and MIAUCE (IST-033715). It is the view of the authors but not necessarily the view of the community.
References
Adams, M., Tenney, Y., & Pew, R. (1995). Situation awareness and the cognitive management of complex systems. Human Factors, 37(1), 85–104.
Adcock, J., & Pickens, J. (2008). Fxpal collaborative exploratory video search system. In ACM Conference on Image and Video Retrieval (CIVR 2008) VideOlympics (Demo).
Adcock, J., Pickens, J., Cooper, M., Anthony, L., Chen, F., & Qvarfordt, P. (2007). Fxpal interactive search experiments for TRECVid 2007. In TRECVid 2007 – Text REtrieval Conference TRECVid Workshop. National Institute of Standards and Technology (NIST), Gaithersburg, MD.
Adcock, J., Cooper, M., & Pickens, J. (2008). Experiments in interactive video search by addition and subtraction. In ACM conference on image and video retrieval (CIVR 2008).
Brown, B., & Bell, M. (2004). CSCW at play: ‘‘there’’ as a collaborative virtual environment. In Proceedings of the 2004 ACM conference on computer supported cooperative work (CSCW’04). ACM, Chicago, USA, pp. 350–359.
Colman, A. M. (1982). Game theory and experimental games: The study of strategic interaction. Oxford: Pergamon.
Crawley, M. J. (2007). The R book. Chichester, England: John Wiley and Sons Ltd.
Dourish, P., & Bellotti, V. (1992). Awareness and coordination in shared workspaces. In Proceedings of the conference on computer supported cooperative work (CSCW’92). ACM, Toronto, Canada (pp. 107–114).
Endsley, M. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64.
Gutwin, C., & Greenberg, S. (2004). The importance of awareness for team cognition in distributed collaboration. In E. Salas, S. M. Fiore, & J. A. Cannon-Bowers (Eds.), Team cognition: Process and performance at the inter- and intra-individual level (pp. 177–201). Washington: APA Press.
Halttunen, K., & Sormunen, E. (2000). Learning information retrieval through an educational game: Is gaming sufficient for learning? Education for Information, 18(4), 289–311.
Hothorn, T., Bretz, F., & Westfall, P. (2008). Simultaneous inference in general parametric models. Biometrical Journal, 50(3), 346–363.
Huizinga, J. (1955). Homo ludens: A study of the play-element in culture. Boston, Mass: Beacon Press.
Hutchins, E. (1995). Cognition in the wild. Cambridge, Mass: The MIT Press.
Morris, M. R. (2007). Collaborating alone and together: Investigating persistent and multi-user web search activities. Tech. Rep. MSR-TR-2007-11, Microsoft Research.
Morris, M. R. (2008). A survey of collaborative web search practices. In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on human factors in computing systems (pp. 1657–1660). New York, NY, USA: ACM.
Morris, M. R., & Horvitz, E. (2007). Searchtogether: An interface for collaborative web search. In Proceedings of the 20th ACM symposium on user interface software and technology (UIST’07) (pp. 3–12). ACM.
Nemeth, C. (1972). A critical analysis of research utilizing the prisoner’s dilemma paradigm for the study of bargaining. Advances in Experimental Social Psychology, 6, 203–234.
Pruitt, D. G., & Kimmel, M. J. (1977). Twenty years of experimental gaming: Critique, synthesis, and suggestions for the future. Annual Review of Psychology, 28, 363–392.
Salen, K., & Zimmerman, E. (2004). Rules of play: Game design fundamentals. Cambridge, Mass: MIT Press.
Schmidt, K. (1998). Some notes on mutual awareness. In Paper presented at COTCOS awareness SIG workshop.
Simone, C., & Bandini, S. (2002). Integrating awareness in cooperative applications through the reaction-diffusion metaphor. Computer Supported Cooperative Work, 11, 285–298.
Smeaton, A., & Over, P. (2006). TRECVid-2006: Search task (slides). In TRECVid 2007 – Text REtrieval conference TRECVid workshop. National Institute of Standards and Technology (NIST), Gaithersburg, MD.
Smeaton, A. F., Foley, C., Gurrin, C., Lee, H., & McGivney, S. (2006b). Collaborative searching for video using the Fischlar system and a diamondtouch table. In Proceedings of the First IEEE international workshop on horizontal interactive human–computer systems (TABLETOP ’06) (pp. 151–159). Washington, DC, USA: IEEE Computer Society.
Smeaton, A., Lee, H., Foley, C., & McGivney, S. (2006a). Collaborative video searching on a tabletop. Multimedia Systems, 12, 375–391.
Smeaton, A. F., Over, P., & Kraaij, W. (2006c). Evaluation campaigns and TRECVid. In MIR ’06: Proceedings of the 8th ACM international workshop on multimedia information retrieval (pp. 321–330). New York, NY, USA: ACM Press.
Smeulders, A. W., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Urban, J., Hilaire, X., Hopfgartner, F., Villa, R., Jose, M., Chantamunee, S., & Gotoh, Y. (2006). Glasgow university at TRECVid 2006. In TRECVid 2006 – text REtrieval conference TRECVID workshop. National Institute of Standards and Technology, MD, USA.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). London: Springer.
Villa, R., Gildea, N., & Jose, J. M. (2008a). Collaborative awareness in multimedia search. In Proceedings of the ACM international conference on multimedia (pp. 877–880). Vancouver, Canada: ACM.
Villa, R., Gildea, N., & Jose, J. M. (2008b). A study of awareness in multimedia search. In Proceedings of the 8th ACM/IEEE-CS joint conference on digital libraries (JCDL’08) (pp. 221–230). Pittsburgh, PA, USA: ACM.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI conference on human factors in computing systems (CHI 04) (pp. 319–326). New York, NY, USA: ACM.
von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58–67.
von Ahn, L., Liu, R., & Blum, M. (2006). Peekaboom: A game for locating objects in images. In Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI 06) (pp. 55–64). ACM, New York, NY, USA.