Free-form searching via web sites: Content and moving observed in the context of personal development


Information Processing and Management 42 (2006) 769–784 www.elsevier.com/locate/infoproman

Free-form searching via web sites: Content and moving observed in the context of personal development

Jarkko Kari *

Department of Information Studies, University of Tampere, FIN-33014 University of Tampere, Finland

Received 16 December 2004
Available online 17 May 2005

Abstract

This empirical article focuses on information seeking, related to personal development, via World Wide Web sites. In order to obtain detailed and valid data, free-form Web searches by 15 individuals were observed and videotaped. The 687 sites visited by the informants, along with their actions therein, were examined and coded. The study explores the entity focus and origin of the viewed WWW sites, as well as the tactics of moving within and between sites. Correlations between the variables are also analysed. One of the most interesting findings was that personal resources (documents within the user’s computer) are a part of the Internet, too, when they are connected to it. This observation highlights the potential importance of researching the ways in which internal and external search for information differs, and what roles local resources have.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Information searching; Web sites; Content; Search tactics; Observation; Personal development

* Tel.: +358 3 215 8391; fax: +358 3 215 6560. E-mail address: jarkko.kari@uta.fi

0306-4573/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.ipm.2005.04.001

J. Kari / Information Processing and Management 42 (2006) 769–784

1. Introduction

Although empirical studies dealing with WWW searching have proliferated in recent years, research of this kind has often scrutinized navigation between Web pages (e.g. Cockburn & Jones, 1996; Cockburn & McKenzie, 2001; Hölscher & Strube, 2000; Kim, 2001; Slone, 2002). The impetus for the article at hand is given by the statement that “the Internet is increasingly recognized for the vast array of information, services, meeting places, and communities-of-interest that it offers” (Scull, Milewski, & Millen, 1999, p. 17; see also Hölscher & Strube, 2000). The purport of the declaration is that the mass of WWW information is shaped and made identifiable by sites rather than specific pages. This would point to Web site being a more meaningful unit of analysis than Web page. Concentrating on WWW sites yields a more nuanced picture of information searching than the WWW as a whole, and it simultaneously avoids the atomism of individual pages.

It is appropriate to define the World Wide Web (WWW or Web) as an interactive and collaborative “information environment” (e.g. Catledge & Pitkow, 1995, p. 1066; Tewksbury & Althaus, 2000, p. 128) that is mainly composed of innumerable hypermedia and hypertext documents linked to one another (see e.g. Catledge & Pitkow, 1995; Lazonder, Biemans, & Wopereis, 2000), and distributed over the Internet (see Choo, Detlor, & Turnbull, 1999; cf. O’Neill, 1998). A WWW site “is a collection of Web pages linked together and which exist on a particular server” (ibid., p. 115; also Hodkinson, Kiel, & McColl-Kennedy, 2000; cf. Wang, Hawk, & Tenopir, 2000). It is published by an organization or individual (see Pharo, 2002). The site is usually distinguished by its unique Internet domain name (Hodkinson et al., 2000): for instance, the site of the Relationship Learning Center is at the URL (Universal Resource Locator; O’Neill, 1998) www.relationshipjourney.com. In brief, then, a site is normally a distinct Web space (Wang et al., 2000) with a home or main page of its own.

There are apparently two main navigational directions in relation to WWW sites: inter-site (between sites) and intra-site (within a site, page by page) searching (Hodkinson et al., 2000). In Card et al.’s (2001) small-scale study of Web searching, the number of the partakers’ movements within sites was much higher than between sites. This “discovery” is hardly a surprise, since WWW sites are made of pages. Studying Web search activities at the level of sites, as in Cothey’s (2002), Lazonder et al.’s (2000), as well as Light’s (2001) work, has not been a popular approach.
Even those inquiries have paid little regard to site content and moving from one site to another. Cothey’s longitudinal analysis of transaction logs looked at “Web host conformance”, or the degree to which the WWW sites visited by one participant matched the pool visited by all participants. The results revealed that the informants’ conformance decreased, owing to experience, during ten months. This was interpreted as increasing dispersion in the selection of Web sites (Cothey, 2002). In an observational piece of research by Lazonder, Biemans and Wopereis, the effect of searchers’ Web experience on locating WWW sites was examined. It was ascertained that those with more experience were more successful in finding pertinent sites than novices (Lazonder et al., 2000). Light conducted three different, but related studies concerning the role of context in perceiving Web sites. It seems that the subject matter of, and inputting text into WWW sites affected the participants’ notions of the sites. In turn, the producers of, and structural metaphors in Web sites influenced the informants’ interaction with those sites (Light, 2001).

The present paper postulates that parallel to the evolving information horizon (particularly the Internet) is the human being, who him/herself is also in a process of becoming more mature, reaching out for perfection. If we accept the view that development is a fundamental characteristic of living organisms (Deci & Ryan, 1985; Piaget, 1971), this sounds only natural. Especially in these times of rapid change, the continual developing of oneself has become a necessity (see Ostun, 1998), even in free time. To this, on the other hand, the information society with its networked services provides quite innovative solutions.

Personal development or growth means that an individual improves his/her abilities, skills, knowledge or other qualities by working on them (see Maslow, 1968; cf. Magnusson, 1995; Ostun, 1998). Self-development arises from our needs, and affects our behaviour (Deci & Ryan, 1985). Personal growth is also crucial for people’s information skills (Fourie & Niekerk, 1999).

Taking the above remarks into due consideration, this empirical article focuses on information seeking, related to personal development, via World Wide Web sites. The purpose of the report is to analyse the nature and types of WWW sites, as well as information search tactics. The paper looks for answers to three research questions which are further specified below:


(1) What kinds of Web sites do people visit?
    • What entity do the sites focus on?
    • Where do the sites come from?
(2) How do the persons move in the WWW?
    • Which pages of a site do they visit?
    • How do they proceed from one site to another?
(3) Do the qualities of site and manner of moving correlate?

These questions are addressed by analysing “ordinary people’s” free-form Web searches recorded in university facilities.

2. Methods

Because relatively little was still known even about the basics of WWW information seeking when this project commenced (in 2001), an exploratory mode of research was in order. As a mixture of descriptive and explanatory work, this article probes the study object by inductive, deductive and correlative means. Such triangulation of methods aims at getting a many-sided and consistent picture of the phenomenon under scrutiny. The procedures as a whole can be pictured as a process with three distinct phases: recruiting participants, collecting data, and analysing data (Fig. 1).

Fig. 1. The empirical process.

2.1. Participants

The research effort was centred on individuals who were sufficiently motivated to go through the various phases of the investigation (see Savolainen, 1998). Accordingly, persons were looked for who were interested in developing themselves, and who also used the Internet in connection with it. For practical and financial reasons, it was sensible to limit the dispersion of the informants to the province of Pirkanmaa (in Finland), for the study was performed in its capital, Tampere. Since the investigation aspires to profundity, a sample of 20 individuals was deemed adequate. Because there was no exhaustive or even representative register of self-developers, the only option was to seek out volunteers from a number of quarters in the hope of reaching at least some degree of coverage (cf. Rieh, 2004).

Considering the theme of the project (“Self-development and Internet use”), the Internet was probably the best vehicle in contacting potential participants. Therefore, a notification about informants wanted was sent by e-mail to the local public library, adult education centres, and a computer club for senior citizens (altogether five organizations), who forwarded my message to their own people, as well as put my hyperlink on their WWW sites. To make explicit what I meant by “personal development”, its definition from Section 1 was included in the call for participants. The candidates enrolled in the inquiry by submitting via a Web form certain basic information about themselves: ways of personal development, Internet use in years, name, domicile, and contact details. Suitable persons were then approached by e-mail or telephone in order to fix a meeting. As a result, I managed to persuade 18 individuals to take part in the research.

Because authenticity is a central element in the project at hand, an effort was made to study each participant on his/her own terms, i.e. where and when s/he wanted. Much to my surprise, a preponderance (15) of the recruits opted for our department facilities, whereas two informants selected their home, and one preferred his workplace. The research space (at the University of Tampere) was a standard meeting room that was furnished with a computer (connected to the Internet) and video recording apparatuses for the purposes of this inquiry. On the other hand, if an individual wanted to perform a WWW search somewhere else, the immobility of the video equipment prevented screen capturing on site. Because video data is crucial to the current report, the three “dissidents” had to be excluded here.

Curiously, only one third (5) of the 15 informants were male (cf. Rieh, 2004). This bias towards women may have been inadvertently caused by an appeal on my WWW form saying “help men out now!” (see acknowledgements). The participants represented all age groups between 10 and 70 years (cf. ibid.), their average age being 37 years (n = 14; the number of observations fluctuates here, because some participants did not answer all of the interview questions).
The educational distribution of the subjects did not come as a surprise: one of them had no degree at all (since he was still in the primary school), none had just a basic degree, eight had an intermediate (college, upper secondary school, or vocational school) degree, and four had a higher (university) degree (n = 13). Their occupational status was such that seven individuals were studying, four were working, and three did neither of those (n = 14): one of these was a pensioner, another was on vacation, and the third person was unemployed. The high proportion of students could be explained by the recruitment channels used. The partakers’ Internet experience varied between two and ten years, with a mean of five years. It did seem that both novice and expert searchers were rare in this group.

2.2. Data collection

The main part of the data was collected during November 2001–January 2002. Usually all necessary data from one person was gathered before moving on to the next. Following the example set by the bulk of earlier Internet studies (e.g. Pharo & Järvelin, 2004; Rieh, 2004; Wang et al., 2000), this investigation wielded multiple methods of data collection, for different research questions demanded different procedures. Interviewing was the core technique, covering the context of situational Web information seeking. The real-time scrutiny of WWW interaction, in turn, addressed most of the research questions in this paper, necessitating observation (by the scholar) and thinking aloud (by the participants).

Wherever a Web session took place, the partaker always had the liberty to select any available browser program s/he wanted, and carry out the search as s/he saw fit, on a subject of his/her own. There was no real time limit, either. The duration of the sessions fell somewhere between half an hour and two hours. The sole restriction was that the search topic had to concern personal development.
What constituted “self-development” was ultimately left to the participants to decide (one of the interview questions actually probed what the informant him/herself means by personal development), so no external, strict definition was imposed on them. Sometimes, a participant looked for information on more than just one subject within a single WWW session; the maximum was as many as seven topics. In a few other cases, the person searched the Web for the same thing on two separate occasions, under my surveillance. As a matter of fact, a great majority of the informants felt that it would take more than three (some even spoke of dozens) rounds of Web searching to satisfy their then need for information (see Rieh, 2004; cf. Fidel et al., 1999). For understandable reasons, it would have become impossible to document all of these instances, and therefore only one search session per person was usually deemed sufficient. Two sittings took place with three participants (separately), as this appeared more appropriate in their situations. Pharo (1999) also observed one search session per individual, but this is far fewer than, say, Fidel et al.’s (1999) three sessions. My minimal line in this regard was justified by aspirations to profound, in-depth data gathering and analysis.

When a partaker came to my WWW chamber, the search session was captured on videotape. This was enabled by a computer-to-TV converter (AVerKey500; see http://www.aver.com/products/comptv_AVerKey500pro.shtml), which had been inserted between the PC and its monitor. A microphone was also attached to the video cassette recorder (VCR), which arrangement synchronized the events on the computer screen with the searcher’s speech. The video tapes came to hold a total of some 17 hours of data.

2.3. Data analysis

All empirical material was then transformed into computer-readable text. The data processing involved transcribing the audio recordings, as well as interpreting the taped video films. The current article examines the observational (video) data.

The first phase of analysis was the identification of the seen WWW sites, which laid the foundation of and merged with later stages. Initially, the video cassettes were played back, and the participants’ navigational paths were manually logged (see Hill & Hannafin, 1997), page by page. The Web site to which each page belonged was also noted. Later, the sites were scrutinized in more detail (see next paragraph), paying special attention to the URL address, title, information content, and language of their main page. For the purposes of this report, moving to a page in another WWW site signaled the next site, even if that meant revisiting a previously-viewed site.
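The site-identification rule just described (a move to a page in another site starts a new site visit, even when that site was seen before) can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study, where the paths were logged and coded by hand. It assumes that the host name of a URL is an acceptable stand-in for the site; the function names and the `/haku` subpage address are hypothetical.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return the host name, used here as an assumed stand-in for the Web site."""
    host = urlparse(url).hostname
    # Pages without a host (e.g. about:blank) are kept under their raw address.
    return host.lower() if host else url


def segment_into_site_visits(page_log: list[str]) -> list[tuple[str, int]]:
    """Collapse a page-by-page log into consecutive (site, pages viewed) visits."""
    visits: list[tuple[str, int]] = []
    for url in page_log:
        site = site_of(url)
        if visits and visits[-1][0] == site:
            # Same site as the previous page: the current visit continues.
            visits[-1] = (site, visits[-1][1] + 1)
        else:
            # Another site: a new visit starts, even if the site was seen earlier.
            visits.append((site, 1))
    return visits


# Hypothetical page log; the "/haku" subpage is invented for illustration.
page_log = [
    "http://www.suomi24.fi/",
    "http://www.suomi24.fi/haku",
    "http://www.tampere.fi/",
    "http://www.suomi24.fi/",
]
visits = segment_into_site_visits(page_log)
```

In this sketch the return to www.suomi24.fi counts as a third visit rather than a continuation of the first one, which mirrors the revisiting rule. A host-based rule is cruder than manual coding: personal sites that share a host (for example pages under a university server) would be merged into one site.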
In each case, the manner of jumping from one site to another could be discerned by following the mouse pointer, scroll bars, and text entry boxes on the screen.

Coding the video material was the second phase. In this basically qualitative job, the major methods were content analysis (as in Rieh, 2002, for example), and classification. Typologies were usually constructed according to single dimensions, so that the various types became mutually exclusive. This of course implies an inductive mode of analysis, in which categories emerged from the researcher’s interpretation of the data. For every Web site that an informant visited, the values of 60 variables were recorded. Most of the information was gleaned by aggregating page-level data, but occasionally this had to be complemented by inspecting the searcher’s thinking-aloud or the original site in the WWW. Of the four variables in this paper (entity focus, site origin, intra-site movement, and inter-site movement; see Sections 3.1.1 and 3.2.2), one was categorized deductively. The classification of movement strategies from my first empirical article (Kari, 2004) in the project was chosen to test its utility with inter-site movement.

The third step involved extracting quantitative, descriptive measures, such as frequency and percentage (cf. Jansen & Pooch, 2001), from the coded material. In the fourth phase, statistical operations were performed between the variables. Since these are all of nominal scale, contingency table analyses were carried out. As the size of the table was always bigger than 2 × 2 cells, the proper correlation coefficient was Cramér’s V (see Elifson, Runyon, & Haber, 1990). The accompanying significance level (p) was naturally calculated, too. The quantitative work was done with the help of statistical software, StatView SE+Graphics 1.

2.4. Quality

The validity of the research results was affected by several factors, both negative and positive.
It was most of all weakened by the fact that the Web searches studied in this article were conducted somewhere else than in the participants’ natural environment. Moreover, the resolution of the video recordings left something to be desired, as the body text on the screen often remained unreadable. These shortcomings are above all compensated for by having collected the data in real time, thus avoiding pitfalls of retrospection. The “artificial” approach is also balanced by grounding most of the analysis on the data. When something about a WWW site seemed unclear or ambiguous, the original documents (in the Internet) could usually be consulted (cf. Rieh, 2004). If not, the participant’s speech was examined as a last resort. Hence, the project should yield moderately valid findings.

It was not to be expected that the reliability of the results would pose serious problems, as the empirical material was gathered and coded quite systematically. However, it was still given a boost by double-checking some of the codes (see Klobas & Clyde, 2000).

Due to the small and self-selected sample, as well as the time lag of three years (between data collection and reporting), the quantitative generalizability of the findings may be low, which is almost a hallmark of Internet usage research (see Savolainen, 1998). This limitation is mitigated by the half-qualitative approach, and comparing results with earlier research.
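The contingency table analysis named in Section 2.3 can be illustrated with a short, self-contained computation of Cramér’s V. The sketch below is not the StatView procedure used in the study; it applies the textbook definition, V = sqrt(chi² / (n · (k − 1))) with k the smaller of the number of rows and columns, to a table of observed counts. The two example tables are invented, every row and column total is assumed to be non-zero, and the significance level (p) is left out because it requires the chi-square distribution.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: squared deviations from the counts expected
    # under independence, scaled by those expected counts.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by the sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Invented tables: one with a perfect association, one with none at all.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
independent = [[10, 20, 30], [20, 40, 60]]

v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
```

V ranges from 0 (no association) to 1 (perfect association), which is why it suits nominal variables cross-tabulated in tables larger than 2 × 2.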

3. Results and discussion

The empirical findings are presented and discussed next. This section covers the entity focus and origin of Web sites, intra-site and inter-site movement, as well as correlations between the variables.

3.1. Web sites

The number of visited sites ranged between 18 and 83, depending on the individual (n = 15). The average was 47 sites, whereas the total number of WWW sites in the corpus was 687.

3.1.1. Entity focus

The original concept of entity focus is a less-used gap dimension in the Sense-Making approach (e.g. Dervin, 1999). It organizes information needs according to what entity they concentrate on (Dervin, Jacobson, & Nilan, 1982). Instead of subject areas (see e.g. Kari, 2004), entity focus deals with types of things (living or nonliving) “with distinct and independent existence” (see Compact Oxford English dictionary, 2005). The concept is applied to Web sites in the study at hand.

For the sake of concreteness and comparability with prior research, the specific entity foci are examined first. Here, a class was required to manifest itself at least ten times in the data. Otherwise, it was merged with another category. Table 1 enumerates the various foci of the WWW sites and their share in the corpus. Nine distinct entity foci were found: computer program, database, education, organization, person, portal, publication, service, and subject area.

The additional categories of “other” and “none” require some specification. The other Web sites were dedicated to a place (4 cases), presentation (3 cases) or project (1 case). They were lumped together because of their rarity. Some pages did not have any content whatsoever, and their entity focus was thus none. Many of them were initial pages, chiefly since the WWW browsers were configured to show a blank window on start-up, or when pushing the Home button. Other pages looked void, because they did not get a chance to load their content.

A juxtaposition with prior findings would be most interesting here. The sole related WWW site typologies are the following two:

• search engine, entertainment, sports, utilities, news, interactive, commerce (Perse & Ferguson, 2000);
• political or government, music-related, film-related, television-related, on-line magazine or newspaper, corporate or business, employment-related, spiritual or religious, travel-related, sports-related, other kind (Tewksbury & Althaus, 2000).


Table 1. Specific entity focus of Web sites

Entity focus | Example (a) | f | %
Portal (b) | Suomi24.fi [Finland24.fi] at http://www.suomi24.fi/ | 200 | 29.1
Organization | Oulun seudun ammattikorkeakoulu: Kirjasto [Oulu Polytechnic Library] at http://www.oamk.fi/kirjasto/ | 142 | 20.7
Person | Erkki Karvonen at http://www.uta.fi/~tierka/ | 67 | 9.8
Subject area | Salukien kanssa vuodesta 1975: Sari Räsänen perheineen [With salukis since 1975: Sari Räsänen & family] (c) at http://www.kolumbus.fi/rasanen.saluki/ | 39 | 5.7
Publication | Tietokone [Computer] (d) at http://www.tietokone.fi/ | 38 | 5.5
Education | ATK-ajokorttikoulu [ADP Driving Licence School] at http://www.atk-ajokorttikoulu.net | 22 | 3.2
Database | Tähtitieteen viitetietokanta: Kosmos [Astronomical reference database: Cosmos] at http://www.ursa.fi/extra/kosmos/ | 20 | 2.9
Computer program | Peli [Game] at http://www.hellfish.org/~mersu/files/hemohespeli.swf | 18 | 2.6
Service | F Musiikki online [F Music online] (e) at http://www.f-musiikki.fi/ | 14 | 2.0
Other (f) | Petsamon liikenne 60 vuotta [Pechenga Haulage 60 years] at http://www.mobilia.fi/petsamo/ | 8 | 1.2
None (g) | Empty page at about:blank | 39 | 5.7
Unknown | The site remained unidentifiable | 80 | 11.6
Total | | 687 | 100.0

(a) Note that the Web sites at the stated addresses have usually changed after the data collection period (2001–2002).
(b) Search engines, Web directories, or any other WWW sites that mainly provide references to other sites.
(c) This site was about salukis, a breed of dogs.
(d) A magazine.
(e) An online shop.
(f) Known sites which cannot be placed in the categories above.
(g) Pages that do not apparently belong to any site.
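The construction of Table 1 rests on two simple steps: classes seen fewer than ten times are merged, and each remaining class is expressed as a percentage of the 687 sites. The following Python sketch reproduces those steps from the frequencies reported above. It is an illustration only: the function names are hypothetical, and folding every rare class into a single catch-all "Other" class is an assumption that matches the place, presentation and project sites but is not stated as a general rule in the text.

```python
def merge_rare_classes(counts: dict[str, int], minimum: int = 10,
                       catch_all: str = "Other") -> dict[str, int]:
    """Fold classes observed fewer than `minimum` times into one catch-all class."""
    merged: dict[str, int] = {}
    for label, count in counts.items():
        target = label if count >= minimum else catch_all
        merged[target] = merged.get(target, 0) + count
    return merged


def percentages(counts: dict[str, int], ndigits: int = 1) -> dict[str, float]:
    """Express each class as a rounded percentage of all observations."""
    total = sum(counts.values())
    return {label: round(100 * count / total, ndigits)
            for label, count in counts.items()}


# Frequencies from Table 1, with the three rare foci still listed separately.
raw_counts = {
    "Portal": 200, "Organization": 142, "Person": 67, "Subject area": 39,
    "Publication": 38, "Education": 22, "Database": 20,
    "Computer program": 18, "Service": 14,
    "Place": 4, "Presentation": 3, "Project": 1,
    "None": 39, "Unknown": 80,
}

merged = merge_rare_classes(raw_counts)
shares = percentages(merged)
```

With these inputs the three rare foci collapse into an "Other" class of 8 sites, and the rounded shares agree with the percentage column of Table 1.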

It is unfortunate that those classifications are quite different from the one presented in this work. Moreover, neither of those studies analysed the visitation rate of the sites. This being the case, comparisons appear to be rather pointless.

According to Table 1, portal and organizational sites were visited most commonly, whereas service, computer program, database and education were the least frequent foci of the WWW sites. The favouring of portals is not astonishing, for surveys have time and again shown that search engines are among the most popular Web sites (Green, 2000). This suggests that metainformation (see Kari, 2004) is felt relatively important in the WWW environment. In most cases, choosing an organization’s site probably reflected a wish for official or trustworthy information.

Even though the specific site typology in Table 1 seems robust enough, it may not be exhaustive. It is also impracticable when doing correlational analyses, unless one has a large number of observations (see Abramson, 1998). It could therefore be useful to condense the information further. This gave rise to generic entity foci, which are displayed in Table 2. Five sorts were present: actor, artefact, information, place and process sites. The chart also enumerates which specific foci were typically classified under each generic focus. This connection is indicative only, for due to the small number of several foci, no correlation could be calculated.

The generic entity foci found here resemble the four entity foci in the Sense-Making theory: self, other (people), object and situation (event or process) (Dervin, 1983; Dervin et al., 1982). Actor corresponds to self and other, artefact is roughly the same as object, and process is a kind of situation. However, information and place are not covered by Dervin’s categorization.


Table 2. Generic entity focus of Web sites

Generic focus | Typical specific foci | f | %
Actor (a) | Organization, Person | 210 | 30.6
Information (b) | Portal | 207 | 30.1
Artefact | Computer program, Database, Publication, Service, Subject area | 112 | 16.3
Process | Education | 30 | 4.4
Place (c) | Other | 4 | 0.6
More than one (d) | Subject area (e) | 5 | 0.7
None (f) | None | 39 | 5.7
Unknown (g) | Unknown | 80 | 11.6
Total | | 687 | 100.0

(a) A living being.
(b) Information about information.
(c) For example, Active Worlds (virtual reality) at http://activeworlds.com/.
(d) A Web site that exhibits two or more equally strong generic foci instead of one. For instance, Datatekniikka ja viestintä: Jukan avoin tietosivusto [Data technology and communication: Jukka’s open information site] at http://www.cs.tut.fi/~jkorpela/indexfi.html revealed a combination of artefact and process.
(e) The specific focus of all five sites was a subject area, but the majority of the subject area sites were artefacts.
(f) Pages that do not apparently belong to any site.
(g) These sites remain unidentifiable.
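The frequencies in Table 2 can also be read as “one in N” ratios, the form in which the shares of the rarer foci (1/23 and 1/172) are quoted in the discussion. The snippet below is a small worked check under that reading; the helper name `one_in_n` is hypothetical and the rounding to the nearest whole number is an assumption about how such ratios were obtained.

```python
# Frequencies of the generic entity foci, as given in Table 2.
generic_focus = {
    "Actor": 210, "Information": 207, "Artefact": 112, "Process": 30,
    "Place": 4, "More than one": 5, "None": 39, "Unknown": 80,
}
total_sites = sum(generic_focus.values())


def one_in_n(count: int, total: int) -> int:
    """Return N such that the class accounts for roughly one in N observations."""
    return round(total / count)


ratios = {focus: one_in_n(count, total_sites)
          for focus, count in generic_focus.items()}
```

For instance, `ratios["Process"]` and `ratios["Place"]` give the denominators for the process and place sites, and `ratios["Artefact"]` corresponds to a share of about one sixth.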

Table 2 shows the distribution of the generic entity foci, too. While actor and information WWW sites were each visited almost every third time, actor was a shade more popular focus. Artefact sites attracted moderate attention with their share of one sixth, whereas process (1/23) and especially place (1/172) sites were less common. The prevalence of actor sites could tell of the searchers’ humanness, of their attempts to somehow connect with other people through the Internet. It might also be linked to personal development as a characteristically human phenomenon.

What with the proposed system of generic and specific entity foci, someone might argue that they actually form an “ontology”, which concept has gained some popularity in recent years. According to Kabel and her colleagues, for instance, ontology in philosophy means “a theory about the nature of existence, of what types of things exist”. In the information field, an ontology is an indexing framework more complex than a traditional thesaurus, as it incorporates concepts, attributes, relations, constraints and instantiations (Kabel, Hoog, Wielinga, & Anjewierden, 2004, p. 350). Yes, the classification of entity foci could be considered as a simple “entity ontology of WWW sites”, but so could almost any typology be construed as an ontology. This would water down the term, so it is best reserved for indexing purposes.

3.1.2. Origin

The origin of a Web site betokens the geographical area of the Internet in which the site resides, relative to the searcher’s point of network access. Foreign country, home country, local network, disk, and RAM were the five zones from where the WWW sites were fetched (see Table 3). This classification presupposes a very broad view of the Internet: it incorporates not only external resources (foreign and domestic sites), but also resources internal to an organization (local networks such as intranets), and even personal resources in the user’s computer (disks and RAM). In this study, temporary (RAM) files played a role in some informants’ search, and thus they can be regarded as a part of the Web of electronic documents, too, even if just for a short time. As a whole, the origin typology forms a nested model of WWW spaces, from “here” (RAM) to “far away” (foreign country).

In earlier research, local resources have been conceptualized as either a physical document collection (e.g. Steinhagen & Moynahan, 1998), the WWW site of one’s organization (Aimar et al., 1995), or an intranet (e.g. Cline, 1998). The inquiry at hand highlights records within the user’s computer as a novel dimension of local resources in Web searching.

According to Table 3, WWW sites from one’s home country were viewed most often, almost every second time. With their share of over a third, sites abroad were also quite favoured. Local sites of all kinds, on the other hand, were seldom visited, for their combined percentage was a mere one ninth. Overall, the distribution speaks of a national orientation in Web searching.

Table 3. Origin of Web sites

Origin | Example | f | %
Home country | Makupalat [Titbits] at http://www.makupalat.fi/ (a) | 295 | 42.9
Foreign country | J.S. Bach Archive and Bibliography at http://odur.let.rug.nl/Linguistics/diversen/bach/intro.html (b) | 249 | 36.2
RAM (c) | Passages of Web information pasted into unsaved Word documents | 60 | 8.7
Disk | Int-aa (d) on a diskette | 9 | 1.3
Local network | P. Tynin kotisivut [P. Tyni’s home pages] at http://www.uta.fi/~paivi.tyni/ (e) | 9 | 1.3
Unknown | Käyttäjäkysely 98 [User survey 98] at an unidentified URL | 65 | 9.5
Total | | 687 | 99.9

(a) A Finnish URL.
(b) A Dutch URL.
(c) “Random Access Memory” (Compact Oxford English dictionary, 2005), which is the operating memory in a computer.
(d) A small collection of texts about a computer virus called Int-aa.
(e) A URL of the University of Tampere.

3.2. Moving through the WWW

3.2.1. Intra-site movement

Intra-site movement refers to the user navigating within a Web site. This was not analysed as a complex page-by-page process, but simply in terms of the section(s) that the individual went to on that occasion: the home page, subpages (Reitz, 2004), or both (see Table 4). Home pages could be picked out on the basis of their content or position in the page hierarchy. The rest of the pages were subpages, empty pages, or unidentifiable pages. Visiting a site could mean viewing one page, or it could mean travelling through 50 pages. The median number of pages was just one, which denotes that usually site visits were incredibly short.

Table 4 indicates that most (over half) visits to WWW sites consisted of viewing solely subpages. A home page was involved in fewer than one third of the cases. Proceeding to a home page but not to any of its subpages only happened about every 11th time on average.

In retrospect, the class of subpages appears rather coarse. It would be more informative to divide them into at least two varieties: informational vs. metainformational pages (see e.g. Kari, 2004; also Dennis, Bruza, & McArthur, 2002), or public vs. restricted-access pages, for instance.

Table 4. Pages in intra-site movement

Pages | Example (a) | f | %
Subpages | s1 → s2 → s1 → s2 → s1 | 407 | 59.2
Home page and subpages | h → s1 → s2 → s2 → s3 → s4 | 139 | 20.2
Home page | Tampere (b) at http://www.tampere.fi | 65 | 9.5
None | Empty page at about:blank | 39 | 5.7
Unknown (c) | A pop-up window with the title Print FREE Coupons for your Holiday Shopping! | 37 | 5.4
Total | | 687 | 100.0

(a) h = home page, s = subpage.
(b) A city.
(c) These sites’ page types remain unidentifiable.

3.2.2. Inter-site movement

An inter-site move is an action by which the searcher leaves a WWW site, and proceeds to view another site (see Pirolli & Fu, 2003). As listed in Table 5, three classes of such tactics were witnessed: pointing, typing and following. These I first identified in connection with individual pages (Kari, 2004). When a person points, s/he moves on to the next site by essentially pushing a single symbol of one kind or another. In the investigation at hand, this was ordinarily done via clicking a mouse button, but sometimes a keyboard shortcut or the computer’s Reset switch was a more viable solution. Advancement as a consequence of inputting a character string into one or more text fields is called typing. The third method, following, signifies that the individual momentarily loses control over manoeuvring, while the computer takes him/her to another site.
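The two movement classifications described above (which pages a site visit covered, and by which tactic the searcher moved to the next site) lend themselves to a compact sketch. The code below is illustrative only and does not reproduce the manual video coding of the study. The page labels "h" and "s" follow the legend of Table 4; the label "e" for an empty page, all event names in `EVENT_TO_TACTIC`, and both function names are assumptions introduced for the example.

```python
from enum import Enum
from typing import Optional


class Tactic(Enum):
    POINTING = "pointing"    # pushing a single symbol (click, shortcut, switch)
    TYPING = "typing"        # entering a character string into a text field
    FOLLOWING = "following"  # the computer moves the searcher onward by itself


# Hypothetical event labels mapped to the three observed tactics.
EVENT_TO_TACTIC = {
    "link_click": Tactic.POINTING,
    "back_button": Tactic.POINTING,
    "keyboard_shortcut": Tactic.POINTING,
    "reset_switch": Tactic.POINTING,
    "search_query": Tactic.TYPING,
    "url_entry": Tactic.TYPING,
    "automatic_redirect": Tactic.FOLLOWING,
}


def classify_move(event: str) -> Optional[Tactic]:
    """Return the tactic for an observed event, or None if it is not recognised."""
    return EVENT_TO_TACTIC.get(event)


def classify_pages(page_types: list[str]) -> str:
    """Reduce the page types seen during one site visit to a Table 4 category."""
    kinds = set(page_types)
    if kinds == {"s"}:
        return "Subpages"
    if kinds == {"h", "s"}:
        return "Home page and subpages"
    if kinds == {"h"}:
        return "Home page"
    if kinds == {"e"}:
        return "None"
    return "Unknown"


visit_category = classify_pages(["h", "s", "s", "s"])
move_tactic = classify_move("search_query")
```

Because the visit is reduced to the set of page types it contains, the order and number of pages do not matter, which matches the decision not to analyse intra-site movement as a page-by-page process.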
In this project, it was not a matter of artificial intelligence acting or the machine getting muddled; in all likelihood, it happened because the programming behind some of the WWW pages instructed the browser to jump to a designated address.

Upon examining the quantitative aspects of Table 5, one can immediately perceive that pointing was the preferred tactic, since it accounted for an overwhelming six sevenths of the traffic between sites. This observation resembles earlier results on browsing in the Web (e.g. Iivonen & White, 2001; Ylikoski, 2003; cf. Dalgleish & Hall, 2000). With its share of one ninth, typing was left far behind. Following was even more exceptional, as it only took place every 31st time on average. Considering the absence of personal bookmarks and settings, the absolute supremacy of pointing was remarkable. From this, one may speculate that pointing is even more common when a person searches the WWW on his/her own computer.

It might be profitable to compare the above distribution with that pertaining to the individual pages (in Kari, 2004). It seems that site-level movement happened slightly more often by pointing (+0.8%) or following (+0.2%), and less often by typing (−1.8%), than page-level movement. All in all, the differences were unexpectedly small. The similarity does make sense in the light of the fact that visiting a Web site normally meant viewing just one page therein (see above). Given the overwhelming dominance of pointing, it could be advisable to split this class into two varieties of pointing, such as going forward (e.g. by clicking on a link) vs. backward (e.g. by pushing the browser's Back button) (see Kari, 2004).

Table 5. Tactics of inter-site movement

| Tactic | Example | f | % |
|---|---|---|---|
| Pointing | Clicking on a hyperlink | 582 | 84.7 |
| Typing | Executing a query with a search engine | 74 | 10.8 |
| Following | The window automatically moved behind another window | 22 | 3.2 |
| None | The person did not go any further | 8 | 1.2 |
| Unknown | The manner of movement could not be observed | 1 | 0.2 |
| Total | | 687 | 100.1 |
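The proportions quoted in the text (about six sevenths, one ninth, every 31st move) follow directly from the frequencies in Table 5. The following is only a minimal sketch of that arithmetic; the names `TACTIC_COUNTS`, `shares` and `one_in` are illustrative and do not come from the study, and recomputed percentages may differ from the printed ones in the last digit because of rounding.

```python
from collections import Counter

# Frequencies of inter-site movement tactics as reported in Table 5.
TACTIC_COUNTS = Counter({
    "pointing": 582,
    "typing": 74,
    "following": 22,
    "none": 8,
    "unknown": 1,
})


def shares(counts):
    """Return each category's percentage share of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def one_in(counts, name):
    """Express a category as 'about every k-th move', where k = total / count."""
    return sum(counts.values()) / counts[name]


if __name__ == "__main__":
    for name, pct in shares(TACTIC_COUNTS).items():
        print(f"{name:<10} {pct:5.1f} %")
    print(f"following: about every {one_in(TACTIC_COUNTS, 'following'):.0f}th move")
```

Keeping the raw frequencies rather than the percentages as the primary data makes it easy to recompute the shares after discarding classes, as is done for the correlation tables.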


3.3. Correlations

In order to figure out meaningful covariations that would meet the technical requirements of the Cramér's V test, uncertain data had to be excluded. Thus the "unknown" categories, and classes with too few cases (e.g. "place" as entity focus in Table 2) were ruled out here. Table 6 reports the correlations (on the scale of 0–1) between the four variables (in Tables 2–5) for which they could be legitimately determined. It appears that although the connections were not very strong, their statistical significance was generally high. When we look at the variables in Table 6, it seems that inter-site movement correlations were moderate, but other covariations were principally low. Apparently, inter-site movement correlated most intensely with entity focus and intra-site movement, whereas there was no evidence of any dependency between Web site origin and intra-site movement. Only the prominent covariations are examined in more detail below.

3.3.1. Entity focus vs. inter-site movement

Upon perusing Table 7, the entity focus of a WWW site appears to influence the way in which a person leaves the site. It turned out that there are certain differences between entity types in this regard. When moving from artefact and especially information sites, the technique of typing was less frequent than normally. Then again, process sites and empty pages in particular were left behind by typing more often than usual. The dissimilarities can be accounted for by the navigational options available. In general, pointing is the default mode of jumping to another site, whereas typing represents an anomaly in this respect. Now

Table 6. Correlations (Cramér's V) between variables (n = 536–631)

| Variable Y \ Variable X | Entity | Origin | Intramove | Intermove |
|---|---|---|---|---|
| Entity | – | 0.18** | 0.15** | 0.29** |
| Origin | 0.18** | – | 0.08 | 0.10* |
| Intramove | 0.15** | 0.08 | – | 0.28** |
| Intermove | 0.29** | 0.10* | 0.28** | – |

* p < 0.05.
** p < 0.001.

Table 7. Percentages (%) of inter-site movement tactics by generic entity foci*

| Entity*** | Intermove**: Pointing | Intermove**: Typing | Total |
|---|---|---|---|
| Information | 94.1 | 5.9 | 100.0 |
| Artefact | 92.1 | 7.9 | 100.0 |
| Actor | 89.4 | 10.6 | 100.0 |
| Process | 86.7 | 13.3 | 100.0 |
| None | 54.6 | 45.5 | 100.1 |
| Average | 89.4 | 10.7 | 100.1 |

* n = 573, Cramér's V = 0.29, p = 0.0001.
** Classes "following", "none" and "unknown" were discarded.
*** Classes "place", "more than one" and "unknown" were discarded.


Table 8. Percentages (%) of inter-site movement tactics by pages in intra-site movement*

| Intramove*** | Intermove**: Pointing | Intermove**: Typing | Total |
|---|---|---|---|
| Subpages | 92.6 | 7.4 | 100.0 |
| Home page & subpages | 87.1 | 12.9 | 100.0 |
| Home page | 78.7 | 21.3 | 100.0 |
| None | 54.6 | 45.5 | 100.1 |
| Average | 88.1 | 11.9 | 100.0 |

* n = 620, Cramér's V = 0.28, p = 0.0001.
** Classes "following", "none" and "unknown" were discarded.
*** Class "unknown" was discarded.

information sites are extraordinarily rich with links to other sites, so there is probably less need to resort to a typing tactic to reach them. The opposite can be said of empty pages (no entity focus), where typing was about four times more prevalent than on average. Because such pages have no content and thus no signs forward, it may be much more necessary to enter a URL.

3.3.2. Intra-site movement vs. inter-site movement

As Table 8 demonstrates, it seems that inter-site movement was also affected by intra-site movement. When viewing only subpages of a Web site, typing was a particularly little-used means of getting to the next site. With mere home pages, on the other hand, the searchers more often resorted to typing. Empty pages (no site) were in a class of their own, just like in Table 7. The navigational possibilities are again a plausible determinant of the correlation. After all, subpages are much more likely than home pages to contain links leading outside the site.
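For readers unfamiliar with the coefficient used in Tables 6–8, the sketch below shows the standard textbook definition of Cramér's V for an r × c contingency table: V = sqrt(chi² / (n · min(r − 1, c − 1))). It is an illustration only, not necessarily the procedure or software used in this study. The function name `cramers_v` is an assumption, and the counts in the usage lines are invented for demonstration, since the article reports percentages rather than cell frequencies.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts.

    Assumes every row and every column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by sample size and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Invented counts: rows could be site classes, columns pointing vs. typing.
    print(cramers_v([[10, 0], [0, 10]]))    # perfect association
    print(cramers_v([[10, 10], [20, 20]]))  # no association
```

The coefficient ranges from 0 (no association) to 1 (perfect association), which is the scale referred to in Section 3.3.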

4. Conclusions

4.1. Summary

This article has scrutinized information searching on the World Wide Web in terms of what sorts of sites were visited by self-developers, how they moved within and between sites, as well as whether there were any dependencies between these phenomena. The research questions were successfully answered, although the analyses could not dig very deep. The most central findings were, question by question:

• What entity do Web sites focus on? The two most typical specific entity foci of the viewed WWW sites were portal and organization. The generic entity foci of actor, artefact, information, place and process formed a novel site typology; actor and information were the dominating categories in quantitative terms.
• Where do the sites come from? The new origin classification specifies whether a WWW site comes from a foreign country, home country, local network, disk, or RAM; almost half of the sites were fetched from distant, domestic servers.
• Which pages of a site do people visit? Intra-site movement involves going to a home page, subpages, or both; the majority of the visits to a Web site did not include the home page.
• How do searchers proceed from one site to another? Among the three tactics of inter-site movement (following, pointing and typing), pointing was by far the most used.
• Do the qualities of site and manner of moving correlate? Statistically significant correlations between the site and movement variables were common, but not intense; the strongest covariations were discovered in generic entity focus affecting inter-site movement, as well as intra-site movement influencing inter-site movement.

As explained in the method section, caution must be exercised when generalizing the quantitative results obtained in this study. The numbers are likely to represent active self-developers more than other populations. Then again, I might ask: who among us does not develop oneself?

4.2. Implications

Pharo (2002) supposes that WWW resource types are innumerable. Empirical studies refute such a notion by showing that the wide repertoire of sites can be captured in finite typologies. Granted, the few earlier content-based Web site classifications have been somewhat haphazard, and not that fruitful theoretically. It does seem that there is a limited number of specific entity foci, but finding a fully comprehensive and non-overlapping set is problematic. The more systematic schema of generic entity focus provides a solution, albeit at the cost of losing some information. One advantage of this typology is its compatibility with Brenda Dervin's Sense-Making theory.

Besides research, entity focus may also be useful in the practice of designing digital information systems. There the categorizations could be exploited in implementing a document/site entity ontology (see Section 3.1.1) as a hierarchy including the generic and specific levels (see Kabel et al., 2004). It is unlikely that entity focus would make a good primary search key, but it could work well as a secondary key. Thus, for example, a WWW directory probably ought to be first broken down by subjects (cf. Kari, 2004), and these could then be further specified with entities.

Most Web searching studies assume that Internet information is "out there", waiting to be harvested. Analysing the origin of Web sites, however, exposed that personal resources (documents within the user's computer) are a part of the Internet, too, when they are connected to it. This observation highlights the potential importance of researching the ways in which internal and external search for information differ, and what roles local resources have. When training people to seek information on the Internet, it would be beneficial to practise searching in local resources, as well. Search engine developers, on the other hand, might do well to experiment with the option of including local resources in a WWW retrieval system. These measures would generate added value to end users.

Over half of the Web site visits were to subpages only; that is, the home page was not involved in intra-site movement. In the case that this is a common trend (the substantiation of which calls for further research worldwide), WWW publishers should pay more attention to the usability of their sites in their entirety. It would hence be necessary to ensure the integrity of a site, make its various components accessible from subpages, and visualize its structure. Home pages ought to retain their role as the logical centre of a Web site, but it would not be wise to assume that most of the site traffic actually passes through them.

As far as inter-site movement is concerned, this inquiry once more confirms prior findings on the prevalence of the browsing (pointing) strategy, as opposed to searching (typing). It would therefore be amply justified for researchers of Internet information seeking to concentrate on pointing as the principal method of travelling through the WWW. User training would benefit from taking up pointing to diversify people's Web search skills. On the other hand, there is surely room for improvement in the searchers' utilization of the typing tactic, too.

The correlation patterns suggest that the tactic of WWW movement is a central variable, at least in inter-site navigation (cf. Kari, 2004). Without any strong dependencies, however, the covariations remained fairly unimpressive as a whole. It is quite possible that contextual factors (such as search topic, situation, or stage of personal development) are better predictors of site-level Web searching.
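The directory design suggested earlier in this section, with subject as the primary key and entity focus as the secondary key, might be sketched as follows. This is a hypothetical illustration rather than a system built or tested in the study: the class `Directory`, its methods, and the example subjects and URLs are all assumptions, and only the five generic entity foci are taken from the article.

```python
from collections import defaultdict

# Generic entity foci named in the article's typology.
GENERIC_FOCI = {"actor", "artefact", "information", "place", "process"}


class Directory:
    """Toy site directory: subject is the primary key, entity focus the secondary key."""

    def __init__(self):
        # subject -> entity focus -> list of site URLs
        self._sites = defaultdict(lambda: defaultdict(list))

    def add(self, url, subject, focus):
        """File a site under a subject and one generic entity focus."""
        if focus not in GENERIC_FOCI:
            raise ValueError(f"unknown generic entity focus: {focus!r}")
        self._sites[subject][focus].append(url)

    def browse(self, subject, focus=None):
        """List sites under a subject, optionally narrowed by entity focus."""
        by_focus = self._sites.get(subject, {})
        if focus is None:
            return sorted(url for urls in by_focus.values() for url in urls)
        return sorted(by_focus.get(focus, []))


if __name__ == "__main__":
    directory = Directory()
    # Hypothetical entries; the subject and URLs are placeholders.
    directory.add("http://example.org/teachers", "meditation", "actor")
    directory.add("http://example.org/guide", "meditation", "information")
    print(directory.browse("meditation"))
    print(directory.browse("meditation", focus="actor"))
```

Nesting the entity level under the subject level mirrors the proposal that a user first picks a subject and only then narrows the results by entity focus; a specific level (e.g. portal or organization) could be added as a third layer under each generic focus.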


The work done in this study can be carried on along several avenues. One would be the finding or creating of a theory to accommodate information search by using WWW sites. In addition to the propositions above, there are many fruitful, related research questions that are worth exploring. To mention but a few: What other relevant dimensions of Web site use are there? Why are particular WWW sites and ways of movement chosen? How do site-level and page-level searching relate to each other? Making the research setting as naturalistic as possible, and furthering a longitudinal approach are methodological challenges that should be dealt with.

Acknowledgements

The project has been a joint endeavour by the author and Reijo Savolainen. Savolainen did not participate in composing this article, but he provided some background support, for which he earns my thanks. Furthermore, I wish to acknowledge the valuable suggestions of the anonymous referees. I am also grateful to the Information Society Institute at the University of Tampere for funding this research.

References

Abramson, A. D. (1998). Monitoring and evaluating use of the World Wide Web in an academic library: An exploratory study. Proceedings of the ASIS Annual Meeting, 35, 315–326.
Aimar, A., Casey, J., Drakos, N., Hannell, I., Khodabandeh, A., Palazzi, P., et al. (1995). WebLinker, a tool for managing WWW cross-references. Computer Networks and ISDN Systems, 28, 99–107.
Card, S. K., Pirolli, P., Wege, M. van der, Morrison, J. B., Reeder, R. W., Schraedley, P. K., & Boshart, J. (2001). Information scent as a driver of Web behavior graphs: Results of a protocol analysis method for Web usability (UIR-2000-13). http://www2.parc.com/istl/groups/uir/publications/, 11 March 2005. Also CHI 3(1), 498–505.
Catledge, L. D., & Pitkow, J. E. (1995). Characterizing browsing strategies in the World-Wide Web. Computer Networks and ISDN Systems, 27(6), 1065–1073.
Choo, C. W., Detlor, B., & Turnbull, D. (1999). Information seeking on the web: An integrated model of browsing and searching. First Monday, 5(2), http://www.firstmonday.dk/issues/issue5_2/choo, 11 March 2005. Also Proceedings of the ASIS Annual Meeting, 36, 3–16.
Cline, N. M. (1998). Local or remote access: Choices and issues. Collection Management, 22(3/4), 21–29.
Cockburn, A., & Jones, S. (1996). Which way now? Analysing and easing inadequacies in WWW navigation. International Journal of Human-Computer Studies, 45(1), 105–129.
Cockburn, A., & McKenzie, B. (2001). What do web users do? An empirical analysis of web use. International Journal of Human-Computer Studies, 54(6), 903–922.
Compact Oxford English dictionary (2005). Oxford: Oxford University Press. http://www.askoxford.com/dictionaries/compact_oed/?view=uk, 13 January 2005.
Cothey, V. (2002). A longitudinal study of World Wide Web users' information-searching behavior. Journal of the American Society for Information Science and Technology, 53(2), 67–78.
Dalgleish, A., & Hall, R. (2000). Uses and perceptions of the World Wide Web in an information-seeking environment. Journal of Librarianship and Information Science, 32(3), 104–116.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior (Perspectives in Social Psychology). New York: Plenum Press.
Dennis, S., Bruza, P., & McArthur, R. (2002). Web searching: A process-oriented experimental study of three interactive search paradigms. Journal of the American Society for Information Science and Technology, 53(2), 120–133.
Dervin, B. (1983). An overview of Sense-Making research: Concepts, methods, and results to date. Paper presented at the Annual Meeting of International Communication Association, Dallas, May 1983. Also http://communication.sbs.ohio-state.edu/sensemaking/art/artabsdervin83smoverview.html, 15 March 2005.
Dervin, B. (1999). On studying information seeking methodologically: The implications of connecting metatheory to method. Information Processing and Management, 35(6), 727–750.
Dervin, B., Jacobson, T. L., & Nilan, M. S. (1982). Measuring aspects of information seeking: A test of a quantitative/qualitative methodology. Communication Yearbook, 6, 419–444.


Elifson, K. W., Runyon, R. P., & Haber, A. (1990). Fundamentals of social statistics (2nd ed.). New York: McGraw-Hill.
Fidel, R., Davies, R. K., Douglass, M. H., Holder, J. K., Hopkins, C. J., Kushner, E. J., et al. (1999). A visit to the information mall: Web searching behavior of high school students. Journal of the American Society for Information Science, 50(1), 24–37.
Fourie, I., & Niekerk, D. van (1999). Using portfolio assessment in a module in research information skills. Education for Information, 17(4), 333–352.
Green, D. (2000). The evolution of Web searching. Online Information Review, 24(2), 124–137.
Hill, J. R., & Hannafin, M. J. (1997). Cognitive strategies and learning from the World Wide Web. Educational Technology Research and Development, 45(4), 37–64.
Hodkinson, C., Kiel, G., & McColl-Kennedy, J. R. (2000). Consumer web search behaviour: Diagrammatic illustration of wayfinding on the web. International Journal of Human-Computer Studies, 52(5), 805–830.
Hölscher, C., & Strube, G. (2000). Web search behavior of Internet experts and newbies. Computer Networks, 33(1–6), 337–346.
Iivonen, M., & White, M. D. (2001). The choice of initial web search strategies: A comparison between Finnish and American searchers. Journal of Documentation, 57(4), 465–491.
Jansen, B. J., & Pooch, U. (2001). A review of web searching studies and a framework for future research. Journal of the American Society for Information Science and Technology, 52(3), 235–246.
Kabel, S., Hoog, R. de, Wielinga, B. J., & Anjewierden, A. (2004). The added value of task and ontology-based markup for information retrieval. Journal of the American Society for Information Science and Technology, 55(4), 348–362.
Kari, J. (2004). Web information seeking by pages: An observational study of moving and stopping. Information Research, 9(4), http://InformationR.net/ir/9-4/paper183.html, 15 March 2005.
Kim, K.-S. (2001). Information seeking on the Web: Effects of user and task variables. Library & Information Science Research, 23(3), 233–255.
Klobas, J. E., & Clyde, L. A. (2000). Adults learning to use the Internet: A longitudinal study of attitudes and other factors associated with intended Internet use. Library & Information Science Research, 22(1), 5–34.
Lazonder, A. W., Biemans, H. J. A., & Wopereis, I. G. J. H. (2000). Differences between novice and experienced users in searching information on the World Wide Web. Journal of the American Society for Information Science, 51(6), 576–581.
Light, A. (2001). The influence of context on users' responses to websites. The New Review of Information Behaviour Research, 2, 135–149.
Magnusson, D. (1995). Individual development: A holistic integrated model (Reports from the Department of Psychology 796). Stockholm: Stockholm University.
Maslow, A. H. (1968). Toward a psychology of being (2nd ed.). Princeton: D. van Nostrand.
O'Neill, E. T. (1998). Characteristics of web accessible information. IFLA Journal, 24(2), 114–116.
Ostun, A. (1998). Self-development: Adaptation for change in the process of information services. Information Services & Use, 18(3), http://search.epnet.com/direct.asp?an=1444044&db=afh, 15 March 2005.
Perse, E. M., & Ferguson, D. A. (2000). The benefits and costs of Web surfing. Communication Quarterly, 48(4), 343–359.
Pharo, N. (1999). Web information search strategies: A model for classifying web interaction? In T. Aparac, T. Saracevic, P. Ingwersen, & P. Vakkari (Eds.), Digital libraries: Interdisciplinary concepts, challenges and opportunities. Proceedings of the third international conference on the conceptions of the library and information science (pp. 207–218). Zagreb & Lokve: Filozofski fakultet Zagreb & Naklada Benja.
Pharo, N. (2002). The SST method schema: A tool for analysing work task-based Web information search processes (Acta Universitatis Tamperensis 871; also Acta Electronica Universitatis Tamperensis 178, http://acta.uta.fi/teos.phtml?6719, 15 March 2005). Doctoral dissertation, University of Tampere, Tampere.
Pharo, N., & Järvelin, K. (2004). The SST method: A tool for analysing Web information search processes. Information Processing and Management, 40(4), 633–654.
Piaget, J. (1971). Biology and knowledge: An essay on the relations between organic regulations and cognitive processes (B. Walsh, Trans.). Edinburgh: Edinburgh University Press.
Pirolli, P., & Fu, W.-T. (2003). SNIF-ACT: A model of information foraging on the World Wide Web (UIR-2003-02). http://www2.parc.com/istl/groups/uir/publications/, 15 March 2005. Also presented at the Ninth International Conference on User Modeling, Johnstown, PA.
Reitz, J. M. (2004). ODLIS: Online dictionary for library and information science. Westport: Libraries Unlimited, http://lu.com/odlis/, 15 March 2005.
Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the Web. Journal of the American Society for Information Science and Technology, 53(2), 145–161.
Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology, 55(8), 743–753.
Savolainen, R. (1998). Use studies of electronic networks: A review of empirical research approaches and challenges for their development. Journal of Documentation, 54(3), 332–351.


Scull, C., Milewski, A., & Millen, D. (1999). Envisioning the web: User expectations about the cyber-experience. Proceedings of the ASIS Annual Meeting, 36, 17–24.
Slone, D. J. (2002). The influence of mental models and goals on search patterns during Web interaction. Journal of the American Society for Information Science and Technology, 53(13), 1152–1169.
Steinhagen, E. N., & Moynahan, S. A. (1998). Catalogers must change! Surviving between the rock and the hard place. Cataloging and Classification Quarterly, 26(3), 3–20.
Tewksbury, D., & Althaus, S. L. (2000). An examination of motivations for using the World Wide Web. Communication Research Reports, 17(2), 127–138.
Wang, P., Hawk, W. B., & Tenopir, C. (2000). Users' interaction with World Wide Web resources: An exploratory study using a holistic approach. Information Processing and Management, 36(2), 229–251.
Ylikoski, T. (2003). Access denied: Patterns of consumer Internet information search and the effects of Internet search expertise (Acta Universitatis Oeconomicae Helsingiensis A-214). Helsinki: Helsinki School of Economics.

Jarkko Kari has published articles on information seeking related to the paranormal, as well as on Internet searching for personal development. He is also working on information process as a unifying concept, and is concerned with the foundations of information studies. Unusual, alternative, higher and holistic perspectives are his speciality.