Internet usage: Predictors of active users and frequency of use


Christos Emmanouilides and Kathy Hammond

CHRISTOS EMMANOUILIDES and KATHY HAMMOND are with the London Business School, Sussex Place, Regent’s Park, London NW1 4SA, United Kingdom.

JOURNAL OF INTERACTIVE MARKETING, VOLUME 14 / NUMBER 2 / SPRING 2000. © 2000 John Wiley & Sons, Inc. and Direct Marketing Educational Foundation, Inc.

ABSTRACT

The factors that predict Internet usage patterns are explored through the use of consumer panel data. We look at two major aspects of usage behavior: active (current) versus lapsed usage, and usage frequency among current users. We find that the main predictors of active or current use of the Internet are:

● Time since first use of the Internet. Pioneers (very early adopters) are most likely to be active users. However, the relationship is not a linear one; middle adopters are more likely than other groups to have not used the Internet in the previous month.
● Location of use. Social use at home, especially with two or more other people.
● Specific services used. Personal communication is the most popular activity (used by just over half the sample), but the best predictors of active users are use of information services.

The main predictors of frequent or heavy Internet use (20+ times per month) are:

● Broad applications, e.g., business email followed by personal email.
● Time since first Internet use. The relationship here is linear; the longer someone has used the Internet, the more likely they are to be a heavy user.
● Location of use. Use at work and use at home with two or more other people are both strong predictors of heavy usage.

INTRODUCTION

Like pioneer adopters of the telephone and the computer, early adopters of the Internet may be systematically different from average users who come along later (Kraut, 1996). We therefore investigate differences in the applications and services used by active (i.e., current as opposed to lapsed) users, and by light, medium, and heavy Internet users. Here we build on earlier studies from the United States that have focused on identifying the characteristics and usage patterns of different groups of Internet adopters (Gupta & Chatterjee, 1997; Miller, 1996; Sivadas, Grewal, & Kellaris, 1998). Our focus is on the United Kingdom and our aim is to identify predictors of both active versus lapsed users and usage patterns (e.g., heavy versus light and moderate users). We extend our analysis beyond the usual demographics to additional aspects of Internet usage including location of use, reasons for using the Internet, specific activities, and applications performed over the Internet. Logistic regression models are used to predict active usage and usage patterns and we assess the predictive capacity of these models. We discuss the model findings and illustrate the main effects. Directions for future research are also considered, including measurement and modeling needs.

DATA AND METHODOLOGY

The survey data we analyze were collected by NOP Research Group Ltd in four successive waves: December 1995, June 1996, December 1996, and June 1997. Using their large (N ≈ 11,000) and regular Omnibus surveys, about 1,000 people aged 15 years and over who had used the Internet in the previous 12 months were identified and recruited at each wave. The subsequent survey on Internet use was carried out over the telephone. The following aspects of claimed Internet usage behavior were measured:

a) date of last use and usage frequency;
b) time of first Internet use;
c) reasons for using the Internet currently and reason for first use;
d) locations of usage (home, office, school/university);
e) broad applications (e.g., email, Web access, newsgroup membership, etc.) and more specific activities/services (e.g., home banking, newspapers online) performed over the Internet;
f) demographics and other background information.

Two separate studies were conducted: one on current versus lapsed users and one on frequency of use among current users. In the two studies we examine the same set of explanatory factors in an attempt to explain usage behavior (definitions can be found in the Appendix). The only exception is that in the first study we do not employ information on the broad applications used, since the relevant question was answered only by current users. In the second study we make use of this information, but we exclude from our analysis the activities performed in the 6-month period before the survey date, because use of these activities is highly correlated with the set of broad application variables. For example, business and personal communications are largely email and chat/newsgroup types of activities. Below we detail the model development for each study, and following that we discuss the main findings from each study.


TABLE 1
Internet Use: Active and Lapsed Users

                     Dec. 1995   June 1996   Dec. 1996   June 1997   Total
Active users         63%         58%         62%         61%         61%
Lapsed users         37%         42%         38%         39%         39%
Total sample size    849         959         983         979         3770

Active users = % who used the Internet in the month preceding the survey. Lapsed users = % who have tried the Internet in the past but not used it in the month preceding the survey.

Study 1: Active Versus Lapsed Users

Our first study addresses the question of whether we can distinguish between people who are active (i.e., current or recent) Internet users and those who have tried the Internet in the past but were not users in the previous month (lapsed users). We have to bear in mind that a proportion of those who appear to have lapsed may be people who have not yet passed completely through the trial stage to full adoption (e.g., Roberts & Lilien, 1993; Mahajan, Muller, & Bass, 1993). Distinguishing lapsed users from very infrequent users and from those who are still in the adoption phase is not possible with the present data, and is an issue for further research and data collection. Consequently, what we term in the following discussion as “lapsed use” is more accurately a reported temporal discontinuity of use and as such can be thought of as an aspect of “adoption depth” (Rogers, 1983).

From the full sample, totaling 3,992 respondents, we selected those who had given complete answers to a minimum set of questions, thus reducing the effective combined sample size to 3,770. The missing values occur almost at random and thus are unlikely to affect the generalizability of our findings (Little, 1992). Table 1 shows the sample sizes by wave and the percentage of respondents in each wave who were active and lapsed users. The slight increase in lapsed users in the second wave is not significant and the overall pattern is remarkably static, allowing us to pool the data across waves and perform a single analysis on them. This increases the power of statistical tests and injects temporal variability into explanatory variables. We control for possible nonstationarity in the model parameters by the inclusion of calendar main effects and interactions with explanatory variables (Micklewright, 1993).

Modeling Active versus Lapsed/Infrequent Users. We assume a binary response for active usage:

$$ y_i = \begin{cases} 0 & \text{if individual } i \text{ is not an active Internet user} \\ 1 & \text{if individual } i \text{ is an active Internet user} \end{cases}, \qquad i = 1, \dots, N. \tag{1} $$

A standard logistic model for the probability to be an active user, conditional on a vector of individual specific covariates, $x_i$, assumes the form:

$$ p_i = \Pr(y_i = 1 \mid x_i, \beta) = \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)}, \tag{2} $$

where $\beta$ is the corresponding parameter vector to be estimated. From a preliminary exploratory data analysis, we selected sets of possible variables that explain the variation in response. The Appendix gives the full set of variables.

Model Estimation. The logistic regression model was estimated via maximum likelihood. Writing $\theta$ for the full set of model parameters (here $\theta = \beta$), the log-likelihood contribution from a single observation is:

$$ l(y_i, \theta) = y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i). \tag{3} $$

Under the assumption of independent observations the sample log-likelihood becomes:

$$ l(y, \theta) = \sum_{i=1}^{N} l(y_i, \theta). \tag{4} $$

This function was maximized using the glm procedure of S-PLUS with logit link. Model selection was performed via a stepwise procedure based on likelihood ratio tests. As several of the possible explanatory variables are cross-correlated, we selected the initial set of model variables to minimize multicollinearity effects.

Model Fit. Summary fit statistics are given in Table 2. The simple logistic model achieved 75.2% in-sample average correct prediction of individual responses, and a corresponding average of 74.3% for out-of-sample predictions. Examination of the correlation matrix of the final model’s parameter estimates did not reveal any unusual structures that could indicate multicollinearity problems.

TABLE 2
Active versus Lapsed Users: Model Fit Statistics

Observations                                 3770
Fitted parameters                            46
Model log-likelihood                         −1876.2
Base-line model log-likelihood               −2525.03
LR test statistic                            1297.65 in 45 df
p-value                                      0
AIC                                          1.02
In-sample correct classification rate        75.2%
Out-of-sample correct classification rate    74.3%

Study 2: Frequency of Internet Use

In the second study, we shift our focus to the frequency of Internet use among active users (again, active users are people who have used the Internet at least once in the month preceding the sampling dates). Frequency of use is measured as the claimed number of times the Internet was accessed in the month preceding the survey date. From an evaluation of the raw frequency data we found that the pattern of usage is stable across waves and appears trimodal. We therefore were able to simplify the analysis by grouping people into three usage categories: LOW (used the Internet 1–3 times in the last month), MODERATE (4–19 times), and HIGH (20+ times). As in the previous section, we selected a sample of complete cases on the basis of a set of explanatory variables. The effective sample size for this study is N = 2,359. Again, checking for systematically missing values did not reveal any pattern that should cause concern. In Table 3 we report sample statistics for the three usage classes by survey wave.

Modeling Internet Frequency of Use. In addition to the variables used in Study 1, we now incorporate additional information on specific applications performed over the Internet in the month preceding the sampling dates. We fitted several probability models for individual membership in the three simplified usage frequency classes, all having the HIGH usage frequency category as a reference.
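The grouping of claimed monthly access counts into the LOW, MODERATE, and HIGH classes of Study 2 can be sketched in a few lines. This is only an illustration of the cut points stated in the text (1–3, 4–19, 20+); the function name `usage_class` and the example counts are hypothetical and are not taken from the survey data.

```python
from collections import Counter


def usage_class(times_last_month: int) -> str:
    """Map a claimed monthly access count to one of the three usage classes."""
    if times_last_month < 1:
        # Study 2 covers active users only, i.e. at least one use in the last month.
        raise ValueError("active users have at least one use in the last month")
    if times_last_month <= 3:
        return "LOW"
    if times_last_month <= 19:
        return "MODERATE"
    return "HIGH"


# Hypothetical claimed frequencies for a handful of respondents (not survey data).
claimed = [1, 2, 3, 4, 10, 19, 20, 45, 60]
classes = [usage_class(n) for n in claimed]

# Share of respondents in each class, analogous to one column of Table 3.
shares = {name: count / len(classes) for name, count in Counter(classes).items()}
print(classes)
print(shares)
```

Collapsing the raw counts this way trades some information for robustness against imprecise recall, which seems to be the motivation given for the three-class simplification.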

TABLE 3
Statistics of Internet Frequency of Use among Current Users

Usage Class          Dec 1995   June 1996   Dec 1996   June 1997   Total
LOW                  26%        30%         26%        32%         29%
MODERATE             36%        34%         37%        34%         35%
HIGH                 38%        36%         37%        34%         36%
Total sample size    555        577         615        612         2359

Proportional Odds Model. This is a model for ordered responses (McCullagh & Nelder, 1989, pp. 151–155) in which an unobserved latent variable, or utility for Internet frequency of usage, $\psi_i$, is assumed to exist for each individual $i$, $i = 1, \dots, N$, related to the observed response, $y_i$, through a threshold representation:

$$ y_i = s \iff \gamma_{s-1} \le \psi_i \le \gamma_s, \qquad s = 1, \dots, S, \qquad \gamma_0 = -\infty, \quad \gamma_S = \infty. \tag{5} $$

This means that, according to the position of the latent variable $\psi_i$ on the real axis relative to the thresholds $\gamma_s$, the individual response falls in one of the $S = 3$ categories of Internet usage. We impose a simple linear regression model for the latent response:

$$ \psi_i = \beta^T x_i + \varepsilon_i, \tag{6} $$

where $\varepsilon_i \sim N(0, \sigma^2)$, $x_i$ is a vector of individual specific explanatory variables, and $\beta$ is the corresponding parameter vector. Then, the cumulative probability to observe an individual response up to and including category $s$, $s = 1, \dots, S$, conditional on $x_i$, $\beta$, $\gamma$ is:

$$ P_{i,s} = \Pr(y_i \le s \mid x_i, \beta, \gamma) = F(\gamma_s - \beta^T x_i), \tag{7} $$

where $F(\cdot)$ is a distribution function, in our case the logistic:

$$ F(\cdot) = \frac{\exp(\cdot)}{1 + \exp(\cdot)}. \tag{8} $$

Consequently, the probability for category $s$ is the difference between adjacent cumulative probabilities:

$$ p_{i,s} = \Pr(y_i = s \mid x_i, \beta, \gamma) = F(\gamma_s - \beta^T x_i) - F(\gamma_{s-1} - \beta^T x_i). \tag{9} $$

An obvious constraint is:

$$ P_{i,s} \ge P_{i,s-1} \ge 0, \qquad \forall\, s = 1, \dots, S, \tag{10} $$

and it is satisfied immediately (for the distribution function is monotonic) if $\gamma_{s-1} \le \gamma_s \le \gamma_{s+1}$, for all $s = 1, \dots, S$.

If we allow the parameter vector to vary by category, then we obtain a more general cumulative probability model (MGCPM), which is specified as above by replacing $\beta$ with $\beta_s$. The relation above is now satisfied by a more complex constraint on the parameters, $\gamma_{s-1} - \beta_{s-1}^T x_i \le \gamma_s - \beta_s^T x_i \le \gamma_{s+1} - \beta_{s+1}^T x_i$, for all $s = 1, \dots, S$. This model may exhibit undesired behavior for reasons reviewed in the literature (McCullagh & Nelder, 1989, p. 155). However, we fitted this model for completeness, and a check of the results did not detect the presence of any important side effects.

Multinomial Logit Model. Here it is essentially assumed that the ordered nature of the Internet usage categories is not important. This is not a strong assumption, in the sense that the only possible side effect is a slight loss of efficiency and not of consistency in the parameter estimates (Amemiya, 1985, p. 293). This would mean that significant effects would possibly be more difficult to detect through formal statistical tests. We believe that this possible cost is acceptable, because this model can give more detailed and readily interpretable results for the effects of the explanatory variables on Internet usage frequency than the previously described models. This is due to two reasons: first, we can more easily fit nonparallel slopes, $\beta_s$; and, second, we model probabilities directly rather than cumulative probabilities. The results have been checked against those of the other models and do not show any notable differences. For the MNL model, using as reference state the high frequency of Internet usage category, $s = 3$, the conditional probability of an individual response to be in category $s$ is given by

$$ p_{i,s} = \frac{\exp(\beta_s^T x_i)}{1 + \sum_{r=1}^{2} \exp(\beta_r^T x_i)}, \quad s = 1, 2, \qquad p_{i,s} = \frac{1}{1 + \sum_{r=1}^{2} \exp(\beta_r^T x_i)}, \quad s = 3. \tag{11} $$
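To make the difference between the cumulative (proportional odds) and the multinomial logit specifications concrete, the following minimal sketch computes the three category probabilities under each model for one individual. It assumes the logistic distribution function of equation (8), takes category probabilities as differences of adjacent cumulative probabilities from equation (7), and uses the last category as the MNL reference as in equation (11). All covariate and parameter values are hypothetical and chosen only for illustration; the function names are not from the article.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logistic distribution function F of equation (8)."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(x, beta, cutpoints):
    """Category probabilities from cumulative probabilities F(gamma_s - beta'x).

    cutpoints holds the finite thresholds gamma_1 <= ... <= gamma_{S-1};
    gamma_0 = -inf and gamma_S = +inf are implied (cumulative 0 and 1).
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [0.0] + [logistic_cdf(g - eta) for g in cutpoints] + [1.0]
    return [cumulative[s] - cumulative[s - 1] for s in range(1, len(cumulative))]


def mnl_probs(x, betas):
    """Multinomial logit probabilities; the last category is the reference."""
    scores = [math.exp(sum(b * v for b, v in zip(beta_s, x))) for beta_s in betas]
    denom = 1.0 + sum(scores)
    return [s / denom for s in scores] + [1.0 / denom]


# Hypothetical covariate vector and parameters (not estimates from the article).
x = [1.0, 0.0, 1.0]
po = proportional_odds_probs(x, beta=[0.5, -0.2, 0.8], cutpoints=[-0.3, 1.1])
mnl = mnl_probs(x, betas=[[0.2, 0.1, -0.4], [0.6, -0.3, 0.1]])

print([round(p, 3) for p in po], round(sum(po), 6))
print([round(p, 3) for p in mnl], round(sum(mnl), 6))
```

The proportional odds version shares one slope vector across categories and only shifts the thresholds, whereas the MNL version carries a separate slope vector per non-reference category, which is the "nonparallel slopes" flexibility the text refers to.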

Model Estimation. The above models have been estimated by maximum likelihood. The log-likelihood contribution from a single observation is:

$$ l(y_i; \theta) = \sum_{s=1}^{3} y_{i,s} \ln(p_{i,s}), \tag{12} $$

where

$$ y_{i,s} = \begin{cases} 1 & \text{if } y_i = s \\ 0 & \text{if } y_i \ne s. \end{cases} \tag{13} $$

Under the assumption of independent observations the sample log-likelihood becomes:

$$ l(y; \theta) = \sum_{i=1}^{N} \sum_{s=1}^{3} y_{i,s} \ln(p_{i,s}). \tag{14} $$

We maximized this function using Microsoft FORTRAN 90 and the IMSL nonlinear optimization routine NCONF, together with the routines FDHES, EVLRG, and LINRG for the calculation of the standard errors from the observed information matrix. The three estimated models resulted in almost identical and consistent answers for the effects of the explanatory variables and their significance on predicting Internet usage frequency as summarized by the three categories. In Table 4 we list the fit statistics for the three models, together with the value of the Akaike information criterion (Akaike, 1973):

$$ \mathrm{AIC} = -\frac{2}{n} \left( l(y; \theta) - k \right). \tag{15} $$

Here $k$ is the number of fitted model parameters, and $n$ is the sample size. We see from the values of the AIC statistic that the models provide similar fit to the data, but the MNL seems to perform slightly better. Below we report the main results from the MNL model.

TABLE 4
Frequency of Use: Fit Statistics of the Alternative Models

Model                Log-likelihood   Number of Parameters (k)   Akaike Information Criterion (AIC)
Proportional odds    −2093.97         59                         1.825
MGCPM                −2030.17         116                        1.820
MNL                  −2022.66         116                        1.813

MODEL RESULTS AND DISCUSSION

Study 1: Active versus Lapsed Users

The main factors affecting continuity of Internet use (i.e., current or active users) are (in order of importance based on likelihood-ratio tests): time since first use; location of use (and whether the Internet is used alone or with other people); type of Internet connection; types of application or service used in the last 6 months; reasons for first use; working status of the respondent; and who pays for the online time/connection. Once we control for these main effects, other demographic factors such as age, presence of children at home, sex of respondent, and income have no significant impact on continuity of use. Table 5 illustrates the parameter estimates and univariate Wald statistics for the main factors. In Tables 6 to 9 we illustrate the scale and relevance of our findings by breaking down each of the main model factors by the percentages of current and lapsed users.
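Before turning to the individual factors, the likelihood-based summaries quoted so far can be recomputed from the reported log-likelihoods as a quick arithmetic check. The sketch below assumes the sample-size-normalized AIC of equation (15) and the usual likelihood-ratio statistic (twice the log-likelihood gain over the baseline model); the inputs are the figures printed in Tables 2 and 4, with n = 3,770 for Study 1 and n = 2,359 for Study 2. The function names are illustrative only.

```python
def normalized_aic(loglik: float, k: int, n: int) -> float:
    """AIC as scaled in equation (15): -(2/n) * (loglik - k)."""
    return -2.0 / n * (loglik - k)


def lr_statistic(loglik_model: float, loglik_base: float) -> float:
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_model - loglik_base)


# Study 1 (Table 2): model and baseline log-likelihoods, 46 parameters, n = 3770.
lr_study1 = lr_statistic(-1876.2, -2525.03)
aic_study1 = normalized_aic(-1876.2, k=46, n=3770)

# Study 2 (Table 4): (log-likelihood, number of parameters) per model, n = 2359.
table4 = {
    "Proportional odds": (-2093.97, 59),
    "MGCPM": (-2030.17, 116),
    "MNL": (-2022.66, 116),
}
aic_study2 = {
    name: normalized_aic(loglik, k, n=2359) for name, (loglik, k) in table4.items()
}

print(round(lr_study1, 2), round(aic_study1, 2))
for name, value in aic_study2.items():
    print(name, round(value, 3))
```

The recomputed values agree with the published ones up to rounding of the reported log-likelihoods, and they reproduce the ordering in which the MNL has the smallest AIC of the three frequency models.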

Time Since First Internet Use. There is a nonlinear relationship between active Internet use and time since first use of the Internet. We illustrate this in Table 6, where users are divided into four groups: new users, whose first use of the Internet was within 3 months of the sample date (10% of the sample); middle adopters, whose first use was 3–12 months before the sample date (51%); early adopters, the 26% of the sample who have been using the Internet for 1–2 years before the


TABLE 5
Active Use: Most Significant Parameter Estimates of the Logistic Model

Response: used in the last month. Entries are parameter estimates, with Wald statistics in parentheses.

Variable                            Estimate (Wald Statistic)

Constant                            −1.14 (−5.62)

Time since first use
  1–3 months                        −0.73 (−4.59)
  3–6 months                        −0.93 (−6.46)
  6–12 months                       −0.71 (−4.66)
  1–2 years                         −0.34 (−1.81)
  >2 years                          (reference)

Main location of use
  Home: Alone                       1.26 (8.90)
  Home: With one more person        1.25 (7.04)
  Home: With two more persons       1.92 (5.86)
  Home: With three or more persons  2.05 (5.37)
  Work                              0.71 (5.52)
  School                            −0.15 (−1.22)
  University                        0.32 (2.71)
  Internet Café                     −0.03 (−0.18)
  Other                             0.04 (0.48)

Type of Internet connection
  Modem: No Modem                   (reference)
  Modem: Don’t Know                 −0.37 (−3.23)
  Modem: <9.6 Kbps                  −0.31 (−1.25)
  Modem: 9.6 Kbps                   −0.52 (−2.47)
  Modem: 14.4 Kbps                  0.36 (2.26)
  Modem: 28.8 Kbps                  0.66 (4.31)
  Modem: 33.6 Kbps                  0.37 (1.66)
  Modem: >33.6 Kbps                 1.03 (1.81)
  Network server                    0.32 (3.20)

Services used
  Home Banking                      −0.54 (−1.54)
  Home Working                      −0.34 (−2.37)
  Personal Communication            0.64 (7.05)
  Business Communication            0.57 (5.14)
  Job Hunting                       0.49 (3.33)
  Travel Information                0.43 (3.36)
  Hotel Information                 0.35 (1.93)
  Newspapers Online                 0.63 (5.72)
  Playing Games                     −0.31 (−2.85)
  Public Information                −0.20 (−2.00)
  Other                             −0.75 (−4.40)

Working status
  Full-time Employed                −0.18 (−1.16)
  Part-time Employed                0.69 (5.23)
  Student                           −0.65 (−3.64)
  Not Working                       (reference)

Reason for using Internet
  General Interest                  −0.20 (−2.25)
  Clients’ Request                  0.27 (1.10)
  Business Information              0.30 (2.37)
  Leisure Information               0.12 (0.98)
  Available at Work                 0.45 (3.19)
  Available at University           0.34 (1.84)
  Email                             0.33 (2.40)

Who pays for Internet use
  Respondent                        0.61 (4.25)
  Employer                          0.66 (4.11)
  College/School                    0.39 (2.56)


sample date; pioneers, who have been using the Internet for more than 2 years before the sample date and who make up 14% of the sample. Middle adopters are more likely not to have used the Internet in the last month compared with both earlier adopters and new users. Pioneers are much more regular Internet users than any other group.

TABLE 6
Active Use: Time Since First Internet Use

How Long People Have Been Using the Internet*    % In Segment   % Who Are Current Users
New users: less than 3 months use                10             64
Middle adopters: 3–12 months use**               51             53
Early adopters: 1–2 years use                    26             65
Pioneers: more than 2 years use                  14             82

Base = all users of Internet in previous year, N = 3,370. * Relative to survey date. ** In the logistic regression model we split this group further into 3–6 months use and 6–12 months use.

Location of Use. While controlling for the effects of other covariates (including working status and payment of Internet connection expenses), people who use the Internet from home or from work are less likely to be lapsed users than those who access the Internet elsewhere (e.g., at an Internet café, or at school or university). For instance, from Table 7 we see that 34% of Internet users access the Internet from home, and of these 79% are current users. A higher percentage (50%) access the Internet from work, and of these 72% are current users.

TABLE 7
Active Use: Location Where Internet Is Used

Location         % Using the Internet at Each Location*   % Who Are Current Users
Home             34                                       79
Work             50                                       72
University       29                                       63
Internet cafe    9                                        59
School           16                                       48

Base = all users of Internet in previous year, N = 3,370. * Note: 35% also used the Internet in an unspecified location. Also, percentages sum to more than 100 as many respondents used the Internet in more than one location.

Social Use at Home. Looking further into the depth of adoption in the home, people who use the Internet at home with two or more other people are significantly more likely to be current users than those who use it alone or with just one other person. Table 8 shows that while only 5% of respondents used the Internet with two or more other people at home, 85% of these people had used it in the last month. These other home users are most likely to be children (under 16 years); 35% of the total sample had school-age children living at home, but this rises to 52% of the group who use the Internet with two or more other people at home.

Type of Internet Connection. Not surprisingly, those users who had a modem connection with a high data transfer speed were more likely to be active users. Also, access through a network server is positively related to active Internet use.

Internet Services Used. The next most important factor influencing active Internet use is which specific services are used. Different activities involve individuals to a varying degree that reflects their needs, personality characteristics, and tastes. There is very strong evidence from the data that the type of activity Internet users are engaging in online explains much of the




variation in the temporal stability of their usage behavior. More specifically, controlling for the effects of other model variables, the use of the Internet for reading newspapers online, working from home, personal and business communications, job hunting, and travel and hotel information are activities that characterize those who are active users. On the other hand, playing games and accessing public information sources are more related to lapsed Internet use. In Table 9 we show the 10 most popular services and the percentage of users of each service who are active users. While personal communications is the most popular activity (used by just over half the sample), the predictors of active Internet use are predominantly business-related applications such as searching for hotel and travel information, job hunting, newspapers online, and business email. Home banking is used by just 2% of the sample, but 77% of these people used this service in the month preceding the survey. Twenty percent of the sample use the Internet for playing games, but only just over half (56%) have done so in the month preceding the survey.

TABLE 8
Active Use: Social Use at Home

Use Alone or with Others       % In Segment   % Who Are Current Users
With 2 or more other people    5              85
Alone at home                  21             79
With one other person          9              75

Base = all users of Internet in the previous year, N = 3,770. * A χ2 test of independence in the crosstabulation corresponding to Table 8 gives a p-value of 0.02; the differences are significant at the 0.05 level.

Reported Reasons for Using the Internet for the First Time. The stated reasons for first using the Internet reflect the dominant individual initial attitudes and expectations of benefits from using the Internet. Active usage is closely related to initial motivations such as leisure and seeking business information, communication needs (e.g., email), and availability at work. In contrast, initial motivations such as “general interest” and “availability at higher education establishments” relate more to lapsed use. There are no significant relationships between the stated initial reason for using the Internet and the length of time an individual respondent had been using it; we can therefore deduce that initial motivations for use have a persistent effect on later behavior.

Working Status. Students are the group most likely to be active Internet users, which is not surprising given the almost universal access at colleges. People out of paid work have a more irregular pattern of usage, while the part-time employed do not differ significantly in their behavior from full-time employed users.

Payment of Internet Connection. Those who pay for their Internet access are, on average, the most likely to be active users, followed by those with free access at educational establishments, and then by those whose access to the Internet is paid by employers (this is controlling for working status and other work-related variables, and therefore much of this effect is due to free access).

Study 2: Frequency of Use

In Table 10 we show the most significant estimated effects from the MNL model. Together with parameter estimates we report estimates of the magnitude of simultaneous regressor effects on Internet usage frequency class membership probabilities. These are the partial derivatives of the class probabilities in equation (11) with respect to the explanatory variables evaluated at the sample mean, providing a summarizing indicator of the mean effects of each regressor variable on the probability of the three response states (Cramer, 1991). As in our findings for current versus lapsed users, several variables are unavoidably correlated with each other. Inspection of the covariance matrix of the parameter estimates did not reveal any unusually large elements that could cause serious multicollinearity problems. The MNL model achieved a 54.2% correct prediction of individual membership in the three usage categories (each having an actual




size of about one third of the total effective sample, N = 2,359). The success rate was higher for the LOW and HIGH categories (60% and 69%, respectively) and lower for the MODERATE category (35%). Importantly, though, only 7.6% of all cases were wrongly assigned to categories non-neighboring to the actual (that is, from LOW to HIGH and vice versa). This pattern did not improve under the proportional odds and the MGCPM models.

TABLE 9
Active Use: Specific Internet Applications Used in the Last 6 Months

Service                    % Using This Service*   % Who Are Current Users
Hotel information          9                       82
Travel information         21                      81
Job hunting                12                      79
Newspapers online          23                      78
Business communications    43                      77
Home banking               2                       77
Home working               13                      76
Personal communications    51                      74
Public information         25                      70
Playing games              20                      56

Base = all users of the Internet in the previous year, N = 3,770. * At least once in the previous 6 months.

Main Findings. Inspection of the p-values of the likelihood ratio test statistics on the groups of explanatory variables in Table 10 indicates that the main predictors of Internet frequency of use are (in order of statistical importance): broad applications performed in the month preceding survey date (e.g., personal or business email, etc.); location of use; time since first use; reasons for first using the Internet; more specific activities/services used in the last 6 months prior to survey date; and working status. Very few interactions between calendar time and covariates proved to have explanatory value. The same holds for most other covariate interactions we tested. Presence of children in the home, gender of the respondent, and income have no explanatory value. As it is not possible to directly represent the results of the multivariate conditional model in a simple cross-tabular form, in Tables 11 to 13 we illustrate the observed unadjusted effects only, by breaking down the percentages in the three frequency segments by the variables within each of the main factors. We also report the relevant observed odds ratios relative to the low-usage category (the response reference category).

Broad Applications Used. Even though it is tautologous that the more applications a person performs, the higher the usage frequency, there are differences in usage frequency by application type (controlling for other covariate effects). As expected, email, either for personal or business use, is, for the average Internet user, a frequently performed activity. This is reflected in the estimated model effects and parameters. Membership of the high-frequency group is positively affected by all application variables. The greatest effects come from business email, followed by personal email and access of the Web. However, with the present measurement scale we cannot discriminate well between moderate and heavy users, and this is true for most application types. Most applications’ average effects on moderate group membership are small and insignificant. In Table 11 we show the observed relationship between broad applications used and frequency of use. Web access is a frequently performed activity, though less so than email. Other applications such as downloads, newsgroups, and mailing lists are associated with heavy usage, but again to a lesser degree than email and Web access. More specialized applications, including Internet phone, IRC, and video viewing, are strongly associated with heavier Internet use (these are grouped under “Other” in Table 11).
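The "magnitude of simultaneous effects" reported alongside the MNL parameter estimates are derivatives of the class probabilities with respect to each regressor. A minimal sketch of that calculation follows, using the standard multinomial logit result that the derivative of p_s with respect to x_j equals p_s times (β_sj minus the probability-weighted average of β_rj over all classes). It assumes LOW is the reference class with all-zero coefficients, which matches the layout of Table 10; the covariate means and coefficients below are hypothetical, not the article's estimates.

```python
import math


def mnl_probabilities(x, betas):
    """MNL class probabilities; the reference class has an all-zero coefficient row."""
    scores = [math.exp(sum(b * v for b, v in zip(beta_s, x))) for beta_s in betas]
    total = sum(scores)
    return [s / total for s in scores]


def marginal_effects(x, betas):
    """d p_s / d x_j = p_s * (beta_sj - sum_r p_r * beta_rj) for each class s, regressor j."""
    p = mnl_probabilities(x, betas)
    n_classes, n_vars = len(betas), len(x)
    # Probability-weighted average coefficient for each regressor.
    avg = [sum(p[r] * betas[r][j] for r in range(n_classes)) for j in range(n_vars)]
    return [[p[s] * (betas[s][j] - avg[j]) for j in range(n_vars)] for s in range(n_classes)]


# Hypothetical sample-mean covariates (constant plus two regressors) and coefficients.
x_bar = [1.0, 0.4, 0.3]
betas = [
    [0.0, 0.0, 0.0],    # LOW (reference class)
    [-0.5, 0.9, 0.6],   # MODERATE
    [-0.8, 1.1, 0.7],   # HIGH
]

effects = marginal_effects(x_bar, betas)
for name, row in zip(["LOW", "MODERATE", "HIGH"], effects):
    print(name, [round(v, 3) for v in row])

# For every regressor the effects on the three class probabilities cancel out.
print([abs(sum(column)) < 1e-12 for column in zip(*effects)])
```

Because the three probabilities always sum to one, the effects of any regressor across LOW, MODERATE, and HIGH must sum to zero, which is why each row of effects in Table 10 adds up to roughly zero after rounding. For dummy regressors such derivatives are only an approximation to the discrete change in probability.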

Location of Use. Location of use is a strong predictor of heavy use (see Table 12). Use at work is the strongest discriminator between heavy and


TABLE 10
Internet Usage Frequency in the Last Month, MNL Model

Columns: MODERATE and HIGH category parameter estimates (Wald statistic), followed by the magnitude of simultaneous effects of regressors on class membership (derivatives evaluated at the sample average) for the LOW, MODERATE, and HIGH classes. Bold face in the original indicates statistical significance (t > 1.5). The likelihood ratio (LR) statistic for each group of variables, with its p-value based on χ2, is shown next to the group name.

                                    Estimates (Wald)                   Effects on class membership
Variable                            MODERATE         HIGH              LOW      MODERATE   HIGH

Constant                            −2.21 (−5.37)    −2.67 (−5.90)

Broad application: LR = 257.44 (p < 1.0e-008)
  Personal email                    0.89 (6.80)      1.13 (7.63)       −0.21    0.06       0.15
  Business email                    0.74 (5.31)      1.27 (8.10)       −0.21    0.01       0.20
  World Wide Web                    0.51 (3.85)      0.85 (5.53)       −0.14    0.01       0.13
  FTP and downloads                 0.23 (1.86)      0.34 (2.43)       −0.06    0.01       0.05
  Newsgroups                        0.14 (1.11)      0.29 (2.11)       −0.04    −0.01      0.05
  Mailing lists                     −0.02 (−0.14)    0.33 (2.15)       −0.03    −0.05      0.08
  Other                             0.50 (3.70)      0.76 (5.06)       −0.13    0.02       0.11

Location of use: LR = 62.42 (p = 8.27e-007)
  Home: Alone                       0.64 (3.67)      0.68 (3.51)       −0.13    0.06       0.07
  Home: With one more person        0.63 (2.73)      0.68 (2.61)       −0.14    0.06       0.08
  Home: With two more persons       0.43 (1.20)      1.03 (2.71)       −0.15    −0.03      0.18
  Home: With three or more persons  0.81 (1.99)      1.28 (2.81)       −0.21    0.02       0.19
  Work                              0.43 (2.58)      0.69 (3.67)       −0.11    0.01       0.10
  School                            −0.58 (−3.05)    −0.52 (−2.29)     0.12     −0.07      −0.05
  University                        0.51 (3.06)      0.29 (1.56)       −0.08    0.08       0.00
  Internet Café                     −0.37 (−1.72)    −0.21 (−0.89)     0.06     −0.06      −0.00
  Other                             −0.20 (−1.44)    −0.08 (−0.51)     0.04     −0.03      0.01

Time since first use: LR = 46.80 (p = 1.03e-006)
  <1 month                          0.37 (1.03)      −0.76 (−1.72)     0.04     0.18       −0.22
  1–3 months                        −0.00 (−0.01)    −1.26 (−4.41)     0.13     0.16       −0.29
  3–6 months                        0.00 (0.01)      −0.78 (−3.07)     0.08     0.10       −0.18
  6–12 months                       0.04 (0.21)      −0.85 (−3.86)     0.08     0.12       −0.20
  1–2 years                         0.01 (0.05)      −0.67 (−3.04)     0.07     0.09       −0.16
  >2 years                          (reference)      (reference)

Reason for first use: LR = 42.37 (p = 1.16e-006)
  General interest                  −0.08 (−0.64)    −0.38 (−0.27)     0.05     0.03       −0.08
  Better communications             0.59 (2.22)      0.77 (2.76)       −0.14    0.04       0.10
  Available at school               0.31 (0.83)      −1.47 (−1.84)     0.12     0.26       −0.38
  Email                             0.47 (2.50)      0.71 (3.54)       −0.12    0.02       0.10

Services used: LR = 41.26 (p = 1.86e-006)
  Home working                      0.08 (0.41)      0.42 (1.98)       −0.05    −0.04      0.09
  Hotel information                 0.55 (2.61)      0.41 (1.80)       −0.10    0.08       0.02
  Reading newspapers online         0.32 (2.10)      0.43 (2.60)       −0.08    0.02       0.06
  Magazines online                  0.24 (1.41)      0.51 (2.78)       −0.08    −0.01      0.09

TABLE 10 (Continued)

| Variable | MODERATE Category Estimate (Wald) | HIGH Category Estimate (Wald) | Effect: LOW | Effect: MODERATE | Effect: HIGH | Likelihood Ratio Statistic (p-Value) |
|---|---|---|---|---|---|---|
| Work status | | | | | | 30.54 (3.10e-005) |
| FT employed | 0.42 (1.41) | −0.02 (−0.05) | −0.04 | 0.10 | −0.06 | |
| PT employed | −0.01 (−0.03) | −0.69 (−1.74) | 0.07 | 0.09 | −0.16 | |
| Student | 1.11 (3.49) | 0.23 (0.63) | −0.14 | 0.23 | −0.09 | |
| Not working | (reference) | (reference) | | | | |
| Type of connection | | | | | | 17.09 (1.95e-004) |
| Network server | −0.06 (−0.45) | 0.44 (3.00) | −0.04 | −0.07 | 0.11 | |
| Who pays for connection | | | | | | 19.97 (5.07e-004) |
| College/School | −0.54 (−2.43) | −0.62 (−2.42) | 0.12 | −0.05 | −0.07 | |
| Other | −0.58 (−2.84) | −1.05 (−4.03) | 0.17 | 0.00 | −0.17 | |
| Main purpose for use | | | | | | 18.24 (1.11e-003) |
| Work at academic institution | 0.31 (1.47) | 0.59 (2.63) | −0.09 | −0.01 | 0.10 | |
| Business use | 0.36 (2.06) | 0.73 (3.86) | −0.11 | −0.01 | 0.12 | |
| Date of data collection | | | | | | 8.08 (2.32e-001) |
| Dec 1995 | (reference) | (reference) | | | | |
| June 1996 | −0.24 (−1.39) | −0.22 (−1.15) | 0.05 | −0.03 | −0.02 | |
| Dec 1996 | −0.12 (−0.66) | −0.25 (−1.27) | 0.04 | 0.01 | −0.03 | |
| June 1997 | −0.33 (−1.77) | −0.54 (−2.60) | 0.09 | −0.01 | −0.08 | |

light users (work users tend to be heavier users), but use at home with two or more other people is also a factor signaling heavy use.

Time Since First Internet Use. As in the study of current users, the length of time since first Internet use is a predictor with large and significant effects on usage frequency. But unlike in Study 1, its effects are now almost linear on the utility of being a low or moderate Internet user compared with a heavy one. This utility decreases as the time elapsed since first use increases. However, time since first use cannot discriminate well between the low and moderate usage classes, except in the case of new Internet users (first used 0–1 month ago).

Overall, very early adopters or pioneers are more likely than not to be heavy users, even when we control for the effects of other covariates; as can be seen in Table 10, the relative effects of all levels of the variable with respect to the reference category of pioneers (first used more than 2 years ago) are negative and large. These effects may be explained by the self-selecting nature of the sample, combined with the argument that involvement increases more or less steadily with duration since first use for those who remain in the Internet population. In Table 13 we illustrate the implications of these findings by showing time since first use by the percentages of respondents in the different frequency segments.
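The effect columns of Table 10 can be read through the usual multinomial logit derivative, in which the effect of a regressor on the probability of class j is P_j times the gap between that class's coefficient and the probability-weighted average coefficient. The sketch below illustrates the formula for the personal email row. The coefficients come from Table 10, with LOW as the reference class. The utilities used to obtain class probabilities are hypothetical placeholders, because the sample-average probabilities are not reported here, so the output is an illustration rather than a reproduction of the published row.

```python
import math

def mnl_probabilities(utilities):
    """Class probabilities of a multinomial logit model (softmax of utilities)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_effects(probs, betas):
    """dP_j/dx = P_j * (beta_j - sum_k P_k * beta_k) for one regressor."""
    weighted_avg = sum(p * b for p, b in zip(probs, betas))
    return [p * (b - weighted_avg) for p, b in zip(probs, betas)]

# Personal email coefficients from Table 10; LOW is the reference class (0.0).
betas_personal_email = [0.0, 0.89, 1.13]  # LOW, MODERATE, HIGH

# Hypothetical utilities at the evaluation point (an assumption, not from the paper).
assumed_utilities = [0.0, 0.2, 0.2]

probs = mnl_probabilities(assumed_utilities)
effects = marginal_effects(probs, betas_personal_email)

for label, effect in zip(("LOW", "MODERATE", "HIGH"), effects):
    print(f"{label:9s} {effect:+.3f}")
```

By construction the three effects sum to zero, which is also the pattern in every row of Table 10: a regressor that raises the probability of one usage class must lower the others by the same total amount.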


TABLE 11

Frequency of Use: Broad Applications Used

| Broad Applications Used | % Low Users | % Mod. Users | % High Users | Odds Ratio, Mod. Use** | Odds Ratio, High Use** |
|---|---|---|---|---|---|
| Business email | 16 | 34 | 50 | 2.50 | 7.88 |
| Personal email | 20 | 38 | 42 | 2.44 | 2.94 |
| Web access | 24 | 36 | 41 | 1.80 | 2.86 |
| Downloads* | 22 | 35 | 44 | 0.98 | 2.68 |
| Newsgroup membership | 21 | 34 | 45 | 1.53 | 2.48 |
| Mailing lists | 22 | 31 | 47 | 1.20 | 2.11 |
| "Other" | 22 | 35 | 42 | 1.41 | 1.76 |

Base = all users of Internet in previous month, N = 2,329.
* Of software, film, music, FTP.
** Versus low use, with respect to 'not used' the specified application.

Other Effects. The reported reasons for first using the Internet again play a predictive role. Frequent usage is positively related to Internet usage initiated because of communication needs. Similar to the results from Study 1, 'vague' initial motivations such as "general interest" and "availability at school" relate strongly to low-frequency use. Specific types of activity on the Internet, namely home working, seeking hotel information, and reading newspapers and magazines online, also have some small discriminatory power.
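The p-values in the likelihood ratio column of Table 10 can be checked against the upper tail of a chi-square distribution. The sketch below assumes that each dummy variable in a group contributes one coefficient per non-reference usage class (MODERATE and HIGH), so a group of k dummies has 2k degrees of freedom. That degrees-of-freedom rule is an assumption and is not stated in the text. The helper names are illustrative, and the closed form used applies only to an even number of degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2m the upper tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

def lr_p_value(lr_statistic, n_dummies, n_nonreference_classes=2):
    """p-value for a group of dummies, assuming df = dummies x non-reference classes."""
    return chi2_upper_tail_even_df(lr_statistic, n_dummies * n_nonreference_classes)

# Date of data collection: three dummies (June 1996, Dec 1996, June 1997), LR = 8.08.
p_date = lr_p_value(8.08, n_dummies=3)
# Type of connection: one dummy (network server), LR = 17.09.
p_connection = lr_p_value(17.09, n_dummies=1)

print(f"date of data collection: p = {p_date:.3e}")
print(f"type of connection:      p = {p_connection:.3e}")
```

Under this assumption the two computed values agree with the reported 2.32e-001 and 1.95e-004 to the precision shown in the table, which suggests the degrees-of-freedom rule is at least consistent with how the column was produced.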

IMPLICATIONS FOR BUSINESS

Companies who develop an Internet presence in order to interact with consumers need to know what types of consumer they are going to reach. The broad demographics of Internet users are well known (with a bias to young, affluent, college-educated males, but continually moving closer to the population norm as Internet adoption grows). However, our findings show other less intuitive predictors of both active and frequent use. We find that the main predictors of active or current use of the Internet are:

● Time since first use of the Internet. Pioneers (very early adopters) are most likely to be current users. However, the relationship is not a linear one; middle adopters are more likely than other groups to have not used the Internet in the previous

TABLE 12

Frequency of Use: Location of Use

| Location of Use | % Low Users | % Mod. Users | % High Users | Odds Ratio, Mod. Use* | Odds Ratio, High Use* |
|---|---|---|---|---|---|
| Home | 20 | 37 | 43 | 1.93 | 2.49 |
| Work | 21 | 33 | 46 | 1.64 | 4.07 |
| School | 46 | 32 | 23 | 0.51 | 0.34 |
| University | 26 | 39 | 36 | 1.30 | 1.11 |
| Internet Café | 29 | 30 | 41 | 0.81 | 1.10 |

Base = all users of Internet in previous month, N = 2,329.
* Versus low use, with respect to 'not using the Internet at the specified location'.


TABLE 13

Frequency of Use: Time Since First Internet Use

| How Long People Have Been Using the Internet* | % Low Users | % Mod. Users | % High Users | Odds Ratio, Mod. Use** | Odds Ratio, High Use** |
|---|---|---|---|---|---|
| New users: less than 3 months use | 45 | 37 | 18 | 0.41 | 0.08 |
| Middle adopters: 3–12 months use | 32 | 38 | 30 | 0.58 | 0.19 |
| Early adopters: 1–2 years use | 25 | 36 | 38 | 0.70 | 0.30 |
| Pioneers: more than 2 years use | 12 | 25 | 62 | (reference) | (reference) |

Base = all users of Internet in previous month, N = 2,329.
* Relative to survey date.
** Versus low use, with respect to pioneers.
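The odds ratios in Table 13 can be approximately recovered from the percentage columns alone. The sketch below assumes each ratio is the odds of moderate (or high) versus low use within a group, divided by the same odds among pioneers. Whether the published figures were computed exactly this way is an assumption, and small gaps are expected because the percentages are rounded to whole numbers.

```python
# Rounded percentages from Table 13: (low, moderate, high) users in each group.
TABLE_13 = {
    "New users": (45, 37, 18),
    "Middle adopters": (32, 38, 30),
    "Early adopters": (25, 36, 38),
    "Pioneers": (12, 25, 62),
}

# Odds ratios as printed in Table 13: (moderate vs. low, high vs. low).
REPORTED = {
    "New users": (0.41, 0.08),
    "Middle adopters": (0.58, 0.19),
    "Early adopters": (0.70, 0.30),
}

def odds_versus_low(row, class_index):
    """Odds of the given class (1 = moderate, 2 = high) against low use."""
    return row[class_index] / row[0]

def odds_ratio(table, group, reference, class_index):
    """Odds for `group` divided by the odds for the reference group."""
    return (odds_versus_low(table[group], class_index)
            / odds_versus_low(table[reference], class_index))

ratios = {
    group: (odds_ratio(TABLE_13, group, "Pioneers", 1),
            odds_ratio(TABLE_13, group, "Pioneers", 2))
    for group in REPORTED
}

for group, (moderate, high) in ratios.items():
    print(f"{group:16s} moderate {moderate:.2f}  high {high:.2f}"
          f"  (reported {REPORTED[group][0]:.2f}, {REPORTED[group][1]:.2f})")
```

Read this way, an odds ratio of about 0.08 for new users means that their odds of being heavy rather than light users are roughly one twelfth of the corresponding odds for pioneers.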



month. One possible explanation of this nonlinear effect could be that new users require some time to realize the benefits of using the Internet and, after a period of 3 or more months, stabilize their behavior towards less regular Internet use. Middle adopters will not necessarily ever behave like pioneers.

● Location of use. Respondents who use the Internet from home or from work are less likely to be lapsed users than those who access the Internet elsewhere (e.g., at an Internet Café, school or university). Social use at home, especially with two or more other people, is a strong predictor of current use; such people are significantly more likely to be current users than those who use it alone or with just one other person. Even though causality cannot be assessed through these cross-sectional data, this finding may indicate that: work-related needs and peer pressure are major factors in steady Internet usage, as predicted by standard diffusion theory; PC home ownership increases the depth of adoption or usage; there is a domestic diffusion effect.

● Specific services used. Personal communications account for the most popular activity (used by just over half the sample), but the main predictors of recent Internet use are predominantly information services.

The main predictors of frequent or heavy Internet use (20+ times per month) are similar to those for active use, but the order is slightly different:

● Broad applications. The most discriminating driver of heavy compared with light use is business email, followed by personal email and then other applications such as access to the Web, newsgroup membership, etc.

● Time since first Internet use. The relationship here is linear; the longer someone has used the Internet, the more likely they are to be a heavy user.

● Location of use. Use at work and use at home with two or more other people are strong predictors of heavy usage.

● Initial motivations. Initial motivations for use have a persistent effect on later behavior.

LIMITATIONS AND FUTURE RESEARCH

These two studies have revealed potentially powerful predictors for the temporal stability and frequency of Internet use. There are, however, several limitations to this type of analysis. First, measurement of adoption or usage depth is limited; a better measure would take into account the average number of hours spent daily online. Second, consistent measurement of additional variables, such as attitudes and personality traits, ownership of other technologies, and variables directly relating to social or work-related peer influence on the decision to adopt and use, would add further insight. Third, a clearer picture of the differences between the various applications' effects on the level of overall Internet usage requires measurement of the frequencies with which each application is performed by individuals. This will enable better variable selection and model building procedures and improve the descriptive and predictive power of our statistical analyses. Based on the exploratory results presented here, work is in progress to model jointly, using a nested logit approach, the continuity of usage spells and the usage frequency within spells of Internet activity. Survival analysis can also be used to model the duration since first Internet use. These techniques may answer questions such as what distinguishes earlier from later adopters, and provide us with suggestions for longitudinal data collection in order to assess this issue further. Longitudinal panel data will enable us to obtain better measures of both patterns of Internet use and factors that may explain them.

APPENDIX

Sets of possible explanatory variables

| Groups of Variables | Type |
|---|---|
| Age group | Ordinal: 1 = 15–17, 2 = 18–24, 3 = 25–34, 4 = 35–44, 5 = 45–54, 6 = 55+ |
| Gender | Binary: 1 = Male, 2 = Female |
| Income | Ordinal: 1 = £0–10K, 2 = £11–15K, 3 = £16–20K, 4 = £21–25K, 5 = £26–30K, 6 = £31K+, 0 = Missing |
| Children under the age of 15 at home | Binary: 0 = No, 1 = Yes |
| Working status | Categorical: 1 = FT employed, 2 = PT empl., 3 = Student, 4 = Not working |
| Sampling date | Ordinal: 1 to 4 |
| Locations ever used the Internet, including detailed usage at home | Set of binary variables**, one ordinal. 1: Home (0 = No, 1 = Using at home alone, 2 = Using at home with 1 more person, 3 = Using at home with 2 more, 4 = Using at home with 3+ more); 2: Work (0 = No, 1 = Yes); 3: School; 4: University; 5: Internet Café; 0: Other*** |
| Broad applications used on the Internet in the last month | Set of binary variables**. 1: Personal email; 2: Business email; 3: World Wide Web; 4: Downloads of software, film, music, FTP; 5: Newsgroups; 6: Mailing lists; 7: Other specialized |
| Activities for which the Internet was used in the last 6 months | Set of binary variables**. 1: Home shopping (0 = No, 1 = Yes); 2: Home banking; 3: Education; 4: Buying a car; 5: Home working; 6: Personal communication; 7: Bus. communication; 8: Downloading film/music; 9: Job hunting; 10: Holiday information; 11: Travel information; 12: Hotel information; 13: Reading papers online; 14: Reading magazines online; 15: Playing games; 16: Accessing public information; 0: Other |
| Time first used the Internet | Ordinal: 1: 0–1 month ago, 2: 1–3 months ago, 3: 3–6 months ago, 4: 6–12 months ago, 5: 12–24 months ago, 6: More than 24 months ago |
| Reasons for using the Internet the first time | Set of binary variables**. 1: General interest/curiosity (0 = No, 1 = Yes); 2: Clients at work requested it; 3: To improve firms' image; 4: To save time; 5: To save money; 6: Better communications; 7: Business information (incl. travel); 8: Leisure information; 9: Education; 10: Other; 11: Downloading software; 12: Friends' recommendation; 13: Available at work; 14: Available at school; 15: Available at higher education; 16: Email |
| Purposes the Internet was mostly used for | Set of binary variables**. 1: Other; 2: Leisure or personal; 3: Academic study; 4: Work at an academic institution; 5: Work or business use |
| Type of Internet access, including detailed modem information | Set of binary variables**, one ordinal. TACC1: Modem (0 = No modem, 1 = Yes, but speed unknown, 2 = Less than 9600 kbps, 3 = 9600 kbps, 4 = 14400 kbps, 5 = 28800 kbps, 6 = 33600 kbps or more); TACC2: ISDN; TACC3: Network server |
| Who pays for Internet access | Set of binary variables**. 1: Respondent; 2: Employer; 3: College/School; 4: Other |

** When sets of binary variables are used, respondents are able to give more than one response to the question.
*** Includes usage at Friends/Relatives, hotels, mobile phone/GSM network or TV set top box.
