A meta-analysis of materiality studies


Eugene G. Chewning, Jr. and Julia L. Higgs

ABSTRACT

Materiality is one of the most widely studied topics in auditing. Because materiality is not well defined, considerable research has been conducted to determine how financial statement preparers and users operationalize the concept. We conduct a meta-analytic review of the literature to document the strength of each of 11 frequently used materiality measures studied in the empirical research on materiality. Our results show large effect sizes for an item's impact on income, revenue, assets, and equity; moderate effect sizes for the nature of the item and risk; and small effect sizes for earnings trend, absolute size of the item, firm size, current assets or working capital, and return on investment. The results also indicate that the effects persist over categories of research design (survey, archival, or behavioral), but the strength of the effects varies between designs.

Advances in Accounting, Volume 17, pages 65-90. Copyright © 2000 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0611-4




INTRODUCTION

There are few instances in which financial reporting and auditing standards provide explicit guidance in making materiality decisions. For instance, Statement of Financial Accounting Concepts (SFAC) No. 2 (Financial Accounting Standards Board [FASB] 1980) describes an item as material when it is "probable that the judgments of a reasonable person relying upon the report would have been changed or influenced by the inclusion or correction of the item" (paragraph 132). Few rules or standards for judging materiality in specific situations have been issued. In fact, the FASB has maintained that it should not formulate a general materiality standard (SFAC No. 2, para. 131, FASB 1980). The concept of materiality continues to be elusive and is generally considered to be a matter of professional judgment.

While we have few guidelines or standards for judging materiality, we do have a rich body of empirical inquiry on the subject. Our search located 125 papers on the topic, and from these papers we identified more than 50 quantifiable measures of materiality. This paper reports a meta-analytic review of 26 such studies. Our purpose is to examine rigorously the relationships documented in the materiality literature and report evidence regarding the nature and strength of those relationships.

This research is motivated in part by the FASB's renewed interest in materiality research. In particular, the FASB (1995) encouraged research related to materiality and disclosure. Our work does not exclusively address this particular application of materiality but provides evidence on that issue as well as summary evidence regarding materiality measures. The findings may be of particular interest to researchers who study materiality or who need to provide experimental control for it, to auditing teachers who may be interested in a synthesis of this literature, and to policymakers such as the FASB in addressing issues such as materiality and financial disclosure.

Since the last published comprehensive review of the materiality research (Holstrum and Messier 1982), a number of empirical studies have examined how auditors make materiality decisions and how materiality has been used in accounting and financial reporting decisions. Most of these studies involve the application of materiality to disclosure and thus provide evidence that may be useful to the FASB. With the inclusion of these more recent studies, it is appropriate to reexamine the factors that are important in evaluating materiality.

Our work differs from Holstrum and Messier (1982). They use traditional review methods to evaluate four factors that are potentially important in judging the materiality of an item: the effect of the item on income, the earnings trend, working capital, and total assets. They find that the effect on income is the most important factor and report the earnings trend variable as a "distant second." They also find mixed results for a measure using total assets as a base, and virtually no support for the working capital measure. Using the meta-analysis method, we find a strong effect size for the income effect variable and a small effect size for the
working capital measure. In contrast to Holstrum and Messier's findings, we find that a total assets measure has a large effect size that is similar in magnitude to the effect sizes associated with income, revenue, and equity. The effect size associated with the earnings trend measure is small.

Our research also differs from a meta-analysis of studies of financial statement adjustments taken from actual audit workpapers (Kinney and Martin 1994). They synthesize 13 studies (nine different data sets) and find that audit adjustments tend to reduce income and assets, often in amounts that greatly exceed materiality thresholds. The purpose of our study is to accumulate and summarize the measures that are used to evaluate materiality. Because of this purpose, the studies included in our analysis employ materiality as the dependent variable. Thus, not only is the purpose of our study different from that of Kinney and Martin, but there is also little overlap in the sets of studies included in the respective analyses.

The remainder of the paper is organized as follows. The next section describes meta-analysis and the methods used in this paper. We then discuss the process of identifying the literature. This is followed by a description of the analysis and presentation of the results. Finally, we offer a summary and concluding remarks.

METHOD

Meta-analysis is a quantitative method for summarizing the findings of existing research. In meta-analysis the characteristics and findings from diverse studies on some issue are quantified and integrated in a statistically rigorous manner. In the accounting literature few studies are replicated, and the research paradigms, experimental designs, and sampling characteristics of published research differ from study to study, sometimes quite substantially. Meta-analysis provides a way to determine if particular findings persist despite these differences in research methods. Further, meta-analysis allows us to infer which findings or relationships are most important and, in so doing, effectively reduces a large number of relationships to a parsimonious set.

Traditional reviews often report the number of significant findings or, perhaps, the probability levels associated with findings. Sample sizes of individual studies are seldom used to integrate the findings. Conclusions drawn may be affected substantially by the subset of studies on a topic that are selected for inclusion in the review.

Meta-analysis is potentially a significant improvement over traditional review methods. First, we provide a summary measure of the strength of the relationship between materiality judgments and financial statement variables. This measure, referred to as an "effect size," provides information regarding the importance of the relationship and uses sample sizes from individual studies as weights in its construction (Hunter, Schmidt, and Jackson 1982; Glass, McGaw, and Smith 1981). Second, we correct the variances of our effect size measures for sampling
error, an artifactual source of variation across studies, because differences in findings across studies may be caused by sampling error alone. Finally, we consider the effects of methodological moderator variables, that is, the methodological characteristics of studies that may lead to differences in findings. For instance, our analysis shows that, for several financial statement materiality indicators, the average effect size obtained from the results of studies employing survey methods differs from the average effect size obtained from studies employing other research designs.

The primary focus of our review is on effect sizes of financial statement indicators of materiality. We first identify the financial statement items that have been used in the research literature as indicators of materiality. For example, by far the most commonly used indicator of the materiality of a transaction or event is its effect on income. Next, where possible, we compute effect sizes for each financial statement item employed as a materiality measure in each study. The effect size is defined as the difference in means of a materiality measure for an experimental group and a control group divided by the within-group standard deviation:

$$d = \frac{\mu_e - \mu_c}{\sigma} \qquad (1)$$

where $d$ is the effect size, $\mu_e$ and $\mu_c$ are the means of the experimental and control groups, respectively, and $\sigma$ is the within-group standard deviation. Most meta-analyses in the social sciences have used this measure of effect size (Hunter, Schmidt, and Jackson 1982). Thus, an effect size for the income effect variable, for example, may be defined as the mean income effect of a group of transactions judged to be material, less the mean income effect of a group of similar transactions judged to be immaterial, divided by the within-group standard deviation of the income effect.

The interpretation of the effect size is relatively straightforward. A comparison of the experimental (material) and control (immaterial) groups is expressed in units of standard deviation. In the income effect example, the average income effect of a transaction judged to be material for a group of firms is $d$ standard deviations larger, assuming a positive value for $d$, than the average income effect of the same transaction judged to be immaterial for another group of firms.

Unfortunately, the research designs encountered in this literature are diverse, and most studies do not report explicitly the means and standard deviations for experimental and control groups. Some studies report differences in groups using a t, F, or other statistic, while others describe empirical relationships with a correlation coefficient, a $\chi^2$ statistic, or some other statistic. Meta-analysis requires the results from each study included in the analysis to be transformed to a common measure, such as $d$, which we adopt here. Methods for calculating effect sizes from commonly used test statistics are discussed in Glass and colleagues (1981) and Hunter and colleagues (1982). The methods employed in this paper appear in Table 1.

Table 1. Methods Used to Transform Reported Statistics to Effect Sizes

Statistic reported                   Intermediate calculations                                          Formula for effect size (d)
r                                    NA                                                                 $d = \sqrt{(N-2)/N} \cdot 2r / \sqrt{1 - r^2}$
$\eta^2$ (eta-squared)               $r_{pb} = \sqrt{\eta^2}$; $r_{xy} = 1.25\,(r_{pb})$                $d = \sqrt{(N-2)/N} \cdot 2r / \sqrt{1 - r^2}$
Mean and standard deviation of       NA                                                                 $d = (\bar{X}_e - \bar{X}_c) / s$
  experimental and control groups
t-statistic                          NA                                                                 $d = t \sqrt{1/N_e + 1/N_c}$
Percentage classified as material    NA                                                                 $d = (P_e - P_c) / \sqrt{P(1 - P)}$
z-statistic                          NA                                                                 $d = z \sqrt{1/N_e + 1/N_c}$
Number material or not material      $z = [(N_e \pm .5) - N_{e+c}(.5)] / \sqrt{N_{e+c}(.5)(.5)}$        $d = z \sqrt{1/N_e + 1/N_c}$

Notes: Summary of variables: N = sample size; e = experimental group; c = control group; d = effect size; $\bar{X}$ = mean; P = proportion of population; s = within-group standard deviation; $r_{pb}$ = point biserial correlation coefficient; $r_{xy}$ = correlation coefficient; r = correlation coefficient.

The process of cumulating the results of studies follows the method suggested by Hunter and colleagues (1982). First, we estimate the population effect size, $\bar{d}$, as an average of the effect sizes of the individual studies, weighted by the number of subjects in each study, as follows:

$$\bar{d} = \frac{\sum N_i d_i}{\sum N_i} \qquad (2)$$
where $d_i$ is the observed effect size for study i and $N_i$ is the number of subjects employed in study i. Thus, greater weight is given to studies employing larger numbers of subjects.1,2 As long as sample size is not correlated with the variance of d across studies, the frequency-weighted average is superior to a simple average (Hunter et al. 1982). The observed variance of d across studies is computed as follows:

$$\sigma_d^2 = \frac{\sum \left[ N_i (d_i - \bar{d})^2 \right]}{\sum N_i} \qquad (3)$$
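
As an illustration only, the following Python sketch puts individual study results on the common metric d and then cumulates them using the sample-size-weighted mean of equation (2) and the observed variance of equation (3). The function names and the three example studies are hypothetical and are not drawn from the studies reviewed here; only the t-statistic and mean/standard deviation conversions from Table 1 are shown.

```python
import math

def d_from_means(mean_e, mean_c, sd_within):
    """Equation (1): difference in group means in within-group SD units."""
    return (mean_e - mean_c) / sd_within

def d_from_t(t, n_e, n_c):
    """t-statistic row of Table 1: d = t * sqrt(1/Ne + 1/Nc)."""
    return t * math.sqrt(1.0 / n_e + 1.0 / n_c)

def weighted_mean_d(ds, ns):
    """Equation (2): mean effect size weighted by number of subjects."""
    return sum(n * d for d, n in zip(ds, ns)) / sum(ns)

def observed_variance(ds, ns):
    """Equation (3): sample-size-weighted variance of d_i around the mean."""
    d_bar = weighted_mean_d(ds, ns)
    return sum(n * (d - d_bar) ** 2 for d, n in zip(ds, ns)) / sum(ns)

# Hypothetical studies (illustrative numbers, not taken from Table 2).
ds = [
    d_from_t(2.0, 50, 50),        # study reporting a t-statistic
    d_from_means(1.2, 1.0, 0.5),  # study reporting means and a within-group SD
    0.8,                          # study reporting d directly
]
ns = [100, 40, 60]                # number of subjects in each study

d_bar = weighted_mean_d(ds, ns)
var_d = observed_variance(ds, ns)
print(f"weighted mean d = {d_bar:.3f}, observed variance = {var_d:.4f}")
```

Because the weights are the study sample sizes, the first study (100 subjects) pulls the mean toward its own effect size more strongly than the other two.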

If we let $\delta$ denote the population value of the effect size statistic, the observed value of d deviates from $\delta$ by sampling error, with variance $\sigma_e^2$. Sampling error is the greatest single source of artifactual variation in effect sizes across studies in most domains (Hunter et al. 1982). The expected sampling error variance is computed as follows:

$$\sigma_e^2 = \frac{4 \left( 1 + \bar{d}^2 / 8 \right) K}{\sum N_i} \qquad (4)$$

where K is the number of independent effect sizes. The variance corrected for sampling error, or residual variance, is thus:

$$\sigma_\delta^2 = \sigma_d^2 - \sigma_e^2 \qquad (5)$$
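
The correction for sampling error can be sketched in a few lines of Python. This is a minimal illustration of equations (2) through (5) under assumed inputs: the helper names and the three equally sized example studies are hypothetical, chosen only to show how the residual variance is obtained.

```python
import math

def sampling_error_variance(d_bar, ns):
    """Equation (4): 4 * (1 + d_bar^2 / 8) * K / sum(N_i)."""
    k = len(ns)  # K, the number of independent effect sizes
    return 4.0 * (1.0 + d_bar ** 2 / 8.0) * k / sum(ns)

def cumulate(ds, ns):
    """Return the weighted mean, observed, error, and residual variances."""
    total_n = sum(ns)
    d_bar = sum(n * d for d, n in zip(ds, ns)) / total_n                 # eq. (2)
    var_d = sum(n * (d - d_bar) ** 2 for d, n in zip(ds, ns)) / total_n  # eq. (3)
    var_e = sampling_error_variance(d_bar, ns)                           # eq. (4)
    return {
        "d_bar": d_bar,
        "var_d": var_d,
        "var_e": var_e,
        "var_delta": var_d - var_e,                                      # eq. (5)
    }

# Hypothetical effect sizes and sample sizes (not from the reviewed studies).
result = cumulate([0.2, 0.9, 1.6], [100, 100, 100])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

In this made-up example most of the observed variance survives the correction, which is the situation in which a search for moderator variables would be considered.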

If the effect size is the same across studies, then we expect the value of $\sigma_\delta^2$ to be approximately zero. If the value of $\sigma_\delta^2$ is large, then it is likely that other sources of artifactual variation are present and/or one or more moderator variables might explain the variation across studies.

Other sources of artifactual variation include variation due to measurement error, restriction in range, bad data, or even reporting error. In published studies, we assume the editorial process eliminates most, if not all, variation due to bad data and reporting error and perhaps much of the artifactual variation due to measurement error. Measurement error concerns how the true value of a construct differs from its measurement. Correcting effect sizes for measurement error requires measures of the reliability of the independent and dependent variables in each study. Unfortunately, these are seldom reported in the accounting literature.

Restriction in range concerns the distribution of the independent variables in a study. If the standard deviation of the independent variable differs from the population standard deviation, the correlation between the independent and dependent variables (or effect sizes) will differ from the true correlation. Unfortunately,
the studies we examine do not provide the information that would allow d to be corrected for range restriction of the independent variable. However, a priori, we do not expect substantial range restriction, especially for studies employing archival methods. Given the financial nature of the independent variables used in studies of materiality, we expect these measures to be within the ranges decision makers face in the field.

To test for the existence of an important moderator variable, we partition studies into subsets based on a hypothesized moderator variable, for example, studies using a survey method in one set and studies using other methods in another set. We then repeat the process outlined above for each subset of studies. That is, we compute $\bar{d}$, $\sigma_d^2$, $\sigma_e^2$, and $\sigma_\delta^2$ for each subset. If there is a large difference in mean effect sizes between subsets or a reduction in the residual variance within subsets, then the hypothesized variable is a moderator variable (Hunter et al. 1982).

To summarize the meta-analysis approach, we compute effect sizes for each independent variable in each study included in our analysis. The individual effect sizes are cumulated, giving an estimate of the population effect size, $\bar{d}$. We compute the observed variance of d and correct for sampling error. If the corrected variance, $\sigma_\delta^2$, is large relative to $\bar{d}$, then we search for moderator variables to explain the observed variance.
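
The moderator search described above can be sketched as follows. This Python example is an assumption-laden illustration, not the procedure used to produce the tables: the study records, the `design` labels, and the helper names are all hypothetical. It partitions studies on a candidate moderator, repeats the cumulation within each subset, and compares the subset results with the pooled result.

```python
import math
from collections import defaultdict

def cumulate(studies):
    """Cumulate (d, n) pairs: weighted mean plus observed, error, and residual variance."""
    total_n = sum(n for _, n in studies)
    k = len(studies)
    d_bar = sum(n * d for d, n in studies) / total_n
    var_d = sum(n * (d - d_bar) ** 2 for d, n in studies) / total_n
    var_e = 4.0 * (1.0 + d_bar ** 2 / 8.0) * k / total_n
    return {"d_bar": d_bar, "var_d": var_d, "var_e": var_e, "var_delta": var_d - var_e}

def cumulate_by_moderator(records):
    """Group (design, d, n) records by design and cumulate each subset."""
    subsets = defaultdict(list)
    for design, d, n in records:
        subsets[design].append((d, n))
    return {design: cumulate(pairs) for design, pairs in subsets.items()}

# Hypothetical records: (research design, effect size, number of subjects).
records = [
    ("survey", 5.0, 100),
    ("survey", 6.0, 100),
    ("nonsurvey", 0.7, 200),
    ("nonsurvey", 0.9, 200),
]

pooled = cumulate([(d, n) for _, d, n in records])
by_design = cumulate_by_moderator(records)

print(f"pooled: d_bar = {pooled['d_bar']:.3f}, residual variance = {pooled['var_delta']:.3f}")
for design, stats in by_design.items():
    print(f"{design}: d_bar = {stats['d_bar']:.3f}, residual variance = {stats['var_delta']:.3f}")
```

In this made-up data the subset means differ sharply and the residual variance within each subset is far below the pooled value, which is the pattern that would identify research design as a moderator. A subset residual below zero simply means the observed variance is smaller than the variance expected from sampling error alone.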

IDENTIFYING THE LITERATURE

The first step in identifying the papers included in this review was a keyword search of two electronic databases, ABI/Inform and the Economics Literature database. ABI/Inform includes abstracts of management, marketing, and general business journals. This database includes articles published in the years 1987 through early 1996 and returns the abstract of a journal article if the search term appears in the article title or abstract. A second search was performed on the Economics Literature database, which contains the citations for articles appearing in most accounting, finance, and economics journals from 1950 through 1996. This database does not include abstracts; rather, articles with the keyword in the title are identified. The keywords "material" and "materiality" were used in both searches. We also searched the Accountant's Index (AICPA) for the period 1954-1995 under the headings "materiality" and "material." Because Woolsey's papers (1954a, 1954b) appear to be the first to study materiality issues in a rigorous manner, 1954 was selected as the starting year.

Articles identified from all three sources were selected as candidates for inclusion in the review based on the abstract, description, and/or title. Also, these papers' bibliographies were examined to search for papers that may not have been otherwise included. Our search identified a total of 121 published papers and 4 working papers related to the concept of materiality.3



Table 2. [Original pages 72-78; the tabulated values are not recoverable from the extracted text. Panels: A, income effect; B, revenue effect; C, asset effect; D, equity effect; E, nature of the item; F, risk; G, earnings trend; H, absolute size of the item; I, firm size; J, current/working capital effect; K, return on investment.]

The largest group of studies on the topic of materiality includes essays and literature reviews (N=57). This literature is primarily normative, and it is not included in our analysis. Several other studies employ empirical, analytical, or simulation approaches but either do not use a measure of materiality as the dependent variable (N=13) or lack the information necessary to compute effect sizes (N=23). A few studies (N=4) are eliminated because the data are published in other studies and included in our analysis via the other publications. The remaining studies (N=28) provide one or more independent measures of an effect size of a materiality relationship. We include in our analysis those measures that are common to two or more studies. Thus, our study includes the analysis of a total of 92 effect sizes of 11 materiality measures from 26 studies.4

RESULTS

Analysis of Effect Sizes

The materiality measures appear in separate panels of Table 2. Note that a study may appear more than once in a panel if the author(s) took separate measurements from different populations. For each financial statement materiality measure, the total number of subjects, the average effect size, the number of studies cumulated, the observed variance of the effect size, the estimated error variance, the corrected effect size variance or residual variance, and a chi-square test for variation across studies are reported. In addition, for the moderator variables described in Table 2, we provide the effect size and the number of subjects at the individual study level.

Our review finds 24 quantifiable estimates of an effect size for the relationship between materiality and income with a total sample of 2,949 subjects (Table 2, Panel A). The initial estimate of the population effect size is 2.387 (i.e., the income effect of material items is 2.387 standard deviations larger than the income effect of immaterial items) with a corrected variance of 7.454. Because the chi-square test shows this variance to be reliably different from zero ($\chi^2$ = 3,195, p < .001), we search for moderator variables to account for the residual variance in the effect size estimates.

Logical considerations of the strengths and weaknesses of various research designs and casual observation of the individual study effect sizes suggest a consideration of methodological moderators. The first methodological moderator that proves useful in reducing the variance in effect sizes is the distinction between survey and nonsurvey research.5 The seven measures from research employing a survey method produce an effect size for the income effect variable of 5.615. The 17 measures from studies using archival or behavioral methods produce a much smaller effect size of 0.837. In both cases the residual variance, $\sigma_\delta^2$, is lower than the estimate for all measures combined but still significantly different from zero (for surveys, $\chi^2$ = 332.1, p < .001;
for nonsurveys, $\chi^2$ = 145.2, p < .001). Because of the relatively small number of studies employing a survey method and the variety of potential problems that attend those research designs, we elected not to search for further moderator variables for survey research.

For the nonsurvey research, we continue the search for methodological moderators. Again, logical considerations and casual observation suggest a partitioning based on behavioral and archival research methods. Research employing archival methods tends to emphasize external validity, while behavioral research emphasizes internal validity and strong variable manipulation. The weighted mean effect size of archival studies is 0.727. The residual variance among these estimates is .083 and is significantly different from zero ($\chi^2$ = 33.5, p < .005). The weighted mean effect size for behavioral studies is 1.558, over two times as large as the mean effect size from archival research. The variance in effect sizes from behavioral methods remains significant ($\chi^2$ = 57.8, p < .001). This result implies that other important, but unspecified, variables differ between studies using archival and behavioral methods. Because of the relatively small number of effect sizes for studies using either method, we elected not to search further for moderator variables.

Panel B of Table 2 reports the results of the analysis of eight effect-size measures for the revenue effect variable. For all studies we find a d of 1.63 but with a large, significant residual variance ($\chi^2$ = 703.5, p < .001). Eliminating surveys from the analysis reduces d to 0.829 and the residual variance to 0.008, which is not significantly different from zero.

Panel C reports the results of the analysis of 11 measures of the effect size for the asset variable. For all studies d is 0.588 with significant residual variance ($\chi^2$ = 122.3, p < .001). When survey-elicited effect sizes are eliminated, the weighted mean effect size is 0.807. The residual variance is less than 0.05 and insignificantly different from zero ($\chi^2$ = 9.3, p > .05).

Panel D reports the results of the analysis of six effect-size measures for the equity effect variable. The weighted mean effect size is 0.735 with a significant residual variance ($\chi^2$ = 18.3, p < .01). Neither eliminating the single survey effect size nor partitioning by archival/behavioral research method significantly reduces the residual variance.

Six measures for the nature of the item with a weighted mean effect size of 0.560 appear in Panel E. Five measures for risk with a weighted mean effect size of 0.393 appear in Panel F. The residual variance associated with each variable is insignificantly different from zero.

Ten studies provide a measure for the earnings trend variable (Panel G). Because of significant residual variance in the estimate of the mean effect size, we partition the studies into survey and nonsurvey studies. This results in an insignificant residual variance for each set. The mean effect size from the survey studies is 1.157, and the result is dominated by a single study. The weighted mean effect size for the eight nonsurvey measures is 0.266, considerably smaller than the effect size found in survey research.



Table 3. Summary of Effect Size Measures

Materiality Measure             d

Large(a)
  Revenue                       0.83
  Asset                         0.81
  Income                        0.73
  Equity                        0.73

Medium
  Nature of Item                0.56
  Risk                          0.39

Small
  Earnings Trend                0.27
  Absolute Size                 0.25
  Return on Investment          0.17
  Current/Working Capital       0.21
  Firm Size                     0.01

Notes: (a) The categories roughly follow guidelines suggested by Cohen (1977). Cohen suggests categories of effect sizes as follows: small (d = 0.2), medium (d = 0.5), and large (d = 0.8).
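
One way to make the rough grouping in Table 3 explicit is to assign each measure to the Cohen benchmark closest to its effect size. The Python sketch below does this; the nearest-benchmark rule and the function name are assumptions introduced for illustration, not a rule stated in the paper, although the rule does reproduce the grouping shown in Table 3.

```python
# Cohen (1977) benchmark values as quoted in the note to Table 3.
COHEN_BENCHMARKS = {"small": 0.2, "medium": 0.5, "large": 0.8}

def nearest_cohen_category(d):
    """Assumed rule: pick the benchmark closest to |d|."""
    return min(COHEN_BENCHMARKS, key=lambda name: abs(abs(d) - COHEN_BENCHMARKS[name]))

# Effect sizes as reported in Table 3.
table3 = {
    "Revenue": 0.83, "Asset": 0.81, "Income": 0.73, "Equity": 0.73,
    "Nature of Item": 0.56, "Risk": 0.39,
    "Earnings Trend": 0.27, "Absolute Size": 0.25,
    "Return on Investment": 0.17, "Current/Working Capital": 0.21,
    "Firm Size": 0.01,
}

categories = {measure: nearest_cohen_category(d) for measure, d in table3.items()}
for measure, category in categories.items():
    print(f"{measure}: d = {table3[measure]:.2f} -> {category}")
```

Under a strict cutoff at d = 0.8, the income and equity measures (0.73) would fall short of "large," which is why a nearest-benchmark reading is used in this sketch.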

Five studies employ the absolute size of the item as an indicator of materiality (Panel H). However, the residual variance of the weighted mean effect size is significantly greater than zero ($\chi^2$ = 13.1, p < .02). Eliminating the single survey study results in a small weighted mean effect size of 0.252 and a residual variance of 0.036, which is not significantly different from zero.

Firm size is an independent variable in five studies (all archival, Panel I). The weighted mean effect size is 0.006 with significant residual variance. In only the Costigan and Simon (1995) study does the effect size carry a positive sign.

We find 10 measures of the effect size for a relationship we label the current/working capital effect (Panel J). The effect sizes for the survey and nonsurvey studies are of opposite signs; each has insignificant residual variance. However, the negative signs associated with the effect sizes for surveys are a result of the computation of the effect size. Generally speaking, the survey studies ask subjects if they use a particular measure in making materiality decisions. If most do not, the coefficient of a binomial test is negative. A negative effect size thus implies that subjects report that the measure is used infrequently. A weighted mean effect size of 0.207 from the nonsurvey studies (all behavioral) also implies a small effect for this measure.

Two studies employ a measure of return on investment (Panel K). The weighted mean effect size is small, 0.173, and the residual variance is insignificant.

Table 3 summarizes and categorizes the analysis of the effect-size measures. The effect size associated with each variable in Table 3 is the measure from Table 2 that has the lowest sampling error (the residual variance is lowest, most often not significantly different from zero). Often this is the weighted mean effect size from nonsurvey research. We base our categorization on rough guidelines for effect sizes (Cohen 1977) of d = .8 for a large effect, d = .5 for a medium effect, and d = .2 for a small effect.6 According to these guidelines, the income, revenue, asset, and equity measures have a large effect; the nature of the item and risk measures have a medium effect; and the earnings trend, the absolute size of the item, firm size, current/working capital effect, and return on investment have a small effect.

Surprisingly, despite conventional wisdom and the often-mentioned importance of percentage-of-income rules of thumb, the mean effect size for the income effect variable is not significantly larger than similar measures for revenue, assets, and equity. Thus, contrary to our expectations, the research does not find the income effect variable to be the leading materiality measure.

A potential limitation of this analysis is the integration of studies conducted over such an extended period of time. If the environment in which materiality decisions are made has changed, for example, if audit procedures have become more structured over time, then it may not be appropriate to integrate studies over a long period of time. To address this issue, we examined separately the more recent studies, those appearing in the literature after 1987. The results are very similar to those reported above. Large effect sizes are found for the income, revenue, asset, and equity variables.

Materiality and Disclosure

The FASB (1995) issued a call for research on the topic of materiality and disclosure. The FASB is concerned with reducing the costs of disclosure, reducing the noise in financial statements, and increasing the usefulness of the information they present. Application of accounting standards to material items only is proposed. However, disclosure materiality standards do not exist. We summarize effect sizes from the extant disclosure-related materiality studies in Table 4.

Table 4. Results for Materiality Measures from Disclosure-Related Studies(a)

Materiality Measure        N        d       K     $\sigma_\delta^2$    p > $\chi^2$
Income Effect              1,619    .705    10    .084                 < .005
Revenue Effect             688      .969    3     .000                 n.s.
Asset Effect               688      .929    3     .022                 n.s.
Equity Effect              763      .772    4     .102                 < .005
Risk                       468      .251    2     .017                 n.s.
Earnings Trend             685      .174    2     .001                 n.s.
Absolute Size of Item      354      .262    2     .075                 < .05
Firm Size                  1,009    .027    4     .212                 < .001

Note: (a) Statistics are defined in Table 2.




Seven archival studies of materiality involve disclosure issues: Chewning and colleagues (1989, 1998), Costigan and Simon (1995), Frishkoff (1970), Morris and Nichols (1988), Neumann (1968), and Stone and Ingram (1988). With the exception of Chewning and colleagues (1998), these studies examine the disclosure of accounting principles changes. Accounting principles changes resulting in audit reports modified for consistency are presumed to be material events.

Table 4 provides a summary of the analysis of effect sizes of the studies of disclosure. Eight of the 11 materiality measures from Table 2 are common to two or more disclosure studies and are included in Table 4. Four materiality measures exhibit a strong effect in disclosure studies: the income effect, revenue effect, asset effect, and equity effect. For the income and equity effects, the residual variance after correcting for sampling error is significant. The remaining measures in Table 4 (risk, earnings trend, the absolute size of the item, and firm size) have small effects. Thus, the effect sizes associated with disclosure studies are similar to those reported in Table 3.

Fail Safe N

It is also possible to combine statistical significance levels from individual studies into a joint test of the null hypothesis (Jones and Fiske 1953). A joint test of this kind provides aggregate evidence regarding the rejection of the null (Glass et al. 1981). However, such a test may be biased toward rejection of the null if a meta-analysis includes only published papers and if journals are more likely to publish positive results. This problem could be remedied if one were to include all research (both published and unpublished) on a topic in the review. However, it is very unlikely that any literature review ever uncovers every study conducted on any topic.
Because studies with negative findings or with no findings (i.e., studies supporting the null hypothesis) may not be published or circulated within the academic community, the typical literature review of published research is biased toward positive findings. This is the so-called "file drawer" problem and is a criticism of any review method, including meta-analysis.

On the other hand, the information in significance levels from published research is useful in documenting the strength of the findings through a measure called the Fail Safe N (Cooper 1979). The Fail Safe N (Nfs) is the number of additional studies cumulated in a meta-analysis that would reduce the overall probability obtained from combining the observed probabilities to an insignificant level. That is, Nfs is the number of studies confirming the null hypothesis, with Z = 0, that would be needed to reverse the conclusion that a statistically significant relationship exists. Thus Nfs can be interpreted as an indication of the stability of the relationships documented in Tables 2 and 3. Nfs may be determined for p = .05 as follows:




Table 5. Fail Safe N (Nfs.05) (a)

Materiality Measure            K (b)    Nfs.05

Income Effect
  All                            24       1821
  Nonsurvey                      17        932
Revenue Effect
  All                             8        224
  Nonsurvey                       5        104
Asset Effect
  All                            11         14
  Nonsurvey                       5         69
Equity Effect
  All                             6         72
  Nonsurvey                       5         61
Nature of the Item
  All                             7         73
  Nonsurvey                       5         25
Risk                              5         35
Earnings Trend
  All                            10        185
  Nonsurvey                       8        123
Absolute Size
  All                             6         49
  Nonsurvey                       5         21
Firm Size                         5         26
Current/Working Capital
  All                             9          0
  Nonsurvey                       4          6
Return on Investment              2          1

Notes:
(a) The number of studies confirming the null hypothesis (with Z = 0) that would be needed to reverse the conclusion that a statistically significant relationship exists at a critical value for statistical significance of 0.05.
(b) K = the number of independent tests of the null hypothesis, i.e., the number of significance levels aggregated in determining Nfs.

$$N_{fs.05} = \left( \frac{\sum Z}{1.645} \right)^{2} - K \qquad (6)$$

where $\sum Z$ is the sum of the individual Z scores associated with the reported probability levels and K is the number of independent tests of the null hypothesis included in the meta-analysis. Table 5 reports numbers of tests of the null and Nfs for each financial statement measure of materiality. Also included are measures of Nfs for nonsurvey research (see Note 8).
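Equation (6) can be illustrated with a short calculation. The sketch below is a minimal example, not the authors' procedure: it assumes the joint test is the unweighted sum-of-Z statistic, sum(Z) / sqrt(K), which is the form Equation (6) implies algebraically. The function names `stouffer_z` and `fail_safe_n` and the Z scores are hypothetical and are not drawn from any study in the review.

```python
import math
from typing import Sequence


def stouffer_z(z_scores: Sequence[float]) -> float:
    """Combined Z for K independent tests: sum(Z) / sqrt(K) (assumed joint test)."""
    if not z_scores:
        raise ValueError("at least one Z score is required")
    return sum(z_scores) / math.sqrt(len(z_scores))


def fail_safe_n(z_scores: Sequence[float], z_crit: float = 1.645) -> float:
    """Equation (6): (sum(Z) / z_crit)**2 - K, with z_crit = 1.645 for p = .05."""
    if not z_scores:
        raise ValueError("at least one Z score is required")
    return (sum(z_scores) / z_crit) ** 2 - len(z_scores)


# Hypothetical Z scores, for illustration only (not data from the paper).
observed = [2.1, 1.8, 2.5, 0.9, 3.0]
n_fs = fail_safe_n(observed)

# Adding ceil(n_fs) null studies (Z = 0) pulls the combined Z below 1.645.
padded = observed + [0.0] * math.ceil(n_fs)
print(f"Nfs = {n_fs:.1f}; combined Z after padding = {stouffer_z(padded):.3f}")
```

Under this assumption, adding Nfs studies with Z = 0 leaves the numerator unchanged and raises the denominator to sqrt(K + Nfs), so the combined Z falls to the 1.645 critical value. A larger Nfs therefore means more unpublished null results would be needed to overturn the finding.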




For the four measures with the largest weighted mean effect sizes (from Table 3), the number of negative tests of the null necessary to reverse the finding of a relationship ranges from 14 for the asset effect variable to over 1,800 for the income effect (see Note 9). The measures with the largest mean effect sizes also appear to exhibit substantial stability. For the measures with medium effect sizes the relationships are also quite stable, with 25 or more negative tests required to change the conclusion that a relationship exists. The more surprising results are from the small mean effect size category. Even though the absolute size of the item and earnings trend have small mean effect sizes, the relationships appear quite stable in that 49 negative tests of the null for the absolute size of the item and 185 negative tests of the null for the earnings trend variable would be needed to reverse the conclusion that these measures are related to the materiality decision. On the other hand, the current/working capital effect and return on investment relationships appear to be very tenuous.
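The large, medium, and small groupings used in this discussion follow the Cohen (1977) benchmarks of d = .8, .5, and .2. The sketch below shows one way such a grouping could be coded. It assumes the benchmarks act as lower cut points, which the paper does not state (it calls them rough guidelines), and the "negligible" label and the function name `cohen_category` are hypothetical additions for illustration.

```python
def cohen_category(d: float) -> str:
    """Bin an effect size d using Cohen's benchmarks as assumed lower cut points."""
    magnitude = abs(d)  # the sign of d gives direction, not strength
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # hypothetical label for values below the smallest benchmark


# Hypothetical effect sizes, for illustration only (not values from Table 3).
for d in (0.95, 0.55, -0.30, 0.10):
    print(f"d = {d:+.2f}: {cohen_category(d)}")
```

Other binning rules (for example, assigning each d to its nearest benchmark) would be equally defensible; the conventions are arbitrary in the same sense as the .05 significance criterion.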

LIMITATIONS

Our review considers primarily published research. If published studies report larger effect sizes, then the effects we report are biased. Hunter and colleagues (1982) provide evidence consistent with the hypothesis that the differences in effect sizes between published and unpublished studies are due to the quality of the research method. In any case, we find that for most of the materiality measures we study, reversal of the conclusion that a relationship exists would require discovery of a large number of unpublished studies with results supporting the null hypothesis.

Another potential criticism of the meta-analysis approach is that studies with different experimental designs, different subject pools, and different measurement techniques are combined (i.e., the "apples and oranges problem" as labelled by Glass and colleagues 1981). However, Glass and colleagues point out that it is precisely those studies, the ones that differ in one or more respects, that we should integrate. Integration of studies that are alike is useless because the results of like studies should differ only by sampling error. A finding that persists in spite of differences in study characteristics is less likely to be the result of a particular research design. In our analysis we code the major characteristics of research design. Our results indicate that although the strength of the effect varies over broad categories of research design, the effects persist over different studies with different threats to internal and external validity.




SUMMARY AND CONCLUSIONS

This paper reports the results of a meta-analysis of the empirical research on materiality. We cumulate results of prior research on eleven variables hypothesized to influence materiality decisions. Four of these measures exhibit a strong effect: income, revenue, assets, and equity. Contrary to our initial belief, the income effect is not dominant, carrying an effect size of approximately the same magnitude as the other three measures.

Of these four materiality bases, only the revenue and asset measures exhibit insignificant residual variance for the effect size estimates (from the nonsurvey studies). A possible explanation for this observation is that assets and revenues tend to be more stable from period to period than income and equity. If income is near zero, a small absolute amount will appear large relative to income. In such situations, small transactions may ostensibly appear to be material events. In companies in which assets and revenues fluctuate less than income and equity, the relationship between assets or revenues and materiality is more stable.

This discussion raises the question of what is the "best" materiality base. These results do not answer that question but provide a basis for considering alternatives. Because of the problem associated with income near zero, a materiality measure that includes an income statement and balance sheet measure (e.g., income and assets or revenue and equity) may be superior to either measure taken alone. Other alternatives, such as a composite measure that combines multiple measures, may be superior still. Consistent with this approach, Pany and Wheeler (1989) find that a "blended method" that averages materiality thresholds calculated from income, total assets, equity, and revenue provides the most stable amount from year to year across a number of industries.

Two measures have moderate effects: the nature of the item and firm risk.
Five measures are found to have small effects: earnings trend, the absolute size of the item, firm size, the current/working capital effect, and return on investment.

The FASB recently expressed concern that materiality guidelines based on income or balance sheet accounts "may not translate well to disclosure" (1995, 5). Because of the FASB's renewed interest in disclosure-related materiality, we cumulated separately the results of studies that examine the materiality decision implied by disclosure decisions. The results are quite similar to those outlined above, with the largest effect sizes associated with the income, revenue, asset, and equity measures. Consequently, our results imply a strong relationship between disclosure-related materiality and income and balance sheet amounts.

The research method plays an important role in the strength of the relationships observed. Mean effect sizes from survey research tend to be much larger than mean effect sizes from archival and behavioral research. However, generally the problem is one of magnitude rather than direction. Surveys identify measures related to the materiality decision but may overstate their importance relative to studies using archival or behavioral research designs. The residual variances from survey research also tend to be larger than the residual variances from archival and behavioral research. This is indicative of other unidentified variables (very probably related to experimental design) differing for studies that use the survey method.

With regard to the income effect variable, an additional methodological moderator variable explains the magnitude and variance in effect size. From studies using an archival method, the effect size for the income variable is smaller and exhibits much lower residual variance than the same measure from studies using a behavioral method. Tighter controls in the laboratory over potentially confounding effects and strong manipulations may indicate that the larger effect size is a better measure of the actual effect on income. However, in the field, the task of making materiality decisions is likely less well-defined, the information may be less reliable, and/or the income effect may be but one indicator that decision makers consider in making materiality decisions.

We conclude with several observations on the materiality literature and potential issues for research. First, few empirical studies actually provide working materiality guidelines, and, when some guidance is given, it is most often for the size of the item relative to income only. Guidelines in the form of mean income effects of items judged material or immaterial are common in the archival studies. Few studies provide guidance for other materiality bases (e.g., assets, revenue, or equity). Second, there is very little in the way of empirical studies on the "sliding-scale" materiality measures, with Warren and Elliott (1986) and Pany and Wheeler (1989) as notable exceptions. Third, the disclosure-related materiality studies are almost exclusively limited to examinations of the disclosure of accounting principles changes. Descriptive studies that examine the magnitude of other kinds of disclosure relative to various bases are needed.
Studies of disclosures mandated by accounting standards may be particularly helpful to the FASB in the debate over disclosure effectiveness. Fourth, Raman and Van Daniker (1994) report the results of a survey of the frequency of use by auditors of several materiality bases, including sliding-scale measures, for governmental audits. We did not find a published survey of that kind for the audits of for-profit entities. Such a study could assist auditors in documenting materiality decisions, in assessing risk, and in audit planning, especially in firms without established materiality guidelines.

ACKNOWLEDGMENTS

We would like to thank Maribeth Coller, Fred Mittelstaedt, Lloyd Seaton, Earl Spiller, Stephen Wheeler, the associate editor, and the reviewers for comments on earlier drafts of this paper.




NOTES

1. This definition for the estimate of the population effect size assumes implicitly that subjects are homogeneous across samples. This is probably not true. Materiality decisions made in the laboratory by subjects under controlled conditions may well differ from those made by auditors in the field (from which much of the archival research is drawn) because of differences in the risks, rewards, and incentives facing each. In a subsequent section, we address this issue by partitioning the results on research method.

2. Two studies, Firth (1980) and Icerman and Hillison (1991), employ more than 1,000 subjects. To reduce the potential impact of a single study with a disproportionately large sample size on the weighted effect size, we arbitrarily set N to 999 for each of these two studies. This has a negligible impact on the reported effect sizes.

3. Including only published papers may create bias in any literature review. To reduce this potential bias, we include papers that have been widely circulated in the academic community (and known to us). Two working papers are included in our analysis. Our criteria for inclusion are that (1) the paper is cited frequently in the bibliographies of published articles (indicating that the paper had been widely circulated and was considered significant) or (2) the paper is currently being circulated and is potentially a significant addition to the literature.

4. Two of the 28 studies have measurable effect sizes for variables that were not examined in other studies: auditor-client affiliation (Bates, Ingram, and Reckers 1982) and personal characteristics of the decision maker (Estes and Reames 1988). Our review includes an effect size only if it is measured in two or more studies.

5. Holstrum and Messier (1982), in a narrative review of the materiality research (without quantitative, comparative measures of effect size), also categorize the research similarly by method: survey, archival, and judgment-capturing experiment.

6. Cohen (1977) offers these effect size categories as conventions, much as the .05 significance criterion is a convention. While arbitrary, as are all conventions, Cohen gives numerous examples from the behavioral sciences that support the reasonableness of these categories.

7. To more fully address the intertemporal stability of our findings, we also examined separately studies appearing prior to 1982, the year Holstrum and Messier's review appeared in the literature. This analysis resulted in large effect sizes for the income effect and absolute size of the item, medium effect sizes for earnings trend and the nature of the item, and small effect sizes for firm size and the current/working capital effect. Only single pre-1982 studies examined revenue, equity, asset, and risk measures. Thus, estimates of mean effect sizes for these measures are not determinable. Note that these results are consistent with the conclusions drawn by Holstrum and Messier (1982) regarding the relative importance of materiality measures. Because of limited pre-1982 data for the revenue, equity, and asset measures, we are unable to extend inferences about the stability of materiality measures to this time period. It is clear, however, that accounting and auditing research has given much more attention to asset, revenue, and equity-related materiality measures since 1982.

8. The Fail Safe N reported in Table 5 is an aggregation of significance levels reported in the various studies rather than an aggregation of effect sizes as reported in Table 2. Some studies report significance levels without reporting information sufficient for determining an effect size. The value of K in Table 5 represents the number of significance levels used in the aggregation and differs from the number reported as K in Table 2 for the nature of the item, the absolute size, and current/working capital effect.

9. From Table 2, it is apparent that the results for the asset effect variable are much larger, with lower variance, when measured by nonsurvey methods. The negative coefficients on d for several of the survey results imply that the respondents did not use the asset variable in making materiality assessments. However, the remaining evidence indicates that the asset effect is an important materiality measure. Thus the weighted mean effect size for nonsurvey studies may be more representative of the true asset effect measure, and these studies may be more appropriate as input for the Fail Safe N test.




REFERENCES

Abdel-Khalik, A. R. 1977. Using sensitivity analysis to evaluate materiality. Decision Sciences 8 (July): 616-629.
Bates, H. L., R. W. Ingram, and P. M. J. Reckers. 1982. Auditor-client affiliation: The impact on materiality. Journal of Accountancy 153 (April): 60-63.
Boatsman, J. R., and J. C. Robertson. 1974. Policy-capturing on selected materiality judgments. The Accounting Review 49 (April): 342-352.
Bremser, W. G. 1975. The earnings characteristics of firms reporting discretionary accounting changes. The Accounting Review 50 (July): 563-573.
Chewning, G., K. Pany, and S. Wheeler. 1989. Auditor reporting decisions involving accounting principle changes: Some evidence on materiality thresholds. Journal of Accounting Research 27 (Spring): 78-96.
Chewning, G., S. Wheeler, and K. C. Chan. 1998. Evidence of auditor and investor materiality thresholds resulting from equity-for-debt swaps. Auditing: A Journal of Practice and Theory 17 (Spring): 39-53.
Cohen, J. 1977. Statistical Power Analysis for the Behavioral Sciences (revised ed.). New York: Academic Press.
Cooper, H. M. 1979. Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology 37: 131-146.
Copeland, R. M., and W. Fredericks. 1968. Extent of disclosure. Journal of Accounting Research 6 (Spring): 106-113.
Costigan, M. L., and D. T. Simon. 1995. Auditor materiality judgment and consistency modification: Further evidence from SFAS 96. Advances in Accounting 13: 207-222.
Estes, R., and D. D. Reames. 1988. Effects of personal characteristics on materiality decisions: A multivariate analysis. Accounting and Business Research 18 (72): 291-296.
Financial Accounting Standards Board. 1980. Statement of Financial Accounting Concepts No. 2: Qualitative Characteristics of Accounting Information. Stamford, CT: FASB.
Financial Accounting Standards Board. 1995. Financial Accounting Series-Prospectus: Disclosure Effectiveness. Norwalk, CT: FASB.
Firth, M. 1979. Consensus views and judgment models in materiality decisions. Accounting, Organizations and Society 4 (4): 283-296.
Firth, M. 1980. A cross-sectional analysis of qualified audit reports. International Journal of Accounting 15 (Spring): 47-59.
Friedberg, A. H., J. R. Strawser, and J. H. Cassidy. 1989. Factors affecting materiality judgments: A comparison of 'big eight' accounting firms' materiality views with the results of empirical research. Advances in Accounting 7: 187-201.
Frishkoff, P. 1970. An empirical investigation of the concept of materiality in accounting. Journal of Accounting Research 8 (Supplement): 116-129.
Glass, G. V., B. McGaw, and M. L. Smith. 1981. Meta-Analysis in Social Research. Beverly Hills, CA: Sage Publications.
Hofstedt, T. R., and G. D. Hughes. 1977. An experimental study of the judgmental element in disclosure decisions. The Accounting Review 52 (April): 379-395.
Holstrum, G. L., and W. F. Messier, Jr. 1982. A review and integration of empirical research on materiality. Auditing: A Journal of Practice and Theory 2 (Fall): 45-63.
Hunter, J. E., F. L. Schmidt, and G. B. Jackson. 1982. Meta-Analysis: Cumulating Research Findings Across Studies. Beverly Hills, CA: Sage Publications.
Icerman, R. C., and W. A. Hillison. 1991. Disposition of audit-detected errors: Some evidence on evaluative materiality. Auditing: A Journal of Practice and Theory 10 (Spring): 22-34.




Jennings, M., D. C. Kneer, and P. M. J. Reckers. 1987. A reexamination of the concept of materiality: Views of auditors, users and officers of the court. Auditing: A Journal of Practice and Theory 6 (Spring): 104-115.
Jones, L. V., and D. W. Fiske. 1953. Models for testing the significance of combined results. Psychological Bulletin 50: 375-382.
Kinney, W. R., and R. D. Martin. 1994. Does auditing reduce bias in financial reporting? A review of audit-related adjustment studies. Auditing: A Journal of Practice and Theory 13 (Spring): 149-156.
Krogstad, J. L., R. T. Ettenson, and J. Shanteau. 1984. Context and experience in auditors' materiality judgments. Auditing: A Journal of Practice and Theory 4 (Fall): 54-74.
Mayper, A. G. 1982. Consensus of auditors' materiality judgments of internal control weaknesses. Journal of Accounting Research 20 (Autumn): 773-783.
Messier, W. F. 1983. The effect of experience and firm type on materiality/disclosure judgments. Journal of Accounting Research 21 (Autumn): 611-618.
Morris, M. H., and W. D. Nichols. 1988. Consistency exceptions: Materiality judgments and audit firm structure. The Accounting Review 63 (April): 237-254.
Neumann, F. 1968. The auditing standard of consistency. Journal of Accounting Research 6 (Supplement): 1-17.
Pany, K., and S. Wheeler. 1989. Materiality: An inter-industry comparison of the magnitudes and stabilities of various quantitative measures. Accounting Horizons 3 (December): 71-78.
Patillo, J. W., and J. D. Siebel. 1974. Factors affecting the materiality judgment. CPA Journal 44 (July): 39-44.
Raman, K. K., and R. P. Van Daniker. 1994. Materiality in government auditing. Journal of Accountancy 177 (February): 71-76.
Reckers, P. M. J., D. C. Kneer, and M. M. Jennings. 1984. Concepts of materiality and disclosure. CPA Journal 54 (December): 20-31.
Roberts, R. W., and J. T. Sweeney. 1993. Auditor loss exposure factors and preliminary materiality judgments: Evidence from practice. Advances in Accounting 11: 171-183.
Stone, M. S., and R. W. Ingram. 1988. Auditor's use of quantitative decision rules in assessing materiality. Working paper, University of Alabama.
Warren, C. S., and R. K. Elliott. 1986. Materiality and audit risk - A descriptive study. Working paper, University of Georgia.
Woolsey, S. M. 1954a. Development of criteria to guide the accountant in judging materiality. Journal of Accountancy 97 (February): 167-173.
Woolsey, S. M. 1954b. Judging materiality in determining requirements for full disclosure. Journal of Accountancy 98 (December): 745-747.