Evaluation and Program Planning, Vol. 9, pp. 335-344, 1986.
EVALUATION APPROACHES IN MENTAL HEALTH: IMPLICATIONS OF THE NEW FEDERALISM

CAROL T. MOWBRAY AND SANDRA E. HERMAN

Michigan Department of Mental Health
ABSTRACT

Since the introduction of federal mental health legislation in 1963, there has been a changing emphasis on evaluation and accountability. With direct federal funding of community mental health services, accountability demands were met through expectations for local agency evaluation activities, which were overseen by federal authorities. The advent of the New Federalism and the shift to block grant funding of mental health services to state mental health authorities have shifted responsibility for evaluation to the states and local programs. This paper reviews federal mental health statutes to trace the extent and locus of required evaluation activities and discusses two approaches to carrying out program evaluation: "top-down," where the evaluation topic, method, and data collection are mandated by an administering or funding body; and "bottom-up" approaches, where the subject, method of study, and data to be collected are developed in response to a felt need at the local agency level. A case study of each approach as used at the state level in mental health is examined. Based on the literature and the case studies, conclusions are presented on the pros and cons of each method in meeting accountability demands and the barriers which must be overcome for either method to be successful.

The first federal mental health legislation was passed in 1963 (P.L. 88-164), authorizing grants for construction of community mental health centers (CMHCs). In 1965, P.L. 89-105 expanded federal financing to include staffing grants for professional and technical personnel. These statutes contained limited requirements for evaluation, accountability, and documentation. States were required to submit plans, which included a "statewide inventory of existing facilities and survey of need" and set forth the "relative need" of projects requesting funding. Other accountability provisions focused on reporting, merit personnel standards, record-keeping, and client rights to hearings. These statutes limited the state's involvement with CMHCs and their relationship to state hospitals.

In 1975, a revision of the CMHC Act (P.L. 94-63) was passed which included major additions on evaluation and accountability. In addition to operations (staffing) grants, agencies could apply for planning grants, which included funding of activities assessing the need for mental health services in their area and designing a CMHC program based on such needs assessments. A center with an operations grant (initial operations, consultation and education, or conversion) was obligated to "a program of continuing evaluation of the effectiveness of its programs in serving the needs of the residents of its catchment area and for a review of the quality of the services provided by the center." Not less than 2% of operating expenses were to be spent on evaluation. P.L. 94-63 maintained the same requirements for states in evaluation, planning, and documentation.

The next major mental health legislation and, probably, the first step in the New Federalism, was the Mental Health Systems Act of 1980 (P.L. 96-398). A major role in the administration of CMHC programs was shifted from the federal government to the states, to be accomplished through performance contracts between federal, state, and local entities. The Act also permitted states to receive allotments (Formula Grants) for administrative activities which included data collection, data analysis, research, and evaluation. States would receive funds under the Systems Act after submission of an acceptable plan.
Requests for reprints should be addressed to Dr. Carol T. Mowbray, Director, Research and Evaluation Division, Department of Mental Health, State of Michigan, Lewis Cass Building, Lansing, MI 48926.
Specific to evaluation, the Administrative Part of the Act included requirements for "provision of statistics or data to conform to the method of collection and content prescribed by the Secretary." The Services Part included requirements for identification of the needs of mental health service areas and evaluation of mental health facilities, personnel, and need for additional services.

The Mental Health Systems Act was never implemented. However, the Reagan administration created an even greater change in evaluation requirements for the states with its Block Grant approach to funding human services. Under the Omnibus Reconciliation Act of 1981 (P.L. 97-35), an Alcohol, Drug Abuse and Mental Health Services Block Grant was created which totally shifted administration of mental health programs to the states. One provision required: "The state agrees to establish reasonable criteria to evaluate the effective performance of entities which receive funds from the state under this part and . . . independent state review procedures of the failure by the state to provide funds for any such entity." An annual report is required of the state "to determine whether funds were expended in accordance . . . and consistent with the needs within the state . . . and of the progress made toward achieving the purposes for which funds were provided."

Thus, overall, federal statutes point clearly to certain trends. There has been an emphasis on accountability and evaluation for receipt of funds. Responsibility for accountability and evaluation, however, has changed. Up until the 1980 legislation, the burden and expectations were placed on local CMH agencies to carry out these responsibilities: first with funding and technical assistance provided; later, with no extra funding available but with the requirements still present. Finally, the New Federalism has led to a shift from federal to state levels to monitor and assume responsibility for these evaluation/accountability requirements (Flaherty & Windle, 1981).

Chelimsky (1981) presented a thorough discussion of the implications for evaluation of this latest change in federal policy. Her analysis leads to the conclusion that evaluation requirements have to balance the need for valid and reliable information against administrative flexibility and program integrity. The accountability issues highlighted in her discussion are equally applicable to state and local relationships as to federal and state relationships. For both federal and state governments, the question is at what level of stringency to set accountability and evaluation requirements in order to address issues of "the achievement of the national [state] objective, the achievement of the recipient [local] objective, and program integrity and efficiency" (p. 118). Chelimsky suggests that the level of stringency required (ranging from self-monitoring to formal external evaluation) should depend on the specificity and importance of the national objective. The same is true for the state level. The approach that a state takes in structuring evaluation requirements for local programs should reflect the level of stringency required to assure achievement of important state objectives.

TOP-DOWN VERSUS BOTTOM-UP EVALUATION APPROACHES

One of the classical issues in the evaluation literature is the locus for initiating evaluation activity. Both the purpose and perspective for program evaluation need to be considered by government (whether federal or state) when establishing the locus for evaluation. Windle and Neigher (1978) identified four purposes for program evaluation: (a) program amelioration or improvement, (b) accountability, (c) advocacy, and (d) research. Neigher and Schulberg (1982) identified four major perspectives for program evaluation: (a) funder, (b) provider, (c) recipient, and (d) public. In a "top-down" approach, evaluation topics, and often methods as well, are mandated by an administering or funding agency. This serves an accountability or research purpose for a funder or the public but generally fails to consider other purposes or perspectives. In "bottom-up" approaches, the activity is under the independent control of a local agency, to meet its own felt need. This approach serves the purposes of program improvement and advocacy from the perspective of the provider or recipient but may not meet the needs of funders nor demonstrate accountability.

The literature is replete with discussion of these alternatives. Most evaluation theory and philosophy have concluded that the bottom-up approach is preferable since it necessarily is more likely to involve stakeholders (Patton, 1978). The "stakeholder assumption" is

the idea that key people who have a stake in an evaluation should be actively and meaningfully involved in shaping that evaluation so as to focus . . . on meaningful and appropriate issues, thereby increasing the likelihood of utilization. (Patton, 1982, p. 59)

Cronbach, Ambron, Dornbusch, Hess, Hornik, Phillips, Walker, and Weiner (1980) present examples of how mandates for accountability and resultant evaluation and information requirements have produced unintended consequences to the extent of killing the program's original purpose. They state that, "when there are multiple outcomes . . . dramatizing any one standard of judgment simply adds to the priority of whatever outcomes that standard takes into account. Whatever outcomes do not enter the scoring formula tend to be neglected" (p. 139). However, the importance of stakeholder involvement also has its critics. Smith (1980) has suggested that the assumption is a widely held belief that may have little basis in reality. Carter and Walker (in press) go so far
as to say that a major strategy for implementing accountability in public agencies should be federal funding agencies mandating outcome requirements (through regulation), thus supporting the idea of a top-down approach. Unfortunately, while discussion on this subject has proliferated, there is little in the way of a definitive answer as to the effectiveness of either approach.

The federally mandated evaluation requirement for the community mental health program was an example of a primarily top-down approach. The evaluation requirement (particularly in P.L. 94-63) provided a high degree of structure in terms of the types of evaluations to be carried out (Windle & Ochberg, 1975; Davis, Windle, & Sharfstein, 1977; Windle & Woy, 1977). At the same time, this effort at mandated evaluation left a large degree of home rule to local programs in their self-evaluations. Discussions on the impact of the federal effort to evaluate the CMH program (Neigher, Ciarlo, Hoven, Kirkhart, Landsberg, Light, Newman, Struening, Williams, Windle, & Woy, 1982) suggest mixed results. While most discussants appeared to consider the broad-scope federal efforts to have had little impact, most also supported the positive results of local efforts. Ciarlo (Neigher et al., 1982) suggested that a coordinated state/local evaluation effort was the best strategy until and unless states develop evaluation capacity independent of local agency evaluation activities and data systems.

Since the 1970s, the State Mental Health Authority in Michigan has been actively pursuing evaluation/accountability in state and local programs. Concerns and perceived need for evaluation/accountability at the state level considerably predate shifts in federal law. Two case studies based on experiences in Michigan will be presented to exemplify a top-down approach versus a bottom-up approach. From these case studies, conclusions on the pros and cons of each approach will be presented, along with a discussion of some of the barriers that must be overcome for either approach to succeed.

Top-Down Approach: The PAC as a Case Study

One way a state agency can meet accountability needs is to mandate the data to be collected locally and then to carry out processing and analysis of the data itself. Such an approach was selected by the Michigan Department of Mental Health in 1974, when the Department reorganized along functional lines and a Planning and Evaluation System was created. This new structure adopted as its mandate the development of a comprehensive client tracking system to facilitate planning and evaluation functions at the state level. This system was to include client demographic characteristics, client functioning level/adaptive behavior, client medical conditions/impairment factors, an assessment of client
progress, and services received. While the Department's clientele included mentally ill as well as developmentally disabled (DD) clients, the evaluation approach included only the latter because the objectives for client progress were clearer and more agreed upon. Overall, the implementation of this system was a positive effort, one that met many of the "how-to" criteria in evaluation textbooks. It was well organized and characterized by exceptional planning and thoroughness for a project of its magnitude. There was especially good use of input from the field and dissemination of project results and status. However, despite all this, the outcome was decidedly negative.

The project was divided into three phases. In the development phase (1975), new staff were hired who searched the literature and contacted state and local programs for information on data to be collected, content and frequency, instruments and approaches utilized, etc. Criteria were developed for reviewing various approaches. An advisory group was set up which represented field agencies and advocacy groups. The staff and advisory committee developed a plan as to the content of the uniform client data system and the design of a pilot study.

A major component of the system was to be a measure of the client's adaptive behavior. The instrument selected to measure adaptive behavior was the Progress Assessment Chart (PAC) by H. C. Gunzberg (1969), which surveyed four types of skills: self-help, communications, socialization, and occupation. The instrument contained four sequential forms which assessed the client at increasingly advanced developmental levels. The overall design of the data system required that all data be collected at intake for all clients in the state (state inpatient centers for the developmentally disabled and community mental health outpatient, partial day, and residential programs). At six-month intervals, each client was to be readministered the PAC to assess progress, and the other data were to be updated. The state Mental Health Department was to receive and process all data, complete statistical analyses, and issue annual reports which would describe clients and client progress and relate progress to services, client demographics, etc.

A pilot test was designed and carried out in 1975-76 to test the feasibility and utility of the approach in state DD programs. The pilot site was selected on the basis of specific criteria. All agencies in the western part of the state were involved. In fact, most of these agencies volunteered and demonstrated cooperation and enthusiasm. A training package was developed by a well-respected university consortium under contract. Day-long training sessions were held, involving all those who would be doing the rating, supervising raters, or training the raters. The focus was on developing and practicing skills in evaluating clients on the PAC.
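The kinds of reliability checks reported in the following paragraph can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the Department's actual analysis: the percent-agreement and internal-consistency computations, the data, and all variable names are invented for illustration only.

```python
# Hypothetical illustration of reliability checks for PAC-style ratings.
# All data and names are invented; this is not the Department's analysis.
import numpy as np

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters give identical ratings."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(a == b))

def cronbach_alpha(item_scores):
    """Internal consistency for a clients-by-items score matrix."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# Invented example: two raters scoring 10 yes/no skill items for one client,
# and a small matrix of item scores for five clients.
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(f"Inter-rater agreement: {percent_agreement(rater_a, rater_b):.0%}")

items = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```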
The training included an assessment of the reliability of the instrument and the effectiveness of the training. After training, data from the pilot test were carefully and comprehensively collected and analyzed. Over 1,100 clients were assessed: 37% in the institutions and 63% in the community. The reliability of the raters was high (85%) and was increased through the training. The scoring distribution was analyzed and found to be acceptable for parametric analysis techniques. The PAC demonstrated good internal consistency and reliability. Construct validity was assessed through factor analysis. Criterion-related (concurrent) validity was assessed through correlations with established instruments and client characteristics (independence level). All validity checks yielded acceptable results (Milan & Hallgren, 1976; Herman, 1978a, 1978b). Thus, detailed and sophisticated statistical analyses confirmed that the Uniform Client Assessment approach had acceptable psychometric properties. The consensus of participating agencies was that it was feasible for them to collect and submit these data in a reliable, valid, and timely form. It was decided that the approach should be implemented statewide.

In 1976-1977, training was provided on a region-by-region basis, as each region was ready for implementation. Problems and questions now began to appear. As data began to pour in from a larger number of agencies, state-level staff became unable to process them in a timely fashion. The data quality and completeness were not as good as in the pilot test. There were insufficient resources to intervene with individual agencies and no system-wide sanctions in place to address the lack of validity or completeness of data. Additionally, some agencies resisted participating, and no sanctions for failing to participate had been established. Finally, not only were there more data to analyze, but the analyses were more difficult to carry out since it was not clear what information was needed in the output reports.

Finally, in November 1977, a community mental health agency filed suit against the Department. It had resisted complying with the required data collection and submission, and the regional administration had threatened retroactive cancellation of its contract as a sanction. The agency sued on the basis that providing the information was an invasion of clients' right to privacy and that it would harm services:

By virtue of the fact that they have been labeled "mentally retarded" and are receiving service through the state mental health system, a selected group of people are subjected to an arbitrary and value-laden rating system which seeks to establish common standards of behavior for virtually every aspect of life. This is being done without clearly establishing the state's need or interest in creating such standards or for collecting such data and without regard for whether or not a deficiency exists for any given client. (Phoenix Place, 1977, p. 2)
The suit further alleged that the rating system required the assessor to function as a "peeping tom" and that the amount of time required to do these assessments, and the necessity of showing progress, would interfere with the client receiving appropriate treatment suited to the client's condition. Based on the plaintiff's allegations, the Court granted a restraining order prohibiting the Department from utilizing the PAC.

In retrospective review, although the methods of implementing the proposed evaluation system had a number of positive aspects, there were five major deficits, particularly in the policy area. First, the Department had never estimated the amount of resources necessary to process and analyze the data at the state level. Feasibility was addressed only for the local level in the pilot study. In fact, such a resource specification could not have been done because of a second major deficit: the Department had never clearly stated or decided the purpose for the data. The pilot study report had indicated some suggested uses, such as to examine over- versus under-utilization of programs, to analyze accessibility of services, and to indicate gaps in services and differences in services and referral patterns. But there was never any official policy stated as to the specific uses, and there were many possibilities! For instance, scores could have been used as eligibility criteria for certain services. Scores also could have been used as a measure of agency performance, with lack of progress penalized through funding cutbacks or restrictions on program expansion. Alternatively, summarized data could have been used by the Department in a more helpful fashion, such as to provide technical assistance and intervention via a management-by-exceptions approach, or as a basis to ask the Legislature for more funds. None of these specifics had ever been stated. In fact, the PAC suit charged:

The first step essential to that process [of evaluation and accountability], however, is for the department to articulate clearly and definitively which data is [sic] needed at the state level, why it is needed, how it will be utilized, and with whom it will be shared. Then the service providers should be asked to assist in developing the procedures which will be used in gathering the data that hopefully all will have agreed must be collected. (Phoenix Place, 1977, p. 6)
As a result of this deficiency, the proposed evaluation system had a third major flaw: lack of data standards. There was no consistent or even explicit policy as to how the Department would deal with agencies who refused to supply data or whose data were incomplete or unreliable. If it had been clear how the data were to be used, the consequences of no data or having poor data would have naturally followed. The fourth major flaw in the system was that evaluation “started with the wrong end of the horse.” That is, the focus was on outcome evaluation before good process and output data were in place. In fact, even if
outcome could be successfully measured, without reliable process and output indicators it would be useless, because it could not be related to meaningful activities: if "successful programs" (in terms of client impact) were found, there would be no means of knowing what made them successful. A final problem was that the evaluation system never formally addressed client confidentiality or the need for client consent. Even if the Department felt that it had sufficient statutory basis to make such provisions unnecessary, such a procedure needed to be formalized and substantiated by outside legal authority.

The experience with the PAC is an example of how an evaluation effort can flounder when the program is essentially unevaluable. Evaluability assessment (Rutman, 1980) provides a tool for understanding, in retrospect, why this attempt at a statewide evaluation failed. Two major principles governing the evaluability of a program are concerned with program characteristics and the purpose and feasibility of the evaluation. Program characteristics include a defined program structure, program goals, and implementation of the program consistent with its prescribed structure. The purpose and feasibility factors include the purpose of the evaluation and the feasibility of the evaluation in terms of design and implementation of the program, the evaluation methodology, and constraints.

In the case of the PAC evaluation, the program under study was deficient on all three program characteristics. The evaluation encompassed services ranging from outpatient case management to inpatient treatment. This range of services was provided by the state and 55 county mental health authorities, which either directly provided services and/or contracted with private providers for services. There was great variability across these programs in structure, goals, and methods of implementation. Attempts to relate PAC scores to the effectiveness of the program before the evaluation capacity had developed sufficiently to track process and output measures doomed the evaluation. Thus, one major flaw can be attributed to the failure by both evaluators and state administrators to recognize that the program was not a single program but a multitude of programs which varied in suitability for evaluation.

The PAC evaluation effort also violated the purpose and feasibility principles of evaluability assessment. The effort lacked clearly stated purposes for the data, i.e., there was no explicit purpose for the evaluation. Statements concerning the purpose for the evaluation were focused on general issues of client outcome. In addition, the methodology chosen was not feasible because it failed to assure that data were valid and reliable, i.e., lack of data standards. Further, the evaluation effort failed to consider cost and legal constraints. The costs of processing and analyzing data were not well estimated prior to statewide implementation, and no
consideration was given to legal constraints around client confidentiality. The evaluability assessment indicates that the PAC evaluation effort should not have been undertaken. Why were these factors overlooked? This was the first evaluation effort undertaken by the Department of Mental Health. In addition, there was little evaluation literature available in the early 1970s to identify the flaws prospectively for the Department. As would later be stated in the 1977 "Guidelines for Program Evaluation in CMHCs," program evaluation capacity requires some time and experience before it can operate effectively (p. 33). This capacity was neither sufficiently developed nor sophisticated enough to recognize the pitfalls of the approach selected.

The PAC instrument was finally dropped from the Uniform Client Assessment System because of lack of resolution of the temporary restraining order. A period of retrenchment followed, with the Department subsequently restating its need to have some uniform client data and evaluation systems in operation for accountability purposes. This renewed focus on evaluation initiated an alternative evaluation system, one which is closer to a bottom-up approach.

The Modified Bottom-Up Approach: An Alternative

At about the same time as the federal shift to block grants, the Michigan Department of Mental Health also began shifting responsibility for program development and control over entry to state-operated mental health programs to county mental health authorities. Concurrently, county mental health authorities began to develop local management information systems supported by microcomputers and to become less dependent on the state for centralized data processing and management information support. With increasing emphasis on decentralization of services, the Department was faced with the problem of how to relinquish direct control of program evaluation activities, as it had with direct service programs, and still meet the stipulations of Michigan's mental health laws, which require it to evaluate its own programs and assure that the county mental health authorities do the same.

Experiences of both federal and state governments over the last fifteen years support a move away from centralized state-level evaluation systems. The previously described experiences of Michigan's Department of Mental Health with the PAC exemplify many of the problems encountered with such standardized efforts. Kimmel (1981) summarized a number of lessons to be learned from these experiences. Among the negative lessons (Don'ts), Kimmel included: (a) mandating program evaluation requirements; (b) mandating outcome evaluation studies; (c) mandating any single evaluation methodology, ideology, approach, or method; (d) mandating evaluation because it is a "good" thing to
do; and (e) expecting overall judgments on the worth or value of programs. His list of positive lessons (Do's) included: (a) making program evaluation requirements selective, restrained, permissive, and enabling; (b) viewing evaluation as the study of programs; (c) being modest in expectations about payoffs from evaluations; (d) keeping program evaluation in perspective; (e) viewing program evaluation as developmental; and (f) keeping evaluation requirements feasible and flexible. Other commentators on the area of mandated evaluation have suggested that evaluation of human services be used to promote accountability and to increase the use of information to improve programs (Windle & Neigher, 1978; Flaherty & Windle, 1981). The dilemma for the state is how to meet its responsibilities to assure that mental health programs are evaluated with regard to appropriateness, effectiveness, and efficiency without mandating a single uniform evaluation methodology.

An alternative to centrally collecting and processing all evaluative data is to shift the responsibility to the local agency. In a totally bottom-up approach, the responsibility is shifted to the local program with no additional requirements from the state. Yet the state, as funder, must assure effective and efficient expenditure of public funds. The relationship between the state and county programs in Michigan clearly embodies the assumptions on which an accountability system is based (Windle & Keppler-Seid, 1984): there is a hierarchical relationship, and the counties must report information on performance to the state. Thus, a strictly bottom-up approach is not possible from the state's perspective, but a modification could be acceptable. Such a shift, however, must be accompanied by the specification of the state's information needs and parameters for program evaluation performance by local programs.

In the Michigan mental health system, the problems encountered with a top-down approach to program evaluation have led to an alternative strategy in which the state specifies a limited set of data elements, which form the management/accountability tool, and develops standards on program evaluation activities for the mental health system. Such an approach recognizes the need to decentralize evaluation responsibilities along with programmatic responsibilities and still provides the state with a means to promote accountability through program evaluation. It also allows the state, as Chelimsky (1981) has suggested, to balance its need for valid and reliable data with the county's needs for administrative flexibility and program integrity to make evaluation workable.

The management/accountability tool. In selecting the data elements to be included, the Department identified four basic uses for this information: (a) accountability, i.e., to explain and justify to the various constituent groups current services and costs as well as future service
development plans and projected costs; (b) monitoring the match between clients and characteristics of programs (restrictiveness of environments, cost alternatives, etc.); (c) planning for development of alternative community services based on client need; and (d) individual client placement planning, i.e., joint planning between community mental health programs and state-run mental health facilities for services to individual clients. The data set includes (a) client demographics (identification number, birthdate, gender, and race); (b) assessments of client behavior (global functioning level, maladaptive behaviors, self-help/daily living skills, community living skills); and (c) service objective (psycho-social adjustment, crisis resolution, rehabilitation, maintenance). These data elements represented a marked reduction in the amount of information required from local programs compared to the requirements of the PAC system and yet give the state basic descriptive information on clients.
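As an illustration only, the minimal data set just described could be represented by a record structure along the following lines. This is a hedged sketch: the field names, types, and example values are invented for the example and are not the Department's actual specification or coding scheme.

```python
# Hypothetical sketch of the minimal client record described above.
# Field names and category values are invented, not the actual state layout.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class ClientRecord:
    # (a) client demographics
    client_id: str
    birthdate: date
    gender: str
    race: str
    # (b) assessments of client behavior
    global_functioning_level: int            # e.g., a clinician rating on a small ordinal scale
    maladaptive_behaviors: List[str] = field(default_factory=list)
    self_help_daily_living_skills: int = 0
    community_living_skills: int = 0
    # (c) service objective: "psycho-social adjustment", "crisis resolution",
    # "rehabilitation", or "maintenance"
    service_objective: str = "maintenance"

# Invented example record.
record = ClientRecord(
    client_id="000123",
    birthdate=date(1958, 4, 2),
    gender="F",
    race="white",
    global_functioning_level=4,
    self_help_daily_living_skills=3,
    community_living_skills=2,
    service_objective="rehabilitation",
)
print(record.service_objective)
```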
Program evaluation. The management tool will permit the state to meet some of its accountability requirements. It does not, however, assess the adequacy, quality, effectiveness, or efficiency of county programs. A modified bottom-up program evaluation system meets these needs by placing the evaluation responsibility within the county agency. Problems for the state in using such an approach are: how to guide the development of program evaluation capacity in local agencies; how to specify the overall goals of program evaluation activities so that there is evenness in evaluation across the state; how to monitor local program evaluation activities; and how to specify its own role and contribution to program evaluation.

Standards for program evaluation activities are one way of addressing these problems. The standards should serve a number of functions: clarify the definition of program evaluation; set the policy on the development of program evaluation capacity; establish the performance expectations for county mental health authorities; establish procedures for monitoring compliance with the guidelines; and provide guidance to county mental health authorities on credentialing of program evaluation personnel and developing program evaluation systems. In serving these various functions, the standards should avoid the "don'ts" of Kimmel's lesson list and follow the "do's." The standards for program evaluation being developed in Michigan attempt to be feasible, flexible, and, above all, enabling. The standards take a broad view of the types of activities which constitute program evaluation and view the program evaluation enterprise as developmental in nature. The type of program evaluation activity to be pursued must be appropriate not only to the problem to be addressed but also to the expertise and capacity of the agency to carry out the work. Evaluation capacity will change and develop
along with different approaches as the programs and problems under study begin, mature, and change. The intent of the standards is to establish the locus of control over evaluation of county programs in the county community mental health authority. In this approach, the state takes on two evaluation roles: providing guidance and technical assistance to counties in developing and using evaluation capacity, and assessing the effectiveness of new treatments and the replicability of model programs.

The Michigan standards on program evaluation cover policies of the Department on evaluation, program evaluation performance requirements, the Department's responsibilities, ethics and conduct of program evaluators, and credentials for program evaluators. The standards are in line with Kimmel's recommended "do's" in the manner in which program evaluation is defined, the inclusion of the evaluability concept, and the major modes of program evaluation activities designated as appropriate. The most distinguishing features of the standards are the definition of program evaluation, the definition of two major modes of activity at the county level, and the use of a task force on evaluation.

Program evaluation is defined very broadly as the systematic gathering, analysis, and interpretation of valid and reliable information, using social science methods, for use in decisions concerning the status of the service delivery system in comparison to desired outputs, program planning, program implementation and improvement monitoring, impact (outcome) assessment, and determination of program efficiency. Evaluation activities include, but are not limited to, needs assessment, explication of program goals and objectives, study of program performance in relationship to the goals and objectives of the program, study of the process of service delivery and/or implementation of programs, study of client and program outcomes in relationship to the program's goals and objectives, and cost-benefit/cost-effectiveness studies. The definition recognizes that program evaluations must serve a number of functions and provides flexibility to the county programs in determining the type of evaluation which best meets their needs. Such a broad definition also permits the standards to emphasize the evaluability concept that evaluations should be appropriate to both the program or problem and within the capacity and expertise of the agency. The definition does, however, place some limits on the types of activities which can be considered evaluation. General administrative activities are excluded, as are number "crunching" and accounting activities.

The activities seen as germane to county-level program evaluation are to be divided into two major modes of activity for purposes of planning and reporting on evaluation activities. These two modes are system
assessment and special program evaluation studies. System assessment is defined as those activities directed at continuous program assessment and improvement. System assessment program evaluation activities focus on the status of mental health service delivery as it exists at any given point in time in comparison to the desired outputs and outcomes of the mental health service delivery system. Special program evaluation studies are defined as time-limited studies directed at evaluation of specific programs, service delivery activities, or problems. This distinction permits county programs to be selective in the programs or problems to be studied in depth and also permits the county programs to use their evaluation capacity to keep services on target in terms of outputs and outcomes. The identification of two modes of activity also serves to further define the activities which fall within program evaluation. The state's recognition that there are two modes of operation in program evaluation permits county agencies to legitimately channel limited program evaluation resources to activities with the greatest utility to them.

The standards encourage both program accountability and program amelioration through the specification of these two modes of operation. System assessment activities foster program accountability by focusing on outputs and outcomes of mental health services in relationship to the identified goals and objectives of the county programs and producing measures of system performance which can be compared to established performance objectives. Special program evaluation studies identify strengths and weaknesses of selected county programs by studying both the process and outcomes of services provided. The results can be used for program amelioration activities by validating and improving the effectiveness of service delivery. Furthermore, partitioning evaluation activities into three segments (state-level evaluation, local special studies, and local system assessment) permits the three purposes of program evaluation outlined by Rossi (1978) to be met, i.e., research on program effectiveness, testing the replicability of programs, and monitoring the operation of programs. State-level evaluation is focused on the implementation process and outcomes of new programs, and it constitutes research on the effectiveness of these models. Collaborative studies between the state and local programs are focused on the replication of program models across sites and serve the second function. Finally, local system assessment and special studies focus on the actual operation of programs.

The establishment of a task force on program evaluation is a means of allowing counties to have access to state-level policy-making on program evaluation. This committee has representatives from the county programs as well as from the Department. The charge to the task force is to advise the Department on new policy
directions within program evaluation and to assist the Department in determining priorities for its own evaluation activities as well as overall goals and objectives for county program evaluation efforts. The task force assists the state in identifying its evaluation research and program replication agendas.

The final elements in the implementation and effective use of the standards involve monitoring methods and compliance mechanisms. Through the administrative procedures of the Department, these standards become part of the contractual agreement between the state and counties used to fund county programs. Thus, county agencies will have performance standards to meet in the area of evaluation as well as in clinical services. Monitoring of evaluation activity becomes an integral part of the overall monitoring of the agencies' operations, carried out by the Department's area service managers, who oversee the contracts. Noncompliance in any area, including evaluation, becomes an issue in contract renegotiations and in requests for new funding versus redirection of base program expenditures.

Implications of the standards. The standards have implications for both county programs and the Department. Standards such as these, which do not mandate a specific methodology, ideology, approach, or method, shift the burden of responsibility for determining what to do to the county agency. The standards do legitimate much of the current work of county program evaluators (Clifford & Sherman, 1983). In order to meet its
contractual obligations, the county mental health agency must invest in the resources necessary to formulate an evaluation capacity. Funneling data to the Department will no longer suffice to meet the state's evaluation requirements. The county agency must take a greater responsibility in determining the information most necessary for it to demonstrate accountability and to improve its programs.

For the Department, the standards indicate a recognition of the autonomy of the county programs. The standards lead the Department away from large routine data collection procedures to reliance on the county agencies for summary evaluative information. With this shift comes the need for the Department to clearly define the information it needs to meet the accountability demands placed on it. The presence of standards also makes evaluation a recognized component of the mental health system and permits state-level evaluators to obtain explicit priorities for the development of evaluation capacity and for carrying out the evaluation mandate. The priority assigned to program evaluation will influence the amount of resources invested in it and the speed with which evaluation capacity develops. The standards also make it possible for community mental health boards to evaluate services purchased from programs run directly by the Department. In a sense, this epitomizes the potential impact of the standards: roles are now reversed, and it is the county agency which assesses the value of services and programs rather than the state.
DISCUSSION

The two case studies presented here are historically linked. The modified bottom-up approach has clearly been shaped by the experience of the top-down approach. The bottom-up approach is an effort to install accountability mechanisms while avoiding the problems generated by lack of uniform program characteristics and the constraints on the feasibility of a statewide evaluation system. The top-down approach failed because it attempted evaluation with a program and in a setting unready for evaluation. Can the modified bottom-up approach avoid these and other problems associated with it? Consideration of the potential problems suggests that it will be more successful.

Capacity for evaluation is a critical element for the success of an evaluation system. The top-down approach assumed and required a capacity which was not available at either the state or county levels of the system. The modified bottom-up approach avoids this problem by allowing county programs to select the components of the program to be studied as well as the methodology. The evaluation standards support an initial emphasis on process evaluation through activities identified as system assessment. This will permit the capacity to carry out evaluation activities to grow as
the agency’s capactiy to use information grows. State level monitoring of the local evaluations will serve as a mechanism for regulating these evaluations within the available capacity. Another element critical to successful evaluation is clearly defined program characteristics. County control over the evaluation will reduce the amount of variability in program definition, structure and goals included in any single evaluation. This will permit better definition of the entity to be evaluated, State level evaluations will be focused on new program models and replications. In each of these instances, determining program definition is a major step in the evaluation sequence. Finally and most importantiy, the success of the bottom-up approach will be determined by how robust the evaluation procedures are to threats to internal and external validity and sources of unreliability. Threats to the internal validity reduce the users’ ability to link evaluation results to the effects of the program. Threats to external validity will reduce the users’ ability to generalize across samples and contexts (Bernstein, Bohrnstedt, & Borgatta, 1976; Rossi, Freeman, & Wright, 1979). Lack of reliability in the evaluation procedures will further magnify validity problems and
introduce additional error. For county special studies and state effectiveness and replication studies, these threats can be handled through good evaluation designs which take into account sources of invalidity and unreliability. The evaluation standards provide criteria for evaluator credentials. This should ensure the employment of qualified evaluators who can develop appropriate designs.

The threats to the validity and reliability of the management/accountability tool data are more difficult to reduce. Here, the state is directly relying on county programs to generate the required data. Periodic training has been found to reduce the unreliability of measures similar to those which form the management tool (Green, Nguyen, & Attkisson, 1979). Solutions to the threats to the validity of these data are not as readily apparent. The premise of the management tool is that the characteristics of a program should be related to the degree of functioning exhibited by the clients using it. Such a premise leaves the data open to error because of: events other than the program which occur between repeated assessments of the clients (history); the level of unreliability across persons performing the clinical assessments (instability); effects of repeated use of the same instrument (testing); changes by the observers in the scoring of the behaviors over time (instrumentation); regression of scores over repeated assessments toward the true score (regression); and bias in the admission of clients to programs (selection) (Campbell & Stanley, 1963; Campbell, 1972). These sources of error reduce the confidence that can be placed in the relationship between program characteristics, such as restrictiveness, and client functioning as revealed by the data from the management tool. The premise also implies a level of generalizability across county programs which is jeopardized by the lack of standard treatments in
these programs and by the interaction of program differences with client differences (Campbell & Stanley, 1963; Bernstein et al., 1976). As long as the state relies on county programs to generate its data, these threats cannot be completely eliminated. Once the rules governing the state-level use of the data are known, the data are open to "gaming," i.e., manipulation to gain a favorable status (Kimmel, 1984). Stating the purposes for the use of these data, phasing in the use of the data as performance measures with fiscal consequences, and auditing submitted data should help address selection threats to internal validity and the threats to external validity. Periodic training of the staff persons who collect data will reduce the threats from instability and instrumentation sources. Although the threats to validity cannot be totally eliminated, the extent to which these threats diminish the accuracy of the data can be decreased, leading to a greater potential for success.

Evaluation systems encompassing large service programs have been problematic in the past and will continue to be so. Given the constraints on evaluation at the state level, we think that coordinated efforts between the state and county programs have the greatest potential for success. Each takes responsibility for a segment of program evaluation which is most in line with its mission and direction. In commenting on the transition to block grants for CMHCs, Ciarlo (Neigher et al., 1982) indicated a belief that a coordinated strategy is the wisest until states are capable of evaluation efforts not dependent on county-level provision of data. The modified bottom-up approach using performance standards for evaluation has the option to grow and change as capacity and technology change at the state and county levels. This is the strongest rationale for expecting success.
SUMMARY

This paper has described the experience of one state mental health authority in attempting to carry out its statutory accountability requirements. The experience with the PAC substantiated the limitations of a top-down approach which have been described in the evaluation literature. The case study further points out the dangers of (a) mandating data but not specifying its use, (b) inadequately estimating resource requirements before putting a system in place, and (c) not attending to policy and legal issues, such as sanctions for noncompliance with data collection requirements and the implications for confidentiality and client consent.

The case studies further describe a modified bottom-up
approach which has been instituted as an alternative, attempting to utilize what was learned from the first failure experience. This approach specifies a minimal data set to be collected and utilizes evaluation standards for planning, implementing, and reporting evaluations of the quality, effectiveness, and efficiency of county mental health programs. We believe this alternative approach is sound. It has received support from state decision-makers and county agency personnel. However, only the future can determine its feasibility and its utility for state and local decision-making and system change.
REFERENCES

BERNSTEIN, I. N., BOHRNSTEDT, G. W., & BORGATTA, E. F. (1976). External validity and evaluation research: A codification of problems. In I. N. Bernstein (Ed.), Validity issues in evaluation research (pp. 107-134). Beverly Hills, CA: Sage Publications.

CAMPBELL, D. T. (1972). Reforms as experiments. In C. H. Weiss (Ed.), Evaluating action programs: Readings in social action and education (pp. 187-223). Boston: Allyn and Bacon.

CAMPBELL, D. T., & STANLEY, J. C. (1963). Experimental and quasi-experimental designs. Chicago: Rand McNally.

CARTER, R., & WALKER, R. (in press). Everybody cares but nobody knows: Public accountability in the 1980's.

CHELIMSKY, E. (1981). Making block grants accountable. In L. Datta (Ed.), Evaluation in change: Meeting new government needs (pp. 89-120). Beverly Hills, CA: Sage Publications.

CLIFFORD, D. L., & SHERMAN, P. (1983). Internal evaluation: Integrating program evaluation and management. In A. J. Love (Ed.), Developing effective internal evaluations (New Directions for Program Evaluation, No. 20, pp. 23-45). San Francisco: Jossey-Bass.

CRONBACH, L. J., AMBRON, S. R., DORNBUSCH, S. M., HESS, R. D., HORNIK, R. C., PHILLIPS, D. C., WALKER, D. F., & WEINER, S. S. (1980). Toward reform of program evaluation. San Francisco: Jossey-Bass.

DAVIS, H. R., WINDLE, C., & SHARFSTEIN, S. S. (1977). Developing guidelines for program evaluation capacity in community mental health centers. Evaluation, 4, 25-34.

FLAHERTY, E. W., & WINDLE, C. (1981). Mandated evaluation in community mental health centers: Framework for a new policy. Evaluation Review, 5, 620-638.

GREEN, R. S., NGUYEN, T. D., & ATTKISSON, C. C. (1979). Harnessing the reliability of outcome measures. Evaluation and Program Planning, 2, 137-142.

Guidelines for program evaluation in CMHCs. (1977). Evaluation, 4, 30-34.

GUNZBERG, H. C. (1969). Progress assessment chart for social and personal development (Vols. 1-4). Warwickshire, England: SEFA Ltd.

HERMAN, S. E. (1978a). The PAC as an instrument to measure change in adaptive behavior. Paper presented at the American Association on Mental Deficiency Annual Conference, Denver, CO.

HERMAN, S. E. (1978b). The construct validity of the PAC. Paper presented at the American Association on Mental Deficiency Annual Conference, Denver, CO.

KIMMEL, W. A. (1981). Putting program evaluation in perspective for state and local government. Human service monograph series (Project SHARE No. 18). Washington, DC: Department of Health and Human Services.

KIMMEL, W. A. (1984). State mental health program performance measurement: Selected impressions from three states. In C. Windle (Ed.), National Institute of Mental Health Series BN No. 5, Program performance measurement: Demands, technology, and dangers (DHHS Publication No. ADM 84-1357, pp. 108-116). Washington, DC: U.S. Government Printing Office.

MILAN, M., & HALLGREN, S. (1976). The uniform client assessment of the mentally retarded: A pilot study. Lansing, MI: Michigan Department of Mental Health.

NEIGHER, W., CIARLO, J., HOVEN, C., KIRKHART, K., LANDSBERG, G., LIGHT, E., NEWMAN, F., STRUENING, E., WILLIAMS, L., WINDLE, C., & WOY, J. R. (1982). Evaluation in the community mental health centers programs: A bold new approach? Evaluation and Program Planning, 5, 283-311.

NEIGHER, W. D., & SCHULBERG, H. C. (1982). Evaluating the outcomes of human service programs: A reassessment. Evaluation Review, 6, 731-752.

PATTON, M. Q. (1978). Utilization-focused evaluation. Beverly Hills, CA: Sage Publications.

PATTON, M. Q. (1982). Practical evaluation. Beverly Hills, CA: Sage Publications.

Phoenix Place and the PAC. (1977). Unpublished manuscript, Phoenix Place, Detroit, MI.

ROSSI, P. H. (1978). Issues in the evaluation of human service delivery. Evaluation Quarterly, 2, 573-599.

ROSSI, P. H., FREEMAN, H. E., & WRIGHT, S. R. (1979). Evaluation: A systematic approach. Beverly Hills, CA: Sage Publications.

RUTMAN, L. (1980). Planning useful evaluations: Evaluability assessment. Beverly Hills, CA: Sage Publications.

SMITH, N. L. (1980, Winter). Studying evaluation assumptions. Evaluation Network Newsletter.

WINDLE, C., & KEPPLER-SEID, H. (1984). Introduction: A model for implementing performance measurement. In C. Windle (Ed.), National Institute of Mental Health Series BN No. 5, Program performance measurement: Demands, technology, and dangers (DHHS Publication No. ADM 84-1357, pp. 1-14). Washington, DC: U.S. Government Printing Office.

WINDLE, C., & NEIGHER, W. (1978). Ethical problems in program evaluation: Advice for trapped evaluators. Evaluation and Program Planning, 1, 97-108.

WINDLE, C., & OCHBERG, F. M. (1975). Enhancing program evaluation in the community mental health centers program. Evaluation, 2(2), 31-36.

WINDLE, C., & WOY, J. R. (1977). When to apply various program evaluation approaches. Evaluation, 4, 35-37.