Paradigms for Establishing Experimental Construct Validity

In Counseling and Psychotherapy

John J. Horan

Division of Psychology in Education

Arizona State University

RUNNING HEAD: Experimental Construct Validity

Paper presented at the Annual Meeting of the American Psychological Association, New York, August 11, 1995

Draft: 8/9/95

Abstract

This paper has several components. First, I trace the logic of experimental construct validity from its origins in certain key features of psychometric construct validity established by Campbell and Fiske (1959) down through subsequent work by Cook and Campbell (1979). Second, I provide early examples of how experimental construct validity was addressed in ad hoc fashion before the term became widely employed. Third, I show how experimental construct validity can be evidenced from blocking that derives from current work under the topic of differential diagnosis. Finally, I describe how construct validity may be established from the outcome pattern produced by a given experiment.

Paradigms for Establishing Experimental Construct Validity

In Counseling and Psychotherapy

My comments today derive from two previously unpublished papers. The first was a 4-minute "new fellows" address that I gave at APA eleven years ago (Horan, 1984). I had been asked to speak about what I found exciting, and I replied with the supposition that experiments in counseling and psychotherapy could reflect the convergent and discriminant tenets of construct validity so firmly established in the psychometric literature. That talk previewed the logic undergirding Kate McNamara's dissertation on the treatment of depression (cf. McNamara & Horan, 1986), a logic I believed was applicable to clarifying process and outcome changes in many other clinical problems. In a nutshell, counseling treatments should produce changes on theoretically relevant measures while failing to produce changes on unrelated measures.

The second paper, delivered as an invited address at AERA three years later (Horan, 1987), greatly expanded my thoughts on the matter. I described various paradigms for establishing experimental construct validity, but had not yet gathered a sufficient amount of illustrative data to warrant going to print. Although Kate's data were textbook clear, two subsequent dissertations showed no evidence for the construct validity of the interventions deployed.

So here I am a decade and a dozen checkered studies later still believing in the merits of experimentation and the importance of demonstrating experimental construct validity. In the face of increasing discontent with the experimental method in our field and the stampede to alternative methodologies, I am fully aware of the putative Quixotic implications of my stance. Nevertheless, I take heart from the words of Gilson (1937, p. 85):

"If metaphysical speculation is a shooting at the moon, philosophers have always begun by shooting at it; only after missing it have they said there was no moon, and that it was a waste of time to shoot at it. Skepticism is defeatism in philosophy, and all defeatisms are born of previous defeats. When one has repeatedly failed in a certain undertaking, one naturally concludes that it was an impossible undertaking. I say naturally, but not logically, for a repeated failure in dealing with a problem may point to a repeated error in discussing the problem rather than to its intrinsic insolubility."

I have several objectives in this paper. First, I'd like to give a quick refresher on key features of psychometric construct validity and how it might be extrapolated to experiments. Then, I'll cite examples of how experimental construct validity was addressed in ad hoc fashion before the term became widely employed. I'll move from there to show how experimental construct validity can be evidenced from blocking strategies that derive from emergent thinking under the topic of differential diagnosis. Finally, I'll describe some attempts to approach construct validity through the outcome pattern produced by a given experiment.

Construct Validity: Review and Extension

The topic of construct validity has been a cornerstone of psychometric theory for the past four decades (see Bechtoldt, 1959; Campbell & Fiske, 1959; and Cronbach & Meehl, 1955). Most textbook definitions speak in terms of a gradually accumulating bank of logical-empirical evidence that the psychometric device does in fact measure a specific psychological construct. The phenomenon of convergence is an important property. For example, one might expect scores on a state anxiety index to covary with alternating phases of experimentally induced stress and relaxation. But divergence is also important, so one might subject the device to Campbell and Fiske's (1959) elegant "multitrait-multimethod matrix" procedure and note whether it: 1) correlates with different methods for measuring the same construct (convergent validation) and 2) fails to correlate with similar methods for measuring different constructs (discriminant validation). A self-report state-anxiety index, for example, should correlate higher with heart rate acceleration (a different method for measuring the same construct) than it does with a self-report index of social approval (a different construct assessed via the same method).
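The convergent and discriminant comparison just described can be sketched numerically. The following minimal Python sketch uses invented scores for eight hypothetical subjects (the numbers and variable names are assumptions made for illustration, not data from any study cited here) and a hand-written `pearson` helper. It simply checks whether the same-construct, different-method correlation exceeds the different-construct, same-method correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores for eight hypothetical subjects (illustration only).
anxiety_self_report = [12, 18, 25, 31, 22, 15, 28, 35]
heart_rate_accel = [4, 7, 9, 13, 8, 5, 12, 14]            # same construct, different method
approval_self_report = [20, 14, 19, 16, 22, 17, 15, 21]   # different construct, same method

# Convergent validation: different methods, same construct.
convergent_r = pearson(anxiety_self_report, heart_rate_accel)

# Discriminant validation: same method, different constructs.
discriminant_r = pearson(anxiety_self_report, approval_self_report)

# The pattern argued for in the text: convergence should exceed divergence.
pattern_supports_validity = convergent_r > abs(discriminant_r)
```

A full multitrait-multimethod matrix would cross several traits with several methods; this sketch isolates only the single comparison used in the state-anxiety example.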

Cook and Campbell (1979) extended the concept of psychometric construct validity to include the design of experiments. Essentially, they were updating their views of internal and external validity and saw construct validity as a subset of external validity. Internal validity pertains to the question of whether or not an effect occurred; construct validity presumes that effect, and then focuses on the underlying cause-effect relationships between theoretical components.

Here, I will restrict myself to only one aspect of experimental construct validity, namely, the logical-empirical link between independent and dependent variables. Independent variables include the traditional blocking and manipulated variables of counseling research (e.g., subject sex and treatment strategy) along with alternative conceptualizations derived from recent thinking on the topic of differential diagnosis. The dependent variable category, however, can be thought of as including not only the typical direct and generalization outcomes sought by most investigators, but a broad span of other measured variables as well. For example, independent variable manipulation checks, assessments of counseling process, and demand-credibility-expectancy analyses, all reflect the logical link, or lack thereof, between cause and effect. Let us first of all look at some early ad hoc attempts to address the issue.

Ad Hoc Examples of Construct Validity

When software malfunctions, the manufacturers -- if they are still in business -- issue "patches" to remedy the bugs. Several methodological patches appeared in the literatures of counseling and psychotherapy during the late 1970's. They arose from the growing realization that "all was not right" with our data -- further controls beyond random assignment were critical. In some cases the additional controls were procedural such as checking addresses of volunteers to make sure that individuals assigned to experimental and control interventions did not live in the same residence. In other cases, control measures were employed which assessed phenomena critically related to the construct validity of the experiment.

These ad hoc analyses are well understood by serious experimenters in our field and are fully in the public domain. I continue to be puzzled, however, as to why so many manuscripts that I review do not include them. Let me briefly describe two methodological patches for improved construct validity.

1. Independent variable manipulation analyses permit us to determine whether or not the treatment "took." Chemotherapy researchers, for example, need to know if their patients cheeked the pill; counseling researchers have to assess if their clients learned what they were supposed to learn and then put that learning into practice.

In a study on pain management (Hackett & Horan, 1980), for example, we found that relaxation training increased pain tolerance and that distraction/imagery strategies raised pain thresholds. A self-instructions treatment, on the other hand, produced no effect whatsoever. An independent variable manipulation analysis, however, revealed that relaxation was put into practice by 83% of the subjects who were trained in it; conversely, those who were not trained in using it generally did not attempt to relax themselves (note the appropriate convergence and divergence). Interestingly, a large number of subjects, 45%, ignored their self-instructions treatment altogether. The impotence of self-instructions, thus, might be a function of the fact that many of these subjects cheeked their pill.

2. Demand-credibility-expectancy analyses permit the experimenter to assess whether or not the control group controlled for anything important. Soon after the Campbell and Stanley era, the realization that no-treatment controls fail to address the placebo problem became widespread. The fact that placebo treatments themselves may not solve the placebo problem, however, was not understood until much later (e.g., Kazdin & Wilcoxon, 1976). What we in fact do to our clients under the label of placebo is less important than what they believe we are doing to them. For example, the mere belief that one is receiving a pain-killing drug causes one's body to secrete endorphins, its own opiate-like substances (e.g., Levine, Gordon, & Fields, 1978). Thus, unless we ensure that our placebo and experimental treatments generate equivalent beliefs (e.g., hopes for improvement), emerging favorable outcome differences are in no way construct valid.

Procedures and instruments were thus developed to address what is variously known as differential demand, credibility, and/or expectancy -- new terms for an old problem. A thorough review of developments and ensuing controversies here would take us far afield. Early on, for example, questions arose about the proper time to assess for differential demand. A testing point at the outset of treatment may not provide the subjects with sufficient data on which to make a valid judgment, whereas testing late in treatment may reveal differences arising from the subjects' correct realization that the experimental treatment is indeed working. Later debate (e.g., American Psychologist, 1987) focused on whether the concept being assessed here is really a subset of self-efficacy, which not only should reflect differences, but indeed has been postulated to be the central mechanism for therapeutic change. Thus, perhaps we ought to be using other devices, for example, good measures of theoretically unrelated outcomes, to illuminate the possibility of a hidden placebo artifact. Finally, it is somewhat ironic that containment of an artifact as powerful as the placebo phenomenon should rely on an inference from a null effect -- the weakest form of evidence suggesting equivalence in demand.

In any event, assessing the possibility of differential demand has been an important milestone in patching the construct validity of an experiment. Despite interpretive complexities, the occurrence of a pattern of results showing strong treatment effects in the absence of differences in demand cannot help but raise confidence in the construct validity of our findings. Let us move on to other paradigms.

Construct Validity Through Blocking

One way to establish convergence and divergence in a counseling experiment is to take advantage of advances in differential diagnosis. I am not referring to the aptitude-treatment interaction literature or to similar counseling research involving organismic variables such as introversion-extroversion or conceptual complexity. Until we have achieved a psychometric analogue of Mendeleev's periodic chart in chemistry, I believe we're better served by seeking blocking variables within the confines of the clinical problem we hope to alleviate. Picture, if you will, an experiment contrasting systematic desensitization with swimming lessons in the treatment of deep-water phobia. I frequently ask students in my research course to hypothesize the likely outcome. Opinions are about equally divided; some favor swimming lessons, others desensitization.

Actually, the question is meaningless without a more precise specification of the subjects' clinical problem. Nonswimmers ought to fear deep water; desensitization would be inappropriate if not downright dangerous. Conversely, from the standpoint of theory, swimming lessons are hopelessly irrelevant to water-phobic swimmers. To make sense, the two-condition question (systematic desensitization vs swimming lessons) requires another factor (swimmer vs nonswimmer) and a hypothesized interaction showing improvement only among nonswimming subjects who received lessons and swimming subjects who received desensitization.
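The hypothesized treatment-by-diagnosis interaction can be made concrete with a few lines of arithmetic. The sketch below assumes made-up mean improvement scores for the four cells (the values 8.0 and 0.0 are hypothetical, chosen only to mirror the predicted pattern). It shows why the two-condition question is uninformative on its own: the treatment main effect vanishes even though the interaction is large.

```python
# Hypothetical mean improvement per cell (illustration only; not data from any study).
# Keys are (diagnostic block, treatment).
cell_means = {
    ("nonswimmer", "lessons"): 8.0,
    ("nonswimmer", "desensitization"): 0.0,
    ("swimmer", "lessons"): 0.0,
    ("swimmer", "desensitization"): 8.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (equal cell sizes assumed)."""
    vals = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(vals) / len(vals)

# Main effect of treatment, ignoring the blocking variable.
treatment_main_effect = (
    marginal_mean(1, "lessons") - marginal_mean(1, "desensitization")
)

# Simple effects of treatment within each diagnostic block.
lessons_advantage_nonswimmers = (
    cell_means[("nonswimmer", "lessons")]
    - cell_means[("nonswimmer", "desensitization")]
)
lessons_advantage_swimmers = (
    cell_means[("swimmer", "lessons")]
    - cell_means[("swimmer", "desensitization")]
)

# Interaction contrast: the difference between the two simple effects.
interaction_contrast = lessons_advantage_nonswimmers - lessons_advantage_swimmers
```

With these assumed values the main effect is zero while the interaction contrast is 16, which is the crossover pattern the blocking argument predicts.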

I have not provided a straw example. Our literature on test and other performance anxieties is chock full of nutty data (cf. Horan, 1980). For instance, in validation studies of diagnostic criteria for test anxiety, there is a considerable amount of what Cook and Campbell (1979) call "ambiguity in the direction of causal inference." Inverse correlations between the various anxiety instruments and grade-point-average or Graduate-Record-Examination scores are often touted as evidence for concurrent validity. In truth, is it not equally likely that previous mediocre performances on aptitude tests, coupled with knowledge of their importance in graduate admissions, will have an inflammatory influence on self-reported test anxiety?

I believe that this causal ambiguity has obscured the fact that there are at least three different kinds of test anxiety (adaptive, maladaptive, and reactive) with profoundly different treatment implications. Recall the classic Yerkes-Dodson law of 1908, which posited a curvilinear relationship between anxiety and performance. Some of the anxiety complained about by students is undoubtedly beneficial in the sense that it facilitates performance. That kind might be called adaptive anxiety. Recall the rising portion of the curve. If we try to diminish adaptive anxiety, theoretically, we should expect lower grade-point-averages as well. Musicians and actors are well acquainted with the need to be "up" (i.e., somewhat anxious) or their performance will be flat. Treatments other than, say, providing "normality" assurances to our clients seem inappropriate.

The effect of maladaptive anxiety is represented by the descending portion of the curve. That's where most of the action in our literature purportedly takes place, but relatively few test anxious souls seem to reside there. In fact, my former students and I collected a bushel basket full of unpublished data which failed to identify a single individual with bona fide maladaptive test anxiety in a large class of graduate students who were approaching a final exam. Essentially, we had several equivalent forms of the exam which could be administered in practice or for real, on different days, and in counterbalanced order. Even though the students knew the instructor would not be privy to how well they did in the practice condition, their scores from equivalent forms were amazingly consistent with the real condition. No one's performance declined as a function of the test having implications for one's actual course grade!

Now, obviously, I cannot deny the existence of a popularly reported phenomenon on the basis of null findings from a possibly idiosyncratic sample of subjects. But I think it is a safe bet that our literature on interventions such as relaxation training and cognitive restructuring would be vastly improved if the legions of folks with adaptive and reactive anxiety were purged from the subject rolls.

Reactive anxiety, in contrast to maladaptive, is realistic and appropriate to the situation. Reactive anxiety results from, rather than causes, poor performance. Just as we ought to fear touching a poisonous snake, taking a test after a night of partying or with full knowledge that for whatever reason our mastery of the curriculum is deficient, really ought to be cause for trepidation. That bushel-basket of data I mentioned above melded with a couple decades of practicum supervision has led to my supposition that the popularly encountered version of test anxiety is a largely reactive phenomenon more appropriately addressed with skill-building or decision-making interventions.

This background thinking set the stage for an early study foreshadowing experimental construct validity using blocking variables. Unlike test anxiety, it is relatively easy to sort out adaptive, maladaptive, and reactive musical performance anxiety. Musicians know full well -- and it's easy to verify -- whether their performance deteriorates in public recitals relative to how well they perform in practice. In her dissertation Gladys Sweeney (cf. Sweeney & Horan, 1983) chose to work with the maladaptives, and excluded all subjects from the other two diagnostic categories. Moreover, by requiring practice sessions during the treatment phase, she prevented the development of a reactive anxiety overlay in the maladaptive subject pool.

Essentially, Gladys crossed two coping skills, a cognitive technique and a behavioral technique, both deployed in a typical stress inoculation paradigm. The coping skill given in the active control cell was musical analysis training, a procedure theoretically and operationally similar to swimming lessons and study skills, and thus irrelevant to the appropriate treatment of maladaptive musical performance anxiety. Though musical analysis training is the most commonly deployed treatment for stage fright, with a truly maladaptive subset, it's like giving swimming lessons to accomplished swimmers.

Gladys's battery of measures included the ubiquitous self-report state and trait devices; she also staged public recitals from which she derived pulse rate, behavioral signs of anxiety, and an index of performing competence. Her outcomes were clear; the experimental treatments had both common and unique effects. Moreover, despite having demand characteristics equivalent to the experimental conditions, musical analysis training had no effect on any dependent variable when contrasted with an additional wait-list control condition. In retrospect, it would have been interesting to build upon this study by looking for appropriate interactions in another 2 x 2 design which would cross the combined cognitive-behavioral treatment and musical analysis training with the diagnostic categories of maladaptive and reactive anxiety.

My students and I have mused, rather unproductively I might add, about appropriate blocking variables in the differential diagnosis of other clinical problems. For example, the literature on smoking cessation once offered up a number of intriguing etiological formulations -- some smokers purportedly smoke to elicit positive affect, others to reduce negative affect. If so, the grounds for pursuing treatment-by-diagnosis interactions were fertile indeed. Unfortunately, none of the ephemeral smoker taxonomies seemed to hold up under closer examination (e.g., Addesso & Glad, 1978), so our arm-load of intervention studies in this area has never moved beyond the point of simply comparing multi-component treatments with controls and reporting generic efficacy data (cf. Olson, Horan, & Polansky, 1994).

The problem of fuzzy potential blocking variables is vexing. For example, what are the relevant facets of depression that are stable enough to diagnose, and clear enough to be theoretically strapped to different treatments? Kate McNamara and I spent a year and a half chasing down factor analyses of the Beck Depression Inventory, arguing, and using up a pound or two of napkins at a nearby cafeteria with scribbled diagrams of unfulfilled experimental designs. The literature kept pointing to the diagnostic categories of cognitive depression and behavioral depression, which obviously suggested a 2 x 2 design in which each diagnostic category would be crossed with cognitive and behavioral interventions. The problem was that there were indeed cognitive and behavioral depressions, but most subjects had both. So what would be our blocking variable? Virtually all of our outcome research has thus followed what might be called a traditional "brute force" model, i.e., generic treatments applied to heterogeneous groups of subjects having purportedly common concerns. We learned later that experimental construct validity may be much easier to demonstrate via the outcomes produced by counseling interventions deployed alone or in combination.

Construct Validity Through Outcome Patterns

If one has discrete diagnostic categories, the task of designing construct valid experiments in counseling and psychotherapy is rather straightforward. What does one do when the diagnostic categories are blurry? One solution might be to focus, instead, on the discreteness of the independent variable and then home in very carefully on what that variable is supposed to do.

Recall Campbell and Fiske's (1959) view of psychometric construct validity -- measures of the same construct should correlate across methods, while measures of different constructs should fail to correlate within the same method. Likewise, the construct validity of the counseling experiment can be thought of in terms of convergent and discriminant relationships between purported causes and effects. In other words, do the independent variables produce theoretically consistent changes on measures which they are supposed to influence, and do they reliably fail to produce differences on theoretically unrelated variables? Whereas the Campbell and Fiske paradigm pays attention to the degree of correlation, the experimental simile that I am suggesting focuses on the magnitude of the effect size.

As an opening illustration of this particular paradigm for experimental construct validity, I have chosen a very simplistic example of an experiment containing the appropriate convergent and divergent phenomena.

An Exemplary Triviality

Let us suppose we have been assigned the task of evaluating the impact of a high school study-skills program consisting of two microcomputer-assisted instructional modules, one for reading and the other for math. Let us further assume that the potential subject pool is deficient in both skills, and that our specific mission is to ascertain the separate and combined effects of these modules on multiple measures of school achievement.

At least four different treatment conditions are necessary: 1) a combined treatment consisting of both the reading and the math modules, 2) the reading module alone, 3) the math module alone, and 4) a control treatment. Since these modules are superimposed on the standard curriculum we might choose to let that standard treatment serve as the control condition. It would, perhaps, be better to plug in a "computer-assisted irrelevancy module" (e.g., career development) as the principal control condition, and view the standard curriculum as an additional "no treatment" control group; however, adding such elegancies here needlessly complicates our example. Four treatments it shall be.

We have a choice in our experimental design. We can lay out the four treatments alongside each other in a 1 x 4 format; or we can deploy these conditions in a 2 x 2 design in which each factor is defined by the presence or absence of each module. For statistical power reasons, I prefer the latter. Emerging main effects would constitute evidence for the efficacy of either module; any interactions could indicate whether the modules enhance or detract from each other. But again for expository simplicity, let us momentarily assume the 1 x 4 design and move on to a set of hypotheses, which if confirmed would beef up our claims for having a construct valid experiment.

First of all, we would expect that students exposed to only the reading module would demonstrate higher reading achievement than control students and other students exposed to only the math module. We would hypothesize a similar outcome pattern from math-only students relative to control students and reading-only students on math achievement. Each of these modules would predictably have an effect (albeit smaller) relative to the control treatment on a generalization measure such as overall grade point average, which in turn would presumably be more heavily influenced by the combined treatment than by either module deployed alone. Finally, if we are to conclude that these gains are correctly attributable to the modules, per se, we would expect no differences in the quality of the independent variable manipulation (e.g., instructor fidelity to the accompanying treatment manuals), counseling process characteristics (e.g., number of treatment sessions attended by the students), and the capacity of the modules themselves to generate demands and expectations for improvement.

The aforementioned outcome pattern reflects convergent validity in that the treatments produce effects of varying magnitude on specific measures according to the dictates of logic. Discriminant validity is evidenced by the failure of the treatments to produce illogical effects. Construct validity requires both. If the reading treatment, for example, registers an effect on only the math measure, or if the math treatment generates greater expectations than the control treatment, construct validity is threatened.
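The convergent and discriminant logic of this outcome pattern can be expressed as a simple check on effect sizes. The sketch below is only an illustration under stated assumptions: the posttest scores are invented, the `cohens_d` helper uses the pooled standard deviation, and the cutoffs `LARGE` and `NEGLIGIBLE` borrow Cohen's conventional benchmarks as arbitrary thresholds rather than anything prescribed by the paradigm itself.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * stdev(treated) ** 2 + (n2 - 1) * stdev(control) ** 2
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)

# Invented posttest scores (illustration only), keyed by (condition, measure).
scores = {
    ("reading_only", "reading"): [62, 66, 70, 64, 68],
    ("reading_only", "math"): [50, 54, 52, 48, 51],
    ("control", "reading"): [52, 56, 60, 54, 58],
    ("control", "math"): [49, 53, 52, 48, 52],
}

# Effect of the reading module on the theoretically relevant measure.
d_relevant = cohens_d(
    scores[("reading_only", "reading")], scores[("control", "reading")]
)

# Effect of the reading module on the theoretically unrelated measure.
d_irrelevant = cohens_d(
    scores[("reading_only", "math")], scores[("control", "math")]
)

# Assumed cutoffs, borrowed from Cohen's conventional benchmarks.
LARGE, NEGLIGIBLE = 0.8, 0.2

convergent = d_relevant >= LARGE               # effect where theory predicts one
discriminant = abs(d_irrelevant) < NEGLIGIBLE  # no effect where theory predicts none
construct_valid_pattern = convergent and discriminant
```

The same comparison would be repeated for the math module, the combined treatment, and the control measures (manipulation quality, attendance, expectancy) before any claim about the full pattern could be entertained.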

Now, there may be a variety of perfectly understandable reasons why this idealized outcome pattern does not emerge. For example, a ceiling phenomenon appearing on the reading measure could easily preclude significant effects; or as has occasionally been reported, combination treatments might erode the efficacy of the individual ingredients. In the former instance, construct validity is simply not apparent. In the latter case, however, a radical revision of the purported cause-effect relationship is clearly in order. Construct validity is contra-indicated unless this peculiar phenomenon can be easily accommodated--if not anticipated a priori--by the underlying theory.

We can view this outcome pattern in alternative terms; however, before doing so it might be helpful to recall that although ANOVAs conducted on the dependent measures in a given experiment usually involve raw scores, it is perfectly permissible to standardize those scores into, say, z or T values beforehand. Doing so permits the array of dependent measures to be treated as an additional factor in an experimental design, and enables a definition of construct validity in terms of "operational interactions," that is, specific manipulated variables interacting with specific observed variables. In sum, here we would be looking for particular interactions between treatments and measures, whereas earlier we were simply expecting significant effects to occur on some measures and not on others.
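The standardization step and the resulting treatment-by-measure interaction can be sketched as follows. This is a minimal illustration, not a substitute for a proper mixed-design ANOVA: the raw scores are invented, the two tests are assumed to use different scales, and the choice of the population standard deviation (`pstdev`) for standardizing is an assumption made for simplicity.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Standardize raw scores to mean 0 and SD 1 (population SD; an assumed choice)."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]

def t_scores(raw):
    """Rescale z scores to the T metric (mean 50, SD 10)."""
    return [50 + 10 * z for z in z_scores(raw)]

# Invented raw scores (illustration only). The first three subjects received the
# reading module and the last three the math module; the tests use different scales.
condition = ["reading"] * 3 + ["math"] * 3
raw = {
    "reading_test": [70, 74, 78, 50, 54, 58],
    "math_test": [18, 20, 22, 28, 30, 32],
}

# Standardize each measure across all subjects so the measures share one metric.
z = {measure: z_scores(vals) for measure, vals in raw.items()}

def cell_mean(cond, measure):
    """Mean z score for one treatment-by-measure cell."""
    return mean([v for c, v in zip(condition, z[measure]) if c == cond])

# Treatment-by-measure ("operational") interaction contrast: does each module
# move its own measure more than it moves the other module's measure?
contrast = (
    (cell_mean("reading", "reading_test") - cell_mean("reading", "math_test"))
    - (cell_mean("math", "reading_test") - cell_mean("math", "math_test"))
)
```

Once the measures share a common metric, a positive contrast of this kind is the operational interaction described above; in practice its reliability would be tested within a design that treats measures as a repeated factor.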

Actual Examples

Rarely in experimental counseling research do we find treatments and measures as discrete and "clean" as those just described pertaining to math and reading. The empathy, positive regard, and congruence of the computer are constant; the content of the math and reading treatments is obviously dissimilar; and the instruments used to assess achievement in these domains not only differ, but can be easily linked to the interventions from which they are derived. Moreover, we haven't muddied our expository water by introducing the Campbell and Fiske issue of using alternative methods beyond self-report to assess achievement.

Return to Kate McNamara's interest in depression. There is considerable debate in the literature as to what is the most effective mode of treatment. So at the outset, Kate's study might be construed as a horse race: What are the separate and combined effects of cognitive and behavioral approaches to intervention? A 2 x 2 design was contrived in which cognitive therapy was crossed with behavior therapy, and the cell defined by the absence of these two factors was occupied by a high-demand-control condition.

The measures deployed in her study fell into several categories: 1) Diagnostic/Generalization, 2) Cognitive, 3) Behavioral, and 4) Control. The first category contained measures relating to the full syndrome, and the last included the usual array of devices reflecting expectancy, client satisfaction, and adherence to the therapeutic regimen. Care was taken in the cognitive and behavioral categories to include well known instruments as well as devices that represented different methods (as per Campbell & Fiske) of assessment. So what happened? In a nutshell, there was a multivariate main effect favoring cognitive therapy, which showed a clear impact on the entire cognitive battery and some generalization to a very stringent criterion in the behavioral battery (observer-evaluated social skills). Had the latter occurred in the absence of the former, construct validity would have been threatened. The control measures also behaved appropriately -- refusing to yield effects damaging to claims of construct valid findings. A post-mortem analysis revealed that both cognitive and behavior therapy may have impacted the generalization measure of depression. However, since behavior therapy had no influence on the cluster of measures reflecting behavioral change, the construct validity of behavior therapy was not evident. In effect, behavior therapy may have worked, but not necessarily because of its theoretical underpinnings.

I am fast approaching the end of my remarks today. The next papers in this symposium continue the review of a dozen or so studies showing the efficacy and construct validity of counseling interventions relevant to other problems such as the reduction of occupational stress in prison guards (Johns, Horan, & Games, 1990), the promotion of effective job interviewing skills (Cianni & Horan, 1990; Donley, Horan, & DeShong, 1989), the retention of at-risk students (Polansky, Horan, & Hanish, 1993), the amelioration of loneliness (McWhirter & Horan, 1992), the prevention and treatment of substance abuse (Keillor & Horan, in preparation; Kerns & Horan, 1993; Polansky, Horan, Buki, Kornfield, Ceperich, & Burrows, 1994), the alleviation of bulimia (Olson, McNamara, & Horan, 1995), and the enhancement of self-esteem (Erickson, Horan, & Hackett, 1991; Horan, in preparation; Nielsen, Horan, Keen, Ceperich, St. Peter, & Ostlund, 1995). These data have steadily accumulated over the past decade. Some studies have provided clear illustrations of experimental construct validity; in other instances, however, we have been unable to obtain unequivocal evidence in favor of a treatment effect, let alone establish whether or not the obtained pattern of results was consistent with theory.

Undoubtedly, there are logical positivists out there who might be quick to counterpoint that underlying constructs are to cause-effect relationships as pajamas are to newlyweds (unnecessary and inconvenient). I do identify with the brashness of that perspective, and indeed might have commented in similar fashion during my misguided youth. In truth, however, ours is not a literature of serendipity; our treatments are in fact derived from theory.

To the counselor in practice who wonders "why bother?", I would like to underscore the practical implications for this sort of work. For example, the deployment of a battery of assessment devices, each of which had been shown to be malleable by a particular intervention, would have profound implications for individualized treatment programming.

Scientist-practitioners need no convincing that attending to the construct validity of an experiment not only permits the cataloging of outcomes, but also gives us at least a temporary glimpse of "why." I say "temporary" because no scientific theory ever escapes death. Experimental construct validity data simply hasten the evolutionary process.

References

Addesso & Glad (1978).

American Psychologist, 1987

Bechtoldt, H. P. (1959). Construct validity: A critique. American Psychologist, 14, 619-629.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Campbell, D. T., & Stanley, J. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally College Publishing.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Cianni, M., & Horan, J. J. (1990). An attempt to establish the experimental construct validity of cognitive and behavioral approaches to assertiveness training. Journal of Counseling Psychology, 37, 243-247.

Donley, R. J., Horan, J. J., & DeShong, R. L. (1989). The effect of several self-disclosure permutations on counseling process and outcome. Journal of Counseling and Development, 67, 408-412.

Erickson, C. D., Horan, J. J., & Hackett, G. (1991). On thinking and feeling bad: Do client problems derive from a common irrationality or specific irrational beliefs? Paper presented at the Annual Meeting of the American Psychological Association, San Francisco, CA, August.

Gilson, E. (1937). Being and realism. Reprinted in Beck, R. N. B. (Ed.), (1961). Perspectives in philosophy. NY: Holt, Rinehart, & Winston. Pp. 85-89.

Hackett, G., & Horan, J. J. (1980). Stress inoculation for pain: What's really going on? Journal of Counseling Psychology, 27, 107-116.

Horan, J. J. (1980, December). Experimentation in counseling and psychotherapy. Part 1: New myths about old realities. Educational Researcher, 5-10.

Horan, J. J. (1984). On the construct validity of experiments in counseling and psychotherapy. New fellow address at the Annual Meeting of the American Psychological Association, Toronto, Canada, August 25.

Horan, J. J. (1987). Paradigms for establishing experimental construct validity. Invited address at the Annual Meeting of the American Educational Research Association, Washington, D. C.

Horan, J. J. (In preparation). Effects of computer-based cognitive restructuring on rationally mediated self-esteem.

Johns, R., Horan, J. J., & Games, P. (1990). Permutations of occupational stress inoculation applied to correctional officers. Paper presented at the Annual Meeting of the American Psychological Association.

Kazdin, A. E., & Wilcoxon, L. A. (1976). Systematic desensitization and nonspecific treatment effects: A methodological evaluation. Psychological Bulletin, 83, 729-758.

Keillor, R. M., & Horan, J. J. (In preparation). The effects of a videotaped expectancy challenge program on the drinking behavior of adjudicated college students.

Kerns, A., & Horan, J. J. (1993). The effects of assertion training and decision-making curricula on the likelihood of adolescent substance abuse. Unpublished manuscript.

Levine, J. D., Gordon, N. C., & Fields, H. L. (1978). The mechanism of placebo analgesia. Lancet, September 23, 654-657.

McNamara, K., & Horan, J. J. (1986). Experimental construct validity in the evaluation of cognitive and behavioral treatments for depression. Journal of Counseling Psychology, 33, 23-30.

McWhirter, B., & Horan, J. J. (1992). Experimental construct validity of treatments linked to empirically derived subtypes of loneliness. Paper presented at the Annual Meeting of the American Psychological Association.

Nielsen, D. M., Horan, J. J., Keen, B., Ceperich, S. D., St. Peter, C. C., & Ostlund, D. (1995). An attempt to improve self-esteem by modifying specific irrational beliefs. Paper presented at the Annual Meeting of the American Psychological Association, New York. (Journal of Cognitive Psychotherapy, in press)

Olson, C., McNamara, K., & Horan, J. J. (1995). Experimental construct validity of classic and nouveau therapies for bulimia nervosa. Paper presented at the Annual Meeting of the American Psychological Association.

Olson, C., Horan, J. J., & Polansky, J. (1994). Perspectives on substance abuse prevention. Handbook of Counseling Psychology.

Polansky, J., Horan, J. J., & Hanish, C. (1993). Experimental construct validity of the outcomes of study skills training and career counseling as treatments for the retention of at-risk students. Journal of Counseling and Development, 71, 488-492.

Polansky, J., Horan, J. J., Buki, L. P., Kornfield, S. M., Ceperich, S. D., & Burrows, D. D. (1993). Common and specific effects of substance abuse prevention videos. Paper presented at the Annual Meeting of the American Psychological Association, Toronto, Canada, August.

Sweeney, G. A., & Horan, J. J. (1982). Separate and combined effects of cue-controlled relaxation and cognitive restructuring in the treatment of musical performance anxiety. Journal of Counseling Psychology, 29, 486-497.