Understanding the Psychological Diagnosis Test: A Comprehensive Guide

Psychological assessment plays a pivotal role in gaining insights into an individual’s unique traits and capabilities. It involves a systematic process of gathering, synthesizing, and interpreting information about a person (Groth-Marnat, 2009; Weiner, 2003). This comprehensive understanding is achieved through diverse methods and measures, meticulously chosen to align with the specific goals of the evaluation. The sources of information are varied and may include:

  • Records: Pertinent documents such as medical, educational, occupational, and legal records obtained from the referral source.
  • External Records: Data from other organizations and agencies identified as potentially relevant to the assessment.
  • Interviews: Structured or unstructured conversations with the individual being assessed to gather firsthand information.
  • Behavioral Observations: Direct observation of the individual’s behavior in different contexts.
  • Corroborative Interviews: Interviews with family members, friends, teachers, and other individuals who can provide valuable perspectives.
  • Formal Psychological Testing: The administration of standardized psychological or neuropsychological tests, often referred to as a Psychological Diagnosis Test.

The convergence of findings across multiple measures and sources, alongside any discrepancies, allows for a more nuanced and complete picture of the individual being evaluated. This holistic approach is crucial for reaching accurate and well-founded clinical conclusions, such as making a precise diagnosis or formulating effective treatment plans.

The clinical interview stands as a cornerstone of many psychological and neuropsychological assessments. These interviews can range from structured and semistructured formats to more open-ended conversations. Regardless of the format, the primary objective remains consistent: to pinpoint the nature of the client’s presenting concerns, gather direct historical information related to these issues, and explore historical factors that might be contributing to the current complaints. Moreover, the interview component of a psychological diagnosis test allows for valuable behavioral observations that can aid in characterizing the client and identifying potential diagnostic patterns. Information and observations gleaned from the interview guide the selection of appropriate assessment instruments, the identification of corroborative informants, and the recognition of historical records that can further assist clinicians in arriving at a diagnosis. In essence, clinical interviewing is a process of exploring the presenting problem, understanding the case history, developing hypotheses to be tested, and determining suitable methods, including formal testing, to address these hypotheses.

A critical component of the assessment process, and the central focus of this discussion, is psychological testing, often referred to as a psychological diagnosis test. This involves administering one or more standardized procedures under controlled environmental conditions (e.g., quiet setting, adequate lighting) to obtain a representative sample of behavior. Formal psychological diagnosis tests may include standardized interviews, questionnaires, surveys, and tests, carefully chosen based on the individual’s specific circumstances and the assessment questions being addressed. Assessments, therefore, serve to answer specific questions through the strategic use of tests and other procedures. It is crucial to emphasize that selecting the right tests requires a deep understanding of the individual’s unique situation and relies heavily on clinical judgment. Therefore, while this article discusses various types of tests, it does not endorse the use of any specific test for any particular situation. The selection of a psychological diagnosis test is best entrusted to a qualified assessor who is thoroughly familiar with the specific context of the assessment.

To address questions about the application of psychological diagnosis tests in evaluating the presence and severity of disability due to mental disorders, this chapter offers an introductory overview of psychological testing. It is structured into three main sections: (1) types of psychological tests, (2) psychometric properties of tests, and (3) test user qualifications and test administration. While the context of disability determination is considered where relevant, the primary aim of this chapter is to serve as a foundational introduction to psychological diagnosis testing.

TYPES OF PSYCHOLOGICAL TESTS

The categorization of psychological tests is multifaceted, expanding even further when educational tests are included. In fact, distinguishing between purely psychological and educational tests can often be challenging. The following discussion will outline some key distinctions among these tests, while acknowledging that there is no single, definitive way to categorize them due to frequent overlaps. Psychological diagnosis tests can be classified based on the nature of the behavior they assess (what they measure), their administration method, their scoring procedures, and their intended use. Figure 3-1 visually represents the types of psychological measures as described in this report.

FIGURE 3-1

Components of psychological assessment. NOTE: Performance validity tests do not measure cognition, but are used in conjunction with performance-based cognitive tests to examine whether the examinee is exerting sufficient effort to perform well and responding to the best of their ability.

The Nature of Psychological Measures

One of the most fundamental distinctions in psychological diagnosis tests is whether they measure typical behavior (often non-cognitive measures) or maximal performance (often cognitive tests) (Cronbach, 1949, 1960). Measures of typical behavior are designed to understand what an individual usually does in a given situation. These measures, which assess personality, interests, values, and attitudes, are often termed non-cognitive measures. Conversely, tests of maximal performance, as the name suggests, require individuals to answer questions and solve problems to the best of their ability. Because these tests typically involve cognitive skills, they are frequently referred to as cognitive tests. Intelligence tests and other ability tests fall under the broader category of cognitive tests; “ability tests” is the more specific of the two labels. Non-cognitive measures generally do not have correct answers in the traditional sense, although in certain contexts, such as employment testing, some responses may be more desirable. In contrast, cognitive tests almost always include items with definitive correct answers. This distinction between non-cognitive measures and cognitive tests forms a crucial framework for understanding psychological diagnosis testing, particularly in the context of disability evaluation.

Within non-cognitive measures, a further distinction exists based on whether the stimuli are structured or unstructured. A structured personality test, for example, might present true-or-false questions about common behaviors or activities. These are highly structured questions with clear response options. On the other hand, some personality tests employ unstructured projective stimuli, such as inkblots or ambiguous pictures. Individuals are then asked to describe what they see or imagine in these stimuli. The underlying principle of projective measures is that ambiguous stimuli can elicit projections of an individual’s unconscious motivations and attitudes. Scoring these unstructured measures is often more complex compared to structured tests.

Cognitive tests, a significant part of the psychological diagnosis test landscape, exhibit considerable diversity in what they measure, necessitating a more detailed explanation. They are often categorized into tests of ability and tests of achievement, although this distinction is not always clear-cut. Both types of tests involve learning and reflect what an individual has learned and can do. However, achievement tests typically focus on learning acquired through specific education and training, while ability tests assess learning gained from one’s broader environment. Some areas of learning, like vocabulary, are influenced by the home and social environment as well as by formal education. Notably, vocabulary is a strong predictor of intelligence test performance, often making it the initial test in intelligence assessments or even the core component of certain intelligence tests (e.g., the Peabody Picture Vocabulary Test). Conversely, vocabulary tests can also be designed to assess words learned solely in academic settings. Intelligence tests are so prevalent in clinical psychology and neuropsychology, both core contexts for psychological diagnosis, that they are also considered neuropsychological measures. Specific abilities are often measured using subtests from intelligence tests, such as working memory tests. Standalone tests also exist for various specialized abilities.

Some ability tests are further divided into verbal and performance tests. Verbal tests rely on language for questions and answers. Performance tests, in contrast, minimize language use, often involving problem-solving tasks that do not require language. These might include manipulating objects, tracing mazes, sequencing pictures, or completing patterns. This distinction is most commonly applied to intelligence tests but can extend to other ability tests. Performance tests are particularly useful when the individual being tested is not proficient in the language of the test. Many of these tests assess visual-spatial skills. Historically, nonverbal measures were used in the United States during World War I as intelligence tests for non-English-speaking soldiers. These tests remain relevant in educational and clinical settings due to their reduced language dependence, making them valuable tools in a psychological diagnosis test battery.

Cognitive tests are also differentiated as speeded tests versus power tests. A purely speeded test is designed such that everyone could answer all questions correctly given enough time. Clerical skills tests, for example, might present paired lists of numbers where the task is to identify identical pairings. Performance is limited by speed. Pure power tests, on the other hand, are designed to assess the extent of an individual’s knowledge or ability, with ample time provided. The focus is solely on what the test-taker can do. In reality, most tests are a combination of both speed and power components. For instance, a testing company might aim for 90 percent of test-takers to complete 90 percent of the questions within the time limit. However, the purpose of testing significantly influences such guidelines. In educational settings, teachers generally want students to be able to complete tests within class time. When test-takers have disabilities that affect their response speed, accommodations like extended time are often provided, depending on the test’s purpose and the characteristics being assessed. This consideration is vital for ensuring fairness and accuracy in a psychological diagnosis test.

Questions in both achievement and ability tests can involve either recognition or free-response formats. Recognition tests, common in education and intelligence testing, typically use multiple-choice questions where the correct answer is among the options. Free-response questions, similar to fill-in-the-blanks or essay questions, require recalling or generating the answer without provided choices. This distinction also applies to some non-cognitive tests, though there the contrast is between selecting among offered options and freely recalling a response. For example, a recognition question in a non-cognitive test might ask about preference between ice skating and movies, while a free recall question might ask about preferred leisure activities.

Cognitive tests in a psychological diagnosis test can also be categorized as process or product tests. Consider mathematics tests: some only credit correct final answers (product), while others award partial credit for correct steps even with an incorrect final answer (process). Similarly, psychologists and neuropsychologists often observe not just whether a person solves problems correctly (product), but also how they approach problem-solving (process). This qualitative aspect can be highly informative in a psychological diagnosis test.

Test Administration

A crucial distinction in psychological diagnosis tests is whether they are group administered or individually administered by a trained professional (psychologist, physician, or technician). Traditionally, group-administered tests were paper-and-pencil measures. Test-takers received a test booklet and an answer sheet, marking responses on the sheet unless they had specific disabilities. Increasingly, technology is used for test administration via computers and electronic media. Computer administration may offer adaptive features, although not all computer-based tests are adaptive. Individually administered measures are given directly to the test-taker by a trained professional. These are often considered more reliable because the administrator can make real-time judgments during the testing process, influencing administration, scoring, and observations. This direct interaction allows for a more nuanced and accurate psychological diagnosis test.

Tests can be administered in an adaptive or linear fashion, regardless of whether they are computer-based or individually administered. Linear tests present questions in a predetermined order. Adaptive tests adjust the subsequent questions based on the test-taker’s performance on earlier items. Typically, correct answers or expected responses lead to more difficult questions, progressively tailoring the test to the examinee’s ability level. Conversely, incorrect answers or unexpected responses may result in easier questions. This adaptive approach enhances efficiency and precision in psychological diagnosis testing.

Psychological diagnosis tests can be administered in written (keyboard or paper-and-pencil), oral, assistive device (for individuals with motor disabilities), or performance format. Oral and performance tests are generally challenging to administer in group settings. However, advancements in electronic media are enabling the administration of such tests without direct human examiners.

Another important distinction relates to who the respondent is. In most cases, the test-taker is the respondent. However, for young children, individuals with autism, or those with language impairments, the examiner may need to gather information from others who know the individual well (parents, teachers, spouses, family members) to describe their behavior, personality, and typical patterns. This reliance on informant reports is a common adaptation in psychological diagnosis tests for specific populations.

Scoring Differences

Psychological diagnosis tests are categorized as objectively scored, subjectively scored, or sometimes a combination of both. Objectively scored tests have predetermined correct answers that are counted to generate a final score, either directly or after conversion. Scoring can be manual or automated using optical scanning, computerized software, or templates. Subjectively scored tests rely on examiner judgment. Examiner ratings and self-report interpretations are evaluated by professionals using rubrics or scoring systems to convert responses into scores, which may be numerical or qualitative. Subjective scoring often includes both quantitative and qualitative summaries or narrative descriptions of an individual’s performance, adding depth to the psychological diagnosis test.

Test scores are frequently classified as norm-referenced (or normative) or criterion-referenced. Norm-referenced cognitive measures, such as college and graduate school admissions tests, compare a test-taker’s performance to that of others in a defined norm group. For example, a percentile score indicates the percentage of the norm group that scored below a particular individual. Intelligence tests and most other ability tests are norm-referenced. In recent years, there has been increasing emphasis on criterion-referenced tests, particularly in education (Hambleton and Pitoniak, 2006). Criterion-referenced tests compare an individual’s score to a fixed standard or criterion, rather than to other test-takers. High school graduation tests, licensure exams, and competency tests are examples of criterion-referenced measures. Driving tests, for instance, are criterion-referenced; the outcome is pass or fail based on meeting a set standard, not on comparison to other drivers. This distinction is important in understanding how results from a psychological diagnosis test are interpreted.

Test Content

As previously highlighted, a primary distinction among psychological diagnosis tests is whether they assess cognitive or non-cognitive qualities. In clinical psychological and neuropsychological settings, common cognitive tests include intelligence tests, neuropsychological measures, and performance validity measures. Many tests used in these settings by professionals assess specific functions like memory or problem-solving. Performance validity measures are brief assessments used to evaluate whether the examinee is putting forth sufficient effort and responding to the best of their ability. These are often integrated within broader assessments. Common non-cognitive measures include personality tests and symptom validity measures. Some personality tests, such as the Minnesota Multiphasic Personality Inventory (MMPI), assess the extent to which an individual exhibits behaviors considered atypical compared to a norming sample. Other personality tests are more focused on providing insights about the client to therapists. Symptom validity measures, similar to performance validity measures, are used to check if an individual is presenting themselves honestly and truthfully. Bridging the cognitive and non-cognitive domains are measures of adaptive functioning, which often incorporate both cognitive and non-cognitive components, providing a holistic view in a psychological diagnosis test.

PSYCHOMETRICS: EXAMINING THE PROPERTIES OF TEST SCORES

Psychometrics is the scientific discipline devoted to the development, interpretation, and evaluation of psychological diagnosis tests and measures. It focuses on assessing variability in behavior and linking this variability to psychological phenomena. Evaluating the quality of psychological measures traditionally centers on test reliability (consistency), validity (accuracy of interpretations and uses), and fairness (equivalence of use across different groups). This section offers a general overview of these concepts, providing a foundation for subsequent discussions. Given the implications of using psychological measures with individuals from diverse racial and ethnic backgrounds, issues of equivalence and fairness in psychological testing are also addressed. These psychometric properties are crucial for ensuring the integrity and usefulness of a psychological diagnosis test.

Reliability

Reliability in psychological diagnosis tests refers to the stability and consistency of test scores. Unreliable measures yield scores that do not accurately reflect the true value of the psychological variable being measured. Observed test scores are understood to be composed of both true and error components. The standard error of measurement quantifies the degree of error, indicating a confidence interval (e.g., 95 percent) within which a person’s true score is likely to fall, acknowledging that obtained scores are estimates of true scores (Geisinger, 2013).
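The relationship between reliability and the standard error of measurement lends itself to a short worked example. The Python sketch below applies the standard formula SEM = SD × √(1 − reliability) and builds an approximate 95 percent confidence interval around an observed score; the function names and the IQ-style numbers are purely illustrative, not drawn from any particular test manual.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def true_score_interval(observed: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Approximate 95 percent confidence interval for the true score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Illustrative IQ-style scale: SD 15, reliability .91 -> SEM = 4.5,
# so an observed score of 110 brackets the true score at roughly 101-119.
sem = standard_error_of_measurement(15, 0.91)
low, high = true_score_interval(110, 15, 0.91)
```

On these illustrative numbers the interval runs from about 101 to 119, which is why test manuals commonly report score bands rather than single-point scores.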

Reliability is typically assessed in four primary ways:

  1. Test-retest reliability: Measures the consistency of scores over time, reflecting stability and temporal consistency.
  2. Inter-rater reliability: Assesses the consistency of scores across different independent judges or raters.
  3. Parallel or alternate forms reliability: Evaluates the consistency of scores across different versions of the same test, indicating both stability and equivalence.
  4. Internal consistency reliability: Examines the consistency of different items within a test that are intended to measure the same construct, reflecting homogeneity. A specific case of internal consistency is split-half reliability, where scores from two halves of a test are compared and converted into a reliability index.

Several factors can influence the reliability of psychological diagnosis test scores. These include the time interval between test administrations (affecting test-retest and alternate-forms reliability) and the similarity of content and of subjects’ expectations across alternate-forms, split-half, and internal consistency approaches. Additionally, changes in subjects over time due to physical health, emotional state, or environment, as well as test-related factors like unclear instructions, subjective scoring, and guessing, can all impact test reliability. It is important to recognize that a test might produce reliable scores in one context but not in another, and different reliability estimates are not interchangeable (Geisinger, 2013). Therefore, understanding reliability is paramount in choosing and interpreting a psychological diagnosis test.

Validity

While reliability is essential, reliable scores from a psychological diagnosis test do not automatically guarantee validity. Validity is defined as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (AERA et al., 2014, p. 11). It is crucial to understand that validity applies not to the test itself, but to the interpretation and use of the test scores. For an interpretation to be considered valid, it must be firmly rooted in psychological theory and empirical evidence demonstrating a clear relationship between the test and what it is intended to measure (Furr and Bacharach, 2013; Sireci and Sukin, 2013). Traditionally, psychology and education have identified three primary types of validity evidence (Sattler, 2014; Sireci and Sukin, 2013):

  1. Construct validity evidence: The extent to which test scores align with the theoretical construct the test is designed to measure. This involves demonstrating high correlation with theoretically similar measures and low correlation with theoretically dissimilar measures.
  2. Content validity evidence: The degree to which the test content adequately represents the subject matter and supports the test’s intended use.
  3. Criterion-related validity evidence: The degree to which test scores correlate with other measurable, reliable, and relevant variables (criteria) that are thought to measure the same construct.

Other types of validity relevant to specific contexts, such as disability assessment, have been proposed but are not universally accepted as distinct types of validity. These include:

  1. Diagnostic validity: The extent to which psychological diagnosis tests effectively aid in formulating accurate diagnoses.
  2. Ecological validity: The degree to which test scores reflect real-world functioning (e.g., the impact of disability on daily life).
  3. Cultural validity: The extent to which test content and procedures accurately reflect the sociocultural context of the individuals being tested.

Each of these validity types raises complex considerations for using psychological diagnosis tests, especially in contexts like Social Security Administration (SSA) evaluations. Ecological validity is particularly critical in SSA assessments, where the focus is on everyday functioning. Intelligence tests, for instance, have sometimes been criticized for lacking ecological validity (Groth-Marnat, 2009; Groth-Marnat and Teal, 2000). However, research suggests that many neuropsychological tests do show a moderate level of ecological validity in predicting everyday cognitive functioning (Chaytor and Schmitter-Edgecombe, 2003, p. 181).

Current discussions on validity are increasingly adopting an argument-based approach, using diverse evidence to build a comprehensive case for the validity of test score interpretations (Furr and Bacharach, 2013). In this framework, construct validity is seen as an overarching concept, encompassing evidence from multiple sources to support the validity of score interpretations. Five key sources of validity evidence are generally considered (AERA et al., 2014; Furr and Bacharach, 2013; Sireci and Sukin, 2013):

  1. Test content: Does the test content adequately cover the important aspects of the construct being measured? Are the test items relevant, appropriate, and aligned with the purpose of the test?
  2. Relation to other variables: Is there a demonstrable relationship between test scores and other criteria or constructs that are expected to be related?
  3. Internal structure: Does the actual structure of the test align with the theoretically expected structure of the construct?
  4. Response processes: Are test-takers engaging the theoretical constructs or processes that the test is designed to measure?
  5. Consequences of testing: What are the intended and unintended consequences of using the psychological diagnosis test?

Standardization and Testing Norms

Standardization is a fundamental aspect of developing any psychometrically sound psychological diagnosis test. It involves establishing and clearly defining explicit methods and procedures for test administration. Standardized administration typically includes: (1) a quiet, distraction-free environment, (2) precise adherence to scripted instructions, and (3) provision of necessary materials or stimuli. These standardized procedures are used when collecting normative data and should be followed in all subsequent administrations to ensure the applicability of normative data to individual evaluations (Lezak et al., 2012).

Standardized psychological diagnosis tests provide normative data (norms), which are scores derived from representative groups for whom the test is designed (the norm group or designated population). These norms allow for comparison of an individual’s performance to the designated population, using transformed scores like percentiles, cumulative percentiles, and standard scores (e.g., T-scores, Z-scores, stanines, IQs). Without standardized administration, an individual’s test performance may not accurately reflect their true ability. For instance, abilities might be overestimated if the examiner provides unauthorized assistance, or underestimated if proper instructions are not given. When nonstandardized administration is unavoidable, norms should be used cautiously due to potential systematic errors.
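Given a norm group’s mean and standard deviation, the transformed scores mentioned above are straightforward arithmetic. The Python sketch below converts a raw score to a z-score, a T-score (mean 50, SD 10), a deviation-IQ-style standard score (mean 100, SD 15), and, under the assumption that the norms are approximately normal, a percentile; the function name is hypothetical.

```python
from statistics import NormalDist

def to_standard_scores(raw: float, norm_mean: float, norm_sd: float):
    """Convert a raw score to common norm-referenced metrics."""
    z = (raw - norm_mean) / norm_sd
    t = 50 + 10 * z                          # T-score
    iq_style = 100 + 15 * z                  # deviation-IQ metric
    percentile = NormalDist().cdf(z) * 100   # assumes roughly normal norms
    return z, t, iq_style, percentile
```

A raw score one standard deviation above the norm-group mean thus lands at z = 1, T = 60, 115 on an IQ-style scale, and roughly the 84th percentile.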

Understanding the intended population for a psychological diagnosis test is crucial. The standardization sample, or norm group, is key to meaningful score interpretation and prediction. Appropriate norms depend on the sample size and representativeness. Larger, more representative norm groups provide better approximations of the population distribution.

Norms should be based on representative samples from the intended test population, ensuring equal opportunity for all individuals to be included in the standardization sample. Stratified sampling allows test developers to account for demographic characteristics in the population, closely mirroring these proportions in the norm sample. For example, intelligence test norms often use census-based norming, proportionally representing demographics like race, ethnicity, parental education, socioeconomic status, and geographic region.
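At its simplest, census-based norming allocates the norm sample proportionally: each stratum receives a share of the sample equal to its share of the population. The minimal Python sketch below illustrates only that proportional step, ignoring real-world complications such as multi-way stratification and reconciling rounding drift across strata; the function name and the region shares are hypothetical.

```python
def proportional_allocation(sample_size: int,
                            population_shares: dict[str, float]) -> dict[str, int]:
    """Cases to recruit per stratum, proportional to population share."""
    return {stratum: round(sample_size * share)
            for stratum, share in population_shares.items()}

# Hypothetical region shares for a 2,000-person norm sample.
allocation = proportional_allocation(
    2000, {"Northeast": 0.17, "Midwest": 0.21, "South": 0.38, "West": 0.24})
```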

Applying psychological diagnosis tests to individuals outside the intended population can lead to inaccurate scores and misinterpretations. Testing individuals with disabilities often presents complex issues. Test users sometimes employ tests not developed or normed for disabled populations. It is crucial that tests used with such individuals (including SSA disability claimants) have representative norming samples. When such norms are unavailable, assessors must acknowledge this limitation and its potential impact on interpretation (Turner et al., 2001).

Test Fairness in High-Stakes Testing Decisions

Performance on psychological diagnosis tests often carries significant consequences (high stakes), influencing educational, occupational, and SSA disability determinations. These consequences can be positive or negative, intended or unintended. Therefore, test fairness is paramount to ensure no individual or group is unfairly disadvantaged due to factors unrelated to the measured abilities. Bias must be eliminated from professional assessments, and research must demonstrate fair and equivalent use across diverse population subgroups. It is important to acknowledge that for many linguistic and cultural groups, appropriately normed tests are lacking. In such cases, assessors must explicitly state this limitation and its potential impact on scores and interpretations.

While all tests are influenced by cultural context (cultural loading), bias refers to systematic error in measuring a psychological construct. Bias leads to inaccurate results, either overestimating or underestimating the true measure. Cultural test bias occurs when bias is linked to culturally related variables (e.g., race, ethnicity, social class, gender, education).

Key considerations for test fairness in psychological diagnosis tests relate to equivalence (Suzuki et al., 2014, p. 260):

  1. Functional equivalence: Does the construct being measured occur with equal frequency across different groups?
  2. Conceptual equivalence: Is the item content equally familiar and meaningful across groups?
  3. Scalar equivalence: Do average score differences reflect the same degree, intensity, or magnitude across cultural groups?
  4. Linguistic equivalence: Does the language used have similar meaning across groups?
  5. Metric equivalence: Does the scale measure the same behavioral qualities, and does it have similar psychometric properties across cultures?

Establishing that a psychological diagnosis test functions appropriately across various cultural contexts is essential. Test developers address equivalence through methods like:

  • Expert panel reviews: Professionals evaluate item content for potential biases.
  • Differential item functioning (DIF) analysis: Examining whether items function differently across groups.
  • Statistical comparisons of psychometric properties: Comparing reliability coefficients across different populations.
  • Factor analysis and structural equation modeling: Assessing the similarity and differences in construct structure and measurement invariance.
  • Analysis of mean score differences: Considering score spread within and between racial and ethnic groups.

Cultural equivalence refers to whether “interpretations of psychological measurements, assessments, and observations are similar if not equal across different ethnocultural populations” (Trimble, 2010, p. 316). It is a higher-order concept dependent on meeting specific criteria to ensure appropriate use across cultural groups beyond the original development population. Trimble (2010) notes that numerous types of equivalence influence interpretive and procedural practices in establishing cultural equivalence for a psychological diagnosis test.

Item Response Theory and Tests

Classical test theory, dominant for much of the 20th century, posited that every observed test score consists of a true score plus error. True score is a hypothetical value representing a person’s actual score without error. Error is assumed to be random, with no correlation to true scores or other variables (Geisinger, 2013). This theory relies heavily on reliability as its central index of measurement quality.

Since the mid-20th century, item response theory (IRT) has emerged as a mathematically sophisticated alternative. IRT models, particularly relevant to cognitive tests, assume that item response depends on item difficulty and test-taker ability. Computer-adaptive testing uses IRT to estimate test-taker ability after each response, adjusting subsequent item difficulty. Correct answers lead to harder questions, incorrect answers to easier ones, efficiently tailoring the test. This is particularly useful in psychological diagnosis tests administered via computer.
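The adaptive cycle just described can be caricatured in a few lines. The toy Python sketch below administers whichever remaining item has the difficulty closest to the current ability estimate and nudges the estimate up or down by a fixed step after each response; a real computer-adaptive engine would instead re-estimate ability with an IRT model (e.g., by maximum likelihood) after every answer. All names here are hypothetical.

```python
def run_adaptive_sketch(item_bank, responses, start_theta=0.0, step=0.5):
    """Toy adaptive loop: match item difficulty (b) to the running
    ability estimate (theta); raise theta after a correct response,
    lower it after an incorrect one."""
    theta = start_theta
    remaining = list(item_bank)
    administered = []
    for correct in responses:
        if not remaining:
            break
        item = min(remaining, key=lambda it: abs(it["b"] - theta))
        remaining.remove(item)
        administered.append(item["id"])
        theta += step if correct else -step
    return theta, administered
```

Starting from theta = 0, two correct answers walk the examinee from a medium item to a harder one, which is exactly the progressive tailoring the text describes.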

IRT models simplify test form equating, allowing different test forms with varying item difficulties to yield comparable scores. Equating relies on anchor items, common items across test forms, to establish a fixed reference and compare scores across groups.
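One simple way to link two forms through anchor items is the mean-sigma method: difficulty estimates for the shared items on Form B are mapped onto Form A's scale by a linear transformation. The sketch below uses invented difficulty values purely for illustration.

```python
# Hedged sketch of mean-sigma linking via anchor items.
# All difficulty estimates below are invented.
import statistics

form_a_anchors = [-1.0, -0.2, 0.4, 1.2]   # anchor difficulties on Form A's scale
form_b_anchors = [-0.6, 0.2, 0.8, 1.6]    # same items as estimated on Form B

# Slope and intercept that place Form B's scale onto Form A's.
slope = statistics.stdev(form_a_anchors) / statistics.stdev(form_b_anchors)
intercept = statistics.mean(form_a_anchors) - slope * statistics.mean(form_b_anchors)

def to_form_a_scale(b_value):
    return slope * b_value + intercept

print(round(to_form_a_scale(0.5), 2))  # → 0.1
```

Once every Form B parameter is expressed on Form A's scale, scores from the two forms can be compared directly even though the forms share only the anchor items.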

Common IRT models include one-, two-, and three-parameter models. The one-parameter model (Rasch model) focuses solely on item difficulty. The two-parameter model adds item discrimination, the item’s ability to differentiate between those with high and low ability. This is useful for tests like essay exams. The three-parameter model includes a pseudo-guessing parameter, accounting for chance-level correct responses, and is used in large-scale multiple-choice testing.
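The three models differ only in which item parameters they free, which is easy to see from the three-parameter logistic (3PL) item characteristic curve: fixing the pseudo-guessing parameter c at 0 gives the 2PL, and additionally fixing discrimination a at 1 gives the one-parameter (Rasch) form. The parameter values below are illustrative.

```python
# The 3PL item characteristic curve; 2PL and Rasch are special cases.
import math

def three_pl(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | ability theta), with discrimination a, difficulty b,
    and pseudo-guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Rasch (1PL): ability equal to difficulty gives a 50/50 chance.
print(round(three_pl(0.0, b=0.0), 2))  # → 0.5

# 3PL: even very low ability yields chance-level success when c = 0.25,
# as with four-option multiple-choice items.
print(round(three_pl(-6.0, a=1.5, b=0.0, c=0.25), 2))  # → 0.25
```

The pseudo-guessing floor is what makes the 3PL the usual choice for large-scale multiple-choice testing, where a blind guess still succeeds at a predictable rate.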

IRT models, less dependent on test-taker sampling, are valuable for test equating, ensuring score comparability across different test forms. High-stakes admissions tests like GRE, MCAT, and GMAT use IRT for scoring and equating, offering greater efficiency and accuracy than classical methods in ensuring the reliability of psychological diagnosis tests.

TEST USER QUALIFICATIONS

The test user is responsible for the appropriate use of psychological diagnosis tests, including selection, administration, interpretation, and application of results (AERA et al., 2014). Test user qualifications involve considering training levels, educational degrees, knowledge in assessment domains (ethics, administration, scoring, interpretation), certifications, licensure, and professional memberships. Psychometric knowledge and skills, along with training in responsible test use (ethics), are essential. This includes understanding descriptive statistics, reliability, validity, normative interpretation, test selection, and administration procedures. Guidelines also emphasize the importance of understanding the impact of ethnic, racial, cultural, gender, age, educational, and linguistic factors in test selection and use (Turner et al., 2001).

Test publishers provide detailed manuals outlining the construct being assessed, the norming sample, reading level, administration time, and scoring and interpretation guidelines. Instructions to examinees are to be read verbatim, and sample responses often aid in determining correct answers or assigning points. Ethical and legal knowledge regarding assessment competencies, confidentiality, test security, and test-taker rights is imperative. Resources like the Mental Measurements Yearbook (MMY) offer descriptive information and reviews of commercially available psychological diagnosis tests, promoting informed test selection (Buros, 2015). Inclusion in MMY requires sufficient documentation of psychometric quality (validity, reliability, norming).

Test Administration and Interpretation

Following the Standards for Educational and Psychological Testing (AERA et al., 2014) and APA guidelines (Turner et al., 2001), many test publishers use tiered qualification levels (A, B, C) for purchasing, administering, and interpreting psychological diagnosis tests (e.g., PAR, 2015; Pearson Education, 2015). Many instruments discussed in this report are level C, requiring advanced degrees, specialized psychometric knowledge, and formal training in administration, scoring, and interpretation. Level B tests may require a bachelor’s or master’s degree and specialized training. Level A tests have minimal requirements. Individual test manuals provide specific qualifications.

Standardized procedures are crucial: administrators of cognitive and neuropsychological measures must be well trained in standardized protocols. They also need the interpersonal skills to build rapport and encourage maximal effort. An understanding of psychometric properties such as validity and reliability, and of the ways testing conditions can threaten them, is essential. Doctoral-level psychologists are typically trained in test administration. Neuropsychologists may be needed for evaluating cognitive deficits (Chapter 5). Non-doctoral-level psychometrists or technicians are commonly used for administration and scoring under doctoral-level supervision (APA, 2010; Brandt and van Gorp, 1999; Pearson Education, 2015).

Interpreting psychological diagnosis test results demands more clinical training than administration. Understanding test construction and validity is crucial for interpreting self-report measures. Interpreting results without this knowledge violates professional ethics (APA, 2010). SSA requires tests to be “individually administered by a qualified specialist … licensed or certified to administer, score, and interpret psychological tests and have the training and experience” (SSA, n.d.). Doctoral-level clinical psychologists trained in psychometric testing are generally qualified for interpretation. For cognitive or neuropsychological evaluations, SSA requires “proper training in this area of neuroscience.” Clinical neuropsychologists, trained in brain-behavior relationships and meeting specific professional benchmarks (AACN, 2007; NAN, 2001), may be necessary for interpreting cognitive tests, ensuring accurate psychological diagnosis test outcomes.

Use of Interpreters and Other Nonstandardized Test Administration Techniques

Modifying standard procedures, such as by using interpreters or other nonstandardized administration techniques, can introduce systematic errors into psychological diagnosis testing. Errors can stem from language issues, from the use of a translator, or from the examinee’s sensory, perceptual, or motor abilities. Interpreters may mistranslate items or responses, producing inaccurate scores, and are therefore a less preferred option; assessors should be familiar with both the language and the culture of the individual in order to interpret results or to judge whether a test is appropriate at all. Test adaptation is a growing industry, with tests developed in English being adapted for use in other countries, a process requiring both linguistic and cultural expertise (ITC, 2005).

For sensory, perceptual, or motor disabilities, modifications may alter the construct being measured. In both scenarios, scores may lack a relevant normative group for accurate interpretation. While detailed discussion is beyond this scope, nonstandardized administration necessitates acknowledging potential errors in conclusions drawn from the psychological diagnosis test.

PSYCHOLOGICAL TESTING IN THE CONTEXT OF DISABILITY DETERMINATIONS

As noted in Chapter 2, SSA considers objective medical evidence to include standardized psychological diagnosis test results. Objectivity varies among psychological tests, largely depending on scoring processes. Unstructured measures with open-ended responses are less objective due to reliance on professional judgment. Standardized tests, like those discussed, are structured and objectively scored. Non-cognitive self-report measures involve predetermined answer choices. Cognitive tests have correct answers and provide normative data for comparison. Standardized tests rely less on clinical judgment and are more objective. Unlike direct measurements like weight, psychological diagnosis tests require individual cooperation. Validity testing, discussed further in Chapters 4 and 5, enhances confidence in test results. Appropriately administered and interpreted standardized tests are considered objective evidence.

The use of psychological diagnosis tests in disability determinations has critical implications. Ecological validity, whether test performance reflects real-world behavior, is paramount in SSA evaluations. Two aspects of ecological validity in neuropsychological assessment are: (1) verisimilitude, how well the test captures everyday cognitive skills to identify those with real-world task difficulties, and (2) veridicality, how well test performance predicts real-world functioning (Chaytor and Schmitter-Edgecombe, 2003, pp. 182–183). Establishing ecological validity is complex due to non-cognitive factors (emotional, physical, environmental) influencing test and daily performance. Test environment artificiality, behavior sampling limitations, and compensatory strategies not usable in testing can lead to underestimation of abilities.

Activities of daily living (ADLs) and return-to-work likelihood are important in disability determinations. Occupational status is complex and requires supplementing psychological diagnosis test data with observations, informant ratings, and environmental assessments (Chaytor and Schmitter-Edgecombe, 2003). Table 3-1 outlines major mental disorders, relevant test types, and functioning domains.

TABLE 3-1

Listings for Mental Disorders and Types of Psychological Tests.

Disability determination hinges on a medically determinable impairment and associated functional limitations. SSA’s five-step process evaluates impairments against the Listing of Impairments at Step 3, considering symptoms, signs, and lab findings (Paragraph A criteria) and functional limitations (Paragraph B criteria). Meeting or equaling listing criteria leads to claim allowance. Otherwise, residual functional capacity, including mental residual functional capacity, is assessed for past work (Step 4) or any work in the national economy (Step 5).

SSA assesses functioning in four domains: understanding and memory, sustained concentration and persistence, social interaction, and adaptation. Psychological diagnosis testing is crucial in understanding functioning in these areas. Box 3-1 describes ecological assessment of these core areas. Psychological assessments provide structured evaluation through interviews, standardized measures, checklists, observations, and other procedures.

BOX 3-1

Descriptions of Tests by Four Areas of Core Mental Residual Functional Capacity. Sample items include:

  • Remember locations and work-like procedures
  • Understand and remember very short and simple instructions

This chapter has covered foundational aspects of psychological diagnosis tests, including psychometric principles and fairness. Test applications inform disability determinations. Chapters 4 and 5 build on this, examining useful test types, including validity measures. Chapter 4 focuses on non-cognitive, self-report measures and symptom validity tests. Chapter 5 covers cognitive tests and performance validity tests. Strengths and limitations are discussed to explore test relevance for different claims and disorder categories, emphasizing validity of claims in psychological diagnosis testing.

REFERENCES

  • AACN (American Academy of Clinical Neuropsychology). AACN practice guidelines for neuropsychological assessment and consultation. The Clinical Neuropsychologist. 2007;21(2):209–231. [PubMed: 17455014]
  • AERA (American Educational Research Association), APA (American Psychological Association), and NCME (National Council on Measurement in Education). Standards for educational and psychological testing. Washington, DC: AERA; 2014.
  • APA. Ethical principles of psychologists and code of conduct. 2010. [March 9, 2015]. http://www.apa.org/ethics/code.
  • Brandt J, van Gorp W. American Academy of Clinical Neuropsychology policy on the use of non-doctoral-level personnel in conducting clinical neuropsychological evaluations. The Clinical Neuropsychologist. 1999;13(4):385.
  • Buros Center for Testing. Test reviews and information. 2015. [March 19, 2015]. http://buros.org/test-reviews-information.
  • Chaytor N, Schmitter-Edgecombe M. The ecological validity of neuropsychological tests: A review of the literature on everyday cognitive skills. Neuropsychology Review. 2003;13(4):181–197. [PubMed: 15000225]
  • Cronbach LJ. Essentials of psychological testing. New York: Harper; 1949.
  • Cronbach LJ. Essentials of psychological testing. 2nd. Oxford, England: Harper; 1960.
  • De Ayala RJ. Theory and practice of item response theory. New York: Guilford Publications; 2009.
  • DeMars C. Item response theory. New York: Oxford University Press; 2010.
  • Furr RM, Bacharach VR. Psychometrics: An introduction. Thousand Oaks, CA: Sage Publications, Inc.; 2013.
  • Geisinger KF. Reliability. In: Geisinger KF, Bracken BA, Carlson JF, Hansen JC, Kuncel NR, Reise SP, Rodriguez MC, editors. APA handbook of testing and assessment in psychology. Vol. 1. Washington, DC: APA; 2013.
  • Groth-Marnat G. Handbook of psychological assessment. Hoboken, NJ: John Wiley & Sons; 2009.
  • Groth-Marnat G, Teal M. Block design as a measure of everyday spatial ability: A study of ecological validity. Perceptual and Motor Skills. 2000;90(2):522–526. [PubMed: 10833749]
  • Hambleton RK, Pitoniak MJ. Setting performance standards. Educational Measurement. 2006;4:433–470.
  • ITC (International Test Commission). ITC guidelines for translating and adapting tests. Geneva, Switzerland: ITC; 2005.
  • Lezak M, Howieson D, Bigler E, Tranel D. Neuropsychological assessment. 5th. New York: Oxford University Press; 2012.
  • NAN (National Academy of Neuropsychology). NAN definition of a clinical neuropsychologist: Official position of the National Academy of Neuropsychology. 2001. [November 25, 2014]. https://www.nanonline.org/docs/PAIC/PDFs/NANPositionDefNeuro.pdf.
  • PAR (Psychological Assessment Resources). Qualifications levels. 2015. [January 5, 2015]. http://www4.parinc.com/Supp/Qualifications.aspx.
  • Pearson Education. Qualifications policy. 2015. [January 5, 2015]. http://www.pearsonclinical.com/psychology/qualifications.html.
  • Sattler JM. Foundations of behavioral, social, and clinical assessment of children. 6th. La Mesa, CA: Jerome M. Sattler, Publisher, Inc.; 2014.
  • Sireci SG, Sukin T. Test validity. In: Geisinger KF, Bracken BA, Carlson JF, Hansen JC, Kuncel NR, Reise SP, Rodriguez MC, editors. APA handbook of testing and assessment in psychology. Vol. 1. Washington, DC: APA; 2013.
  • SSA (Social Security Administration). Disability evaluation under social security—Part III: Listing of impairments—Adult listings (Part A)—section 12.00 mental disorders. n.d. [November 14, 2014]. http://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm.
  • Suzuki LA, Naqvi S, Hill JS. Assessing intelligence in a cultural context. In: Leong FTL, Comas-Diaz L, Nagayama Hall GC, McLoyd VC, Trimble JE, editors. APA handbook of multicultural psychology. Vol. 1. Washington, DC: APA; 2014.
  • Trimble JE. Cultural measurement equivalence. In: Encyclopedia of cross-cultural school psychology. New York: Springer; 2010. pp. 316–318.
  • Turner SM, DeMers ST, Fox HR, Reed G. APA’s guidelines for test user qualifications: An executive summary. American Psychologist. 2001;56(12):1099.
  • Weiner IB. The assessment process. In: Weiner IB, editor. Handbook of psychology. Hoboken, NJ: John Wiley & Sons; 2003.

Footnotes

1This may be in comparison to a nationally representative norming sample, or with certain tests or measures, such as the MMPI, particular clinically diagnostic samples.

2The brief overview presented here draws on the works of De Ayala (2009) and DeMars (2010), to which the reader is directed for additional information.
