Many psychologists use labels such as achievement test, aptitude test, and ability test imprecisely, and nonpsychologists use them as synonyms. This lack of precision is understandable because in actual practice, tests bearing these labels often appear to be quite similar and are used for similar purposes. This article explains the theoretical distinction among achievement, aptitude, and ability tests; describes the primary uses of these tests; and provides a brief overview of the types of subscales widely used in these tests and the constructs they measure.
Achievement tests are designed to assess the extent to which a person has developed a specific motor skill or learned a specific body of knowledge. Typically, an achievement test is administered following a period of instruction designed to teach the motor or cognitive skill to be examined. The prototypical achievement test is the periodic classroom exam that is administered to determine how much the student has learned. Other examples include the written and driving tests taken to secure a driver’s license, the Scholastic Assessment Test (SAT) and American College Test (ACT) taken by high school students contemplating college, and the Graduate Record Examinations (GRE) taken by college students who want to go to graduate school.
Theoretically, the purpose of the achievement test is descriptive: to measure the extent to which an examinee has mastered a motor skill or area of knowledge. In practice, however, achievement test results often are interpreted as an indicator of future performance. For example, while achievement tests such as the SAT and GRE evaluate the knowledge examinees have accrued as a result of their educational experiences, scores on those tests are used to predict the likelihood of success in more advanced and challenging programs of study. This common practice confounds the performance assessment (i.e., descriptive) function of achievement tests with the prediction goals of aptitude tests.
Many achievement and aptitude tests are very similar in appearance, but the primary purpose of aptitude tests is prediction. They are designed to obtain information that can be used in predicting some aspect of the person’s future behavior. Aptitude tests assess the examinee’s ability to learn both cognitive and motor skills. Often, scores on a broadly based test of verbal comprehension are used to predict the examinee’s potential to learn (and use) new cognitive skills. In fact, the most common use of aptitude tests is to predict future performance in an educational program or occupational setting. However, some aptitude tests measure motor skills (e.g., eye-hand coordination or the time it takes to run a 40-yard dash). Scores on aptitude tests such as these are used to predict the examinee’s ability to learn (and use) desirable motor skills.
The distinction between aptitude and ability tests is subtle, and many psychologists and test publishers use the terms interchangeably. In general, however, ability tests assess cognitive and motor skill sets that have been acquired over a long period of time and that are not attributable to any specific program of instruction. For example, intelligence tests such as the Wechsler Adult Intelligence Scale, Third Edition (WAIS-III) and the Stanford-Binet Intelligence Scales, Fifth Edition (SB5) measure verbal comprehension, working memory, perceptual organization, and processing speed. These abilities are not the result of any specific program of instruction. Instead, they are believed to be a function of the person’s native ability to learn from life experiences. Ability tests are descriptive in that they assess people’s knowledge and skills, but they are also predictive because they measure qualities that are presumed to influence the person’s ability to learn new skills and to solve novel problems.
In summary, psychologists distinguish among achievement, aptitude, and ability tests at a theoretical level. Achievement tests describe people’s present status, aptitude tests predict their future behavior, and ability tests assess their innate potential. In practice, however, achievement, aptitude, and ability tests are often similar in form and used for similar purposes.
Common Variations among Tests
Psychologists have created such a variety of tests that even developing a system to classify them is challenging. Most tests measure cognitive aptitudes (e.g., the Kendrick Cognitive Tests for the Elderly and the Peabody Individual Achievement Test), but many tests also measure motor skills (e.g., the O’Connor Finger Dexterity Test and the USES General Aptitude Test Battery [GATB]). Most tests require the use of verbal and reading abilities (e.g., the Multidimensional Aptitude Battery [MAB] and the Differential Aptitude Test [DAT]), but a few use nonverbal means of measuring aptitudes (e.g., Tests of Non-Verbal Intelligence, Second Edition and the Peabody Picture Vocabulary Test). In addition, tests differ in the number of aptitudes they measure, their standardization, and the manner of administration.
Number of Aptitudes
Some tests measure a single aptitude (e.g., the Electrical and Electronics Test and the Personnel Assessment Selection System) but many measure multiple aptitudes (e.g., the Armed Services Vocational Aptitude Battery [ASVAB] and the Ball Aptitude Battery [BAB]). Both approaches have advantages.
Multi-aptitude batteries obtain information about a broad array of cognitive and motor skills and allow comparisons of the examinee’s relative strengths and weaknesses. These instruments are useful when individuals or organizations are seeking information to guide vocational and educational decisions. Although numerous multi-aptitude and multi-ability test batteries exist, all generally measure a relatively standard set of constructs.
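The comparison of relative strengths and weaknesses can be illustrated with a minimal Python sketch. It assumes that all of the examinee's scores are already expressed on one common standard-score scale; the function name and the scores are hypothetical and are not taken from any published battery.

```python
from statistics import mean

def relative_profile(standard_scores):
    """Return each score's deviation from the examinee's own average.

    Assumes every score is on the same standard-score scale, so the
    deviations can be compared with one another.
    """
    personal_mean = mean(standard_scores.values())
    return {name: score - personal_mean
            for name, score in standard_scores.items()}

# Hypothetical scores for one examinee on three aptitudes.
profile = relative_profile({"verbal": 115, "numerical": 100, "spatial": 85})
strongest = max(profile, key=profile.get)   # aptitude furthest above the personal mean
weakest = min(profile, key=profile.get)     # aptitude furthest below the personal mean
print(profile, strongest, weakest)
```

A profile of this kind describes the examinee relative to himself or herself; it says nothing about standing relative to other people, which requires a norm group.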
More specialized instruments that measure a single construct are useful when more focused predictions are desirable. Obtaining information about constructs like creative and artistic potential often requires the use of specialized instruments designed for that specific purpose. Furthermore, a single aptitude instrument designed to measure reading, math, spatial, or mechanical skills may measure those skills more precisely than a multi-aptitude battery.
Standardization
Standardized tests are those that have been administered to a group of people (referred to as the norm group) to obtain information about the likelihood of each possible score on the test. Comparing the score of an examinee to the scores obtained by the people composing the norm group allows psychologists to interpret the score. Scores on standardized tests typically are reported in terms of a standard score, an age equivalent score, or a grade equivalent score.
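As a rough illustration of how a norm group gives meaning to a raw score, the following Python sketch converts a raw score to a standard score and a percentile rank. The norm-group data are invented, the scale with a mean of 100 and a standard deviation of 15 is only one common convention, and the use of the population standard deviation is an assumption; published tests rely on much larger norm samples and their own conversion tables.

```python
from statistics import mean, pstdev

def standard_score(raw, norm_scores, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score on a standard scale relative to a norm group."""
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # population SD of the norm group (an assumption)
    if sigma == 0:
        raise ValueError("norm group scores have no spread")
    z = (raw - mu) / sigma
    return scale_mean + scale_sd * z

def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring below raw, counting ties as half."""
    below = sum(s < raw for s in norm_scores)
    ties = sum(s == raw for s in norm_scores)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)

# Hypothetical raw scores obtained by a (very small) norm group.
norm_group = [40, 45, 50, 55, 60]
print(standard_score(50, norm_group))   # a score at the norm-group mean
print(percentile_rank(50, norm_group))
```

Age equivalent and grade equivalent scores follow the same logic, except that the raw score is matched to the age or grade whose norm group typically obtains it.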
It is critically important that the norm group used to interpret an examinee’s score provides a meaningful basis for comparison. For example, it would be misleading to interpret a high school student’s achievement test score using a norm group composed of middle school students. The problem in this example is obvious, but more subtle problems have been addressed only recently. For example, comparing a female examinee’s score to the scores obtained by a norm group composed exclusively of males yields a questionable interpretation in many instances, yet this practice was standard not too many years ago.
For this reason, many tests have more than one norm group. A test used with elementary school children, for example, might have a norm group composed of first-grade students, a second norm group composed of second-grade students, and so on up to a norm group composed of sixth graders. In addition, separate norm groups for girls and boys might be available for each grade level. Additional examples of the types of norm groups that could be developed for a test include female college graduates, successful carpenters, African American lawyers, and enlisted males.
Accurate normative interpretation of a test is not possible without a relevant norm group, but the development of norm groups is costly and time-consuming. This creates two problems. First, many tests have only one or a few norm groups. This practice forces users to base their interpretation on the most relevant norm group available rather than on a directly relevant norm group. Sometimes a norm group that matches the gender, cultural background, or ethnic heritage of an examinee is not available, so the user is forced to make the best interpretation possible under the circumstances. Second, the expense of obtaining norm groups also means that some published norm groups are not current. Each cohort of examinees is born into a world that differs in important respects from the preceding cohort. The accuracy and usefulness of normative interpretations of test results decline as the norm group becomes more and more dated.
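The effect of the choice of norm group can be sketched in Python as follows. The norm tables, their keys, and the fallback rule (use the grade-level norms when no gender-specific norms exist) are hypothetical and serve only to show that the same raw score is interpreted differently depending on which norm group is available.

```python
from statistics import mean, pstdev

# Hypothetical norm tables: (grade, gender) -> raw scores.
# A gender of None marks norms that combine girls and boys.
NORMS = {
    (3, None): [12, 15, 18, 21, 24],
    (6, None): [22, 25, 28, 31, 34],
    (6, "F"): [24, 26, 29, 32, 34],
}

def pick_norm_group(grade, gender):
    """Return the key of the most specific norm group that is available."""
    for key in ((grade, gender), (grade, None)):
        if key in NORMS:
            return key
    raise LookupError(f"no norm group for grade {grade}")

def z_score(raw, grade, gender):
    """Return the norm group used and the z score computed against it."""
    key = pick_norm_group(grade, gender)
    scores = NORMS[key]
    return key, (raw - mean(scores)) / pstdev(scores)

# The same raw score looks average against sixth-grade norms
# and far above average against third-grade norms.
print(z_score(28, 6, "M"))
print(z_score(28, 3, "M"))
```

In this sketch a sixth-grade boy is compared with the combined sixth-grade norms because no boys-only table exists, which mirrors the situation in which a user must settle for the most relevant norm group available.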
While almost all aptitude and ability tests are standardized, the typical classroom achievement test is nonstandardized. Generally, scores of nonstandardized tests are reported in terms of the percentage of items answered correctly and are interpreted in terms of a predetermined standard (e.g., A = 90% or higher, B = 80%-89%; 70% or higher = Pass and below 70% = Fail). Tests that are interpreted by comparing the examinee’s performance to a predetermined standard rather than to a norm group are called criterion-referenced. Two advantages of criterion-referenced tests are that the scores obtained on such tests are inherently meaningful and no artificial constraints are imposed on the number of examinees that can perform at a given level.
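Criterion-referenced interpretation can be sketched in a few lines of Python. The cutoffs below are hypothetical and would be set in advance by whoever defines the standard; the point of the sketch is that each score is compared with a fixed criterion and never with the scores of other examinees.

```python
def percent_correct(num_correct, num_items):
    """Percentage of items answered correctly."""
    if num_items <= 0:
        raise ValueError("a test must contain at least one item")
    return 100.0 * num_correct / num_items

# Hypothetical predetermined standard: minimum percentage for each grade,
# listed from the highest cutoff to the lowest.
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C")]

def letter_grade(pct, cutoffs=GRADE_CUTOFFS, floor="F"):
    """Return the first grade whose minimum percentage the score meets."""
    for minimum, grade in cutoffs:
        if pct >= minimum:
            return grade
    return floor

# Every examinee who meets the criterion earns the grade; no quota applies.
class_results = [letter_grade(percent_correct(n, 50)) for n in (45, 47, 50)]
print(class_results)
```

Because nothing in the scoring depends on how other examinees performed, an entire class can earn the highest grade, which is the second advantage noted above.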
Manner of Administration
Most achievement, aptitude, and ability tests can be administered to a single person or to a group of individuals. Historically, the option to administer the test to a group was essential for tests such as the SAT, ACT, and GRE that are taken by hundreds of thousands of people each year. In some instances, however, the behavioral observations a psychologist can make while administering the test to an individual are quite important. Individual administration is more costly and time-consuming, but it is more likely to be desirable when testing children, adolescents, and individuals with learning disabilities or other problems that might interfere with their performance on the test. Some tests, particularly intelligence tests designed to measure cognitive abilities, are designed exclusively for individual administration.
In the last decade, an additional option for test administration has emerged: computer administration of the test. Computer administration combines the advantage of economical administration to large numbers of individuals with the possibility of some behavioral observations during the administration of the test. For example, response latencies (i.e., the amount of time it takes the examinee to answer the question) can be recorded during computer administration of the test. Furthermore, tests based on item response theory can tailor the test to the ability level of the examinee.
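One common way to tailor a test under item response theory is to present, at each step, the unused item that is most informative at the examinee's current ability estimate. The Python sketch below illustrates that idea with a two-parameter logistic model and a coarse grid search for the ability estimate. The item bank, the parameter values, and the function names are hypothetical, and the sketch does not describe the procedure used by any particular published test.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information the item provides at ability level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items, used):
    """Index of the unused item that is most informative at theta."""
    candidates = [i for i in range(len(items)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

def estimate_theta(responses, items):
    """Maximum likelihood ability estimate on a grid from -4 to 4."""
    grid = [g / 10.0 for g in range(-40, 41)]

    def log_likelihood(theta):
        total = 0.0
        for index, correct in responses:
            p = p_correct(theta, *items[index])
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_likelihood)

# Hypothetical item bank: (discrimination a, difficulty b) for each item.
ITEM_BANK = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

first = next_item(0.0, ITEM_BANK, set())          # start at average ability
responses = [(first, True), (4, False)]           # (item index, answered correctly?)
theta_hat = estimate_theta(responses, ITEM_BANK)  # updated ability estimate
follow_up = next_item(theta_hat, ITEM_BANK, {first, 4})
print(first, theta_hat, follow_up)
```

An examinee who answers the middle item correctly and the hardest item incorrectly is next given an item of intermediate difficulty, so the items converge on the examinee's ability level instead of following a fixed order.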
Computer administration is now the primary option for some tests. For example, the GRE is administered by computer to examinees in the United States, Canada, and many other areas of the world unless special arrangements are made for a paper-based administration. This trend will likely accelerate, and most widely used standardized tests will probably provide an option for computer administration within a decade.
Hundreds of scales have been developed to measure various facets of achievement, aptitude, and ability. Numerous tests focus on specific content areas such as spatial, mathematical, verbal, and motor skills. Many tests that measure a single cognitive or motor skill provide an alternate (and in some instances a more precise) measure of the skills measured by the multi-aptitude batteries and intelligence tests. Despite this amazing variety of options, most of the leading tests assess the same select set of skills. Although the specific name of the skill varies from test to test and subtle differences exist among similarly named tests, the constructs measured by the various tests are quite similar.
The following sections describe some of the most frequently measured cognitive and motor skills and some of the composite scores obtained by combining information about these skills.
Verbal Aptitude: The ability to understand the meanings of words, sentences, and paragraphs and to use them effectively. Measures of this skill assess how well an examinee understands ideas expressed in words and how clearly he or she can reason with words. Some tests include separate subscales to measure components of verbal knowledge. For example, the ASVAB includes tests of word knowledge and paragraph comprehension. The WAIS-III, MAB, and SB5 contain vocabulary tests. The SB5 also includes the test Verbal Relations. The DAT contains the following tests: Verbal Reasoning and Language Usage: Spelling and Grammar.
Numerical Aptitude: The ability to understand ideas expressed in numbers. Tests include some combination of items that assess numerical computation (i.e., the ability to add, subtract, and perform other arithmetic calculations) and numerical reasoning (i.e., how well an examinee can think and reason with numbers). Scales that assess aspects of this skill are variously titled Arithmetic, Equation Building, Numerical Ability, and Quantitative, among others.
Spatial Aptitude: The ability to visualize or form mental images of solids from looking at plans on a flat piece of paper. Some items require people to look at a diagram and determine how an object would look in three-dimensional space if it were completed. Others ask respondents to look at a picture or drawing of a completed object and visualize in three-dimensional space how that object would look if it were rotated into a different position. Related skills are measured in tests titled Block Design, Matrix Reasoning, Matrices, Paper Folding and Cutting, Pattern Analysis, Space Relations, and Spatial.
Abstract Reasoning: The ability to understand ideas that are presented without using words or numbers. Tests of abstract reasoning present problems in terms of size, shape, position, or quantity using pictures, shapes, patterns, or some other nonverbal, non-numerical form. Scales measuring spatial aptitude measure one aspect of this ability. Other scales measuring aspects of this ability are titled Form Perception, Object Assembly, Picture Completion, and Picture Arrangement, among others.
Comprehension: The ability to use deductive reasoning (and, to a lesser extent, inductive reasoning) to derive solutions for socially relevant problems and issues. These tests assess examinees’ practical judgment and common sense and their ability to deal with their social and cultural environment. Other scales measuring aspects of this aptitude are titled Absurdities and Similarities.
In a sense, motor skills represent the output function of human ability. Understanding and problem solving occur unobserved inside the human brain, but the product of that mental activity is expressed either in words or through some physical activity. Many motor skills tests require people to use their cognitive skills prior to making some physical response. For example, tests of clerical speed and accuracy (i.e., scanning lists of names or street addresses to see if they match or are in alphabetical order) require people to use both cognitive and motor skills. Other examples include block design (i.e., arranging blocks to make a designated design) and digit symbol (i.e., translating randomly arranged symbols into numbers using a key that matches the symbols and numbers).
The following three tests provide purer assessments of motor skills. The tasks they require people to perform are simple and do not require cognitive skill to understand. As such, they provide a clear measure of the individual’s ability to perform the physical task.
Motor Coordination: The ability to coordinate the eyes and the hands or fingers in quick, precise, accurate movements. Tests of motor coordination present people with a page of small boxes and require them to make a mark in as many boxes as possible within a brief designated time.
Finger Dexterity: The ability to make small, rapid, and accurate movements with the fingers such as in typing and to move small objects quickly and accurately such as in assembling two or more objects. Tests of finger dexterity require people to assemble simple objects such as putting a washer on a rivet and to disassemble objects such as taking washers off of rivets and returning the washers and rivets to their storage location.
Manual Dexterity: The ability to make coordinated movements with the hands quickly and skillfully. Tests of manual dexterity require people to place objects in designated positions or to turn objects from one position to a designated position.
Composite scores are scores obtained by combining the scores obtained on two or more tests. Often, test scores are interpreted both in terms of their meaning as a stand-alone score and as part of a composite. The concept of intelligence is probably the best-known composite. Many intelligence tests yield three composite scores: a verbal intelligence, a performance intelligence, and an overall or full-scale intelligence score. Another way to conceptualize intelligence is in terms of the mental processes that form the basis of cognitive behavior. This perspective suggests that intellectual behavior involves understanding, organizing, thinking, and remembering. Important composite scores obtained from tests of cognitive ability that reflect this view of intelligence are verbal comprehension, perceptual organization, processing speed, and working memory.
Verbal Intelligence: Obtained by combining scores on measures of verbal, numerical, and spatial aptitude. This composite provides an overall measure of people’s abstract reasoning ability and ability to comprehend and learn new skills. This composite is heavily influenced by verbal skills.
Performance Intelligence: Obtained by combining scores on measures that require both abstract reasoning and the manipulation of objects such as blocks, beads, pictures, or puzzle pieces. This composite provides a measure of abstract reasoning ability that relies less heavily on the use of words and verbal skills. The ability to understand nonverbal material figures more prominently in this composite.
Intelligence: Obtained by combining scores on verbal intelligence and performance intelligence. In tests such as the WAIS-III, MAB, and SB5, this composite incorporates information from verbal and performance composites of five or six tests each. In multi-aptitude batteries such as the DAT and USES GATB, this composite incorporates information from measures of verbal aptitude, numerical aptitude, and spatial aptitude.
Verbal Comprehension: This composite provides an overall measure of the individual’s ability to understand and work with verbal information. It is obtained by combining scores on measures that assess vocabulary, general information, and the ability to work with ambiguous information to solve problems when presented in verbal form.
Perceptual Organization: This composite provides information about the individual’s ability to analyze information that is presented in a nonverbal form and to organize it into a meaningful pattern. It is obtained by combining scores on measures that require the individual to work with pictures, blocks, or matrices.
Processing Speed: This composite provides information about the speed with which the person can work with abstract symbols. It is obtained by combining scores on tests that assess the ability to work with abstract symbols that do not have any readily accessible verbal meaning.
Working Memory: This composite provides information about the person’s ability to hold information in memory and work with it to solve problems. It is obtained by combining scores on tests that require the individual to remember patterns formed by pictures of beads, numbers, and letters.
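The idea of a composite can be illustrated with a simplified Python sketch that places each test score on a common scale and then averages the results. The subtest names, norm means, and standard deviations are hypothetical. Published tests derive their composites from norm tables, and an average of z scores is not itself a z score unless it is re-normed, so the sketch shows only the principle of combining scores.

```python
from statistics import mean

# Hypothetical norm-group mean and standard deviation for each test
# that contributes to a verbal comprehension composite.
SUBTEST_NORMS = {
    "vocabulary": (10.0, 3.0),
    "similarities": (10.0, 3.0),
    "information": (10.0, 3.0),
}

def composite(scores, norms=SUBTEST_NORMS):
    """Return the z score for each test and their unweighted average."""
    z_scores = {name: (scores[name] - mu) / sd
                for name, (mu, sd) in norms.items()}
    return z_scores, mean(z_scores.values())

# Each score can be read on its own and as part of the composite.
z_by_test, verbal_comprehension = composite(
    {"vocabulary": 13, "similarities": 10, "information": 7}
)
print(z_by_test, verbal_comprehension)
```

In this invented case the composite is average even though the individual scores range from one standard deviation above the mean to one below it, which is why test scores are often interpreted both as stand-alone scores and as part of a composite.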
The development of the psychological test is one of the most important and enduring contributions of psychology to civilization. Indeed, noted psychologists Rene V. Dawis and David Lubinski regard psychological tests as serving the same function for psychologists that the microscope and telescope serve for microbiologists and astronomers. Tests provide psychologists with the ability to see phenomena that would otherwise be invisible. Many of the enriching benefits psychology has contributed to modern day society would not have been possible without the use of psychological tests.
Early psychologists began work on the first tests of achievement, aptitude, and ability in the late 1800s. Research and innovation up to World War II focused largely on the development of methods for measuring vocational interests and cognitive and motor skills. The modern science of psychological measurement is attributable to decades of research on tests, as is the fruitful diversity of tests developed by psychologists. Although psychologists have conscientiously developed tests to address the full range of social needs, their productivity in the area of achievement, aptitude, and ability testing is unsurpassed.