Why straight A’s may indicate poor learning – report from an unusual study

This post is the promised sequel to its predecessor, On making omelets and learning math.

So you got an A. What does that say about how well you are able to apply your new-found knowledge a month from now?

There’s plenty of research into learning (from psychology, cognitive science, neuroscience, and other disciplines) that explains why learning mathematics (more precisely, learning it well, so you can use it later on) is intrinsically difficult and frustrating. But for non-scientists in particular, no amount of theoretical discussion will have quite the same impact as hard evidence from a big study, particularly one run the same way pharmaceutical companies test the effectiveness (and safety) of a new drug.

Unfortunately, studies of that nature are hard to come by in education—for the simple reason that, unlike pharmaceutical research, they are all but impossible to run in the field of learning.

But there is one such study. It was conducted a few years ago, not in K-12 schools, but at a rather unique, four-year college. That means you have to be cautious when it comes to drawing conclusions about K-12 learning. So bring your own caution. My guess is that, like me, when you read about the study and the results it produced, you will conclude they do apply to at least Grades 8-12. (I can’t say more than that because I have no experience with K-8, either first-hand or second.)

The benefit of conducting the study at this particular institution was that it allowed the researchers to conduct a randomized control study on a group of over 12,000 students over a continuous nine-year period, starting with their four years at the college. That’s very much like the large-scale, multi-year studies that pharmaceutical companies run (indeed, are mandated to run) to determine the efficacy and safety of a new drug. It’s impossible to conduct such a study in most K-16 educational institutions—for a whole variety of reasons.

Classroom at the United States Air Force Academy in Colorado Springs, Colorado

For the record, I’ll tell you the name of that particular college at the outset. It’s the United States Air Force Academy (USAFA) in Colorado Springs, Colorado. Later in this article, I’ll give you a fuller overview of USAFA. As you will learn, in almost all respects, its academic profile is indistinguishable from that of most US four-year colleges. The three main differences—all of which are important for running a massive study of the kind I am talking about—are that (1) the curriculum is standard across all instructors and classes, (2) grading is standardized across all classes, and (3) students have to serve five years in the Air Force after graduation, during which time they are subject to further standardized monitoring and assessment. This framework provided the researchers with a substantial amount of reliable data to measure how effective the four years of classes were as preparation for the graduates’ first five years in their chosen specialization within the Air Force.

True, the students at USAFA are atypical in wanting a career in the military (though for some it is simply a way to secure a good education “at no financial cost”, and after their five years of service are up they leave and pursue a different career). In particular, they enter having decided what they want to do for the next nine years of their lives. That definitely needs to be taken into account when we interpret the results of the study in terms of other educational environments. I’ll discuss that in due course. As I said, bring your own caution. But do look at—and reflect on—the facts before jumping to any conclusions.

If that last (repeated) warning did not get your attention, the main research finding from the study surely will: Students who perform badly on course assignments and end-of-course evaluations turn out to have learned much better than students who sail through the course with straight A’s.

There is, as you might expect, a caveat. But only one. This is an “all else being equal” result. But it is a significant finding, from which all of us in the math instruction business can learn a lot.

As I noted already, conducting a study that can produce such an (initially surprising) result with any reliability is a difficult task. In fact, in a normal undergraduate institution, it’s impossible on several counts!

First obstacle: To see how effective a particular course has been, you need to see how well a student performs when they later face challenges for which the course experience is—or at least, should be—relevant. That’s so obvious that, in theory, it should not need to be stated. K-16 education is meant to prepare students for the rest of their lives, both professional and personal. How well they do on a test just after the course ends would be significant only if it correlated positively with how well they do later when faced with having to utilize what the course purportedly taught them. But, as the study shows, that is not the case; indeed the correlation is negative.

The trouble is, for the most part, those of us in the education system have no way to measure that later outcome. At most we can evaluate performance only until the student leaves the institution where we teach them. But even that is hard. So hard that measuring learning from a course after the course has ended and the final exam has been graded is rarely attempted.

Certainly, at most schools, colleges, or universities, it’s just not remotely possible to set up a pharmaceutical-research-like, randomized, controlled study that follows classes of students for several years, all the time evaluating them in a standardized, systematic way. Even if the course learning outcomes being studied are from a first-year course at a four-year college, leaving the student three further years in the institution, students drop out, select different subsequent elective courses, or even change major tracks.

That problem is what made the USAFA study particularly significant. Conducted from 1997 to 2007, the subjects were 12,568 USAFA students. The researchers were Scott E. Carrell, of the Department of Economics at the University of California, Davis and James E. West of the Department of Economics and Geosciences at USAFA.

As I noted earlier, since USAFA is an unusual higher education institution, extrapolation of the study’s results to any other educational environment requires knowledge of what kind of institution it is.

USAFA is a fully accredited undergraduate institution of higher education with an approximate enrollment of 4,200 students. It offers 32 majors, including humanities, social sciences, basic sciences, and engineering. The average SAT for the 2005 entering class was 1309, with an average high school GPA of 3.60 (Princeton Review 2007). Applicants are selected for admission on the basis of academic, athletic, and leadership potential, and a nomination from a legal nominating authority. All students receive a 100 percent scholarship that covers their tuition, room, and board. Additionally, each student receives a monthly stipend of $845 to cover books, uniforms, computer, and other living expenses. All students are required to graduate within four years, after which they must serve for five years as a commissioned officer in the Air Force.

Approximately 17% of the study sample was female, 5% was black, 7% Hispanic, and 5% Asian. 

Academic aptitude for entry to USAFA is measured through SAT verbal and SAT math scores and an academic composite that is a weighted average of an individual’s high school GPA, class rank, and the quality of the high school attended. All entering students take a mathematics placement exam upon matriculation, which tests algebra, trigonometry, and calculus. The sample mean SAT math and SAT verbal are 663 and 632, with respective standard deviations of 62 and 66. 

USAFA students are required to take a core set of approximately 30 courses in mathematics, basic sciences, social sciences, humanities, and engineering. Grades are determined on an A, A-, B+, B, …, C-, D, F scale, where an A is worth 4 grade points, an A- is 3.7 grade points, a B+ is 3.3 grade points, etc. The average GPA for the study sample was 2.78. Over the ten-year period of the study there were 13,417 separate course-sections taught by 1,462 different faculty members. Average class size was 18 students per class, and approximately 49 sections of each core course were taught each year.
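To make the grade-point arithmetic concrete, here is a minimal Python sketch of how a GPA is computed on the scale just described; the sample transcript is made up for illustration:

```python
# Grade points on the scale described above: A = 4.0, A- = 3.7, B+ = 3.3, ...
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D": 1.0, "F": 0.0,
}

def gpa(grades):
    """Unweighted GPA: the average of the grade points earned per course."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# Hypothetical five-course transcript, close to the study sample's 2.78 average.
print(gpa(["A", "B", "C", "C+", "B-"]))  # → 2.8
```

(A real GPA would weight each grade by credit hours; the unweighted version is enough to show the scale.)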

USAFA faculty, who are both military officers and civilian employees, have graduate degrees from a broad sample of high-quality programs in their respective disciplines, similar to a comparable undergraduate liberal arts college.

Clearly, in many respects, this reads like the academic profile of many American four-year colleges and universities. The main difference is the nature of the student body: USAFA students enter with a specific career path in mind (at least for nine years), albeit one admitting a great many variations, and in many cases with a high degree of motivation. While that difference clearly has to be taken into account when using the study’s results to make inferences about higher education as a whole, the research benefits of such an organization are significant, leading to results that are highly reliable for that institution.

First, there is the sheer size of the study population. So large that there was no problem randomly assigning students to professors over a wide variety of standardized core courses. That random assignment of students to professors, together with substantial data on both professors and students, enabled the researchers to examine how professor quality affects student achievement, free from the usual problems of student self-selection.

Moreover, grades in USAFA core courses are a consistent measure of student achievement because faculty members teaching the same course use an identical syllabus and give the same exams during a common testing period. 

Grades in mathematics courses are particularly reliable measures. Math professors grade only a small proportion of their own students’ exams, which vastly reduces the ability of “easy” or “hard” grading professors to affect their students’ grades. Math exams are jointly graded by all professors teaching the course during that semester in “grading parties”, where Professor A grades question 1 for all students, Professor B grades question 2 for all students, and so on. Additionally, all professors are given copies of the exams for the course prior to the start of the semester. All final grades in all core courses are determined on a single grading scale and are approved by the department chair. Student grades can thus be taken to reflect the manner in which the course is taught by each professor.
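The “grading party” scheme amounts to assigning each exam question to a single grader across all students. A minimal sketch of that assignment (the names and question count are hypothetical, not from the study):

```python
def grading_assignment(professors, num_questions):
    """Assign exam questions to professors round-robin, so that every
    student's answer to a given question is graded by the same person."""
    return {q: professors[(q - 1) % len(professors)]
            for q in range(1, num_questions + 1)}

# Three professors teaching the course this semester, a five-question exam.
print(grading_assignment(["Prof A", "Prof B", "Prof C"], 5))
# → {1: 'Prof A', 2: 'Prof B', 3: 'Prof C', 4: 'Prof A', 5: 'Prof B'}
```

The design choice matters: whatever leniency or severity a grader has is applied uniformly to every student on that question, so it cannot tilt one section’s grades relative to another’s.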

A further significant research benefit of conducting the study at USAFA is that students are required to take, and are randomly assigned to, numerous follow-on courses in mathematics, humanities, basic sciences, and engineering, so that performance in subsequent courses can be used to measure effectiveness of earlier ones—which, as we noted earlier, is a far more meaningful measure of (real) learning than weekly assignments or an end-of-term exam.

It is worth noting also that, even if a student has a particularly bad introductory course instructor, they still are required to take the follow-on related curriculum.

If you are like me, given that background information, you will take seriously the research results obtained from this study. At the cost of focusing on a special subset of students, the statistical results of the study are far more reliable and meaningful than those of most educational studies. Moreover, the study measures the important, long-term benefits of the course. So what are those results?

First, the researchers found that there are relatively large and statistically significant differences in student achievement across professors in the contemporaneous course being taught. A one-standard-deviation increase in the professor fixed effect (the professor-specific contribution to student achievement, estimated with other factors held constant) results in a 0.08 to 0.21-standard-deviation increase in student achievement.
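To get a feel for what an effect of that size means in grade points, here is a back-of-the-envelope sketch; the student grade standard deviation used below is my own illustrative assumption, not a figure from the study:

```python
def achievement_change(effect_in_sd, achievement_sd):
    """Convert a standardized effect size into raw grade points."""
    return effect_in_sd * achievement_sd

# Assume a student course-grade SD of 0.6 grade points (illustrative only).
for effect in (0.08, 0.21):
    print(f"{effect} SD of professor quality -> "
          f"{achievement_change(effect, 0.6):.3f} grade points")
```

Under that assumption, moving one standard deviation up the professor-quality scale shifts a student’s grade by roughly 0.05 to 0.13 grade points, i.e. a modest but real fraction of the gap between adjacent letter grades.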

Introductory course professors significantly affect student achievement in follow-on related courses, but these effects are quite heterogeneous across subjects.

But here is the first surprising result. Students of professors who, as a group, produce high student achievement in the initial mathematics course perform significantly worse in the (mandatory) follow-on related math, science, and engineering courses. For math and science courses, academic rank, teaching experience, and terminal-degree status of professors are negatively correlated with contemporaneous student achievement, but positively correlated with follow-on course achievement. That is, students of less experienced instructors who do not possess terminal degrees perform better in the contemporaneous course being taught, but perform worse in the follow-on related courses.

Presumably, less academically qualified instructors may spur (potentially unsustained) interest in a particular subject through higher grades, but those students perform significantly worse in follow-on related courses that rely on the initial course for content.  (Interesting side note: for humanities courses, the researchers found almost no relationship between professor observable attributes and student achievement.)

Turning our attention from instructors to students, the study found that students who struggle and frequently get low grades tend to do better than the seemingly “good” students when you see how much they remember, and how well they can perform, months or even years later.

This is the result I discussed in the previous post. On the face of it, you might still find that result hard to believe. But it’s hard to ignore the results of a randomized control study of over 12,000 students over a period of nine years.

For me, the big take-home message from the study is the huge disparity between course grades produced at the time and assessment of learning obtained much later. The only defense of contemporaneous course grades I can think of is that in most instances they are the only metric that is obtainable. It would be a tolerable defense were it not for one thing. Insofar as there is any correlation between contemporaneous grades and subsequent ability to remember and make productive use of what was learned in the course, that correlation is negative.

It makes me wonder why we continue, not only to use end-of-course grades, but frequently to put great emphasis on them and treat them as if they were predictive of future performance. Continuous individual assessment of a student by a well-trained teacher is surely far more reliable.

A realization that school and university grades are poor predictors of future performance is why many large corporations that employ highly skilled individuals increasingly tend to ignore academic grades and conduct their own evaluations of applicants.