Teaching math for life in a wicked world

Do these mental activities develop critical thinking or make you a better problem solver?

Both doing math and playing chess are frequently touted as beneficial to developing good critical thinking skills and problem solving ability. And on the face of it, it seems self-evident that they will have that effect. Yet, despite being a mathematician (though not a chess player), I always had my doubts. I long harbored a suspicion that a course on, say, history or economics would (if suitably taught) serve better in that regard. It turns out my suspicions were well founded. Read on.

The point is, mathematics and chess are highly constrained domains defined by formal rules. In both domains, the problems you have to solve are what are sometimes referred to as “kind problems”, a classification introduced in 2015 to contrast them with “wicked problems”. The latter term was introduced in the social sciences in the late 1960s to refer to problems that cannot be solved by selecting and applying a rule-based procedure.

Actually, that is not a good definition of a wicked problem, for the simple reason that there is no good, concise definition. But once you get the idea (check out the linked Wikipedia entry above), you will find you can recognize a wicked problem when you see one. In fact, pretty well any problem that arises in the social sciences, or in business, or just in life in general, is a wicked problem.

For example, is it a good idea to install solar panels to power your home? Most of us initially compare several mental images: one of a bank of solar panels on a roof, another of a smoke-emitting, coal-fired power plant, another of a nuclear power plant, and perhaps one of a wind turbine. We can quickly list pluses and minuses for each one.

Given how aware we are today of the massive dangers of climate change resulting from the emission of greenhouse gases, we probably dismiss the coal-fired power plant right away. 

But for the other three, you really need to look at some data. For example, solar panels seem to be clean, they make no noise, they require very little maintenance, and unlike wind turbines they don’t kill birds. But what is the cost of manufacturing them (including the mining and processing of the materials from which they are made), both monetarily and in terms of impact on the environment? What about the cost of disposing of them when they fail or become too old to function properly? Without some hard data, it’s impossible to say whether they are the slam-dunk best choice we might initially see them as.

In fact, as soon as you set aside an hour or so to think about this problem, you start to realize you are being drawn into a seemingly endless series of “What if?” and “What about?” questions, each requiring data before you can begin to try to answer it. For example, what if a house with a solar-paneled roof is burned in a wildfire, a possibility that residents in many parts of the western United States now face every year? Do those solar panels spew dangerous chemicals into the atmosphere when they burn at very high temperatures? How big a problem would that be? What if, as increasingly happens these days, an entire community burns? How many homes need to burn for the concentration of chemicals released into the atmosphere to constitute a serious danger to human life? 

You are clearly going to have to use mathematics as a tool to collect and analyze the data you need to make some reliable comparisons. But it’s also clear that “doing the math” is the easy part—or rather, the easier part. Particularly when there are digital tools available to do all the calculations and execute all the procedures. (See below.) But what numbers do you collect? Which factors do you consider? Which of them do you decide to include in your comparison dataset and which do you ignore?

Part of making these decisions will likely involve applying number sense. For instance, for some factors, the numbers may be too small (compared to those associated with other factors) to make it worthwhile including those factors in your quantitative analysis.

Or maybe—if you are very, very lucky—the numbers for one factor dominate all the others, in which case the problem is essentially a kind one, and you can get the answer by old-fashioned “doing the math.” But that kind of outcome is extremely rare.
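To make that idea concrete, here is a toy sketch (in Python) of the number-sense filter and the dominance check just described. Every figure and factor name below is invented purely for illustration; none of it is real solar-panel data.

```python
# Toy sketch: keep only factors whose numbers are non-negligible next to the
# largest one, and check whether a single factor dominates all the others.
# All values are made up for illustration.
factors = {
    "avoided grid emissions over panel lifetime (t CO2)": 120.0,
    "manufacturing and mining emissions (t CO2)": 5.0,
    "disposal emissions (t CO2)": 0.4,
}

largest = max(factors.values())
kept = {name: value for name, value in factors.items() if value >= 0.01 * largest}
others_total = sum(value for value in factors.values() if value != largest)
single_factor_dominates = largest >= 10 * others_total

print(kept)
print("Effectively a kind problem?", single_factor_dominates)
```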

Usually you have to make trade-offs and compare the numbers you have against other, less quantifiable considerations. This means that problems like this cannot be solved using mathematics alone. And that in turn means they have to be tackled by diverse (!) teams, with each team member bringing different expertise.

For sure, one member of the team needs to be mathematically able. But, while just one mathematician may be enough, the others should, ideally, know enough about mathematics to work effectively with the math expert (or experts) on the team.

This is, of course, a very different scenario from the notion of a “mathematical problem solver” that everyone had in mind when I learned mathematics in the 1960s. Back when I was working toward my high school leaving certificate and then my mathematics bachelor’s degree, with a view to a career as a mathematician, I imagined myself spending most of my professional time working alone. And indeed, for several years, that was the case. But then things changed. Keep reading.

I began this essay with a question: does learning math or playing chess make you a better reasoner—a better problem solver? I hope by now that the answer is clear. For kind problems, almost certainly it does. The largely linear, step-by-step process you need to solve a kind problem involves the same kind of mental processes as math and chess.

But for wicked problems, the above short discussion of selecting among alternative energy sources should indicate that the kind of thinking required is very different. And in an era when machines can beat humans at chess and can do all the heavy lifting for solving a kind math problem (see below), it’s the wicked problems that require humans to solve them.

In other words, in the world our students will inhabit, skill at solving wicked problems is what is needed. And that requires training in mathematics that is geared towards that goal.

So, what does this all mean for us mathematics educators?

The educational preparation for being able to solve wicked problems clearly (see above) has to be very different from what is required to develop the ability to solve kind problems. In domains like mathematics and chess, once you have mastered the underlying rules, repeated, deliberate practice will, in time, make you an expert. The more you practice (that is, deliberate practice—this is a technical term; google “Anders Ericsson deliberate practice”), the better you become.

In this regard, chess and mathematics are like playing a musical instrument and many sports, where repeated, deliberate practice is the road to success. This is where the famous “10,000 hours” meme is applicable, a somewhat imprecise but nevertheless suggestive way to capture the empirical observation that true experts in such domains typically spent a great many hours engaged in deliberate practice in order to achieve their success.

But deliberate practice does not prepare people to engage with wicked problems. And that is a major problem for educators, because, as I noted already, the vast majority of problems people face in their lives or their jobs today are wicked problems.

This state of affairs is new, at least for mathematicians. (Not for social scientists.) Until the early 1990s, mathematics educators did not have to face accusations that they were not preparing their students adequately for the lives they would lead, because being able to calculate (fast and accurately) was an essential life skill, and being able to execute mathematical procedures quickly and accurately was important in many professions (and occasionally in everyday life).

But the 1960s brought the electronic calculator, which could outperform humans at arithmetic, and the late 1980s saw the introduction of digital technologies that can execute pretty well any mathematical procedure—faster, with way more accuracy, and for far greater datasets, than any human could manage. Once those technological aids became available, it was only a matter of time until they became sufficiently ubiquitous to render obsolete human skill at performing calculations and executing procedures.

It did not take long. By the start of the Twenty-First Century, we were at that point of obsolescence (of those human skills).

To be sure, there remains a need for students to learn how to calculate and execute procedures, in order to understand the underlying concepts and the methods so they can make good, safe, effective use of the phalanx of digital mathematics tools currently available. But what has gone is the need for many hours of deliberate practice to achieve skills mastery.

The switch by the professionals in STEM fields, from executing procedures by hand to using digital tools to do it, happened very fast, and with remarkably little publicity. Consequently, few people outside the professional STEM communities realized it had occurred. Certainly, few mathematics teachers were aware of the scope of the change, including college instructors at non-research institutions.

But in the professional STEM communities, the change not only happened fast, it was total. The way mathematics is used in the professional STEM world today is totally different from how it was used for the previous three thousand years. And it’s been that way for thirty years now.

As a consequence of this revolution in mathematical praxis—and it really was a revolution—the mathematical ability people need in today’s world is not calculation or the execution of procedures, as it had been for thousands of years, but being able to use mathematics (or more generally mathematical thinking) to help solve wicked problems. [Note that digital tools won’t solve a wicked problem for you. The most they can do is help out by handling the procedural math parts for you.]

Today, for almost any kind mathematical problem there is a digital tool that will handle it, most likely Wolfram Alpha. To be sure, some knowledge is required to be able to do that. And we must make sure our students graduate with those skills. But what they no longer need is the high skills mastery that requires years of deliberate practice. Adjusting to this new reality is straightforward, and many teachers have already made that change. You teach—and assess—the same concepts and methods as before, but with the goal being understanding rather than performance.
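As an illustration of what “the tool handles the procedural part” looks like in practice, here is a short sketch using SymPy, an open-source computer algebra library that plays much the same role as Wolfram Alpha. The example problems are my own, chosen only to show a standard procedure being executed by the machine.

```python
# A computer algebra system executes standard "kind problem" procedures:
# solving an equation and evaluating a definite integral.
import sympy as sp

x = sp.symbols("x")

print(sp.solve(x**2 - 5*x + 6, x))                     # -> [2, 3]
print(sp.integrate(x * sp.exp(-x**2), (x, 0, sp.oo)))  # -> 1/2
```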

When it comes to solving wicked problems, however, what people need is an ability to use mathematics in conjunction with other ways of thinking, other methodologies, and other kinds of knowledge. And that really is a people thing. No current digital device is remotely close to being able to solve a wicked problem, and maybe never will be.

So what exactly do those of us in mathematics education have to do to ensure that our students acquire the knowledge and skills they will require in today’s world?

Well, the biggest impact in terms of changing course content and structure is at the college and university level. Major changes are required there, and indeed are already well underway. In particular, tertiary-level students are going to have to learn, through repeated classroom experiences, how to tackle wicked problems. Project-based teamwork is going to have to play a big role. (See below.)

In terms of K-12, however, there is a good argument to be made for continuing to focus on highly constrained, kind problems that highlight individual concepts and techniques. A solid grounding in basic mathematical concepts and techniques is absolutely necessary for any mathematical work that will come later. 

That’s certainly an argument I would make—though as always when discussing K-12 education issues, I hold back from providing specific advice to classroom teachers, particularly K-10, since that is not my domain. Check out youcubed.org for that!

But I do have many years of first-hand experience of how mathematics is used in the world, and based on that background I can add my voice to the chorus who are urging a shift in K-12 mathematics education, away from basic computation and procedural skills mastery, to preparing students for a world in which using mathematics involves utilizing the available tools for performing calculations and executing procedures. 

It’s definitely not a question of new content being needed (at the K-12 level). The goals of the Common Core State Standards for Mathematics already cover the main concepts and topics required. Today’s world does not run on some brand new kind of mathematics—though there is some of that, with new techniques being developed and introduced all the time. The familiar concepts developed and used over the centuries are still required.

Rather, the change has been in mathematical praxis: how math is done and how it is used. The main impact of the 1990s mathematical praxis revolution on K-12 is that there is no longer any need for repetitive, deliberate practice to develop fast, fluent skills at calculation and the execution of procedures—since those skills have been outsourced to machines. [Yes, I keep repeating this point. It’s important.]

Insofar as students engage in calculation and executing procedures—and they certainly should—the goal is not smooth, accurate, or fast execution, but understanding. For that is what they need (in spades) to make use of all those shiny new digital math technologies. (Actually, many of them are hardly new or shiny, being forty years of age and older. It just took a while before they found their way outside the professional STEM community.)

So, while leaving it to experienced K-12 educators and academic colleagues such as Jo Boaler to figure out how best to teach school math for life in the 21st Century, let me finish by giving some indication of how tertiary education (the world I am familiar with) is changing to meet the new need. That, after all, is what many high school graduates will face in the next phase of their education, so the more their K-12 experience prepares them for that, the better will be their subsequent progress.

In contrast to K-12, when it comes to tertiary education, other than in the mathematics major (a special case I’ll come back to in a future post), the focus should be on developing students’ ability to tackle wicked problems.

How best to do that is an ongoing question, but a lot is already known. That’s also something I’ll pursue in a future post. As a teaser, however, let me end by highlighting some key elements of the skillset required to tackle a wicked problem. 

Let me stress that this list is one drawn up for college level students. In fact, this post is extracted and adapted from a longer one I just wrote for the Mathematical Association of America, a professional organization for college-level math instructors. Neither I nor (I believe) anyone else is advocating doing this kind of thing at levels K-10. (Though maybe for grades 11 and 12. I have tried this out with high school juniors and seniors, and it has gone well.) 

[Incidentally, the list is not just something that someone dreamt up sitting in an armchair. Well, maybe it started out that way. But there is plenty of research into what it takes to produce good teamwork that achieves results. I get a lot of my information from colleagues at Stanford who work on these issues. But there are many good sources on the Web.]

To solve a wicked problem, you should:

  • Work in a diverse team. The more diverse the better.
  • Recognize that you don’t know how to solve it. 
  • If you think you do, be prepared for others on the team to quickly correct you. (And be a good, productive “other on the team” and correct another member when required.)
  • OTOH, you might not even be sure what the heart of the problem really is; or maybe you do, but it turns out that other team members think it’s something else. Answering that question is now part of the “solution” you are looking for.
  • Be collegial at all times (even when you think you need to be forceful), but remember that if you are the only expert on discipline X, the others do need to hear your input when you think it is required.
  • The other team members may not recognize that your expertise is required at a particular point. Persuade them otherwise.
  • Listen to the other team members. Constantly remind yourself that they each bring valuable expertise unique to them.
  • It’s all about communication. That has two parts: speaking and listening. If the team has at least three members, you should be listening more than you are speaking. (Do the math as to how frequently you “should” be speaking, depending on the size of the team; see the sketch after this list.)
  • The onus is on you to explain your input to the others. They do not have your background and context for what you say. With the best will in the world—which you can reasonably expect from the team—they depend on you to explain what you are advocating or suggesting.
  • If the group agrees that one of you needs to give a short lesson to the others, fine. Telling people things and showing them how to do things are useful ways of getting them to learn things.
  • These are not rules; they are guidelines.
  • Guidelines can be broken. Sometimes they should be.
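Here is the “do the math” from the listening guideline above, worked out as a tiny sketch: with n team members sharing the airtime equally, your fair share of the talking is about 1/n.

```python
# Fair share of speaking time in a team of n members who share airtime equally.
for n in (3, 4, 5, 8):
    print(f"team of {n}: speak about {1/n:.0%} of the time, listen about {(n - 1)/n:.0%}")
```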

So there you have it. If you are teaching math in K-12 and you can ensure that when your students graduate they can thrive working in that fashion—and enjoy it—you will have set them up for life.

That’s the wicked truth.

NOTE: A longer, overlapping essay discussing kind versus wicked problems, but aimed at college and university research and education professionals, can be found in my November 1 post on the Devlin’s Angle page on the Mathematical Association of America’s MATHVALUES website.

A conversation between Jo Boaler and Keith Devlin

On May 23, 2019, Stanford Mathematics Education Professor Jo Boaler, the founder and director of youcubed, and I sat down before a public audience in Cubberley Auditorium on the Stanford campus to have a discussion about the nature of 21st Century mathematics and the changes it requires to the way mathematics is taught in our schools. The (edited) video of our conversation is now available on this website and the youcubed website. (See the Videos page on either site.) Produced by youcubed in conjunction with SUMOP. Run time 31min 28sec.

Why straight A’s may indicate poor learning – report from an unusual study

This post is the promised sequel to its predecessor, On making omelets and learning math.

So you got an A. What does that say about how well you are able to apply your new-found knowledge a month from now?

There’s plenty of research into learning (from psychology, cognitive science, neuroscience, and other disciplines) that explains why learning mathematics (more precisely, learning it well, so you can use it later on) is intrinsically difficult and frustrating. But for non-scientists in particular, no amount of theoretical discussion will have quite the impact of hard evidence from a big study, particularly one run the same way pharmaceutical companies test the effectiveness (and safety) of a new drug.

Unfortunately, studies of that nature are hard to come by in education—for the simple reason that, unlike pharmaceutical research, they are all but impossible to run in the field of learning.

But there is one such study. It was conducted a few years ago, not in K-12 schools, but at a rather unusual four-year college. That means you have to be cautious when it comes to drawing conclusions about K-12 learning. So bring your own caution. My guess is that, like me, when you read about the study and the results it produced, you will conclude they do apply to at least Grades 8-12. (I can’t say more than that because I have no experience with K-8, either first-hand or second-hand.)

The benefit of conducting the study at this particular institution was that it allowed the researchers to conduct a randomized control study on a group of over 12,000 students over a continuous nine-year period, starting with their first four years in the college. That’s very much like the large scale, multi-year studies that pharmaceutical companies run (indeed, are mandated to run) to determine the efficacy and safety of a new drug. It’s impossible to conduct such a study in most K-16 educational institutions—for a whole variety of reasons.

Classroom at the United States Air Force Academy in Colorado Springs, Colorado

For the record, I’ll tell you the name of that particular college at the outset. It’s the United States Air Force Academy (USAFA) in Colorado Springs, Colorado. Later in this article, I’ll give you a full overview of USAFA. As you will learn, in almost all respects, its academic profile is indistinguishable from most US four-year colleges. The three main differences—all of which are important for running a massive study of the kind I am talking about—are that (1) the curriculum is standard across all instructors and classes, (2) grading is standardized across all classes, and (3) students have to serve five years in the Air Force after graduation, during which time they are subject to further standardized monitoring and assessment. This framework provided the researchers with a substantial amount of reliable data to measure how effective the four years of classes were as preparation for the graduates’ first five years in their chosen specialization within the Air Force.

True, the students at USAFA are atypical in wanting a career in the military (though for some it is simply a way to secure a good education “at no financial cost”, and after their five years of service are up they leave and pursue a different career). In particular, they enter having decided what they want to do for the next nine years of their lives. That definitely needs to be taken into account when we interpret the results of the study in terms of other educational environments. I’ll discuss that in due course. As I said, bring your own caution. But do look at—and reflect on—the facts before jumping to any conclusion.

If that last (repeated) warning did not get your attention, the main research finding from the study surely will: Students who perform badly on course assignments and end-of-course evaluations turn out to have learned much better than students who sail through the course with straight A’s.

There is, as you might expect, a caveat. But only one. This is an “all else being equal” result. But it is a significant finding, from which all of us in the math instruction business can learn a lot.

As I noted already, conducting a study that can produce such an (initially surprising) result with any reliability is a difficult task. In fact, in a normal undergraduate institution, it’s impossible on several counts!

First obstacle: To see how effective a particular course has been, you need to see how well a student performs when they later face challenges for which the course experience is—or at least should be—relevant. That’s so obvious that, in theory, it should not need to be stated. K-16 education is meant to prepare students for the rest of their lives, both professional and personal. How well they do on a test just after the course ends would be significant only if it correlated positively with how well they do later when faced with having to utilize what the course purportedly taught them. But, as the study shows, that is not the case; indeed, the correlation is negative.

The trouble is, for the most part, those of us in the education system usually have no way of being able to measure that later outcome. At most we can evaluate performance only until the student leaves the institution where we teach them. But even that is hard. So hard, that measuring learning from a course after the course has ended and the final exam has been graded is rarely attempted.

Certainly, at most schools, colleges, or universities, it’s just not remotely possible to set up a pharmaceutical-research-like, randomized, controlled study that follows classes of students for several years, all the time evaluating them in a standardized, systematic way. Even if the course learning outcomes being studied are from a first-year course at a four-year college, leaving the student three further years in the institution, students drop out, select different subsequent elective courses, or even change major tracks.

That problem is what made the USAFA study particularly significant. Conducted from 1997 to 2007, the subjects were 12,568 USAFA students. The researchers were Scott E. Carrell, of the Department of Economics at the University of California, Davis and James E. West of the Department of Economics and Geosciences at USAFA.

As I noted earlier, since USAFA is a fairly unique higher education institute, extrapolation of the study’s results to any other educational environment requires knowledge of what kind of institution it is.

USAFA is a fully accredited undergraduate institution of higher education with an approximate enrollment of 4,200 students. It offers 32 majors, including humanities, social sciences, basic sciences, and engineering. The average SAT for the 2005 entering class was 1309, with an average high school GPA of 3.60 (Princeton Review 2007). Applicants are selected for admission on the basis of academic, athletic, and leadership potential, and a nomination from a legal nominating authority. All students receive a 100 percent scholarship to cover their tuition, room, and board. Additionally, each student receives a monthly stipend of $845 to cover books, uniforms, computer, and other living expenses. All students are required to graduate within four years, after which they must serve for five years as a commissioned officer in the Air Force.

Approximately 17% of the study sample was female, 5% was black, 7% Hispanic, and 5% Asian. 

Academic aptitude for entry to USAFA is measured through SAT verbal and SAT math scores and an academic composite that is a weighted average of an individual’s high school GPA, class rank, and the quality of the high school attended. All entering students take a mathematics placement exam upon matriculation, which tests algebra, trigonometry, and calculus. The sample mean SAT math and SAT verbal are 663 and 632, with respective standard deviations of 62 and 66. 

USAFA students are required to take a core set of approximately 30 courses in mathematics, basic sciences, social sciences, humanities, and engineering. Grades are determined on an A, A-, B+, B, …, C-, D, F scale, where an A is worth 4 grade points, an A- is 3.7 grade points, a B+ is 3.3 grade points, etc. The average GPA for the study sample was 2.78. Over the ten-year period of the study there were 13,417 separate course-sections taught by 1,462 different faculty members. Average class size was 18 students per class, and approximately 49 sections of each core course were taught each year.
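For concreteness, here is that grade-point scale as a small Python mapping, together with a one-line GPA calculation. Only the values the passage states (A = 4.0, A- = 3.7, B+ = 3.3) come from the text; the remaining grade points are the standard 0.3/0.7 steps that the “etc.” abbreviates.

```python
# USAFA-style grade-point scale; intermediate values follow the usual convention.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0,
}

def gpa(grades):
    """Unweighted GPA of a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

print(gpa(["A", "B+", "C", "B"]))   # 3.075
```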

USAFA faculty, who include both military officers and civilian employees, have graduate degrees from a broad sample of high quality programs in their respective disciplines, similar to a comparable undergraduate liberal arts college.

Clearly, in many respects, this reads like the academic profile of many American four-year colleges and universities. The main difference is the nature of the student body: USAFA students enter with a specific career path in mind (at least for nine years), albeit a career path admitting a great many variations, and, in many cases, with a high degree of motivation. While that difference clearly has to be borne in mind when using the study’s results to make inferences for higher education as a whole, the research benefits of such an organization are significant, leading to results that are highly reliable for that institution.

First, there is the sheer size of the study population. So large, that there was no problem randomly assigning students to professors over a wide variety of standardized core courses. That random assignment of students to professors, together with substantial data on both professors and students, enabled the researchers to examine how professor quality affects student achievement, free from the usual problems of student self-selection. 

Moreover, grades in USAFA core courses are a consistent measure of student achievement because faculty members teaching the same course use an identical syllabus and give the same exams during a common testing period. 

Student grades in mathematics courses, in particular, are especially reliable measures. Math professors grade only a small proportion of their own students’ exams, which vastly reduces the ability of “easy” or “hard” grading professors to affect their students’ grades. Math exams are jointly graded by all professors teaching the course during that semester in “grading parties” where Professor A grades question 1 for all students, Professor B grades question 2 for all students, and so on. Additionally, all professors are given copies of the exams for the course prior to the start of the semester. All final grades in all core courses are determined on a single grading scale and are approved by the department chair. Student grades can thus be taken to reflect the manner in which the course is taught by each professor.

A further significant research benefit of conducting the study at USAFA is that students are required to take, and are randomly assigned to, numerous follow-on courses in mathematics, humanities, basic sciences, and engineering, so that performance in subsequent courses can be used to measure effectiveness of earlier ones—which, as we noted earlier, is a far more meaningful measure of (real) learning than weekly assignments or an end-of-term exam.

It is worth noting also that, even if a student has a particularly bad introductory course instructor, they still are required to take the follow-on related curriculum.

If you are like me, given that background information, you will take seriously the research results obtained from this study. At a cost of focusing on a special subset of students, the statistical results of the study will be far more reliable and meaningful than for most educational studies. Moreover, the study will be measuring the important, long term benefits of the course. So what are those results?

First, the researchers found there are relatively large and statistically significant differences in student achievement across professors in the contemporaneous course being taught. A one-standard-deviation increase in the professor fixed effect (a professor-specific term in the statistical model, capturing everything about that professor that stays constant across the students they teach) results in a 0.08 to 0.21 standard-deviation increase in student achievement.
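For readers who want to see what estimating a “professor fixed effect” looks like in practice, here is a minimal, self-contained sketch. It is not the authors’ actual model or data: the variable names, sample sizes, and numbers are all invented, and the specification is only schematic (one dummy per professor, plus an entry-aptitude control, on a standardized outcome).

```python
# Schematic fixed-effects sketch with synthetic data (not the study's data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_students, n_profs = 600, 20

df = pd.DataFrame({
    "prof": rng.integers(0, n_profs, n_students),   # random assignment to professors
    "sat_math": rng.normal(660, 60, n_students),    # entry-aptitude control
})
true_prof_effect = rng.normal(0.0, 0.15, n_profs)   # invented "true" professor effects
df["score"] = (
    0.005 * (df["sat_math"] - 660)                  # aptitude contribution
    + true_prof_effect[df["prof"].to_numpy()]       # professor contribution
    + rng.normal(0.0, 1.0, n_students)              # everything else
)
df["score"] = (df["score"] - df["score"].mean()) / df["score"].std()   # standardize outcome

# One dummy per professor gives the estimated professor fixed effects; their
# spread (in SD units of the outcome) is the kind of quantity reported above.
fit = smf.ols("score ~ C(prof) + sat_math", data=df).fit()
print(fit.params.filter(like="C(prof)").std())
```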

Introductory course professors significantly affect student achievement in follow-on related courses, but these effects are quite heterogeneous across subjects.

But here is the first surprising result. Students whose professors produce strong performance in the initial mathematics course go on to perform significantly worse in the (mandatory) follow-on related math, science, and engineering courses. For math and science courses, academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous student achievement, but positively related to follow-on course achievement. That is, students of less experienced instructors who do not possess terminal degrees perform better in the contemporaneous course being taught, but perform worse in the follow-on related courses.

Presumably, less academically qualified instructors may spur (potentially unsustained) interest in a particular subject through higher grades, but those students perform significantly worse in follow-on related courses that rely on the initial course for content.  (Interesting side note: for humanities courses, the researchers found almost no relationship between professor observable attributes and student achievement.)

Turning our attention from instructors to students, the study found that students who struggle and frequently get low grades tend to do better than the seemingly “good” students, when you see how much they remember, and how well they can perform, months or even years later.

This is the result I discussed in the previous post. On the face of it, you might still find that result hard to believe. But it’s hard to ignore the result of a randomized control study of over 12,000 students over a period of nine years.

For me, the big take-home message from the study is the huge disparity between course grades produced at the time and assessment of learning obtained much later. The only defense of contemporaneous course grades I can think of is that in most instances they are the only metric that is obtainable. It would be a tolerable defense were it not for one thing. Insofar as there is any correlation between contemporaneous grades and subsequent ability to remember and make productive use of what was learned in the course, that correlation is negative.

It makes me wonder why we continue, not only to use end-of-course grades, but to frequently put great emphasis on them and treat them as if they were predictive of future performance. Continuous individual assessment of a student by a well trained teacher is surely far more reliable.

A realization that school and university grades are poor predictors of future performance is why many large corporations that employ highly skilled individuals increasingly tend to ignore academic grades and conduct their own evaluations of applicants.

On making omelets and learning math

As the old saying goes, “You can’t make an omelet without breaking eggs.” Similarly, you can’t learn math without bruising your ego. Learning math is inescapably difficult, frustrating, and painful, requiring high tolerance of failure. Good teachers have long known this, but the message has never managed to get through to students and parents (and it appears, many system administrators who evaluate students, teachers, and schools).

The parallel (between making omelets and learning math) plays out in the classroom in a manner that many students and parents would find shocking, were they aware of it. It’s this.

All other factors being equal, when you test how well students have mastered course material some months or even years after the course has ended, students who do well in courses, getting mostly A’s on assignments and exams, tend to perform worse than students who struggled and got more mediocre grades at the time.

Yes, you read that correctly, the struggling students tend to do better than the seemingly “good” students, when you see how much they remember, and how well they can perform, months or even years later.

There is a caveat. But only one. This is an “all other things being equal” result, and assumes in particular that both groups of students want to succeed and make an effort to do so. I’ll give you the lowdown on this finding in just a moment. (And I will describe one particular, highly convincing, empirical demonstration in a follow-up post.) For now, let’s take a look at the consequences.

Since the purpose of education is to prepare students for the rest of their lives, those long term effects are far more important educationally than how well the student does in the course. I stressed that word “educationally” to emphasize that I am focusing on what a student learns. The grade a student gets from a course simply measures performance during the course itself. 

If the course grade correlated positively with (long-term) learning, it would be a valuable measure. But as I just noted, although there is a correlation, it is negative. This means that educators and parents should embrace and celebrate struggle and mediocre results, and avoid the false reassurance of progress that is so often the consequence of a stellar classroom performance.

Again, let me stress that the underlying science is an “all other things being equal” result. Assuming that requirement is met, a good instructor should pace the course so that each student is struggling throughout, constantly having to spend time correcting mistakes.

The simple explanation for this (perhaps) counter-intuitive state of affairs is that our brains learn as a result of trying to make sense of something we find puzzling, or struggling to correct an error we have made. 

Getting straight A’s in a course may make us feel good, but we are actually not learning something by so doing; we are performing. 

Since many of us discover that, given sufficient repetitive practice, we can do well on course assignments and ace the final exam regardless of how well we really understand what we are doing, a far more meaningful measure of how well we have learned something is to test us on it some time later. Moreover, that later test should not just be a variant of the course final exam; rather we should be tested on how able we are in making use of what we had studied, either in a subsequent course or in applying that knowledge or skills in some other domain.

It is when subjected to that kind of down-the-line assessment that the student who struggled tends to do better than the one who performed well during the course.

This is not just some theoretical idea, removed from reality. In particular, it has been demonstrated in a large, randomized control study conducted on over 12,000 students over a nine-year period.

The students were of traditional college age, at a four-year institution, and considerable effort was put into ensuring that the all-important “all other things being equal” condition was met. I’ll tell you about the study and the institution where it was carried out in a follow-on post to this one. For now, let’s look at its implications for math teaching (for students of all ages).

To understand what is going on, we must look to other research on how people learn. This is a huge topic in its own right, with research contributions from several disciplines, including neurophysiology.

Incidentally, neurophysiologists do not find the negative-correlation result counter-intuitive. It’s what they would expect, based on what they have learned about how the brain works. 

To avoid this essay getting too long, I’ll provide an extremely brief summary of that research, oriented toward teaching. (I’ll come back to all these general learning issues in future posts. It’s not an area I have worked in, but I am familiar with the work of others who do.)

Learning occurs when we get something wrong and have to correct it. This is analogous to the much better known fact that when we subject our bodies to physical strain, say by walking, jogging, or lifting weights, the muscles we strain become stronger—we gain greater fitness.

The neurophysiologists explain this by saying that understanding something or solving a problem we have been puzzling over, is a consequence of the brain forming new connections (synapses) between neurons. (Actually, it would be more accurate to say that understanding or solving actually is the creation of those new connections.) So we can think of learning as a process to stimulate the formation of new connections in our brain. (More accurately, we should think of learning as being the formation of those new connections.)

Exactly what leads to those new connections is not really known—indeed, some of us regard this entire neurons and synapses model of brain activity as, to some extent, a scientific metaphor. What is known is that it is far more likely to occur after a period in which the brain repeatedly tries to understand something or to solve the problem, and keeps failing. (This is analogous to the way muscles get stronger when we repeatedly subject them to strain, but in the case of muscles the mechanism is much better understood.) In other words, repeatedly trying and failing is an essential part of learning.

In contrast, repeatedly and consistently performing well strengthens existing neuronal connections, which means we get better at whatever it is we are doing, but that’s not learning. (It can, however, prepare the brain for further learning.) 

Based on these considerations, the most effective way to teach something in a way that will stick is to put students in a position of having to arrive at the best answer they can, without hints, even if it’s wrong. Then, after they have committed, you can correct, preferably with a hint (just one) to prompt them to rectify the error. Psychologists who have studied this refer to the approach as introducing “desirable difficulties.” Google it if you have not come across it before. The term itself is due to the UCLA psychologist Robert Bjork.

For sure, this approach makes students (and likely their parents and their instructor) feel uncomfortable, since the student does not appear to be making progress. In particular, if the instructor gauges it well, the students’ assignment work and end-of-term test will be littered with errors. (Instructors should grade on the curve. I frequently set the pass mark around 30%, with a score of 60% or more correct getting an A, though in an ideal world I would have preferred not to be obliged to assign a letter grade, at least based purely on contemporaneous testing.)
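As a concrete (and purely illustrative) encoding of that kind of curve: only the roughly 30% pass mark and the 60%-for-an-A threshold come from the passage above; the intermediate cutoff is my own invention.

```python
def curved_grade(percent_correct):
    """Map a raw exam percentage to a letter grade under a steep curve."""
    if percent_correct >= 60:
        return "A"       # 60% or more correct earns an A
    if percent_correct >= 45:
        return "B"       # illustrative intermediate cutoff
    if percent_correct >= 30:
        return "C"       # pass mark set around 30%
    return "F"

print([curved_grade(p) for p in (75, 50, 35, 20)])   # ['A', 'B', 'C', 'F']
```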

Of course, the students are not going to be happy about this, and their frustration with themselves is likely to be offloaded onto the instructor. But, for all that it may seem counterintuitive, they will walk away from that course with far better, more lasting, and more usable learning than if they had spent the time in a feelgood semester of shallow reinforcement that they were getting it all right. 

To sum up: Getting things right, with the well-deserved feeling of accomplishment it brings, is a wonderful thing to experience, and should be acknowledged and rewarded—when you are out in the world applying your learning to do things.  But getting everything right is counterproductive if the goal is meaningful, lasting learning. 

Learning is what happens by correcting what you got wrong. Indeed, the learning is better if the correction occurs some time after the error is made. Stewing for a while in frustration at being wrong, and not seeing how to fix it, turns out to be a good thing. 

So, if you are a student, and your instructor refuses to put you out of your misery, at least be aware that the instructor most likely is doing so because they want you to learn. Remember, you can’t learn to ride a bike or skateboard without bruising your knees and your elbows. And you can’t learn math (and various other skills) without bruising your ego. 

Cracking your ego is an unavoidable part of learning.

What topics should be covered in school mathematics?

On May 25, Jo Boaler and I had a public conversation at Stanford about K-12 mathematics. (An edited video recording will be made available on the youcubed and SUMOP websites as soon as it is ready.) Our conversation was live-tweeted by @AliceKeeler, and that led to a lively twitter debate that lasted several days, with much of the focus on what topics should be taught.

Having been a professional mathematician for close on fifty years (first in pure mathematics, then in the world of business and government service), my take, which I articulated in my conversation with Jo and which you can find in the Blue Notepad videos on this site, is somewhat unusual. I actually believe that what is taught is (in many ways, but not all) less important than how it is taught.

My recent post in my monthly Devlin’s Angle for the Mathematical Association of America explains why I have that view. In a follow-up post next month, I will connect the argument I present this month with the discussion Jo and I had and the ensuing twitter debate.