Given the growing deluge of horrifying stories and images surrounding the global COVID-19 pandemic over the past few weeks, it is easy to find yourself reacting in one of two ways, both of which are potentially dangerous and neither of which is justified by the data.

First, when faced with daily reports of increasing numbers of deaths, you might think the odds are so stacked against you, there is little point in taking the recommended steps to protect yourself from infection. Alternatively, you could find yourself paralyzed with anxiety, afraid to do anything, and sink into a nihilistic depression.

Neither is justified by the actual data. In fact, what the data indicate are:

The odds are in favor of you avoiding infection if you follow the recommended preventative measures (avoiding contact with or close proximity to others, regular washing of hands, not touching your face, and a few others);

Even if you are infected, the odds favor you not having to be hospitalized;

The odds on you dying from COVID-19 infection are very low.

But just how good are those odds really? More specifically, how do they stack up against the risks associated with other things we encounter and do in life?

To allay my own initial anxiety, I did a few quick calculations, based on the data available in the press, and found that, while the risk of death is certainly much higher than that of, say, dying in an airplane crash, you are much more likely to die of heart disease than from the novel coronavirus.

How much more likely? Numerically, the risk is a whopping 28 times higher. But be careful. Whenever you see risk comparisons like that, you need to pay attention to what is being compared with what. On their own, the numbers tell you nothing; absent a proper context, that factor of 28 is meaningless.

In fact, that caveat about context can be significant whenever numbers are quoted; it is particularly so when a number is given to compare risks.

In the case of COVID-19, I went into the details of my calculation in a post in my personal blog profkeithdevlin.org. I leave you to follow the link and read what I said there. The mathematics is not hard; indeed, it is just basic arithmetic, along the lines of some of the more recent preceding posts in the SUMOP blog. The important thing is to keep in mind what the numbers refer to. The ultimate goal is not to produce a number — an answer; rather, it is to understand the relative risks.

A segment about the 2020 presidential election primaries on MSNBC News on March 5 caused a flood of comments on Twitter. Here is the tweet that brought the segment into the Twittersphere and started all the ruckus:

Both the guest, a member of the New York Times Editorial Board, and host Brian Williams failed to notice how absurd was the arithmetical claim being made.

If this is the first time you have seen this segment, it likely flew by too quickly to register. Here (right) is the original tweet that started it all.

Curious as to who put out the tweet, I checked her profile. (See below.) [Normally I anonymize tweets, even though they are public, but in this case the tweet was shown on live national television.] She starts out with the claim that she is bad at math. I have no idea whether she is or she isn’t. But that tweet does not show she is. It actually suggests she may be a better mathematical thinker than many – read on.

Many of the comments on Twitter lamented the poor arithmetical skills of the tweeter and the two media figures on the show. In fact, the story went well beyond Twitter. The next day, the Washington Post, no less, devoted an entire article to the gaffe.

The Big Lead took another swipe the same day, pointing out that the tweeter, Mekita Rivas, is a freelance writer and Washington Post contributor, and noting that the math disclaimer on her Twitter bio was a result of her oh-so-public gaffe.

But she was by no means alone. The error was repeated, or at least overlooked, by a whole host of smart media folk: Mara Gay, the New York Times editorial board member who brought it up in her on-camera conversation, the MSNBC graphics department, several producers, host Brian Williams himself, and likely more.

The episode is reminiscent of the numerical error made by a famous New York Times commentator that I wrote about in a post a few days ago (March 3). To be sure, both episodes highlight the clear need, in today’s data saturated and data-driven world, for all citizens to have a reasonably well developed number sense.

For this problem is not, as many critics of Ms Rivas claimed, that she cannot do basic arithmetic. I would not be surprised if she were no worse than most other people, and besides, arithmetic skill became in-principle obsolete with the introduction of the electronic calculator in the 1960s, and in every practical sense became really obsolete when we entered an era where we all carry around an arithmetic powerhouse in our smartphone. [There is educational value from teaching basic arithmetic, but that is a separate issue.]

What is most definitely not obsolete, however, and is, in fact, a crucial human skill in today’s world, is number sense. What was worrying about the whole Twitter–MSNBC episode is that none of those involved recognized instinctively that the claim in the original tweet was absurd.

The two follow-up news articles I just referred to delve into the absurdity of the error, which comes down to the difference between $1M+ and $1.52.

But this is where it gets interesting. What is it about the statement in the Rivas tweet that led a whole host of smart professionals to not see the error? What led them to not feel in their bones that the amount every American actually would receive from Bloomberg would be “something a bit north of a dollar-fifty” and not a sum “in excess of a million dollars”? This is not a question of calculating; it’s not poor arithmetic. It’s something else, and it’s far more dangerous. Doing arithmetic is something our iPhone can do, quickly, error-free, and with more numbers, and bigger numbers, than we computationally-puny humans can comfortably handle. Understanding numbers, on the other hand, is very much people stuff. Our devices understand nothing.

If a whole group of smart people are so quantitatively illiterate (and that’s what we are talking about) that they don’t instinctively see the Rivas error, how can we as a society make life-critical decisions such as assessing our personal risk in the coronavirus outbreak or the degree to which we should take a candidate’s climate change policies into consideration when deciding who to vote for?

Many of the videos and blogposts on (and referred to on) this website focus on number sense, and discuss how best to ensure no future citizen graduates from high school without adequate number sense. (The Common Core State Standards are designed to achieve that goal, though teaching practices often seem to miss that point, sometimes as a result of inappropriate administrative pressure coming from poorly informed politicians.)

What interested me in particular about the MSNBC example was the nature of the error. It’s similar to the example I discuss at the end of the first of the Blue Notebook videos on this site, in that the way the proposition is worded triggers learned arithmetic skills (more accurately, and appropriately derisive, “test-taking tricks”) that in general are reliable.

Here is the Rivas argument, spelled out:

1. FACT: Bloomberg spent $500 million on TV ads.

2. FACT: The US population is 327 million.

3. (FALSE) ARGUMENT: We are talking millions here. If you take the whole amount and divide it up among the population, everyone gets 500 divided by 327 (millions). Good heavens, that’s more than one million each!

Rivas is doing two smart things here – smart in the sense that, in general, they lead to the correct answer quickly with the least effort, which is what she (and you) likely needed to be able to do to pass timed math tests.

1. First, she says, everything is about millions, so we can forget those six zeros at the end of each number. [GOOD MOVE]

2. Then she says, we have to divide to see how much each person would get. That’s 500 divided by 327, which is around 1.5 (or at least more than 1). [GOOD MOVE]

3. Then finally she remembers everything is in millions. So it’s really $1.5M (or more than $1M). [EXCELLENT. SHE REMEMBERED THE SIMPLIFICATION AND PUT THE ZEROES BACK IN]

On its own, the idea behind each step is fine, indeed can be useful – in the right circumstances. But not, unfortunately, in this coupling! [I’m not saying she went through these steps consciously doing each one. Rather, she was surely applying a heuristic she had acquired with practice in order to pass math in school.]
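To see where the three steps go wrong, it may help to run the division both ways. The snippet below is only an illustrative sketch using the two figures quoted above; the function and variable names are made up for this example. The point it tries to show is that the "millions" cancel when you divide millions of dollars by millions of people, so there are no zeros left to put back.

```python
# Figures quoted in the text above.
AD_SPEND_DOLLARS = 500_000_000   # $500 million spent on TV ads
US_POPULATION = 327_000_000      # 327 million people


def dollars_per_person(total_dollars: float, population: float) -> float:
    """Share of the total if it were split evenly across the population."""
    return total_dollars / population


# Correct: dollars divided by people. The six zeros on each side cancel,
# so the result is already in plain dollars.
per_person = dollars_per_person(AD_SPEND_DOLLARS, US_POPULATION)

# The flawed heuristic: drop the six zeros, divide, then put the zeros back.
heuristic_result = (500 / 327) * 1_000_000

print(f"Correct share per person: ${per_person:.2f}")
print(f"Heuristic result:         ${heuristic_result:,.0f}")
```

The first figure comes out at roughly a dollar and a half; the second is the million-plus number that made it onto television.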

The trouble is, if someone leaves school having mastered a bunch of heuristics to pass timed math tests – which is how many students get through math – but has not been taught how to think intelligently about numbers (and thereby develop number sense), then they are prone to be tripped up in this way.

Not convinced? Check out the example toward the end of that first Blue Notebook video. It’s a bit more subtle than the MSNBC example I am discussing here; in fact, more than half the people in every audience I have given that example to (under time pressure) get it wrong. The odds are, you would have too. But the overall message about math education is the same.

Ms. Rivas should take that disclaimer off her Twitter bio.

But maybe replace it by one that says, “I need to improve my number sense.” That’s a motto that – by my observation of the news media, social media, and society in general – would well serve the majority of people, including many who are good at “getting the right answer.”

I was wandering aimlessly round a county fair not long ago, and came across a stall with a game I remember playing as a teenager, many years ago back in the UK – and losing. [The image here is not the one I saw; I just pulled it from the Web for illustration.]

You pay an entry fee and are given three darts. The goal is (and this may vary, though likely not by much) that if two of your three darts land on a card, you win a small prize, and if all three do you win a more substantial prize.

I remember that as a teenager I was surprised, and disappointed, when none of my three throws ended up in a card. Not one. After all, at first glance it looks as though the cards occupy most of the board, so the odds should be in your favor.

Understanding what is going on provided yet another great example of number sense, the topic I focused on in my previous two posts. Let’s do some quick calculations. As usual, they don’t have to be accurate; we can make simplifications.

There are 4 rows of 5 cards, so 20 cards in all. According to a quick google search, playing-cards typically measure around 2.5 in by 3.5 in. Based on that information, a brief glance at the photo shows that the cards are placed on the board with a vertical and horizontal separation of roughly 1.25 in, with an outer border roughly the same size. So we can conclude that the dimensions of the board are

[5 x 2.5 + 6 x 1.25 = 12.5 + 7.5 = 20] in wide by [4 x 3.5 + 5 x 1.25 = 14 + 6.25 = 20.25] in high.

Which means the total target area is 20 x 20.25 = 405 sq in.

The area of a single card is 2.5 x 3.5 = 8.75 sq in, so the total area occupied by the cards is

[20 x 8.75 = 175] sq in.

Hence, the proportion of the target area occupied by a card is 175/405, or approximately 0.43.

That’s less than half, so the odds are against you on each throw. Assuming your throwing is random, which for most of us it will be, your throws will fail to land on a card 57% of the time.

The probability of getting all three darts to land on cards to win a “substantial” prize is 0.43^3, which is approximately 0.08. You will win a real prize only 8 times in 100 attempts.

In fact, the odds are against you winning even a small prize. Any two of the three darts can be the ones that hit, so the probability that at least two darts land on a card is 3 x 0.43^2 x 0.57 + 0.43^3, or approximately 0.4. You fail to win anything at all on about 3 of every 5 attempts.
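The board estimate and the prize probabilities can be checked with a few lines of Python. This is a rough sketch under the same simplifying assumptions as the text (random, independent throws that always hit the board); the names are invented for illustration. Note that the "at least two of three" figure counts every way of choosing which darts hit, so it comes out larger than 0.43 squared on its own.

```python
from math import comb

# Rough measurements from the text, in inches.
CARD_W, CARD_H = 2.5, 3.5
GAP = 1.25            # spacing between cards, and the outer border
COLS, ROWS = 5, 4     # 4 rows of 5 cards

board_w = COLS * CARD_W + (COLS + 1) * GAP
board_h = ROWS * CARD_H + (ROWS + 1) * GAP
board_area = board_w * board_h
cards_area = COLS * ROWS * CARD_W * CARD_H

# Chance that one random throw lands on a card.
p_hit = cards_area / board_area


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of at least k hits in n independent throws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_all_three = prob_at_least(3, 3, p_hit)     # substantial prize
p_two_or_more = prob_at_least(2, 3, p_hit)   # any prize at all

print(f"Board: {board_w} in x {board_h} in = {board_area} sq in")
print(f"Chance of hitting a card:  {p_hit:.2f}")
print(f"Chance of all three:       {p_all_three:.2f}")
print(f"Chance of at least two:    {p_two_or_more:.2f}")
```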

So the game is fine for a bit of fun, where you play maybe one or two rounds. Especially if the stall is there to raise money for a worthy charity. But for most of us, this is definitely not financially wise.

As for skilled darts players, in a commercial setting, the stallholder will surely stop anyone playing once they recognize the contestant’s skill (as happens in casinos with blackjack), so even if you play well, the odds are still heavily stacked against you making a killing.

This tweet caught my eye a few weeks ago. It was from a well-known political commentator for a major US newspaper, whom I follow. I am obscuring his name since his identity is not relevant to this post. What does bother me is that an influential commentator with a large pulpit can display such a lack of number sense. (I still enjoy reading him; he has a good grasp of many things. But not where numbers are concerned.)

In fact, the tweet did more than catch my eye; it jumped right off the screen and nigh on blinded me. Regardless of whether the initial assumption is correct or not, the conclusion is so obviously wrong on simple numerical grounds.

As usual, a very simple, made-up example is all it takes to highlight the fallacy. Also as usual in such situations, virtually no mathematics is involved here, and no arithmetic beyond very simple stuff with whole numbers. Number sense is not arithmetic. The goal is to get a quick numerical sense of the issue.

Assume that a worker making $25,000 a year in 2010 saw their wages rise by 5% over the ten-year period to today. They would now be making $25,000 + $1,250 = $26,250. Compare that with their boss, who made $100,000 a year in 2010 but whose income rose only by 2.5%. In 2020, they bring in $100,000 + $2,500 = $102,500. Their pay increased at half the rate of their low-paid employee, but that meant their income rose by twice as much.

As with the example of the “danger of drinking wine” I wrote about last time, it all depends on the base figures on which the percentages are calculated.
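The made-up example is easy to reproduce. The sketch below uses the same invented salaries and percentages as the text (the function name is hypothetical) and simply shows that a smaller percentage of a larger base can be a bigger raise in dollars.

```python
def raise_in_dollars(base_salary: float, percent_rise: float) -> float:
    """Dollar value of a percentage rise applied to a base salary."""
    return base_salary * percent_rise / 100


# Made-up figures from the example above.
worker_raise = raise_in_dollars(25_000, 5)      # low-paid worker, 5% rise
boss_raise = raise_in_dollars(100_000, 2.5)     # boss, 2.5% rise

print(f"Worker: +${worker_raise:,.0f}")
print(f"Boss:   +${boss_raise:,.0f}")
print(f"Boss's raise is {boss_raise / worker_raise:.0f}x the worker's")
```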

How close to reality is my made-up-on-the-spot, number sense example? I googled “rise in wages”, and immediately found myself looking at a recent report from the Brookings Institution that gave me all I needed. (I chose a page from a source I knew to be reliable.) For the low-paid worker, my example was not too far off. From 2010 to 2018, the bottom 10th percentile of hourly wages grew 5.1%. But I was way off with my number for the boss. In fact, the entire top 90th percentile of workers saw their income go up by an average of 7.4%.

What this tells us is that the opinion writer not only has poor number sense, he is not even able to get his underlying numerical facts right. Of course, you might say the writer set out to deliberately mislead. That’s possible, I suppose, but I doubt it. His article was in a major national newspaper that has fact-checkers, so both he and the publisher knew that his claims could easily be checked. For him to continue doing his job, readers have to trust him. Much more likely, I think, the issue was innumeracy. Like many people I know, when faced with figures, both the writer and his editor likely found their eyes glazing over.

Yet to anyone with number sense, the writer’s tweet jumps off the page as being absurdly wrong. What the actual numbers are does not matter. Even if his assumption about the relative rises had been correct, his conclusion would be false, as my initial made-up example showed.

In fact, my one use of google showed that even his starting point was off-the-charts, factually wrong: wages of higher-paid workers had risen a lot faster than for those at the bottom.

The sad fact is that the majority of people in the media have very poor educational backgrounds in mathematics, science, or engineering. Numbers are alien to them.

Given the huge significance of numerical data in today’s world, that tells us that our educational system needs to change. Arguing about the pros and cons of teaching Algebra 2 or Calculus seems totally misplaced, when we are producing so many people who lack basic numeracy.

In the meantime, if we cannot rely on the media to get the numbers right (by which I mean ballpark, number-sense, appreciate-it-when-you-see-it right), then it is up to us citizens to protect ourselves from being misled.

And, for those of us in the education business, to make sure our students can be both good consumers of quantitative information and good communicators of numerical data.

A family member alerted me recently to the Medium article with the above headline. Knowing I have a long-adopted European lifestyle habit of having a glass (sometimes two) of wine with a meal, she was concerned about what seemed to be a significant health risk.

The article led off with this paragraph:

It was so comforting to think that a daily glass of wine or a stiff drink packed health benefits, warding off disease and extending life. But a distillation of the latest research reveals a far more sobering truth: Considering all the potential benefits and risks, some researchers now question whether any amount of alcohol can be considered good for you.

As a scientist, I am always open to being convinced by hard data. So I read on. The article rapidly became more alarmist:

For years, moderate drinking — typically defined as one drink (such as regular beer or a glass of wine) per day for women and up to two for men — had been billed as a way to reduce the risk of stroke, in which a vessel carrying blood to the brain bursts or is clotted. But a study earlier this year, involving more than 500,000 men and women in China and published in the Lancet, refutes that claim. “There are no protective effects of moderate alcohol intake against stroke,” says one of the study’s co-authors, University of Oxford professor Zhengming Chen. “Even moderate alcohol consumption increases the chances of having a stroke.”.

Wow!

While not for a moment doubting the validity of the science and the conclusions the researchers stated, I nevertheless wondered if there were really any cause for alarm. Just how significant is the risk I am being exposed to as I sip my Pinot Noir? After all, from a scientific standpoint, a well-established increased risk of 0.5% can merit publication in a professional journal, but most of us would discount such a low increase when it comes to deciding to give up an activity that gives us a great deal of pleasure. Maybe the article writer was being unjustifiably alarmist.

In the case of moderate wine drinking, there is certainly a well-established correlation between countries where that is common practice and countries with longer life-expectancy, though as is often the case, establishing any causation is a tricky challenge. (I tend to think that the pleasure I get from a glass of good Pinot, together with the relaxed sensation it produces, has a net beneficial effect on my overall health. That’s a plausible assumption, but did the new study show that effect was outweighed by negative consequences?)

In fact, the article quoted far more alarmist figures:

The study, published in the Lancet, found a “strong association between alcohol consumption and the risk of cancer, injuries, and infectious diseases.” Among other findings, just one drink daily was linked to a 13% increased risk of breast cancer, 17% increased risk of esophageal cancer, and 13% higher risk of cirrhosis of the liver.

Those percentages definitely grab the attention. But again, while not for a moment disputing them as valid scientific findings, I wondered how significant they are in making a life decision.

Time to do some quick calculations.

According to the article, 88,000 people in the United States die each year from alcohol-related causes. Let’s assume the bulk of those are adults (i.e., legal drinkers). The total population of people over 21 in the United States is about 200,000,000. So according to the figure quoted, the proportion of American adults who die from alcohol each year is 88 out of 200,000.

To simplify the calculation, let’s assume the situation is worse, and 100 out of 200,000 die each year. That’s 10 people in every 20,000. I dropped down to 20,000 because that’s something we can visualize, as many stadiums and arenas have capacities of around that number. A quick Google search revealed that the Amway Center in Orlando, Florida has a stated capacity of exactly 20,000.

I can now get a visual representation of the annual risk of death from drinking alcohol as 10 people in the crowded stadium. (Other well-known examples I came up with are Madison Square Garden in New York City, with a capacity just over 20,000, and Meadowlands Arena in New Jersey that comes in just under 20,000. Imagine ten individuals in either of those facilities.)

Personally, I don’t find that particularly scary. If it were not for the degree to which a glass of wine with my meal gives me significant pleasure, it would absolutely make sense to avoid the risk of being among those ten people in the stadium. But in my case, I’ll take that risk. Not least because I suspect that most of those deaths are from people who drink a lot more than I do. (The writer of the buzzkill article responsibly alludes to the far greater risks associated with excessive alcohol consumption and binge drinking, where to my mind the data cannot be dismissed.)

But the focus of the article was not on the base-level risks, rather the increased risks of one drink each day. That has me right in the crosshairs. What is the significance of that particular risk?

Again, let’s look at a scenario worse than the one reported, to make the math simpler. Assume there is an increased risk of 20%, bigger than those reported increases of 13% and 17%. That would mean 12 people in the Amway Center; 2 up from 10. Now I can visualize, and hence understand, my increased risk as follows. (I’m assuming I was not already in the danger group, but become so as a result of my daily glass of wine. Again, my goal is not to produce accurate risk data, but to visualize, and hence understand, the data presented. I am making simplifications that, if anything, create a picture showing greater risks than the actual data indicate.)

In any one year, 10 of those people in Amway Center will die as a result of alcohol. If you don’t drink, you will not be one of those 10. The increased risk as a result of having one drink daily enlarges that danger group of 10 people to a group of 12 people. At least to my eyes, the difference between 10 people and 12 people in that massive Center is nothing like enough to convince me to give up my daily glass of wine. I’ll accept the risk.
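For anyone who wants to try the stadium picture with other figures, here is a minimal sketch of the scaling. It uses only numbers quoted above; the helper name `people_in_stadium` is invented for illustration, and the 20% uplift is the deliberately pessimistic rounding used in the text, not a figure from the study.

```python
STADIUM_CAPACITY = 20_000        # roughly the Amway Center
US_ADULTS = 200_000_000          # approximate population over 21


def people_in_stadium(cases_per_year: float,
                      population: float,
                      capacity: int = STADIUM_CAPACITY) -> float:
    """Scale an annual count in a population down to a stadium-sized crowd."""
    return cases_per_year * capacity / population


# Quoted figure: 88,000 alcohol-related deaths a year.
quoted = people_in_stadium(88_000, US_ADULTS)

# Simplified (and deliberately worse) figure: 100,000 a year.
baseline = people_in_stadium(100_000, US_ADULTS)

# Pessimistic 20% increase in risk from one drink a day.
with_daily_drink = baseline * 1.20

print(f"Quoted figure:        {quoted:.1f} people per stadium")
print(f"Simplified baseline:  {baseline:.0f} people per stadium")
print(f"With a 20% increase:  {with_daily_drink:.0f} people per stadium")
```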

In my scenario, those two additional people in the audience are the ones who get there as a result of one glass of wine a day. Becoming one of those two individuals in the Amway Center is the risk the buzzkill article is warning us about.

Just to be doubly sure I’m being sensible, however—after all, I have a vested interest in convincing myself my wine habit is not unwise—I decided to google common causes of accidents and deaths.

According to my search, taking a shower turns out to be a dangerous activity. (For my entry “accidents in the bathroom”, Google returned over 22 million hits.) But what are the figures? In the U.S., roughly one person per day dies in the shower, mostly adults. In an adult population of around 200 million, that’s a low risk. But that’s the risk of death. Statistically of more significance, around 250,000 people aged 16-or-older per year have an accident in the bathroom that requires a visit to the emergency room (with 14% requiring hospitalization). That represents 25 people in Amway Center. We all accept the risk of being one of those 25 every time we step into the bathroom.
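The same stadium scaling can be applied to the bathroom figures. Again, this is just a sketch using the numbers quoted above, with hypothetical names, and it treats "one death per day" as 365 deaths a year.

```python
STADIUM_CAPACITY = 20_000     # roughly the Amway Center
US_ADULTS = 200_000_000       # approximate adult population


def per_stadium(cases_per_year: float) -> float:
    """Annual cases among US adults, scaled to a 20,000-seat crowd."""
    return cases_per_year * STADIUM_CAPACITY / US_ADULTS


er_visits = per_stadium(250_000)          # bathroom accidents needing the ER
hospitalized = er_visits * 0.14           # 14% of those are hospitalized
shower_deaths = per_stadium(365)          # about one death per day

print(f"ER visits:      {er_visits:.0f} people per stadium")
print(f"Hospitalized:   {hospitalized:.1f} people per stadium")
print(f"Shower deaths:  {shower_deaths:.2f} people per stadium")
```

The death figure comes out far below one person per stadium, which is why the text calls it a low risk, while the emergency-room figure is the 25 people mentioned above.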

None of this is to say we should ignore the valuable evidence science provides when it comes to risks. On the contrary, I went through the above exercise because I habitually try to maximize my length and quality of life. Every action we take carries risks—including inaction. We just have to make wise decisions that balance the pluses and minuses of everything we do.

To do that, it helps to visualize the risks in some real-world scenario we are familiar with. I tend to go for movie theaters, sports stadia, and the like. It usually doesn’t require any complicated calculations; it’s number sense rather than arithmetic. Grab a few relevant figures from the Web and make gross simplifications that, if anything, make things worse than they really are.

In the situations described in the article, that “20%” figure refers to a 20% increase in size of the risk group. But if that risk group is already small, the increased risk group will still be small. In the case of moderate alcohol consumption, it corresponds to going from ten people in the vast Amway Center to twelve.

But the question I asked (and everyone faces questions like this all the time) is this: Will I accept the risk if there is a personal cost to me? Once I had a clear way to visualize that risk, I found the decision easy. Pass the Pinot!

That’s not an endorsement or a recommendation. It’s my personal decision arrived at from a good understanding of the risks, based on hard data. (Hence no photo of a glass of wine.) Comparable data on risks persuades me to always fasten my seat belt in a car and never go out on my bike without a helmet. Why take a risk—however small—when there is no personal cost to avoiding it?

And that’s what this post is about: Applying number sense to make sense of numerical data in order to reach personally significant decisions. Once I had a good image in my mind to visualize the risks, I found the decision was easy.

But that was not the focus of the buzzkill article, where I found the complete absence of important contextual information to help readers understand what the data actually represents resulted in a story that I think is unduly alarmist. To repeat a recurring SUMOP theme, number sense is a critical mental tool in today’s data-rich world.

We’ve all been there. Someone shows us a method to solve a particular kind of problem and helps us as we try to go through the steps ourselves. Eventually, we feel confident we know how the method works. Then our instructor says, “Here is a similar problem. Try this one completely on your own.”

And we have no idea how to begin!

The instructor told us it is similar to the one we just saw. But we can’t quite see how it is similar.

Clearly, I’m not talking about elementary-grade worksheets designed to provide repetitive practice at one specific mathematical operation, such as addition or multiplication. Though even there, beginners can experience the same feeling of not knowing how to proceed, even if just one or two specific numbers change.

More significantly, however, instructors are almost certainly familiar with the situation where students in, say, a physics class are unable to solve a problem that requires the very linear algebra techniques they just applied to pass a test in the math class. Simply being in a different class can make a big difference.

The problem is even more pronounced when it comes to using mathematics techniques to solve real-world problems. Even when we are told explicitly that the same technique just used successfully to solve one problem can be used to solve the new one, fewer than 10% of us can actually do that.

This is not a sign of intellectual weakness. It’s a built-in feature of the human brain that cognitive scientists have known about and studied for many years. The good news is, it can be overcome. Those same cognitive science studies show us what we need to do to overcome the difficulty.

The starting point is the recognition that the evolution by natural selection of the human brain equipped it to learn naturally from experiences, to better ensure its survival, or to perform in a more self-advantageous fashion, when next faced with a similar experience. Learning through experience in this fashion is automatic, and tends to be robust, but is heavily dependent on the particular circumstances in which that learning occurred. It does not take much variation in the circumstances to render what was learned ineffective.

Let me give you an example. Imagine you are a doctor faced with a patient who has a malignant tumor in their stomach. It is impossible to operate on the patient, but unless the tumor is destroyed the patient will die. There is a kind of ray that can be used to destroy the tumor. If the rays reach the tumor all at once at a sufficiently high intensity, the tumor will be destroyed. Unfortunately, at this intensity the healthy tissue that the rays pass through on the way to the tumor will also be destroyed. At lower intensities the rays are harmless to healthy tissue, but they will not affect the tumor either. What type of procedure might be used to destroy the tumor with the rays, and at the same time avoid destroying the healthy tissue?

You might want to think about this for a while before reading on. Fewer than 10% of subjects are able to solve it at first encounter.

Have you solved it?

Are you one of that ten percent?

Okay, let’s move on.

The solution is to direct multiple low-intensity rays toward the tumor simultaneously from different directions, so that the healthy tissue will be left unharmed by the low-intensity rays passing through it, but when all the low-intensity rays converge at the tumor they will combine to destroy it.

Easy, no? At least, it’s easy once you are shown how to solve it!

Here is a second problem. A General wishes to capture a fortress located in the center of a country. There are many paths radiating outward from the fortress. All have been mined so that while small groups of men can pass over the paths safely, any large force will detonate the mines. A full-scale direct attack is therefore impossible. What does the General decide to do?

Again, you might want to think about this before proceeding.

This too is solved by at most 10% of subjects when they first meet it.

How did you do with this one?

The General’s solution is to divide his army into small groups, send each group along a different path, and have the groups converge simultaneously on the fortress.

This is, of course, logically the same problem as the cancer treatment, and the same “hub and converging spokes” solution works in both cases. Except that there is no “of course” about it. A substantial proportion of people fail to recognize the similarity, even when presented with one problem right after the other, as I just did here.

Specifically, a number of studies have shown that, whereas only 10% of people solve the radiation problem at first encounter, if they are first shown the General’s problem and its solution and then presented with the radiation problem, about 30% are able to solve the radiation problem. They see the similarity. That’s up from the 10% of subjects who can solve the radiation problem in the absence of any priming, but the rest do not see a connection between the two problems.

Notice that, in this two-problem scenario, where the subjects are not told to look for a similarity, fewer than a third of them are able to spontaneously notice it. Moreover, this disparity arises despite the fact that, knowing they are subjects in a psychology experiment, one might expect that all subjects would consider how the first part might be related to the second. But they do not make that connection.

On the other hand, if a group is shown the two problems one after the other, and told to look for a similarity, then around 75% of them can solve the second problem. (But fully one quarter still cannot!)

In both study cases with the two problems, a substantial number of subjects see one as a problem about warfare and the other as a problem about medical treatment. They do not see an underlying logical structure that is common to both. That’s a significant finding, from which we can learn a lot.

Cognitive scientists use the term “inflexible knowledge” to refer to knowledge that is closely tied to the surface structure of the scenario in which it was acquired. It takes time, effort, and exposure to two or more different representations of what is in actuality the same problem in order to recognize the underlying structure that is the key to making that knowledge flexible—something to be applied to any novel scenario having the same underlying structure.

In the case of the radiation problem, 90% of people see it purely in terms of radiation—essentially, a physics problem, unconnected to the General’s problem (a problem about military strategies).

Yet those two instances of inflexible knowledge become mere applications of a more general strategy of a hub-and-spokes wheel, where activity that follows the spokes from the circumference inwards converges into a single combined action at the hub—once you have acquired that wheel concept. When you have, it’s easy to recognize new instances where it can be applied. Being on an abstract level, your knowledge is flexible. But getting to that more abstract, flexible-knowledge structure is not automatic. It requires work. In the case of the hub-and-spokes solution, it takes exposure to a third problem scenario before most people “get” the trick. (See Note 1 at the end.)

Here’s why this is relevant to mathematics. In order to solve mathematical problems (or, more generally, to solve problems using mathematical thinking), you generally need to identify the underlying logical structure, so you can apply (possibly with some adaptation) a mathematical solution you or someone else found for a structurally similar problem. That means digging down beneath the surface features of the problem to uncover the logical structure.

To be sure, once you have mastered—by which I mean fully understood—a particular mathematical concept, structure, or method, it becomes a lot easier to see it as the underlying structure (if indeed it is). For example, if you understand linear algebra, you will be able to identify many problems where it can be applied, in math, in physics, in economics, or whatever. Quite simply, your (flexible) knowledge of the (abstract) method makes it possible for you to recognize when it may be of use.

Here’s the educational rub. Actionable, flexible knowledge cannot be taught as a set of rules. It has to be acquired, through a process of struggling with a number of variants until the crucial underlying structure becomes apparent. There is no shortcut.

How do you recognize an underlying abstract structure or achieve understanding of an abstract method in the first place? It seems important to make connections between the abstract-and-general and the concrete-and-particular. We learn best from concrete examples we experience, and the mind’s natural inclination is to store knowledge in a fashion closely bound up with the scenarios in which it was first acquired. Overcoming that constraint and learning to recognize abstract structural or logical patterns requires effort, and often the study of more than just a couple of examples. You need several examples and you need to go deep into them.

To sum up, in order to develop the flexible thinking required to tackle novel problems using mathematics—regardless of where those problems come from, what specific mathematical concepts they may embed, and what specific techniques their solution may involve—what is required is experience working in depth on a small number of topics, each of which can be represented and approached from at least two, and ideally more, different perspectives.

NOTE 1: For a more expansive discussion of these issues, with more examples, see my December 1 Devlin’s Angle post for the Mathematical Association of America.

NOTE 2: For anyone curious as to why the brain works the way it does, and why it finds some things particularly difficult to master, particularly the recognition of abstract structure, check out my book The Math Gene, published in 2001, where I draw on a wide range of results from several scientific disciplines in an attempt to shed light on just how the human brain does mathematics, how it acquired the capacity to do so, and why it finds abstraction so challenging.

A common theme among the articles and videos on this website is the regular use of online resources in developing and using mathematics. The image shown here (which you will find in several of my videos and linked articles on the site) presents some of the digital tools professional mathematicians use most commonly when working on a problem.

This particular list is based on my own experience over several decades working on problems for industry and various branches of the US government, and in conducting various academic research projects, but also includes two tools (MATLAB and the graphing calculator) that I do not use but many other mathematicians do, which I list as a nod towards others’ toolboxes. I typically use those tools in the order they appear in the display, reading left-to-right.

Whenever I show this image to an audience, I inevitably remark that the use of, in particular, Google and YouTube requires a degree of sophistication in order to (1) find the most useful sites and (2) assess the reliability of the information provided on those sites. Item (1) requires sufficient knowledge to enter keywords and phrases that pull up useful resources; item (2) depends on a skillset generally referred to as digital literacy.

Given the central role of digital tools and online resources in contemporary mathematical praxis, these two skillsets are absolutely critical components in 21st Century mathematical praxis. That means they have to be part of all students’ mathematics education.

The first skillset, being able to make effective use of search engines to navigate to relevant information and online resources, has to be provided in the mathematics class. It is only by having a good, broad overview of a sufficiently large part of the range of mathematical concepts, facts, methods, procedures, and tools that are available, that a student can select keywords and phrases to conduct a productive search. Such an overview can be acquired only by experience, over several years.

Since there is no longer any need for a student to spend many hours practicing the hand execution of mechanical procedures for solving math problems in order to achieve accuracy and speed, the vast amounts of time freed up from what used to be a massive time sink can be devoted to working on a wide variety of different kinds of problem.

[CAVEAT: Whether an individual teacher has the freedom to follow this strategy is another matter. Sensible implementation of mathematics education along the lines of the Common Core should, in theory, make this possible; indeed, the CCSSM were developed — as standards, not a curriculum — to facilitate this. But I frequently see complaints that various curricula and local school districts still insist on a lot of rote practice of skills that will never be used. Other than use my bully pulpit to try to change that, as I not infrequently do, I cannot remove that obstacle, I’m afraid.]

Turning to the second skillset, assessing the reliability of the information provided on a Web resource, in today’s world that needs to be a major component of almost every classroom subject.

In the case of Wikipedia, which is high on my list of mathematical tools, for post-secondary mathematics it is an efficient and highly reliable resource to find out about any particular mathematical definition, concept, or technique — its reliability being a consequence of the fact that only knowledgeable professional mathematicians are able to contribute at that level. Unfortunately, the same cannot be said for K-12 mathematics.

For example, a quick look as I was writing this post showed that the Wikipedia entry for multiplication is highly misleading. In fact, it used to be plain wrong until I wrote a series of articles for the Mathematical Association of America a few years ago. [See the section on Multiplication in this compilation post.] However, while the current entry is not exactly wrong, its misleading nature is a pedagogic disaster in the making. It therefore provides a good example of why the wise teacher or student should use Wikipedia with extreme caution as a resource for K-12 mathematics.

Ditto for Google, YouTube, and any other online resource. “Buyer beware” needs to be the guiding principle.

Unfortunately, a recent report from Stanford’s Graduate School of Education (discussed below) indicates that, for the most part, America’s school system is doing a terrible job in making sure students acquire the digital literacy skills that are of such critical importance to everyone in today’s world.

Note: I am not pointing a finger at any one person or any one group here. It is the education system that’s failing our students. And not just at K-12 level. Higher education too needs to do a lot more to ensure all students acquire the digital literacy that is now an essential life skill.

The Stanford report focuses on Civics, a domain where the very functioning of democracy and government requires high levels of digital literacy, as highlighted by the massive growth of “fake news” in the period leading up to the 2016 U.S. election, and subsequently. But the basic principles of digital literacy apply equally to mathematics and pretty well any discipline where online resources are used (which is pretty well any discipline, of course). So the Stanford report provides an excellent pointer to what needs to be done in all school subjects, including mathematics.

The report is readily available online: Students’ Civic Online Reasoning, by Joel Breakstone, Mark Smith, & Sam Wineburg, The Stanford History Education Group, 14 November, 2019. (Accessing it provides an easy first exercise in applying the reliability assessing skills the report points to!)

While I recommend you read the whole thing (there is a PDF download option), it is 49 pages in length (including many charts), so let me provide here a brief summary of the parts particularly pertinent to the use of online resources in mathematics.

First, a bit of general background to the new report. In November 2016, the Stanford History Education Group released a study showing that young people lacked basic skills of digital evaluation. In the years since then, a whole host of efforts—including legislative initiatives in 18 states—have tried to address this problem. Between June 2018 and May 2019, the Stanford team conducted a new, deeper assessment on a national sample of 3,446 students, chosen to match the demographic profile of high school students across the United States.

The six exercises in the assessment gauged students’ ability to evaluate digital sources on the open internet. The results should be a wake-up call for the nation. The researchers summed up the results they found in a single, dramatic sentence: “The results—if they can be summarized in a word—are troubling.”

The nation’s high school students are, it seems, hopelessly ill-equipped to use the Internet as a source for information. The report cites the following examples:

• Fifty-two percent of students believed a grainy video claiming to show ballot stuffing in the 2016 Democratic primaries (the video was actually shot in Russia) constituted “strong evidence” of voter fraud in the U.S. Among more than 3,000 responses, only three students tracked down the source of the video, even though a quick search turns up a variety of articles exposing the ruse.

• Two-thirds of students couldn’t tell the difference between news stories and ads (set off by the words “Sponsored Content”) on Slate’s homepage.

• Ninety-six percent of students did not consider why ties between a climate change website and the fossil fuel industry might lessen that website’s credibility. Instead of investigating who was behind the site, students focused on superficial markers of credibility: the site’s aesthetics, its top-level domain, or how it portrayed itself on the About page.

The assessment questions used by the Stanford team were developed after first looking at the ways three groups of Internet users evaluated a series of unfamiliar websites: Stanford freshmen, university professors from four different institutions, and fact checkers from some of the country’s leading news outlets.

Of particular relevance, the fact checkers’ approach differed markedly from that of the undergraduates and the professors.

When fact checkers landed on an unknown website, they immediately left it, and opened new browser tabs to search for information about the trustworthiness of the original source. (The researchers refer to this approach as lateral reading.)

In contrast, both the students and the academics typically read vertically, spending minutes examining the original site’s prose, references, About page, and top-level domain (e.g., .com versus .org). Yet these features are all easy to manipulate.

Fact checkers’ first action upon landing on an unfamiliar site is, then, to leave it. The result of this seemingly paradoxical behavior is that they read less, learn more, and reach better conclusions in less time. Their initial goal is to answer three questions about the resource: (1) Who is behind the information? (2) What is the evidence? (3) What do other sources say?

While the value of the fact-checkers’ approach is critical in navigating today’s online sources of news and current affairs, the approach is no less critical in using online resources in the STEM disciplines.

For instance, imagine the initial stage of collecting data for a mathematical analysis of problems about climate change, the effectiveness of vaccines, or diet — all topics that students find highly engaging, and which thus provide excellent projects to learn about a wide variety of mathematical techniques. In all three cases, there is no shortage of deliberate misinformation that must be filtered out.

Here, then, is a summary of the six assessment tasks the Stanford team developed, listed with the fact-checkers’ initial questions being addressed in each case, together with (in italics) the specific example task given to the students:

Evaluating Video Evidence (What’s the evidence? Who’s behind the information? Evaluate whether a video posted on Facebook is good evidence of voter fraud.)

Webpage Comparison (Who’s behind the information? Explain which of two websites is a better source of information on gun control.)

Article Evaluation (What do other sources say? Who’s behind the information? Using any online sources, explain whether a website is a reliable source of information about global warming.)

Claims on Social Media 1 (What’s the evidence? Who’s behind the information? Explain why a social media post is a useful source of information about background checks for firearms.)

Claims on Social Media 2 (What’s the evidence? Who’s behind the information? Explain how a social media post about background checks might not be a useful source of information.)

Homepage Analysis (Who’s behind the information? Explain whether tiles on the homepage of a website are advertisements or news stories.)

The remainder of this post focuses on the study’s results of particular relevance to mathematics education. It is taken, with minimal editing, directly from the original report.

Overall, the students in the high school study struggled on all of the tasks. At least two-thirds of student responses were at the “Beginning” level for each of the six tasks. On four of the six tasks, over 90% of students received no credit at all. Out of all of the student responses to the six tasks, fewer than 3% earned full credit.

Claims on Social Media question 1 had the lowest proportion of Mastery responses, with fewer than 1% of students demonstrating a strong understanding of the COR competencies measured by the task.

Evaluating Evidence had the highest proportion, with 8.7% earning full credit.

The Website Evaluation task (which is of particular significance for the use of online resources in mathematics) had the highest proportion of “Beginning” scores, with 96.8% of students earning no points. The question assessed whether students could engage in lateral reading—that is, leaving a site to investigate whether it is a trustworthy source of information. Students were provided a link to the homepage of CO2 Science (co2science.org), an organization whose About page states that their mission is to “disseminate factual reports and sound commentary” on the effects of carbon dioxide on the environment. Students were asked whether this page is a reliable source of information. Screen prompts reminded students that they were allowed to search online to answer the question. The few students who earned a Mastery score used the open internet to discover that CO2 Science is run by the Center for the Study of Carbon Dioxide and Global Change, a climate change denial organization funded by fossil fuel companies, including ExxonMobil. The Center for the Study of Carbon Dioxide and Global Change also has strong ties to the American Legislative Exchange Council, an organization that opposes legislative efforts to limit fossil fuel use.

A student from a rural district in Oregon wrote: “I do not believe this is a reliable source of information about global warming because even though the company is a nonprofit organization, it receives much of its funding from the “largest U.S. oil company–ExxonMobil–and the largest U.S. coal mining company–Peabody Energy” (Greenpeace). Moreover, Craig Idso, the founding chairman of the Center for the Study of Carbon Dioxide and Global Change, was also a consultant for Peabody Energy. It is no wonder this organization advocates for unrestricted carbon dioxide levels; these claims are in the best interest of the Center for the Study of Carbon Dioxide and Global Change as well as the oil and mining companies that sponsor it.”

Another student from suburban Oklahoma responded: “No, it is not a reliable source because it has ties to large companies that want to purposefully mislead people when it comes to climate change. According to USA TODAY, Exxon has sponsored this nonprofit to pump out misleading information on climate change. According to the Seattle Post-Intelligencer, many of their scientists also have ties with energy lobbyists.”

Both students adeptly searched for information on the internet about who was behind the site. Both concluded that the site was unreliable because of its ties to fossil fuel interests.

Unfortunately, responses like these were exceedingly rare. Fewer than two percent of students received a “Mastery” score. Over 96% of student responses were categorized as “Beginning”. Instead of leaving the site, these students were drawn to features of the site itself, such as its top-level domain (.org), the recency of its updates, the presence or absence of ads, and the quantity of information it included (e.g., graphs and infographics).

A student from suburban New Jersey wrote: “This page is a reliable source to obtain information from. You see in the URL that it ends in .org as opposed to .com.” This student was seemingly unaware that .org is an “open” domain; any individual or group can register a .org domain without attesting to their motives. A student from the urban South was taken in by CO2 Science’s About page: “The ‘about us’ tab does show that the organization is a non-profit dedicated to simply providing the research and stating what it may represent. In their position paper on CO2, they provide evidence for both sides and state matters in a scientific manner. Therefore, I would say they are an unbiased and reliable source.” As the Stanford team note in their report, “Accepting at face value how an unknown group describes itself is a dangerous way to make judgments online.”

In fact, students often displayed confusion about the meaning of top-level domains such as .org. While there are many .org’s that work for social betterment, the domain is a favorite for political lobby groups and groups that cast themselves as grassroots efforts but which are actually backed by powerful political or commercial interests. For-profit companies can also be listed as .org’s. Craigslist, a corporation with an estimated $1 billion in revenue in 2018, is registered as craigslist.org. Nor is nonprofit status a dependable marker of an organization’s credibility. Obtaining recognition by the IRS as a public charity is an extremely easy thing to do. Of the 101,962 applications the IRS received in 2015, 95,372 were granted tax-deductible status, an approval rate of 94%.
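As a quick check of that last figure, here is a minimal Python sketch that recomputes the approval rate from the two counts quoted above. The variable names are my own; only the two numbers come from the report. The unrounded rate is about 93.5%, which rounds to the 94% quoted.

```python
# Figures quoted in the text above (IRS public-charity applications, 2015).
applications_received = 101_962
applications_approved = 95_372

# Fraction of applications that were approved.
approval_rate = applications_approved / applications_received

# The same rate as a whole-number percentage, as quoted in the report.
approval_percent = round(approval_rate * 100)

# Applications that were not granted tax-deductible status.
not_approved = applications_received - applications_approved

print(f"Approval rate: {approval_rate:.1%} (rounds to {approval_percent}%)")
print(f"Applications not approved: {not_approved:,}")
```

Seeing the count of applications that were not approved alongside the percentage is a small instance of the point made throughout: a ratio is easier to judge when you also look at the numbers it was computed from.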

Food for thought, don’t you agree?

For a discussion of the other items in the study, I’ll refer you to the report itself.

Let me end by re-iterating that the specific findings I listed above are all highly relevant to developing good skillsets for using online resources in mathematics.

Both doing math and playing chess are frequently touted as beneficial to developing good critical thinking skills and problem solving ability. And on the face of it, it seems that they self-evidently will have that effect. Yet, despite being a mathematician (though not a chess player), I always had my doubts. I long harbored a suspicion that a course on, say, history or economics would (if suitably taught) serve better in that regard. It turns out my suspicions were well founded. Read on.

The point is, mathematics and chess are highly constrained domains defined by formal rules. In both domains, the problems you have to solve are what are sometimes referred to as “kind problems”, a classification introduced in 2015 to contrast them to “wicked problems”, a term introduced in the social sciences in the late 1960s to refer to problems that cannot be solved by the selection and application of a rule-based procedure.

Actually, that is not a good definition of a wicked problem, for the simple reason that there is no good, concise definition. But once you get the idea (check out the linked Wikipedia entry above), you will find you can recognize a wicked problem when you see one. In fact, pretty well any problem that arises in the social sciences, or in business, or just in life in general, is a wicked problem.

For example, is it a good idea to install solar panels to power your home? Most of us initially compare several mental images, one of a bank of solar panels on a roof, another of a smoke-emitting, coal-fired, power plant, another of a nuclear power plant, and perhaps one of a wind-turbine. We can quickly list pluses and minuses for each one.

Given how aware we are today of the massive dangers of climate change resulting from the emission of greenhouse gases, we probably dismiss the coal-fired power plant right away.

But for the other three, you really need to look at some data. For example, solar panels seem to be clean, they make no noise, they require very little maintenance, and unlike wind turbines they don’t kill birds. But what is the cost of manufacturing them (including the mining and processing of the materials from which they are made), both monetarily and in terms of impact on the environment? What about the cost of disposing of them when they fail or become too old to function properly? Without some hard data, it’s impossible to say whether they are the slam-dunk best choice we might initially see them as.

In fact, as soon as you set aside an hour or so to think about this problem, you start to realize you are being drawn into a seemingly endless series of “What if?” and “What about?” questions, each requiring data before you can begin to try to answer it. For example, what if a house with a solar-paneled roof is burned in a wildfire, a possibility that residents in many parts of the western United States now face every year? Do those solar panels spew dangerous chemicals into the atmosphere when they burn at very high temperatures? How big a problem would that be? What if, as increasingly happens these days, an entire community burns? How many homes need to burn for the concentration of chemicals released into the atmosphere to constitute a serious danger to human life?

You are clearly going to have to use mathematics as a tool to collect and analyze the data you need to make some reliable comparisons. But it’s also clear that “doing the math” is the easy part, or rather, the easier part. Particularly when there are digital tools available to do all the calculations and execute all the procedures. (See below.) But what numbers do you collect? Which factors do you consider? Which of them do you decide to include in your comparison dataset and which to ignore?

Part of making these decisions will likely involve applying number sense. For instance, for some factors, the numbers may be too small (compared to those associated with other factors) to make it worthwhile including those factors in your quantitative analysis.

Or maybe—if you are very, very lucky—the numbers for one factor dominate all the others, in which case the problem is essentially a kind one, and you can get the answer by old-fashioned “doing the math.” But that kind of outcome is extremely rare.
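To make that number-sense step concrete, here is a minimal Python sketch of one way to screen a set of factors once you have a rough magnitude for each. Everything in it is an assumption made for illustration: the function name, the 1% and 90% thresholds, and above all the example numbers, which are invented placeholders in arbitrary units and not data about any energy source.

```python
def screen_factors(magnitudes, negligible_share=0.01, dominant_share=0.9):
    """Split factors into negligible ones and (possibly) one dominant one.

    magnitudes: dict mapping a factor name to a rough size, all in the same unit.
    The two thresholds are arbitrary illustrative choices, not established rules.
    """
    total = sum(abs(value) for value in magnitudes.values())
    if total == 0:
        raise ValueError("all magnitudes are zero; nothing to compare")

    # Each factor's share of the combined magnitude.
    shares = {name: abs(value) / total for name, value in magnitudes.items()}

    # Factors too small, relative to the rest, to be worth modeling.
    negligible = sorted(name for name, s in shares.items() if s < negligible_share)

    # A single factor that swamps all the others, if there is one.
    dominant = next((name for name, s in shares.items() if s >= dominant_share), None)

    return shares, negligible, dominant


# Placeholder magnitudes, invented purely to exercise the function.
example = {"factor_a": 950.0, "factor_b": 45.0, "factor_c": 5.0}
shares, negligible, dominant = screen_factors(example)
print("negligible:", negligible)
print("dominant:", dominant)
```

The code is the easy part. Deciding which factors belong in the dictionary at all, and whether their magnitudes can even be put in a common unit, is exactly the judgment the tool cannot make for you.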

Usually you have to make trade-offs and compare the numbers you have against other, less quantifiable considerations. This means that problems like this cannot be solved using mathematics alone. And that in turn means they have to be tackled by diverse (!) teams, with each team member bringing different expertise.

For sure, one of the team definitely needs to be mathematically able. But, while just one mathematician may be enough, the others should, ideally, know enough about mathematics to work effectively with the math expert (or experts) on the team.

This is, of course, a very different scenario from the notion of a “mathematical problem solver” that everyone had in mind when I learned mathematics in the 1960s. Back when I was working toward my high school leaving certificate and then my mathematics bachelor’s degree, with a view to a career as a mathematician, I imagined myself spending most of my professional time working alone. And indeed, for several years, that was the case. But then things changed. Keep reading.

I began this essay with a question: does learning math or playing chess make you a better reasoner—a better problem solver? I hope by now that the answer is clear. For kind problems, almost certainly it does. The largely linear, step-by-step process you need to solve a kind problem involves the same kind of mental processes as math and chess.

But for wicked problems, the above short discussion of selecting among alternative energy sources should indicate that the kind of thinking required is very different. And in an era when machines can beat humans at chess and can do all the heavy lifting for solving a kind math problem (see below), it’s the wicked problems that require humans to solve them.

In other words, in the world our students will inhabit, skill at solving wicked problems is what is needed. And that requires training in mathematics that is geared towards that goal.

So, what does this all mean for us mathematics educators?

The educational preparation for being able to solve wicked problems clearly (see above) has to be very different from what is required to develop the ability to solve kind problems. In domains like mathematics and chess, once you have mastered the underlying rules, repeated, deliberate practice will, in time, make you an expert. The more you practice (that is, deliberate practice—this is a technical term; google “Anders Ericsson deliberate practice”), the better you become.

In this regard, chess and mathematics are like playing a musical instrument and many sports, where repeated, deliberate practice is the road to success. This is where the famous “10,000 hours” meme is applicable, a somewhat imprecise but nevertheless suggestive way to capture the empirical observation that true experts in such domains typically spent a great many hours engaged in deliberate practice in order to achieve their success.

But deliberate practice does not prepare people to engage with wicked problems. And that is a major problem for educators, because, as I noted already, the vast majority of problems people face in their lives or their jobs today are wicked problems.

This state of affairs is new, at least for mathematicians. (Not for social scientists.) Until the early 1990s, mathematics educators did not have to face accusations that they were not preparing their students adequately for the lives they would lead, because being able to calculate (fast and accurately) was an essential life skill, and being able to execute mathematical procedures quickly and accurately was important in many professions (and occasionally in everyday life).

But the 1960s brought the electronic calculator, which could outperform humans at arithmetic, and the late 1980s saw the introduction of digital technologies that can execute pretty well any mathematical procedure faster, with way more accuracy, and for far larger datasets, than any human could. Once those technological aids became available, it was only a matter of time until they became sufficiently ubiquitous to render obsolete human skill at performing calculations and executing procedures.

It did not take long. By the start of the Twenty-First Century, we were at that point of obsolescence (of those human skills).

To be sure, there remains a need for students to learn how to calculate and execute procedures, in order to understand the underlying concepts and the methods so they can make good, safe, effective use of the phalanx of digital mathematics tools currently available. But what has gone is the need for many hours of deliberate practice to achieve skills mastery.

The switch by the professionals in STEM fields, from executing procedures by hand to using digital tools to do it, happened very fast, and with remarkably little publicity. Consequently, few people outside the professional STEM communities realized it had occurred. Certainly, few mathematics teachers were aware of the scope of the change, including college instructors at non-research institutions.

But in the professional STEM communities, the change not only happened fast, it was total. The way mathematics is used in the professional STEM world today is totally different from how it was used for the previous three thousand years. And it’s been that way for thirty years now.

As a consequence of this revolution in mathematical praxis—and it really was a revolution—the mathematical ability people need in today’s world is not calculation or the execution of procedures, as it had been for thousands of years, but being able to use mathematics (or more generally mathematical thinking) to help solve wicked problems. [Note that digital tools won’t solve a wicked problem for you. The most they can do is help out by handling the procedural math parts for you.]

Today, to solve a kind mathematical problem, there is almost certainly a digital tool that will handle it, most likely Wolfram Alpha. To be sure, some knowledge is required to be able to do that. And we must make sure our students graduate with those skills. But what they no longer need is the high skills mastery that requires years of deliberate practice. Adjusting to this new reality is straightforward, and many teachers have already made that change. You teach—and assess—the same concepts and methods as before, but with the goal being understanding rather than performance.

When it comes to solving wicked problems, however, what people need is an ability to use mathematics in conjunction with other ways of thinking, other methodologies, and other kinds of knowledge. And that really is a people thing. No current digital device is remotely close to being able to solve a wicked problem, and maybe never will be.

So what exactly do those of us in mathematics education have to do to ensure that our students acquire the knowledge and skills they will require in today’s world?

Well, the biggest impact in terms of changing course content and structure is at the college and university level. Major changes are required there, and indeed are already well underway. In particular, tertiary-level students are going to have to learn, through repeated classroom experiences, how to tackle wicked problems. Project-based teamwork is going to have to play a big role. (See below.)

In terms of K-12, however, there is a good argument to be made for continuing to focus on highly constrained, kind problems that highlight individual concepts and techniques. A solid grounding in basic mathematical concepts and techniques is absolutely necessary for any mathematical work that will come later.

That’s certainly an argument I would make—though as always when discussing K-12 education issues, I hold back from providing specific advice to classroom teachers, particularly K-10, since that is not my domain. Check out youcubed.org for that!

But I do have many years of first-hand experience of how mathematics is used in the world, and based on that background I can add my voice to the chorus who are urging a shift in K-12 mathematics education, away from basic computation and procedural skills mastery, to preparing students for a world in which using mathematics involves utilizing the available tools for performing calculations and executing procedures.

It’s definitely not a question of new content being needed (at the K-12 level). The goals of the Common Core State Standards for Mathematics already cover the main concepts and topics required. Today’s world does not run on some brand new kind of mathematics—though there is some of that, with new techniques being developed and introduced all the time. The familiar concepts developed and used over the centuries are still required.

Rather, the change has been in mathematical praxis: how math is done and how it is used. The main impact of the 1990s mathematical praxis revolution on K-12 is that there is no longer any need for repetitive, deliberate practice to develop fast, fluent skills at calculation and the execution of procedures—since those skills have been outsourced to machines. [Yes, I keep repeating this point. It’s important.]

Insofar as students engage in calculation and executing procedures (and they certainly should), the goal is not smooth, accurate, or fast execution, but understanding. For that is what they need (in spades) to make use of all those shiny new digital math technologies. (Actually, many of them are hardly new or shining, being forty years of age and older. It just took a while before they found their way outside the professional STEM community.)

So, while leaving it to experienced K-12 educators and academic colleagues such as Jo Boaler to figure out how best to teach school math for life in the 21st Century, let me finish by giving some indication of how tertiary education (the world I am familiar with) is changing to meet the new need. That, after all, is what many high school graduates will face in the next phase of their education, so the more their K-12 experience prepares them for that, the better will be their subsequent progress.

In contrast to K-12, when it comes to tertiary education, other than in the mathematics major (a special case I’ll come back to in a future post), the focus should be on developing students’ ability to tackle wicked problems.

How best to do that is an ongoing question, but a lot is already known. That’s also something I’ll pursue in a future post. As a teaser, however, let me end by highlighting some key elements of the skillset required to tackle a wicked problem.

Let me stress that this list is one drawn up for college level students. In fact, this post is extracted and adapted from a longer one I just wrote for the Mathematical Association of America, a professional organization for college-level math instructors. Neither I nor (I believe) anyone else is advocating doing this kind of thing at levels K-10. (Though maybe for grades 11 and 12. I have tried this out with high school juniors and seniors, and it has gone well.)

[Incidentally, the list is not just something that someone dreamt up sitting in an armchair. Well, maybe it started out that way. But there is plenty of research into what it takes to produce good teamwork that achieves results. I get a lot of my information from colleagues at Stanford who work on these issues. But there are many good sources on the Web.]

To solve a wicked problem, you should:

Work in a diverse team. The more diverse the better.

Recognize that you don’t know how to solve it.

If you think you do, be prepared for others on the team to quickly correct you. (And be a good, productive “other on the team” and correct another member when required.)

OTOH, you might even not be sure what the heart of the problem really is; or maybe you do but it turns out that other team members think it’s something else. Answering that question is now part of the “solution” you are looking for.

Be collegial at all times (even when you think you need to be forceful), but remember that if you are the only expert on discipline X, the others do need to hear your input when you think it is required.

The other team members may not recognize that your expertise is required at a particular point. Persuade them otherwise.

Listen to the other team members. Constantly remind yourself that they each bring valuable expertise unique to them.

It’s all about communication. That has two parts: speaking and listening. If the team has at least three members, you should be listening more than you are speaking. (Do the math as to how frequently you “should” be speaking, depending on the size of the team.)

The onus is on you to explain your input to the others. They do not have your background and context for what you say. With the best will in the world—which you can reasonably expect from the team—they depend on you to explain what you are advocating or suggesting.

If the group agrees that one of you needs to give a short lesson to the others, fine. Telling people things and showing them how to do things are useful ways of getting them to learn things.

These are not rules; they are guidelines.

Guidelines can be broken. Sometimes they should be.
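
The “do the math” aside in the communication guideline above can be made concrete with a small sketch. It assumes, purely for illustration, that airtime is shared equally and that only one person speaks at a time; the function name `fair_shares` is hypothetical.

```python
from fractions import Fraction


def fair_shares(team_size):
    """Return (speaking, listening) shares of meeting time for one member.

    Assumes equal airtime and one speaker at a time. This is an
    illustrative simplification, not a rule from the guidelines.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    speaking = Fraction(1, team_size)
    return speaking, 1 - speaking


# Show how the balance shifts as the team grows.
for n in (2, 3, 5, 8):
    speaking, listening = fair_shares(n)
    print(f"team of {n}: speak {speaking}, listen {listening}")
```

Under that assumption, a team of two splits speaking and listening evenly, and from three members upward the listening share is always the larger one.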

So there you have it. If you are teaching math in K-12 and you can ensure that when your students graduate they can thrive—and enjoy—working in that fashion, you will have set them up for life.

That’s the wicked truth.

NOTE: A longer, overlapping essay discussing kind versus wicked problems, but aimed at college and university research and education professionals, can be found in my November 1 post on the Devlin’s Angle page on the Mathematical Association of America’s MATHVALUES website.

On May 23, 2019, Stanford Mathematics Education Professor Jo Boaler, the founder and director of youcubed, and I sat down before a public audience in Cubberley Auditorium on the Stanford campus to have a discussion about the nature of 21st Century mathematics and the changes it requires to the way mathematics is taught in our schools. The (edited) video of our conversation is now available on this website and the youcubed website. (See the Videos page on either site.) Produced by youcubed in conjunction with SUMOP. Run time 31min 28sec.

There’s plenty of research into learning (from psychology, cognitive science, neuroscience, and other disciplines) that explains why learning mathematics (more precisely, learning it well, so you can use it later on) is intrinsically difficult and frustrating. But for non-scientists in particular, no amount of theoretical discussion will have quite the impact of hard evidence from a big study, particularly one run the same way pharmaceutical companies test the effectiveness (and safety) of a new drug.

Unfortunately, studies of that nature are hard to come by in education—for the simple reason that, unlike pharmaceutical research, they are all but impossible to run in the field of learning.

But there is one such study. It was conducted a few years ago, not in K-12 schools, but at a rather unique, four-year college. That means you have to be cautious when it comes to drawing conclusions about K-12 learning. So bring your own caution. My guess is that, like me, when you read about the study and the results it produced, you will conclude they do apply to at least Grades 8-12. (I can’t say more than that because I have no experience with K-8, either first-hand or second.)

The benefit of conducting the study at this particular institution was that it allowed the researchers to conduct a randomized control study on a group of over 12,000 students over a continuous nine-year period starting with their first four years in the college. That’s very much like the large scale, multi-year studies that pharmaceutical companies run (indeed, are mandated to run) to determine the efficacy and safety of a new drug. It’s impossible to conduct such a study in most K-16 educational institutions, for a whole variety of reasons.

For the record, I’ll tell you the name of that particular college at the outset. It’s the United States Air Force Academy (USAFA) in Colorado Springs, Colorado. Later in this article, I’ll give you a full overview of USAFA. As you will learn, in almost all respects, its academic profile is indistinguishable from most US four-year colleges. The three main differences, all of which are important for running a massive study of the kind I am talking about, are that (1) the curriculum is standard across all instructors and classes, (2) grading is standardized across all classes, and (3) students have to serve five years in the Air Force after graduation, during which time they are subject to further standardized monitoring and assessment. This framework provided the researchers a substantial amount of reliable data to measure how effective the four years of classes were as preparation for the graduates’ first five years in their chosen specialization within the Air Force.

True, the students at USAFA are atypical in wanting a career in the military (though for some it is simply a way to secure a good education “at no financial cost”, and after their five years of service are up they leave and pursue a different career). In particular, they enter having decided what they want to do for the next nine years of their lives. That definitely needs to be taken into account when we interpret the results of the study in terms of other educational environments. I’ll discuss that in due course. As I said, bring your own caution. But do look at, and reflect on, the facts before jumping to any conclusion.

If that last (repeated) warning did not get your attention, the main research finding from the study surely will: Students who perform badly on course assignments and end-of-course evaluations turn out to have learned much better than students who sail through the course with straight A’s.

There is, as you might expect, a caveat. But only one. This is an “all else being equal” result. But it is a significant finding, from which all of us in the math instruction business can learn a lot.

As I noted already, conducting a study that can produce such an (initially surprising) result with any reliability is a difficult task. In fact, in a normal undergraduate institution, it’s impossible on several counts!

First obstacle: To see how effective a particular course has been, you need to see how well a student performs when they later face challenges for which the course experience is—or at least, should be—relevant. That’s so obvious, in theory it should not need to be stated. K-16 education is meant to prepare students for the rest of their lives, both professional and personal. How well they do on a test just after the course ends would be significant only if it correlated positively with how well they do later when faced with having to utilize what the course purportedly taught them. But, as the study shows, that is not the case; indeed the correlation is negative.

The trouble is, for the most part, those of us in the education system usually have no way of being able to measure that later outcome. At most we can evaluate performance only until the student leaves the institution where we teach them. But even that is hard. So hard, that measuring learning from a course after the course has ended and the final exam has been graded is rarely attempted.

Certainly, at most schools, colleges, or universities, it’s just not remotely possible to set up a pharmaceutical-research-like, randomized, controlled study that follows classes of students for several years, all the time evaluating them in a standardized, systematic way. Even if the course learning outcomes being studied are from a first-year course at a four-year college, leaving the student three further years in the institution, students drop out, select different subsequent elective courses, or even change major tracks.

That problem is what made the USAFA study particularly significant. The study ran from 1997 to 2007, and its subjects were 12,568 USAFA students. The researchers were Scott E. Carrell, of the Department of Economics at the University of California, Davis, and James E. West, of the Department of Economics and Geosciences at USAFA.

As I noted earlier, since USAFA is a fairly unique higher education institute, extrapolation of the study’s results to any other educational environment requires knowledge of what kind of institution it is.

USAFA is a fully accredited undergraduate institution of higher education with an approximate enrollment of 4,200 students. It offers 32 majors, including humanities, social sciences, basic sciences, and engineering. The average SAT for the 2005 entering class was 1309 with an average high school GPA of 3.60 (Princeton Review 2007). Applicants are selected for admission on the basis of academic, athletic, and leadership potential, and a nomination from a legal nominating authority. All students receive 100 percent scholarship to cover their tuition, room, and board. Additionally, each student receives a monthly stipend of $845 to cover books, uniforms, computer, and other living expenses. All students are required to graduate within four years, after which they must serve for five years as a commissioned officer in the Air Force.

Approximately 17% of the study sample was female, 5% was black, 7% Hispanic, and 5% Asian.

Academic aptitude for entry to USAFA is measured through SAT verbal and SAT math scores and an academic composite that is a weighted average of an individual’s high school GPA, class rank, and the quality of the high school attended. All entering students take a mathematics placement exam upon matriculation, which tests algebra, trigonometry, and calculus. The sample mean SAT math and SAT verbal are 663 and 632, with respective standard deviations of 62 and 66.
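
To get a feel for what those means and standard deviations say, here is a minimal sketch that converts a score into a number of standard deviations from the sample mean, and lists the band within one standard deviation of it. The means and standard deviations are the ones quoted above; the function names, and the example score of 725, are hypothetical choices for illustration.

```python
# Sample statistics quoted in the text: (mean, standard deviation).
SAT_MATH = (663, 62)
SAT_VERBAL = (632, 66)


def z_score(score, mean, sd):
    """Number of standard deviations a score lies above the mean."""
    return (score - mean) / sd


def one_sd_band(mean, sd):
    """Scores lying within one standard deviation of the mean."""
    return (mean - sd, mean + sd)


print("math band:  ", one_sd_band(*SAT_MATH))
print("verbal band:", one_sd_band(*SAT_VERBAL))
# A hypothetical math score of 725 sits one standard deviation above the mean.
print("z for 725 in math:", z_score(725, *SAT_MATH))
```

So a typical student in the sample scored somewhere between 601 and 725 on SAT math, and between 566 and 698 on SAT verbal.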

USAFA students are required to take a core set of approximately 30 courses in mathematics, basic sciences, social sciences, humanities, and engineering. Grades are determined on an A, A-, B+, B, …, C-, D, F scale, where an A is worth 4 grade points, an A- is 3.7 grade points, a B+ is 3.3 grade points, etc. The average GPA for the study sample was 2.78. Over the ten-year period of the study there were 13,417 separate course-sections taught by 1,462 different faculty members. Average class size was 18 students per class and approximately 49 sections of each core course were taught each year.
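
The grade-point scale just described can be sketched in a few lines. Only the values for A, A-, and B+ come from the text; the remaining values are my assumption that the scale follows the conventional 4.0 pattern, and the unweighted average assumes every course carries equal credit, which may not match how USAFA computes a GPA. The transcript in the example is invented.

```python
GRADE_POINTS = {
    # Values stated in the text.
    "A": 4.0, "A-": 3.7, "B+": 3.3,
    # ASSUMED values, following the conventional 4.0 pattern.
    "B": 3.0, "B-": 2.7, "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D": 1.0, "F": 0.0,
}


def gpa(grades):
    """Unweighted grade-point average (assumes equal credit per course)."""
    if not grades:
        raise ValueError("need at least one grade")
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


# An invented four-course transcript, for illustration only.
print(round(gpa(["A-", "B", "C+", "B"]), 2))
```

On this scale, the sample’s average GPA of 2.78 falls between a B- and a B.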

USAFA faculty, who include both military officers and civilian employees, have graduate degrees from a broad sample of high quality programs in their respective disciplines, similar to a comparable undergraduate liberal arts college.

Clearly, in many respects, this reads like the academic profile of many American four-year colleges and universities. The main difference is the nature of the student body, where USAFA students enter with a specific career path in mind (at least for nine years), albeit a career path admitting a great many variations, perhaps also, in many cases, with a high degree of motivation. While that difference clearly has to be kept in mind when using the study’s results to make inferences for higher education as a whole, the research benefits of such an organization are significant, leading to results highly reliable for that institution.

First, there is the sheer size of the study population. So large, that there was no problem randomly assigning students to professors over a wide variety of standardized core courses. That random assignment of students to professors, together with substantial data on both professors and students, enabled the researchers to examine how professor quality affects student achievement, free from the usual problems of student self-selection.
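
The idea behind random assignment can be sketched as follows. This is not the Academy’s actual procedure, which the text does not describe; it is a minimal illustration in which a roster is shuffled and dealt out to professors in turn, so that no student chooses an instructor. The function name and the roster are hypothetical.

```python
import random


def assign_randomly(students, professors, seed=None):
    """Shuffle the roster, then deal students to professors round-robin.

    Section sizes end up differing by at most one, and which professor a
    student gets depends only on the shuffle, not on any student choice.
    """
    rng = random.Random(seed)  # a seed makes the example repeatable
    shuffled = list(students)
    rng.shuffle(shuffled)
    sections = {prof: [] for prof in professors}
    for i, student in enumerate(shuffled):
        sections[professors[i % len(professors)]].append(student)
    return sections


# Hypothetical roster: 54 students dealt into three sections of 18.
roster = [f"student_{k:02d}" for k in range(54)]
sections = assign_randomly(roster, ["Prof A", "Prof B", "Prof C"], seed=1)
for prof, group in sections.items():
    print(prof, len(group))
```

Because the shuffle ignores everything about the students, any systematic difference in later outcomes between sections can more plausibly be attributed to the professor than to who signed up for the class.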

Moreover, grades in USAFA core courses are a consistent measure of student achievement because faculty members teaching the same course use an identical syllabus and give the same exams during a common testing period.

Student grades in mathematics courses are particularly reliable measures. Math professors grade only a small proportion of their own students’ exams, which vastly reduces the ability of “easy” or “hard” grading professors to affect their students’ grades. Math exams are jointly graded by all professors teaching the course during that semester in “grading parties” where Professor A grades question 1 for all students, Professor B grades question 2 for all students, and so on. Additionally, all professors are given copies of the exams for the course prior to the start of the semester. All final grades in all core courses are determined on a single grading scale and are approved by the department chair. Student grades can thus be taken to reflect the manner in which the course is taught by each professor.

A further significant research benefit of conducting the study at USAFA is that students are required to take, and are randomly assigned to, numerous follow-on courses in mathematics, humanities, basic sciences, and engineering, so that performance in subsequent courses can be used to measure effectiveness of earlier ones—which, as we noted earlier, is a far more meaningful measure of (real) learning than weekly assignments or an end-of-term exam.

It is worth noting also that, even if a student has a particularly bad introductory course instructor, they still are required to take the follow-on related curriculum.

If you are like me, given that background information, you will take seriously the research results obtained from this study. At a cost of focusing on a special subset of students, the statistical results of the study will be far more reliable and meaningful than for most educational studies. Moreover, the study will be measuring the important, long term benefits of the course. So what are those results?

First, the researchers found there are relatively large and statistically significant differences in student achievement across professors in the contemporaneous course being taught. A one-standard deviation increase in the professor fixed effect (a variable like age, sex, ethnicity, or qualifications, that is constant across individuals) results in a 0.08 to 0.21 standard deviation increase in student achievement.

Introductory course professors significantly affect student achievement in follow-on related courses, but these effects are quite heterogeneous across subjects.

But here is the first surprising result. Students of professors who as a group perform well in the initial mathematics course perform significantly worse in the (mandatory) follow-on related math, science, and engineering courses. For math and science courses, academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous student achievement, but positively related to follow-on course achievement. That is, students of less experienced instructors who do not possess terminal degrees perform better in the contemporaneous course being taught, but perform worse in the follow-on related courses.

Presumably, less academically qualified instructors may spur (potentially unsustained) interest in a particular subject through higher grades, but those students perform significantly worse in follow-on related courses that rely on the initial course for content. (Interesting side note: for humanities courses, the researchers found almost no relationship between professor observable attributes and student achievement.)

Turning our attention from instructors to students, the study found that students who struggle and frequently get low grades tend to do better than the seemingly “good” students, when you see how much they remember, and how well they can perform, months or even years later.

This is the result I discussed in the previous post. On the face of it, you might still find that result hard to believe. But it’s hard to ignore the result of a randomized control study of over 12,000 students over a period of nine years.

For me, the big take-home message from the study is the huge disparity between course grades produced at the time and assessment of learning obtained much later. The only defense of contemporaneous course grades I can think of is that in most instances they are the only metric that is obtainable. It would be a tolerable defense were it not for one thing. Insofar as there is any correlation between contemporaneous grades and subsequent ability to remember and make productive use of what was learned in the course, that correlation is negative.

It makes me wonder why we continue, not only to use end-of-course grades, but to frequently put great emphasis on them and treat them as if they were predictive of future performance. Continuous individual assessment of a student by a well trained teacher is surely far more reliable.

A realization that school and university grades are poor predictors of future performance is why many large corporations that employ highly skilled individuals increasingly tend to ignore academic grades and conduct their own evaluations of applicants.