This is the second installment of When Central Tendency Junkies Attack, a series on the metaphysics of statistics designed to remind us that statistics are not a science. Part 1 is here.
When you hear an ambulance on a city street; when you can identify, after five minutes, the smartest – or most annoying – or least socially phobic – student in a college seminar; when you taste the excess salt in a bowl of soup; when you go to a party and cannot take your eyes off the lone celebrity; when a single baby, crying, ruins your transatlantic flight – when any of these things, and also many other things – even most other things – happen in your life, then you have come face to face with that demon of median-lovers everywhere: the right tail.
Or, as I once called him, Zach.
Some people do not like right tails, and I have been mulling over this dislike ever since this summer, when I published a post on this site titled “Jonah Lehrer is Not a Neuroscientist.” The post seemed to get a lot of attention (read the friendly comments!), I think because of the snarkiness of the title (he has written a book called Proust Was a Neuroscientist).
The post raised what I think is a real concern about the discipline of Pop Neuroscience: can one person – even someone as motivated, eloquent, well-educated and thoughtful as Jonah Lehrer – really be expected to cover the neuroscience beat? And if not, what should we do to help him?
That question is in turn a reflection of a larger concern I have, which is that “the brain” is an ideological construct that we have confused with a scientific fact. If we are going to out the ideology, and prepare ourselves to see ourselves differently, we are going to need more than one reporter on the case. I fear that most of us have been bamboozled – and bamboozled ourselves – into not noticing the ideology, and the small number of commentators on this topic, relative to its importance in our lives (e.g. 27 million Americans on antidepressants), isn’t exactly setting us up to see through all the spin. This blog is my effort to pull the little Dutch boy’s thumb out of the dike and let the ocean of metaphysics he’s been keeping out finally flood the city of science. It’s waaaaaaaay too dry in here.
Anyway, in my Jonah Lehrer post I made a two-part point that circles back – or up – to this larger point.
First, while Jonah Lehrer is not a neuroscientist in the technical sense (he doesn’t apply the empirical method to studying neurons – he writes about the work of people who do), in the minds of the general public he is one, in the sense that they have identified him as the source from which to draw their neuroscience. This is necessary because neuroscience does a terrible job of presenting its results to the public – people are way too busy writing grants and doing studies to spend time on popular writing – and this is why its translators often come to represent the field (kind of the way Carney represents Obama). Lehrer has taken the time to develop the art of explaining how the brain works in human terms that the public can make use of; therefore he, and David Brooks, are the ones they turn to.
The second point of my post was that I doubted that anyone, even Eric Kandel, would be up to the job. A book is one thing, but the neuroscience beat is another. It’s nobody’s fault; it’s the nature of the project. The New York Times does not assign one reporter to the entire 2012 Presidential campaign, and ideally we’d have a team on neuroscience too, given the vast array of topics, all of which are more complex than even the most complex presidential candidate.
To bring this point home, I closed the piece with a renunciation of my neuroscientist status (more on that later): “There are just too many mistakes out there waiting to be made. I have no doubt that even in this post I have made my share of them. Please let me know. After all, I’m not a neuroscientist too.”
In between my two big points I presented as evidence Lehrer’s article on the wisdom of crowds in the WSJ, in which I felt three mistakes were made in his effort to translate a scientific piece into layman’s terms.
The first mistake was that he did not cite the study he was describing, and I had to sleuth out what that study was. Readers wanting to investigate the idea Lehrer raised were therefore blocked from being able to do so. Probably an editing error – but still, it was someone’s mistake.
The second mistake was that he cherry-picked his data. Thus he turned an article that should have been bouncing around the rim of his main idea into a slam dunk.
The third mistake was that the single data point he used came from a category of data – the median response – that the authors, in their methods section, did not indicate was their choice as their dependent variable. Jonah Lehrer himself graciously wrote to me:
Jonah Lehrer: I realize that, in this particular study, the geometric mean was a better demonstration of the WoC effect. That said, many other papers have found that the median does a better job, especially when dealing with crowds of naive, uninformed subjects.
Let’s put aside that Lehrer’s use of the word “particular” suggests what is possibly a willingness to value general patterns over particular instances (he seems to imply that the methods of related papers might be the methods of this one). This is a classic non-empirical, rationalist maneuver (valuing pre-existing ideas over data).
Let’s also put aside that “better” is a funny word to use, because it implies he had discretion here. I think, more or less, he didn’t. The geometric mean was the correct measure to use, because he was representing the article to the general public, and needed to report on the outcome variable the authors reported. As his cherry-picking decision implicitly showed, the median was not the better demonstration – that was the whole reason he used it, unless my forensic pop neuroscience skills are misfiring very badly today. Lay readers were doubtless far more impressed by the median he used than any other data point, which is, sensibly enough, why he used it.
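To see why the choice of measure matters so much, here is a minimal sketch. The numbers are entirely made up (they are not from the study Lehrer cited); the point is only that on a right-skewed set of guesses, the median, arithmetic mean, and geometric mean can tell three quite different stories about the same crowd:

```python
import math
import statistics

# Hypothetical guesses (NOT data from the actual study): a right-skewed
# crowd where a couple of people guess wildly high.
guesses = [90, 95, 100, 100, 105, 110, 120, 150, 300, 900]

median = statistics.median(guesses)        # the guy in the middle
arith_mean = statistics.fmean(guesses)     # pulled up hard by the tail
# Geometric mean: exponentiate the mean of the logs.
geo_mean = math.exp(statistics.fmean(math.log(g) for g in guesses))

print(f"median:          {median:.1f}")    # 107.5
print(f"arithmetic mean: {arith_mean:.1f}")  # 207.0
print(f"geometric mean:  {geo_mean:.1f}")  # roughly 147
```

Each of the three is a defensible summary of “what the crowd thinks”; which one is “better” depends entirely on the question you bring to the data – which is the whole argument of this post.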
The reason I don’t like citing the wrong data from a paper, even if generally one might find that class of data useful to talk about, is that old adage about not wanting to see the sausage getting made. That adage is also true of science; I’m sure it is of every field. Outsiders, even seasoned reporters, who do not participate in the decision making process behind a paper don’t really know how the chefs decided what to put in and what to leave out. But if you have been behind the scenes of science, you know there is a fair amount of spin at work. Like anyone else, scientists need money and want tenure. They are therefore sometimes tempted to spin their bad results out of their papers and spin their best results in, even if their best results are not from their best class of data. Often these are forthrightly described as post hoc findings.
Therefore when a scientist appears to have obliviously walked past his best data and reported another less-good result – think again. The odds are that this is deliberate. There is often some problem with that point that you think he should have made, and he has walked past it out of necessity. But because you can never know for sure what’s really gone on behind the scenes unless you know the parties well, you can’t know anything but what you are told. And so if you are told that variable X is the key variable in a study, even if you think it should have been variable Y, you must go with reporting variable X.
Anyway, those were the three mistakes I identified, all in what I *hope* is the reasonable service of monitoring how pop neuroscience describes science to the general public.
Most of that got lost in the response, which blew me away. What really got people’s attention was that I implied that median values were not as good a way of representing a wise crowd’s views as mean values.
Now before I mount my metaphysical defense, let me just point out the smaller but probably better point that Galton – Galton Galton – the guy who started the whole wise crowd thing – also used the mean.
I am not an expert on Galton and do not intend to become one, but here’s what I found in Wikipedia: “Galton was a keen observer. In 1906, visiting a livestock fair, he stumbled upon an intriguing contest. An ox was on display, and the villagers were invited to guess the animal’s weight after it was slaughtered and dressed. Nearly 800 participated, but not one person hit the exact mark: 1,198 pounds. Galton’s insight was to examine the mean of these guesses from independent people in the crowd: astonishingly, the mean of those 800 guesses was 1,197 pounds – accurate to a fraction of a percent.”
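A toy simulation makes it plain why the crowd mean lands so close. The true weight and the error model below are my assumptions, not Galton’s data; the sketch only shows that averaging 800 independent, individually noisy guesses shrinks the error by roughly a factor of sqrt(800):

```python
import random
import statistics

random.seed(0)  # reproducible toy run

# Assumed setup, not Galton's actual data: 800 independent guessers,
# each unbiased but noisy (std. dev. of 75 lb), none exactly right.
TRUE_WEIGHT = 1198
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(800)]

crowd_mean = statistics.fmean(guesses)
error_pct = 100 * abs(crowd_mean - TRUE_WEIGHT) / TRUE_WEIGHT
print(f"crowd mean: {crowd_mean:.0f} lb, off by {error_pct:.2f}%")
```

The standard error of the mean here is about 75 / sqrt(800) ≈ 2.7 pounds, so a result within a fraction of a percent of the true weight is exactly what the arithmetic predicts – no mysticism required, just independence.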
Nevertheless, Galton was not around to defend me, and critics went bazonkers over my dis of the median per se, even if they would acknowledge that from the standpoint of accurate reporting it shouldn’t have been used. I was repeatedly told that I misunderstood the importance of medians; I was repeatedly told that they do not, as I said, represent the views of one guy in the middle of a group, and that they get rid of long right tails and therefore have the great asset of making skewed distributions essentially normal.
Let’s put aside for a second that they are both wrong and right about me, personally. They are right to pick up that I don’t have any particular fondness for the median as a measure of central tendency. I love tails – I think tails are where the action is – and because medians cut off tails and lose information, I tend not to like them much.
Instead, let’s notice that my critics unintentionally demonstrate why I don’t like medians: in focusing on one detail of a long article, and not its main point either, they were focusing on the “right tail” of my own personal idiocy. That is, they forgot that I am largely just a mediocre mind, producing mediocre thoughts, with a small number of good ones, and this one howler.
It’s the howler they focused on – which is not a very median thing to do.
To which I say: yay! That’s my point. They helped to confirm precisely what they wished to disprove, namely that right tails dominate human psychology. And though I didn’t like the criticism, the point getting out there made me happy.
Now I fear that if any “central tendency junkies” are still reading this blog, they will get hot and bothered all over again on hearing me double down on my Lehrer position, and accuse me of not understanding even more of the very statistics that they noticed me not understanding in the first place. Nevertheless, I need to keep going, even if it means more of a dispute, because I want to use their confusion as a way of showing why statistics are a branch of metaphysics.
To make an analogy to the point that’s coming, I think for my critics to say that a median is better than a mean for measuring central tendency is analogous to saying the Yankees are a “better team” than the Red Sox for showing how much fun it is to watch baseball. Most of us would say this really isn’t a supportable position. Most of us, in the realm of baseball, would love the arbitrariness of fandom. But in statistics, we get all serious, and think that statistical truths are real.
And that’s what I think the larger problem is with my critics’ response. They seem to believe that statistics are an empirical science. In fact, they are a profoundly subjective tool; the statistical test you use is almost entirely determined by the question you are asking, and the question you are asking – as Hollis has argued nicely – is always subjective. That’s what Popper’s “problem of induction” is all about – that you can’t get the subjectivity out of science; it is always the bottom turtle.
My prank posting about this – in which I asked readers to imagine statistical tests being discovered at the LHC – was meant to drive home the absurdity that we all fall for periodically, I think, of imagining (wishing?) statistics to be an empirical science.
The insights that statistics gives us about the world are not empirical, but metaphysical: they provide a framework for viewing reality. Not the framework – a framework. Thus to say that one or another statistical test is “better” than another in any given situation – say that a median is better than a mean for understanding a group – is not a scientific statement. It is as metaphysical a statement as the assertion that the Yankees are a better baseball team than the Red Sox.
Now the mathiness of statistics – and the learning experience we have of statistics, in which there are right and wrong answers to the math problems posed per se – makes this point feel intuitively ridiculous. Most people sputter for quite some time about the impracticality of this point when they first hear it – I know that I did. Statistics seem so stable and solid that it seems a bit cranky of me to assert that they represent merely metaphysical truths.
But in the long run I think that all of us suffer when we forget that, to quote my various critics, “liking” medians and thinking they “represent a group,” and my “cranky” “contempt” for medians and thinking they “represent one person,” are in the end metaphysical beliefs.
To get to this point – where you intuitively feel the subjectivity at the root of statistics, and learn to resist, mentally, the pressure that science places upon you to accept as factual some statistical result – I want to give you a visceral understanding for why there is no such thing as an empirically “correct” way to look at central tendency.
And the third-grade tool I’m going to use to do this is Zach.
In the first graph, the kids were asked to select the average kid in their class and then rate his or her aggressiveness. Note that there are two important differences between this example and the Galton case from Wikipedia above. First, all of Galton’s respondents were looking at the same ox. In my case, each kid was imagining his own kid, so that they were each looking at a different “reality.” Second, there is no objective “reality” when it comes to rating aggression. There are competing measures, none of which predict aggression that well, and aggression itself, lacking any natural physical unit, is hard to define – unlike weight (which is made of those Higgs Bosons they should be announcing later today – fingers crossed!)
The two differences are related: in social science, we often ask respondents to supply their own stimuli rather than ensuring similarity across respondents. For example, mothers may be asked to think of how much they love their own child, not a single “objective” child. In this way the respondent’s subjective experience is the reality in much of social science, a matter we will return to when we look at the three examples of crowd wisdom that Jonah Lehrer used to begin his article on the Wisdom of Crowds.
This in mind, the children’s responses produced this graph (for those who missed the snark, the whole example is made up and there is no Zach):
The results are what we’d expect. Because children were imagining the average kid in their class – definitely not the aggressive Zach – they essentially “cut off their own tails” from the start. There were no outliers in this graph, because none of these (imaginary) children had anxiety disorders that caused them to falsely attribute high aggressiveness to a normal child. And therefore the means and medians of each group were identical.
Now consider the graph I showed earlier.
Notice two things. First, in Mr. O’s class (blue), the median and mean aggressiveness are the same. That is, the kid who is smack-dab between the most and the least aggressive halves of the class (the median child) is exactly as aggressive as the mean aggressiveness of the whole class. In Mrs. R’s class (red), however, the mean aggressiveness is distorted by the presence of Zach, whose off-the-charts aggressiveness is far higher than the class’s median aggressiveness. That is, the class as a whole looks more aggressive than any one of its members save one – who, himself, is more aggressive than the class average.
Second, notice that the two classes look identical if you compare their medians. But if you compare the means, Mrs. R’s class looks far more aggressive. This should show you why I think of the mean as telling you more about what the group feels, about itself as a group qua group, in which Zach is a dominant – probably the dominant – force in the room. This should also show why I feel the median really is, psychologically, representative of one person in the middle of the group. Half the kids are more aggressive than him, half less; Zach is in this view “just” one member of the class.
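The two-classes comparison can be reproduced in a few lines. Every number here is made up, mirroring the post’s own fictional example: two classes with identical medians, except that one contains a single extreme outlier:

```python
import statistics

# Fictional aggressiveness ratings (1-10 scale, with one off-the-charts
# score). Same distribution in both classes, except for the outlier.
mr_o  = [2, 3, 3, 4, 4, 4, 5, 5, 6]    # no outlier: median == mean
mrs_r = [2, 3, 3, 4, 4, 4, 5, 5, 30]   # same class plus "Zach" at 30

for name, scores in [("Mr. O", mr_o), ("Mrs. R", mrs_r)]:
    print(name,
          "median:", statistics.median(scores),       # 4 for both classes
          "mean:", round(statistics.fmean(scores), 1))  # 4.0 vs ~6.7
```

Compare the medians and the classes are indistinguishable; compare the means and Mrs. R’s class looks far more aggressive. Neither summary is wrong – they are answering different questions.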
But we all know – Zach is not “just” one member. He is ruining everybody’s year.
Now let’s look at the third graph in my earlier post, which, as I implied, a realistic person – one who works with kids, and who is fielding angry calls from parents – knows is closer to “the truth” of how the majority of kids are experiencing their year.
When the Principal asked children in the two classes how badly they wanted to switch classrooms, in an all-or-nothing manner, they all took Zach into account.
These three graphs alone should be sufficient to prove to you that statistics is not a branch of empirical science. The same mathematics can produce three very different portraits of a group and its wisdom, just as the same primary color paint can produce three very different portraits of a person. It all depends on the painter – or the mathematician.
This may help explain why so many of the complaints about my Jonah Lehrer article had a moral flavor – morality being a branch of metaphysics too. I was told that the median is a “good” or “better” or “appropriate” or “the fairest” way of describing a skewed data set, whereas my interest in the mean was – in the view of Melanie M – bordering on “unethical.”
Appealing to morality in matters of science is, of course, officially off the table, but as the reader can see it happens all the time in practice. And really, ever since the debate between the “subjectivist” (Bayesian) and “frequentist” (aka “objectivist”) camps of statistics arose in the 1700s, as Laplace made his big push into the art of prediction, morality and statistics have been warring siblings. Perhaps no phrase in the popular imagination captures our intuitive understanding of this link better than Twain’s – “lies, damned lies, and statistics.”
To close this point with a thought experiment, I want to drive home the importance of subjectivity in measuring central tendency. I’ll even give it a title.
Zach and the Third Grade Time Machine
You, me, and my biggest critic take a time machine back to third grade (we just moved to town, and school is well underway).
We become eight years old again, and toddle out in mid-October at the school door, where we are met by our Principal.
He tells us that there are two third grade classes in this school – not Mr. O’s and Mrs. R’s, but two new, unknown classes – and that each class is identical in all ways except there may (or may not) be a difference in their aggressiveness.
He tells each of us he will give us one bit of information about the two classes: either we can learn the median aggressiveness of each class, or the mean aggressiveness of each class. After we get that bit of information, we must put our heads together and, collectively, choose our class for the year. It has to be a group decision, majority rules (e.g., 2–1 will suffice to make the call). Otherwise he will assign us randomly.
My biggest critic votes first, naturally for the median.
I vote second, naturally for the mean.
So, it’s down to you. You’re the deciding vote. Which will it be?
See the forum for this website at neuroself.lefora.com