I love it when you write about the academy. Each academic discipline is run by a collection of the smartest people in the room, chosen for lifetime employment by someone with one or two degrees of separation from their dissertation advisor. This makes it extraordinarily difficult for academics to "think how flimsily constructed everything is that we know" because, to borrow Upton Sinclair's great line, their salary depends on their not understanding it.

Your insider-out perspective feels more likely to change minds, or at least open them a crack, than people from the humanities (we have our own problems) or, worse, historians scolding them for not seeing the whole for the parts, for missing life in lifeless numbers, or for correcting a date of publication, or whatever.

That said, as a historian, I need you and anyone who reads your comments to know that Charles Peirce (he pronounced it "purse" because Boston) published that wonderful essay in The Monist in 1891. Peirce was a weirdo, iconoclast, the son of a famous Harvard professor, an unrepentant asshole who endured an incurable and painful condition called facial neuralgia, and one of the finest metrologists and neologists in history who gave William James some of his best ideas. Peirce was the sort of writer Emerson had in mind when he said "Beware when the great God lets loose a thinker on this planet."

He was also mostly forgotten until John Dewey and a few other admirers gathered what they could find of his essays and got them published in that 1923 volume. This has led to a slow, steady revival of interest in his writing among historians and the occasional philosopher or member of the Santa Fe Institute's faculty.

The extent to which I can follow you into the thickets of probability maths and statistical arcana is due to my attempts to understand words that Peirce put down in his often difficult nineteenth-century prose. Hence, my over-excited response to seeing his name in one of your essays.

Let me leave you with my favorite of Peirce's neologisms: fallibilism. Peirce defined this as “the doctrine that our knowledge is never absolute but always swims, as it were, in a continuum of uncertainty and of indeterminacy.”

Wow, but this is just amazing. Thank you for the date correction, I will update the piece.

Spelling, too. I turned in a paper in grad school referencing Pierce instead of Peirce, and confidently pronounced his name like the character in MASH for most of the first class meeting in my first seminar on American intellectual history. Grad school really is about suffering to obtain knowledge.

I may or may not update the spelling (house style includes typos, alas)

But thank you again very much

Respect to the house style of my favorite publication on Substack.

Someone clever enough could have inferred most of chaos theory from the fact that coin flips are deterministic
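To make the quip concrete: a toy deterministic flip in the spirit of Keller's rigid-coin model (my own simplification, with made-up launch numbers). The face that comes up is a pure function of the launch conditions, yet it changes under sub-percent nudges to them.

```python
import numpy as np

G = 9.81  # gravity, m/s^2

def coin_face(v, omega):
    """Deterministic toy flip: launch the coin upward at speed v (m/s) with
    spin rate omega (rad/s), catch it at launch height after t = 2*v/G seconds.
    The face showing is fixed entirely by the total rotation angle omega * t."""
    angle = omega * (2 * v / G)
    return "H" if np.cos(angle) > 0 else "T"

# Nudging the launch speed by half a percent at a time is enough to change
# which face lands up: deterministic, but wildly sensitive to initial conditions.
for v in np.linspace(1.95, 2.05, 11):
    print(f"v = {v:.2f} m/s -> {coin_face(v, omega=240.0)}")
```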

The roulette wheel is deterministic too. Why don't you hurry off to Vegas and make a killing?

It is incredibly validating to see a take on statistics that manages to pass between the Scylla and Charybdis of, on the one hand, recognizing that asymptotic limits and bounds can fail to be sharp or even adequate descriptions of typical behavior, and, on the other, accepting just how fragile the tail conditions are that many of these bounds need in order to be valid at all.

Working in a part of ML that asks for finite sample bounds[^1] has been incredibly humbling relative to both the applied person's "inference is a hurdle I must jump through to show others that I was right all along" and the theorist's idyllic holidays in asymptopia.

Even there, many of the introductory treatments essentially say: just assume sub-Gaussian tails, apply something like a Hoeffding inequality, and you will get essentially the familiar Central Limit Theorem results without having to go to asymptotics. And in some cases this is fine! But as you move beyond some classification approaches where sub-Gaussianity is enforced by construction, you are forced into the same dilemma. For anything like a reasonable description of even fairly simple regression behavior in a typical case, you start needing to look beyond what the tail behavior gives you to the bulk of the distribution, to something like a Bernstein inequality, to get an appropriate "fast rates" condition. And in high dimensions, none of these will give you a good average-case description, and you need to run hat in hand to the statistical physicists to steal their random matrix theory (much of which is only rigorously known for Gaussian data, let alone sub-Gaussian).

But then of course sub-Gaussianity is, in so many applications, completely untenable for real data without a priori bounds. In many of these cases the CLT tells us we will still get basically the same thing "eventually," and Berry-Esseen will even tell us that it won't take forever, but often the tail behavior propagates terribly for quite a while. Here, robust statistics in the style of Huber can help accelerate the process, though statisticians have only quite recently developed robust procedures with near-sharp finite-sample guarantees for estimating even the humble mean. See, e.g., median-of-means or Catoni estimators; the multivariate case gets even worse, resulting in procedures at the edge of computational feasibility.
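Since median-of-means comes up: a minimal toy simulation of the point (my own sketch, with arbitrary Pareto-tail and block-count choices, nothing from the post). On heavy-tailed data the plain sample mean has terrible worst-case error at moderate n, while median-of-means keeps the bad quantiles of the error under control.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, k):
    """Split the sample into k blocks, average each block, take the median."""
    blocks = np.array_split(rng.permutation(x), k)
    return np.median([b.mean() for b in blocks])

# Pareto(alpha) on [1, inf): the mean exists, but the tail is heavy enough that
# Hoeffding-style sub-Gaussian reasoning has nothing to say at this sample size.
alpha, n, reps = 2.1, 200, 2000
true_mean = alpha / (alpha - 1)

err_mean, err_mom = [], []
for _ in range(reps):
    x = (1 - rng.random(n)) ** (-1 / alpha)   # inverse-CDF Pareto draws
    err_mean.append(abs(x.mean() - true_mean))
    err_mom.append(abs(median_of_means(x, k=10) - true_mean))

# Compare the tails of the two estimators' error distributions, not just the average.
print("empirical mean,  99th pct |error|:", np.quantile(err_mean, 0.99))
print("median-of-means, 99th pct |error|:", np.quantile(err_mom, 0.99))
```

The block-median trick only needs a finite second moment, which is exactly the regime where the sub-Gaussian story breaks down.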

It's almost as if there's no substitute for putting in the work and thinking hard case by case, and acknowledging how far we might be from getting things right...

Anyway, inspiring post; I would comment on the rest of it beyond saying that I loved the Misak Ramsey biography, but the classics part runs into my humbling lack of erudition. My Homer knowledge comes almost exclusively from high school literature class heavily abridged excerpts and summaries and from Maya Deane's "Wrath Goddess Sing", which is basically a genderswapped AU extended Agamemnon/transfem Achilles/transmasc Briseis throuple Iliad slashfic, and maybe just possibly differs from canon in some details.

[^1]: BTW, did you know that finite sample high-probability results for general GMM (a simple but seemingly rarely-done exercise I put in a paper a few years ago) require crazy tail assumptions? You almost certainly did, because you've run the simulations and worked out details for many special cases...

I think even the asymptotics for GMM require the parameter to live in a compact space, so if one thinks of the parameter as being in the reals, one has to be ready to posit upper and lower bounds for it!! (this is true for the classical GMM asymptotics anyway, I am not up on any potential improvements though I believe they could be made)
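Roughly where compactness enters the classical argument, paraphrasing the textbook Newey-McFadden-style conditions (a sketch, not a quotation): identification, a compact parameter space, continuity of the moment function, and a dominance condition strong enough for a uniform law of large numbers.

```latex
\[
\hat Q_n(\theta) = \hat g_n(\theta)'\,\hat W\,\hat g_n(\theta),
\qquad
Q_0(\theta) = E[g(z,\theta)]'\,W\,E[g(z,\theta)],
\qquad
\hat g_n(\theta) = \tfrac{1}{n}\textstyle\sum_{i=1}^n g(z_i,\theta).
\]
% Under compactness of \Theta, continuity of g(z, .), identification
% (Q_0 minimized only at \theta_0), and E[ sup_{\theta} || g(z,\theta) || ] < \infty,
% a uniform LLN gives
\[
\sup_{\theta \in \Theta}\bigl|\hat Q_n(\theta) - Q_0(\theta)\bigr| \;\xrightarrow{p}\; 0,
\]
% and uniform convergence over the *compact* \Theta is exactly what turns
% convergence of the objective into convergence of its minimizer.
```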

thank u for this wonderful comment, Rachel. it always feels so cool to me to get to talk stats with you.

You know I am willing to talk stats anytime, lol.

I am as guilty of relying on compact parameter spaces in my teaching and in my papers as most of the rest of us, but there are actually alternative proofs that don't require this! They're just a lot harder than the usual Glivenko-Cantelli-on-the-objective proof that those of us who learned from Newey-McFadden always use.

For M-estimators, see Lemma 5.2.3 for consistency and Lemmas 7.2.1 or 7.2.3 for asymptotic normality (implicitly via condition C, where that < \epsilon condition allows a localization of the parameter space that avoids compactness for the space as a whole), in these van de Geer notes: https://people.math.ethz.ch/~geer/empirical-processes.pdf . The consistency lemma replaces compactness with convexity, which nests most of the classic GLMs, but I believe versions of this argument can also be extended to convexity outside of a compact space. The idea is that as you go far enough out into the tails, the criterion must rise high enough above any particular point that you can ignore everything out there for purposes of minimization, and this lets you replace your space with a subset that can be handled by uniform methods. I seem to recall that early versions of this approach go back to Wald, maybe, though my notes aren't at hand, so don't quote me on that.
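The convexity trick in one line, in my own paraphrase (not a quote from the notes): the boundary of a ball can stand in for everything outside it.

```latex
% Sketch: let M_n be convex and suppose, for some radius r,
%     M_n(\theta) > M_n(\theta_0)  for every \theta with ||\theta - \theta_0|| = r.
% Take any \theta outside the ball and let \bar\theta = \lambda\theta_0 + (1-\lambda)\theta,
% with \lambda \in (0,1), be the point where the segment from \theta_0 to \theta
% crosses the sphere.  Convexity gives
\[
M_n(\theta_0) \;<\; M_n(\bar\theta) \;\le\; \lambda\,M_n(\theta_0) + (1-\lambda)\,M_n(\theta)
\;\;\Longrightarrow\;\;
M_n(\theta) \;>\; M_n(\theta_0),
\]
% so no point outside the ball can be a minimizer, and the whole analysis can be
% carried out on the compact ball { ||\theta - \theta_0|| <= r } even though the
% parameter space itself is not compact.
```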

Note that these arguments don't get you out of thinking hard about tails when you don't explicitly impose compactness. On my first applied GMM project, the model had a non-compact parameter space and failed to be convex outside of a ball, and I lost days to terrible optimizations with estimates shooting off to infinity. The conditions really do matter. But we know OLS works, we know GLMs work and we can explain it. But yeah, the proofs that nest all those cases get ugly, and I avoid them except when pressed on it. So thank you for pressing!

Oddly related, and likely not interesting to you: in computer science, people have long tried to model systems using "nondeterminism," which is a kind of "random, but without probabilities." Complex systems, especially ones with parallel or interleaved computations, have the property that running the same test multiple times may produce multiple results, and one approach is to think of the systems as inherently non-deterministic devices which "choose" among alternative paths in some unknowable way. I have tried to argue that it's more useful to think of these things as completely deterministic, but only partially specified and also dependent on inputs from unmodeled external systems (e.g. a user who might type "Peirce" or "Pierce"). Now I think I can tell people to consider these as analogous to coin-flipping machines: deterministic, but determined by factors we don't know. Anyway, apologies for the digression.
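A toy sketch of the two modeling styles, with made-up names and a two-valued outcome purely for illustration:

```python
from typing import Set

# View 1: nondeterministic. The system "chooses" among outcomes in some
# unknowable way, so a run maps an input to a *set* of possible outputs,
# with no probabilities attached to the members.
def possible_outputs(request: str) -> Set[str]:
    return {"ok", "timeout"}

# View 2: fully deterministic but only partially specified. The outcome is a
# pure function of the input *plus* unmodeled external state (scheduler
# interleavings, a user who might type "Peirce" or "Pierce", ...) that we
# simply do not observe.  Same observed behavior, different bookkeeping.
def actual_output(request: str, hidden_environment: int) -> str:
    return "ok" if hidden_environment % 2 == 0 else "timeout"

print(possible_outputs("GET /"))                      # {'ok', 'timeout'}
print(actual_output("GET /", hidden_environment=7))   # 'timeout'
```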

no, i think this is very interesting! once i thought more about coin flipping i started to realise this must be deeply related to complexity theory, so i ordered a book of foundational papers in the field, we'll see. :)

So i guess what i'm saying is i'd had the same thought from the other end!

That's good, but you've made the crucial mistake of encouraging me.

The early Rabin-Scott paper is really good - and much later work by lesser people is much more obscure.

https://www.scribd.com/document/457114256/1959-Rabin-Scott-Finite-automata-and-their-decision-problems-pdf. (I really hate the IEEE and its efforts to lock up such papers forever)

Another way of looking at it is to think of these machines as maps from finite strings (a history of discrete events) to output values. That's deterministic. Non-deterministic machines, by contrast, are maps from strings to sets of output values.
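In code, that distinction looks roughly like this (a minimal sketch with made-up transition tables): the deterministic machine sends each string to a single state, while the nondeterministic one is simulated by carrying along the set of states that some choice of path could have reached.

```python
# Deterministic: one next state per (state, symbol).
det_delta = {("even", "a"): "odd", ("odd", "a"): "even"}

# Nondeterministic: a *set* of next states per (state, symbol).
nondet_delta = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): set()}

def run_det(s, state="even"):
    for ch in s:
        state = det_delta[(state, ch)]
    return state                      # a single outcome for each string

def run_nondet(s, start=frozenset({"q0"})):
    states = set(start)
    for ch in s:
        states = set().union(*(nondet_delta.get((q, ch), set()) for q in states))
    return states                     # the set of outcomes some path could reach

print(run_det("aaa"))     # 'odd'
print(run_nondet("aaa"))  # {'q0', 'q1'}
```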

For use in complexity theory, I'd recommend the old Garey and Johnson book

https://en.wikipedia.org/wiki/Computers_and_Intractability

Awesome recs, thank you :) !

I also hope you feel better soon— I didn’t know you were sick— and this essay makes me feel like a child on tiptoes looking over a counter. I read the whole thing, though, and I think I’m understanding wisps of it. People’s habit of misunderstanding stats and chance— that’s an area where I think I understand it at least better than the other untrained people do, and if I interpret you correctly, even the trained people often misunderstand its implications. I can relate to that in my own academic field.

I also understand the paradox of breadth and synthesis of knowledge; it gets me in trouble all the time. I’m one of the breadth people.

I’ve been reading a book of Henry James essays for review, and I will go back to that book feeling cockier. Bring it on, Henry. Henry James will feel like a comfy slipper after this.

"Henry James will feel like a comfy slipper after this" is such an endorsement lol :)

inspiring essay in more than one way. im teaching the neoclassical growth model under uncertainty this week, hope my students dont mind the long digression that will come out of reading this.

Thank you mateo, I hope it goes well, & who doesnt like a long digression tbh

Loved this one! And many others - thanks for writing & posting them!

Excited to read Diaconis and Skyrms :)

Have you read After Sappho?

- Ch 14: ‘for so long we had said to ourselves that we were going to be Sappho that Cassandra’s words were strange on our tongues’: academics find it hard to admit we know so little because our dreams started by falling in love with dazzling Ramseys who- so young and long ago- seemed already to bring clarity in place of confusion

(Flying in defiance of your commandment, I’ll sheepishly share one thought you might like about QM in demure little brackets here, hopefully excused by its Ramsey connection (apologies if already known): The multiverse interpretation of QM is entirely deterministic. When a measurement of a superposed state (e.g. |state1> = |live cat> + |dead cat>) is made, the uncertainty isn’t randomly ‘collapsed’ into one reality. Instead, the observer just becomes split like the system they were measuring (entangled with it), and so worlds in which you see the live and dead cat both occur and are equally real (|state2> = |’happy cat owner’; live cat> + |’sad cat owner’; dead cat>). This causes a problem: in the real world we see one outcome or another, not both, and with probabilities captured by the magnitudes of the states in the wavefunction. In The Emergent Multiverse (the modern bible of the multiverse interpretation, which also has a nice chapter on problems in the philosophy of probability (problems defining objective chance) which you might like on a rainy day), David Wallace proves a representation theorem - a la Ramsey! - showing that under a set of rationality assumptions (the standard ones for these theorems plus some extras relating to how it’s reasonable to respond to symmetric quantum states) a rational actor in a world with multiverse QM must act as though they are maximising expected value over the different future branches, with probabilities of the branches given by their weight in the wavefunction. So there’s no ‘objective’ probability/randomness (which is great given that’s not a coherent concept) but it’s nonetheless rational to respond to the branching state space as though we do not know which of the deterministic branches we will end up in, and the exact credences we should give to each outcome are justified by symmetries of the wavefunction (just as we might justify our credence in different outcomes of a die roll by the way the symmetries of the die organise our ignorance of the actually already determined outcome).)
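Written out in standard ket notation, the same branching story (a sketch, not a quotation from Wallace):

```latex
% Pre-measurement state of the system:
\[
|\psi_0\rangle \;=\; \alpha\,|\text{live}\rangle \;+\; \beta\,|\text{dead}\rangle,
\qquad |\alpha|^2 + |\beta|^2 = 1 .
\]
% Measurement collapses nothing; it entangles the observer with the system:
\[
|\psi_1\rangle \;=\; \alpha\,|\text{happy owner}\rangle\,|\text{live}\rangle
\;+\; \beta\,|\text{sad owner}\rangle\,|\text{dead}\rangle .
\]
% Wallace's representation theorem then says a rational agent facing this branching
% must act as an expected-utility maximiser whose credences in the two branches are
% the Born weights |\alpha|^2 and |\beta|^2 -- no objective chance required.
```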

Now, I'm just a humble electrical engineer from a backwoods university in the Midwest, but I can appreciate a finely crafted piece of writing when I see one. I had a more boring take a while ago:

https://realizable.substack.com/p/probabilities-coherence-correspondence

Then you could appreciate how this is not a finely crafted piece of writing but tedious whining and "woe is me"

As predicted I didn't understand a lot of this, but also as always your writing voice and essays are extremely enjoyable.

I understood 20% of this…but it was beautiful.

For a bunch of reasons, I didn't go to grad school. I missed quite a lot. For example, I did most of my early work on decision theory with no knowledge of Savage or Anscombe-Aumann. On the other hand, I did lots of things I probably wouldn't have, particularly political writing. My academic work is certainly very different as a result of missing that training. And obviously my life would have been radically different in unpredictable ways.

I had no idea you didnt go to grad school! how did you end up in a career as an academic? (actually now im wondering how Persi got back on the straight and narrow, I know he quit school at some point, or that's the story as I recall people used to tell it...)

Are you not the prominent economist John Quiggin who got a Masters from ANU and PhD from the University of New England?

I came by the beginnings of this instinct about chance in the swamps of computer performance monitoring/resource estimation back in the '80s, when systems measurement and management as disciplines were still well in the future. A guide of sorts in that world was Barry Merrill's "Guide to Computer Performance Evaluation Using SAS Systems", a Bible of the field and times. It was good enough that the gaps in common practice came into focus in the data. Distributions definitely weren't normal/Gaussian. The need to believe made Cassandras of us: how we correctly prophesied resource requirements was a mystery management raged about when we wouldn't give them averages, but since the methods were thankfully beyond them, they were pleased for a moment. Did Cassandra reveal her methods? :-)
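The averages complaint in miniature, with a made-up lognormal latency sample (purely illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed, heavy-ish tailed response times (milliseconds), much closer to what
# measured service times look like than anything Gaussian.
latency = rng.lognormal(mean=3.0, sigma=1.2, size=100_000)

print("mean   :", round(float(latency.mean()), 1), "ms")
print("median :", round(float(np.median(latency)), 1), "ms")
print("p99    :", round(float(np.quantile(latency, 0.99)), 1), "ms")
# Sizing resources to the mean quietly under-provisions for the peaks that
# actually drive capacity, hence the reluctance to hand management an average.
```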

This articulation/validation of that instinct, along with the book recommendations, is fuel for my itch to study Econometrics and Statistics, for which I had only kindling. Sorry for gushing, but this is exciting!

I see victor yodaiken above is also in the field. <waves>

excellent

Good work.

A reason for theorist superiority is that the greatest work of English-language drama to include the phrase “a certain sub-sigma-algebra” is Kreps’s chapter on de Finetti, and that’s everybody’s favorite book.
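For anyone who hasn't met the phrase: de Finetti's theorem says, roughly (my paraphrase), that an infinite exchangeable sequence is i.i.d. conditional on a certain sub-sigma-algebra, equivalently a mixture of i.i.d. laws:

```latex
\[
P\!\left(X_1 \in A_1, \dots, X_n \in A_n\right)
\;=\; \int \prod_{i=1}^{n} \mu(A_i)\; \pi(d\mu)
\qquad \text{for every } n,
\]
% i.e. conditional on the (random) measure \mu, which is measurable with respect
% to the exchangeable sigma-algebra, the X_i are i.i.d. with law \mu, and \pi is
% the prior over that unknown law.
```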

Ahhhh now this makes so much sense

This is so good!! On so many fronts! I hope your illness has passed/passes quickly

Thank you very much, Kody <3

How do you know all those things about "people"?
