Joel A. Middleton

Unifying design-based inference (working)
  1. On bounding and estimating the variance of any linear estimator in any experimental design
  2. A new variance estimation principle
  3. On point estimation
  4. New regression estimators


Background

Under the Neyman Causal Model (NCM), the only source of randomness in an experiment is the assignment mechanism. Draws from abstract distributions (parametric or otherwise), errors (iid or otherwise), conditional expectations, data generating elves (DGEs), and the like are all put aside. 

Even though it lacks these elements, the very ones that ordinarily put the sizzle in statisticsss, the NCM has a couple of modest virtues.  One is that it provides a degree of conceptual clarity. For example, the notion of non-random potential outcomes makes clear that treatment effects may be individual specific and that an average treatment effect is, in some sense, a mere description, applicable to the units at hand.  As such, for generalization a researcher might consult speculators, oracles, or the Magic 8-ball boxed with old stuffies under the guestroom bed at Mom's.  They might even consider asking a Bayesian, though this is a matter of taste, and perhaps one of last resort.

A second virtue is that the random assignment mechanisms that animate design-based statistics can, themselves, be investigated empirically. In a world less strange, scholars in some segments might find this an attractive feature.  Empirical researchers, for example.

I address at least two common criticisms of design-based inference in this project, both of which have always seemed a little puzzling to me. The first is that the design-based framework is more difficult mathematically, because many common experimental designs have assignments that are not independent.  Were I to note that these same critics attend parties where the subject of asymptotics is considered an ice breaker, it would be beside the point. So instead, I will simply note that some effort is made in this project to streamline and simplify.

The second criticism is that, in many cases, the estimators and variance estimators that are derived under the design-based framework are the same or very similar to those that might be ordinarily used anyway. For example, OLS can be motivated by design-based, parametric, or semi-parametric models, but the set of assumptions used to justify it does nothing to alter the estimator itself. Likewise, variance estimators developed for the design-based framework are sometimes algebraically or asymptotically equivalent to those found in the semi-parametric framework.  Here I resist reminding the reader that this is what the Ptolemeans said when they had path dependence on their side, and simply note that a new variance estimation principle and several new types of regression have arisen from this project that are unlikely to come out of more common frameworks.

Paper 1 of 4 attempts to alleviate NCM-induced angina by way of carefully considered notation.  The framework is applicable to virtually any experimental design, any linear estimator and any variance bound. This allows for easy comparisons and analysis through straightforward methods such as eigendecompositions of design matrices. One trade-off is that variance is not based upon stochastic "errors" leaping forth from the occult to animate point estimators, but only upon the properties of the randomization mechanism itself, a relatively boring idea by comparison.  That said, the proposed variance estimator reproduces Eicker-Huber-White (HC) and cluster robust (CR) standard errors as special cases. Previously HC and CR were the two basic flavors available.  This "generalized sandwich" is applicable to any design, linear estimator and bound.
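For readers who would like to see the "relatively boring idea" in the flesh, here is a minimal sketch. It is not the notation or the estimator of Paper 1; it is only the textbook case of a completely randomized experiment with a difference-in-means estimator. The potential outcomes are made-up numbers and every name in the code is hypothetical. The sketch enumerates all equally likely assignments, so the mean and variance of the estimator come from the randomization mechanism alone, and it compares them with Neyman's classical variance expression and the usual conservative bound.

```python
from itertools import combinations

# Hypothetical, fixed (non-random) potential outcomes for N = 6 units.
# These numbers are illustrative assumptions, not data from the project.
y0 = [1.0, 2.0, 2.0, 3.0, 5.0, 7.0]   # outcome if assigned to control
y1 = [2.0, 2.0, 4.0, 6.0, 5.0, 9.0]   # outcome if assigned to treatment
N, n1 = len(y0), 3                     # complete randomization: exactly 3 treated
n0 = N - n1

def mean(xs):
    return sum(xs) / len(xs)

def finite_pop_var(xs):
    """Finite-population variance with the N - 1 divisor."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def diff_in_means(treated):
    """Difference-in-means estimate for one assignment (a set of treated indices)."""
    t = [y1[i] for i in treated]
    c = [y0[i] for i in range(N) if i not in treated]
    return mean(t) - mean(c)

# The full randomization distribution: every assignment is equally likely.
estimates = [diff_in_means(set(s)) for s in combinations(range(N), n1)]

# Unit-level effects and the average treatment effect for the units at hand.
tau = [a - b for a, b in zip(y1, y0)]
ate = mean(tau)

# Exact design-based mean and variance of the estimator, by enumeration.
exact_mean = mean(estimates)
exact_var = mean([(e - exact_mean) ** 2 for e in estimates])

# Neyman's variance expression for this design; the last term involves the
# joint distribution of potential outcomes and is not identified from data.
neyman_var = (finite_pop_var(y1) / n1 + finite_pop_var(y0) / n0
              - finite_pop_var(tau) / N)

# Conservative bound: drop the unidentified term.
neyman_bound = finite_pop_var(y1) / n1 + finite_pop_var(y0) / n0

print("ATE:", ate, "| mean of estimator:", exact_mean)
print("exact variance:", exact_var, "| Neyman formula:", neyman_var)
print("conservative bound:", neyman_bound)
```

No errors, elves, or abstract distributions are invoked: the only thing that varies across the 20 rows of the enumeration is who was assigned to treatment.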

In Paper 2 of 4, I provide an alternative variance estimation principle that is unlikely to have arisen in alternative frameworks and which may have remarkably lower variance (note: variance of the variance estimator) than equivalents in the sandwich family that have the same expected value (note: expected values of variance estimators).  That the new variance estimation concept is unlikely to arise from other, more standard, frameworks is manifestly true in light of the 40-year lacuna since White (1980).

In Paper 3 of 4, I hope to leave the reader wondering, as I often do, who decided it was a good idea to interpret a regression coefficient in the first place.

Finally, Paper 4 of 4 will be a big top full of new and freakish regression estimators.  Headlining is the first and only Best Linear Unbiased Estimator (BLUE) developed in the design-based framework.  After the disruption following Freedman (2008), I expect this will help restore balance to the universe. 

Another benefit of Paper 4 of 4 is that econometricians, having read Paper 3 of 4 and now to be found in the unlikeliest of places, lolling about under the oaks with a copy of Camus for example, might be enticed to return to their offices to read about the spectacular. It will be a real "lapel yanker", as the great barker, himself, would say.

(Of course, this is all a bit tongue-in-cheek.  For example, I am well aware that economists would never read Paper 3 of 4, or anything by scholars outside of top 10 econ departments for that matter. Without the correct and True training, outsiders are assumed to be lacking where vital impetus is concerned, not to mention hubris.)


Dedication

This work is written with reverence for scholars like David Blackwell, who was one of the greatest statisticians the world has known, in spite of everything working against him, and who was also good.  His son Hugo tells me that, after his father passed, he was surprised to discover a desk drawer full of honorary degrees, and that David liked to serve martinis at faculty gatherings in their home.  Evidence of this can be found in a photo in the statistics conference room in Edwards.

The work is also dedicated to scholars such as David's good friend, Jerzy, who tried to hire him in 1943 but campus said no. Jerzy ultimately did hire him a decade later, at a time when, even still, civil rights was not yet a thing.

All the evidence I have seen suggests that these two were scholars of great intellect and character both, who valued ideas and humans above accolades and optics, and who probably landed at Berkeley precisely because they had no interest in leaping into a pit of snakes in Cambridge, Mass.  The weather is better here besides. 

To David and Jerzy's other friend, the other David, thank you. You once saw me in your home for coffee when, as a graduate student, I had no one else to talk to.  You must have been unwell in those days, and I wonder now if I might have lingered too long.  This day was, and continues to be, the happiest of my career.