Reply to Guinnane and Hoffman

Guinnane and Hoffman (henceforth GH) have circulated a comment on two of our papers. The original papers contribute to the recent literature on the origins of political extremism: Voigtländer and Voth, "Persecution Perpetuated" (2012, henceforth PP), and Satyanath, Voigtländer and Voth, "Bowling for Fascism" (2017, henceforth BF). GH use the publicly available replication datasets to re-examine the evidence presented in these two papers and claim to find "several econometric and historical errors."

We have already addressed their criticisms in a reply, which we shared with the authors. They have since proceeded to adapt and change their original critique, and have sent new versions to various journals, so far without success. Here, we respond to the latest version that we are aware of (from the fall of 2023), analyze their results, and show why the adapted version remains as unconvincing as their original critique. Below, we provide links to the original versions of their comment, together with our replies.

Odd Nature of the Critique

The critique is odd in several ways:

Statistics and "Reverse p-hacking" in the Critique of PP

The critique's basic intellectual premise is questionable. The authors engage in extensive "reverse p-hacking," searching for a combination of estimation methods, regressors, and subsamples in which conventional levels of significance are no longer reached. This is a basic statistical mistake, because the difference between "significant" and "not significant" coefficients is not itself statistically significant. See Gelman and Stern, "The difference between 'significant' and 'not significant' is not itself statistically significant," The American Statistician, 2006, 60(4): 328-31.

Take the results below, which re-estimate the main results table in PP with and without Bavaria. Some coefficients increase in size. Throwing away 70+ observations, as GH do when dropping Bavaria, will of course change the standard errors. Nevertheless, while the significance of some results now drops below conventional levels (1920s pogroms, NS vote 1928), others become stronger (DVFP vote, Stuermer letters).

The important point is this: the difference between 0.0461 and 0.0607, each estimated with error, is not statistically significant. It is poorly estimated, and should not change anyone's prior about the true relationship. There is nothing to see here. It is worth pointing out that about two-thirds of the critique is built on this idea (cf. the extensive results in the appendix, where the authors engage in a blizzard of specification search, sample experimentation, and estimator play).

Critique of PP: Bavaria

The historical motivation for dropping Bavaria is unconvincing and ad hoc, as we discuss in detail in our original reply. In addition, GH's claim that our results are fragile to the inclusion/dropping of Bavaria is incorrect. The replication files, re-run without Bavaria, reveal near-identical results:


                   (1)            (2)          (3)           (4)        (5)           (6)
                   Pogrom 1920s   NSvote 28    DVFPvote 24   deptotal   stuermersum   synagogue damaged
                                                                                      or destroyed
Original results   0.0607***      0.0142**     0.0147        0.142**    0.369**       0.124**
                   (0.0226)       (0.00567)    (0.0110)      (0.0706)   (0.144)       (0.0522)
Without Bavaria    0.0461         0.00685      0.0126*       0.162**    0.373***      0.135**
                   (0.0291)       (0.00435)    (0.00686)     (0.0771)   (0.136)       (0.0592)

The coefficients barely change; two equations (1920s pogroms, NS vote 1928) lose their "stars" and two (DVFP vote, Stuermer letters) gain a star.
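The Gelman and Stern point can be checked directly against the numbers in the table above. The following is a rough sketch rather than a formal test: it treats the full-sample and no-Bavaria estimates as if they were independent, which is an assumption made here for illustration only (the two samples overlap, so the true standard error of the difference will differ). The function and variable names are ours and do not come from the replication files.

```python
import math

# (label, full-sample coefficient, its SE, no-Bavaria coefficient, its SE),
# copied from the table above.
ESTIMATES = [
    ("Pogrom 1920s", 0.0607, 0.0226, 0.0461, 0.0291),
    ("NSvote 28", 0.0142, 0.00567, 0.00685, 0.00435),
    ("DVFPvote 24", 0.0147, 0.0110, 0.0126, 0.00686),
    ("deptotal", 0.142, 0.0706, 0.162, 0.0771),
    ("stuermersum", 0.369, 0.144, 0.373, 0.136),
    ("synagogue damaged/destroyed", 0.124, 0.0522, 0.135, 0.0592),
]


def z_for_difference(b1, se1, b2, se2):
    """z-statistic for b1 - b2, ASSUMING the two estimates are independent."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


results = {}
for label, b_full, se_full, b_sub, se_sub in ESTIMATES:
    z = z_for_difference(b_full, se_full, b_sub, se_sub)
    results[label] = (z, two_sided_p(z))
    print(f"{label:<28} z = {z:+.2f}   p = {results[label][1]:.2f}")
```

Under this approximation, every |z| stays well below 1.96, so none of the six differences between the full-sample and no-Bavaria coefficients comes close to conventional significance.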

The stability of our results is also reflected in the raw data. Consider the basic table of pogrom frequencies, again for the entire sample and the no-Bavaria sample (e.g. in places without a 1349 pogrom, there was a 20.9% probability of no attack on a synagogue in 1938, etc.):

Full sample

Synagogue damaged or        Pogrom in 1349
destroyed (column %)        0          1          Total
0                           20.90      6.16       9.71
1                           79.10      93.84      90.29
Total                       100.00     100.00     100.00

Without Bavaria

Synagogue damaged or        Pogrom in 1349
destroyed (column %)        0          1          Total
0                           19.61      4.62       8.04
1                           80.39      95.38      91.96
Total                       100.00     100.00     100.00

In the sample without Bavaria, the probability of an attack on a synagogue during the Reichspogromnacht rises from 80% in places without a medieval pogrom to 95% in places with one; in the full sample, it rises from 79% to 94%. This is not a damning problem; it is a clear confirmation of our results.
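The comparison can be made explicit with a few lines of arithmetic. This sketch only restates the column percentages from the two tables above; the dictionary keys are labels chosen here for illustration.

```python
# Share of places (in %) whose synagogue was damaged or destroyed in 1938,
# by whether the place had a pogrom in 1349. Values copied from the tables above.
ATTACK_RATE = {
    "Full sample": {"no 1349 pogrom": 79.10, "1349 pogrom": 93.84},
    "Without Bavaria": {"no 1349 pogrom": 80.39, "1349 pogrom": 95.38},
}

# Gap in percentage points between places with and without a medieval pogrom.
gaps = {
    sample: round(rates["1349 pogrom"] - rates["no 1349 pogrom"], 2)
    for sample, rates in ATTACK_RATE.items()
}

for sample, gap in gaps.items():
    print(f"{sample}: {gap:.2f} percentage points")
```

The gap is roughly 15 percentage points in both samples, so dropping Bavaria leaves the raw pattern essentially unchanged.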

Allegedly Misspecified Deportation Regression in PP

GH claim that our deportation regression is misspecified because it includes the total number of deportees, the number of Jews, the percentage of Jews in the population, and the population total. However, controlling for the size of the population at risk of deportation is natural, and the inclusion of the percentage of Jews in the population is not absurd. Dropping the percentage of Jews from the original specification does not weaken our results. As the table below shows, the PP result actually becomes stronger, with a larger coefficient and a smaller standard error:


                     (1)            (2)
                     original       without %Jewish
Pogrom 1349          0.142**        0.164**
                     (0.0706)       (0.0687)
log (pop 1933)       0.241***       0.0943*
                     (0.0841)       (0.0534)
log (Jews 1933)      0.815***       0.958***
                     (0.0822)       (0.0545)
%Jews 1933           0.0743**
                     (0.0348)
%Protestants 1925    -0.00391***    -0.00367***
                     (0.00116)      (0.00114)

Again, the extensive p-hacking in GH's appendix, Tables A2.5 and A2.5, just serves to show how hard it is to 'kill' the positive coefficient on pogroms, giving further credence to the original results.

Matching results in PP

GH express their unease about the idea of matching, claiming that economists tend to be skeptical of it. However, matching is a very common technique in economics. A historical paper is not the place to air a pet peeve about a technique that serves as no more than a robustness check.

Robustness and Quantile Regressions in PP

GH make much of the quantile regressions they estimate, claiming they are less sensitive to outliers. However, quantile regressions are not superior in any generally accepted way. They are informationally less efficient, and there is no reason why the conditional median is more interesting than other percentiles. In our reply to the first version of this critique, we showed that i) in quantile regressions, the results in question are generally statistically significant for the first six deciles (that is, they are not driven by outliers with particularly high Nazi Party votes), and ii) estimating robust regressions that downweight outliers confirms our results.
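The efficiency point can be illustrated with a small simulation. This is a stylized sketch, not a re-analysis of the PP data: it compares the sample mean and the sample median (the intercept-only analogues of OLS and median regression) on simulated data, and it assumes normally distributed errors, the benchmark case in which the mean is the efficient estimator. With heavy-tailed errors the ranking can differ. All names and parameter values are chosen here for illustration.

```python
import random
import statistics


def sampling_variances(n_obs=101, n_reps=2000, seed=0):
    """Simulate the sampling variance of the mean and the median.

    Each replication draws n_obs standard-normal observations (an ASSUMED
    error distribution) and records both location estimates.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = sampling_variances()
efficiency_ratio = var_median / var_mean  # values above 1 mean the median is noisier
print(f"variance of mean:   {var_mean:.5f}")
print(f"variance of median: {var_median:.5f}")
print(f"ratio (median / mean): {efficiency_ratio:.2f}")
```

Under normal errors, the asymptotic value of this ratio is π/2, about 1.57: the median needs roughly 57% more observations to match the precision of the mean. Wider standard errors in a median regression are therefore expected and are not, by themselves, evidence against the original estimates.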

Critique of Bowling for Fascism (BF)

In their critique of BF, GH focus not on our main findings, but on an auxiliary result: the role of political instability in the relationship between social capital and Nazi support. This is a small part of our argument in BF, which is dedicated to showing the effect of social capital on Nazi Party entry. Our reply to GH disproves their claims and shows they are not the result of fair-minded data handling.

Historical Evidence

GH's grand statements about the importance of historical evidence are self-congratulatory and condescending, yet not backed up by knowledge of Germany's historical context. Our reply to the first iteration of this critique shows that GH completely misunderstand party politics in Bavaria, for example.

In general, better ideas, interpretations, and insights should replace poor, unfounded, or sloppily researched ideas. A useful way to go beyond our work would be to find a better interpretation of differences in anti-Semitism and Nazi support, and to offer clear evidence that it works and why it's superior. P-hacking and misreading the historical record will not advance knowledge.

Link to Previous Versions

Earlier versions of GH's comment and our replies can be found here: