Replication of Angrist (1990): Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records
Project by Pascal Heid, Summer 2020.
[1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from auxiliary.auxiliary_figures import get_figure1, get_figure2, get_figure3
from auxiliary.auxiliary_tables import (
    get_table1,
    get_table2,
    get_table3,
    get_table4,
)
from auxiliary.auxiliary_data import process_data
from auxiliary.auxiliary_visuals import background_negative_green, p_value_star
from auxiliary.auxiliary_extensions import (
    get_flexible_table4,
    get_figure1_extension1,
    get_figure2_extension1,
    get_bias,
    get_figure1_extension2,
    get_figure2_extension2,
)
import warnings
warnings.filterwarnings("ignore")
plt.rcParams["figure.figsize"] = [12, 6]
This notebook replicates the core results of the following paper:
Angrist, Joshua. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review. 80. 313-336.
A few notes on how to read the remainder:
- In this excerpt I replicate Figures 1 to 3 and Tables 1 to 4 (in somewhat extended form). I do not consider Table 5 a core result of the paper, which is why it is not part of this notebook.
- I follow Angrist's example and keep his structure throughout the replication part of this notebook.
- The naming and order of appearance of the figures follow the published correction, not the original paper.
- The replication material, including the partially processed data and some replication do-files, can be found here.
1. Introduction
For a gentle introduction to the topic, let us look at the goal of Angrist’s article. In the first few lines Angrist states a clear-cut motivation for his paper by remarking that “yet, academic research has not shown conclusively that Vietnam (or other) veterans are worse off economically than nonveterans”. He then elaborates on why research had been so inconclusive. He traces this back to the fact that previous research had tried to estimate the effect of veteran status on subsequent earnings solely by comparing earnings across individuals who differ in veteran status. He argues that this naive estimate is likely biased, since it is easy to imagine that men whose unobserved characteristics imply low civilian earnings are the ones who choose to enlist (self-selection on unobservables).
Angrist avoids this pitfall by employing an instrumental variable strategy to obtain unbiased estimates of the effect of veteran status on earnings. For that he exploits the random nature of the Vietnam draft lottery. This lottery randomly groups people into those that are eligible to be forced to join the army and those that are not. The idea is that this randomly affects the veteran status without being linked to any unobserved characteristics that cause earnings. This allows Angrist to obtain an estimate of the treatment effect that does not suffer from the same shortcomings as the ones of previous studies.
He finds that, in terms of long-term annual real earnings, Vietnam-era veterans are worse off than men who did not serve in the army. As a secondary point, he estimates a simple structural model and attributes this gap to the work experience that veterans lost during their service.
In the following sections I first walk you through the identification idea and empirical strategy. Secondly, I replicate and explain the core findings of the paper with a rather extensive elaboration on the different data sources used and some additional visualizations. Thirdly, I critically assess the paper followed by my own two extensions concluding with some overall remarks right after.
2. Identification and Empirical Approach
As mentioned above, the main goal of Angrist’s paper is to determine the causal effect of veteran status on subsequent earnings. He believes that conventional estimates, which only compare earnings by veteran status, are biased because of unobservables that affect both the probability of serving in the military and lifetime earnings. This is shown in the causal graph below. Angrist names two reasons why such bias is likely. First, men with few civilian opportunities (lower expected earnings) are probably more likely to enlist. Without a measure of civilian opportunities, a naive estimate of the effect of military service on earnings cannot capture the causal effect. Hence, he suspects self-selection into treatment on unobservables. Second, the selection criteria of the army might be correlated with unobserved individual characteristics that push future earnings in a particular direction.
In econometric terms, Angrist starts from the following linear regression equation, which represents a version of the right triangle in the causal graph:

\[
y_{cti} = \beta_c + \delta_t + s_i \alpha + u_{it} \tag{1}
\]
Here the real earnings \(y_{cti}\) of individual \(i\) in cohort \(c\) at time \(t\) are determined by cohort and time fixed effects (\(\beta_c\) and \(\delta_t\)) as well as by the effect \(\alpha\) of veteran status \(s_i\). Angrist argues that estimating this model by OLS is biased because, for the reasons given above, the veteran status indicator \(s_i\) is likely to be correlated with the error term \(u_{it}\).
To avoid this bias, Angrist employs an instrumental variable approach, which relies on the accuracy of the causal graph below.
The validity of this causal graph rests on the crucial assumption that there is no common cause of the instrument (Draft Lottery) and the unobserved variables (U). Angrist's main argument is that the draft lottery was essentially random and hence not correlated with any personal characteristics, and therefore not linked to any unobservables that might determine military service and earnings. As will be explained in more detail later, the Vietnam draft lottery determined randomly, on the basis of birth dates, whether a person was eligible to be drafted in the year following the lottery. The directed edge from Draft Lottery to Military Service is therefore warranted: a lottery number that renders a person draft-eligible increases the probability of joining the military compared to a person with an excluded lottery number.
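This argumentation leads Angrist to use the probability of being a veteran conditional on being draft-eligible in the lottery as an instrument for the effect of veteran status on earnings. In essence this is the Wald estimate, which is given by the following formula:

\[
\hat{\alpha} = \frac{E[y \mid eligible = 1] - E[y \mid eligible = 0]}{prob(veteran \mid eligible = 1) - prob(veteran \mid eligible = 0)}
\]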
The numerator is the difference in mean earnings by draft eligibility, that is, the coefficient obtained when draft eligibility takes the place of veteran status in equation (1). The denominator can be obtained from a first-stage regression of veteran status on draft eligibility, which reduces to estimating the difference in conditional probabilities of being a veteran, \(prob(veteran \mid eligible = 1) - prob(veteran \mid eligible = 0)\). Angrist obtains these estimates by weighted least squares (WLS), because he does not have micro data but only grouped data (for more details see the data section in the replication). To recover the estimates of the underlying micro-level data, OLS on the group means has to be weighted by the size of the respective groups. The above formula is also equivalent to a Two Stage Least Squares (2SLS) procedure in which earnings are regressed on the fitted values from a first-stage regression of veteran status on eligibility.
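The equivalence between the Wald ratio and 2SLS with a single binary instrument can be checked numerically. The following is a minimal sketch on simulated data, not on Angrist's data: all variable names and parameter values (sample size, a true effect of -2000, the strength of the confounder) are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical micro data (NOT from the paper):
# z = draft eligibility, u = unobserved confounder,
# s = veteran status, y = earnings with an assumed true effect of -2000.
z = rng.integers(0, 2, size=n)
u = rng.normal(size=n)
s = ((0.5 * z - 0.3 * u + rng.normal(size=n)) > 0.8).astype(float)
y = 10_000 - 2_000 * s + 1_500 * u + rng.normal(scale=500, size=n)

# Wald estimate: difference in mean earnings by eligibility,
# divided by the difference in the probability of being a veteran.
num = y[z == 1].mean() - y[z == 0].mean()
den = s[z == 1].mean() - s[z == 0].mean()
wald = num / den

# 2SLS by hand: first stage (s on z), then earnings on the fitted values.
X1 = np.column_stack([np.ones(n), z])
s_hat = X1 @ np.linalg.lstsq(X1, s, rcond=None)[0]
X2 = np.column_stack([np.ones(n), s_hat])
alpha_2sls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

# Naive OLS of earnings on veteran status, for comparison.
X = np.column_stack([np.ones(n), s])
alpha_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"Wald: {wald:.1f}, 2SLS: {alpha_2sls:.1f}, naive OLS: {alpha_ols:.1f}")
```

The Wald and 2SLS numbers agree up to floating point error, while the naive OLS coefficient also absorbs the influence of the simulated confounder `u`.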
In a last step, Angrist generalizes the Wald grouping method to more than one group as instrument. There are 365 lottery numbers, which were split into two groups (eligible and non-eligible) for the previous Wald estimate. Those lottery numbers can also be split into many more subgroups than just two, resulting in many more dummy variables as instruments. Angrist splits the lottery numbers into intervals of five, each of which determines a group \(j\). By cohort \(c\) he estimates for each group \(j\) the conditional probability of being a veteran \(p_{cj}\). This first stage is again run by WLS. The resulting estimate \(\hat p_{cj}\) is then used to conduct the second-stage regression below.

\[
\bar y_{ctj} = \beta_c + \delta_t + \hat p_{cj} \alpha + \bar u_{ctj}
\]
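Why weighting by group size recovers the micro-level estimate can be illustrated with a short sketch. It is a toy example under stated assumptions: the 73 groups, their sizes, the group-level probabilities `p_hat` and the earnings equation are all simulated and hypothetical, and cohort and time effects are collapsed into a single intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups = 73  # e.g. 73 lottery intervals of five numbers each

# Hypothetical cell sizes and group-level veteran probabilities.
sizes = rng.integers(50, 200, size=n_groups)
p_hat = rng.uniform(0.05, 0.35, size=n_groups)

# Simulated micro data: every individual inherits the p_hat of his group.
group = np.repeat(np.arange(n_groups), sizes)
y = 10_000 - 2_000 * p_hat[group] + rng.normal(scale=3_000, size=group.size)

# (a) OLS on the micro data.
X_micro = np.column_stack([np.ones(group.size), p_hat[group]])
beta_micro = np.linalg.lstsq(X_micro, y, rcond=None)[0]

# (b) WLS on group means, with group sizes as weights.
# Multiplying each row by sqrt(n_j) and running OLS is equivalent to WLS.
y_bar = np.bincount(group, weights=y) / sizes
w = np.sqrt(sizes)
X_group = np.column_stack([np.ones(n_groups), p_hat])
beta_wls = np.linalg.lstsq(X_group * w[:, None], y_bar * w, rcond=None)[0]

print(beta_micro, beta_wls)
```

Both coefficient vectors coincide because the regressor only varies across groups, so the normal equations of the micro regression reduce to size-weighted sums over group means. The standard errors, in contrast, are not identical in general.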
The details and estimation technique will be further explained when presenting the results in the replication section below.
3. Replication
3.1 Background and Data
The Vietnam Era Draft Lottery
Before discussing what the data look like, it is worthwhile to understand how the Vietnam-era draft lottery worked, in order to judge to what extent it can serve as a valid instrument. During the Vietnam war there were several draft lotteries, each determining the draft order for the following year. The first one took place at the end of 1969 and determined which men might be drafted in 1970. This procedure of determining the lottery numbers for the following year continued until 1975. The table below shows the years for which lotteries were drawn and the birth years affected in the respective year. For more details have a look here.
| Year | Cohorts | Draft-Eligibility Ceiling |
|---|---|---|
| 1970 | 1944-50 | 195 |
| 1971 | 1951 | 125 |
| 1972 | 1952 | 95 |
| 1973 | 1953 | 95 |
| 1974 | 1954 | 95 |
| 1975 | 1955 | 95 |
| 1976 | 1956 | 95 |
The authority to draft men for the army through the lottery expired on June 30, 1973, and nobody had been drafted for some time before that. The last draft call took place on December 7, 1972.
The seven lotteries all worked the same way: every possible birthday (365 days) was randomly assigned a number between 1 and 365 without replacement. In the 1969 lottery, for example, the birthdate that was assigned the number 1 meant that every man born on that day in the years 1944 to 1950 would be drafted first if there was a draft call in 1970. In practice, later in the year of the draft lottery, the army announced draft-eligibility ceilings that determined up to which lottery number men were called in the following year. For 1970, this means that every man with a lottery number up to 195 could be called to join the army. As nobody was called from 1973 on, the ceilings for those years are imputed from the last observed one, which was 95 in 1972. Men with lottery numbers up to the ceiling for their respective year are from here on called “draft-eligible”.
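The eligibility rule can be written down as a small lookup. This is only an illustrative sketch: the function name and the dictionary are hypothetical, the ceilings are taken from the table above for the cohorts used in the replication, and treating the ceiling number itself as eligible is an assumption about the boundary.

```python
# Draft-eligibility ceilings by birth cohort, taken from the table above.
# The 1953 value is the imputed one (nobody was drafted in 1973).
CEILINGS = {1950: 195, 1951: 125, 1952: 95, 1953: 95}


def is_draft_eligible(birth_year: int, lottery_number: int) -> bool:
    """Return True if the lottery number does not exceed the cohort's ceiling.

    Assumption: the ceiling itself counts as eligible.
    """
    if lottery_number < 1:
        raise ValueError("lottery numbers start at 1")
    return lottery_number <= CEILINGS[birth_year]


print(is_draft_eligible(1950, 150))  # a number below the 1950 ceiling of 195
print(is_draft_eligible(1952, 150))  # the same number, above the 1952 ceiling of 95
```

The same lottery number can thus make a man draft-eligible in one cohort but not in another, which is why eligibility always has to be defined within cohort.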
Being drafted did not mean that one actually had to serve in the army, though. Those drafted had to pass mental and physical tests, which in the end decided who had to join. It should also be mentioned that Angrist only uses data on men who turned 19 in the year they were at risk of induction, which covers men born between 1950 and 1953.
The Data
Continuous Work History Sample (CWHS)
This administrative data set constitutes a random one percent sample of all possible social security numbers in the US. For the years 1964 to 1984 it includes the FICA (social security) earnings history, censored at the Social Security maximum taxable amount. It further includes FICA taxable earnings from self-employment. From 1978 on it also has a series on total earnings (Total W-2), which includes for instance cash payments but excludes earnings from self-employment. The data set is subject to confidentiality restrictions, so only group averages and variances were available. This means that Angrist cannot rely on micro data but has to work with sample moments, which is a crucial factor for the exact implementation of the IV method. A group is defined by year of earnings, year of birth, ethnicity and five consecutive lottery numbers. The statistics collected for each group also include the number of people in the group, the fraction of them with taxable earnings at or above the taxable maximum, and the fraction with zero earnings.
Regarding the actual data sets available for replication, we have the data set `cwhsa`, which consists of the above data for the years 1964 to 1977, and `cwhsb`, which consists of the CWHS for the years after. In addition, Angrist provides the data set `cwhsc_new`, which includes the adjusted FICA earnings. For those, Angrist employed a strategy to approximate the underlying uncensored FICA earnings from the reported censored ones. All three earnings variables are used repeatedly throughout the replication.
[3]:
process_data("cwhsa")
[3]:
| ethnicity | birth year | year | lottery interval | earnings | earnings variance | sample size | fraction zero earnings |
|---|---|---|---|---|---|---|---|
| 1 | 44 | 64 | 1 | 1691.030029 | 1480.599976 | 182.0 | 0.170 |
| | | | 2 | 1535.430054 | 1359.020020 | 187.0 | 0.187 |
| | | | 3 | 1818.010010 | 1604.420044 | 210.0 | 0.171 |
| | | | 4 | 1636.380005 | 1626.270020 | 208.0 | 0.231 |
| | | | 5 | 1889.800049 | 1639.609985 | 207.0 | 0.184 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2 | 53 | 77 | 69 | 3643.739990 | 4273.600098 | 53.0 | 0.415 |
| | | | 70 | 4127.490234 | 5623.089844 | 55.0 | 0.473 |
| | | | 71 | 4712.459961 | 4588.279785 | 76.0 | 0.316 |
| | | | 72 | 4676.939941 | 5321.140137 | 85.0 | 0.353 |
| | | | 73 | 4651.870117 | 4989.020020 | 83.0 | 0.241 |

20440 rows × 4 columns
The above earnings data only consist of FICA earnings. The lottery intervals from 1 to 73 correspond to intervals of five consecutive lottery numbers. Consequently, the variable lottery interval equals one for the lottery numbers 1 to 5, two for the numbers 6 to 10, and so on. The ethnicity variable is encoded as 1 for a white person and 2 for a nonwhite person.
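The mapping from a lottery number to its interval is simple integer arithmetic. A minimal sketch, where the function name is hypothetical and the input range 1 to 365 follows the description above:

```python
def lottery_interval(lottery_number: int) -> int:
    """Map a lottery number (1..365) to its interval of five numbers (1..73)."""
    if not 1 <= lottery_number <= 365:
        raise ValueError("lottery number must lie between 1 and 365")
    return (lottery_number - 1) // 5 + 1


# Numbers 1-5 fall into interval 1, numbers 6-10 into interval 2, and so on.
for number in (1, 5, 6, 195, 365):
    print(number, lottery_interval(number))
```

Since the ceilings 195, 125 and 95 are all multiples of five, each interval lies entirely on one side of the draft-eligibility ceiling.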
[4]:
process_data("cwhsb")
[4]:
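| data source | ethnicity | birth year | year | lottery interval | earnings | earnings variance | sample size | fraction zero earnings |
|---|---|---|---|---|---|---|---|---|
| TAXAB | 1 | 44 | 78 | 1 | 10625.58 | 7052.47 | 179 | 0.179 |
| | | | | 2 | 11546.46 | 8032.55 | 182 | 0.198 |
| | | | | 3 | 11401.16 | 7508.27 | 209 | 0.196 |
| | | | | 4 | 10899.99 | 7342.60 | 206 | 0.189 |
| | | | | 5 | 11667.14 | 7507.56 | 207 | 0.159 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TOTAL | 2 | 53 | 84 | 69 | 6846.43 | 9117.49 | 53 | 0.396 |
| | | | | 70 | 11357.89 | 14734.47 | 55 | 0.455 |
| | | | | 71 | 8695.86 | 9613.24 | 76 | 0.368 |
| | | | | 72 | 14013.24 | 14182.30 | 84 | 0.274 |
| | | | | 73 | 10742.71 | 18095.78 | 83 | 0.506 |

20440 rows × 4 columns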
As stated above, this data set consists of earnings from 1978 to 1984 for FICA (here encoded as “TAXAB”) and Total W-2 (encoded as “TOTAL”).
Survey of Income and Program Participation (SIPP) and the Defense Manpower Data Center (DMDC)
Throughout the paper it is necessary to have a measure of the fraction of people serving in the military. For this purpose the above two data sources are used.
The SIPP is a longitudinal survey of around 20,000 households in the year 1984, which records whether the persons in the household are Vietnam war veterans. The survey also collected data on ethnicity and birth dates, which made it possible to match the data to lottery numbers. The DMDC, on the other hand, is an administrative record which shows the total number of new entries into the army by ethnicity, cohort and lottery number per year from mid-1970 until the end of 1973. These sources are needed for the results in Tables 3 and 4. A combination of the two is matched to the earnings data of the CWHS, which constitutes the data set `cwhsc_new` below.
[5]:
data_cwhsc_new = process_data("cwhsc_new")
data_cwhsc_new
[5]:
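| data source | ethnicity | birth year | year | lottery interval | earnings | probability of serving |
|---|---|---|---|---|---|---|
| ADJ | 1 | 50 | 74 | 1 | 8853.940430 | 0.352700 |
| | | | 75 | 1 | 9062.639648 | 0.352700 |
| | | | 76 | 1 | 10096.055664 | 0.352700 |
| | | | 77 | 1 | 10916.072266 | 0.352700 |
| | | | 78 | 1 | 11738.444336 | 0.352700 |
| ... | ... | ... | ... | ... | ... | ... |
| TOTAL | 2 | 53 | 84 | 37 | 10562.357422 | 0.111818 |
| | | | | 57 | 8988.295898 | 0.082410 |
| | | | | 40 | 9857.195312 | 0.111429 |
| | | | | 11 | 8690.839844 | 0.088025 |
| | | | | 23 | 9709.985352 | 0.073750 |

12818 rows × 2 columns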
This data set now also includes the adjusted FICA earnings which are marked by “ADJ” as well as the probability of serving in the military conditional on being in a group made up by ethnicity, birth cohort and lottery interval.
Below we take a short look at the distribution of the different earnings measures. The figure shows real earnings in 1978 dollars for the years 1974 to 1984 for FICA and adjusted FICA, and for the years 1978 to 1984 for Total W-2.
[6]:
# plot one kernel density estimate per earnings measure on the same axes
for data in ["ADJ", "TAXAB", "TOTAL"]:
    ax = sns.kdeplot(
        data_cwhsc_new.loc[data, "earnings"],
        # draw four random palette values to use as an RGBA color
        color=np.random.choice(np.array([sns.color_palette()]).flatten(), 4),
    )
ax.set_xlim(xmax=20000)
ax.legend(["Adjusted FICA", "FICA", "TOTAL W2"], loc="upper left")
ax.set_title("Kernel Density of the different Earning Measures")
[6]:
Text(0.5, 1.0, 'Kernel Density of the different Earning Measures')
For a more detailed description of the somewhat confusing original variable names in the data sets please refer to the appendix at the very bottom of the notebook.
3.2 Establishing the Validity of the Instrument
In order to convincingly pursue the identification strategy outlined above, it is necessary to establish an effect of draft eligibility (the draft lottery) on veteran status and to argue that draft eligibility is exogenous to any unobserved factor affecting both veteran status and subsequent earnings. As argued before, one could easily construct reasonable patterns of unobservables that cause both veteran status and earnings, which would render a naive regression of earnings on veteran status biased.
The first requirement for IV to be valid holds, as draft eligibility clearly has an effect on veteran status. The instrument is hence relevant. For the second part, Angrist argues that the draft lottery itself is random and hence not correlated with any unobserved characteristics a man might have that cause him to enroll in the army while at the same time pushing his earnings in a certain direction irrespective of veteran status (exogeneity).
On this basis, Angrist now shows that subsequent earnings are affected by draft eligibility. This is the foundation for finding a nonzero effect of veteran status on earnings. Going back to the causal diagram from before, Angrist has argued so far that there is no directed edge from Draft Lottery to the unobservables U but only to Military Service. Now he further establishes that there is an effect of draft eligibility (Draft Lottery) that propagates through Military Service onto earnings (Wages).
In order to see this clearly, let us have a look at Figure 1 of the paper below. It plots the history of average FICA earnings in 1978 dollars separately for white and nonwhite men. This is done by year within cohort, for those who were draft-eligible and those who were not. The highest two lines represent the 1950 cohort, going down to the cohort of men born in 1953. Among white men in the cohorts from 1950 to 1952 there is a clearly observable pattern of persistently lower earnings for the draft-eligible, starting in the year in which they could be drafted. This cannot be seen for those born in 1953, which is likely due to the fact that nobody was actually drafted in 1973, which would otherwise have been “their” year. For nonwhite men the picture is less clear. It seems that in the cohorts 1950 to 1952 earnings are slightly higher for the ineligible, but this does not seem to be persistent over time. The 1953 cohort again does not present a conclusive picture. What all lines show, though, is that before the year of conscription risk there is no difference in earnings between the two groups, which is due to the random nature of the draft lottery.
[7]:
# read in the original data sets
data_cwhsa = pd.read_stata("data/cwhsa.dta")
data_cwhsb = pd.read_stata("data/cwhsb.dta")
data_cwhsc_new = pd.read_stata("data/cwhsc_new.dta")
data_dmdc = pd.read_stata("data/dmdcdat.dta")
data_sipp = pd.read_stata("data/sipp2.dta")
[8]:
get_figure1(data_cwhsa, data_cwhsb)
A more condensed view of the results in Figure 1 is given in Figure 2. It depicts the differences in earnings between the red and the black line in Figure 1 by cohort and ethnicity. This is just included for completeness as it does not provide any further insight in comparison to Figure 1.
[9]:
get_figure2(data_cwhsa, data_cwhsb)
Table 1 continues this line of argument. Angrist makes the observations from the figures more fine-grained and explicit. In Table 1 he estimates the expected difference in average FICA and Total W-2 earnings by ethnicity within cohort and year of earnings. In the table below for white men we can observe that there is no significant difference at the five percent level for the years before the year in which they might be drafted. This changes for the cohorts from 1950 to 1952 in the years 1970 to 1972, respectively. There we can observe significantly lower income for the eligible in comparison to the ineligible. This seems to be persistent for the cohorts 1950 and 1952, while less so for those born in 1951 and 1953. It should further be noted that Angrist reports that the quality of the Total W-2 earnings data was low in the first years (it was launched in 1972), which explains the inconclusive estimates in the periods at the beginning.
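Because only cell-level statistics are available, each entry of such a table is a difference of size-weighted cell means. The sketch below shows the computation for a single cohort and year. The column names mirror the output of `process_data`, but the four cells and all their numbers are hypothetical, and counting the ceiling interval as eligible is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical cells for one cohort and one year (NOT real CWHS values).
cells = pd.DataFrame(
    {
        "lottery interval": [1, 39, 40, 73],
        "earnings": [3000.0, 3100.0, 3400.0, 3200.0],
        "sample size": [100, 300, 200, 200],
    }
)

ceiling = 195  # draft-eligibility ceiling of the 1950 cohort

# Interval k covers the lottery numbers 5k-4 to 5k, so a cell is eligible
# if its highest lottery number does not exceed the ceiling.
cells["eligible"] = cells["lottery interval"] * 5 <= ceiling


def weighted_mean(df: pd.DataFrame) -> float:
    """Mean earnings across cells, weighted by the number of men per cell."""
    return float(np.average(df["earnings"], weights=df["sample size"]))


mean_eligible = weighted_mean(cells[cells["eligible"]])
mean_ineligible = weighted_mean(cells[~cells["eligible"]])
difference = mean_eligible - mean_ineligible
print(mean_eligible, mean_ineligible, difference)
```

Weighting by `sample size` makes the result equal to the difference in means one would obtain from the underlying individual records.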
To focus the attention on the crucial points I mark all the negative estimates in different shades of green with more negative ones being darker. This clearly emphasizes the verbal arguments brought up before.
[10]:
table1 = get_table1(data_cwhsa, data_cwhsb)
table1["white"].style.applymap(background_negative_green)
[10]:
| year | Statistic | FICA 50 | FICA 51 | FICA 52 | FICA 53 | TOTAL 50 | TOTAL 51 | TOTAL 52 | TOTAL 53 |
|---|---|---|---|---|---|---|---|---|---|
| 66 | Average | 21.81 | | | | | | | |
| | Standard Error | 14.99 | | | | | | | |
| 67 | Average | 8.02 | 13.17 | | | | | | |
| | Standard Error | 18.21 | 16.45 | | | | | | |
| 68 | Average | 14.90 | 12.34 | 8.96 | | | | | |
| | Standard Error | 24.20 | 19.50 | 19.25 | | | | | |
| 69 | Average | 2.10 | 18.79 | 11.42 | 4.09 | | | | |
| | Standard Error | 34.58 | 26.47 | 22.78 | 18.34 | | | | |
| 70 | Average | 233.87 | 44.83 | 5.07 | 32.94 | | | | |
| | Standard Error | 39.72 | 36.70 | 29.38 | 24.20 | | | | |
| 71 | Average | 325.95 | 298.21 | 29.42 | 27.68 | | | | |
| | Standard Error | 46.63 | 41.78 | 40.26 | 30.35 | | | | |
| 72 | Average | 203.58 | 197.45 | 261.60 | 2.13 | | | | |
| | Standard Error | 55.42 | 51.18 | 46.89 | 42.92 | | | | |
| 73 | Average | 226.65 | 228.86 | 357.78 | 56.58 | | | | |
| | Standard Error | 67.84 | 61.64 | 56.26 | 54.81 | | | | |
| 74 | Average | 243.04 | 155.46 | 402.74 | 15.06 | | | | |
| | Standard Error | 81.45 | 75.33 | 68.38 | 68.15 | | | | |
| 75 | Average | 295.24 | 99.21 | 304.59 | 28.30 | | | | |
| | Standard Error | 94.42 | 89.79 | 85.01 | 79.63 | | | | |
| 76 | Average | 314.22 | 86.87 | 370.78 | 145.51 | | | | |
| | Standard Error | 106.62 | 102.94 | 98.30 | 93.08 | | | | |
| 77 | Average | 262.64 | 274.23 | 396.97 | 85.51 | | | | |
| | Standard Error | 117.91 | 112.26 | 111.18 | 107.14 | | | | |
| 78 | Average | 205.40 | 203.88 | 467.10 | 65.32 | 1059.40 | 233.27 | 175.36 | 1974.55 |
| | Standard Error | 132.71 | 127.04 | 127.30 | 123.19 | 2159.34 | 1609.44 | 1567.94 | 912.11 |
| 79 | Average | 263.61 | 60.53 | 236.90 | 89.28 | 1588.72 | 523.69 | 580.86 | 557.94 |
| | Standard Error | 160.59 | 152.39 | 153.92 | 148.70 | 1575.61 | 1590.54 | 736.75 | 750.14 |
| 80 | Average | 339.16 | 267.98 | 312.11 | 93.88 | 1028.12 | 85.63 | 581.32 | 428.73 |
| | Standard Error | 183.25 | 175.31 | 178.23 | 170.74 | 756.86 | 599.87 | 309.17 | 341.54 |
| 81 | Average | 435.83 | 358.32 | 342.89 | 34.39 | 589.67 | 71.61 | 440.53 | 109.54 |
| | Standard Error | 210.59 | 203.67 | 206.88 | 199.07 | 299.43 | 423.40 | 265.08 | 245.25 |
| 82 | Average | 320.20 | 117.31 | 235.12 | 29.49 | 305.54 | 72.76 | 514.71 | 18.72 |
| | Standard Error | 235.86 | 229.14 | 232.38 | 222.66 | 345.49 | 372.16 | 296.57 | 281.91 |
| 83 | Average | 349.58 | 314.06 | 437.74 | 96.37 | 512.94 | 896.55 | 915.71 | 30.16 |
| | Standard Error | 261.67 | 253.27 | 257.55 | 248.77 | 441.22 | 426.38 | 395.26 | 318.12 |
| 84 | Average | 484.39 | 398.46 | 436.06 | 228.68 | 1143.32 | 809.20 | 767.24 | 164.21 |
| | Standard Error | 286.83 | 279.26 | 281.93 | 272.26 | 492.27 | 380.96 | 376.06 | 366.10 |
For the nonwhite males there is no clear-cut pattern. Only a few cells show significant results, which is why Angrist focuses on white males in the following when constructing IV estimates. For completeness I present Table 1 for nonwhite males below, although it is somewhat less important for the remainder of the paper.
[11]:
table1["nonwhite"].style.applymap(background_negative_green)
[11]:
| year | Statistic | FICA 50 | FICA 51 | FICA 52 | FICA 53 | TOTAL 50 | TOTAL 51 | TOTAL 52 | TOTAL 53 |
|---|---|---|---|---|---|---|---|---|---|
| 66 | Average | 11.88 | | | | | | | |
| | Standard Error | 27.69 | | | | | | | |
| 67 | Average | 12.91 | 4.03 | | | | | | |
| | Standard Error | 34.23 | 30.66 | | | | | | |
| 68 | Average | 29.54 | 6.29 | 12.04 | | | | | |
| | Standard Error | 44.51 | 37.40 | 35.05 | | | | | |
| 69 | Average | 5.13 | 67.80 | 3.45 | 42.42 | | | | |
| | Standard Error | 66.85 | 53.41 | 43.42 | 36.49 | | | | |
| 70 | Average | 99.82 | 62.25 | 24.75 | 0.95 | | | | |
| | Standard Error | 78.60 | 75.74 | 62.27 | 45.00 | | | | |
| 71 | Average | 164.81 | 144.31 | 25.08 | 18.23 | | | | |
| | Standard Error | 92.75 | 86.50 | 85.19 | 60.79 | | | | |
| 72 | Average | 188.88 | 156.72 | 208.28 | 60.44 | | | | |
| | Standard Error | 113.61 | 105.73 | 104.28 | 92.83 | | | | |
| 73 | Average | 85.73 | 134.89 | 175.68 | 115.59 | | | | |
| | Standard Error | 137.79 | 127.08 | 129.09 | 119.48 | | | | |
| 74 | Average | 179.35 | 96.71 | 181.42 | 216.59 | | | | |
| | Standard Error | 165.09 | 160.13 | 155.65 | 145.20 | | | | |
| 75 | Average | 190.35 | 236.15 | 183.73 | 111.64 | | | | |
| | Standard Error | 189.32 | 186.81 | 185.88 | 166.95 | | | | |
| 76 | Average | 105.34 | 333.79 | 308.91 | 46.40 | | | | |
| | Standard Error | 214.71 | 215.41 | 216.54 | 199.36 | | | | |
| 77 | Average | 112.43 | 206.88 | 251.13 | 153.51 | | | | |
| | Standard Error | 238.50 | 240.49 | 248.54 | 233.51 | | | | |
| 78 | Average | 163.67 | 108.61 | 424.93 | 381.91 | 1145.07 | 2978.24 | 4676.25 | 482.80 |
| | Standard Error | 272.67 | 269.28 | 279.48 | 275.77 | 2395.62 | 2869.68 | 1393.13 | 2206.09 |
| 79 | Average | 187.04 | 210.31 | 391.71 | 312.04 | 4005.42 | 1545.07 | 494.79 | 1043.33 |
| | Standard Error | 317.21 | 323.08 | 324.83 | 326.33 | 2721.28 | 2191.15 | 2683.89 | 1660.24 |
| 80 | Average | 203.25 | 4.81 | 212.66 | 344.08 | 790.24 | 376.47 | 292.70 | 288.70 |
| | Standard Error | 363.10 | 368.41 | 372.53 | 370.32 | 648.17 | 533.69 | 441.00 | 416.50 |
| 81 | Average | 534.52 | 313.20 | 305.86 | 717.82 | 802.59 | 415.98 | 272.36 | 784.41 |
| | Standard Error | 413.58 | 419.19 | 429.11 | 433.73 | 524.63 | 745.17 | 492.87 | 503.15 |
| 82 | Average | 285.16 | 175.47 | 262.57 | 810.47 | 326.04 | 244.34 | 160.22 | 675.16 |
| | Standard Error | 461.29 | 471.65 | 476.75 | 486.30 | 608.97 | 647.84 | 590.01 | 564.10 |
| 83 | Average | 96.07 | 419.56 | 177.34 | 543.64 | 315.48 | 254.33 | 53.64 | 462.35 |
| | Standard Error | 512.62 | 538.17 | 531.51 | 523.26 | 720.00 | 767.60 | 643.49 | 638.97 |
| 84 | Average | 76.87 | 223.19 | 123.40 | 641.35 | 287.44 | 718.61 | 288.10 | 827.40 |
| | Standard Error | 548.22 | 562.88 | 568.60 | 568.20 | 804.10 | 771.59 | 721.01 | 716.81 |
3.3 Measuring the Effect of Military Service on Earnings
3.3.1 Wald Estimates
As discussed in the identification section, a simple OLS regression estimating the model in equation (1) might suffer from bias due to elements of \(s_i\) that are correlated with the error term \(u_{it}\). This problem can to a certain extent be circumvented by the grouping method proposed by Abraham Wald (1940). Grouping the data by the instrument, which is draft-eligibility status, makes it possible to uncover the effect of veteran status on earnings. An unbiased estimate of \(\alpha\) can therefore be found by scaling the difference in mean earnings across eligibility status by the difference in the probability of becoming a veteran conditional on being draft-eligible or not. This verbal explanation translates into the following formula, where \(\hat p^e\) and \(\hat p^n\) denote the estimated probabilities of being a veteran for the draft-eligible and the ineligible:

\[
\hat{\alpha} = \frac{\bar y^e - \bar y^n}{\hat p^e - \hat p^n} \tag{3}
\]
The variable \(\bar y\) captures the mean earnings within a certain cohort and year, with the superscript \(e\) or \(n\) indicating draft-eligibility status. The above formula poses the problem that the conditional probabilities of being a veteran cannot be obtained from the CWHS data set alone. Therefore, in Table 2 Angrist attempts to estimate them from two other sources. The first is the SIPP, which has the problem of being quite a small sample. For the second, he matches the CWHS data to the DMDC. Here it is problematic, though, that the number of people entering the army in 1970 (which is the year when those born in 1950 were drafted) is only recorded for the second half of the year. This is why Angrist has to go with the SIPP estimates for the 1950 cohort, while taking the bigger sample of the matched DMDC/CWHS for the birth years 1951 to 1953. The crucial estimates needed for the denominator of equation (3) are presented in the last column of Table 2 below. It can already be seen that the differences in earnings by eligibility that we found in Table 1 will be scaled up quite a bit to obtain the estimates for \(\hat{\alpha}\). We will come back to that in Table 3.
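How strongly the earnings differences are scaled up can be seen with a quick calculation. The two probabilities below are the SIPP values for white men born in 1950 from Table 2. The earnings gap of -300 dollars is a purely hypothetical number chosen for illustration, not a value from the paper.

```python
# SIPP (84), white men born in 1950, taken from Table 2.
p_eligible = 0.3527    # P(Veteran | eligible)
p_ineligible = 0.1934  # P(Veteran | ineligible)

first_stage = p_eligible - p_ineligible  # denominator of the Wald estimate
scale = 1 / first_stage                  # factor applied to the earnings gap

# Hypothetical difference in mean earnings (eligible minus ineligible).
earnings_gap = -300.0
wald_estimate = earnings_gap / first_stage

print(f"first stage: {first_stage:.4f}")
print(f"scaling factor: {scale:.2f}")
print(f"Wald estimate: {wald_estimate:.1f}")
```

With a first stage of roughly 0.16, any difference in mean earnings by eligibility is multiplied by a factor of about six, and its standard error is inflated in the same way.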
Note: The 1950 cohort for the DMDC/CWHS could not be replicated, as the data for this cohort from the DMDC set are missing in the replication data. In addition, the standard errors for the estimates coming from the SIPP differ slightly from the published results but are equal to the results from the replication code.
[12]:
table2 = get_table2(data_cwhsa, data_dmdc, data_sipp)
table2["white"]
[12]:
| Data Set | Cohort | Statistic | Sample | P(Veteran) | P(Veteran \| eligible) | P(Veteran \| ineligible) | P(V \| eligible) - P(V \| ineligible) |
|---|---|---|---|---|---|---|---|
| SIPP (84) | 1950 | Value | 351.0 | 0.2673 | 0.3527 | 0.1934 | 0.1594 |
| | | Standard Error | | 0.0136 | 0.0215 | 0.0166 | 0.0272 |
| | 1951 | Value | 359.0 | 0.1973 | 0.2831 | 0.1469 | 0.1362 |
| | | Standard Error | | 0.0124 | 0.0230 | 0.0139 | 0.0269 |
| | 1952 | Value | 336.0 | 0.1554 | 0.2310 | 0.1257 | 0.1053 |
| | | Standard Error | | 0.0111 | 0.0245 | 0.0119 | 0.0273 |
| | 1953 | Value | 390.0 | 0.1298 | 0.2192 | 0.1126 | 0.1066 |
| | | Standard Error | | 0.0102 | 0.0313 | 0.0104 | 0.0330 |
| DMDC/CWHS | 1951 | Value | 16768.0 | 0.1176 | 0.2071 | 0.0708 | 0.1362 |
| | | Standard Error | | 0.0025 | 0.0053 | 0.0024 | 0.0059 |
| | 1952 | Value | 17703.0 | 0.1515 | 0.2683 | 0.1102 | 0.1581 |
| | | Standard Error | | 0.0027 | 0.0065 | 0.0027 | 0.0071 |
| | 1953 | Value | 17749.0 | 0.1343 | 0.1548 | 0.1268 | 0.0280 |
| | | Standard Error | | 0.0026 | 0.0053 | 0.0029 | 0.0060 |
[13]:
table2["nonwhite"]
[13]:
| Data Set | Cohort | Statistic | Sample | P(Veteran) | P(Veteran \| eligible) | P(Veteran \| ineligible) | P(V \| eligible) - P(V \| ineligible) |
|---|---|---|---|---|---|---|---|
| SIPP (84) | 1950 | Value | 70.0 | 0.1625 | 0.1957 | 0.1355 | 0.0603 |
| | | Standard Error | | 0.0281 | 0.0449 | 0.0353 | 0.0571 |
| | 1951 | Value | 63.0 | 0.1703 | 0.2014 | 0.1514 | 0.0500 |
| | | Standard Error | | 0.0283 | 0.0497 | 0.0340 | 0.0603 |
| | 1952 | Value | 52.0 | 0.1332 | 0.1449 | 0.1288 | 0.0161 |
| | | Standard Error | | 0.0265 | 0.0525 | 0.0308 | 0.0609 |
| | 1953 | Value | 55.0 | 0.1749 | 0.2247 | 0.1642 | 0.0605 |
| | | Standard Error | | 0.0297 | 0.0762 | 0.0321 | 0.0827 |
| DMDC/CWHS | 1951 | Value | 5258.0 | 0.0794 | 0.1173 | 0.0599 | 0.0574 |
| | | Standard Error | | 0.0037 | 0.0076 | 0.0040 | 0.0086 |
| | 1952 | Value | 5493.0 | 0.0953 | 0.1439 | 0.0794 | 0.0644 |
| | | Standard Error | | 0.0040 | 0.0095 | 0.0042 | 0.0104 |
| | 1953 | Value | 5303.0 | 0.0925 | 0.0984 | 0.0904 | 0.0080 |
| | | Standard Error | | 0.0040 | 0.0079 | 0.0046 | 0.0092 |
Angrist next brings together the insights gained so far. Table 3 again presents differences in mean earnings across eligibility status for different earnings measures, within cohort and year. The values in columns 1 and 3 are taken directly from Table 1. Column 2 introduces the adjusted FICA measure for the first time. As a reminder, it consists of scaled-up FICA earnings, because FICA earnings are only reported up to a certain maximum amount. True average earnings are likely to be higher, and Angrist transformed the data to account for this. The difference in mean earnings for this measure mostly lies between that for pure FICA earnings and that for Total W2 compensation. Column 4 repeats the probability difference from the last column of Table 2. As mentioned before, it is taken from the SIPP sample for the 1950 cohort and from the DMDC/CWHS sample for the other cohorts. Angrist excludes the 1953 cohort and nonwhite males because draft eligibility does not seem to be a relevant instrument for them (see Table 1 and Figures 1 and 2). Although Angrist does not, I also present Table 3 for nonwhites in this replication to give the reader a broader picture. Furthermore, Angrist restricts his derivations to the years 1981 to 1984, the latest years after the Vietnam war for which data were available. Effects in those years are most likely to represent long-term effects.
Let us now turn to the most important column of Table 3, the last one. It contains the Wald estimate from equation (3) of the effect of veteran status on adjusted FICA earnings, in 1978 dollars, by year and cohort. This is our \(\hat{\alpha}\) per year and cohort. For white males the point estimates indicate that the annual loss in real earnings due to serving in the military was around 2000 dollars. Given the large standard errors, though, only a few of the estimates are statistically significant. To make this easier to see, I added a star to those values in the last column that are statistically significant at the five percent level.
Note: In the last column I obtain slightly different standard errors than in the paper. The same is true, though, of the replication code on which my replication builds.
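To make the scaling step behind the Wald estimate concrete, here is a minimal sketch. The input numbers are hypothetical and only loosely inspired by the magnitudes in Table 3; the function `wald_estimate` is not part of the `auxiliary` package. For simplicity the standard error treats the probability difference in the denominator as fixed, which is an assumption of this sketch and not necessarily what the replication code does.

```python
def wald_estimate(earnings_diff, se_earnings_diff, prob_diff, deflator=1.0):
    """Wald estimate: earnings difference scaled by the first-stage difference.

    deflator converts current dollars into base-year dollars (1.0 = no change).
    The standard error ignores sampling error in prob_diff (simplification).
    """
    alpha_hat = deflator * earnings_diff / prob_diff
    se_alpha = deflator * se_earnings_diff / abs(prob_diff)
    return alpha_hat, se_alpha


# Hypothetical inputs: eligibles earn 500 dollars less on average and are
# 16 percentage points more likely to be veterans.
alpha_hat, se_alpha = wald_estimate(earnings_diff=-500.0, se_earnings_diff=240.0, prob_diff=0.16)
t_stat = alpha_hat / se_alpha
print(f"alpha_hat = {alpha_hat:.1f}, se = {se_alpha:.1f}, t = {t_stat:.2f}")
```

A small earnings gap between eligibles and ineligibles turns into a much larger estimated service effect because it is divided by a probability difference well below one.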
[14]:
table3 = get_table3(data_cwhsa, data_cwhsb, data_dmdc, data_sipp, data_cwhsc_new)
p_value_star(table3["white"], slice(None), ("", "Service Effect in 1978 $"))
[14]:
First Level  Draft Eligibility Effects in Current $  

Second Level  FICA Earnings  Adjusted FICA Earnings  Total W2 Earnings  P(V|eligible) - P(V|ineligible)  Service Effect in 1978 $
Cohort  Year  Statistic  
1950  1981  Value  435.8  487.8  589.7  0.159  2195.3* 
Standard Error  210.6  237.6  299.4  0.027  1069.5  
1982  Value  320.2  396.1  305.5  1679.0  
Standard Error  235.9  281.7  345.5  1194.1  
1983  Value  349.6  450.1  512.9  1849.3  
Standard Error  261.7  302.0  441.2  1240.7  
1984  Value  484.4  638.8  1143.3  2517.1  
Standard Error  286.8  336.6  492.3  1326.3  
1951  1981  Value  358.3  428.8  71.6  0.136  2258.3* 
Standard Error  203.7  216.7  423.4  0.027  1141.2  
1982  Value  117.3  278.6  72.8  1382.1  
Standard Error  229.1  251.5  372.2  1247.5  
1983  Value  314.1  452.2  896.6  2174.4  
Standard Error  253.3  277.7  426.4  1335.3  
1984  Value  398.5  573.4  809.2  2644.3  
Standard Error  279.3  308.0  381.0  1420.3  
1952  1981  Value  342.9  392.7  440.5  0.105  2675.1 
Standard Error  206.9  220.3  265.1  0.027  1500.6  
1982  Value  235.1  255.3  514.7  1638.2  
Standard Error  232.4  254.0  296.6  1630.1  
1983  Value  437.7  500.1  915.7  3110.0  
Standard Error  257.6  283.3  395.3  1761.9  
1984  Value  436.1  560.1  767.2  3340.9  
Standard Error  281.9  310.8  376.1  1853.8 
Looking at nonwhite males now, we observe what we already expected. All of the Wald estimates are actually far away from being statistically significant.
[15]:
p_value_star(table3["nonwhite"], slice(None), ("", "Service Effect in 1978 $"))
[15]:
First Level  Draft Eligibility Effects in Current $  

Second Level  FICA Earnings  Adjusted FICA Earnings  Total W2 Earnings  P(V|eligible) - P(V|ineligible)  Service Effect in 1978 $
Cohort  Year  Statistic  
1950  1981  Value  534.5  654.0  802.6  0.06  7780.5 
Standard Error  413.6  495.2  524.6  0.057  5891.3  
1982  Value  285.2  335.4  326.0  3758.5  
Standard Error  461.3  529.8  609.0  5937.0  
1983  Value  96.1  169.1  315.5  1836.3  
Standard Error  512.6  551.6  720.0  5990.4  
1984  Value  76.9  65.1  287.4  677.8  
Standard Error  548.2  601.9  804.1  6269.8  
1951  1981  Value  313.2  401.5  416.0  0.05  5760.5 
Standard Error  419.2  446.6  745.2  0.06  6407.4  
1982  Value  175.5  228.1  244.3  3081.9  
Standard Error  471.6  524.4  647.8  7087.0  
1983  Value  419.6  398.9  254.3  5224.8  
Standard Error  538.2  558.8  767.6  7318.6  
1984  Value  223.2  293.5  718.6  3687.0  
Standard Error  562.9  598.1  771.6  7513.4  
1952  1981  Value  305.9  316.5  272.4  0.016  14104.0 
Standard Error  429.1  454.8  492.9  0.061  20262.8  
1982  Value  262.6  502.6  160.2  21092.7  
Standard Error  476.8  524.1  590.0  21993.9  
1983  Value  177.3  275.9  53.6  11221.1  
Standard Error  531.5  546.6  643.5  22235.2  
1984  Value  123.4  99.8  288.1  3892.0  
Standard Error  568.6  600.3  721.0  23420.2 
3.3.2 More complex IV estimates
Angrist next uses a more general version of the Wald estimator. While the previous analysis compared mean earnings across only two groups (eligibles and ineligibles, as determined by the lottery numbers), the following analysis extends this to finer subgroups. The grouping is now based on intervals of five consecutive lottery numbers. As explained in the section on identification, this boils down to estimating the model described in equation (2):
\[\bar y_{ctj} = \beta_c + \delta_t + \hat p_{cj} \alpha + \bar u_{ctj}\]
Here \(\bar y_{ctj}\) denotes mean earnings of cohort \(c\) in year \(t\) for group \(j\), \(\hat p_{cj}\) is the estimated probability of being a veteran conditional on being in cohort \(c\) and group \(j\), \(\beta_c\) and \(\delta_t\) are cohort and year effects, and \(\bar u_{ctj}\) is the error term. We are interested in an estimate of \(\alpha\). In this setup \(\alpha\) corresponds to a linear combination of the many possible Wald estimates obtained by comparing the subgroups in pairs. With this in mind, Angrist restricts the treatment effect to be the same (i.e. equal to \(\alpha\)) for each comparison of subgroups. The equation above is equivalent to the second stage of a 2SLS estimation. Angrist estimates the model using mean real earnings averaged over the years 1981 to 1984 and the cohorts 1950 to 1953. In the first stage Angrist has to estimate \(\hat p_{cj}\), which is again done by combining the SIPP sample and the matched DMDC/CWHS data set. With this at hand, Angrist shows what equation (2) looks like when estimated by OLS. The resulting Figure 3 is also called Visual Instrumental Variables (VIV). To arrive there, he takes the residuals from OLS regressions of \(\bar y_{ctj}\) and \(\hat p_{cj}\), respectively, on cohort and time dummies. He then runs another OLS regression of the earnings residuals on the probability residuals. This is depicted in Figure 3 below. The slope of the regression line corresponds to an IV estimate of \(\alpha\). It amounts to 2384 dollars, which serves as a reference for the treatment effect measured by another, more efficient method described below the figure.
[16]:
get_figure3(data_cwhsc_new)
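The residual-on-residual regression behind the VIV figure can be sketched on synthetic data. Everything below is hypothetical (cell structure, effect size, noise level) and does not use the CWHS data or `get_figure3`; the point is only that regressing earnings residuals on probability residuals gives the same slope as including the cohort and year dummies directly in the regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grouped data: 4 cohorts x 4 years x 10 lottery-number groups.
cohort, year, group = (
    a.ravel() for a in np.meshgrid(np.arange(4), np.arange(4), np.arange(10), indexing="ij")
)
n_cells = cohort.size

# p_hat varies by cohort and group only, mirroring the subscripts of p_cj.
p_table = rng.uniform(0.05, 0.35, size=(4, 10))
p_hat = p_table[cohort, group]

true_alpha = -2000.0  # assumed effect used to generate the data
y_bar = 10_000 + 100 * cohort + 300 * year + true_alpha * p_hat + rng.normal(0, 50, n_cells)

# Intercept plus cohort and year dummies (first category of each dropped).
dummies = np.column_stack(
    [np.ones(n_cells)]
    + [(cohort == c).astype(float) for c in range(1, 4)]
    + [(year == t).astype(float) for t in range(1, 4)]
)


def residualize(values, controls):
    """Return residuals from an OLS regression of values on controls."""
    coef, *_ = np.linalg.lstsq(controls, values, rcond=None)
    return values - controls @ coef


y_res = residualize(y_bar, dummies)
p_res = residualize(p_hat, dummies)

# Slope of the residual-on-residual regression; no intercept is needed
# because both residual vectors have mean zero by construction.
viv_slope = (p_res @ y_res) / (p_res @ p_res)

# Same coefficient from the full regression of y_bar on dummies and p_hat.
full_coef, *_ = np.linalg.lstsq(np.column_stack([dummies, p_hat]), y_bar, rcond=None)

print(f"VIV slope: {viv_slope:.1f}, full regression: {full_coef[-1]:.1f}")
```

Plotting `y_res` against `p_res` with the fitted line would give a picture of the same kind as Figure 3.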
We now briefly return to an earlier remark. Angrist is forced to work with sample means only, owing to confidentiality restrictions on the underlying micro data. For the Wald estimates it is easy to see that this poses no problem. For the estimation of \(\alpha\) by 2SLS above this is less obvious. Angrist argues, though, that the 2SLS approach has a Generalized Method of Moments (GMM) interpretation, which allows him to work with sample moments alone. Another important implication is that he is not restricted to a single sample for obtaining the sample moments. In our case it is therefore unproblematic that the earnings data come from a different sample than the conditional probabilities of being a veteran, since both samples are drawn from the same population. This is a feature of GMM.
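The two-sample idea can be illustrated with a small simulation. The population below is entirely made up: an unobserved trait `u` drives both service and earnings, so a naive comparison of veterans and nonveterans is biased. Group mean earnings come from one sample and group veteran shares from another, independent sample of the same population; regressing the former on the latter should still come close to the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(1)

true_alpha = -2.0                          # assumed effect of service on earnings
n_groups = 10                              # stand-in for lottery-number intervals
p_group = np.linspace(0.1, 0.6, n_groups)  # assumed P(veteran | group)


def draw_sample(n):
    """Draw one sample from the hypothetical population."""
    group = rng.integers(0, n_groups, size=n)
    u = rng.uniform(size=n)                      # unobserved trait
    served = (u < p_group[group]).astype(float)  # men with low u are the ones who serve
    earnings = true_alpha * served + 3.0 * (u - 0.5) + rng.normal(size=n)
    return group, served, earnings


def group_means(group, values):
    """Mean of values within each group."""
    counts = np.bincount(group, minlength=n_groups)
    return np.bincount(group, weights=values, minlength=n_groups) / counts


# Sample A supplies only earnings means, sample B only veteran shares.
group_a, served_a, earnings_a = draw_sample(200_000)
group_b, served_b, _ = draw_sample(200_000)
y_bar = group_means(group_a, earnings_a)
p_hat = group_means(group_b, served_b)

# Regress group mean earnings on group veteran shares.
two_sample_alpha = np.polyfit(p_hat, y_bar, 1)[0]

# Naive comparison of veterans and nonveterans (needs micro data, shown for contrast).
naive = earnings_a[served_a == 1].mean() - earnings_a[served_a == 0].mean()

print(f"two-sample estimate: {two_sample_alpha:.2f}, naive difference: {naive:.2f}")
```

In this setup the naive difference overstates the loss because men with low `u` both serve more often and earn less, while the grouped two-sample estimate is not affected by that selection.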
Angrist then estimates equation (2) by Generalized Least Squares (GLS) rather than OLS. GLS is more efficient than OLS if the residuals of a regression model are correlated. Angrist argues that this is the case for the model above and that the correlation can be estimated. GLS starts from the estimated covariance matrix \(\hat\Omega\) of the residuals: the regressors and the dependent variable are transformed using the upper triangle of the Cholesky decomposition of \(\hat\Omega^{-1}\). The transformed variables are then used in a regular OLS regression with nonrobust standard errors. The resulting estimate \(\hat\alpha\) is then the most efficient one (provided the residuals are indeed correlated).
Angrist states that the optimal weighting matrix \(\Omega\), which yields the most efficient estimate \(\hat\alpha\), takes the following form:
\[\Omega = V(\bar y_{ctj}) + \alpha^2 V(\hat p_{cj})\]
All three elements on the right-hand side can be estimated from the data at hand.
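The GLS transformation described above can be sketched with NumPy. The covariance matrix below is a made-up AR(1)-type matrix, not Angrist's \(\hat\Omega\); the sketch only shows that OLS on data transformed with the upper Cholesky factor of \(\Omega^{-1}\) reproduces the textbook GLS formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60

# Hypothetical design: intercept and one regressor standing in for p_hat.
X = np.column_stack([np.ones(n), rng.uniform(0.05, 0.35, n)])
beta_true = np.array([1.0, -2.0])

# Made-up covariance matrix with correlation decaying in |i - j|.
rho = 0.6
idx = np.arange(n)
omega = rho ** np.abs(np.subtract.outer(idx, idx))

# Draw correlated errors with covariance omega and build the outcome.
errors = np.linalg.cholesky(omega) @ rng.normal(size=n)
y = X @ beta_true + errors

# Omega^{-1} = L L' with L lower triangular, so U = L' is the upper factor.
omega_inv = np.linalg.inv(omega)
U = np.linalg.cholesky(omega_inv).T

# Transform the data and run plain OLS on the transformed variables.
X_star, y_star = U @ X, U @ y
beta_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Textbook GLS formula for comparison: (X' O^-1 X)^-1 X' O^-1 y.
beta_direct = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

print(beta_transformed, beta_direct)
```

The equivalence holds because \((UX)'(UX) = X'\Omega^{-1}X\), so the transformed OLS normal equations are exactly the GLS normal equations.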
We now have all the ingredients to look at the results in Table 4. In practice, Angrist estimates two models in this manner, both based on the general form of the regression equation above. Model 1 allows the treatment effect to vary by cohort, while Model 2 collapses the cohort effects into a scalar estimate of \(\alpha\). The results for white men in Model 1 show that, for each of the three earnings measures as dependent variable, only a few estimates are statistically significant at the five percent level (again indicated by a star that I added). Model 2 reveals, though, that the combined treatment effect is significant and amounts to a loss of about 2000 dollars annually (again in real 1978 dollars) for those who served in the army. For the 1953 cohort we obtain insignificant estimates, which was to be expected given that nobody from that cohort was actually drafted.
Note: The results again differ slightly from those in the paper. The same is true, though, of the replication code on which my replication builds.
[17]:
table4 = get_table4(data_cwhsc_new)
p_value_star(
table4["white"], (slice(None), slice(None), ["Value", "Standard Error"]), (slice(None)),
)
[17]:
FICA Taxable Earnings  Adjusted FICA Earnings  Total W2 Compensation  

Model  Cohort  Statistic  
Model 1  1950  Value  1709.2  2093.7  1895.0 
Standard Error  946.8  1109.2  1336.9  
1951  Value  1457.1  1983.7  2431.4  
Standard Error  954.7  1036.5  1155.4  
1952  Value  1724.0  1943.0*  2058.7*  
Standard Error  863.3  927.5  1004.8  
1953  Value  1223.8  900.7  488.6  
Standard Error  3232.5*  3506.6*  3947.4*  
Chi Squared  578.3  630.3  569.5  
Model 2  195053  Value  1562.9  1920.4  2094.5 
Standard Error  521.7  576.8  649.1  
Chi Squared  579.1  631.0  569.7 
Angrist also reports these estimates for nonwhite men; they are not significant. This was to be expected, as the instrument was not clearly correlated with the endogenous variable, veteran status.
[18]:
p_value_star(
table4["nonwhite"], (slice(None), slice(None), ["Value", "Standard Error"]), (slice(None)),
)
[18]:
FICA Taxable Earnings  Adjusted FICA Earnings  Total W2 Compensation  

Model  Cohort  Statistic  
Model 1  1950  Value  3893.7*  3871.9*  5711.8* 
Standard Error  5355.1  6246.9  7206.5  
1951  Value  891.3  333.4  2609.0  
Standard Error  4399.6  4667.1  4887.1  
1952  Value  3182.9  3457.7  3068.0  
Standard Error  3994.9  4194.9  4222.7  
1953  Value  5928.3  8571.5  6325.8  
Standard Error  10302.3*  10652.6*  11393.0*  
Chi Squared  616.7  681.7  693.6  
Model 2  195053  Value  643.3  999.7  367.8 
Standard Error  2406.8  2602.6  2733.8  
Chi Squared  618.4  683.4  695.6 
This table concludes the replication of the core results of the paper. To sum up, Angrist sets out a causal graph and employs a plausible estimation strategy for it. His main result is a negative effect of military service during the Vietnam era on the subsequent earnings of white males in the United States.
Angrist provides some interpretation of the estimated effect and addresses some concerns that might arise when reading his paper. I discuss some of his points in the following critical assessment.
4. Critical Assessment
Considering the state of research at the time, the paper was a major contribution to the instrumental variable estimation of treatment effects. More broadly, the paper is convincing and well written. Angrist discusses caveats quite thoroughly, which at first glance makes the whole argument very coherent. Methodologically, the paper is quite complex owing to the kind of data available. Angrist is innovative in this regard, as he introduces the two-sample IV method in this paper, which allows him to put his identification strategy into practice. The attempt to explain the mechanisms behind the negative treatment effect makes the paper comprehensive and shows the attention to detail Angrist put into it.
While keeping these strengths in mind, in hindsight Angrist is a bit too confident about the relevance and accuracy of his findings. Given what we learned in our lecture about the local average treatment effect (LATE), Angrist only identifies the average treatment effect for the compliers (those who join the army if they are draft-eligible but do not if they are not) if there is individual-level treatment heterogeneity and if the causal graph from before is accurate. Hence, the results have only limited policy implications. For the discussion of veteran compensation, the group of those who were induced by the lottery to join the military is not the crucial one. As there is no draft lottery anymore, what we are interested in is how to compensate veterans who “voluntarily” decided to serve in the military. This question cannot be answered by Angrist’s approach under the realistic assumption of treatment effect heterogeneity (which Angrist himself argues might be warranted).
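A small simulation may help to see why the Wald estimate speaks only for compliers when effects are heterogeneous. All numbers below (type shares, effect sizes, earnings levels) are invented for illustration; the population contains never-takers, compliers and always-takers but no defiers.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# 0 = never-taker, 1 = complier, 2 = always-taker (no defiers by construction)
types = rng.choice(3, size=n, p=[0.70, 0.15, 0.15])
z = rng.integers(0, 2, size=n)  # randomly assigned draft eligibility

# Service decision: always-takers serve, compliers follow z, never-takers do not.
d = np.where(types == 2, 1, np.where(types == 1, z, 0))

# Hypothetical individual effects of service on earnings, by type.
effect = np.array([-4000.0, -2000.0, -500.0])[types]

# Baseline earnings differ for always-takers (selection into service).
y0 = 12_000 - 1_000 * (types == 2) + rng.normal(0, 2_000, size=n)
y = y0 + effect * d

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = effect.mean()          # average effect in the whole population
att = effect[d == 1].mean()  # average effect among those who actually served

print(f"Wald: {wald:.0f}, complier effect: -2000, ATE: {ate:.0f}, ATT: {att:.0f}")
```

In this made-up population the Wald estimate lands near the complier effect, while the average effect in the population and the average effect among those who served are quite different numbers.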
A related difficulty of interpretation arises because in the second part, Angrist uses an overidentified model. As already discussed before this amounts to a linear combination of the average treatment effects of subgroups. This mixes the LATEs of several subgroups making the policy implications even more blurred as it is not clear what the individual contributions of the different subgroups are. In this example here this might not make a big difference but should be kept in mind when using entirely different instrumental variables to identify the LATE.
Finally, there are several scenarios in which the given causal graph might be violated. Angrist himself provides one of them. Some time passed between the drawing of the lottery numbers and the announcement of the draft-eligibility ceiling. This provoked behavioral responses: some individuals with low numbers volunteered for the army in order to get better terms of service, while others enrolled in university, which rendered them ineligible for the draft. In our data we cannot observe the fraction of individuals in each group who went to university. If those with low lottery numbers did engage in such avoidance behavior, the instrument would be questionable, as there would be a path from the Draft Lottery to an unobservable (University) that affects earnings. At the same time there is clearly also a relation between University and Military Service.
Rosenzweig and Wolpin (2000) provide a causal graph that further calls the general interpretability of the results in Angrist (1990) into question. Let us look at the causal graph below, imagining for now that there is no directed edge from Draft Lottery to Civilian Experience. Their argument is that Military Service reduces Schooling and Civilian Experience, which lowers Wages; at the same time it affects Wages directly and raises them indirectly, because less Schooling means more work experience. Angrist collapses all of these subtle mechanisms into one measure, which gives only a shallow answer to potentially more complex policy questions. Building on this causal graph, Heckman (1997) challenges the validity of the instrument in general by pointing out that there might be a directed edge from Draft Lottery to Civilian Experience. The argument goes as follows: employers, after learning about their employees’ lottery numbers, reduce on-the-job training for those with a high risk of being drafted. If this is the case, the Draft Lottery instrument can no longer produce unbiased estimates.
Morgan and Winship (2014) add that the size of the resulting bias also depends on how strongly Draft Lottery affects Military Service. Because the lottery alone does not determine military service (there are also tests), the instrument may be rather weak and a potential bias therefore rather large.
5. Extensions
5.1 Treatment effect with different years of earning
In calculating the average treatment effect in Table 4, Angrist uses earnings for the years 1981 to 1984. While he plausibly argues that, compared with earlier years, this most likely captures a long-term effect (these are the last years for which he has data), it does not give a complete picture. Looking at Table 1 again, we can see that the earnings differences for 1981 to 1984 are quite large. Assuming that the difference in the probability of serving between eligibles and ineligibles stays roughly stable across years, we would expect the average treatment effect to vary with the years of earnings data used. Angrist does not investigate this, although he has the data at hand. From a policy perspective one could easily argue that the average treatment effect for earlier years (close to the years in which treatment happens) is more relevant than the one for later years. Given the long time between the actual service and the earnings data of 1981 to 1984, second-round effects are likely to drive some of the results. These may initially be caused by veteran status, but in later years the effect of veteran status may mainly operate through other variables. For instance, veterans might be forced to take simple jobs after the war because of their lack of work experience, and from then on their path is determined by the lower quality of the job they had to take right after the war. Policy makers might want to see what happens to veterans right after service in order to learn what needs to be done to prevent second-round effects in the first place.
To give a more complete picture, I estimate the results of Table 4 for different years of earnings of white men. As mentioned before, the quality of the Total W2 data is rather low, and the adjusted FICA measure is more plausible than the raw FICA data. I therefore only use the adjusted FICA data in the following. For the adjusted FICA measure I have data for Table 4 for the years 1974 to 1984. For each possible four-year range within this period I re-estimate Models 1 and 2 from Table 4.
Below I plot the resulting average treatment effects. The x-axis shows the starting year of the four-year range of adjusted FICA data used. A starting value of 74 means that the average treatment effect is calculated from earnings data for the years 1974 to 1977. The results for starting year 81 are equivalent to the ones found by Angrist in Table 4 for white men.
[19]:
# get the average treatment effects of Model 1 and 2 with adjusted FICA earnings for
# several different ranges of four years
results_model1 = np.empty((8, 4))
results_model2 = np.array([])
for number, start_year in enumerate(np.arange(74, 82)):
    years = np.arange(start_year, start_year + 4)
    flex_table4 = get_flexible_table4(data_cwhsc_new, years, ["ADJ"], [50, 51, 52, 53])
    results_model1[number, :] = (
        flex_table4["white"].loc[("Model 1", slice(None), "Value"), :].values.flatten()
    )
    results_model2 = np.append(
        results_model2, flex_table4["white"].loc[("Model 2", slice(None), "Value"), :].values,
    )
[20]:
# Plot the effects for white men in Model 1 and 2
# (colors apart from Cohort 1950 are random, execute again to
# change them)
get_figure1_extension1(results_model1, results_model2)
The pattern is more complex than what we can see in the glimpse of Table 4 in the paper. We can see that there is quite some heterogeneity in average treatment effects across cohorts when looking at the data for early years. This changes when using data of later years. Further the fact of being a veteran does seem to play a role for the cohort 1953 right after the war but the treatment effect becomes insignificant when looking at later years. This is interesting as the cohort of 1953 was the one for which no one was drafted (remember that in 1973 no one was drafted as the last call was in December 1972).
Another observation is linked to the fact that draft eligibility does not matter for those born in 1953. These men appear to have joined the army voluntarily, as none of them could have been drafted. This cannot be said of the earlier cohorts. Employers can only observe whether a person is a veteran and when they were born (and not whether they are compliers). One theory is that employers base the initial wage of every army veteran right after the war on the loss of experience. The fact that veterans of the 1953 cohort could only be volunteers, not draftees, might give them a boost in social status that lets them catch up again in the long run. This mechanism might explain to some extent why we observe the upward-sloping line for the 1953 cohort (but not for the other groups).
As discussed in the critical assessment, we only capture the local average treatment effect of the compliers, i.e. those who join the army when they are draft-eligible but not otherwise. Identification of the LATE requires that there are no defiers (monotonicity). This is probably not warranted for the 1953 cohort. For that cohort it is easy to imagine that there are both defiers and compliers, which means that we do not capture the LATE for cohort 1953 in Model 1 or for cohorts 1950 to 1953 in Model 2, but something else that we do not really know how to interpret. This might be another reason why we observe this peculiar pattern for the 1953 cohort. Following up on this remark, I re-estimate Model 2 excluding the 1953 cohort in order to focus on the cohorts for which the LATE assumptions are likely to hold.
[21]:
results_model2_53 = np.array([])
for number, start_year in enumerate(np.arange(74, 82)):
    years = np.arange(start_year, start_year + 4)
    flex_table4 = get_flexible_table4(data_cwhsc_new, years, ["ADJ"], [50, 51, 52])
    results_model2_53 = np.append(
        results_model2_53, flex_table4["white"].loc[("Model 2", slice(None), "Value"), :].values,
    )
[22]:
get_figure2_extension1(results_model2, results_model2_53)
We can see that for later years the treatment effect is a bit lower when excluding the cohort of 1953. It confirms the findings of Angrist with the advantage of making it possible to attach a clearer interpretation to it.
Following this path, it would also be interesting to vary the number of instruments in more than just the two ways Angrist has shown, for example by breaking down the interval size of lottery numbers further. Unfortunately I could not find a way to do that with the preprocessed data I have at hand.
5.2 Bias Quantification
In the critical assessment I argued that the simple Wald estimate might be biased because employers know their employees’ birth dates and hence their draft eligibility. The argument was that employers invest less in the human capital of those who might be drafted. This would make the draft-eligibility instrument invalid and the estimate biased. For a binary instrument the bias can be calculated in the following way:
\[\frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = \delta + \frac{E[\epsilon \mid Z=1] - E[\epsilon \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}\]
What Angrist calculates in the last column of Table 3 (the Wald estimate) is the left-hand side of this equation. It yields an unbiased estimate of the treatment effect \(\delta\) of \(D\) (veteran status) on \(Y\) (earnings) if the instrument \(Z\) (draft eligibility) has no effect on \(Y\) through the unobservables \(\epsilon\). In our argument this assumption does not hold: \(E[\epsilon \mid Z=1] - E[\epsilon \mid Z=0]\) is not equal to zero, because draft eligibility affects \(Y\) through employers making their investment in human capital dependent on draft eligibility. The left-hand side is therefore not equal to the true treatment effect \(\delta\) but has to be adjusted by the bias \(\frac{E[\epsilon \mid Z=1] - E[\epsilon \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}\).
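The decomposition of the Wald estimate into \(\delta\) plus a bias term can be checked numerically. The simulation below is a sketch under invented parameters: `gamma` is a hypothetical direct effect of eligibility on the unobservables, and the first stage of 0.16 is chosen only to resemble the magnitudes in Table 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

delta = -2000.0  # assumed true effect of veteran status on earnings
gamma = -40.0    # hypothetical direct effect of eligibility via unobservables

z = rng.integers(0, 2, size=n)                              # draft eligibility
d = (rng.uniform(size=n) < 0.19 + 0.16 * z).astype(float)   # veteran status
epsilon = gamma * z + rng.normal(0, 500, size=n)            # unobservables depend on z
y = 12_000 + delta * d + epsilon                            # earnings

first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = (y[z == 1].mean() - y[z == 0].mean()) / first_stage
bias_term = (epsilon[z == 1].mean() - epsilon[z == 0].mean()) / first_stage

print(f"Wald: {wald:.1f}, delta + bias: {delta + bias_term:.1f}, bias: {bias_term:.1f}")
```

In this setup a direct effect of only 40 dollars is blown up to a bias of roughly 250 dollars, because it is divided by a first stage of about 0.16.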
In this section I run a thought experiment in which I quantify this bias. The argument is rather heuristic, because I lack the resources to find a robust estimate of the bias, but it gives a rough idea of whether the bias might matter economically. My idea is the following. To obtain a measure of \(E[\epsilon \mid Z=1] - E[\epsilon \mid Z=0]\), I look at estimates of the effect of work experience on earnings. Recall that the expected difference in earnings due to a difference in draft eligibility is caused by a loss in human capital for the draft-eligible, because they might miss out on on-the-job training. This loss in on-the-job training can be approximated by a general loss in work experience. For an estimate of that effect I rely on Keane and Wolpin (1997), who work with a sample of young men between 14 and 21 years old from the year 1979. Their estimate of the effect of work experience on real earnings should not be far off the possible effect in our sample of adjusted FICA real earnings of 19-year-old men for the years 1981 to 1984. Recall that lottery participants find out whether they are draft-eligible at the end of the year before they might be drafted. I assume that draft dates are spread evenly over the draft year. One can then argue that, on average, a draft-eligible person stays in his job for another half a year after learning about his eligibility and before being drafted. Hence, for half a year on average an employer might invest less in the human capital of this draft-eligible man. I now assume that employers show a quite moderate behavioral response: during these six months, the employees only receive the equivalent of five months of human capital gain (or work experience gain), as opposed to the six months they stay in the company. This means they lose one month of work experience on average compared with those who are not draft-eligible.
To quantify this one-month loss of work experience I take estimates from Keane and Wolpin (1997). For blue collar workers they estimate the gain in real earnings from an additional year of blue collar work experience to be roughly 4.6 percent (the effect they find actually depends on the years of work experience, but I simplify this for my rough calculations). For white collar workers the equivalent estimate is roughly 2.7 percent. I take these as upper and lower bounds, calculate their one-month counterparts, and quantify the bias in the Wald estimates in the last column of Table 3. The bias \(\frac{E[\epsilon \mid Z=1] - E[\epsilon \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}\) is then roughly equal to the loss in annual real earnings due to one month less of work experience, divided by the difference in the probability of being a veteran by draft eligibility.
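The arithmetic of this back-of-the-envelope calculation can be written out as follows. The annual earnings figure of 12,000 dollars is a placeholder chosen for illustration and is not taken from the data; the probability difference of 0.159 is the value reported for the 1950 cohort in Table 3, and the yearly returns are the two bounds quoted above. The helper `experience_bias` is hypothetical and not the `get_bias` function used in this notebook.

```python
def experience_bias(annual_earnings, yearly_return, months_lost, prob_diff):
    """Approximate bias of the Wald estimate from lost work experience.

    The yearly return to experience is scaled linearly to months (simplification).
    The result is negative because the draft-eligible lose earnings.
    """
    earnings_loss = annual_earnings * yearly_return * months_lost / 12
    return -earnings_loss / prob_diff


assumed_earnings = 12_000.0  # hypothetical annual real earnings in dollars
prob_diff_1950 = 0.159       # P(V|eligible) - P(V|ineligible), cohort 1950, Table 3

bias_by_type = {
    label: experience_bias(assumed_earnings, yearly_return, 1, prob_diff_1950)
    for label, yearly_return in [("white collar", 0.027), ("blue collar", 0.046)]
}

for label, bias in bias_by_type.items():
    print(f"{label}: bias of about {bias:.0f} dollars")
```

Under these assumptions the bias lies between roughly 170 and 290 dollars in absolute value, and it grows as the probability difference in the denominator shrinks.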
The first figure below shows how the bias changes by cohort across the different years of real earnings as the assumed effect of a loss in experience on real earnings increases. Clearly, the stronger the assumed contribution of work experience to real earnings, the larger the bias. This follows because it is equivalent to an increase in the absolute value of the numerator. In addition, the bias is larger for later years of earnings, as real earnings increase over time. Furthermore, the slope is steeper for later cohorts, as the denominator is smaller for them. Even under the moderate assumption of a loss of one month of work experience, the bias does not seem to be economically negligible, especially when using the blue collar estimate.
[23]:
# Calculate the bias, the true delta and the original Wald estimate for a
# certain interval of working experience effect
interval = np.linspace(0.025, 0.05, 50) / 12
bias, true_delta, wald = get_bias(
data_cwhsa, data_cwhsb, data_dmdc, data_sipp, data_cwhsc_new, interval
)
[24]:
# plot the bias by cohort
get_figure1_extension2(bias, interval)
To get a sense of how the size of the bias relates to the size of the previously estimated Wald coefficients, let us have a look at the figure below. For each cohort and year combination, it shows the Wald estimate from Table 3 as the horizontal line and the true \(\delta\), as a function of the weight placed on the loss in work experience, as the upward-sloping line. Given that our initial Wald estimates are only a few thousand dollars in size, an estimated bias of roughly 200 to 500 dollars is not negligible. Furthermore, given Angrist’s policy question of veteran compensation, even an estimate that is higher by 200 dollars makes a big difference when thousands of veterans are to be compensated.
[25]:
# plot the true delta (adjusted for the bias) compared to the original Wald estimate
get_figure2_extension2(true_delta, wald, interval)
6. Conclusion
As for the overall quality and structure of Angrist (1990), reading it is a real treat. The controversy after its publication and the fact that it is highly cited clearly show how important its contribution was and still is. It is a great starting point for discussing the interpretability and policy relevance of instrumental variable approaches. As already stressed in the critical assessment, one has to acknowledge the care Angrist put into this work. Although his results do not seem to be reliable, the paper opened a whole discussion on how to get the most out of instrumental variables. Another contribution that should not go unnoticed is that Angrist shows that instruments can be used even if they do not come from the same sample as the dependent and the endogenous variable. In practice this is very useful, as it widens the possible areas of application for instrumental variables.
Overall, the paper has some shortcomings, but the care put into it and its good readability allowed other researchers (and Angrist himself) to step in with helpful remarks that improved the understanding of instrumental variable approaches to treatment effect evaluation.
References
Angrist, J. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80, 313–336.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion.
Heckman, J. (1997). Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations. The Journal of Human Resources, 32(3), 441–462. doi:10.2307/146178
Keane, M., & Wolpin, K. (1997). The Career Decisions of Young Men. Journal of Political Economy, 105(3), 473–522. doi:10.1086/262080
Morgan, S., & Winship, C. (2014). Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge: Cambridge University Press. doi:10.1017/CBO9781107587991
Rosenzweig, M. R., & Wolpin, K. I. (2000). Natural “Natural Experiments” in Economics. Journal of Economic Literature, 38, 827–874.
Wald, A. (1940). The Fitting of Straight Lines if Both Variables are Subject to Error. The Annals of Mathematical Statistics, 11(3), 284–300.
Appendix
Key Variables in the Data Sets
data_cwhsa
| Name | Description |
| --- | --- |
| **index** | |
| byr | birth year |
| race | ethnicity, 1 for white and 2 for nonwhite |
| interval | interval of draft lottery numbers, 73 intervals with the size of five consecutive numbers |
| year | year for which earnings are collected |
| **variables** | |
| vmn1 | nominal earnings |
| vfin1 | fraction of people with zero earnings |
| vnu1 | sample size |
| vsd1 | standard deviation of earnings |
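The `interval` index groups lottery numbers into 73 bins of five consecutive numbers. A minimal sketch of this mapping is given below. The function name `lottery_interval` is hypothetical and not part of the replication material, and I assume that lottery number 366 (the leap day) falls into the last interval, which the data description does not state explicitly.

```python
def lottery_interval(number: int) -> int:
    """Map a draft lottery number (1 to 366) to its interval (1 to 73)."""
    if not 1 <= number <= 366:
        raise ValueError("lottery numbers run from 1 to 366")
    # Numbers 1-5 go to interval 1, 6-10 to interval 2, and so on.
    # Assumption: number 366 is folded into interval 73.
    return min((number - 1) // 5 + 1, 73)


for n in (1, 5, 6, 195, 365, 366):
    print(n, lottery_interval(n))
```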
data_cwhsb
| Name | Description |
| --- | --- |
| **index** | |
| byr | birth year |
| race | ethnicity, 1 for white and 2 for nonwhite |
| interval | interval of draft lottery numbers, 73 intervals with the size of five consecutive numbers |
| year | year for which earnings are collected |
| type | source of the earnings data, “TAXAB” for FICA and “TOTAL” for Total W2 |
| **variables** | |
| vmn1 | nominal earnings |
| vfin1 | fraction of people with zero earnings |
| vnu1 | sample size |
| vsd1 | standard deviation of earnings |
data_cwhsc_new
| Name | Description |
| --- | --- |
| **index** | |
| byr | birth year |
| race | ethnicity, 1 for white and 2 for nonwhite |
| interval | interval of draft lottery numbers, 73 intervals with the size of five consecutive numbers |
| year | year for which earnings are collected |
| type | source of the earnings data, “ADJ” for adjusted FICA, “TAXAB” for FICA and “TOTAL” for Total W2 |
| **variables** | |
| earnings | real earnings in 1978 dollars |
| nj | sample size |
| nj0 | number of persons in the sample with zero earnings |
| iweight_old | weight for weighted least squares |
| ps_r | fraction of people having served in the army |
| ern74 to ern84 | unweighted covariance matrix of the real earnings |
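The columns `earnings`, `ps_r` and `nj` are the ingredients of a Wald estimate: the difference in mean earnings between draft-eligible and ineligible men, divided by the difference in their probabilities of having served. The sketch below illustrates this on a toy data frame that mimics the column layout of `data_cwhsc_new`. All values are made up, the helper `weighted_mean` is hypothetical, and the eligibility cutoff at interval 39 (lottery numbers 1 to 195, as for the 1950 cohort) is an assumption for this example. It is not the code behind Table 3.

```python
import numpy as np
import pandas as pd

# Toy cell-level data for one cohort and year (all values are made up).
cells = pd.DataFrame(
    {
        "interval": [1, 2, 40, 41],
        "earnings": [9000.0, 9200.0, 9600.0, 9800.0],
        "ps_r": [0.30, 0.32, 0.15, 0.13],
        "nj": [100, 100, 100, 100],
    }
)

# Assumption: intervals 1 to 39 are draft eligible.
cells["eligible"] = cells["interval"] <= 39


def weighted_mean(group: pd.DataFrame, column: str) -> float:
    """Average a cell-level column using the cell sample sizes as weights."""
    return float(np.average(group[column], weights=group["nj"]))


eligible = cells[cells["eligible"]]
ineligible = cells[~cells["eligible"]]

# Numerator: difference in mean earnings by eligibility status.
earnings_gap = weighted_mean(eligible, "earnings") - weighted_mean(ineligible, "earnings")
# Denominator: difference in the probability of having served.
service_gap = weighted_mean(eligible, "ps_r") - weighted_mean(ineligible, "ps_r")

wald_estimate = earnings_gap / service_gap
print(round(wald_estimate, 2))
```

Weighting by `nj` matters whenever the cells differ in size. With equal cell sizes, as in this toy example, it reduces to a simple average.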
data_dmdc
| Name | Description |
| --- | --- |
| **index** | |
| byr | birth year |
| race | ethnicity, 1 for white and 2 for nonwhite |
| interval | interval of draft lottery numbers, 73 intervals with the size of five consecutive numbers |
| **variables** | |
| nsrvd | number of people having served |
| ps_r | fraction of people having served |
data_sipp (this is the only micro data set)
| Name | Description |
| --- | --- |
| **index** | |
| u_brthyr | birth year |
| nrace | ethnicity, 0 for white and 1 for nonwhite |
| **variables** | |
| nvstat | 0 if man is not a veteran, 1 if he is |
| fnlwgt_5 | fraction of people with this index among overall sample |
| rsncode | 1 if person was draft eligible, any other value if not |