Political Science 799: Multivariate Analysis

Fall 2022
Tuesday 4-6 (2325 MH)
Professor: Walter R. Mebane, Jr.
Office: 7735 Haven Hall (607/592-0546); email wmebane@umich.edu
Office hours: Tue 1-3 or other times by appointment.
GSI: Fabricio Vasselai; email vasselai@umich.edu
GSI Office hours: XXX or other times by appointment.
Course web page: in Canvas; syllabus also at http://www.umich.edu/~wmebane/ps799.html

Assignment Due Dates
due date                 description                  weight
TBA                      problem sets                 60%
Dec 15, 1:30pm-3:30pm    final paper presentations    10%
Dec 16                   final paper                  20%
--                       participation                10%

See fpaper.pdf posted on the Canvas site for more information about the final paper and presentation.

Reading Availability

Much of the course will refer to journal articles. I plan to follow or refer to a few chapters in the following books (others also appear below).

Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge UP.

Kenneth E. Train. 2003. Discrete Choice Methods with Simulation. Cambridge UP.

In the following listing, required reading is preceded by a bullet. Other items are recommended.

Class meeting and reading schedule

  1. computing (Aug 30)

    The Comprehensive R Archive Network. 2019. https://cran.r-project.org/

    Crawley, Michael J. 2007. The R Book. Wiley.

    Spector, Phil. 2008. Data manipulation with R. Springer.

    Albert, Jim. 2007. Bayesian Computation with R. Springer.

    Chambers, John M. 2008. Software for Data Analysis. Springer.

    Braun, W. John, and Duncan J. Murdoch. 2007. A First Course in Statistical Programming with R. Cambridge.

    Bierlaire, M. (2018). PandasBiogeme: a short introduction. Technical report TRANSP-OR 181219. Transport and Mobility Laboratory, ENAC, EPFL.

    Plummer, Martyn. 2003. “JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling,” Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20-22, Vienna, Austria. ISSN 1609-395X.

    JAGS. http://mcmc-jags.sourceforge.net/

    Pemstein, Daniel, Kevin M. Quinn and Andrew D. Martin. 2011. “The Scythe Statistical Library: An Open Source C++ Library for Statistical Computation.” Journal of Statistical Software. 42. DOI 10.18637/jss.v042.i12

    Martin, Andrew D., Kevin M. Quinn, and Jong Hee Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software. 42. DOI 10.18637/jss.v042.i09

  2. maximum likelihood and numerical optimization (Sep 6)

    Philip E. Gill, Walter Murray and Margaret H. Wright. 1982. Practical Optimization. Classics in Applied Mathematics edition (2019). https://doi.org/10.1137/1.9781611975604

    Stefan Theussl, Florian Schwendinger, Hans W. Borchers. 2021. “CRAN Task View: Optimization and Mathematical Programming.” https://cran.r-project.org/web/views/Optimization.html

  3. generalized linear models and QMLE (Sep 13)

    Peter McCullagh and John A. Nelder. 1989. Generalized Linear Models. 2d ed. Chapman and Hall.

    O. E. Barndorff-Nielsen. 1995. “Quasi Profile and Directed Likelihoods From Estimating Functions.” Ann. Inst. Statist. Math. 47(3), 461-464. https://www.ism.ac.jp/editsec/aism/pdf/047_3_0461.pdf

    S. A. Murphy and A. W. van der Vaart. 2000. “On Profile Likelihood.” Journal of the American Statistical Association 95 (450, Jun), 449-465. (in file Murphy.Vaart.JASA2000.2669386.pdf)

  4. asymptotics, bootstrap and refinements (Sep 20)

    Bradley Efron. 1979. “Bootstrap Methods: Another Look at the Jackknife,” Annals of Statistics 7 (1): 1-26. (in file efron.aos1979.pdf)

    Barndorff-Nielsen, O. E., and D. R. Cox. 1984. “Bartlett Adjustments to the Likelihood Ratio Statistic and the Distribution of the Maximum Likelihood Estimator,” Journal of the Royal Statistical Society. Series B (Methodological) 46 (3): 483-495.

    A.C. Davison and D.V. Hinkley. 1997. Bootstrap Methods and their Application. Cambridge.

    Steven J. Sepanski. 1994. “Asymptotics for Multivariate $t$-Statistic and Hotelling's $T^2$-Statistic under Infinite Second Moments via Bootstrapping,” Journal of Multivariate Analysis 49 (1): 41-54.

    Cameron and Trivedi. Chapters 5, 11 and Appendix A.

    Jeffrey M. Wooldridge. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press. Chapters 3, 12-14.

    Russell Davidson and James G. MacKinnon. 1993. Estimation and Inference in Econometrics. Oxford UP. Chapters 4, 8-9.

    Diogo Ferrari and John E. Jackson. 2019. “ceser R Package: Cluster Estimated Standard Error in R.” Journal of Statistical Software. (in file ceser.pdf)

  5. text as data (Sep 27)

    Matthew J. Denny and Arthur Spirling. 2018. “Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It.” Political Analysis 26: 168-189. (in file text_preprocessing_for_unsupervised_learning.pdf)

    David M. Blei, Andrew Y. Ng and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Jan.): 993-1022.

    David M. Blei and Jon D. McAuliffe. 2007. Supervised topic models. Neural Information Processing Systems 21.

    Walter R. Mebane, Jr., Patrick Y. Wu, Logan Woods, Alejandro Pineda, Blake Miller, Joseph Klaver, Preston Due and Adam Rauh. 2020. “Diverse Election Experiences Reported without Bias: Observing Election Incidents in the United States via Twitter.” (in file TEO.pdf)

  6. prediction, machine learning, LASSO, regularization (Oct 4)

  7. choice models (Oct 11)

    Hensher, David A., and William H. Greene. 2003. The mixed logit model: the state of practice. Transportation 30 (2): 133-176.

    McFadden, Daniel. 1974. “Conditional logit analysis of qualitative choice behavior.” In P. Zarembka, ed., Frontiers of Econometrics, New York: Academic Press. pages 105-142.

    McFadden, Daniel. 1981. “Structural Discrete Probability Models Derived from Theories of Choice.” In Charles F. Manski and Daniel L. McFadden, eds, Structural Analysis of Discrete Data and Econometric Applications, Cambridge, MA: MIT Press, chapter 5, pp. 198-272.

    Mauricio Sarrias and Ricardo A. Daziano. 2017. “Multinomial Logit Models with Continuous and Discrete Individual Heterogeneity in R: The gmnl Package.” Journal of Statistical Software 79 (2). doi: 10.18637/jss.v079.i02

  8. observational studies, RD and causal inference (Oct 25)

    Paul R. Rosenbaum. 2002. Observational Studies. Springer.

    Paul R. Rosenbaum. 2009. Design of Observational Studies. Springer.

    Judea Pearl. 2009. Causality: Models, Reasoning, and Inference, 2d ed. Cambridge.

    Joshua D. Angrist and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics. Princeton.

    Dunning, Thad. 2012. Natural Experiments in the Social Sciences: A Design-Based Approach (Strategies for Social Inquiry). Cambridge.

    Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. Norton.

  9. causal identification norms, DAGs, interference (Nov 1)

    Roger Bowden. 1973. The Theory of Parametric Identification. Econometrica 41 (Nov): 1069-1074.

    Franklin Fisher. 1976. The Identification Problem in Econometrics. Krieger.

    Roger Bowden and Darrell Turkington. 1984. Instrumental Variables. Cambridge UP.

    Judea Pearl. 2009. Causality: Models, Reasoning and Inference, 2d ed. Cambridge UP. Chapters 1-5.

    James M. Robins. 1999. Association, Causation, and Marginal Structural Models. Synthese 121: 151-179.

    David A. Freedman and Jasjeet S. Sekhon. 2010. Endogeneity in Probit Response Models. Political Analysis 18 (2): 138-150.

    James J. Heckman and Edward Vytlacil. 2005. Structural Equations, Treatment Effects, and Econometric Policy Evaluation. Econometrica 73 (3, May): 669-738.

    Heckman, J. J. 1978. Dummy endogenous variables in a simultaneous equation system. Econometrica 46: 931-959.

    Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica 47: 153-161.

  10. hierarchical models, MCMC (Nov 8)

    George Casella and Edward I. George. 1992. Explaining the Gibbs Sampler. The American Statistician 46 (3, Aug.): 167-174.

    Siddhartha Chib and Edward Greenberg. 1995. Understanding the Metropolis-Hastings Algorithm. The American Statistician 49 (4, Nov.): 327-335.

    Jeff Gill. 2002. Bayesian Methods: A Social and Behavioral Approach. Chapman & Hall.

  11. latent variable models (Nov 15, 22)

    Jian-Qing Shi and Sik-Yum Lee. 2000. Latent Variable Models with Mixed Continuous and Polytomous Data. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62 (1): 77-87. (in file shi.lee.jrssb2000.pdf)

    Michael A. Bailey. 2007. Comparable Preference Estimates across Time and Institutions for the Court, Congress, and Presidency. American Journal of Political Science 51 (3, Jul.): 433-448.

    Sik-Yum Lee. 2007. Structural Equation Modelling: A Bayesian Approach. Wiley.

    Sik-Yum Lee, Xin-Yuan Song, John C. K. Lee. 2003. Maximum Likelihood Estimation of Nonlinear Structural Equation Models with Ignorable Missing Data. Journal of Educational and Behavioral Statistics 28 (Summer): 111-134.

    Sik-Yum Lee and Xin-Yuan Song. 2004. Maximum Likelihood Analysis of a General Latent Variable Model with Hierarchically Mixed Data. Biometrics 60 (Sep.): 624-636.

    Sophia Rabe-Hesketh, Anders Skrondal and Andrew Pickles. 2004. Generalized Latent Variable Modelling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall.

  12. hypothesis tests and model selection (Nov 29)

    Kass, Robert E., and Adrian E. Raftery. 1995. “Bayes factors.” Journal of the American Statistical Association 90 (430): 773-795.

    Chib, S., 2001. “Markov chain Monte Carlo methods: computation and inference.” In Handbook of Econometrics (Vol. 5, pp. 3569-3649). Elsevier.
    Section 10: MCMC methods in model choice problems. (in file chib2001.pdf)

    Gelman, A. and Meng, X.L., 1998. “Simulating normalizing constants: From importance sampling to bridge sampling to path sampling.” Statistical Science 163-185.

  13. partial identification and identification with missing covariates (Dec 6)

    Charles F. Manski. 1995. Identification in the Social Sciences. Harvard UP.

    Charles F. Manski. 2003. Partial Identification of Probability Distributions. Springer.

    Rosa L. Matzkin. 2007. Nonparametric Identification. In James J. Heckman and Edward E. Leamer, eds., Handbook of Econometrics volume 6B. North-Holland. Pp. 5307-5368.

    Charles F. Manski and Elie Tamer. 2002. Inference on Regressions with Interval Data on a Regressor or Outcome. Econometrica 70 (2, Mar): 519-546.

  14. bounded influence estimation (Dec 6)

    Hampel, Frank R. and Peter J. Rousseeuw and Elvezio Ronchetti. 1981. The Change-of-Variance Curve and Optimal Redescending M-Estimators. Journal of the American Statistical Association 76 (Sep): 643-648.

    Croux, Christophe and Peter J. Rousseeuw and Ola Hossjer. 1994. Generalized S-Estimators. Journal of the American Statistical Association 89 (Dec): 1271-1281.

  15. paper presentations (Dec 14, 1:30pm-3:30pm)