Machine learning in econ

I came across a good article on big data in the Journal of Economic Perspectives: "Machine Learning: An Applied Econometric Approach" by Sendhil Mullainathan and Jann Spiess (2017).

It succinctly summarizes the difference between supervised machine learning and parameter estimation. Supervised machine learning aims to predict Y from X, i.e., to get a good y_hat. Parameter estimation (the traditional econometric method) aims to uncover the underlying relationship between Y and X, i.e., to get a good estimate of beta. Just because X can predict Y doesn't mean X causes Y. The two are tools for different purposes.
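
To make the y_hat vs. beta distinction concrete, here is a small simulation of my own (not from the article). I assume a hypothetical unobserved confounder Z that drives both X and Y, while X has no causal effect on Y at all. The variable names, coefficients, and sample sizes are made up for illustration.

```python
import random


def ols_slope_intercept(x, y):
    """Simple one-regressor OLS: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance of y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


rng = random.Random(0)
n = 5000

# Assumed data-generating process (hypothetical):
# Z is an unobserved confounder, X and Y both depend on Z,
# and Y does NOT depend on X, so the true causal effect of X is 0.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [2 * zi + rng.gauss(0, 0.5) for zi in z]

# Fit on the first 4000 observations, evaluate on the remaining 1000.
slope, intercept = ols_slope_intercept(x[:4000], y[:4000])
r2_holdout = r_squared(x[4000:], y[4000:], slope, intercept)

print(f"fitted slope on X: {slope:.2f} (true causal effect: 0)")
print(f"hold-out R^2:      {r2_holdout:.2f}")
```

Under these assumptions the fitted line predicts Y well out of sample, yet the slope (about 1.6 in population terms, since cov(X, Y) / var(X) = 2 / 1.25) says nothing about what would happen to Y if we changed X. Good y_hat, misleading beta.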

However, the article suggests 3 areas where machine learning can be fruitfully used: 

  1. Dealing with new types of data: e.g., turning images and text into analyzable forms
  2. Using prediction to serve parameter estimation: e.g., predicting the endogenous X from the instruments in the 1st stage of instrumental variable estimation
  3. Policy applications: e.g., deciding whether a defendant should wait for trial in jail or at home (I find this last area worrisome. Should data have this much power?)

The article also points out 2 steps that help us think through how to use machine learning effectively:

  1. Regularization: for a given complexity level, select the function that fits best in-sample
  2. Empirical tuning: use cross-validation to pick the best regularization parameters (i.e., the optimal complexity level)
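
The two steps above can be sketched in a few lines. This is my own toy illustration, not the article's code. I assume the "complexity level" is the number of equal-width bins k in a step function (a hypothetical choice made for simplicity), the data are simulated, and the helper names are made up. For each k, the best in-sample function is just the bin means (step 1), and 5-fold cross-validation picks k (step 2).

```python
import math
import random


def fit_step_function(xs, ys, k):
    """Step 1: for a fixed complexity k, the in-sample best step function
    on [0, 1) with k equal-width bins is the mean of y within each bin."""
    overall_mean = sum(ys) / len(ys)
    sums = [0.0] * k
    counts = [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the overall training mean.
    return [sums[b] / counts[b] if counts[b] else overall_mean for b in range(k)]


def mse(xs, ys, bin_means):
    """Mean squared error of a fitted step function on (xs, ys)."""
    k = len(bin_means)
    errors = ((y - bin_means[min(int(x * k), k - 1)]) ** 2 for x, y in zip(xs, ys))
    return sum(errors) / len(xs)


def cv_mse(xs, ys, k, folds=5):
    """Step 2: average out-of-fold error for complexity level k."""
    n = len(xs)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        bin_means = fit_step_function([xs[i] for i in train], [ys[i] for i in train], k)
        total += mse([xs[i] for i in test], [ys[i] for i in test], bin_means)
    return total / folds


# Simulated data (assumption): y = sin(2*pi*x) + noise.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]

candidates = [1, 2, 4, 8, 16, 32, 64]
in_sample = {k: mse(xs, ys, fit_step_function(xs, ys, k)) for k in candidates}
cv = {k: cv_mse(xs, ys, k) for k in candidates}
best_k = min(candidates, key=cv.get)

for k in candidates:
    print(f"k={k:2d}  in-sample MSE={in_sample[k]:.3f}  CV MSE={cv[k]:.3f}")
print("complexity level chosen by cross-validation:", best_k)
```

In-sample error can only go down as k grows, so it cannot be used to choose k. The cross-validated error is what tells us when extra complexity stops helping, which is the point of tuning empirically rather than in-sample.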