Machine learning in econ
I came across a good article on big data in the Journal of Economic Perspectives: "Machine Learning: An Applied Econometric Approach" (Mullainathan and Spiess, 2017).
It succinctly summarizes the difference between supervised machine learning and parameter estimation. Supervised machine learning aims to predict Y from X, that is, to produce a good y_hat. Parameter estimation (the traditional econometric approach) aims to uncover the underlying relationship between Y and X, that is, to recover beta. Just because X can predict Y doesn’t mean X causes Y. The two are tools for different purposes.
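To make the y_hat versus beta distinction concrete, here is a minimal sketch on simulated data. Everything in it is my own assumption for illustration, not something from the article: a hidden variable z drives both X and Y, and X has no causal effect on Y at all. A simple regression of Y on X still predicts well.

```python
import random

random.seed(0)

# Simulated data (an assumption for illustration): z is a hidden common cause.
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]        # X depends on z only
y = [2.0 * zi + random.gauss(0, 0.5) for zi in z]  # Y depends on z only, not on X

def ols(x, y):
    """Simple one-regressor OLS; returns (intercept, slope)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

alpha_hat, beta_hat = ols(x, y)
y_hat = [alpha_hat + beta_hat * xi for xi in x]

# In-sample R^2: how well X predicts Y.
mean_y = sum(y) / n
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"estimated slope: {beta_hat:.2f}, R^2: {r_squared:.2f}")
print("true causal effect of X on Y in this simulation: 0")
```

The fitted slope is clearly nonzero and the fit is decent, yet by construction changing X would not move Y. Good y_hat, misleading beta.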
However, the article suggests 3 areas where machine learning can be fruitfully used:
- Deal with new types of data: e.g., turn images and text into analyzable form
- Use prediction to serve parameter estimation: e.g., predict the endogenous X from the instruments in the first stage of instrumental variable estimation
- Policy application: e.g., deciding whether a defendant should await trial in jail or at home (I find this last area worrisome. Should data have this much power?)
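The second bullet is easier to see with a small example. Below is a rough two-stage least squares sketch on simulated data. The data-generating process, the variable names, and the use of plain OLS in the first stage are all my assumptions; the point from the article is that the first stage is a pure prediction problem, so a machine learning predictor could in principle take the place of the OLS fit there.

```python
import random

random.seed(0)

# Simulated data (an assumption for illustration).
n = 5000
true_beta = 1.0
z = [random.gauss(0, 1) for _ in range(n)]  # instrument: affects Y only through X
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ui + random.gauss(0, 0.5) for zi, ui in zip(z, u)]
y = [true_beta * xi + 2.0 * ui + random.gauss(0, 0.5) for xi, ui in zip(x, u)]

def ols(x, y):
    """Simple one-regressor OLS; returns (intercept, slope)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Naive OLS of Y on X is biased because u moves both X and Y.
_, beta_ols = ols(x, y)

# Stage 1: predict X from the instrument (this is the prediction step).
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress Y on the predicted X.
# This gives the point estimate only; naive second-stage standard errors are not valid.
_, beta_2sls = ols(x_hat, y)

print(f"true beta: {true_beta}, OLS: {beta_ols:.2f}, 2SLS: {beta_2sls:.2f}")
```

In this simulation the naive OLS slope is pushed well above the true value by the confounder, while the two-stage estimate lands close to it.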
The article also points out 2 steps that help us think through how to effectively use machine learning:
- Regularization: for a given complexity level, select the function that fits the data best
- Empirical tuning: use cross-validation to pick the regularization parameter, i.e., the complexity level that predicts best out of sample
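Here is a toy sketch of how the two steps fit together. The setup is my own assumption, not the article's: the function class is step functions on [0, 1], the complexity level is the number of bins k, and the data are simulated. For a given k, the best-fitting step function is just the mean of y within each bin (step 1). Five-fold cross-validation then picks k (step 2).

```python
import math
import random

random.seed(1)

# Simulated data (an assumption for illustration): noisy sine curve on [0, 1].
n = 200
x = [random.random() for _ in range(n)]
y = [math.sin(2 * math.pi * xi) + random.gauss(0, 0.3) for xi in x]

def fit_bins(x_train, y_train, k):
    """Step 1: best step function with k equal-width bins = mean of y per bin."""
    sums = [0.0] * k
    counts = [0] * k
    for xi, yi in zip(x_train, y_train):
        b = min(int(xi * k), k - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y_train) / len(y_train)  # fallback for empty bins
    return [sums[b] / counts[b] if counts[b] else overall for b in range(k)]

def predict(bin_means, xi):
    k = len(bin_means)
    return bin_means[min(int(xi * k), k - 1)]

def cv_error(x, y, k, folds=5):
    """Step 2: mean squared out-of-fold prediction error for complexity k."""
    total = 0.0
    for f in range(folds):
        test_idx = set(range(f, len(x), folds))
        x_tr = [x[i] for i in range(len(x)) if i not in test_idx]
        y_tr = [y[i] for i in range(len(y)) if i not in test_idx]
        means = fit_bins(x_tr, y_tr, k)
        total += sum((y[i] - predict(means, x[i])) ** 2 for i in test_idx)
    return total / len(x)

candidates = [1, 2, 4, 8, 16, 32, 64]
errors = {k: cv_error(x, y, k) for k in candidates}
best_k = min(errors, key=errors.get)

for k in candidates:
    print(f"k={k:2d}  CV error={errors[k]:.3f}")
print("chosen complexity level:", best_k)
```

Too few bins underfit and too many chase noise, so the cross-validated error should be lowest somewhere in the middle of the grid. In-sample error alone would always favor the largest k, which is why the tuning has to be done out of sample.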