Three fundamental concepts

Ch 4 of Deep Learning with Python (DLP) touches on three fundamental concepts in machine learning.

(1) 4 Branches of ML

They are supervised learning, unsupervised learning, self-supervised learning, and reinforcement learning.

One of my earlier posts summarizes the difference between supervised and unsupervised learning. In short, supervised learning has both input data and known output targets (sometimes referred to as labels), while unsupervised learning only has inputs.
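As a toy illustration (my own, not from the book; I'm borrowing scikit-learn here for brevity), a supervised model is fit on both inputs and targets, while an unsupervised one sees only the inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.random.rand(100, 4)          # input data
y = np.random.randint(0, 2, 100)    # known targets (labels)

# Supervised: learns a mapping from inputs X to targets y.
clf = LogisticRegression().fit(X, y)

# Unsupervised: only sees X and looks for structure (here, 2 clusters).
km = KMeans(n_clusters=2, n_init=10).fit(X)
```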

Self-supervised learning has input data and some kind of targets. Unlike in supervised learning, those targets are not annotated by humans but are generated heuristically from the inputs themselves.
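Next-step prediction is a classic instance: the targets are just the inputs shifted by one position, so no human labeling is needed. A minimal sketch (my own illustration, not from the book):

```python
import numpy as np

sequence = np.arange(100)  # any unlabeled sequence, e.g. words or sensor readings

# Heuristic target assignment: predict the next element from the current one.
inputs  = sequence[:-1]    # x_t
targets = sequence[1:]     # "label" for x_t is x_{t+1}, derived from the data itself
```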

Reinforcement learning can be defined as an agent learning what actions to take in a given environment so as to maximize some reward. This optimization behavior sounds very similar to the idea of homo economicus in economics. Therefore, this will be my next project to explore after deep learning.
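The agent-environment loop can be sketched generically as follows (the env and agent interfaces here are hypothetical, just to show the shape of the interaction, not any real library's API):

```python
# A generic agent-environment interaction loop (pseudostructure).
def run_episode(env, agent, max_steps=1000):
    state = env.reset()                            # hypothetical: start an episode
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                  # agent chooses an action
        state, reward, done = env.step(action)     # environment responds
        agent.learn(state, reward)                 # agent updates its policy
        total_reward += reward                     # the quantity being maximized
        if done:
            break
    return total_reward
```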

(2) Evaluation of ML Models

To evaluate an ML model, one usually splits the available data into 3 sets: training, validation, and test. Train the model on the training data, tune it by evaluating on the validation data, and then apply the tuned model to the test data one last time.
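With NumPy, a simple three-way split might look like this (the 60/20/20 ratios are my own choice, not prescribed by the book):

```python
import numpy as np

data = np.random.rand(1000, 10)   # stand-in for the available data
np.random.shuffle(data)

n = len(data)
train      = data[: int(0.6 * n)]              # 60% for training
validation = data[int(0.6 * n): int(0.8 * n)]  # 20% for tuning the model
test       = data[int(0.8 * n):]               # 20% held out for the final check
```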

There are 3 validation approaches. If one has a big dataset, use simple hold-out validation, meaning set aside part of the data as the validation set. If one has a small dataset, use K-fold validation or iterated K-fold validation with shuffling.

K-fold validation equally partitions the data into K parts. For each partition i, train the model on the remaining K-1 partitions and evaluate it on partition i, repeating the process K times. Iterated K-fold validation with shuffling repeats K-fold validation multiple times, shuffling the data each time before partitioning it into K parts.
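A bare-bones sketch of K-fold validation (the train_and_evaluate helper is a stand-in for building, fitting, and scoring a real model, which in DLP would be Keras fit()/evaluate() calls):

```python
import numpy as np

def train_and_evaluate(train, val):
    # Placeholder for training a model on `train` and scoring it on `val`.
    return float(np.mean(val))

k = 4
data = np.random.rand(400, 10)   # stand-in for a small dataset
np.random.shuffle(data)
fold_size = len(data) // k
scores = []

for i in range(k):
    # Partition i serves as the validation set ...
    val = data[i * fold_size: (i + 1) * fold_size]
    # ... and the remaining K-1 partitions form the training set.
    train = np.concatenate([data[: i * fold_size], data[(i + 1) * fold_size:]])
    scores.append(train_and_evaluate(train, val))

final_score = np.mean(scores)    # average across the K runs
```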

(3) Regularization

Regularization is used to tackle overfitting. Ch 4 lists 3 methods. First, reduce the size of the model. For DL models, this could mean reducing the number of layers or the number of hidden units per layer.
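In Keras (the library DLP uses), shrinking a model is just a matter of using fewer layers and fewer units; the sizes below are arbitrary examples:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A larger model that may overfit ...
big_model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# ... versus a smaller one with fewer layers and fewer units per layer.
small_model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```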

Second, add a penalty for more complex models. For DL models, this means forcing the weights to take small values by adding a cost to the loss function. This cost can be proportional to the absolute values of the weight coefficients (the L1 norm of the weights) or proportional to the squared values of the weight coefficients (the L2 norm of the weights).
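In Keras this is done with the kernel_regularizer argument; the 0.001 coefficient below is just a typical example value:

```python
from tensorflow.keras import layers, regularizers

# L2: adds 0.001 * sum(weight ** 2) to the layer's loss contribution.
dense_l2 = layers.Dense(16, activation="relu",
                        kernel_regularizer=regularizers.l2(0.001))

# L1: adds 0.001 * sum(abs(weight)) instead.
dense_l1 = layers.Dense(16, activation="relu",
                        kernel_regularizer=regularizers.l1(0.001))
```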

Lastly, apply dropout to a DL model. This means randomly dropping out (i.e., setting to zero) a number of output features of one or more layers during training.
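In Keras, dropout is a layer; a rate of 0.5 (a common example value) means roughly half of the previous layer's outputs are zeroed at random during each training step:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),   # zeroes 50% of the outputs above, during training only
    layers.Dense(1, activation="sigmoid"),
])
```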