Other functionalities of Keras

Ch 7 in Deep Learning with Python (DLP) goes through some of the other key functionalities offered by the Keras library.

Functional API

The Keras functional API enables us to use layers as functions that take tensors as input and return tensors as output. As a result, we can build multi-input as well as multi-output DL models.
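
Here is a minimal sketch of a two-input, two-output model built with the functional API (using the standalone keras package as in DLP; the shapes and layer names are made up for illustration):

    from keras import Input, layers
    from keras.models import Model

    # Two inputs; shapes and names are illustrative
    text_input = Input(shape=(10000,), name='text')
    meta_input = Input(shape=(8,), name='metadata')

    # Each layer is called like a function: tensor in, tensor out
    x = layers.Dense(64, activation='relu')(text_input)
    x = layers.concatenate([x, meta_input])

    # Two outputs: a regression head and a classification head
    rating = layers.Dense(1, name='rating')(x)
    sentiment = layers.Dense(1, activation='sigmoid', name='sentiment')(x)

    model = Model([text_input, meta_input], [rating, sentiment])
    model.compile(optimizer='rmsprop',
                  loss={'rating': 'mse', 'sentiment': 'binary_crossentropy'})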

We can also build directed acyclic graphs of layers, i.e., implement neural-network components as graphs rather than linear stacks. Inception is an example of such a model. However, given its large scale, vanishing gradients become a problem. Residual connections offer a solution by making the output of an earlier layer available as input to a later layer.
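
A minimal sketch of a residual connection, with illustrative shapes:

    from keras import Input, layers
    from keras.models import Model

    # padding='same' keeps the shapes compatible for the add below
    inputs = Input(shape=(32, 32, 128))
    x = layers.Conv2D(128, 3, activation='relu', padding='same')(inputs)
    x = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
    # Residual connection: inject the block's input back into its output,
    # giving gradients a shortcut around the convolutions
    outputs = layers.add([x, inputs])
    model = Model(inputs, outputs)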

We can also reuse a layer multiple times in a model. This is called layer weight sharing, and it allows us to build models with shared branches. Similarly, we can use a whole model as a layer and thereby reuse its weights.
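
A minimal sketch of weight sharing, along the lines of DLP's Siamese-LSTM example (the input shapes are illustrative):

    from keras import Input, layers
    from keras.models import Model

    # One LSTM instance, applied to both inputs: the branches share its weights
    shared_lstm = layers.LSTM(32)

    left_input = Input(shape=(None, 128))
    right_input = Input(shape=(None, 128))
    left_encoded = shared_lstm(left_input)
    right_encoded = shared_lstm(right_input)

    merged = layers.concatenate([left_encoded, right_encoded])
    score = layers.Dense(1, activation='sigmoid')(merged)
    model = Model([left_input, right_input], score)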

Other Features

We usually normalize our initial input data, but there is no guarantee that the data coming out of each layer is still normalized. Keras can re-normalize the output after each layer transformation. This is called batch normalization.
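
A minimal sketch of where a BatchNormalization layer typically goes (the surrounding network is made up for illustration):

    from keras import models, layers

    model = models.Sequential()
    model.add(layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)))
    # Re-normalizes the previous layer's activations during training
    model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='softmax'))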

A depthwise separable convolution layer performs a spatial convolution on each channel of its input separately, then mixes the output channels via a pointwise (1×1) convolution. We can use it in a DL model when spatial locations in the input are highly correlated but different channels are fairly independent. It makes the model smaller and thus faster to run, and it tends to learn better representations from less data.
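
A minimal sketch using SeparableConv2D in place of Conv2D (the architecture is made up for illustration):

    from keras import models, layers

    model = models.Sequential()
    # SeparableConv2D is a drop-in replacement for Conv2D with far fewer parameters
    model.add(layers.SeparableConv2D(32, 3, activation='relu',
                                     input_shape=(64, 64, 3)))
    model.add(layers.SeparableConv2D(64, 3, activation='relu'))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(10, activation='softmax'))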

We can use callback objects to monitor a model while it's training and take actions on the fly, such as stopping training or changing the learning rate. We can also use TensorBoard to visualize and monitor a model's performance in real time. Moreover, Keras has a utility that plots a model as a graph of layers. We tried these functionalities on our IMDb movie-review data.
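
A minimal sketch combining these pieces; the model and data here are stand-ins, not our IMDb setup:

    import numpy as np
    from keras import models, layers, callbacks
    from keras.utils import plot_model

    # Stand-in model and data so the snippet runs on its own
    model = models.Sequential([layers.Dense(1, activation='sigmoid',
                                            input_shape=(100,))])
    model.compile(optimizer='rmsprop', loss='binary_crossentropy')
    x_train = np.random.random((1000, 100))
    y_train = np.random.randint(0, 2, (1000, 1))

    callbacks_list = [
        # Stop training when the validation loss stops improving
        callbacks.EarlyStopping(monitor='val_loss', patience=2),
        # Cut the learning rate when the validation loss plateaus
        callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=1),
        # Write logs that TensorBoard can visualize
        callbacks.TensorBoard(log_dir='logs'),
    ]
    model.fit(x_train, y_train, epochs=20, validation_split=0.2,
              callbacks=callbacks_list)

    # Plot the model as a graph of layers (needs pydot and graphviz installed)
    plot_model(model, show_shapes=True, to_file='model.png')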

On the right is the 3D visualization of the word embeddings generated by TensorBoard. On the left is our model as a graph of layers (it's a known issue that the input layer isn't shown properly, and the proposed solution doesn't work for me). See here for the Python code.

Model Layer and 3D Word Embedding


Improve Model Prediction

To improve our predictions, we should optimize the hyperparameters of our model. Hyperparameters are settings such as the number of layers, the choice of activation function, the dropout rate, etc. Right now there are no good, reliable tools to carry out this optimization automatically, so we have to tune the model manually.
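
A minimal sketch of such a manual search over two hyperparameters; build_model, the search grid, and the data are illustrative stand-ins:

    import numpy as np
    from keras import models, layers

    # Stand-in data; in practice this would be our training set
    x_train = np.random.random((1000, 100))
    y_train = np.random.randint(0, 2, (1000, 1))

    def build_model(units, dropout):
        model = models.Sequential()
        model.add(layers.Dense(units, activation='relu', input_shape=(100,)))
        model.add(layers.Dropout(dropout))
        model.add(layers.Dense(1, activation='sigmoid'))
        model.compile(optimizer='rmsprop', loss='binary_crossentropy')
        return model

    # Try each combination and keep the one with the lowest validation loss
    best = None
    for units in (16, 32, 64):
        for dropout in (0.2, 0.5):
            history = build_model(units, dropout).fit(
                x_train, y_train, epochs=5, validation_split=0.2, verbose=0)
            val_loss = min(history.history['val_loss'])
            if best is None or val_loss < best[0]:
                best = (val_loss, units, dropout)
    print('best (val_loss, units, dropout):', best)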

Model ensembling is another technique for improving predictions. First, use several different models to generate a set of predictions; then take a weighted average of those predictions as the final prediction. We can search for the ensembling weights randomly or with an optimization algorithm such as Nelder-Mead.
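
A minimal sketch of the weighted-average step, using SciPy's Nelder-Mead implementation on synthetic predictions (the labels and the three models' outputs are stand-ins):

    import numpy as np
    from scipy.optimize import minimize

    # Synthetic validation labels and three models' predictions
    rng = np.random.default_rng(0)
    y_val = rng.integers(0, 2, 500).astype(float)
    preds = np.stack([np.clip(y_val + rng.normal(0, s, 500), 0, 1)
                      for s in (0.4, 0.5, 0.6)])

    def ensemble_loss(weights):
        # Normalize the weights to sum to 1, blend the predictions,
        # and score the blend with mean squared error
        w = np.abs(weights) / np.abs(weights).sum()
        blended = np.tensordot(w, preds, axes=1)
        return np.mean((blended - y_val) ** 2)

    result = minimize(ensemble_loss, x0=np.ones(3) / 3, method='Nelder-Mead')
    print('ensembling weights:', np.abs(result.x) / np.abs(result.x).sum())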