Paper: https://arxiv.org/pdf/1712.07195.pdf
Code: https://github.com/shenwei1231/caffeDeepRegressionForests
Problem Definition
Given a facial image, the objective is to estimate the age of the person in the image.
The paper illustrates the task with example images from three popular datasets, MORPH, FGNET and CACD, each labeled with the age of the subject.
Architecture
The network is built on differentiable regression forests. Unlike traditional regression forests, which perform hard data partitions, differentiable regression forests perform soft data partitions, so an input-dependent partition function can be learned to handle heterogeneous data. Differentiable regression forests can also be integrated seamlessly with any deep network, which makes it possible to build an end-to-end deep age estimation model that the authors name Deep Regression Forests (DRFs). An outline of the architecture is given in the paper.
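To make the idea of a soft data partition concrete, here is a minimal NumPy sketch of a single differentiable regression tree. It is an illustration under assumptions, not the authors' Caffe code: the function names, the breadth-first layout of the split nodes, and the use of one scalar mean per leaf are all hypothetical choices. Each split node sends a sample left with a sigmoid probability, the probability of reaching a leaf is the product of the decisions along its path, and the predicted age is the probability-weighted average of the leaf means.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_probabilities(split_logits, depth):
    """Probability of reaching each leaf of a full binary tree.

    split_logits: shape (n_samples, 2**depth - 1), one logit per split node in
    breadth-first order (assumed layout; in a DRF these values would come
    from output units of the CNN).
    """
    n = split_logits.shape[0]
    go_left = sigmoid(split_logits)       # soft decision at every split node
    probs = np.ones((n, 1))               # every sample reaches the root
    first = 0                             # index of the first node on this level
    for level in range(depth):
        width = 2 ** level
        s = go_left[:, first:first + width]
        # Each node passes its mass to its left (s) and right (1 - s) child.
        probs = np.stack([probs * s, probs * (1.0 - s)], axis=2).reshape(n, 2 * width)
        first += width
    return probs                          # shape (n_samples, 2**depth)

def predict_age(split_logits, leaf_means, depth):
    """Expected age: leaf means weighted by the soft routing probabilities."""
    return leaf_probabilities(split_logits, depth) @ leaf_means
```

Because every operation is smooth in the logits, gradients can flow from the prediction back into whatever network produces them, which is what allows the forest to be trained jointly with a CNN.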
Training and Implementation details

An alternating optimization strategy is adopted:
First, the leaf nodes are fixed, and the data partitions at the split nodes and the CNN parameters (feature learning) are optimized by back-propagation.
Then, the split nodes are fixed, and the data abstractions at the leaf nodes (local regressors) are optimized by Variational Bounding.
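The second step of the alternation can be sketched as follows. This is a hedged illustration, assuming each leaf holds a one-dimensional Gaussian over age and that the routing probabilities from the split nodes are held fixed; the function names and the variance floor `min_var` are hypothetical. The update is an EM-style weighted re-estimation of leaf means and variances, which is one reading of the Variational Bounding step and not a copy of the authors' implementation. The first step (updating the split nodes and CNN by back-propagation) is omitted here.

```python
import numpy as np

def gaussian_pdf(y, mu, var):
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def update_leaves(routing, ages, mu, var, n_iters=20, min_var=1e-3):
    """Re-estimate leaf Gaussians while the split nodes stay fixed.

    routing: (n_samples, n_leaves) probabilities of reaching each leaf.
    ages:    (n_samples,) ground-truth ages.
    mu, var: (n_leaves,) current leaf means and variances.
    """
    for _ in range(n_iters):
        lik = gaussian_pdf(ages[:, None], mu[None, :], var[None, :])
        joint = routing * lik
        # Share of each sample's likelihood that is explained by each leaf.
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        weight = resp.sum(axis=0) + 1e-12
        mu = (resp * ages[:, None]).sum(axis=0) / weight
        var = (resp * (ages[:, None] - mu[None, :]) ** 2).sum(axis=0) / weight
        var = np.maximum(var, min_var)    # assumed floor to avoid collapse
    return mu, var
```

In a full training loop, a call like this would alternate with mini-batch gradient steps on the network while the leaf parameters are frozen.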
The implementation of DRFs is based on the publicly available Caffe framework.

VGG16 Net is used for the CNN part of the proposed DRFs.

As part of the preprocessing, faces are first detected with a standard face detector, and facial landmarks are then localized with an Active Appearance Model (AAM).

The performance of age estimation is evaluated in terms of mean absolute error (MAE) as well as Cumulative Score (CS).
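Both metrics are simple to compute. The sketch below assumes the usual definitions: MAE is the average absolute difference between predicted and true ages, and CS at a threshold of l years is the percentage of test images whose absolute error is at most l. The function names and the example numbers are illustrative only and are not results from the paper.

```python
import numpy as np

def mean_absolute_error(pred, true):
    """Average absolute age error in years."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(pred - true)))

def cumulative_score(pred, true, threshold):
    """Percentage of samples whose absolute error is <= threshold years."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(pred - true) <= threshold) * 100.0)

# Made-up example: absolute errors are 2, 0, 6 and 3 years.
predicted = [25, 30, 41, 58]
actual = [23, 30, 47, 55]
mae = mean_absolute_error(predicted, actual)       # 2.75
cs5 = cumulative_score(predicted, actual, 5)       # 75.0
```

A lower MAE and a higher CS at a given threshold both indicate better age estimates.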