Problem Definition

Given a facial image, the objective is to estimate the age of the person in the image.


Example images from three popular datasets (MORPH, FG-NET and CACD); the age of each subject is shown above.


The network is based on differentiable regression forests. Unlike traditional regression forests, which perform hard data partitions, differentiable regression forests perform soft data partitions, so that an input-dependent partition function can be learned to handle heterogeneous data. In addition, differentiable regression forests can be seamlessly integrated with any deep network, which makes it possible to build an end-to-end deep age estimation model that the authors name Deep Regression Forests (DRFs). The outline of the architecture is given below.
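
Separately from the architecture outline, the idea of a soft data partition can be illustrated with a minimal sketch. It assumes a single full binary tree, a sigmoid at every split node, and one mean age per leaf; the function names (leaf_routing_probs, predict_age) and the breadth-first node layout are hypothetical choices made for this example, not taken from the authors' code. In a DRF the split logits would come from the CNN; here they are passed in directly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_routing_probs(split_logits, depth):
    """Probability that a sample reaches each leaf of a full binary tree.

    split_logits holds one value per split node in breadth-first order
    (root at index 0, children of node n at 2n+1 and 2n+2), so it must
    contain 2**depth - 1 entries.
    """
    go_left = [sigmoid(z) for z in split_logits]  # soft decision per split node
    probs = []
    for leaf in range(2 ** depth):
        node, p = 0, 1.0
        for level in range(depth):
            # Bit of the leaf index at this level: 0 = go left, 1 = go right.
            bit = (leaf >> (depth - 1 - level)) & 1
            p *= go_left[node] if bit == 0 else 1.0 - go_left[node]
            node = 2 * node + 1 + bit
        probs.append(p)
    return probs

def predict_age(split_logits, leaf_means, depth):
    """Expected age: leaf means weighted by the soft routing probabilities."""
    probs = leaf_routing_probs(split_logits, depth)
    return sum(p * m for p, m in zip(probs, leaf_means))
```

Because every sample reaches every leaf with some probability, the prediction is a smooth function of the split logits, which is what allows gradients to flow back into a network. A hard regression tree would instead pick exactly one leaf per sample.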


Training and Implementation details

  • An alternating optimization strategy is adopted:
    first, the leaf nodes are fixed, and the data partitions at the split nodes are optimized together with the CNN parameters (feature learning) by back-propagation;
    then, the split nodes are fixed, and the data abstractions at the leaf nodes (local regressors) are optimized by Variational Bounding.

  • DRFs are implemented on top of the publicly available Caffe framework.

  • VGG-16 Net is used for the CNN part of the proposed DRFs.

  • As part of the pre-processing, faces are first detected with a standard face detector, and facial landmarks are then localized by an Active Appearance Model (AAM).

  • The performance of age estimation is evaluated in terms of mean absolute error (MAE) as well as Cumulative Score (CS).
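
The second half of the alternating optimization described above (split nodes fixed, leaf nodes updated) can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: it assumes each leaf holds a one-dimensional Gaussian over age and uses an EM-style weighted update, which is one common way such a bound-based step is realized. The name update_leaves and the min_var floor are hypothetical.

```python
import math

def gaussian_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_leaves(routing, ages, means, variances, min_var=1e-6):
    """One leaf update with the split nodes (routing probabilities) held fixed.

    routing[i][l] is the fixed probability that sample i reaches leaf l.
    Assumes every leaf receives some weight and no sample's total
    likelihood underflows to zero.
    """
    n_leaves = len(means)
    weight_sum = [0.0] * n_leaves
    weighted_age = [0.0] * n_leaves
    responsibilities = []
    for probs, y in zip(routing, ages):
        # Share of each leaf in explaining this sample's age.
        joint = [p * gaussian_pdf(y, m, v)
                 for p, m, v in zip(probs, means, variances)]
        total = sum(joint)
        resp = [j / total for j in joint]
        responsibilities.append(resp)
        for l in range(n_leaves):
            weight_sum[l] += resp[l]
            weighted_age[l] += resp[l] * y
    new_means = [weighted_age[l] / weight_sum[l] for l in range(n_leaves)]
    new_vars = []
    for l in range(n_leaves):
        sq = sum(r[l] * (y - new_means[l]) ** 2
                 for r, y in zip(responsibilities, ages))
        new_vars.append(max(sq / weight_sum[l], min_var))  # keep variance positive
    return new_means, new_vars
```

In a full training loop this step would alternate with gradient updates of the split nodes and CNN parameters, during which the leaf means and variances stay fixed.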


Datasets

  • MORPH: A popular dataset for age estimation that contains more than 55,000 images of about 13,000 people of different races.
  • FG-NET: Widely used for age estimation. Contains 1,002 facial images of 82 individuals.
  • CACD: A large dataset with around 160,000 facial images of 2,000 celebrities.
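
The two evaluation metrics mentioned earlier, MAE and CS, are simple to compute on any of these datasets. The sketch below assumes that CS is reported as the percentage of test images whose absolute error does not exceed a threshold in years; the default of 5 years is a commonly used value and is an assumption here, as are the function names.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)

def cumulative_score(true_ages, predicted_ages, threshold=5):
    """Percentage of samples whose absolute error is at most `threshold` years."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages)
               if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)
```

A lower MAE and a higher CS both indicate better age estimates. CS is less sensitive to a few very large errors because it only counts whether each error falls within the threshold.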