Neural Network Pruning: a group of methods to reduce the memory footprint of a neural network by removing redundant parts.
- Train the network
- Remove the redundant parts
- Finetune the network
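The "remove the redundant parts" step needs a pruning criterion; the standard choice (and the one used in this line of work) is magnitude pruning: drop the weights with the smallest absolute value. A minimal NumPy sketch, where `magnitude_prune` and the 90% sparsity level are illustrative choices, not any library's API:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a binary mask that keeps the largest-magnitude weights.

    sparsity is the fraction of weights to remove (0.9 removes 90%).
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                 # number of weights to drop
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
mask = magnitude_prune(w, sparsity=0.9)
pruned = w * mask           # the "remove the redundant parts" step
print(mask.mean())          # fraction of surviving weights, ≈ 0.1
```

In practice the mask is applied after training and the surviving weights are then finetuned, as in the pipeline above.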
The pruned network typically has 90-95% fewer parameters while remaining about as accurate as the original model.
Question: Why can't we train the sparse network from scratch?
Answer: When the pruned network is reinitialized and trained from scratch, it doesn't reach the same accuracy and converges more slowly.
It seems that overparameterization helps during training, not during inference.
The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a subnetwork that is initialized such that — when trained in isolation — it can match the test accuracy of the original network after training for at most the same number of iterations.
This subnetwork is referred to as the winning ticket because of its lucky initialization.
Procedure to find the winning ticket:
1. Randomly initialize a neural network
2. Train until convergence
3. Prune part of the network
4. Reset the weights of the pruned network to their values from step (1)
5. Train the pruned network from that reset state and examine its convergence and accuracy to test whether it's a winning ticket or not.
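The procedure above can be sketched end-to-end on a toy problem. Here a plain linear-regression "network" in NumPy stands in for the real model; the task, dimensions, pruning rate, and `train` helper are all illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for a network: linear regression with 50 weights,
# only 5 of which are actually needed.
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = [2.0, -1.5, 1.0, -2.5, 3.0]
y = X @ w_true

def train(w, mask, steps=500, lr=0.1):
    """Gradient descent on MSE; the mask keeps pruned weights frozen at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = (w - lr * grad) * mask
    return w

w0 = rng.normal(size=50) * 0.1          # step 1: random initialization
w_trained = train(w0, np.ones(50))      # step 2: train to convergence

k = 10                                  # step 3: prune, keeping the 20%
mask = np.zeros(50)                     #         largest-magnitude weights
mask[np.argsort(np.abs(w_trained))[-k:]] = 1.0

w_reset = w0 * mask                     # step 4: reset survivors to step-1 values

w_ticket = train(w_reset, mask)         # step 5: retrain the sparse ticket
mse = np.mean((X @ w_ticket - y) ** 2)  # near zero if this is a winning ticket
```

Because the magnitude criterion here recovers the coordinates that mattered after training, the reset-and-retrain run fits the data again at 80% sparsity, which is the behavior the hypothesis predicts.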
So winning tickets train faster, but if they are randomly reinitialized, the resulting networks train more slowly than the full network.
The Lottery Ticket Conjecture: Returning to our motivating question, we extend our hypothesis into an untested conjecture that SGD seeks out and trains a subset of well-initialized weights. Dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket.
- This paper is a fascinating read. The authors did a very thorough investigation to validate the hypothesis.
- I think the next breakthrough will be that, instead of pruning as a post-processing step, the optimization directly operates in a sparse subspace. Interesting to see what's going to happen.
- Jonathan Frankle and Michael Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (ICLR 2019)