Tuning LDA using Differential Evolution - LDADE

A. Agrawal, Wei Fu, & T. Menzies (2016). What is Wrong with Topic Modeling? (and How to Fix it Using Search-based SE). http://arxiv.org/abs/1608.08176

This paper proposes a novel way of tuning LDA's hyperparameters $\alpha$ and $\beta$ and the number of topics $k$. The method is Differential Evolution (DE), a black-box optimization algorithm widely used outside the machine learning domain, hence the name LDADE. The paper also reviews multiple papers in this domain and points out that running LDA without tuning leads to unstable topic formation. Several tuning methods have been proposed before to address this topic instability, e.g. LDA-GA (LDA with a Genetic Algorithm). The paper claims that LDADE is more stable and converges faster.

$\alpha$ -> Dirichlet prior on the per-document topic distribution. It is usually initialized uniformly (the same value for every topic), and the topic distribution is then updated via Bayesian inference.

$\beta$ -> Dirichlet prior on the per-topic vocabulary (word) distribution.

$k$ -> Number of topics. This choice directly affects the quality and stability of the topics LDA produces.
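To make the three tunable values concrete, here is a minimal sketch of how a search space over $k$, $\alpha$ and $\beta$ might be written down and how a raw candidate vector from an optimizer could be turned into valid LDA settings. The bounds, the `SEARCH_SPACE` and `LdaParams` names, and the `decode` helper are all assumptions made for illustration; they are not taken from the paper's code.

```python
from dataclasses import dataclass

# Illustrative bounds only (assumed for this sketch).
SEARCH_SPACE = {
    "k": (10, 100),        # number of topics
    "alpha": (0.01, 1.0),  # document-topic Dirichlet prior (must stay > 0)
    "beta": (0.01, 1.0),   # topic-word Dirichlet prior (must stay > 0)
}


@dataclass(frozen=True)
class LdaParams:
    k: int
    alpha: float
    beta: float


def decode(candidate):
    """Map a raw (k, alpha, beta) vector to valid LDA settings."""

    def clip(value, name):
        lo, hi = SEARCH_SPACE[name]
        return min(max(value, lo), hi)

    raw_k, raw_alpha, raw_beta = candidate
    return LdaParams(
        k=int(round(clip(raw_k, "k"))),  # k must be an integer
        alpha=clip(raw_alpha, "alpha"),
        beta=clip(raw_beta, "beta"),
    )


# A candidate with a fractional k and an out-of-range beta.
params = decode([37.6, 0.25, 1.4])
```

If you use scikit-learn's `LatentDirichletAllocation`, these three values would map to its `n_components`, `doc_topic_prior` and `topic_word_prior` arguments respectively.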

Differential Evolution can be used to minimize any function: it searches the parameter space for the parameter values that minimize the target function. It tends to converge quickly, because it narrows down the search space within a few generations. You can think of it this way: out of all the mutations happening in a population, only the favourable ones are carried forward to the next generation. Similarly, DE starts with a population of candidate points in the parameter space and, from one generation to the next, nudges candidates toward the parameters that performed best, keeping a new candidate only if it improves the function DE is trying to minimize. See the reference below for more details.
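The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration of one common DE variant (often written DE/rand/1/bin), not the paper's implementation: the function name, the default values for `pop_size`, `f`, `cr` and `generations`, and the toy `sphere` objective are all assumptions. In LDADE the objective would instead score an LDA run built from the candidate's $k$, $\alpha$ and $\beta$.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.7, cr=0.3,
                           generations=100, seed=0):
    """Minimize `objective` inside box `bounds` with a basic DE loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start from candidates drawn uniformly inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [objective(c) for c in population]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct candidates other than the current one.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # one coordinate always mutates
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    # Mutation: move from a along the difference b - c.
                    value = population[a][d] + f * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip into bounds
                else:
                    value = population[i][d]  # inherit from the parent
                trial.append(value)
            trial_score = objective(trial)
            # Selection: the trial replaces its parent only if no worse.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The selection step is what carries only the favourable mutations forward: a trial candidate that scores worse than its parent is simply discarded, so the best score in the population never gets worse from one generation to the next.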

Reference

@article{DBLP:journals/corr/AgrawalFM16,
  author    = {Amritanshu Agrawal and
               Wei Fu and
               Tim Menzies},
  title     = {What is Wrong with Topic Modeling? (and How to Fix it Using Search-based
               {SE)}},
  journal   = {CoRR},
  volume    = {abs/1608.08176},
  year      = {2016},
  url       = {http://arxiv.org/abs/1608.08176},
  archivePrefix = {arXiv},
  eprint    = {1608.08176},
  timestamp = {Mon, 03 Sep 2018 16:48:23 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/AgrawalFM16},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
About Haridas Narayanaswamy, "Hari"
Searching for models close to data source behaviour.