A. Agrawal, Wei Fu, & T. Menzies (2016).
What is Wrong with Topic Modeling? (and How to Fix it Using Search-based SE).

This paper proposes a novel way of tuning LDA's hyperparameters $\alpha$, $\beta$
and the number of topics $k$. The method used here is the Differential Evolution (DE)
algorithm, a black-box optimization method widely used outside the machine-learning
domain, hence the name LDADE. The paper also reviews multiple papers in this
domain and points out that running LDA without tuning leads to unstable topic
formation. Several tuning methods have been proposed before to address this
instability, e.g. LDA-GA (LDA with a Genetic Algorithm). The paper claims that
LDADE is more stable and converges faster than LDA-GA.
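To make "topic instability" concrete, one can compare the top words of two LDA runs on the same corpus (for example with the documents shuffled). The sketch below uses a simple Jaccard-overlap score, assumed here purely for illustration; it is not the paper's own stability metric. `jaccard` and `topic_stability` are hypothetical helpers, each run is assumed to be a list of topics given as their top words, and the matching is greedy rather than one-to-one.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topic_stability(run_a, run_b):
    """Match each topic of run_a to its most similar topic in run_b and
    average the best Jaccard scores: 1.0 means identical topics,
    values near 0 mean the two runs disagree completely."""
    best = [max(jaccard(t, u) for u in run_b) for t in run_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Hypothetical top-3 words from two LDA runs on shuffled copies of a corpus.
    run1 = [["bug", "fix", "patch"], ["test", "unit", "assert"]]
    run2 = [["test", "assert", "mock"], ["bug", "patch", "crash"]]
    print(topic_stability(run1, run2))  # 0.5
```

Each topic here shares only two of four distinct words with its best match, so the runs score 0.5; an untuned LDA that reshuffles its topics between runs would score low on a measure like this.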

$\alpha$ -> Dirichlet prior on each document's distribution over topics; initialized
uniformly at the start and updated via Bayesian inference.

$\beta$ -> Dirichlet prior on each topic's distribution over the vocabulary.

$k$ -> Number of topics. Its value directly affects LDA's performance.
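The effect of $\alpha$ can be seen by sampling from a Dirichlet distribution directly. This is a minimal sketch assuming a symmetric prior and using only the Python standard library: a Dirichlet($\alpha$) sample is obtained by normalising independent Gamma($\alpha$, 1) draws. Small $\alpha$ concentrates each simulated document on a few topics, while large $\alpha$ spreads it evenly; $\beta$ acts the same way over the vocabulary. `sample_dirichlet` and `mean_max_share` are hypothetical helpers.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """One draw from a symmetric Dirichlet(alpha) over k components,
    built by normalising independent Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_max_share(alpha, k=5, n=1000, seed=0):
    """Average weight of the dominant topic across n simulated documents.
    Values near 1 mean sparse topic mixtures; values near 1/k mean uniform ones."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n)) / n


if __name__ == "__main__":
    for a in (0.1, 1.0, 10.0):
        print(f"alpha={a:>4}: dominant-topic share ~ {mean_max_share(a):.2f}")
```

Running it shows the dominant-topic share falling as $\alpha$ grows, which is why $\alpha$ (together with $\beta$ and $k$) is worth tuning rather than leaving at a default.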

Differential Evolution (DE) can be used to minimize any function, since it needs
only function evaluations, not gradients: it searches the parameter space for the
parameters that minimize the target function. It typically converges quickly and
prunes the search space rapidly. You can think of it like natural selection: out of
all the mutations happening in a population, only the favourable ones are carried
forward to the next generation. Similarly, DE starts with a population of candidate
points spread over the parameter space; in each generation it builds new candidates
by adding scaled differences between existing members, and a new candidate replaces
its parent only if it does at least as well on the objective DE is trying to
minimize. Please refer to the links below for further explanation.
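The DE loop described above can be sketched as follows. This is a minimal standard-library implementation of the classic DE/rand/1/bin scheme, assumed here for illustration; it is not the authors' LDADE code, and the defaults for population size, mutation factor `f` and crossover rate `cr` are common textbook choices rather than the paper's settings. `de_minimize` and `sphere` are hypothetical names.

```python
import random


def de_minimize(objective, bounds, pop_size=15, f=0.8, cr=0.9,
                generations=100, seed=0):
    """Minimise `objective` over the box `bounds` using classic DE/rand/1/bin.

    bounds: list of (low, high) pairs, one per parameter.
    Returns (best_parameters, best_score).
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 population members")
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population: points drawn uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation uses three distinct members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; j_rand forces at least one mutated coordinate.
            j_rand = rng.randrange(dim)
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == j_rand or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: the trial replaces its parent only if no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    x, s = de_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
    print("best parameters:", x, "score:", s)
```

For an LDADE-style search, `bounds` would instead span $k$, $\alpha$ and $\beta$ (with $k$ rounded to an integer inside the objective), and the objective would be some measure of topic instability to minimize. Each evaluation then means fitting LDA, so an optimizer that needs few generations keeps the tuning affordable.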


@article{DBLP:journals/corr/AgrawalFM16,
  author    = {Amritanshu Agrawal and
               Wei Fu and
               Tim Menzies},
  title     = {What is Wrong with Topic Modeling? (and How to Fix it Using Search-based
               SE)},
  journal   = {CoRR},
  volume    = {abs/1608.08176},
  year      = {2016},
  url       = {http://arxiv.org/abs/1608.08176},
  archivePrefix = {arXiv},
  eprint    = {1608.08176},
  timestamp = {Mon, 03 Sep 2018 16:48:23 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/AgrawalFM16},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}