A library for the LDA (latent Dirichlet allocation) topic modelling algorithm, written in Python and C.
The best way to use liblda is through the command line:
./run.py --docs docs.txt --numT 40 --vocab vocab.txt --seed 3 --iter 400 --alpha 0.1 --beta 0.01 --save_probs --print_topics 10
where:
- docs.txt contains one document per line,
- vocab.txt contains the vocabulary (one word per line), and
- --save_probs indicates that you want to output the probabilities phi (topic-word) and theta (document-topic).
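In the usual LDA formulation, phi and theta are smoothed point estimates computed from the Gibbs sampler's count matrices: theta[d, t] = (n_dt + alpha) / (n_d + T*alpha) and phi[t, w] = (n_tw + beta) / (n_t + V*beta), where T is the number of topics and V the vocabulary size. The numpy sketch below shows that computation. The function name estimate_phi_theta and the count-matrix layout are hypothetical, and this is the textbook estimator, not necessarily the exact code liblda uses for --save_probs.

```python
import numpy as np


def estimate_phi_theta(n_dt, n_tw, alpha, beta):
    """Smoothed estimates of phi and theta from LDA count matrices (sketch).

    n_dt: (D, T) counts, tokens in document d assigned to topic t
    n_tw: (T, V) counts, occurrences of word w assigned to topic t
    """
    n_dt = np.asarray(n_dt, dtype=float)
    n_tw = np.asarray(n_tw, dtype=float)
    T = n_dt.shape[1]
    V = n_tw.shape[1]
    # theta[d, t] = (n_dt + alpha) / (n_d + T*alpha); each row sums to 1
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + T * alpha)
    # phi[t, w] = (n_tw + beta) / (n_t + V*beta); each row sums to 1
    phi = (n_tw + beta) / (n_tw.sum(axis=1, keepdims=True) + V * beta)
    return phi, theta


# Toy counts: 2 documents, 2 topics, 3-word vocabulary, 8 tokens in total.
n_dt = [[3, 1], [0, 4]]
n_tw = [[2, 1, 0], [1, 0, 4]]
phi, theta = estimate_phi_theta(n_dt, n_tw, alpha=0.1, beta=0.01)
print(phi.round(3))
print(theta.round(3))
```

The alpha and beta smoothing terms keep every probability strictly positive, even for words a topic never generated in the sample.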
Place the directory liblda somewhere in your Python path.
We have implemented the Gibbs sampling approach, which is fairly efficient when done in C. All the other functionality is written in Python, so the code is easy to hack on.
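To make the sampling step concrete, here is a minimal pure-numpy sketch of collapsed Gibbs sampling for LDA. It illustrates the standard technique and is not liblda's C implementation. The function name gibbs_lda and its input format (each document given as a list of integer word ids) are assumptions; alpha, beta and iters presumably correspond to the --alpha, --beta and --iter options shown above, but that mapping is also an assumption.

```python
import numpy as np


def gibbs_lda(docs, V, T, alpha, beta, iters, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in range(V)
    Returns count matrices n_dt (D x T), n_tw (T x V) and topic assignments z.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dt = np.zeros((D, T), dtype=np.int64)  # tokens in doc d assigned to topic t
    n_tw = np.zeros((T, V), dtype=np.int64)  # times word w is assigned to topic t
    n_t = np.zeros(T, dtype=np.int64)        # total tokens assigned to topic t

    # Random initial topic for every token, then tally the counts.
    z = [rng.integers(0, T, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dt[d, t] += 1
            n_tw[t, w] += 1
            n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from all counts.
                n_dt[d, t] -= 1
                n_tw[t, w] -= 1
                n_t[t] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalised.
                p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + V * beta)
                t = rng.choice(T, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = t
                n_dt[d, t] += 1
                n_tw[t, w] += 1
                n_t[t] += 1
    return n_dt, n_tw, z


# Toy corpus: 3 documents over a 5-word vocabulary.
docs = [[0, 1, 0, 2], [3, 4, 3, 4], [0, 2, 1]]
n_dt, n_tw, z = gibbs_lda(docs, V=5, T=2, alpha=0.1, beta=0.01, iters=50, seed=3)
print(n_dt)
print(n_tw)
```

The triple loop over iterations, documents and tokens is exactly the part that is slow in pure Python, which is why doing it in C pays off while the bookkeeping around it can stay in Python.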
Requirements:
- numpy (for arrays)
- scipy (for weave)
The code base works, but is a bit of a mess right now. A rewrite in Cython has begun.
Ivan Savov, first dot last at gmail