Tensor Factorization: Statistically Recover Hidden Topics for New York Times

Click the Play, Pause and Stop buttons to animate the different steps of the algorithm.
Click Iterations up and Iterations down to show the state of the algorithm at different iterations.
Change the frequency of the animation using the Period Up and Period Down buttons.
Hover over a line to highlight it.


Number of iterations:
Frequency of the animation:

This example demonstrates the recovery of topics from the New York Times dataset.
The dataset contains 102,660 distinct words, 300,000 documents, and 100,000,000 words in total. Individual document names (i.e. an identifier for each docID) are not provided for copyright reasons.

The graphic and the table show the estimated probabilities of each word belonging to each topic, as recovered by the tensor algorithm (only the top words for the top topics are shown). The data shown corresponds to the iteration of the algorithm indicated in the Number of iterations field.
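
As a rough illustration of how a "top words per topic" view like this can be derived, the sketch below ranks words by their estimated probability within each topic. It is not the authors' code: the function name top_words_per_topic, the layout of topic_word (one row per topic, one column per vocabulary word) and the toy numbers are all assumptions made for the example.

```python
def top_words_per_topic(topic_word, vocab, k=3):
    """Return the k highest-probability (word, probability) pairs for each topic.

    topic_word: list of rows, one per topic; row[j] is the estimated
                probability associated with vocab[j] (assumed layout).
    vocab:      list of words, aligned with the columns of topic_word.
    """
    result = []
    for probs in topic_word:
        # Rank word indices by descending probability within this topic.
        ranked = sorted(range(len(vocab)), key=lambda j: probs[j], reverse=True)
        result.append([(vocab[j], probs[j]) for j in ranked[:k]])
    return result


# Toy data, made up for illustration (not taken from the New York Times results).
vocab = ["market", "stock", "school", "student", "game", "team"]
topic_word = [
    [0.40, 0.35, 0.05, 0.05, 0.10, 0.05],  # hypothetical "Economy"-like topic
    [0.05, 0.05, 0.45, 0.30, 0.05, 0.10],  # hypothetical "Education"-like topic
]

for topic_id, words in enumerate(top_words_per_topic(topic_word, vocab, k=2)):
    print(topic_id, words)
```

Running the same ranking on the estimates saved at each iteration would give one such table per iteration, which is the kind of per-iteration state the animation steps through.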

Our algorithm recovered topics such as Economy, Education, Sports, Online Social Media and Crime Reports.

Acknowledgements

Topic modeling code: Furong Huang & Forough Arabshahi

Visualization: Oriol Julià Carrillo