Group: com.github.sergejzr.lib - All Dependencies

Diversity 0.0.1

Efficient Diversity Computation of Large Datasets · We propose two efficient algorithms for exploring topic diversity in large document corpora such as user-generated content on the social web, bibliographic data, or other web repositories. Analyzing diversity is useful for obtaining insights into knowledge evolution, trends, periodicities, and topic heterogeneity of such collections. Calculating diversity statistics requires averaging the similarity over all object pairs, which is computationally prohibitive for large corpora. Our proposed algorithms overcome the quadratic complexity of the average pairwise similarity computation and allow for constant-time (depending on dataset properties) or linear-time approximation with probabilistic guarantees.
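
To make the cost argument concrete, here is a minimal sketch of the quantity being approximated. It is not this library's API: the class name DiversitySketch, the method names, the choice of Jaccard similarity over term sets, and the plain random-pair estimator are all assumptions made for illustration. The exact method visits every pair and is therefore quadratic in the number of documents; the sampled method evaluates a fixed number of random pairs, so its cost does not grow with the corpus, but it comes without the probabilistic guarantees the description mentions.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not part of the Diversity artifact.
public class DiversitySketch {

    /** Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (String term : smaller) {
            if (larger.contains(term)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    /** Exact average over all n(n-1)/2 unordered pairs: quadratic in n. */
    static double exactAverageSimilarity(List<Set<String>> docs) {
        int n = docs.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two documents");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += jaccard(docs.get(i), docs.get(j));
            }
        }
        long pairs = (long) n * (n - 1) / 2;
        return sum / pairs;
    }

    /** Estimate from `samples` uniformly drawn pairs of distinct documents. */
    static double sampledAverageSimilarity(List<Set<String>> docs, int samples, Random rnd) {
        int n = docs.size();
        if (n < 2 || samples < 1) {
            throw new IllegalArgumentException("need at least two documents and one sample");
        }
        double sum = 0.0;
        for (int k = 0; k < samples; k++) {
            int i = rnd.nextInt(n);
            int j = rnd.nextInt(n - 1);
            if (j >= i) {
                j++; // shift past i so the two indices always differ
            }
            sum += jaccard(docs.get(i), docs.get(j));
        }
        return sum / samples;
    }
}
```

A lower average similarity indicates a more diverse collection. In this sketch the sample count trades accuracy for speed, and passing a seeded Random makes the estimate reproducible.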

Sep 10, 2016
0 stars
