This is essentially a variant of the Bloom filter adapted for counting how often items occur.
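The relationship to the Bloom filter can be illustrated with a minimal sketch of a Count-Min structure, the technique named in the reference below. Instead of setting bits, each hash function increments a counter in its own row, and a query returns the minimum across rows. The class name, the `width` and `depth` defaults, and the use of salted BLAKE2b as the hash family are assumptions made for illustration; they are not taken from the original notes or the paper.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: depth rows of width counters each."""

    def __init__(self, width=1000, depth=5):
        # width and depth are hypothetical defaults chosen for the example.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: one salted BLAKE2b hash per row stands in for the
        # family of independent hash functions the technique requires.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Like a Bloom filter insert, but counters are incremented
        # instead of bits being set.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=64, depth=4)
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
print(sketch.estimate("a"))  # never less than the true count of 3
```

The estimate never undercounts; it can overcount when other items collide with the queried item in every row, which becomes less likely as `width` and `depth` grow.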
# Cormode, Graham, and Shan Muthukrishnan. "An improved data stream summary: the count-min sketch and its applications." Journal of Algorithms 55.1 (2005): 58-75.
== Clustering ==
# Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press, 2014. [http://www.mmds.org/ MMDS_book]
* Probabilistic data structures
** Probabilistic data structures in big data processing, https://www.cnblogs.com/fxjwind/p/3289221.html
** https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
* Cardinality estimation algorithms
** Understanding Cardinality Estimation algorithms (Part 1: Basic Concepts), http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-part-i.html
** Understanding Cardinality Estimation algorithms (Part 2: Linear Counting), http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-part-ii.html
** Understanding Cardinality Estimation algorithms (Part 3: LogLog Counting), http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-part-iii.html
** Understanding Cardinality Estimation algorithms (Part 4: HyperLogLog Counting and Adaptive Counting), http://blog.codinglabs.org/articles/algorithms-for-cardinality-estimation-part-iv.html