https://en.wikipedia.org/wiki/Analytics.
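=Algorithms for Data Analysis=
== Frequent Itemset Discovery ==
Find frequent itemsets, that is, sets of items that occur together in many records. The best-known algorithm for this task is the A-Priori algorithm.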
# Gangyi Zhu et al., SciCSM: Novel Contrast Set Mining over Scientific Datasets Using Bitmap Indices, SSDBM 2015.
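The A-Priori algorithm named above can be sketched as follows. This is a minimal, unoptimized illustration, assuming transactions are small in-memory sets and support is an absolute count; the function name `apriori`, the parameter `min_support`, and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Hypothetical helper: `min_support` is an absolute count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Build size-k candidates from frequent (k-1)-itemsets and keep only
        # those whose every (k-1)-subset is frequent (the A-Priori pruning step).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Pass k: count the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Illustrative data only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With these baskets and a support threshold of 3, the only frequent pair is {bread, milk}; every other pair appears in just two baskets, so no triple is ever counted.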
== Counting Distinct Elements ==
The FM (Flajolet-Martin) algorithm.
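The idea behind Flajolet-Martin can be sketched as follows: hash every element, track the largest number of trailing zero bits seen, and use 2 raised to that number as the estimate. The sketch below assumes salted SHA-256 as the hash family and takes the median across hashes; both choices, and the names `hash32` and `fm_estimate`, are illustrative assumptions rather than part of the original algorithm description.

```python
import hashlib
import statistics


def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n (returns `width` when n == 0)."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1


def hash32(value, salt):
    """Hypothetical hash family: salted SHA-256 truncated to 32 bits."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fm_estimate(stream, num_hashes=16):
    """Rough distinct-count estimate for the items in `stream`."""
    max_zeros = [0] * num_hashes
    for value in stream:
        for i in range(num_hashes):
            z = trailing_zeros(hash32(value, i))
            if z > max_zeros[i]:
                max_zeros[i] = z
    # Each hash function yields the estimate 2**R; the median damps outliers.
    return statistics.median(2 ** r for r in max_zeros)


# Illustrative stream: 1000 events drawn from 100 distinct users.
events = [f"user{i % 100}" for i in range(1000)]
estimate = fm_estimate(events)
```

Because only the maximum per hash function is stored, repeated elements never change the state, which is why the memory cost is independent of the stream length.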
== Counting Within a Window ==
The DGIM (Datar-Gionis-Indyk-Motwani) algorithm.
# Datar, M., Gionis, A., Indyk, P., & Motwani, R. (2002). Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6), 1794-1813.
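A minimal sketch of the DGIM idea for counting 1s in the last N bits of a stream: keep buckets whose sizes are powers of two, allow at most two buckets per size, and merge the two oldest whenever a third appears. The class name `DGIM` and its methods are hypothetical, and the sketch assumes the common textbook variant in which half of the oldest bucket is counted.

```python
class DGIM:
    """Approximate count of 1s in the last `window_size` bits of a stream."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.time = 0
        # Buckets are (timestamp of most recent 1, size), newest first.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its most recent 1 has left the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window_size:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Three buckets of one size: merge the two oldest, then cascade upward.
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            newer, older = self.buckets[i + 1], self.buckets[i + 2]
            self.buckets[i + 1:i + 3] = [(newer[0], newer[1] + older[1])]
            i += 1

    def count(self):
        """Sum of all bucket sizes, counting only half of the oldest bucket."""
        if not self.buckets:
            return 0
        return sum(size for _, size in self.buckets[:-1]) + self.buckets[-1][1] / 2


# Illustrative use: ten 1s, all still inside a window of 16 bits.
counter = DGIM(window_size=16)
for _ in range(10):
    counter.add(1)
approx = counter.count()
```

Only the oldest bucket can straddle the window boundary, so it is the sole source of error; counting half of it keeps the estimate within 50% of the true count while storing a number of buckets that grows logarithmically with the window size.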
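== Cardinality Estimation ==
Cardinality estimation refers to approximate algorithms that estimate the number of distinct items in a set, for example, the number of distinct IP addresses that visit a website.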
# Heule, Stefan, Marc Nunkesser, and Alexander Hall. "HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm." In Proceedings of the 16th International Conference on Extending Database Technology, pp. 683-692. ACM, 2013.
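As an illustration of cardinality estimation on the distinct-IP example, the following is a simplified sketch of the basic HyperLogLog estimator with the small-range linear-counting correction. It is not the tuned variant described in the reference above: the bias corrections and sparse representation are omitted, the hash (SHA-256 truncated to 64 bits) is an assumption, and the class name and sample addresses are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch; the alpha constant assumes p >= 7."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")    # 64-bit hash (assumed choice)
        idx = x >> (64 - self.p)                 # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small range: fall back to linear counting on empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


# Illustrative data: 20000 distinct addresses, the first 5000 seen twice.
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(20000)]
hll = HyperLogLog(p=10)
for ip in ips + ips[:5000]:
    hll.add(ip)
unique_visitors = hll.estimate()
```

With 1024 registers the expected relative error is roughly 1.04 / sqrt(1024), or about 3%, and the sketch uses a fixed amount of memory no matter how many addresses are added.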
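== Clustering ==
Clustering is the process of grouping a set of points into clusters according to some distance measure. The goal is for points within the same cluster to be close to one another, while points in different clusters are far apart.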
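One common way to pursue this objective is k-means, sketched below under the assumption of Euclidean distance. Using the first k points as initial centroids is a simplification chosen to keep the example deterministic; the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with Euclidean distance; returns (centroids, labels)."""
    # Assumption: deterministic initialization from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Illustrative data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On this data the two groups end up in separate clusters even though both initial centroids come from the first group, because each update step pulls the centroids toward the means of their assigned points.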
==Correlation Mining==
===Correlation Metrics===
Earth Mover's Distance (EMD)
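For intuition, the Earth Mover's Distance between two one-dimensional histograms can be computed in closed form. The sketch below assumes both histograms share the same equally spaced bins, carry equal total mass, and that moving one unit of mass by one bin costs 1; the function name `emd_1d` is hypothetical, and the general multi-dimensional case requires solving a transportation problem instead.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1-D histograms over the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # The running surplus after each bin is the mass that must be carried
    # across the boundary to the next bin; the cost is its absolute value.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


# Moving one unit of mass two bins to the right costs 2.
distance = emd_1d([1, 0, 0], [0, 0, 1])
```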
== Subgroup Discovery (Subgroup Mining) ==
Accelerating subgroup discovery methods with bitmap indices.
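A minimal sketch of why bitmap indices help here: if each (attribute, value) pair owns a bitmap with one bit per record, the size of any conjunctive subgroup is the population count of a bitwise AND, with no rescan of the data. Python integers stand in for the bitmaps; the function names and sample records are hypothetical, and this illustrates only the general idea, not a specific published method.

```python
def build_bitmap_index(rows):
    """Map each (attribute, value) pair to an int whose bit i marks row i."""
    index = {}
    for i, row in enumerate(rows):
        for attr, value in row.items():
            key = (attr, value)
            index[key] = index.get(key, 0) | (1 << i)
    return index


def subgroup_count(index, conditions):
    """Number of rows that satisfy every (attribute, value) condition."""
    if not conditions:
        raise ValueError("at least one condition is required")
    combined = index.get(conditions[0], 0)
    for cond in conditions[1:]:
        combined &= index.get(cond, 0)   # bitwise AND intersects the row sets
    return bin(combined).count("1")      # population count of the result


# Illustrative records only.
rows = [
    {"region": "north", "sensor": "A", "alarm": "yes"},
    {"region": "north", "sensor": "B", "alarm": "no"},
    {"region": "south", "sensor": "A", "alarm": "yes"},
    {"region": "north", "sensor": "A", "alarm": "yes"},
]
index = build_bitmap_index(rows)
north_a = subgroup_count(index, [("region", "north"), ("sensor", "A")])
```

The index is built in a single pass, after which each candidate subgroup examined during the search costs only a few word-level AND operations.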