Changes - iCenter Wiki

Data Analytics

Added 40 bytes, 22 August 2018 (Wed) 13:20

https://en.wikipedia.org/wiki/Analytics.

= Algorithms for Data Analytics =

== Frequent Itemset Discovery ==

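The task is to find frequent itemsets, i.e. sets of items that occur together in many transactions. The best-known algorithm for this is the A-Priori algorithm.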

#Gangyi Zhu et al., SciCSM: Novel Contrast Set Mining over Scientific Datasets Using Bitmap Indices, SSDBM 2015.
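A minimal Python sketch of the A-Priori idea, assuming transactions are given as sets of hashable items and `min_support` is an absolute count. It relies on the monotonicity property: an itemset can only be frequent if all of its subsets are frequent, so candidates of size k are built only from frequent itemsets of size k-1 and then counted in one pass over the data. The function name `apriori` and its signature are illustrative, not taken from any library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets and
        # keep only those whose (k-1)-subsets are all frequent (A-Priori pruning).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One counting pass over the data for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, `apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 3)` reports every single item and every pair as frequent, while `{"a", "b", "c"}` (support 2) is not.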

== Counting Distinct Elements ==

The FM (Flajolet-Martin) algorithm
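A minimal sketch of the Flajolet-Martin idea in Python: hash each element, record the largest number of trailing zero bits R seen, and use 2^R as the estimate. Because a single estimate is very noisy, this sketch combines many hash functions by averaging within small groups and taking the median of the group averages. The hash family (SHA-256 over a seed prefix) and the parameters `num_hashes` and `group_size` are assumptions chosen for illustration.

```python
import hashlib
import statistics


def _hash(item, seed):
    # Hypothetical hash family: SHA-256 of "seed:item", truncated to 64 bits.
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _tail_length(x):
    # Number of trailing zero bits in x (64 if x == 0, matching the 64-bit hash).
    return 64 if x == 0 else (x & -x).bit_length() - 1


def fm_estimate(stream, num_hashes=64, group_size=8):
    """Estimate the number of distinct elements in `stream`."""
    max_tail = [0] * num_hashes  # R for each hash function
    for item in stream:
        for seed in range(num_hashes):
            r = _tail_length(_hash(item, seed))
            if r > max_tail[seed]:
                max_tail[seed] = r
    estimates = [2 ** r for r in max_tail]
    # Combine: average within small groups, then take the median of the averages.
    groups = [estimates[i:i + group_size] for i in range(0, num_hashes, group_size)]
    return statistics.median(sum(g) / len(g) for g in groups)
```

Repeated elements never change the result, because only the maximum tail length per hash function is kept.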

== Counting over a Sliding Window ==

The DGIM (Datar-Gionis-Indyk-Motwani) algorithm

#Datar, M., Gionis, A., Indyk, P., & Motwani, R. (2002). Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6), 1794-1813.
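A minimal sketch of DGIM in Python, assuming a stream of 0/1 bits and a fixed window size N. The 1-bits are grouped into buckets whose sizes are powers of two, with at most two buckets of each size, so the state stays logarithmic in N. The estimate adds up all bucket sizes but counts only half of the oldest bucket, since that bucket may straddle the window boundary. The class `DGIM` and its methods are hypothetical names.

```python
from collections import deque


class DGIM:
    """Approximate count of 1-bits among the last `window_size` bits of a stream."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.time = 0
        # Each bucket is (timestamp of its most recent 1, number of 1s = power of 2),
        # ordered newest first.
        self.buckets = deque()

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its timestamp falls outside the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window_size:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.appendleft((self.time, 1))
        # Keep at most two buckets of each size: when three share a size,
        # merge the two oldest into one bucket of twice the size.
        i = 0
        while i + 2 < len(self.buckets) and (
            self.buckets[i][1] == self.buckets[i + 1][1] == self.buckets[i + 2][1]
        ):
            timestamp, size = self.buckets[i + 1]
            del self.buckets[i + 2]
            self.buckets[i + 1] = (timestamp, 2 * size)
            i += 1

    def count(self):
        """Sum of all bucket sizes, counting only half (rounded up) of the oldest."""
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2
```

Because every smaller bucket size is present at least once, the error from the oldest bucket is at most half of the true count.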

== Cardinality Estimation ==

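Cardinality estimation refers to approximation algorithms that estimate the number of distinct items in a set, for example the number of unique IP addresses that visit a website.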

# Heule, Stefan, Marc Nunkesser, and Alexander Hall. "HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm." In Proceedings of the 16th International Conference on Extending Database Technology, pp. 683-692. ACM, 2013.
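A simplified HyperLogLog sketch in Python, assuming a 64-bit hash (here the first 8 bytes of SHA-256, an arbitrary choice). The first p bits of the hash select one of m = 2^p registers; each register keeps the maximum position of the leftmost 1-bit seen in the remaining bits; the estimate is a bias-corrected harmonic mean over the registers, with a fallback to linear counting for small cardinalities. It omits the empirical bias correction and sparse representation described by Heule et al., and the `alpha` formula used here assumes p >= 7. The class name `HyperLogLog` is illustrative.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # Assumed 64-bit hash: first 8 bytes of SHA-256 of str(item).
        return int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        idx = x >> (64 - self.p)                 # first p bits pick a register
        w = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # rho = 1-based position of the leftmost 1-bit in w (64 - p + 1 if w == 0).
        rho = (64 - self.p) - w.bit_length() + 1
        if rho > self.registers[idx]:
            self.registers[idx] = rho

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)       # linear counting for small sets
        return raw
```

For the unique-IP example, one would call `add(ip)` for every request and read `estimate()` at the end; memory stays at m small registers regardless of traffic volume.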

== Clustering ==

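Clustering is the process of grouping a set of points into clusters according to some distance measure. The goal is for points in the same cluster to be close to one another, and for points in different clusters to be far apart.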
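As one concrete instance, here is a minimal Python sketch of k-means (Lloyd's algorithm) with Euclidean distance. The text above does not name a specific clustering algorithm, so the choice of k-means, the farthest-first initialization, and the function `kmeans` are assumptions for illustration. Each iteration assigns every point to its nearest centroid and then moves each centroid to the mean of its cluster, which never increases the within-cluster sum of squared distances.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Farthest-first initialization: start from the first point, then repeatedly
    # pick the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def nearest(p):
        # Index of the centroid closest to p (reads the current `centroids`).
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    labels = [nearest(p) for p in points]
    return centroids, labels
```

Usage: `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three near (10, 10). Farthest-first initialization is deterministic but sensitive to outliers; k-means++ is a common randomized alternative.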

== Correlation Mining ==

=== Correlation Metrics ===

Earth Mover's Distance (EMD)
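For one-dimensional histograms over the same equally spaced bins, with the distance between adjacent bins taken as 1, EMD has a closed form: the sum of absolute differences between the two cumulative distributions. The Python sketch below assumes exactly that setting and normalizes both histograms to unit mass; the general case with arbitrary ground distances is a transportation problem that needs a linear-programming or min-cost-flow solver. `emd_1d` is a hypothetical helper name.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1-D histograms on the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # Normalize both histograms to total mass 1.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mass that must cross each bin boundary = |CDF_p - CDF_q| at that boundary.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
```

For instance, moving all mass from the first of three bins to the last costs 2, one unit of mass moved across two bin boundaries.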

== Subgroup Discovery (Subgroup Mining) ==

Accelerating subgroup discovery with bitmap indices.
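A minimal Python sketch of the bitmap idea, assuming a small categorical table: each attribute=value condition is stored as a bitmap (a Python int with one bit per row), so the rows covered by a conjunction of conditions come from a bitwise AND and the subgroup size from a population count, without rescanning the data. Ranking subgroups by weighted relative accuracy (WRAcc) is an assumption made for illustration, since no quality measure is specified here, and the function names are hypothetical.

```python
from itertools import combinations


def build_bitmaps(rows, attributes):
    # One bitmap per (attribute, value): bit i is set if row i has that value.
    bitmaps = {}
    for i, row in enumerate(rows):
        for attr in attributes:
            key = (attr, row[attr])
            bitmaps[key] = bitmaps.get(key, 0) | (1 << i)
    return bitmaps


def popcount(x):
    return bin(x).count("1")


def top_subgroups(rows, attributes, target_attr, target_value, max_len=2, top=3):
    """Rank conjunctions of conditions by WRAcc using only bitmap operations."""
    n = len(rows)
    bitmaps = build_bitmaps(rows, attributes)
    target = 0
    for i, row in enumerate(rows):
        if row[target_attr] == target_value:
            target |= 1 << i
    p0 = popcount(target) / n  # overall share of target rows
    scored = []
    for length in range(1, max_len + 1):
        for conds in combinations(sorted(bitmaps), length):
            if len({attr for attr, _ in conds}) < length:
                continue  # skip conjunctions that constrain one attribute twice
            cover = (1 << n) - 1
            for c in conds:
                cover &= bitmaps[c]  # rows satisfying every condition
            size = popcount(cover)
            if size == 0:
                continue
            pos = popcount(cover & target)
            wracc = (size / n) * (pos / size - p0)
            scored.append((wracc, conds))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top]
```

On large tables a real system would use compressed bitmaps rather than plain integers, but the AND-then-count pattern is the same.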

Zhenchen
