Revision as of 15:31, 10 May 2017 (Wed)

Data Analytics

Data analytics refers to applying operations such as SUM, TopN, and Rank to the attribute values of a data set. Results are generally expected in real time.
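As a rough illustration of the three operations named above, the following minimal sketch computes SUM, TopN, and Rank over a small in-memory data set. The record layout (`region`, `sales`) and the function names are assumptions made for this example only; a real analytics platform would run equivalent aggregations in a distributed fashion.

```python
import heapq
from collections import defaultdict

# Hypothetical records: (region, sales) pairs standing in for attribute values.
RECORDS = [
    ("north", 120), ("south", 75), ("east", 200),
    ("north", 30), ("west", 100), ("south", 25),
]

def sum_by_key(rows):
    """SUM: total the value attribute per key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def top_n(totals, n):
    """TopN: the n keys with the largest totals, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

def rank(totals):
    """Rank: competition ranking (1, 2, 2, 4); equal totals share a rank."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_value, prev_rank = None, 0
    for position, (key, value) in enumerate(ordered, start=1):
        if value != prev_value:
            prev_rank, prev_value = position, value
        ranks[key] = prev_rank
    return ranks

totals = sum_by_key(RECORDS)
top_two = top_n(totals, 2)
ranks = rank(totals)
```

Using `heapq.nlargest` avoids sorting the full result when only the first few entries are needed, which matters when a response is expected in real time.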

A big data analytics platform is a distributed software system that implements data analytics.

Navarro, Gonzalo, and Eliana Providel. "Fast, small, simple rank/select on bitmaps." In International Symposium on Experimental Algorithms, pp. 295-306. Springer Berlin Heidelberg, 2012.
Vigna, Sebastiano. "Broadword implementation of rank/select queries." In International Workshop on Experimental and Efficient Algorithms, pp. 154-168. Springer Berlin Heidelberg, 2008.

Cardinality estimation estimates the number of distinct items in a set, for example the number of unique IP addresses that visit a website.
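The unique-IP example can be sketched in two ways: an exact count with a set, whose memory grows with the number of distinct items, and an approximate count with a fixed number of small registers. The class below is a simplified sketch in the spirit of HyperLogLog, not a faithful reproduction of the published algorithm: the hash function (SHA-1 truncated to 64 bits), the precision `p=12`, and the small-range correction are assumptions chosen for brevity, and the large-range correction is omitted.

```python
import hashlib
import math

def exact_cardinality(items):
    """Exact distinct count; memory grows with the number of distinct items."""
    return len(set(items))

class SimpleHyperLogLog:
    """Simplified HyperLogLog-style estimator using 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant commonly used when m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Position of the leftmost 1-bit in `rest` (one past the width if rest == 0).
        rho = (64 - self.p) - rest.bit_length() + 1
        if rho > self.registers[index]:
            self.registers[index] = rho

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate

# Hypothetical visitor log: 10,000 distinct addresses, each seen twice.
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10000)] * 2
hll = SimpleHyperLogLog()
for ip in ips:
    hll.add(ip)
exact = exact_cardinality(ips)
approx = hll.count()
```

With 4,096 registers the estimator's memory use is fixed regardless of how many addresses are added, and repeated addresses do not change the registers, so duplicates have no effect on the estimate.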

Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. "Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm." DMTCS Proceedings 1 (2008).
Heule, Stefan, Marc Nunkesser, and Alexander Hall. "HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm." In Proceedings of the 16th International Conference on Extending Database Technology, pp. 683-692. ACM, 2013.

Association rule mining finds frequent itemsets, from which association rules are then derived. The best-known algorithm is Apriori.
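The frequent-itemset search at the core of Apriori can be sketched as follows. This is a minimal, unoptimized version for illustration: the function name, the use of an absolute support count as the threshold, and the sample shopping baskets are assumptions of this example, and the bitmap-based optimizations described in the reference below are not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(BASKETS, min_support=3)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, so a candidate with any infrequent subset can be discarded without scanning the transactions.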

Kim, Sung-Tan, Jae-Myung Kim, and Sang-Won Lee. "BAR: bitmap-based association rule: an implementation and its optimizations." In Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, pp. 627-631. ACM, 2009.

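@@ Line 1: / Line 1: @@
-=== Data Analytics ===
+= Data Analytics =
 Data analytics refers to applying operations such as SUM, TopN, and Rank to the attribute values of a data set. Results are generally expected in real time.
@@ Line 13: / Line 13: @@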
 # Vigna, Sebastiano. "Broadword implementation of rank/select queries." In International Workshop on Experimental and Efficient Algorithms, pp. 154-168. Springer Berlin Heidelberg, 2008.
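-=== Cardinality Estimation ===
+= Cardinality Estimation =
 Cardinality estimation estimates the number of distinct items in a set, for example the number of unique IP addresses that visit a website.
@@ Line 21: / Line 21: @@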
 # Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. "Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm." DMTCS Proceedings 1 (2008).
 # Heule, Stefan, Marc Nunkesser, and Alexander Hall. "HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm." In Proceedings of the 16th International Conference on Extending Database Technology, pp. 683-692. ACM, 2013.
+= Association Rule Mining =
+Association rule mining finds frequent itemsets, from which association rules are then derived. The best-known algorithm is Apriori.
+#Kim, Sung-Tan, Jae-Myung Kim, and Sang-Won Lee. "BAR: bitmap-based association rule: an implementation and its optimizations." In Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, pp. 627-631. ACM, 2009.