==大数据索引==
索引是加快查找的数据结构(Data Structure),主要有哈希表(Hash Table),树(Tree)和倒排索引(Inverted Index)。
# Tardos, Eva, and Jon Kleinberg. "Algorithm Design." (2006).
===Hash Table===
布隆过滤器(BloomFilter)
[https://github.com/lemire/bloofi Bloofi]
# B. H. Bloom,Space/time trade-offs in hash coding with allowable errors, Commun. ACM, Volume 13 Issue 7, Pages 422-426, July 1970.
# Crainiceanu, Adina, and Daniel Lemire. "Bloofi: Multidimensional Bloom Filters." Information Systems 54 (2015): 311-324.
===Tree Index===
[https://en.wikipedia.org/wiki/B-tree B-tree]
# Sellis, Timos, Nick Roussopoulos, and Christos Faloutsos. "The R+-Tree: A Dynamic Index for Multi-Dimensional Objects." VLDB, 1987.
===Inverted Index===
倒排索引(Inverted Index)是搜索引擎使用的数据结构。
* 倒排索引上的最重要的运算是集合交(Conjunction),并(Disjunction)和非(Negation)。
* 倒排索引在实际实现中,可以采用位图(Bitmap)与整数链表(Integer List)两种结构形式。
* 倒排索引上的交,并和非运算,对应的整数链表操作是Intersection/Unions操作,对应位图是比特AND, OR, NOT操作。Unions操作,对应位图是比特AND,OR,NOT操作。
倒排索引实现:
[https://lucene.apache.org/core/ Lucene]
=====Bitmap Index=====
Roaring Bitmap
# Chenxing Li et al., BAH: A Bitmap Index Compression Algorithm for Fast Data Retrieval, LCN 2016.
=====Integer List=====
Data Structures for Inverted Indexes ([https://github.com/ot/ds2i ds2i])
# Lakshminarasimhan, Sriram, et al. "Scalable in situ scientific data encoding for analytical query processing." Proceedings of the 22nd international symposium on High-performance parallel and distributed computing. ACM, 2013.
===混合结构===
融合几种独立结构的混合结构
# Athanassoulis, Manos, and Anastasia Ailamaki. "BF-tree: approximate tree indexing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1881-1892.
===其它结构===
[https://github.com/simongog/sdsl-lite Succinct Data Structure Library]