更改 - iCenter Wiki

索引

添加359字节、2019年6月25日 (二) 10:16

* 定义

~~索引（Index）是加快查找（Search）的数据结构（Data~~ 索引（Index）是加快查找（Search）功用的数据结构（Data Structure）。

*~~数据结构~~用于索引的数据结构

用于索引功能的数据结构有哈希表（Hash Table）、树（Tree）、整数数组/链表（Integer Array/Linked List）和位图（Bitmap）。

~~可参考《数据结构》教科书中的搜索（Search）的章节。~~*索引的应用：分布式系统、数据库与数据系统、信息检索等

* 教科书：

索引结构的设计可参考《算法与数据结构》教科书中的搜索（Search）的章节。

# Thomas H. Cormen, Introduction to algorithms, third edition, MIT press, 2009.

# Eva Tardos and Jon Kleinberg, Algorithm Design, Pearson, 2006.

# Tim Roughgarden, [http://www.algorithmsilluminated.org Algorithms Illuminated].

=索引的形式=

# B. H. Bloom,Space/time trade-offs in hash coding with allowable errors, Commun. ACM, Volume 13 Issue 7, Pages 422-426, July 1970.

# Crainiceanu, Adina, and Daniel Lemire. "Bloofi: Multidimensional Bloom Filters." Information Systems 54 (2015): 311-324.

# Chazelle, Bernard, et al. "The Bloomier filter: an efficient data structure for static support lookup tables." SODA, 2004.

== 树（Tree） ==

本身设计是为了IP地址的压缩存储与快速查询，但是通过合适扩展其思想也可以被用于更广范围的数据存储。

~~Poptrie主要通过改进trie的思想来提升效率。Tire是通过树形结构来存储数据，主要利用数据之间前缀的重叠来提高存储效率。~~Poptrie主要通过改进trie的思想来提升效率。Trie是通过树形结构来存储数据，主要利用数据之间前缀的重叠来提高存储效率。

其大体实现方法是先将数据分块，每一块对应树的一层，而每一层的节点的最大孩子节点数即为该块数据的可能至的数目，即每一个位置的孩子对应一块数据的一种取值。通过这种方法我们可以将一些数据构造成一棵树。这种方式当数据之间前缀重叠率高时，效果出色，而重叠率低时，可能反而会浪费空间。

#Brisaboa, Nieves R., Susana Ladra, and Gonzalo Navarro. "Directly addressable variable-length codes." In International Symposium on String Processing and Information Retrieval, pp. 122-130. Springer Berlin Heidelberg, 2009.

=== SDSL - Succinct Data Structure Library ===

https://github.com/gitster/git/blob/master/Documentation/technical/bitmap-format.txt

= ~~索引（搜索引擎）~~ 索引应用（信息检索） =

[[搜索引擎]]（Search ~~Engine）是一种典型的大数据系统，是实现信息检索（Information~~ Engine）是一种典型的信息检索的大数据系统，是实现信息检索（Information Retrieval）的软件。

经典的搜索引擎主要是针对爬取的网页文本的检索。一个网页文本或文档，是由许多的单词组成的。其中每个单词可以在同一个文档中重复出现很多次，同一个单词也可以出现在不同的文档中。

[http://www.opensearchserver.com OpenSearchServer]

=软件代码库=

Zhenchen

行政员、管理员

6,105

个编辑