`mkid': Creating an ID Database

`mkid' builds an ID database. It accepts the names of files and/or directories on the command line, selects files that have an enabled scanner, then extracts and stores tokens from those files. The resulting ID database is architecture- and byte-order-independent, so it can be shared among all systems.
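The select, scan, and store steps above, together with the portable output format, can be illustrated in miniature. The following Python sketch is not `mkid''s implementation or its file format. The suffix table `SCANNERS', the token pattern `TOKEN_RE', and the helpers `select_files', `build_index', `encode', and `decode' are all hypothetical. In this sketch, portability comes from writing every integer with an explicit big-endian layout (the `>' prefix of Python's `struct' module), so the bytes do not depend on the host's byte order or word size.

```python
import re
import struct
from pathlib import Path

# Hypothetical suffix -> scanner table; mkid's real language map is configurable.
SCANNERS = {".c": "C", ".h": "C", ".py": "Python"}

# Simplified identifier/number token pattern (an assumption, not mkid's scanner).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")


def select_files(paths):
    """Yield files under the given paths whose suffix has an enabled scanner."""
    for p in map(Path, paths):
        candidates = p.rglob("*") if p.is_dir() else [p]
        for f in candidates:
            if f.is_file() and f.suffix in SCANNERS:
                yield f


def build_index(files):
    """Map each token to the ascending list of file numbers that contain it."""
    names = []
    index = {}
    for fileno, f in enumerate(files):
        names.append(str(f))
        text = f.read_text(errors="replace")
        # set() records each token at most once per file; file numbers grow
        # monotonically, so every list stays sorted without extra work.
        for tok in set(TOKEN_RE.findall(text)):
            index.setdefault(tok, []).append(fileno)
    return names, index


def encode(names, index):
    """Serialize with big-endian ('>') integers so any host reads the same bytes."""
    out = bytearray()
    out += struct.pack(">I", len(names))
    for n in names:
        b = n.encode("utf-8")
        out += struct.pack(">H", len(b)) + b
    out += struct.pack(">I", len(index))
    for tok in sorted(index):
        b = tok.encode("utf-8")
        files = index[tok]
        out += struct.pack(">H", len(b)) + b
        out += struct.pack(">I", len(files))
        out += struct.pack(f">{len(files)}I", *files)
    return bytes(out)


def decode(data):
    """Inverse of encode(); reads the same big-endian layout on any host."""
    pos = 0

    def take(fmt):
        nonlocal pos
        vals = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return vals

    def take_str():
        nonlocal pos
        (length,) = take(">H")
        s = data[pos:pos + length].decode("utf-8")
        pos += length
        return s

    (nnames,) = take(">I")
    names = [take_str() for _ in range(nnames)]
    (ntokens,) = take(">I")
    index = {}
    for _ in range(ntokens):
        tok = take_str()
        (count,) = take(">I")
        index[tok] = list(take(f">{count}I"))
    return names, index
```

For example, `encode(*build_index(select_files(["src"])))' would index every `.c', `.h', and `.py' file under a hypothetical `src' directory and return bytes that `decode' reads back identically on a big-endian or little-endian machine.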

The primary virtues of `mkid' are speed and high capacity. The size of the source trees it can index is limited only by available system memory. The indexing algorithm used by `mkid' is very space-efficient and exhibits excellent locality of reference, so it can operate with a working-set size only half the size of its virtual address space. A typical UNIX-like operating system with 16 megabytes of system memory should be able to build an ID database covering approximately 12,000-14,000 source files totaling approximately 50-100 megabytes. A 66 MHz 486 computer can build such a large ID database in approximately 10-15 minutes.

In a future release, `mkid' will be able to update an ID database incrementally, much faster than it can build one from scratch. Until this feature becomes available, consider scheduling a `cron' job to update large ID databases regularly during off-hours.
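A minimal sketch of such an off-hours job follows, written in Python and assuming only the `--output' option and a source tree named on the command line. The script name, the `mkid_command' and `rebuild' helpers, and the build-then-rename step are assumptions for illustration, not part of `mkid'. Building into a temporary file in the same directory as the final ID file lets `os.replace' swap it in atomically (both paths are on the same file system), so programs reading the database never see a partially written file.

```python
import os
import subprocess
import sys
from pathlib import Path


def mkid_command(tree, output):
    """Build the mkid command line (hypothetical helper; uses only --output)."""
    return ["mkid", f"--output={output}", str(tree)]


def rebuild(tree, database):
    """Rebuild DATABASE from scratch, then swap it into place."""
    database = Path(database)
    # Keep the temporary file next to the final one so the rename stays on
    # one file system, which is what makes os.replace atomic on POSIX.
    tmp = database.with_name(database.name + ".tmp")
    subprocess.run(mkid_command(tree, tmp), check=True)
    os.replace(tmp, database)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: rebuild_id.py SOURCE_TREE ID_FILE
    rebuild(sys.argv[1], sys.argv[2])
```

From `cron', such a script might be run nightly, for example as `python3 rebuild_id.py /usr/src/project /usr/src/project/ID' (paths hypothetical).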

Because `mkid' writes the ID file, it accepts the `--output' (and `--file') options described in section Options for Programs that Write ID Databases. Because it extracts tokens from source files, it accepts the `--lang-map', `--include', `--exclude', and `--lang-option' options, as well as the language-specific scanner options, all of which are described in section Options for Programs that Scan Source Files. Because it walks file trees, it handles file and directory names on its command line and the `--prune' option as described in section Options for Programs that Walk File and Directory Trees.

In addition, `mkid' accepts the following command-line options:

`-s'
`--statistics'
     `mkid' reports statistics about resource usage at the end of its run.
`-v'
`--verbose'
     `mkid' reports statistics about each file as it is scanned, and about the resource usage of its indexing algorithm at regular intervals.