1 files changed, 151 insertions, 0 deletions
diff --git a/doc/lhash.doc b/doc/lhash.doc
new file mode 100644
index 0000000000..5a2aeb4b38
--- /dev/null
+++ b/doc/lhash.doc
@@ -0,0 +1,151 @@
+The LHASH library.
+
+I wrote this library in 1991 and have since forgotten why I called it lhash.
+It implements a hash table from an article I read at the
+time from 'Communications of the ACM'.  What makes this hash
+table different is that as the table fills, the hash table is
+increased (or decreased) in size via realloc().
+When a 'resize' is done, instead of all hashes being redistributed over
+twice as many 'buckets', one bucket is split.  So when an 'expand' is done,
+there is only a minimal cost to redistribute some values.  Subsequent
+inserts will cause more single 'bucket' redistributions but there will
+never be a sudden large cost due to redistributing all the 'buckets'.
+
+The state for a particular hash table is kept in the LHASH structure.
+The LHASH structure also records statistics about most aspects of accessing
+the hash table.  This is mostly a legacy of my writing this library for
+the reasons of implementing what looked like a nice algorithm rather than
+for a particular software product.
+
+Internal stuff you probably don't want to know about.
+The decision to increase or decrease the hash table size is made depending
+on the 'load' of the hash table.  The load is the number of items in the
+hash table divided by the size of the hash table.  The default values are
+as follows.  If (hash->up_load < load) => expand.
+if (hash->down_load > load) =>  contract.  The 'up_load' has a default value of
+1 and 'down_load' has a default value of 2.  These numbers can be modified
+by the application by just playing with the 'up_load' and 'down_load'
+variables.  The 'load' is kept in a form which is multiplied by 256.  So
+hash->up_load=8*256; will cause a load of 8 to be set.
+
+If you are interested in performance the field to watch is
+num_comp_calls.  The hash library keeps track of the 'hash' value for
+each item so when a lookup is done, the 'hashes' are compared, if
+there is a match, then a full compare is done, and
+hash->num_comp_calls is incremented.  If num_comp_calls is not equal
+to num_delete plus num_retrieve it means that your hash function is
+generating hashes that are the same for different values.  It is
+probably worth changing your hash function if this is the case because
+even if your hash table has 10 items in a 'bucked', it can be searched
+with 10 'unsigned long' compares and 10 linked list traverses.  This
+will be much less expensive that 10 calls to you compare function.
+
+LHASH *lh_new(
+unsigned long (*hash)(),
+int (*cmp)());
+	This function is used to create a new LHASH structure.  It is passed
+	function pointers that are used to store and retrieve values passed
+	into the hash table.  The 'hash'
+	function is a hashing function that will return a hashed value of
+	it's passed structure.  'cmp' is passed 2 parameters, it returns 0
+	is they are equal, otherwise, non zero.
+	If there are any problems (usually malloc failures), NULL is
+	returned, otherwise a new LHASH structure is returned.  The
+	hash value is normally truncated to a power of 2, so make sure
+	that your hash function returns well mixed low order bits.
+	
+void lh_free(
+LHASH *lh);
+	This function free()s a LHASH structure.  If there is malloced
+	data in the hash table, it will not be freed.  Consider using the
+	lh_doall function to deallocate any remaining entries in the hash
+	table.
+	
+char *lh_insert(
+LHASH *lh,
+char *data);
+	This function inserts the data pointed to by data into the lh hash
+	table.  If there is already and entry in the hash table entry, the
+	value being replaced is returned.  A NULL is returned if the new
+	entry does not clash with an entry already in the table (the normal
+	case) or on a malloc() failure (perhaps I should change this....).
+	The 'char *data' is exactly what is passed to the hash and
+	comparison functions specified in lh_new().
+	
+char *lh_delete(
+LHASH *lh,
+char *data);
+	This routine deletes an entry from the hash table.  The value being
+	deleted is returned.  NULL is returned if there is no such value in
+	the hash table.
+
+char *lh_retrieve(
+LHASH *lh,
+char *data);
+	If 'data' is in the hash table it is returned, else NULL is
+	returned.  The way these routines would normally be uses is that a
+	dummy structure would have key fields populated and then
+	ret=lh_retrieve(hash,&dummy);.  Ret would now be a pointer to a fully
+	populated structure.
+
+void lh_doall(
+LHASH *lh,
+void (*func)(char *a));
+	This function will, for every entry in the hash table, call function
+	'func' with the data item as parameters.
+	This function can be quite useful when used as follows.
+	void cleanup(STUFF *a)
+		{ STUFF_free(a); }
+	lh_doall(hash,cleanup);
+	lh_free(hash);
+	This can be used to free all the entries, lh_free() then
+	cleans up the 'buckets' that point to nothing.  Be careful
+	when doing this.  If you delete entries from the hash table,
+	in the call back function, the table may decrease in size,
+	moving item that you are
+	currently on down lower in the hash table.  This could cause
+	some entries to be skipped.  The best solution to this problem
+	is to set lh->down_load=0 before you start.  This will stop
+	the hash table ever being decreased in size.
+
+void lh_doall_arg(
+LHASH *lh;
+void(*func)(char *a,char *arg));
+char *arg;
+	This function is the same as lh_doall except that the function
+	called will be passed 'arg' as the second argument.
+	
+unsigned long lh_strhash(
+char *c);
+	This function is a demo string hashing function.  Since the LHASH
+	routines would normally be passed structures, this routine would
+	not normally be passed to lh_new(), rather it would be used in the
+	function passed to lh_new().
+
+The next three routines print out various statistics about the state of the
+passed hash table.  These numbers are all kept in the lhash structure.
+
+void lh_stats(
+LHASH *lh,
+FILE *out);
+	This function prints out statistics on the size of the hash table,
+	how many entries are in it, and the number and result of calls to
+	the routines in this library.
+
+void lh_node_stats(
+LHASH *lh,
+FILE *out);
+	For each 'bucket' in the hash table, the number of entries is
+	printed.
+	
+void lh_node_usage_stats(
+LHASH *lh,
+FILE *out);
+	This function prints out a short summary of the state of the hash
+	table.  It prints what I call the 'load' and the 'actual load'.
+	The load is the average number of data items per 'bucket' in the
+	hash table.  The 'actual load' is the average number of items per
+	'bucket', but only for buckets which contain entries.  So the
+	'actual load' is the average number of searches that will need to
+	find an item in the hash table, while the 'load' is the average number
+	that will be done to record a miss.