Release notes
ht://Dig © 1995, 1996, 1997 Andrew Scherpbier <andrew@contigo.com>
Please see the file COPYING for license information.
These are notes that go with each release of ht://Dig. There is
also a ChangeLog file which has more details on the code changes.
Release notes for htdig-3.0.8b1 14-Apr-1997
I consider this a beta release since I have not had time to test
everything. Use at your own risk...
- Base tag problem fixed
- URL parser somewhat more robust
- Date parsing bug fixed
- Added Substring fuzzy algorithm.
- Various other bugs were fixed. Thanks for all the patches
that were sent to me!
Release notes for htdig-3.0.7 12-Jan-1997
More bug fixes and some minor new functionality. Hopefully,
I'll be able to finish up work on version 3.1 at some point in
the near future.
I have recently received some more patches for various things,
but I have not incorporated those, yet. Next version.
- The problem with the missing words has been fixed. This
was a problem in the Dictionary class.
- htsearch is a *lot* faster due to a patch by Esa
Ahola.
- htfuzzy has had some work done on it. With the addition of
the new rx-1.4 library, the endings algorithm now actually
works for languages other than English... It still takes an
awfully long time to build the tables for languages with lots
of rules.
- URLs can now be of the dubious form http:foo.html. I have
never seen this used and think it is bogus, but alas, it
works now.
- A search form can now manually add words to any search
using the new keywords form attribute.
- A problem in the plaintext parser used to cause bogus HTML
in search results. This has been fixed.
- New documentation format. Lots of new documentation, as
well.
- New robotstxt_name attribute. Used to match the
'user-agent' lines in robots.txt files.
- The <base> tag is now properly supported.
- Preliminary support for lots of new features, including:
- External document parsers. You'll be able to write
your own document parser for that special document type
that ht://Dig doesn't know about.
- New fuzzy search algorithms: substring, regex,
globbing, etc.
Release notes for htdig-3.0.6 26-Oct-1996
Just a single bug fix and one additional feature in this release.
- Fixed the problem that caused frequent crashes with virtual memory
exhausted.
- Added a new attribute, keywords_meta_tag_names, which should
contain a list of meta tag names for which the content
should be used as keywords. The default is set to "keywords
htdig-keywords".
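To illustrate what keywords_meta_tag_names controls, here is a minimal Python sketch (not ht://Dig's own code, which is C++) that collects keywords from META tags whose name appears in the configured list. The class and function names are made up for this example, and splitting the content on commas and whitespace is an assumption.

```python
from html.parser import HTMLParser

# Default taken from the release note above; a real site would read it from
# the keywords_meta_tag_names attribute in its configuration file.
DEFAULT_META_NAMES = "keywords htdig-keywords"


class MetaKeywordCollector(HTMLParser):
    """Collect words from <meta> tags whose name is in an allowed set."""

    def __init__(self, meta_names=DEFAULT_META_NAMES):
        super().__init__()
        # The attribute value is a whitespace-separated list of tag names.
        self.allowed = {name.lower() for name in meta_names.split()}
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names already lowercased.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() in self.allowed:
            # Assumption: keywords are separated by commas and/or whitespace.
            content = attr_map.get("content", "")
            self.keywords.extend(content.replace(",", " ").split())


def extract_keywords(html, meta_names=DEFAULT_META_NAMES):
    """Return the keywords found in the matching META tags of a page."""
    collector = MetaKeywordCollector(meta_names)
    collector.feed(html)
    collector.close()
    return collector.keywords
```

Passing a different list of names as the second argument corresponds to changing the attribute value in the configuration file.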
Release notes for htdig-3.0.5 13-Oct-1996
This release consists of more bug fixes.
I want to thank Elliot Lee <sopwith@cuc.edu> for his help with
tracking down several bugs.
- Fixed problem with accent characters. Words with SGML
entities and iso-8859-1 characters will now be indexed
correctly.
- Changed the auto configuration to detect the need for a prototype
for the gethostname() function. (This was supposed to be fixed
before, but wasn't.)
- Reduced the memory requirements for all the programs by changing
the rehash() method in the Dictionary class. Access to hashes
may be a little slower, but the memory requirements were reduced by
a factor of 10 or so.
- Hopefully fixed a problem with the time related functions on
certain platforms. More checks are done to make sure the
functions that are used are actually available.
Release notes for htdig-3.0.4 2-Sep-1996
The previous version failed to build under Linux. This should be
fixed now.
- Fixed problem with the time stuff which caused the build of
htdig to fail.
- Fixed a memory problem in htdig.
Release notes for htdig-3.0.3 2-Sep-1996
Bugs bugs bugs... Will they ever all be found?
NOTE: I made extensive changes to the htdig.conf file that
gets installed. I would advise you to remove or rename your
existing htdig.conf and let the installation process create a
new one for you that you can then modify.
Also, since the rundig script has changed, you should remove
the old one before installing ht://Dig. (The installation
will refuse to overwrite existing files...)
- The problem with htsearch crashing on some machines has been
fixed.
- A bug caused the <AREA> tag to be ignored. Fixed.
- A bug in SunOS caused dates to be all screwed up.
- Added lots of comments to the example htdig.conf file. Also
added some additional example attributes.
- Fixed a bug in the installation process which caused rundig
to be created incorrectly.
- Added a sample synonyms file. Also modified rundig to
create a synonyms database for it.
Release notes for htdig-3.0.2 22-Aug-1996
More bug fixes.
- Multiple start URLs now actually work. Before they were
just documented to work, but didn't actually work.
- htmerge now will refuse to remove database files if it
detects that the call to /bin/sort failed.
- htmerge can now tell /bin/sort to use a specific temporary
directory. This is done by setting the TMPDIR environment
variable.
- htsearch can now search for words with non-ASCII characters
in them.
- Added support for finding URLs in the <frame> and
<area> tags.
- There is a problem with htsearch under Linux. It causes a
segmentation violation after the first search result is
displayed. Don't know what the problem is, yet.
- Fixed bug in the auto configuration which always set the
value for NEED_PROTO_GETHOSTNAME to 1. For most systems
this actually needs to be 0.
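The two htmerge items above (checking whether sort failed, and pointing sort at a temporary directory through TMPDIR) can be sketched in Python roughly as follows. This is only an illustration of the idea, not how htmerge is written; the function name is invented, and LC_ALL is set here purely to make the ordering predictable.

```python
import os
import shutil
import subprocess


def sort_lines(lines, tmpdir=None):
    """Sort lines with the external sort program; raise if sort fails."""
    sort_path = shutil.which("sort")
    if sort_path is None:
        raise RuntimeError("no sort program found on PATH")
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # byte-wise ordering, chosen for predictability
    if tmpdir is not None:
        env["TMPDIR"] = tmpdir  # where sort should place its temporary files
    result = subprocess.run(
        [sort_path],
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        env=env,
    )
    if result.returncode != 0:
        # Report the failure so the caller can keep its existing files
        # instead of removing them.
        raise RuntimeError("sort failed: " + result.stderr.strip())
    return result.stdout.splitlines()
```

A caller would only remove its old files after sort_lines() returns without raising.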
Release notes for htdig-3.0.1 16-Aug-1996
This is a maintenance release in response to several bug reports.
- htdig now will display a list of errors when the statistics
option (-s) is used. The list gives the URL that caused the
error and a URL that referred to it. Hopefully this
information is useful for site maintainers.
- Some problems with the SGML character entities were fixed.
The major symptom was that the ';' that ends an entity used
to be included as well.
- Major problems with htnotify were fixed. There were many
hardcoded things in this program that made it very specific
to SDSU and to me.
- malloc.h should not be included anymore. All references to
it were replaced with stdlib.h instead. This should make
compiles on some platforms work better.
- htsearch now will use the CONFIG_DIR environment variable to
override the compiled-in default (set in the CONFIG
file...). This was done so that htsearch can be called from
a simple wrapper that sets that environment variable. Only
the wrapper needs to be modified to get different
CONFIG_DIR values.
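A wrapper of the kind described in the last item could look something like this Python sketch. Both paths are hypothetical placeholders, and the function names are invented for the example; the only detail taken from the note is that htsearch reads CONFIG_DIR from its environment.

```python
import os

# Hypothetical paths for illustration; adjust them to the local installation.
HTSEARCH_PATH = "/usr/local/htdig/cgi-bin/htsearch"
SITE_CONFIG_DIR = "/usr/local/htdig/conf-site-a"


def wrapper_environment(config_dir, base_env=None):
    """Return a copy of the environment with CONFIG_DIR overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CONFIG_DIR"] = config_dir
    return env


def run_htsearch(config_dir=SITE_CONFIG_DIR, htsearch_path=HTSEARCH_PATH):
    """Replace the current process with htsearch.

    Because the process is replaced rather than spawned, the CGI input and
    output pass straight through to htsearch.
    """
    os.execve(htsearch_path, [htsearch_path], wrapper_environment(config_dir))
```

A CGI script built on this would end with a call to run_htsearch(); serving a second configuration means copying the wrapper and changing only SITE_CONFIG_DIR.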
Release notes for htdig-3.0 17-Jul-1996
I decided to make this the official 3.0 release.
It is extremely important that you remove
all traces of earlier beta versions of the software before
installing this version or that you install in a completely
different location. Do not blame me for anything if you
didn't do this. You have been warned...
- htwrapper is no more. htsearch is now the CGI program.
- htsearch now uses templates to display the results. A template
is simply a piece of HTML code for a single match. The HTML code
includes variables that will be expanded to the various
items that are unique to each match, like URL, EXCERPT,
TITLE, etc. The template can be selected at search time
(through a menu). There are two builtin templates:
builtin-short and builtin-long. The builtin-short template
just lists the stars and title while the builtin-long template
lists results in a similar fashion to the way Alta Vista
displays results.
- Many runtime configuration options have been removed and
many new ones have been added. Check the configuration file documentation
for details. There are also some enhancements to the format
of the configuration file.
- Attribute values can now span multiple lines by ending
each line that needs to be continued with a backslash
('\').
- Attribute values can now include the contents of files.
Just put the filename in back-quotes. The filename can
use the normal variable expansion so that things like
the following work:
someattribute: `${common_dir}/somefile`
The file that is specified is read in and all
newlines and starting and trailing whitespaces are
reduced to a single space. If the file is not
found, nothing is included and no error is flagged.
Note that the backquote character is used, not the
regular quote character.
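The two configuration file enhancements (backslash continuation and back-quoted file inclusion) can be pictured with the following Python sketch. It is an approximation written for illustration, not the parser ht://Dig uses: the function names are invented, and it collapses every run of whitespace in the included file to one space, which is slightly broader than the rule stated in the notes.

```python
import re

VARIABLE = re.compile(r"\$\{(\w+)\}")
INCLUDE = re.compile(r"`([^`]*)`")


def join_continued_lines(text):
    """Join lines that end with a backslash into one logical line."""
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash, keep accumulating
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)
    return logical


def expand_value(value, variables):
    """Expand ${name} variables, then replace `file` with the file contents."""

    def variable(match):
        # Unknown variables expand to nothing (an assumption of this sketch).
        return variables.get(match.group(1), "")

    def include(match):
        try:
            with open(match.group(1)) as handle:
                # Collapse whitespace, including newlines, to single spaces.
                return " ".join(handle.read().split())
        except OSError:
            return ""  # missing file: include nothing, flag no error

    value = VARIABLE.sub(variable, value)
    return INCLUDE.sub(include, value)
```

Expanding variables before the inclusion step lets the file name itself use ${...}, as in the someattribute example, while the text read from the file is inserted unchanged.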
Notable attribute changes:
- All the attributes that set the heading text have been
removed. These attributes include:
- accessed_heading_text
- datesize_heading_text
- descriptions_heading_text
- excerpt_heading_text
- modified_heading_text
- score_heading_text
- size_heading_text
- url_heading_text
- wordlist_heading_text
- field_order
- New attributes added:
- http_proxy: Added to support the use of an HTTP proxy server to
index documents
- locale: Added to support international character sets
- match_method: New way of specifying if a search is an 'or', 'and',
or 'boolean' search
- matches_per_page: The new paged results use this
- max_doc_size: Limit the size of documents retrieved
- next_page_text: Used in the navigation between pages
- no_excerpt_text: Text displayed if no excerpt was available (this
used to be hard-coded)
- no_next_page_text: Used in the navigation between pages
- no_prev_page_text: Used in the navigation between pages
- prev_page_text: Used in the navigation between pages
- star_patterns: Allow different star images to be used depending on
the match URL
- synonym_dictionary: Support for the new synonyms fuzzy algorithm
- synonym_db: Support for the new synonyms fuzzy algorithm
- syntax_error_file: HTML file displayed if there was a boolean
expression syntax error
- template_map: Used in the support for the new result display
templates
- template_name: Sets the default template name
- text_factor: Added to allow normal text to have a variable weight
(0, for example...)
- Some form tag names have changed. The list of recognized
form tags is in the htsearch documentation.
- Multiple start URLs can be specified as a value to the
'start_url' attribute. This could be combined with the
file inclusion to read in a file of URLs to start with.
- htdig now sends the 'Referer:' header in HTTP requests so
that any link errors will be logged in the server's log files.
- In addition to the "htdig-keywords" META tag name, htdig now
also supports just "keywords". This is to make it more
compatible with the Alta Vista search engine.
- The verbose display of htdig was enhanced to show '+' for a
link that will be followed and '-' for a link that was
discarded.
- htmerge was changed to use the Unix sort program instead of
doing its own sorting. It no longer uses mmap() to map the
words into memory. This was causing problems on systems with
limited virtual memory available. (What??? You mean you DON'T
have at least a 1GB disk dedicated to swap???)
- The Endings algorithm was fixed up to work properly now.
There were several well hidden bugs that made the algorithm
come up with illegal words.
- The synonyms fuzzy algorithm was added.
This is simply a mapping of words to other words. The input
file is just a list of words which causes the first word on
a line to be mapped to the rest of the words on that line.
(We use this to map course abbreviations to full course
names.)
- SGML entities are now supported. They are translated to
their equivalent ISO-8859-1 encoding.
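The synonyms input format described above (first word on a line mapped to the rest of that line) is simple enough to sketch in a few lines of Python. This is an illustration only, not the htfuzzy implementation; the function names are invented, and lowercasing the words is an assumption.

```python
def parse_synonyms(text):
    """Map the first word of each line to the remaining words on that line."""
    mapping = {}
    for line in text.splitlines():
        words = line.lower().split()
        if len(words) < 2:
            continue  # blank line, or a word with nothing to map to
        mapping.setdefault(words[0], []).extend(words[1:])
    return mapping


def expand_query_word(word, mapping):
    """Return the word followed by any synonyms listed for it."""
    return [word] + mapping.get(word.lower(), [])
```

With a made-up input line such as "cs101 introduction programming", a search for cs101 would also look for "introduction" and "programming".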
Release notes for htdig-3.0b5
- The configuration has changed. There is now a CONFIG file
which contains all the variables which control where
things get installed. 'make install' will now actually
attempt to set everything up with default or example files.
Note that some default directories have changed. For
example, the default configuration file location is not
/usr/local/etc/htdig.conf anymore. Instead it is now
defined in terms of CONFIG_DIR.
- The htfuzzy/createDict.pl Perl program has been obsoleted.
Creating the endings database is now done by htfuzzy
itself. If you already have endings databases, you don't
need to recreate them, they will still work.
- GNU rx-1.0 is now included with the distribution. This is
used by htfuzzy to create the endings databases.
- The name of the whole search system has changed from
HTDig to ht://Dig.
- The HTML documentation got a big facelift! This includes
the new logo for ht://Dig. (Thanks go to Keith Parks
for the images!)
- htsearch got a new option '-r' which will allow it to
produce raw output. This output can easily be parsed by a
wrapper program to produce custom HTML or other output for
the search results.
Andrew Scherpbier <andrew@contigo.com>
Last modified: Sun Jan 12 23:49:49 PST