Appendix B: Index File directives


This is a list of the items which may be placed in an index file to be processed by wndex. This file consists of a collection of records, each of which consists of a group of lines pertaining to a single file. Each line of a record begins with a directive like "Title=" which indicates that the remainder of that line is to be taken as the title of the document whose record contains this line. The "File=" directive is special in that it indicates the beginning of a new record. The value of the "File=" directive is the name of the file whose record will follow. Lettercase is not significant in directive keywords.

When the character '#' is encountered in an index file it is assumed to be the start of a comment and everything after it on that line is ignored. To include the '#' character in, for example, a document title, it must be escaped with the '\' character. I.e. when "\#" is encountered it does not signify a comment and the character '#' (without the backslash) is treated as a normal character. In fact, since all directives contain the character '=', all lines which do not contain this character are silently ignored. Also, a single conceptual line of an index file can be spread over several actual lines by ending all but the last line with the '\' character. I.e. if a line ends with '\' that character is removed and the contents of the next line are considered a continuation of the current line. The maximum total length of a line (including continuations) is 4096 characters.
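The comment, escape and continuation rules above can be sketched in a few lines of Python. This is only an illustration, not the code wndex uses: the function names are hypothetical, and the sketch assumes that continuation lines are joined before comments are stripped, which the text does not state.

```python
MAX_LINE = 4096  # maximum logical line length given in the text


def strip_comment(line):
    """Drop an unescaped '#' and everything after it; turn '\\#' into '#'."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            out.append("#")  # escaped '#': keep it as a normal character
            i += 2
        elif ch == "#":
            break  # start of a comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def logical_lines(text):
    """Return the directive lines of an index file as a list of strings."""
    joined = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            pending += raw[:-1]  # remove the backslash, continue on next line
            continue
        joined.append(pending + raw)
        pending = ""
    if pending:
        joined.append(pending)  # file ended in the middle of a continuation

    directives = []
    for line in joined:
        if len(line) > MAX_LINE:
            raise ValueError("logical line exceeds %d characters" % MAX_LINE)
        line = strip_comment(line).strip()
        if "=" in line:  # lines without '=' are silently ignored
            directives.append(line)
    return directives
```

Given an index file containing a comment line, a "File=" line, and a title with an escaped '#' that is split over two physical lines, the function returns just the two directive lines, with the title joined into one.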

The first record in an index file is special and is intended to describe attributes of the entire directory rather than individual files. It contains lines with directives specifying attributes of the directory as a whole or all the files in it. Here is a complete list of these directory directives:

Directory directives

Accessfile -- Specify directory access control file
The line
Accessfile=/dir/accessfile
specifies that the file /dir/accessfile is to be used to determine access privileges (by hostname or IP address) for this directory. If this line is omitted access is allowed for everyone. Both the path /dir/accessfile and the path ~/dir/accessfile are taken relative to the WN root directory. In particular the accessfile must be in the WN hierarchy (unlike includes or filters, for example). If the path does not begin with a '/' or a '~' then it is relative to the directory containing the index file. (See the user's manual section: Limiting Access to Your WN Hierarchy.)

Access-denied-URL -- Set a substitute URL for requests for which access is denied due to an accessfile restriction.
The line
Access-denied-URL=http://host/dir/foo.html
or the line
Access-denied-URL=/dir/foo.html
specifies that any request for a document in this directory which is denied because of an accessfile restriction should be redirected to the given URL. A default value for all directories can be set by uncommenting the "#define ACCESS_DENIED_URL" line in config.h and recompiling. If you use this directive be sure that the file foo.html does not have restricted access or you can create an infinite loop. This line has the special feature that it can also be placed as the first line of the accessfile controlling the directory. A line in the accessfile will override any value set in the index file.

Attributes -- Set directory attributes
Currently there are only two directory attributes, viz. "nosearch" and "serveall." Lettercase is not significant in the attribute value.

The line

Attributes=serveall
specifies that any file in this directory, with a few exceptions, may be served, not just those listed in the index file. The server will attempt to set the content type correctly based on the file name suffix using the same default correspondences between type and suffix that wndex uses. The exceptions are that files whose name starts with '.' or ends with '~' as well as the files "index" and "index.cache" will not be served.
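As a rough illustration of the exceptions just listed, the check could look like the following Python sketch. The function name is hypothetical and the sketch covers only the file name test described above, not the content type lookup.

```python
def may_serve_under_serveall(name):
    """Return True unless `name` is one of the stated serveall exceptions."""
    if name.startswith(".") or name.endswith("~"):
        return False  # hidden files and editor backup files are never served
    return name not in ("index", "index.cache")
```

For example, "notes.txt" and "index.html" pass the check, while ".hidden", "draft.html~", "index" and "index.cache" do not.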

The line

Attributes=nosearch
specifies that the index.cache databases in the current directory and its subdirectories should not be searched when the server does a title, keyword or user defined field search. Likewise context and grep searches will not be allowed in this directory. In this case when an attempt is made to do so an error message is returned to the client. It is also possible to exclude only some files from searching with the Attributes= file directive.

Authorization-Module, Authorization-Realm, Authorization-Type -- Specify the program to be used as the authorization module.
Currently WN includes a "basic" authorization module called authwn. Its use is described in the chapter on limiting access to your server. Alternatively you can make your own module to handle authorization. Data is placed in environment variables as with CGI. WN expects this module to exit with status 0 if authorization is granted and with status 1 if access is denied.

For security reasons, when you use an Authorization-Module you are required to use either the -t or -T option or the -a or -A option and to have the index.cache file in the protected directory owned by the trusted user or group. This is to guard against counterfeit authorization modules.

Auth-denied-file -- Specify the name of an HTML file to be used as the error message when an authentication attempt for a password protected directory fails.
The line
Auth-denied-file=~/dir/foo.html
specifies that any request for a document in this directory which is denied because of an authorization module restriction results in the file ~/dir/foo.html being sent instead. A default value for all directories can be set by uncommenting the "#define AUTH_DENIED_FILE" line in config.h and recompiling. Note that this is not a URL but the name of a file whose content is to be sent as error text when authentication is denied. If the file name starts with '~/' as above it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index file.

Cache-Module -- Specify program to be used as interface to database for index.cache entries.
If this line specifies a program then instead of looking for file entries in the index.cache file this program is executed after putting the basename of the URL in the environment variable WN_KEY. This provides a mechanism to use a real database rather than the file index.cache. Note that the directory directives are still obtained from index.cache. The output of this module must be in the format of an index.cache line. Title, cache and grep searches are not supported since that would require reading the entire database.

Default-Attributes -- Specify the default attributes record for items in this directory.
The line
Default-Attributes=parse,dynamic
specifies that files in this directory should be parsed and marked as dynamic documents (not to be cached) unless they have an attributes record specifying the contrary.

Default-Content -- Specify the default MIME content type for items in this directory.
The line
Default-content=text/html
specifies that files in this directory which do not end in a suffix recognizable to wndex should be given the type "text/html". Any legitimate MIME type may be used as the value.

Default-Document -- Specify the default document for this directory.
The line
Default-Document=foo.html
specifies that a URL pointing to this directory like "http://host/dir/" will result in serving the document /WN_root/dir/foo.html instead of /WN_root/dir/index.html. Uses of this include making the default document a CGI script with "Default-Document=foo.cgi" or having a directory with HTML files all ending with the suffix ".htm" and using the directive "Default-Document=foo.htm". This directive applies only to the directory containing the index file, not to any subdirectories.

Default-Includes -- Specify a default includes line for documents in this directory.
The line
Default-Includes=footer.html
specifies that this line should be used as the Includes= directive for any document in this directory which does not have an Includes= directive explicitly set. To override this default value simply specify an explicit Includes= directive or use Includes= to have none.

Default-Max-Age -- Specify the default Max-Age for this directory.
The line
Default-Max-Age=2 weeks
specifies that the Cache-Control and Expires headers of all documents served from this directory should be set to expire the document 2 weeks after it is served.

The line

Default-Max-Age=2 weeks after last-mod
specifies that the Cache-Control and Expires headers of all documents served from this directory should be set to expire the document 2 weeks after the last-modified date of the document. For more details see the Max-Age directive.

Default-Wrappers -- Specify a default Wrappers= directive for documents in this directory.
The line
Default-Wrappers=wrapper.html
specifies that this line should be used as the Wrappers= directive for any document in this directory which does not have a Wrappers= directive explicitly set. To override this default value simply specify an explicit Wrappers= directive or use Wrappers= to have none.

File-Module -- Specify program to be used as interface to database for obtaining files.
If this line specifies a program then instead of looking for a file in the current directory this program is executed after putting the basename of the URL in the environment variable WN_KEY. The output of this is served as if it were a file. This provides a mechanism to use a real database rather than the file index.cache.

Nomatchsub -- Set substitute file for searches on this directory which result in no matches.
The line
Nomatchsub=foo.html
specifies that the HTML file foo.html in the current directory should be used for the output of all searches (title, keyword, context, grep, etc.) on this directory which return no matches. It can only be used in conjunction with the Searchwrapper directive. See also Nomatchsub for files.

No-Such-File-URL -- Set substitute URL for requests for non-existent files
The line
No-Such-File-URL=http://host/dir/foo.html
or the line
No-Such-File-URL=/dir/foo.html
specifies that any request in this directory for a non-existent file should be redirected to the given URL. A default value for all directories and non-existent directories can be set by uncommenting the "#define NO_SUCH_FILE_URL" line in config.h and recompiling. The value set here will also be used if an index.cache file does not exist. If you use this directive be sure that the file foo.html does exist or you can create an infinite loop.

Owner -- Specify owner of directory items
This should be a line like
Owner=mailto:maintainer@host
The mailto:e-mail_address may be replaced with a URL referring to the individual who is responsible for the documents in this directory. This information is used in an HTTP header. It is not possible to designate the owner of a single file in an index directive. However, if the file is an HTML file this can be done with a "link" tag in the header of that document.

Searchwrapper -- Set wrapper file for searches on this directory.
The line
Searchwrapper=swrap.html
specifies that the HTML file swrap.html in the current directory should be used as a wrapper for the output of all searches on this directory.

Subdirs -- Specify subdirectories for searching and recursive use of wndex
When you run the wndex utility with the -r option (for recursive), it must know in which subdirectories it should descend to create a new index.cache database file. Likewise when the server does a title, keyword or user defined field search, it recursively descends the data hierarchy and must know for each directory which subdirectories are part of the hierarchy. The maintainer provides this information in a line like
Subdirs=subdir1,subdir2,subdir3
in the directory directives giving a comma separated list of subdirectories of the directory containing the current index file.

File directives

A collection of lines in the index file containing information about a single file in the directory of the index file is called a file record. A new file record begins with a line starting with File= and ends with the start of a new file record. Each line in a record begins with a file directive. Here is the complete list:
Attributes -- Set file attributes
Currently several attributes are possible, including nosearch, parse, noparse, dynamic and cgi. Multiple values, separated by commas, can be put on a single Attributes line, as in "Attributes=parse, dynamic, nosearch". Lettercase is not significant in the attribute value. Also "Attribute=" (without the 's') is synonymous with "Attributes=".

The line

Attributes=parse
indicates that the file referenced by this directive should be parsed for conditional text or server-side includes. This line is not necessary if there is also a Wrappers= line or an Includes= line since in that case the parse attribute is assumed. If you do not wish a document to be parsed when it otherwise would be, the line "Attributes=noparse" can be used. This might be done to improve efficiency when, for example, a document has a wrapper but nothing is included in it. Since it has a wrapper, parsing will be turned on by default, but it is not necessary since nothing is actually included.

The line

Attributes=cgi
indicates that the standard CGI environment variables should be set up before processing this request. This may be useful if there is a Filter= directive for this document or if the document has a server-side include which is the output of a script or program. In these cases the filter program or include script can access the CGI environment variables. This line is not necessary if the document it refers to is actually a CGI script since in that case this attribute is automatically set.

The line

Attributes=dynamic
indicates that the server should send headers attempting to encourage clients and proxies not to cache this document. This is done by setting the Expires: date to Jan 1, 1970 (if it is not otherwise set), omitting a Last-Modified header and sending a Pragma: no-cache header. It is not necessary to set this for CGI scripts which have no dependence on a QUERY_STRING or POST data as it is set by default for them. If you do not wish this done for a CGI script then use the line
Attributes=non-dynamic
If this is done the Last-Modified date will be that of the script.

The line

Attributes=nosearch
indicates that the file referenced by this directive should not be searched when the server does a context or grep search of the current directory.

See also the directory Attributes directive.

Content-encoding -- Specify the content encoding for a file
The line
Content-encoding=x-gzip
specifies "x-gzip" as the content encoding for the file described by this record. Only two types of content encoding are supported by common browsers. They are "x-gzip" and "x-compress". They indicate that the file has been compressed with the GNU gzip utility or the UNIX compress utility. The file is then sent by the server in the compressed format and will be decompressed automatically by the browser, if it supports this functionality.

In many cases it is unnecessary to specify this explicitly, as the wndex program will automatically assign the content-encoding x-gzip to a file whose name ends with ".gz" and the content-encoding x-compress to a file whose name ends in ".Z". Supplying the value "none" for the content-encoding will prevent the server from making this automatic assignment.
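The suffix rule and the "none" override described above amount to something like the following Python sketch. The function name and its arguments are hypothetical, and the sketch assumes the value "none" is matched exactly as written.

```python
def content_encoding_for(filename, explicit=None):
    """Return the content encoding for a file, or None if there is none.

    `explicit` is the value of a Content-encoding= line, if one was given.
    """
    if explicit is not None:
        # An explicit directive wins; "none" suppresses the automatic rule.
        return None if explicit == "none" else explicit
    if filename.endswith(".gz"):
        return "x-gzip"
    if filename.endswith(".Z"):
        return "x-compress"
    return None
```

So a file named "paper.ps.gz" gets x-gzip by default, but the same file with "Content-encoding=none" in its record is sent with no content encoding.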

Content-type -- Specify the MIME content type for a file
The line
Content-type=audio/basic
specifies "audio/basic" as the MIME type for the file described by this record. In many cases this is unnecessary as the wndex program will automatically assign the MIME type if the file name ends in a suffix listed in the file /lib/mime.types with a corresponding type. If this line is supplied it will override the default value of the content type determined by the suffix.

Expires -- Specify the expiration date of a document or file
The line
Expires=Mon, 01 Sep 1997 14:11:01 GMT

specifies the date and time at which a document expires. Current practice is to use the format specified by RFC 822, as updated by RFC 1123, and illustrated above. In particular, GMT should be used. More information about HTTP date formats can be found in the HTTP specification. For HTML documents this information is automatically extracted from the document by wndex. This requires a line in the head of the HTML document like

<meta http-equiv="Expires" content="Tue, 10 Oct 1994 14:11:01 GMT">

If the Expires directive is also supplied in the index file it will override the expiration date in the document. See also the Max-Age directive.

Field#n -- Specify a user supplied field associated with a file
The line
Field3=string

specifies "string" as user supplied field #3 associated with the current document. These fields are used for field searches. The digit 3 can be replaced with any other single digit, allowing a total of 10 user supplied fields.

File -- File name
The line
File=foo
begins a new file record for the file foo. It indicates that permission is granted for this file to be served. Other file directive lines will apply to this file until a new file record or text segment is started or the end of the index file is reached. The presence of this line causes an entry for this file to be written in the index.cache file created by wndex.
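The way File= lines divide an index file into records can be sketched as follows. This is a simplified illustration in Python, not the wndex implementation: the function name is hypothetical, and the input is assumed to be a list of directive lines from which comments and continuations have already been removed.

```python
def group_records(lines):
    """Split directive lines into a directory record and file records.

    Each record is a list of (keyword, value) pairs, since some
    directives may appear more than once in a record.
    """
    directory = []  # everything before the first File= line
    files = []
    current = directory
    for line in lines:
        # Split on the first '=' only; values may themselves contain '='.
        keyword, _, value = line.partition("=")
        keyword = keyword.strip().lower()  # lettercase is not significant
        value = value.strip()
        if keyword == "file":
            current = [("file", value)]  # File= starts a new record
            files.append(current)
        else:
            current.append((keyword, value))
    return directory, files
```

Lines seen before the first File= line end up in the directory record, and a line such as "Set-Cookie=name1=opaque1" keeps "name1=opaque1" intact as its value.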

Filter -- Specify the filter with which a file is to be postprocessed.
The line
Filter=/dir/foo
causes the contents of the file whose record contains this line to be used as standard input of the program foo and the output of that program to be sent to the client instead of the file itself. A common use of this is to specify a decompressing program like zcat as the filter so that a compressed version of a file can be stored on disk and then be decompressed on the fly before being sent to the client. Another example would be "Filter=/usr/bin/nroff -man" which would convert an nroff man page to an ASCII text document on the fly.

If a listed file name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in ~/dir/foo it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index file.
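The three cases in the preceding paragraph can be written out as a short Python sketch. The function name and the example directories are hypothetical. Only the path part of a value is handled here, so arguments such as the "-man" in "Filter=/usr/bin/nroff -man" would have to be split off first.

```python
import posixpath


def resolve_path(value, wn_root, index_dir):
    """Resolve a Filter=, Includes= or Wrappers= file name to a full path."""
    if value.startswith("/"):
        return value  # relative to the system root directory
    if value.startswith("~/"):
        return posixpath.join(wn_root, value[2:])  # relative to the WN root
    return posixpath.join(index_dir, value)  # relative to the index file's directory
```

With a WN root of /usr/local/wn and an index file in /usr/local/wn/docs, the value "~/dir/foo" resolves to /usr/local/wn/dir/foo and the value "foo" resolves to /usr/local/wn/docs/foo. Note that the Accessfile directive follows a different rule, described in its own entry.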

Header -- Add a line to the HTTP header for this document
The line
Header=[some legal HTTP header]
causes the line [some legal HTTP header] to be added to the HTTP header for this item. Don't do this unless you know what you are doing! This directive can be used multiple times to add multiple lines to the header.

Includes -- Specify the files to be included in a text document
The line
Includes=file1,file2,file3
causes the file whose record contains this line to be parsed for lines like <!-- #include -->. When such a marker is found one of the files listed with the Includes= directive is inserted. Subsequent occurrences of the marker cause the inclusion of subsequent files in the order in which they occur in this directive.

If a listed file name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in ~/dir/foo it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index file. See the section of the user guide on includes and wrappers for more information.
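The marker substitution described for Includes= can be illustrated with the Python sketch below. It is an approximation under stated assumptions: the function name is hypothetical, the contents of the included files are passed in as strings, a marker is recognized only when it is alone on its line, and a marker with no remaining file is replaced by nothing. The user guide should be consulted for the exact behavior.

```python
MARKER = "<!-- #include -->"


def expand_includes(document, included_texts):
    """Replace successive marker lines with the included texts, in order."""
    pending = iter(included_texts)
    out = []
    for line in document.splitlines(keepends=True):
        if line.strip() == MARKER:
            out.append(next(pending, ""))  # assumption: extra markers vanish
        else:
            out.append(line)
    return "".join(out)
```

With two files listed, the first marker line is replaced by the contents of the first file and the second marker line by the contents of the second.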

Keywords -- Specify the keywords associated with a document or file
The line
Keywords=pink, elephant, HTTP

specifies a list of keywords associated with the current document. These are used for keyword searches. For HTML documents the keywords are automatically extracted from the document by wndex. This requires a line in the head of the HTML document like

<meta http-equiv="Keywords" content="pink, elephant, HTTP">

If the Keywords directive is also supplied in the index file it will override the keywords in the document.

Max-Age -- Specify the Cache-Control and Expires headers for an entry
The line
Max-Age= 10 days

specifies that a Cache-Control header should be sent to expire the document in the specified time. If no Expires header has been set elsewhere in the index file or in the file itself (if it is an HTML file), then the Expires header will also be sent with a value equal to the current time plus the time period of the Max-Age header. The time period in the Max-Age header can be specified in units of seconds, minutes, hours, days or weeks, but more than one unit (as in 2 weeks and 3 days) is not allowed.

The line

Max-Age= 10 days after last-mod

specifies that a Cache-Control header and the Expires header (if none is set elsewhere) should be set to expire the document in the specified amount of time after the last-modified date of the document. Negative time values for the Cache-Control header will be ignored, but Expires headers with dates in the past will be used.
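To make the arithmetic concrete, here is a small Python sketch of how a Max-Age value might be turned into a number of seconds and an expiration time. It is not taken from the WN source: the function names are hypothetical, times are plain seconds since the epoch, and accepting singular unit names such as "1 day" is an assumption.

```python
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
}


def parse_max_age(value):
    """Return (seconds, after_last_mod) for a Max-Age value."""
    words = value.lower().split()
    if len(words) == 4 and words[2:] == ["after", "last-mod"]:
        after_last_mod = True
    elif len(words) == 2:
        after_last_mod = False
    else:
        # Covers values with more than one unit, which are not allowed.
        raise ValueError("unrecognized Max-Age value: %r" % value)
    unit = words[1][:-1] if words[1].endswith("s") else words[1]
    if unit not in UNIT_SECONDS:
        raise ValueError("unknown time unit: %r" % words[1])
    return int(words[0]) * UNIT_SECONDS[unit], after_last_mod


def expiry_time(value, now, last_modified):
    """Return the expiration time implied by a Max-Age value."""
    seconds, after_last_mod = parse_max_age(value)
    base = last_modified if after_last_mod else now
    return base + seconds
```

For "10 days" the period is 864000 seconds counted from the time the document is served. For "10 days after last-mod" the same period is counted from the last-modified date, so the resulting expiration time may already be in the past.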

Nomatchsub -- Set substitute file for searches on this file which result in no matches.
The line
Nomatchsub=foo.html
specifies that the HTML file foo.html in the current directory should be used for the output of all searches (context, grep, etc.) on this file which return no matches. It can only be used in conjunction with the Searchwrapper directive. See also Nomatchsub for directories.

Redirect -- Send an HTTP redirect to a new URL
The lines
File=foo
Redirect=http://host/path/bar
cause a request for foo to be answered with an HTTP redirect response. The client will then automatically request the new URL. The file foo need not exist.

The redirection always sends a "301 Moved Permanently" status header followed by a "Location:" header whose value is "http://host/path/bar". This means that the value of a Redirect= directive should always be a complete URL, starting with http:// or ftp:// etc. The one exception is that you may use "Redirect=<null>". This causes the server to send a status 204 "no response" which tells the client to do nothing and leave the display alone. The page won't be reloaded and won't change.
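The two cases can be summarized in a short Python sketch. The function name is hypothetical, and the "HTTP/1.0" prefix and the exact wording of the 204 reason phrase are assumptions made for illustration. Only the status codes and the Location header come from the description above.

```python
def redirect_response(value):
    """Return the response header lines for a Redirect= value."""
    if value == "<null>":
        # Special case: tell the client to leave the display alone.
        return ["HTTP/1.0 204 No Response"]
    return [
        "HTTP/1.0 301 Moved Permanently",
        "Location: " + value,
    ]
```

A record with "Redirect=http://host/path/bar" therefore yields a 301 status with a Location header, while "Redirect=<null>" yields only the 204 status line.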

Refresh -- Set a "Refresh" header for use with "client-pull"
The line
Refresh=60
adds an HTTP header at the beginning of the transmission of this document. If the client receiving this header supports "client-pull" (currently only Netscape browsers support this) then it will automatically reload the document after 60 seconds. This is useful for documents that are updated very frequently, a stock ticker, for example. If the directive
Refresh=30; URL=http://host/path/foo
is used then after 30 seconds the URL http://host/path/foo is loaded. This can be used to create an automatic slide show. The Refresh header is not part of an HTTP standard and hence may evolve. If it does this directive will be subject to change!

Searchwrapper -- Set wrapper file for searches on this file.
The line
Searchwrapper=swrap.html
specifies that the HTML file swrap.html in the current directory should be used as a wrapper for the output of all searches on this file.

Set-Cookie -- Set a "Cookie" header value
The lines
Set-Cookie=name1=opaque1
Set-Cookie=name=xxx; Expires=Thursday, 04-May-95 18:45:39 GMT
add an HTTP header at the beginning of the transmission of this document. If the client receiving this header supports cookie caching (currently only Netscape browsers support this) then it will save the name=value pairs and include them in the request headers when documents in the same directory or sub-directories are accessed. The server will put the name=value pairs in the environment variable HTTP_COOKIE for access by CGI scripts. This is useful for "shopping basket" type applications.

Normally the client will discard the cookie at the end of a session. However, if an Expires parameter like the one above is provided the cookie will be saved between sessions and only discarded when it expires.

More information about the proposed HTTP Set-Cookie header is available at http://home.netscape.com/newsref/std/cookie_spec.html

Title -- Specify the title of a document or file
The line
Title=This is the title
specifies the text "This is the title" as the title of the file. If the file is an HTML document this is not necessary as wndex will attempt to read the title from the document itself. If this line is supplied anyway it will override the title in the document. If this line is not supplied and the file is not an HTML document the default title "File " is used.

Wrappers -- Specify the wrapper files into which a text document is to be inserted
The line
Wrappers=file1
causes file1 to be parsed for lines like <!-- #include -->. When such a marker is found the file whose record contains this line is inserted and the combined document is sent to the client. It is possible to list multiple files in this directive. The semantics of this are explained in the section of the user guide on server-side includes and wrappers.

If a listed file name begins with a '/' the name is considered as a path relative to the system root directory. If it begins with '~/' as in ~/dir/foo it is assumed to be relative to the WN root directory. Otherwise it is assumed to be a path relative to the directory containing the index file. See the section of the user guide on includes and wrappers for more information.


John Franks <john@math.nwu.edu>