Excite, Inc. Excite for Web Servers Help

Custom Document Formats

NOTE: these features are currently unsupported!

Enabling Custom Formats

Excite for Web Servers provides a few basic features for indexing and searching files in custom document formats. With these features, you can define a string that acts as a document delimiter, for files that contain more than one document. You can also define a string that sets off the title in each document, so that the title can be displayed in the result list.

To allow a custom format to be defined, the values of two variables must be changed:

  * $custom_format in afeatures.pl must be set to 1, and
  * $restrict_beneath_document_root must be set to 0, to allow
    documents that appear outside the Web server's document root to be
    indexed.

custom_format

Setting $custom_format to 1 tells Excite for Web Servers to treat all non-HTML files in this collection as custom-format files.

restrict_beneath_document_root

Excite for Web Servers knows how to serve custom-format documents only when they appear outside the Web server's document root. To serve these documents, Excite for Web Servers uses CGI scripts that read the text of each document and then output it.

By default, Excite for Web Servers requires that the documents it serves appear beneath the server's document root. Setting $restrict_beneath_document_root to 0 relaxes this restriction, which is necessary for serving custom-format documents.

Note that this method of serving documents has some drawbacks. First, invoking CGI scripts adds some computational overhead. Second, relative-pathname hyperlinks may not work. Third, images do not load, since the text of documents served in this manner by Excite for Web Servers is contained within <PRE> tags.
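The third drawback can be illustrated with a minimal Python sketch of a script that wraps a document's text in <PRE> tags. This is not the code Excite for Web Servers uses. The function name render_as_pre is hypothetical, and the sketch assumes that markup characters are escaped before the text is wrapped, which this help page does not state.

```python
import html


def render_as_pre(document_text, title="Untitled"):
    """Return a minimal HTML page that shows document_text verbatim.

    Assumption: markup characters are escaped so the browser displays
    them as literal text instead of interpreting them as tags.
    """
    return (
        "<HTML><HEAD><TITLE>{}</TITLE></HEAD>\n"
        "<BODY><PRE>\n{}\n</PRE></BODY></HTML>"
    ).format(html.escape(title), html.escape(document_text))


# An image tag inside the document text is shown as text, not as an image.
page = render_as_pre('See <IMG SRC="chart.gif"> for details.', "Example")
print(page)
```

Under that assumption, an image reference in the original document reaches the browser as plain characters, so no image request is ever made.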

Defining Custom Formats

Once the variables mentioned above have been appropriately set in afeatures.pl, it is possible to define document and title delimiters. This is done during collection configuration using AT-config.cgi, reachable from the main administration page, AT-admin.cgi.

Document Delimiter

This field defines a string that separates documents within a single file. It is useful when a file contains more than one document.

Title Delimiter

In many custom formats, the document's title is set off by a certain string. By defining this string, it is possible to cause Excite for Web Servers to extract the title for display in the query result list.

The following is an example of a custom format:

  AU Doe, John
  TI The History of Technology
  DA July 1st, 1995.
  AB A short 200-word history of technology.
  First came fire, then the wheel ...
  .
  .
  *****
  AU Doe, Jane
  TI The History of Mathematics
  DA July 2nd, 1995.
  AB A short 200-word history of mathematics.
  First came one, then two, ...
  .
  .
  *****

In the example above, the Document Delimiter would be '*****' and the Title Delimiter would be 'TI'. Excite for Web Servers would therefore treat this file as two documents, and the titles presented in a result list would be The History of Technology and The History of Mathematics.
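
The effect of the two delimiters on the example can be sketched in Python. This is an illustration only, not the parser used by Excite for Web Servers. The function name split_documents is hypothetical, and the sketch assumes that the Document Delimiter appears alone on a line and that the Title Delimiter starts a line and is followed by a space.

```python
def split_documents(text, doc_delimiter="*****", title_delimiter="TI"):
    """Split raw file text into documents and extract each title."""
    documents = []
    current = []
    for line in text.splitlines():
        if line.strip() == doc_delimiter:
            # A delimiter line closes the current document, if it has content.
            if any(part.strip() for part in current):
                documents.append(current)
            current = []
        else:
            current.append(line)
    # Keep any trailing document that is not followed by a delimiter.
    if any(part.strip() for part in current):
        documents.append(current)

    results = []
    for lines in documents:
        title = None
        for line in lines:
            stripped = line.strip()
            if stripped.startswith(title_delimiter + " "):
                title = stripped[len(title_delimiter):].strip()
                break
        results.append({"title": title, "text": "\n".join(lines)})
    return results


SAMPLE = """\
AU Doe, John
TI The History of Technology
DA July 1st, 1995.
AB A short 200-word history of technology.
First came fire, then the wheel ...
*****
AU Doe, Jane
TI The History of Mathematics
DA July 2nd, 1995.
AB A short 200-word history of mathematics.
First came one, then two, ...
*****
"""

for doc in split_documents(SAMPLE):
    print(doc["title"])
```

Run on the sample, the sketch yields two documents whose titles match the ones named above. A document without a line starting with the Title Delimiter gets a title of None in this sketch; how Excite for Web Servers handles that case is not described here.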

Indexing Custom Formats

After the Document Delimiter and Title Delimiter have been defined, you can index the documents that are in that format.

As explained above, if you wish to serve the text of these documents using Excite for Web Servers, you must store the custom format files in directories outside of the Web server's document root. Only when documents are stored outside the Web server's document root does Excite for Web Servers know to use CGI scripts to access the text of those documents. This is especially important when a multiple-document-per-file format is being used, since otherwise a link to the file would return the text of many documents at once.
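
Before indexing, it can help to confirm that a custom-format file really lies outside the document root. The following Python sketch shows one way to check this. The function name and the example paths are hypothetical, and the check is not part of Excite for Web Servers.

```python
import os


def is_outside_document_root(path, document_root):
    """Return True if path does not lie beneath document_root.

    Both paths are made absolute first. Symbolic links are not
    resolved; use os.path.realpath instead if that matters.
    """
    path = os.path.abspath(path)
    root = os.path.abspath(document_root)
    # If the longest common sub-path is the root itself, path is beneath it.
    return os.path.commonpath([path, root]) != root


# Hypothetical locations, for illustration only.
DOCUMENT_ROOT = "/usr/local/www/htdocs"
print(is_outside_document_root("/data/custom/history.txt", DOCUMENT_ROOT))
print(is_outside_document_root("/usr/local/www/htdocs/docs/history.txt", DOCUMENT_ROOT))
```

The first call reports a file stored outside the document root, which is the layout this section requires. The second reports a file beneath it.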

Finally, make sure that the IndexFilter attribute is set to index text files as well as HTML files. When the $custom_format variable is enabled in afeatures.pl, the indexer assumes that all documents that are not HTML documents are in the custom format.