Excite, Inc. Excite for Web Servers Help

Using The Forms-Based Administration Tools

What Does This Software Do?
Installation Information
Main Administration Page
- Changing the Password
- Configuring URL Mappings
Document Collections
- New Collection
- Configurable Attributes
Query and Query-Results Page Generation
Documentation about Preferences and Customization
Documentation about Making Queries
Documentation about the Command Line Applications

What Does This Software Do?

Excite for Web Servers makes it easy for you to add searching -- Excite, Inc.'s advanced concept-based searching -- to your Web site.

Excite for Web Servers provides a simple Web-browser interface for doing all the things necessary to enable concept-based searching of collections of documents -- administering, indexing, and searching over the collections. In particular, one can:

define a document collection -- that is, specify a set of documents to be considered a single collection over which one can search,
design customized pages for displaying to users who wish to search over that collection,
index that collection, monitoring the progress, and
search the collection.

With Excite for Web Servers, it's easy to set up concept-based-searchable Web sites in minutes.

Installation Information

During installation of the Excite for Web Servers software, an HTML file containing information about the location of certain Excite for Web Servers-related files is generated for you. If you would like to find certain Excite for Web Servers files, you can track them down by looking at your installation information file.

Main Administration Page

The main administration page is accessible via the AT-admin script. From this main form you can create collections, configure existing collections, change passwords, and configure URL mappings for your Web Server.

Changing the Password

If you would like to change the password that allows access to the administration pages, you can do so by pressing the Password button on the main administration page.

Configuring URL Mappings

Pressing the Configure URL Mappings button on the main administration page allows you to tell Excite for Web Servers (EWS) where the files served by your Web server are stored, and how the URLs correspond to them. This feature is used to ensure that URLs in your search-results lists point to the right place.

For example, suppose the files on your site accessed with the URL http://foo.bar.com/root1/ were stored in the directory /usr/local/www/html on your machine, while the files accessed with the URL http://foo.bar.com/root2/ were stored in /usr/docs/html. You could add the following entries to the mappings:

  /root1/ /usr/local/www/html/
  /root2/ /usr/docs/html/

You can also use the URL mappings to deal with aliases to your server. Suppose all the files in /usr/surveys/html were served from the same server, but you wanted the URLs to appear with a different alias, http://survey.bar.com/. You could add this entry to the mappings:

  http://survey.bar.com/ /usr/surveys/html

NOTE: If you are indexing users' public_html directories, you do not need to set up mappings for those directories by hand; EWS handles them automatically.
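As an illustration of how the mapping entries above relate file locations to URLs, here is a minimal Python sketch. It is not part of EWS: the function name path_to_url, the default_host parameter, and the longest-directory-wins rule are assumptions made for this example only.

```python
# Hypothetical helper: turn an indexed file path into the URL shown in results.
# Entries mirror the mapping examples above: (URL prefix or alias, directory).
MAPPINGS = [
    ("/root1/", "/usr/local/www/html/"),
    ("/root2/", "/usr/docs/html/"),
    ("http://survey.bar.com/", "/usr/surveys/html"),
]


def path_to_url(path, mappings=MAPPINGS, default_host="http://foo.bar.com"):
    """Return the URL for a file path, or None if no mapping covers it."""
    best = None
    for url_prefix, directory in mappings:
        directory = directory.rstrip("/") + "/"  # tolerate a missing trailing slash
        # Assumption: when several directories match, the longest one wins.
        if path.startswith(directory) and (best is None or len(directory) > len(best[1])):
            best = (url_prefix, directory)
    if best is None:
        return None
    url_prefix, directory = best
    if "://" not in url_prefix:  # a bare path is served from the default host
        url_prefix = default_host + url_prefix
    return url_prefix.rstrip("/") + "/" + path[len(directory):]
```

Under these assumptions, a file below /usr/surveys/html is reported under the http://survey.bar.com/ alias, while a file outside every mapped directory yields None.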

Document Collections

A document collection is the specification of a set of HTML or plain-text files over which one would like to do concept-based searches. Or more simply, it can be thought of as the searchable documents themselves.

Besides a name, each document collection has associated with it a set of configurable attributes -- information about the documents in the collection and the index to be built on those documents. These attributes, described in more detail below, include:

document information
- CollectionContents. A specification of which files are to be included in the collection, by means of either an explicit list of documents or a set of rules describing the documents.
index information
- CollectionIndex. The directory in which the index for the collection should be stored.
- IndexingContact. Optionally, the email address (or hostname in NT versions) of the person to be notified when indexing of this collection is complete.

Each document collection has a collection-name.conf file where its configurable attributes are stored. The Collection File Format is described more fully in the documentation on the command-line applications.

New Collection

Creating a new document collection is a two-step process: naming and configuring.

First, you must give the new collection a name. Simply enter a name in the field provided on the main Excite for Web Servers Administration page.

Once named, the new collection must be "configured" -- that is, have its other attributes defined. Click the Configure New Collection button to bring up a page on which to provide values for those attributes. (Defaults are provided, which you can change as appropriate.) See the next section for more information on these attributes.

When you're done configuring the new collection, you may then index it, generate search/result pages for it, and then search over it.

Configurable Attributes

There are a number of attributes associated with each collection that you can configure using the Configure New Collection form. These attributes provide information about the documents in the collection, the location of the index to be built on those documents, and whom to contact when indexing is complete. These attributes are explained in detail directly below.

CollectionIndex

When you index a collection for searching, Excite for Web Servers generates several files which make up that index. The indexing and searching applications need to know where these index files are located. The CollectionIndex is simply the name of a directory where the index files will be stored. If you don't particularly care where the index goes, you may simply leave the default value provided.

CollectionContents

There are three methods available to describe the collection of files you wish to have included in the index: Enter the Files Directly, Index Using File List, and Index ~user Directories. None of these options are exclusive: you can use any combination of them to pick the files you want to index.

Enter the Files Directly

The first option is to Enter the Files Directly, and while it's a little more complicated than using a file list, it's more useful. It allows you first to specify where the indexer should look for files to be indexed, and then it allows you to give the indexer rules about which of the files it finds there to include in or exclude from indexing, based on file name and content.

To specify where the indexer should look for files to index, you simply provide a list of file/directory names, separated by whitespace. Each file listed will be a candidate for indexing, and each directory will be "expanded" so that all the files "below" it -- that is, all the files contained in it and in its traversed subdirectories -- will be candidates for indexing as well. If a path name contains special non-alphanumeric characters such as whitespace, be sure to enclose the full path name in single quotes.

Once a set of candidate files as described by your list is created, then inclusion/exclusion rules are applied to each candidate to see whether it will be indexed. The set of inclusion/exclusion rules is the IndexFilter, described below.
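The expansion step described above can be pictured with a short Python sketch. This is an illustration only, not the indexer's code; the function name candidate_files is hypothetical, and using shlex.split to honor single quotes is an assumption about how quoting would be handled.

```python
import os
import shlex


def candidate_files(spec):
    """Expand a whitespace-separated list of files/directories into candidates."""
    candidates = []
    # shlex.split honours single quotes, so 'a path with spaces' stays one name.
    for name in shlex.split(spec):
        if os.path.isdir(name):
            # Walk the directory and every subdirectory below it.
            for root, _dirs, files in os.walk(name):
                for filename in sorted(files):
                    candidates.append(os.path.join(root, filename))
        elif os.path.isfile(name):
            candidates.append(name)
    return candidates
```

Every path returned here is only a candidate; whether it is actually indexed would still depend on the inclusion/exclusion rules.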

Index Using File List

The second option for describing the collection of files you wish to have indexed is to Index Using (a) File List. Simply provide the indexer with the name of a file containing a list of the files you wish to index. Remember that you must use absolute pathnames.

Index ~user Directories

When you choose this option on the configuration page for a collection, the indexer finds every user whose home directory contains an appropriately named directory (usually called public_html) and indexes the files in each of those directories.

IndexFilter

The IndexFilter is your means of specifying which of the candidate files to index (or not). It provides some generic options for inclusion and also allows you to create a more complicated Custom Filter File.

There are three generic inclusion rules; of the files which are candidates for indexing, you may include:

some non-binary candidate files:
- all those files whose names match the expression *.htm* (and no others), AND/OR
- all those files whose names match either of the expressions *.text or *.txt, OR
all non-binary candidate files.

In addition to the generic inclusion rules, you may create a Custom Filter File to specify other inclusion rules or exclusion rules.
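To make the generic rules concrete, the following Python sketch checks a file name against them using shell-style pattern matching. The function name and its three flags are hypothetical stand-ins for the choices on the form, and the sketch assumes the candidate has already been identified as non-binary.

```python
from fnmatch import fnmatchcase

HTML_PATTERNS = ["*.htm*"]
TEXT_PATTERNS = ["*.text", "*.txt"]


def passes_generic_rules(filename, html=True, text=False, everything=False):
    """Apply the generic rules to a non-binary candidate's file name."""
    if everything:  # "all non-binary candidate files"
        return True
    patterns = (HTML_PATTERNS if html else []) + (TEXT_PATTERNS if text else [])
    return any(fnmatchcase(filename, p) for p in patterns)
```

Note that a name such as draft.html.C also matches *.htm*, which is the kind of case a Custom Filter File can override.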

Custom Filter File

If the generic IndexFilter inclusion rules aren't enough to specify which files to index, in which format, and which not to index, you may provide further specifications or restrictions in a Custom Filter File. The rules provided in this file override the generic rules.

Inclusion/exclusion rules have three columns:

the type of expression being used to match against the candidate filenames (to be explained below),
the expression itself, and
the format in which to index the files which match the expression -- either HTML, TEXT, or nothing (don't index it).

There are two different categories of expressions one can use to match against filenames: regular expressions and Unix-style globbing expressions.

Regular expressions are very powerful, giving you access to very terse, expressive rules for matching filenames. (If you'd like to learn about regular expressions, there is a Unix man page on "regexp".) If you'd like to use a regular expression, simply put the token regexp in the first column of that line.

While regular expressions are very powerful, they're often just enough rope to hang yourself with. As a nice alternative, Unix-style globbing expressions may be less powerful, but they're relatively safe and comfortable. We allow three different types of Unix-style expressions for matching against filenames:

dir -- will match only root-level directory names. This is not likely to be all that useful, as you're requiring that the expression you give match the pathname starting from its absolute beginning, but we thought we'd let you have that choice.
subdir -- will match against any directory in the pathname.
file -- will match only simple filenames.

These three different types of Unix-style expressions should give you as much flexibility in specifying files as you'll need.

An example Custom Filter File:

  # don't index any ".pl" files in directories called "bin"
  regexp  \/bin\/.*\.pl$
  # index all stuff in /usr/local/www/html/text-files as plain text.
  # the next four lines are all equivalent.
  dir    usr/local/www/html/text-files       TEXT
  dir    usr/local/www/html/text-files/      TEXT
  dir    /usr/local/www/html/text-files      TEXT
  dir    /usr/local/www/html/text-files/     TEXT
  # don't index anything below any directory with "old" in its name.
  # again, the next four lines are all equivalent.
  subdir  *old*
  subdir  *old*/
  subdir  /*old*
  subdir  /*old*/
  # override the general '*.htm*' rule.  there happen to be some
  # files ending with '.html.C' -- index them as text instead of
  # HTML.
  # these two lines are equivalent.
  file    *.html.C     TEXT
  file    /*.html.C    TEXT

After you've created a Custom Filter File, enter its name in the entry field provided.
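The three-column layout of a Custom Filter File can be read with a few lines of Python, sketched below. This is an illustration rather than the indexer's own parser: both function names are hypothetical, and rule_matches is only a guess at the matching behavior based on the descriptions of regexp, dir, subdir, and file above.

```python
import re
from fnmatch import fnmatchcase


def parse_filter_file(text):
    """Return a list of (kind, expression, fmt) rules; fmt is None for 'don't index'."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("rule needs a type and an expression: %r" % line)
        kind, expression = fields[0], fields[1]
        fmt = fields[2] if len(fields) > 2 else None
        rules.append((kind, expression, fmt))
    return rules


def rule_matches(kind, expression, path):
    """Guess at the matching semantics described above; not the real matcher."""
    parts = path.strip("/").split("/")
    directories, filename = parts[:-1], parts[-1]
    if kind == "regexp":
        return re.search(expression, path) is not None
    pattern = expression.strip("/")  # leading/trailing slashes are treated as optional
    if kind == "file":
        return fnmatchcase(filename, pattern)
    if kind == "subdir":
        return any(fnmatchcase(d, pattern) for d in directories)
    if kind == "dir":
        # Compare against the leading directories of the absolute path.
        depth = len(pattern.split("/"))
        return fnmatchcase("/".join(directories[:depth]), pattern)
    raise ValueError("unknown rule type: %s" % kind)
```

Stripping the slashes before matching is what makes the "equivalent" lines in the example file behave identically in this sketch.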

Summary Mode

Summaries are generated at indexing time, and therefore affect indexing speed. Fast summaries add no significant time to indexing, since the first few lines of the file are used as the document summary. Quality summaries are calculated using Excite's summarization technology. These are generally better descriptions than the first few lines of the file, but they do slow down indexing a bit. If you would like to use your own description as the summary of a document, you can do so by adding the following META tag to the document:

<META NAME="DESCRIPTION" CONTENT="This is my own summary.">
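If you want to check which of your documents already carry such a tag, a small Python script along these lines may help. It is a sketch using the standard html.parser module and says nothing about how EWS itself reads the tag; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Collect the CONTENT of the first <META NAME="DESCRIPTION"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def author_summary(html_text):
    """Return the author-supplied summary, or None if the page has none."""
    finder = DescriptionFinder()
    finder.feed(html_text)
    return finder.description
```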

IndexingContact

This field is optional. It allows the administrator to specify who should be notified when indexing of this document collection completes: either the email address of a user or the hostname of a machine. If you wish to index a collection previously configured by someone else, you will probably want to change the configuration so that you are notified instead of the previous administrator.

Important: This field has different uses depending on your operating system. UNIX users provide an email address at which they can be reached, while NT users provide the name of a machine that is running the NT Messenger Service, where they will receive notice when the indexing process finishes.

Query and Query-Results Page Generation

Once you've configured a document collection, you can then create query and query-results pages to be used specifically with that document collection.

The query page is the page a user sees when searching, and, predictably, the query-results page is used to return the results of a query to a user. In order to allow these pages to look different for different document collections, a new query page and a new query-results page are "generated" for each document collection.

To generate a collection-specific query page or query-results page, you have two options:

You may use a stock template page provided for you, changing certain modifiable appearance attributes as you wish:
- the Banner Image to display at the top of both the query and query-results pages, and
- the Brief Description of the contents of the document collection to appear at the top of the query (but not query-results) page.
OR
You may supply your own template page for the query page, the query-results page, or both.

Specific information about how to generate query and query-results pages directly follows.

Banner Image

If you are using the stock query page, the stock query-results page, or both, you may still provide your own image to be displayed at the top of each page. This image is called the Banner Image. (By default, it is the Excite, Inc. logo as it appears at the top of the admin pages.)

To display an image of your own design, simply put the filename of your image in the entry field provided on the generation form.

(Hint: You may also edit the generated .html and .cgi files if you wish to change the appearance of those pages. However, take care not to change the search form itself. If you "break" things, just re-generate the pages.)

Brief Description

Think of this description as a sub-title to the collection name. If you use the stock query page provided for you, you may provide a brief description (size, age, contents) of the document collection to be displayed there. This description will give the searcher further guidance as to the contents of your document collection.

It's good user-interface policy to tell users what they are searching over. If your collection contains information on economic conditions in Malaysia (and that's not obvious from the collection name), tell them this, so they don't spend their time doing queries on "Barbie Dolls" or "hydroponic gardening", which would no doubt provide disappointing results.

Backlink

If you generate a page that allows searches to be forwarded to Excite, you can provide a URL in the field provided so that the search results page will contain a link back to your site.

Linktext

The linktext field accompanies the backlink field and lets you provide text (up to 20 characters) for the URL. If you do not provide linktext, the URL itself will be used as the text for the link.

Query Template

Instead of using the stock query and query-results pages provided for you, you may create your own, completely customized pages. In the case of the query page, you create your own query page by supplying a Query Template file.

A Query Template file is a regular HTML file which contains the line

  ###EXCITE###

in the place where you want the query form to be inserted. (Put nothing else on this line, or it may confuse the script which automatically generates the new page.)

After creating this file, indicate its location in the field provided, and that will be enough to override the use of the stock query page template provided for you.
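The substitution performed on a template can be pictured with the Python sketch below. It illustrates the idea only; the real page-generation script is not documented here, and the function name and the form fragment in the test are hypothetical. The marker is matched only when it is alone on its line, which mirrors the advice above to put nothing else on that line.

```python
MARKER = "###EXCITE###"


def fill_template(template_text, fragment):
    """Replace each line consisting solely of the marker with `fragment`."""
    output = []
    for line in template_text.splitlines():
        if line.strip() == MARKER:
            output.append(fragment)  # the generated form goes here
        else:
            output.append(line)  # everything else is copied unchanged
    return "\n".join(output)
```

In this sketch, a marker that shares its line with other text is left untouched rather than replaced.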

Query-Results Template

Just as you can create your own, completely customized query page, you may create your own completely customized query-results page -- the page a user sees when getting back the results of a query on a particular document collection.

To do so, you must provide a template file (known as a Query-Results Template file) and indicate in the field provided where this file is located.

Like the Query Template, the Query-Results Template is a normal HTML file which contains, on a line by itself:

  ###EXCITE###

In this case, the line indicates where you want the query results to appear.