Templeton's Known Limitations
Most software documentation tells you what a program can do, but
we feel it is equally important to know what it cannot do.
Because of the vast range of options available on most web servers,
Templeton cannot retrieve some types of documents. These known limitations
affect virtually all types of web robots and are described below. Many of
these limitations are expected to be corrected in later releases, although
no dates have been announced.
Protocol Limitations
- Templeton attempts to identify file types using HEAD requests. Some
servers (especially database systems) return errors for HEAD requests
but complete documents for GET commands. Templeton may incorrectly
mark a file as unretrievable if the HEAD request fails.
- Some servers do not return error 404 (or another 4xx code) for invalid
documents. A few servers have been seen returning an HTML document that
contains a text error message, together with return code 200 (OK).
Templeton will incorrectly process these documents as if they were valid.
(In the case of imagemaps, these documents will not be processed.)
- Templeton does not support POST commands.
- Netscape "client pull" (aka refresh) tags are not supported.
These tags are usually in the form:
<META HTTP-EQUIV="Refresh" CONTENT="?;URL=URL">
- Templeton does not support text entry fields, selections, or pushbuttons.
- Templeton does not retrieve Java code. This feature may be
available in later releases.
- Image Maps are "clickable" pictures. By clicking on a portion
of the picture, you activate a link to another HTML document.
There are currently four ways to resolve imagemap URL addresses:
- Client-Side processing requires the URL to be identified
by the client application. Templeton supports this method.
- CERN/NCSA server-side mappings use the URL path to describe the
mapping program (generally imagemap) and the server's path to the
map file. Templeton attempts to resolve the map file's path from the
application's path, but may fail if the server does not return error 404
(not found) for invalid document paths.
- Imagemaps may refer to executables. Since Templeton cannot
decompile executables, these imagemaps cannot be retrieved.
- Some servers support map directories where all files in the
directory are processed as map resolution files. These servers do not
permit the retrieval of the map document. Consequently, map directory
references are not supported by Templeton.
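The client-pull tag listed above packs two values into its CONTENT
attribute: a delay in seconds and, optionally, a target URL. The following
is a minimal Python sketch of how a robot could split that value. It is not
Templeton's code (Templeton does not support these tags); the function name
parse_refresh and the tolerance for missing or quoted URLs are assumptions
made for illustration.

```python
import re

# Matches "<seconds>" optionally followed by ";" or "," and a URL,
# with or without the "URL=" prefix. Illustrative pattern only.
_REFRESH_RE = re.compile(
    r"^\s*(\d+)\s*(?:[;,]\s*(?:URL\s*=\s*)?(.*?))?\s*$",
    re.IGNORECASE,
)


def parse_refresh(content):
    """Split a Refresh CONTENT value into (delay_seconds, url_or_None).

    Returns None when the value does not start with a whole number.
    """
    match = _REFRESH_RE.match(content)
    if match is None:
        return None
    delay = int(match.group(1))
    url = match.group(2)
    if url:
        # Some pages wrap the URL in quotes; drop them.
        url = url.strip("'\"")
    return delay, (url or None)
```

For example, parse_refresh("5;URL=http://example.com/next.html") yields
(5, "http://example.com/next.html"), while a bare "10" yields (10, None),
meaning the same document is to be reloaded.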
Network Limitations
- Templeton may seem to hang when the nameserver is unable to resolve a
hostname. The apparent hang is the nameserver request timing out. The timeout
interval is system dependent, but is generally no longer than two minutes.
Because Templeton is currently single-threaded, it can do nothing else while
it waits. This may be corrected in later releases.
- gethostname() under OS/2 does not resolve properly if the environment
variable HOSTNAME is not set. The function also has problems on a gateway
that uses a private LAN and a PPP/SLIP interface, with hostnames
that begin with digits (such as those provided by Worldnet.att.net), and
with machine names containing a '.' character.
Currently, the user must set their username in the configuration file.
This will be corrected in a future release.
- Templeton cannot determine whether a mounted drive supports long file names.
It assumes that the drive does support them, but this assumption may be
wrong. For example, a DOS machine can export a partition to a
Linux system running Templeton. In that case the user must set the FATFlag
in the configuration file. The DOS version of Templeton
(when released) will not support long file names.
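The nameserver stall described in the first item above can be bounded by
running the lookup in a worker thread and giving up after a deadline. The
following is a minimal Python sketch of that idea, not Templeton's
implementation; the name resolve_with_timeout, its parameters, and the
5-second default are assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def resolve_with_timeout(hostname, timeout_s=5.0,
                         resolver=socket.gethostbyname):
    """Return the address for hostname, or None on failure or timeout.

    The lookup runs in a worker thread so the caller waits at most
    timeout_s seconds. The worker itself keeps running until the
    system lookup returns; only the caller stops waiting for it.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, hostname)
    try:
        return future.result(timeout=timeout_s)
    except (FutureTimeout, OSError):
        # Timed out, or the resolver reported an error
        # (socket.gaierror is a subclass of OSError).
        return None
    finally:
        # Do not block here waiting for a stalled lookup.
        pool.shutdown(wait=False)
```

The resolver parameter exists only so the function can be exercised without
a network; by default it calls socket.gethostbyname. A caller that gets None
back can skip the host and continue with the rest of its queue.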
Program Limitations
- Command_html, etc. have a maximum string length of 1023 characters.
Attempts to create a command longer than 1023 characters
(including expanded %) will cause unpredictable results.
This may be changed in future releases.
- Maximum URL length is 255 characters. Longer URLs will be truncated.
This may be changed in future releases.
- Templeton (like other web robots) assumes that the previous password for
a realm remains valid. This assumption is incorrect when different documents
(usually on different servers) coincidentally use the same realm string.
Templeton only supports one (1) username:password per realm.
This may be changed in later releases.
- Templeton processes all documents in a single pass. If a link
is thought to exist (and is placed in the processing queue) but later proves
unretrievable, all previously retrieved documents will still contain links
to a nonexistent file.
- Templeton only permits a 4K meta header from the server.
Longer meta headers may be interpreted as part of the document.
The limit exists so that an entire (possibly huge) file is not parsed as
header data when a server returns no meta information.
This will be modified in a future release.
- Because Templeton is unable to know the default file of a directory (it
differs on each server), Templeton is unable to distinguish between '/' and
'/default_index_name'. If the server's default index name matches Templeton's
'Index' name (defined in the configuration file) then the default directory
file may be retrieved twice.
Rationale: Suppose there are two files: Welcome.html and Index.html.
Welcome.html is the server's default file name.
Index.html (coincidentally) is Templeton's default file name.
A single HTML file may refer to "/", "/Welcome.html", and "/Index.html".
Assuming that the default directory file is the same as Templeton's would
incorrectly store a copy of Welcome.html in the local Index.html file.
Retrieving the file twice resolves this issue.
Unfortunately, this also means a default server file called "Index.html"
would be retrieved twice (once for "/" and once for "/Index.html").
Worse: if the default index is never specified by an HTML document, but
a different remote file called "Index.html" is found, then the default
directory data will be lost.
How's that for a complicated situation?
This may be modified in a future release.
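The name collision in the last item can be made concrete with a short Python
sketch. It shows a hypothetical mapping from URL paths to local file names,
not Templeton's actual logic; local_path, find_collisions, and the
"Index.html" default are illustrative names only.

```python
import posixpath


def local_path(url_path, index_name="Index.html"):
    """Map a URL path to a local file name.

    Directory URLs (ending in "/") are stored under index_name,
    because the robot cannot know the server's own default file.
    """
    if url_path == "" or url_path.endswith("/"):
        url_path = url_path + index_name
    return posixpath.normpath(url_path).lstrip("/")


def find_collisions(url_paths, index_name="Index.html"):
    """Return local names that more than one URL path maps onto."""
    by_name = {}
    for path in url_paths:
        by_name.setdefault(local_path(path, index_name), []).append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

With the three links from the rationale,
find_collisions(["/", "/Welcome.html", "/Index.html"]) returns
{"Index.html": ["/", "/Index.html"]}: two different remote documents compete
for one local file, which is the data-loss case described above.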
Document revision: 18 Mar. 1997 for Templeton 1.950
Copyright 1996,1997 N.A. Krawetz
Modification, republication, and redistribution of this
document is strictly prohibited. All rights reserved.