dtSearch Desktop, Network and Web contain a built-in Spider that
provides integrated searching of remote Web site content along
with locally available data. The dtSearch Spider can index and
search dynamically generated content, such as ASP/ASP.NET, MS
CMS and MS SharePoint.
The Spider can index XML, HTML, PDF, ASP and ASP.NET Web pages, as well
as online postings of text-based documents such as word processor
files and spreadsheets.
dtSearch Desktop and Network will display Web pages and documents with highlighted
hits, as well as links and images intact within HTML and PDF files.
How the dtSearch Spider Works
To index a Web site, select "Add web".
In the dialog box that pops up, enter the address of the Web site, for example
http://www.federalreserve.gov/, then select the crawl depth. A crawl
depth of 1 reaches only pages linked directly from the home page, a
crawl depth of 4 reaches four levels deep into the site,
and so on.
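The crawl-depth rule can be illustrated with a short breadth-first traversal. This is only a sketch of the general idea, not dtSearch's implementation: the `pages_within_depth` function and the in-memory `SITE` link graph are hypothetical names invented for the example, and a real crawler would fetch pages over HTTP rather than read links from a dictionary.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return {page: depth} for every page within max_depth link hops of start.

    `links` maps each page to the pages it links to (a hypothetical,
    in-memory stand-in for a real Web site).
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depths[page] == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in depths:  # skip pages already discovered
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure used only for illustration.
SITE = {
    "/": ["/about", "/news"],
    "/about": ["/team"],
    "/news": ["/news/2024"],
    "/news/2024": ["/news/2024/q1"],
}

print(pages_within_depth(SITE, "/", 1))
```

With a depth of 1 the traversal returns the home page plus the two pages it links to directly; raising the depth to 2 also picks up `/team` and `/news/2024`.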
Options allow the Spider to crawl across multiple servers from a single
starting URL and to limit the maximum size of items to download, the number
of files to index, and the number of minutes to spend indexing a single
Web site. The Spider supports the "robots" noindex and
nofollow meta tags. The Spider can perform "vertical" searching
of pages linked from a URL, as well as "horizontal" crawling
of sites linked to a URL.
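For readers unfamiliar with the "robots" meta tags, the sketch below shows how a crawler might detect them in a page using Python's standard `html.parser` module. It is an illustration of the general convention, not dtSearch code; the `RobotsMetaParser` class and `robots_directives` function are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record noindex / nofollow directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "noindex" in directives:
            self.noindex = True   # page asks not to be indexed
        if "nofollow" in directives:
            self.nofollow = True  # page asks that its links not be followed


def robots_directives(html):
    """Return (noindex, nofollow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex, parser.nofollow


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_directives(page))
```

A crawler honoring these tags would skip indexing a page when the first value is true and would not queue the page's links when the second is true.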
Online Demo
For a Spider demo operating through dtSearch Web, click
here. (The spidered www.dtsearch.com site is hosted on a completely
different hosting system, at a different physical location, from the site that is
running the Search Site demo.)
Technical Notes
Web pages or text can be cached in version 7 indexes; see
here for details.
In addition to searching publicly available Web sites, the Spider also supports indexing
and searching of secure content on HTTPS sites and password-accessible
sites. The Spider also supports forms-based authentication.
For information on searching ASP, please see this FAQ article: How
to use dtSearch Web with dynamically-generated content.