Monday, October 04, 2004

Net Search Extender

Intro:

DB2 Net Search Extender Version 8.1 is one of a family of DB2 Extenders™. It replaces DB2 Text Information Extender Version 7.2, and Net Search Extender Version 7.2 and offers users and application programmers a fast, versatile, and intelligent method of searching full-text documents stored in DB2, other databases, and file systems using SQL queries.

Basically, DB2 Net Search Extender searches text documents held in the column of a database table. The documents can be in various formats, such as HTML or XML. Rather than sequentially searching through the text documents that would take a considerable amount of time, Net Search Extender creates a text index in order to make documents searchable. A text index consists of significant terms extracted from the text documents.

Installation:

  • Make sure you have the appropriate DB2 Fixpack that matches Extender’s Pre-requisite. (RTFM)
  • Install NSE and Fixpack if its needs any. (You can get a trial download that can be used for 90days from IBM’s site.)
  • After installation run the sample script available in the “SQLLIB/samples/nsesample.bat”

nsesample.bat

The script creates the given DBNAME if it doesn’t exist and tests if the NSE installation was successful.

  • Check the nsesample.log file to make sure everything is fine.

Enabling a database for text search:

db2text start (Starts the NSE instance services)

db2text ENABLE DATABASE FOR TEXT CONNECT TO
(Prepares the database for use with NSE)

As DB2 commands db2text ? gives the syntax of the command.

Creating an Index:

Note: The table being referenced is the one which has the Text/HTML Document.

NSE needs an index to be created on the column that needs to be searched (You need to have a primary key to create an index). Create an index on the table that has the Text/HTML file that needs to be searched.

db2text create index db2ext. for text on

() UPDATE FREQUENCY D (*) H (*) M (10) connect to

(This command creates an index and sets the update frequency to 10 minutes, this is required to avoid synchronization issues)

Right after creation, update the index to fill the created index with data

db2text update index db2ext. for text connect to


Queries that can be used now:

SELECT AUTHOR,TITLE FROM DB2EXT.TEXTTAB WHERE CONTAINS(COMMENT, ’"book"’) = 1

Searches for the word “book” in the text fields of the column comment.

SELECT AUTHOR,TITLE FROM DB2EXT.TEXTTAB WHERE CONTAINS(COMMENT, ’"book" & “db2”’)=1

As you do in Google you can use logical operators like “&” “|” “NOT”. And they can be mixed in a search string.

SELECT AUTHOR, TITLE, NUMBEROFMATCHES(COMMENT, ’"book"’) FROM DB2EXT.TEXTTAB

Gives the number of matches found for the given word.

WITH TEMPTABLE(docid,score) AS (SELECT docid, SCORE(COMMENT,’"book"’) FROM DB2EXT.TEXTTAB)
SELECT * FROM TEMPTABLE WHERE score > 0 ORDER BY score ASC

Score tells you how well the documents meets your search. This can be used to order the search results.

This is an Interesting one!

SELECT AUTHOR, TITLE FROM DB2EXT.TEXTTAB WHERE CONTAINS(COMMENT, 'fuzzy form of 80 "Joh"') =1

Above query results will contain JOHN. This can be used when a normal search didn’t return any result. Ever wondered how Google is making up its “Did you mean :)

More searches:

  • Search for multiple words in same sentence
  • multiple words in same paragraph
  • search structured documents like HTML
  • Thesaurus search with the dictionary built by you.
  • Search for numeric attributes
  • Wild card search

Read the Users guide to get more info about NSE.

References:

Users Guide

PerFormance Overview

More Resources

1 comment:

Anonymous said...

Can Net Search Extender index words inside a text file wrapped into a DISTINCT DATA TYPE? What if it's in a BLOB in a CREATE DISTINCT TYPE?