Category Archives: SQL Server

Full-text Index Structure Understanding

Full-text Index Structure:

Here we can get a clear idea about the Indexing process:

Let we applying indexing on the following table

ID

Description

1

Student, Pupils, Registration, University and school

2

The Teachers, Lecturer, Professor, University, Registration, 100

Then the FT INDEX look like this>> created by SQL Server: SQL-Server-Installation Path\Microsoft SQL Server\MSSQL.1\FTDATA

Keyword

ColId

DocId

Occ

Student

1

1

1

Pupils

1

1

2

Registration

1

1

3

University

1

1

4

School

1

1

6 (and is a noise word)

Teachers

1

2

2 (the is a noise word)

Lecturer

1

2

3

Professor

1

2

4

University

1

2

5

Registration

1

2

6

100

1

2

7

Columns in the Index table:

Keyword: Contains a single token extracted at indexing time. Word broker determines what makes a single token.

ColId: a value that corresponds to a particular table and Column that are FT indexed.

DocId: a Four-byte integer maps a particular Full-text Key value in a full-text indexed table.
Occ: Represent a particular word offset of the particular keyword within that DocId.

Full-text search Why and How??

This will be a series on Full text Search in MS SQL Server. In this tutorial you wll learn the details of full text search which is one of the fastest to find data using CONTAINS key word rather than LIKE keyword we normally use. I will also discuss about to make ranking on search result depending on the matching criteria and search string.
Lets start the First Post on FULL TEXT SEARCH:
FULL-TEXT SEARCH
Full-text Search Concepts:
Full-text search allows fast and flexible indexing for keyword-based query of text data. We can gain significant enhancements in the areas of performances, manageability, and functionality deliver for exceptional search capabilities for application of any size. The performance benefit of using FT can be best realizes when we use for unstructured huge text data.

Quote:

“A LIKE query against millions of rows of text data can take minutes to return, whereas a full-text query can take only one seconds or less against the same data, depending on the numbers of rows to return”: MSDN.

FULL-TEXT SEARCH TERMINOLOGY:
Full-text Index: Stores information about significant words and their location within a given column. Index information are use to search quickly for rows with particular words or their combinations.
Full-text Catalog: A full-text catalog may contain Zero or more full-text indexes. Each catalog may serve indexing needs of one or more tables within a Single DB. And Catalog must reside on a local hard drive (not in removable disk, net drive) associated with instance of SQL Server.
Word Broker: tokenizes text based on the lexical rules of a specific language.
Stemmer: Create inflectional (Synonyms, Antonyms, forms of word etc. Like go>went>>gone).
Filter: Filter a specified file type and also filter extract text from a file stored in varbinary or image column.
Population or Crawl: Is the process of creating and maintaining full-text index.
Noise Words: like “a”, “an”, “the”>> frequently occurring words that do not help the search. These words are ignored when creating index.
Full-text Indexes:
Full-text index is a special type of token based functional index built and maintained be MS Full-text for SQL Server (MSFTESQL). The process of creating and maintaining a full-text index is called index POPULATION.
Population Types:• Full population
• Change tracking-based population
• Incremental timestamp-based population
Full Population:
Typically occurs when a full-text index is first populated. Index entries are built for all the rows in all the tables covered by the catalog. If a full population is requested for a table, index entries are built for all the rows in that table.
CREATE FULLTEXT INDEX CHANGE TRACKING AUTO
>> CHANGE TRACKING OFF NO POPULATION if we do not want to populate the index at the time of creation.
CHANGE TRACKING BASED POPULATION
CHANGE TRACKING [AUTO/MANUAL/OFF]
AUTO>> automatically propagate changes to the FT index.
MANUAL>> propagate index on a schedule (e.g. every day, every hour), SQL server agent or programmer by himself by
ALTER FULLTEXT INDEX
Indexed update asynchronously (runs in the Background) when DB activity is low.
INCREMENTAL TIMESTAMP-BASED Population:
Updates the FT index for rows added, deleted or modified after the last population or while last population was in progress
Requirements: Indexed table must have a column of the timestamp data type.
Hope you enjoyed the first series on Full text search. Stay tuned for more.

Follow

Get every new post delivered to your Inbox.

Join 54 other followers