Thursday, March 22, 2012

custom wordbreaker

Is it possible to create a custom word breaker for SQL Server? I've found
documentation for SharePoint and other technologies, but SQL Server appears
to be different, and there are very few articles about customizing it
(beyond editing the noise word lists).|||Can you provide some more information on what you are trying to do here? Are
you looking for some kind of parsing routine that splits a character
string wherever white space and punctuation occur?
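For reference, a routine of the kind Anith describes can be sketched in a few lines of Python. This is only an illustration under a stated assumption: `split_words` is a hypothetical name, and treating every non-alphanumeric character as a break is a simplification, not a description of how SQL Server's word breakers actually work. It does show why a hyphenated term such as AZ-113 comes apart:

```python
import re

# Hypothetical parsing routine (simplified assumption): every run of letters
# or digits is a word, so white space, punctuation, quotes and hyphens all
# act as breaks. This imitates the behavior under discussion; it is not
# SQL Server's actual word breaker.
def split_words(text: str) -> list[str]:
    return re.findall(r"[A-Za-z0-9]+", text)

# Even a quoted, hyphenated term is split at the hyphen.
print(split_words('Search for "AZ-113", please.'))
```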
Anith|||The problem I'm trying to address is that none of the standard word
breakers handle chemical compound names correctly. For example, AZ-113 might
be a valid term, but even if I put it in double quotes, the word breaker
turns it into two separate terms, 'AZ' and '113'. I need a way to make it
keep such names as single terms.
|||Clyde Seigle wrote:
> The problem I'm trying to address is that all the standard word
> breakers don't handle chemical compound names correctly. For example,
> AZ-113 might be a valid term, but even if I stick it in double quotes,
> the word breaker turns it into two separate terms, 'AZ' and '113'. I
> need a way to have it keep these as specific terms.
>
I'm not sure you've answered the question Anith asked. What is a
word breaker as you are using it? Is it a function on the client, a
library you are using, etc.? What exactly do you need it to do, and when
does it need to do this? For example, are you more interested in
breaking up the words in a text column for display in an application, or
are you breaking up words at insert time in the database?
If you can do this on the application side, I'd recommend you look at the
regular expression libraries that Microsoft provides. There's a COM
version installed with Windows (in vbscript.dll) and a version for .NET
(System.Text.RegularExpressions). Regular expressions can be a little
daunting at first, but they provide a great way to search and validate
all kinds of text.
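As an illustration of the application-side approach, here is a minimal sketch using Python's `re` module in place of the VBScript or .NET libraries mentioned above. The pattern for a compound name (uppercase letters, a hyphen, then digits, as in AZ-113) is an assumption based only on the single example in this thread; real compound names would need a broader pattern. `find_compounds` and `is_compound` are hypothetical helper names.

```python
import re

# Assumed shape of a compound name: 1+ uppercase letters, a hyphen, 1+ digits
# (e.g. AZ-113). Word boundaries stop it from matching inside longer tokens.
COMPOUND = re.compile(r"\b[A-Z]+-\d+\b")

def find_compounds(text: str) -> list[str]:
    """Return every substring of text that looks like a compound name."""
    return COMPOUND.findall(text)

def is_compound(token: str) -> bool:
    """Validate that the whole token is a compound name."""
    return COMPOUND.fullmatch(token) is not None

print(find_compounds("Samples of AZ-113 and BX-7 were tested."))
print(is_compound("AZ-113"), is_compound("AZ113"))
```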
David G.|||Sorry for not being clear; in fact, I realize that this post is
misplaced (it should be in ...fulltext). My question relates to the
full-text indexer that is part of SQL Server. It has a built-in word breaker
that it uses to do its full-text indexing, and this is where I'm having the
problems.
|||Regrettably, with both OSes, Win2k and Win2003, the word breakers will
break AZ-113 into two separate words. Your best bet is to convert all such
phrases or tokens, in both your searches and your content, to AZ113.
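This workaround only helps if exactly the same rewrite is applied to the indexed text and to every search term. A minimal Python sketch of such a normalization step follows; the `normalize` function and the letters-hyphen-digits pattern are assumptions for illustration, not part of SQL Server.

```python
import re

# Assumed compound-name shape: uppercase letters, a hyphen, digits (AZ-113).
# Dropping the hyphen yields a single token (AZ113) that the stock word
# breaker should keep as one word.
COMPOUND = re.compile(r"\b([A-Z]+)-(\d+)\b")

def normalize(text: str) -> str:
    """Rewrite every compound name in text without its hyphen."""
    return COMPOUND.sub(r"\1\2", text)

# Apply the same function to the content before it is stored and indexed...
stored = normalize("Results for AZ-113 were promising.")
# ...and to the user's search term before it goes into a CONTAINS query.
term = normalize("AZ-113")
print(stored)  # Results for AZ113 were promising.
print(term)    # AZ113
```

One design choice to consider: keeping the normalized text in a separate column used only for full-text indexing would preserve the original spelling for display.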
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
|||Clyde,
Yes, it is best to discuss this subject (custom word breakers) in the
fulltext newsgroup. First, a couple of very important questions about your
environment and the language of the text you are full-text indexing. Could
you post the full output of the following SQL code?
use <your_database_name_here>
go
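SELECT @@language
SELECT @@version
-- 'default full-text language' may be an advanced option; if it is not
-- listed, run EXEC sp_configure 'show advanced options', 1 and RECONFIGURE first
EXEC sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help <your_FT-enabled_table_name_here>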
go
The above information is the most important in helping troubleshoot the
common word-breaking issues I've seen in this newsgroup over many years.
Note that for SQL Server 2000, the word breakers you are using are specific
to the OS platform that your SQL Server is installed on. See
http://groups.google.com/groups?q=langwrbk+infosoft for a discussion of
Win2K's infosoft.dll vs. WinXP and Win2003's langwrbk.dll word breaker issues.
Regards,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/