Hi,

I have 6 fields in a document, with their data types given below.

field name        data type
---------------------------
content           text
title             text
description       text
content_em        text_ws
title_em          text_ws
description_em    text_ws

My requirement is to prioritize search results based on exact and partial
match conditions. Documents that have an exact match should score higher than
documents with only a partial match.

To achieve this I have added 3 fields
(content_em, title_em, description_em) which contain the same content as
content, title and description respectively.
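
For illustration, here is a simplified sketch of how the two field types might tokenize the same text. It assumes text_ws is defined as in the Solr example schema (whitespace tokenizer only, no lowercasing), which this post does not confirm. analyze_text_simplified is a hypothetical stand-in for the much richer "text" analysis chain, and the sample title is made up.

```python
import re

def analyze_ws(text):
    # Assumed behaviour of text_ws: split on whitespace only,
    # keeping case and punctuation attached to the tokens.
    return text.split()

def analyze_text_simplified(text):
    # Hypothetical, much-simplified stand-in for the "text" type:
    # lowercase and keep only alphanumeric runs (no stemming, no stopwords).
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    # True if the phrase tokens occur consecutively in tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

title = "The Ryder Cup, 2010"   # made-up sample value
query = "Ryder Cup"

ws_hit = contains_phrase(analyze_ws(title), analyze_ws(query))
text_hit = contains_phrase(analyze_text_simplified(title),
                           analyze_text_simplified(query))

print("text_ws phrase match:", ws_hit)
print("text phrase match:   ", text_hit)
```

Under these assumptions the whitespace-only field misses the phrase because the indexed token is "Cup," with the comma attached, so it is worth checking what the *_em fields actually contain after analysis.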

My dismax query is something similar to this

mm=1&qf=content^100+description^200+title^300&pf=content_em^500000+description_em^700000+title_em^900000&fl=id&start=0&q=London&qt=dismax
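
As a sketch, the same request can be assembled in Python so that the parameters are easier to read and change. The base URL is a hypothetical local Solr instance, and two things are added that are not in the query above: score in fl and debugQuery=true, which ask Solr to return the score and a per-document scoring explanation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real Solr select URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

params = {
    "qt": "dismax",
    "q": "London",
    "mm": "1",
    # Spaces separate the boosted fields; urlencode turns them into "+".
    "qf": "content^100 description^200 title^300",
    "pf": "content_em^500000 description_em^700000 title_em^900000",
    "fl": "id,score",        # "score" added here, not in the original query
    "start": "0",
    "debugQuery": "true",    # added here to get the scoring explanation
}

query_string = urlencode(params)
url = SOLR_SELECT_URL + "?" + query_string
print(url)
```

The explain section of the debug output lists the tf, idf, fieldNorm and boost factors for each matching field, which is usually the quickest way to see why one document outranks another.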

I have 2 problems with this approach:

Problem 1:

For instance, if doc1 has the text London appearing once in each of the
description, content and title fields, and doc2 has the same text appearing
once in only the description and content fields, doc2 gets a higher score
than doc1. Can anyone explain why this happens? Since I give more boost to
the title field, I expect a term matching in that field to contribute more
to the score.
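
One possible scenario, sketched below, is that field length outweighs the boost. The sketch assumes the classic Lucene similarity factors (tf = sqrt(freq), lengthNorm = 1/sqrt(number of terms)) and the dismax rule of taking the best field score plus tie times the rest. It ignores queryNorm and coord, uses the same idf for every field, and all field lengths are made-up numbers, so it shows only how such a ranking can arise, not what actually happens in this index.

```python
import math

def field_score(boost, freq, idf, num_terms):
    # Simplified classic Lucene term score: boost * tf * idf^2 * lengthNorm.
    tf = math.sqrt(freq)
    length_norm = 1.0 / math.sqrt(num_terms)
    return boost * tf * idf * idf * length_norm

def dismax_score(field_scores, tie=0.0):
    # Dismax keeps the best field score and adds tie * (sum of the others).
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

IDF = 1.0  # assumed identical for all fields, purely for illustration

# doc1: one match each in title (16 terms), description (100), content (400)
doc1 = dismax_score([
    field_score(300, 1, IDF, 16),
    field_score(200, 1, IDF, 100),
    field_score(100, 1, IDF, 400),
])

# doc2: no title match, but a very short description (4 terms)
doc2 = dismax_score([
    field_score(200, 1, IDF, 4),
    field_score(100, 1, IDF, 400),
])

print("doc1:", doc1)
print("doc2:", doc2)
```

With tie left at 0, the extra matching fields in doc1 add nothing, and doc2's short description beats doc1's longer title despite the lower boost. Raising tie or omitting norms on the boosted fields would change that balance.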


Problem 2:

Another scenario is with the search term "Ryder Cup".
Doc 1 has the text "Cup" appearing 20 or more times in the content field.
Doc 2 has the text "Ryder Cup" appearing once in the title field.

On search I expect Doc 2 to be on top, since I want exact-match documents to
be prioritized over partial-match documents. But unfortunately Doc 1 comes
out on top with a higher score.

Since I am new to Lucene, can anyone help me solve these problems?

Many Thanks,
Balaji.

--
View this message in context: http://lucene.472066.n3.nabble.com/Scoring-Pattern-for-partial-and-exact-match-search-results-tp1780471p1780471.html
Sent from the Lucene - General mailing list archive at Nabble.com.

  • Balaji.A at Oct 27, 2010 at 2:36 pm

Discussion Overview
group: general @
categories: lucene
posted: Oct 27, '10 at 2:35p
active: Oct 27, '10 at 2:36p
posts: 2
users: 1
website: lucene.apache.org
