
Fix: Improve Manticore search quality by using phrase matching per token #34

@shukebeta

Problem

The current search passes the raw query string to Manticore, which tokenizes
CJK text character by character and combines the tokens with OR logic. This causes:

  • "小菜园" gets split into "小", "菜", "园" as separate tokens
  • Notes matching only "小" or only "菜" appear in results
  • After 1-2 pages, results become completely irrelevant
  • No way to get "newest relevant notes first" because relevance degrades so fast

Fix

Wrap each whitespace-separated token in quotes before passing to Manticore,
forcing phrase matching instead of token matching.

Query transformation

Input from user: 小菜园 浇水
Transformed: "小菜园" "浇水"

Both phrases must appear (AND semantics): in Manticore's full-text query
syntax, terms separated by whitespace are implicitly ANDed.

Code Change

In the search service, transform the query before sending to Manticore:

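// Requires: using System; using System.Linq;
private static string BuildPhraseQuery(string rawQuery)
{
    // An empty separator array splits on any Unicode whitespace, including the
    // full-width space (U+3000) that CJK input methods often produce.
    var phrases = rawQuery
        .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
        // Escape backslashes and quotes so user input cannot end the phrase early.
        .Select(w => "\"" + w.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"");

    return string.Join(" ", phrases);
}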
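
To check the transformation without a running Manticore instance, a small console sketch can run the helper over a few inputs. The class name PhraseQueryDemo and the sample strings are illustrative only and not part of the search service. The copy of the helper here splits on any Unicode whitespace (documented .NET behaviour for an empty separator array), so a full-width space between CJK words also separates tokens.

```csharp
using System;
using System.Linq;

public static class PhraseQueryDemo
{
    // Same idea as BuildPhraseQuery: wrap each whitespace-separated token in quotes.
    // An empty separator array makes Split break on any Unicode whitespace.
    public static string BuildPhraseQuery(string rawQuery)
    {
        var phrases = rawQuery
            .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
            .Select(w => $"\"{w}\"");
        return string.Join(" ", phrases);
    }

    public static void Main()
    {
        // Plain ASCII space between the two words.
        Console.WriteLine(BuildPhraseQuery("小菜园 浇水"));
        // Leading/trailing spaces and a full-width space (U+3000) as separator.
        Console.WriteLine(BuildPhraseQuery("  小菜园\u3000浇水 "));
        // Whitespace-only input yields an empty query string.
        Console.WriteLine(BuildPhraseQuery("   ").Length);
    }
}
```

The whitespace-only case matters in practice: the caller probably wants to skip the Manticore request entirely when the transformed query is empty.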

Then use it in the Manticore query:

{
  "query": {
    "query_string": "\"小菜园\" \"浇水\""
  },
  "sort": [
    { "weight()": "desc" },
    { "created_at": "desc" }
  ]
}
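
The request body above could be assembled from the transformed query roughly as follows. This is a sketch only: the issue does not show how the search service builds its request, so SearchRequestDemo and BuildSearchBody are hypothetical names, and the table/index name is left out just as it is in the JSON above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class SearchRequestDemo
{
    // Builds the JSON body for an already-quoted phrase query.
    public static string BuildSearchBody(string phraseQuery)
    {
        var body = new Dictionary<string, object>
        {
            ["query"] = new Dictionary<string, string> { ["query_string"] = phraseQuery },
            ["sort"] = new object[]
            {
                // Relevance first, then newest notes among equally relevant ones.
                new Dictionary<string, string> { ["weight()"] = "desc" },
                new Dictionary<string, string> { ["created_at"] = "desc" },
            },
        };

        // The relaxed encoder keeps CJK text readable instead of \uXXXX escapes;
        // the default encoder would produce equivalent, equally valid JSON.
        var options = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
        };
        return JsonSerializer.Serialize(body, options);
    }

    public static void Main()
    {
        Console.WriteLine(BuildSearchBody("\"小菜园\" \"浇水\""));
    }
}
```

Letting the serializer produce the string means the quotes added by the phrase transformation are JSON-escaped automatically, rather than by hand-written string concatenation.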

Expected Outcome

  • Search for "小菜园" only returns notes that contain "小菜园" as a continuous
    string, not notes that merely contain "小" or "菜"
  • Multi-word search "小菜园 浇水" requires both phrases to be present
  • Relevance stays high across pages instead of degrading after page 1-2

Labels: enhancement
