[exploratory] modules/language-models: semantic highlighting component by cpoerschke · Pull Request #4230 · apache/solr

cpoerschke · 2026-03-20T17:16:57Z

done-ish list:

skeleton SemanticHighlightingComponent class in modules/language-models, so that core/HighlightingComponent.java does not need a dependency on modules/language-models
CustomModel.java and custom-model.json, which provide hard-coded mock embeddings so that tests do not depend on an external model provider
skeleton logic to use a language model to compute a score for a Passage
minimal SemanticHighlightingComponentTest class to illustrate usage
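
For illustration only, the "score a Passage with a language model" item can be sketched in plain Java as below. This is not the code in this PR: `PassageScoringSketch`, `closest` and the `embedder` function are hypothetical names, passages are plain strings rather than Lucene `Passage` objects, and the two-dimensional mock embeddings are made up in the spirit of the hard-coded test embeddings mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; not the PR's SemanticHighlightingComponent. */
final class PassageScoringSketch {

  /** Euclidean distance between two vectors of equal length. */
  static double euclideanDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns the passage whose embedding is closest to the query vector,
   * or null if there are no passages. The embedder stands in for a model call.
   */
  static String closest(
      List<String> passages, float[] queryVector, Function<String, float[]> embedder) {
    String best = null;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (String passage : passages) {
      double distance = euclideanDistance(embedder.apply(passage), queryVector);
      if (distance < bestDistance) {
        bestDistance = distance;
        best = passage;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up mock embeddings, so no external model provider is needed.
    Map<String, float[]> mock =
        Map.of(
            "solr is a search platform", new float[] {1f, 0f},
            "bananas are yellow", new float[] {0f, 1f});
    float[] queryVector = {0.9f, 0.1f};
    // Prints "solr is a search platform", the passage nearest to the query vector.
    System.out.println(closest(List.copyOf(mock.keySet()), queryVector, mock::get));
  }
}
```

Passing the embedder in as a function keeps the scoring logic independent of any particular model, which is the same reason the mock model is useful in tests.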

to-do list, non-exhaustive:

consideration of parameter details e.g. how to request semantic highlighting and the model(s) to use
how to obtain the vector against which passages are compared, e.g. directly from some new parameter, by extraction from the q or hl.q parameter if a knn parser is used, or in some other way?
consideration of Passage extraction: currently it is term-based, but what if q is a vector query, i.e. has no terms?
should the PassageScorer and/or the Comparator<Passage> apply the language model to candidate passages?
how to properly and efficiently compute vector distances? currently using Euclidean distance for illustration only.
tests
documentation
???
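
On the vector distance question, a small plain-Java sketch of why the choice of measure matters: squared Euclidean distance and cosine similarity can rank the same two candidates differently. `VectorDistanceSketch` and its method names are hypothetical and the vectors are made up. In real code Lucene's `org.apache.lucene.util.VectorUtil` (`squareDistance`, `cosine`, `dotProduct`) would probably be the more efficient option; the sketch avoids that dependency so that it stays self-contained.

```java
/** Hypothetical sketch comparing two vector measures; not the PR's code. */
final class VectorDistanceSketch {

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
  }

  /** Squared Euclidean distance; smaller means closer. Skips the sqrt, which does not change ranking. */
  static double squaredEuclidean(float[] a, float[] b) {
    checkSameLength(a, b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Cosine similarity; larger means more similar in direction, regardless of magnitude. */
  static double cosineSimilarity(float[] a, float[] b) {
    checkSameLength(a, b);
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
    }
    return dot / Math.sqrt(normA * normB);
  }

  public static void main(String[] args) {
    float[] query = {1f, 0f};
    float[] sameDirectionButLong = {10f, 0f};
    float[] nearbyButRotated = {0.6f, 0.6f};

    // Euclidean prefers the nearby vector, cosine prefers the one pointing the same way.
    System.out.println(
        "euclidean prefers rotated: "
            + (squaredEuclidean(query, nearbyButRotated)
                < squaredEuclidean(query, sameDirectionButLong)));
    System.out.println(
        "cosine prefers same direction: "
            + (cosineSimilarity(query, sameDirectionButLong)
                > cosineSimilarity(query, nearbyButRotated)));
  }
}
```

If the embeddings are normalized to unit length the two measures give the same ranking, so which one to use may depend on what the configured model produces.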

[exploratory] modules/language-models: semantic highlighting component

bbbee9b

github-actions bot added the dependencies, tool:build and tests labels Mar 20, 2026

cpoerschke added the module:language-models label Mar 20, 2026
