mirror of https://github.com/hexastack/hexabot synced 2025-06-26 18:27:28 +00:00

Mohamed Marrouchi c018932400 feat: implement nlp based blocks prioritization strategy

2025-05-06 15:30:11 +01:00


NLP Block Scoring

Purpose

NLP Block Scoring is a mechanism used to select the most relevant response block based on:

- Matching patterns between user input and block definitions
- Configurable weights assigned to each entity type
- Confidence values provided by the NLU engine for detected entities

It enables more intelligent and context-aware block selection in conversational flows.

Core Use Cases

Standard Matching

A user input contains entities that directly match a block’s patterns.

Example:
Input: intent = enquiry & subject = claim
Block A: intent = enquiry & subject = claim
➤ Block A is selected.

High Confidence, Partial Match

A block may cover only part of the user input yet match it on a high-confidence entity, which makes it a better candidate than blocks that match lower-confidence entities. Note: each entity's confidence is multiplied by the weight configured for its entity type.

Example:
Input: intent = issue (confidence: 0.92) & subject = claim (confidence: 0.65)
Block A: Pattern: intent: issue
Block B: Pattern: subject: claim
➤ Block A is selected: with both weights equal to 1, it scores 0.92 × 1 = 0.92, against 0.65 × 1 = 0.65 for Block B.

Multiple Blocks with Similar Patterns

Input: intent = issue & subject = insurance
Block A: intent = enquiry & subject = insurance
Block B: subject = insurance
➤ Block B is selected: Block A requires intent = enquiry, but the input has intent = issue.

Exclusion Due to Extra Patterns

If a block contains patterns that require entities not present in the user input, the block is excluded from scoring altogether. No penalties are applied — the block simply isn't considered a valid candidate.

Input: intent = issue & subject = insurance
Block A: intent = enquiry & subject = insurance & location = office
Block B: subject = insurance & time = morning
➤ Neither block is selected: each requires an entity that is absent from the input (`location` for Block A, `time` for Block B).
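
The exclusion rule can be sketched as a simple eligibility check. This is a minimal illustration, not Hexabot's actual implementation: the `EntityValue` shape, the `ANY` marker, and the `isCandidate` helper are hypothetical names introduced here, and the sketch assumes a pattern is satisfied when the input contains the same entity with the same value (or any value, for a wildcard pattern).

```typescript
// Hypothetical shapes for illustration only; not Hexabot's actual types.
interface EntityValue {
  entity: string;
  value: string;
}

// Assumed wildcard marker, written "Any" in this document's examples.
const ANY = "Any";

// A block stays a candidate only if every one of its patterns
// is satisfied by some entity detected in the user input.
function isCandidate(patterns: EntityValue[], detected: EntityValue[]): boolean {
  return patterns.every((pattern) =>
    detected.some(
      (d) =>
        d.entity === pattern.entity &&
        (pattern.value === ANY || d.value === pattern.value),
    ),
  );
}

const userInput: EntityValue[] = [
  { entity: "intent", value: "issue" },
  { entity: "subject", value: "insurance" },
];

const blockA: EntityValue[] = [
  { entity: "intent", value: "enquiry" },
  { entity: "subject", value: "insurance" },
  { entity: "location", value: "office" },
];

const blockB: EntityValue[] = [
  { entity: "subject", value: "insurance" },
  { entity: "time", value: "morning" },
];

console.log(isCandidate(blockA, userInput)); // false: `location` is missing (and intent differs)
console.log(isCandidate(blockB, userInput)); // false: `time` is missing
```

Filtering before scoring keeps the two concerns separate: excluded blocks never receive a score, so no negative penalty is needed to rank them below valid candidates.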

Tie-Breaking with Penalty Factors

When multiple blocks receive similar scores, penalty factors can help break the tie — especially in cases where patterns are less specific (e.g., using Any as a value).

Input: intent = enquiry & subject = insurance

Block A: intent = enquiry & subject = Any
Block B: intent = enquiry & subject = insurance
Block C: subject = insurance

Scoring Summary:
- Block A matches both patterns, but subject = Any is considered less specific.
- Block B matches both patterns with fully specific values.
- Block C matches only one pattern.

➤ Block A and Block B have similar raw scores.
➤ A penalty factor is applied to Block A due to its use of Any, reducing its final score.
➤ Block B is selected.
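
To make the tie-break concrete, here is a worked computation under assumed numbers. The example above gives no confidences, so suppose, purely for illustration, that both input entities are detected with confidence 0.9, both weights are 1, and the penalty factor is 0.8:

```latex
\begin{aligned}
\text{score}(A) &= \underbrace{0.9 \times 1}_{\text{intent}} + \underbrace{0.9 \times 1 \times 0.8}_{\text{subject = Any}} = 1.62 \\
\text{score}(B) &= \underbrace{0.9 \times 1}_{\text{intent}} + \underbrace{0.9 \times 1}_{\text{subject}} = 1.80 \\
\text{score}(C) &= \underbrace{0.9 \times 1}_{\text{subject}} = 0.90
\end{aligned}
```

Without the penalty, Blocks A and B would both score 1.80. With it, Block B ranks first, followed by Block A and then Block C.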

How Scoring Works

Matching and Confidence

For each entity in the block's pattern:

- If the entity matches an entity in the user input, the score is increased by confidence × weight.
  - Confidence is a value between 0 and 1, returned by the NLU engine.
  - Weight (default: 1) is a configured importance factor for that specific entity type.
- If the match is a wildcard (i.e., the block accepts any value), a penalty factor is applied to slightly reduce its contribution: confidence × weight × penaltyFactor. This encourages more specific matches when available.

Scoring Formula Summary

For each matched entity:

score += confidence × weight × [optional penalty factor if wildcard]

The total block score is the sum of the contributions of all matched patterns in that block.
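
The formula can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not Hexabot's actual code: the `DetectedEntity` and `BlockPattern` shapes, the `scoreBlock` function, and the `"Any"` wildcard marker are hypothetical names chosen for this example. It also assumes that blocks with unmatched patterns were already excluded, so a pattern with no match simply adds nothing.

```typescript
// Hypothetical shapes for illustration only; not Hexabot's actual types.
interface DetectedEntity {
  entity: string;
  value: string;
  confidence: number; // between 0 and 1, as returned by the NLU engine
}

interface BlockPattern {
  entity: string;
  value: string; // "Any" marks a wildcard in this sketch
}

const WILDCARD = "Any";
const DEFAULT_WEIGHT = 1;

function scoreBlock(
  patterns: BlockPattern[],
  detectedEntities: DetectedEntity[],
  weights: { [entity: string]: number | undefined },
  penaltyFactor: number,
): number {
  let score = 0;
  for (const pattern of patterns) {
    const isWildcard = pattern.value === WILDCARD;

    // Find the first detected entity that satisfies this pattern.
    let match: DetectedEntity | undefined;
    for (const d of detectedEntities) {
      if (d.entity === pattern.entity && (isWildcard || d.value === pattern.value)) {
        match = d;
        break;
      }
    }
    if (match === undefined) continue; // nothing matched, nothing added

    // Entities without a configured weight fall back to the default of 1.
    const weight = weights[pattern.entity] ?? DEFAULT_WEIGHT;
    score += match.confidence * weight * (isWildcard ? penaltyFactor : 1);
  }
  return score;
}

// The "High Confidence, Partial Match" example, with default weights.
const detected: DetectedEntity[] = [
  { entity: "intent", value: "issue", confidence: 0.92 },
  { entity: "subject", value: "claim", confidence: 0.65 },
];

const scoreA = scoreBlock([{ entity: "intent", value: "issue" }], detected, {}, 0.8);
const scoreB = scoreBlock([{ entity: "subject", value: "claim" }], detected, {}, 0.8);
console.log(scoreA, scoreB); // 0.92 0.65
```

Passing the weights and penalty factor as arguments keeps the function pure, which makes it easy to check how a change to either setting shifts the ranking.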

Penalty Factor

The penalty factor is a global multiplier (typically less than 1, e.g., 0.8) applied when the match type is less specific — such as wildcard or loose entity type matches. It allows the system to:

- Break ties in favor of more precise blocks
- Discourage overly generic blocks from being selected when better matches are available
