hexabot/api/docs/nlp/README.md
Mohamed Marrouchi bab2e3082f feat: implement nlp based blocks prioritization strategy
2025-05-12 07:29:56 +01:00
# NLP Block Scoring
## Purpose
**NLP Block Scoring** is a mechanism used to select the most relevant response block based on:
- Matching patterns between user input and block definitions
- Configurable weights assigned to each entity type
- Confidence values provided by the NLU engine for detected entities

It enables more intelligent and context-aware block selection in conversational flows.
## Core Use Cases
### Standard Matching
A user input contains entities that directly match a block's patterns.
```ts
Example:
Input: intent = enquiry & subject = claim
Block A: Patterns: intent: enquiry & subject: claim
Block A will be selected.
```
### High Confidence, Partial Match
A block may match only some of the detected entities yet still be the better candidate when those entities were detected with high confidence, outscoring blocks with full matches but low-confidence entities.

**Note: Confidence is multiplied by a pre-defined weight for each entity type.**
```ts
Example:
Input: intent = issue (confidence: 0.92) & subject = claim (confidence: 0.65)
Block A: Pattern: intent: issue
Block B: Pattern: subject: claim
Block A gets the higher score (0.92 vs. 0.65), since score = confidence × weight and both weights are assumed equal to 1.
```
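
The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not Hexabot's implementation: the `DetectedEntity` type, the `weights` map, and the `contribution` helper are hypothetical names introduced here.

```typescript
// Hypothetical shape for illustration; not Hexabot's actual type.
type DetectedEntity = { entity: string; value: string; confidence: number };

// Both weights are assumed to be 1, as in the example above.
const weights: Record<string, number> = { intent: 1, subject: 1 };

const input: DetectedEntity[] = [
  { entity: "intent", value: "issue", confidence: 0.92 },
  { entity: "subject", value: "claim", confidence: 0.65 },
];

// Contribution of one matched entity: confidence × weight (weight defaults to 1).
function contribution(e: DetectedEntity): number {
  return e.confidence * (weights[e.entity] ?? 1);
}

const scoreA = contribution(input[0]); // Block A matches intent = issue
const scoreB = contribution(input[1]); // Block B matches subject = claim
const winner = scoreA > scoreB ? "Block A" : "Block B";
```

Raising the weight of `subject` above roughly 1.42 in this sketch would flip the outcome in favor of Block B, which is how weights let one entity type outrank another.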
### Multiple Blocks with Similar Patterns
```ts
Input: intent = issue & subject = insurance
Block A: intent = enquiry & subject = insurance
Block B: subject = insurance
Block B is selected because Block A mismatches on intent.
```
### Exclusion Due to Extra Patterns
If a block contains patterns that require entities not present in the user input, the block is excluded from scoring altogether. No penalties are applied — the block simply isn't considered a valid candidate.
```ts
Input: intent = issue & subject = insurance
Block A: intent = enquiry & subject = insurance & location = office
Block B: subject = insurance & time = morning
Neither block is selected due to unmatched required patterns (`location`, `time`)
```
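
One way to picture the exclusion rule is as a filter that runs before any scoring. The sketch below is an assumption about how such a check could look; `Pattern`, `Block`, and `isCandidate` are hypothetical names, and the function only tests whether the required entities are present (value mismatches are a separate concern).

```typescript
// Hypothetical shapes for illustration; not Hexabot's actual types.
type Pattern = { entity: string; value: string };
type Block = { name: string; patterns: Pattern[] };

// A block stays in the running only if every entity it requires
// was detected in the user input.
function isCandidate(block: Block, input: Pattern[]): boolean {
  const present = new Set(input.map((e) => e.entity));
  return block.patterns.every((p) => present.has(p.entity));
}

const userInput: Pattern[] = [
  { entity: "intent", value: "issue" },
  { entity: "subject", value: "insurance" },
];

const candidateBlocks: Block[] = [
  {
    name: "A",
    patterns: [
      { entity: "intent", value: "enquiry" },
      { entity: "subject", value: "insurance" },
      { entity: "location", value: "office" },
    ],
  },
  {
    name: "B",
    patterns: [
      { entity: "subject", value: "insurance" },
      { entity: "time", value: "morning" },
    ],
  },
];

// Both blocks require an entity that is absent from the input,
// so neither survives the filter.
const eligible = candidateBlocks.filter((b) => isCandidate(b, userInput));
```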
### Tie-Breaking with Penalty Factors
When multiple blocks receive similar scores, penalty factors can help break the tie — especially in cases where patterns are less specific (e.g., using `Any` as a value).
```ts
Input: intent = enquiry & subject = insurance
Block A: intent = enquiry & subject = Any
Block B: intent = enquiry & subject = insurance
Block C: subject = insurance
Scoring Summary:
- Block A matches both patterns, but subject = Any is considered less specific.
- Block B matches both patterns with fully specific values.
- Block C matches only one pattern.
Block A and Block B have similar raw scores.
A penalty factor is applied to Block A due to its use of Any, reducing its final score.
Block B is selected.
```
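
To make the tie-break concrete, the sketch below scores the three blocks with and without a penalty. The confidence values (0.9 and 0.8), the weight of 1, and the names `tieInput`, `tieBlocks`, `tieScore`, and `ranking` are assumptions made for illustration only.

```typescript
// Hypothetical shapes and values for illustration only.
type TiePattern = { entity: string; value: string }; // value "Any" acts as a wildcard
type TieDetected = { entity: string; value: string; confidence: number };

const tieInput: TieDetected[] = [
  { entity: "intent", value: "enquiry", confidence: 0.9 }, // assumed confidence
  { entity: "subject", value: "insurance", confidence: 0.8 }, // assumed confidence
];

const tieBlocks: Record<string, TiePattern[]> = {
  A: [{ entity: "intent", value: "enquiry" }, { entity: "subject", value: "Any" }],
  B: [{ entity: "intent", value: "enquiry" }, { entity: "subject", value: "insurance" }],
  C: [{ entity: "subject", value: "insurance" }],
};

// Sum of confidence × weight (weight assumed to be 1), with wildcard
// matches scaled down by the penalty factor.
function tieScore(patterns: TiePattern[], penaltyFactor: number): number {
  return patterns.reduce((total, p) => {
    const hit = tieInput.find((e) => e.entity === p.entity);
    if (!hit) return total;
    if (p.value === "Any") return total + hit.confidence * penaltyFactor;
    return p.value === hit.value ? total + hit.confidence : total;
  }, 0);
}

// Block names ordered from highest to lowest score.
function ranking(penaltyFactor: number): string[] {
  return Object.keys(tieBlocks).sort(
    (x, y) => tieScore(tieBlocks[y], penaltyFactor) - tieScore(tieBlocks[x], penaltyFactor),
  );
}

const withoutPenalty = [tieScore(tieBlocks.A, 1), tieScore(tieBlocks.B, 1)]; // A and B tie
const withPenalty = ranking(0.8); // B now ranks ahead of A
```

With a penalty factor of 1 the two blocks are indistinguishable; any value below 1 is enough to push the more specific Block B ahead.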
## How Scoring Works
### Matching and Confidence
For each entity in the block's pattern:
- If the entity matches an entity in the user input, the score is increased by `confidence × weight`:
  - `Confidence` is a value between 0 and 1, returned by the NLU engine.
  - `Weight` (default value is `1`) is a configured importance factor for that specific entity type.
- If the match is a wildcard (i.e., the block accepts any value), a **penalty factor** is applied to slightly reduce its contribution: `confidence × weight × penaltyFactor`. This encourages more specific matches when available.
### Scoring Formula Summary
For each matched entity:
```ts
score += confidence × weight × [optional penalty factor if wildcard]
```
The total block score is the sum of all matched patterns in that block.
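
The formula can be turned into a small function. The following is a sketch under stated assumptions rather than the project's actual code: `ScorePattern`, `ScoreDetected`, `WILDCARD`, and `scoreBlock` are hypothetical names, and returning `null` for a disqualified block is a design choice made here to separate "excluded" from "scored zero".

```typescript
// Hypothetical types; Hexabot's actual schemas may differ.
type ScorePattern = { entity: string; value: string };
type ScoreDetected = { entity: string; value: string; confidence: number };

const WILDCARD = "Any";

function scoreBlock(
  patterns: ScorePattern[],
  detected: ScoreDetected[],
  entityWeights: Record<string, number>,
  penaltyFactor: number,
): number | null {
  let score = 0;
  for (const p of patterns) {
    const hit = detected.find((e) => e.entity === p.entity);
    const isWildcard = p.value === WILDCARD;
    // Assumption: a missing entity or a value mismatch disqualifies the block.
    if (!hit || (!isWildcard && hit.value !== p.value)) return null;
    const weight = entityWeights[p.entity] ?? 1; // default weight is 1
    score += hit.confidence * weight * (isWildcard ? penaltyFactor : 1);
  }
  return score;
}

// Usage with illustrative values: intent carries twice the weight of subject.
const detected: ScoreDetected[] = [
  { entity: "intent", value: "issue", confidence: 0.92 },
  { entity: "subject", value: "claim", confidence: 0.65 },
];
const exact = scoreBlock([{ entity: "intent", value: "issue" }], detected, { intent: 2 }, 0.8);
const mismatch = scoreBlock([{ entity: "intent", value: "enquiry" }], detected, { intent: 2 }, 0.8);
```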
### Penalty Factor
The **penalty factor** is a global multiplier (typically less than `1`, e.g., `0.8`) applied when the match type is less specific — such as wildcard or loose entity type matches. It allows the system to:
- Break ties in favor of more precise blocks
- Discourage overly generic blocks from being selected when better matches are available
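
Taken together, the rules in this document can be restated compactly. The symbols below ($S$, $M_b$, $c_e$, $w_e$, $\pi_e$, $\rho$) are notation introduced here for illustration and are not taken from the codebase.

```latex
S(b) = \sum_{e \in M_b} c_e \, w_e \, \pi_e,
\qquad
\pi_e =
\begin{cases}
\rho & \text{if } e \text{ is matched by a wildcard pattern} \\
1    & \text{otherwise}
\end{cases}
```

Here $M_b$ is the set of matched patterns in block $b$, $c_e \in [0, 1]$ is the confidence reported by the NLU engine, $w_e$ is the configured weight (default $1$), and $\rho$ is the global penalty factor (typically less than $1$, e.g., $0.8$).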