feat: implement nlp based blocks prioritization strategy

feat: add weight to nlp entity schema and readapt
feat: remove commented obsolete code
feat: restore settings
feat: apply feedback
fix: re-adapt unit tests
feat: priority scoring re-calculation & enabling weight modification in builtin nlp entities
fix: remove obsolete code
feat: refine unit tests, apply mr coderabbit suggestions
fix: minor refactoring
feat: add nlp cache map type
feat: refine builtin nlp entities weight updates
feat: add more test cases and refine edge case handling
feat: add weight validation in UI
fix: apply feedback
feat: add a penalty factor & fix unit tests
feat: add documentation
fix: correct syntax
fix: remove stale log statement
fix: enforce nlp entity weight restrictions
fix: correct typo in docs
fix: typos in docs
fix: fix formatting for function comment
fix: restore matchNLP function previous code
fix: remove blank line, make updateOne asynchronous
fix: add AND operator in docs
fix: handle dependency injection in chat module
feat: refactor to use findAndPopulate in block score calculation
feat: refine caching mechanisms
feat: add typing and enforce safety checks
fix: remove typo
fix: remove async from block score calculation
fix: remove typo
fix: correct linting
fix: refine nlp pattern type check
fix: decompose code into helper utils, add nlp entity dto validation, remove type casting
fix: minor refactoring
feat: refactor current implementation
2025-06-26 18:27:28 +00:00 · 2025-03-26 13:11:07 +01:00
parent 0db40680dc
commit bab2e3082f
31 changed files with 1061 additions and 49 deletions
--- a/api/docs/nlp/README.md
+++ b/api/docs/nlp/README.md
@@ -0,0 +1,102 @@
+# NLP Block Scoring
+## Purpose
+
+**NLP Block Scoring** is a mechanism used to select the most relevant response block based on:
+
+- Matching patterns between user input and block definitions
+- Configurable weights assigned to each entity type
+- Confidence values provided by the NLU engine for detected entities
+
+It enables more intelligent and context-aware block selection in conversational flows.
+
+## Core Use Cases
+### Standard Matching
+
+A user input contains entities that directly match a block’s patterns.
+```ts
+Input: intent = enquiry & subject = claim
+Block A: intent = enquiry & subject = claim
+➤ Block A is selected.
+```
+
+### High Confidence, Partial Match
+
+A block may match only some of the detected entities yet match them with high confidence, which can make it a better candidate than blocks that match every entity but with low confidence.
+
+**Note:** Each confidence value is multiplied by the weight configured for its entity type.
+
+```ts
+Input: intent = issue (confidence: 0.92) & subject = claim (confidence: 0.65)
+Block A: intent = issue
+Block B: subject = claim
+➤ Block A scores 0.92 × 1 = 0.92 and Block B scores 0.65 × 1 = 0.65 (assuming both weights are equal to 1), so Block A is selected.
+```
+
+### Multiple Blocks with Similar Patterns
+
+```ts
+Input: intent = issue & subject = insurance
+Block A: intent = enquiry & subject = insurance
+Block B: subject = insurance
+➤ Block B is selected because Block A's intent pattern (enquiry) does not match the input (issue).
+```
+
+### Exclusion Due to Extra Patterns
+
+If a block contains patterns that require entities not present in the user input, the block is excluded from scoring altogether. No penalties are applied — the block simply isn't considered a valid candidate.
+
+```ts
+Input: intent = issue & subject = insurance
+Block A: intent = enquiry & subject = insurance & location = office
+Block B: subject = insurance & time = morning
+➤ Neither block is selected due to unmatched required patterns (`location`, `time`)
+```
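+
+The exclusion rule can be sketched as a simple filter that runs before any scoring. This is a minimal illustration, not the project's implementation: the `Pattern` and `DetectedEntity` shapes, the `isCandidate` helper, the confidence values, and the extra `blockC` are all assumptions made for the example.
+
+```ts
+// Hypothetical shapes for illustration; the real schema types may differ.
+type Pattern = { entity: string; value?: string }; // value omitted = "Any"
+type DetectedEntity = { entity: string; value: string; confidence: number };
+
+// A block stays a candidate only if every one of its patterns is satisfied
+// by some entity detected in the user input.
+function isCandidate(patterns: Pattern[], input: DetectedEntity[]): boolean {
+  return patterns.every((p) =>
+    input.some(
+      (e) =>
+        e.entity === p.entity &&
+        (p.value === undefined || e.value === p.value),
+    ),
+  );
+}
+
+// Input: intent = issue & subject = insurance (confidences are assumed).
+const input: DetectedEntity[] = [
+  { entity: "intent", value: "issue", confidence: 0.9 },
+  { entity: "subject", value: "insurance", confidence: 0.8 },
+];
+
+const blockA: Pattern[] = [
+  { entity: "intent", value: "enquiry" },
+  { entity: "subject", value: "insurance" },
+  { entity: "location", value: "office" },
+];
+const blockB: Pattern[] = [
+  { entity: "subject", value: "insurance" },
+  { entity: "time", value: "morning" },
+];
+// Extra block (not in the example above) to show a positive case.
+const blockC: Pattern[] = [{ entity: "subject", value: "insurance" }];
+
+console.log(isCandidate(blockA, input)); // false
+console.log(isCandidate(blockB, input)); // false
+console.log(isCandidate(blockC, input)); // true
+```
+
+Filtering first keeps the scoring step simple: only blocks whose patterns are all satisfied ever receive a score.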
+
+### Tie-Breaking with Penalty Factors
+
+When multiple blocks receive similar scores, penalty factors can help break the tie — especially in cases where patterns are less specific (e.g., using `Any` as a value).
+
+```ts
+Input: intent = enquiry & subject = insurance
+
+Block A: intent = enquiry & subject = Any
+Block B: intent = enquiry & subject = insurance
+Block C: subject = insurance
+
+Scoring Summary:
+- Block A matches both patterns, but subject = Any is less specific than an exact value.
+- Block B matches both patterns with exact values.
+- Block C matches only one pattern.
+
+➤ Block A and Block B have the same raw score before the penalty is applied.
+➤ A penalty factor is applied to Block A's subject = Any match, reducing its final score.
+➤ Block B is selected.
+```
+
+## How Scoring Works
+### Matching and Confidence
+
+For each entity in the block's pattern:
+- If the entity matches an entity in the user input, the score is increased by `confidence × weight`:
+    - `confidence` is a value between 0 and 1, returned by the NLU engine.
+    - `weight` (default: `1`) is a configured importance factor for that specific entity type.
+- If the match is a wildcard (i.e., the block accepts any value), a **penalty factor** is applied to slightly reduce its contribution: `confidence × weight × penaltyFactor`. This encourages more specific matches when available.
+
+### Scoring Formula Summary
+
+For each matched entity: 
+
+```ts
+score += confidence × weight × [optional penalty factor if wildcard]
+```
+
+The total block score is the sum of the contributions of all matched patterns in that block.
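+
+The formula can be sketched as a small function. This is an illustrative sketch under stated assumptions, not the actual implementation: `scoreBlock`, the `Pattern` and `DetectedEntity` shapes, and the `weights` lookup are hypothetical names, and the sketch assumes that blocks with unmatched patterns were already filtered out.
+
+```ts
+// Hypothetical types and names, for illustration only.
+type Pattern = { entity: string; value?: string }; // no value = wildcard ("Any")
+type DetectedEntity = { entity: string; value: string; confidence: number };
+
+function scoreBlock(
+  patterns: Pattern[],
+  input: DetectedEntity[],
+  weights: Record<string, number>,
+  penaltyFactor: number,
+): number {
+  let score = 0;
+  for (const p of patterns) {
+    const match = input.find(
+      (e) =>
+        e.entity === p.entity &&
+        (p.value === undefined || e.value === p.value),
+    );
+    // Assumption: non-candidate blocks were excluded before scoring,
+    // so an unmatched pattern is simply skipped here.
+    if (!match) continue;
+    const weight = weights[p.entity] ?? 1; // default weight is 1
+    const penalty = p.value === undefined ? penaltyFactor : 1; // wildcard only
+    score += match.confidence * weight * penalty;
+  }
+  return score;
+}
+
+// Input from the "High Confidence, Partial Match" example.
+const input: DetectedEntity[] = [
+  { entity: "intent", value: "issue", confidence: 0.92 },
+  { entity: "subject", value: "claim", confidence: 0.65 },
+];
+
+const scoreA = scoreBlock([{ entity: "intent", value: "issue" }], input, {}, 0.8);
+const scoreB = scoreBlock([{ entity: "subject", value: "claim" }], input, {}, 0.8);
+const scoreAny = scoreBlock([{ entity: "subject" }], input, {}, 0.8);
+
+console.log(scoreA.toFixed(2)); // 0.92
+console.log(scoreB.toFixed(2)); // 0.65
+console.log(scoreAny.toFixed(2)); // 0.52 (0.65 × 1 × 0.8, wildcard penalty)
+```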
+
+### Penalty Factor
+
+The **penalty factor** is a global multiplier (typically less than `1`, e.g., `0.8`) applied when the match type is less specific — such as wildcard or loose entity type matches. It allows the system to:
+- Break ties in favor of more precise blocks
+- Discourage overly generic blocks from being selected when better matches are available
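+
+To see the tie-breaking effect, the sketch below re-runs the "Tie-Breaking with Penalty Factors" example with and without a penalty. It is a simplified model rather than the real code path: the `Match` shape, the `total` and `best` helpers, the confidence values (0.9 for intent, 0.8 for subject), and the `0.8` penalty are assumptions chosen for the example.
+
+```ts
+// Hypothetical, simplified model: each matched pattern is reduced to its
+// confidence, its weight, and whether the pattern value was a wildcard ("Any").
+type Match = { confidence: number; weight: number; wildcard: boolean };
+
+// Sum of confidence × weight, with the penalty applied to wildcard matches.
+const total = (matches: Match[], penaltyFactor: number): number =>
+  matches.reduce(
+    (sum, m) => sum + m.confidence * m.weight * (m.wildcard ? penaltyFactor : 1),
+    0,
+  );
+
+// Assumed confidences: intent = 0.9, subject = 0.8; all weights = 1.
+const blocks: Record<string, Match[]> = {
+  A: [
+    { confidence: 0.9, weight: 1, wildcard: false }, // intent = enquiry
+    { confidence: 0.8, weight: 1, wildcard: true }, // subject = Any
+  ],
+  B: [
+    { confidence: 0.9, weight: 1, wildcard: false }, // intent = enquiry
+    { confidence: 0.8, weight: 1, wildcard: false }, // subject = insurance
+  ],
+  C: [{ confidence: 0.8, weight: 1, wildcard: false }], // subject = insurance
+};
+
+// Name of the highest-scoring block for a given penalty factor.
+function best(penaltyFactor: number): string {
+  const ranked = Object.entries(blocks).sort(
+    (x, y) => total(y[1], penaltyFactor) - total(x[1], penaltyFactor),
+  );
+  return ranked[0][0];
+}
+
+// With no penalty (factor 1), Blocks A and B are tied.
+console.log(total(blocks.A, 1) === total(blocks.B, 1)); // true
+// With a penalty of 0.8, Block A drops below Block B.
+console.log(best(0.8)); // B
+```
+
+A factor of `1` disables the penalty entirely, while smaller values push wildcard-heavy blocks further down the ranking.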