Page 397 - AI for Good Innovate for Impact


• Normalize Arabic text (e.g., remove diacritics, standardize characters)
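The normalization step might look like the following minimal sketch. The exact character mappings (folding alef variants to bare alef, alef maksura to yeh) are assumptions chosen for illustration; the source only says "remove diacritics, standardize characters".

```python
import re
import unicodedata

# Arabic short-vowel marks and related signs (U+064B..U+0652),
# superscript alef (U+0670) and tatweel/kashida (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Assumed standardization choices; adjust to the registry's own rules.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
})


def normalize_arabic(text: str) -> str:
    """Return a diacritic-free, character-standardized form of `text`."""
    # NFKC folds presentation forms into their base letters.
    text = unicodedata.normalize("NFKC", text)
    text = _DIACRITICS.sub("", text)
    text = text.translate(_CHAR_MAP)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())
```

Applying the same function to both stored names and incoming names keeps later comparisons consistent.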

3) Exact Match Validation

• Check for exact matches in existing names (from SQL or Elasticsearch)
• If an exact match is found, flag it as not valid and return related record IDs
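As an illustration of the exact-match step, the sketch below uses an in-memory dictionary as a stand-in for the SQL or Elasticsearch lookup. The function name, the dictionary layout, and the response keys are hypothetical.

```python
def check_exact_match(candidate: str, existing: dict[str, list[int]]) -> dict:
    """Look up a candidate name in a map of normalized name -> record IDs.

    `existing` is a hypothetical stand-in for the real name store; its keys
    are assumed to be whitespace-normalized and casefolded already.
    """
    key = " ".join(candidate.split()).casefold()
    ids = existing.get(key, [])
    # An exact match makes the name not valid and returns the related IDs.
    return {"valid": not ids, "matched_record_ids": ids}
```

A call such as `check_exact_match("Blue  Falcon", {"blue falcon": [101, 204]})` would flag the name as not valid and return both record IDs.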

4) Semantic Similarity Search
• Convert input name into a semantic vector using multilingual Sentence-BERT [1]
• Query Elasticsearch’s vector index [2] using cosine similarity
• Retrieve top N semantically similar names
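The retrieval step can be sketched as a brute-force cosine-similarity ranking. In a deployment this ranking would presumably be delegated to the Elasticsearch vector index; the in-memory `index` dictionary and the function names here are assumptions made so the example is self-contained.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, index, n=5):
    """Return the n (name, score) pairs most similar to `query_vec`.

    `index` maps each existing name to its embedding vector.
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

The highest score in the returned list is the value that the later thresholding step would compare against the defined threshold.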

5) Prohibited Words Check
• Validate against a list of banned or sensitive words (e.g., political groups, terrorist
organizations)
• Reject name if it contains any prohibited terms
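A minimal sketch of the prohibited-words check follows. It assumes whole-word, case-insensitive matching, which the source does not specify; the term list in the usage note is a placeholder.

```python
import re


def find_prohibited_terms(name: str, prohibited: set[str]) -> list[str]:
    """Return the prohibited terms that occur in `name` as whole words."""
    lowered = name.casefold()
    hits = []
    for term in sorted(prohibited):
        # Lookarounds prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(term.casefold()) + r"(?!\w)"
        if re.search(pattern, lowered):
            hits.append(term)
    return hits
```

A non-empty result means the name is rejected, for example `find_prohibited_terms("Acme Banned Trading", {"banned"})` returns `["banned"]`.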

6) Policy-Based Validation

Apply name validation rules, such as:
• Domain-specific checks: Flag names that closely resemble ministry functions or other
restricted domains
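One way to approximate "closely resemble" is character-level fuzzy matching, sketched below with Python's standard `difflib`. The matching technique, the 0.8 threshold, and the example phrases are all assumptions; the source does not say how resemblance is measured.

```python
from difflib import SequenceMatcher


def resembles_restricted_domain(name, restricted_phrases, threshold=0.8):
    """Return the restricted phrases that `name` contains or closely resembles."""
    lowered = name.casefold()
    flagged = []
    for phrase in restricted_phrases:
        target = phrase.casefold()
        # Flag direct containment, or a high character-level similarity ratio.
        ratio = SequenceMatcher(None, lowered, target).ratio()
        if target in lowered or ratio >= threshold:
            flagged.append(phrase)
    return flagged
```

Any flagged phrase would be reported as a policy violation for the later decision step.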

7) Decision Logic & Thresholding
• If all of the following hold, the name is valid:

  • No exact match
  • No prohibited words
  • Semantic similarity score is below the defined threshold
  • Name complies with all policies
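The decision rule can be sketched as a single function over the outcomes of the earlier checks. The `CheckResults` container, the reason labels, and the 0.85 threshold are hypothetical; the source only states that a threshold is defined.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResults:
    """Hypothetical container for the outcomes of the validation steps."""
    exact_match_ids: list[int] = field(default_factory=list)
    prohibited_terms: list[str] = field(default_factory=list)
    max_similarity: float = 0.0
    policy_violations: list[str] = field(default_factory=list)


def decide(results: CheckResults, similarity_threshold: float = 0.85):
    """Return (is_valid, reasons); the name is valid only if no check failed."""
    reasons = []
    if results.exact_match_ids:
        reasons.append("exact_match")
    if results.prohibited_terms:
        reasons.append("prohibited_words")
    # Valid names must score strictly below the threshold.
    if results.max_similarity >= similarity_threshold:
        reasons.append("too_similar")
    if results.policy_violations:
        reasons.append("policy_violation")
    return (not reasons, reasons)
```

Collecting every failed condition, rather than stopping at the first, gives the caller a complete list of rejection reasons.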
8) Response Generation: Return a structured response
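The source does not define the response schema, so the following is only one plausible shape for a structured response; every field name is an assumption.

```python
import json


def build_response(name, is_valid, reasons, related_ids, similar_names):
    """Serialize a validation outcome as JSON (hypothetical field names)."""
    payload = {
        "input_name": name,
        "is_valid": is_valid,
        "rejection_reasons": reasons,
        "related_record_ids": related_ids,
        "similar_names": similar_names,
    }
    # ensure_ascii=False keeps Arabic names readable in the output.
    return json.dumps(payload, ensure_ascii=False)
```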

Name Recommendation Pipeline

1) Input Capture

• Input name
• Language
• Reason for rejection

2) Text Cleaning & Normalization
• Remove legal terms, trim spaces, standardize text
• Normalize Arabic: remove diacritics, normalize alef/hamza, etc.
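The cleaning half of this step could be sketched as follows. The list of legal terms and the decision to casefold are assumptions, and Arabic-specific normalization is left out to keep the example short.

```python
import re

# Hypothetical legal-form terms; a real list would come from the registry.
LEGAL_TERMS = ("llc", "ltd", "inc", "co")


def clean_name(name: str, legal_terms=LEGAL_TERMS) -> str:
    """Casefold, drop standalone legal terms, and tidy punctuation and spaces."""
    cleaned = name.casefold()
    for term in legal_terms:
        # Match the term as a whole word, with an optional trailing period.
        pattern = r"(?<!\w)" + re.escape(term) + r"\.?(?!\w)"
        cleaned = re.sub(pattern, " ", cleaned)
    # Strip stray punctuation from token edges and drop emptied tokens.
    tokens = (tok.strip(".,;:") for tok in cleaned.split())
    return " ".join(tok for tok in tokens if tok)
```

For instance, `clean_name("  Acme   Trading Co., LLC ")` yields `"acme trading"`.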

3) Rule-Based Variants

• Apply rule-based methods as before: synonym replacement, reordering, suffixes
• Generate a base pool of candidate names
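A minimal sketch of the three rule-based methods is shown below. The synonym table, the suffix list, and the choice of word rotation as the reordering rule are assumptions for illustration.

```python
def generate_variants(name, synonyms, suffixes, max_variants=20):
    """Build a de-duplicated pool of candidate names from simple rules.

    `synonyms` maps a word to its alternatives; `suffixes` is a list of
    words to append. Both are hypothetical inputs.
    """
    words = name.split()
    variants = []
    # Synonym replacement: swap one word at a time.
    for i, word in enumerate(words):
        for alt in synonyms.get(word, []):
            variants.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    # Reordering: rotate the word sequence.
    for k in range(1, len(words)):
        variants.append(" ".join(words[k:] + words[:k]))
    # Suffixes: append each suffix to the original name.
    for suffix in suffixes:
        variants.append(f"{name} {suffix}")
    # Keep first occurrences only and never return the rejected name itself.
    seen = {name}
    pool = []
    for variant in variants:
        if variant not in seen:
            seen.add(variant)
            pool.append(variant)
    return pool[:max_variants]
```

Each candidate in the pool would then need to pass the same validation checks as any submitted name before being recommended.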
4) Sentence-BERT Embedding

• Use a multilingual SBERT model such as paraphrase-multilingual-MiniLM-L12-v2 [3]
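Embedding the candidate pool might look like this sketch, which assumes the `sentence-transformers` Python package. The `encoder` parameter is a hypothetical hook that lets a preloaded model (or a test double) be passed in; when omitted, the model named above is loaded on first use.

```python
def embed_names(names, encoder=None,
                model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Return one embedding per name.

    `encoder` is any object with a SentenceTransformer-style `encode` method.
    """
    if encoder is None:
        # Imported lazily so the function can be exercised without the
        # package installed, as long as an encoder is supplied.
        from sentence_transformers import SentenceTransformer
        encoder = SentenceTransformer(model_name)
    # Unit-length vectors make cosine similarity equal to the dot product.
    return encoder.encode(list(names), normalize_embeddings=True)
```

Loading the model once and reusing it across requests avoids paying the model start-up cost on every call.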
