Page 397 - AI for Good Innovate for Impact
• Normalize Arabic text (e.g., remove diacritics, standardize characters)
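The normalization step above could look roughly like the following sketch. The source says only "remove diacritics, standardize characters", so the specific character foldings here (alef variants to bare alef, alef maqsura to yeh) are assumptions, and the function name `normalize_arabic` is hypothetical.

```python
import re
import unicodedata

# Arabic diacritics (U+064B..U+0652), superscript alef (U+0670), and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Assumed character folding; the source does not list the exact mappings.
_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
})


def normalize_arabic(text: str) -> str:
    """Return a diacritic-free, character-folded, whitespace-trimmed string."""
    text = unicodedata.normalize("NFKC", text)  # fold presentation forms
    text = _DIACRITICS.sub("", text)            # strip diacritics and tatweel
    text = text.translate(_CHAR_MAP)            # standardize letter variants
    return " ".join(text.split())               # collapse and trim whitespace
```

Applying the same function to both stored names and incoming names keeps the later exact-match and similarity steps comparing like with like.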
3) Exact Match Validation
• Check for exact matches in existing names (from SQL or Elasticsearch)
• If an exact match is found, flag it as not valid and return related record IDs
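As an illustration of the exact-match step, the sketch below uses an in-memory dictionary as a stand-in for the SQL or Elasticsearch lookup. The function names, the `(record_id, name)` input shape, and the case-insensitive comparison key are assumptions, not details given in the source.

```python
from collections import defaultdict


def _key(name: str) -> str:
    """Comparison key: case-folded with whitespace collapsed (assumed rule)."""
    return " ".join(name.casefold().split())


def build_exact_index(records):
    """Map each comparison key to the record IDs registered under it."""
    index = defaultdict(list)
    for record_id, name in records:
        index[_key(name)].append(record_id)
    return index


def check_exact_match(name: str, index) -> dict:
    """Flag the name as not valid when any existing record has the same key."""
    ids = index.get(_key(name), [])
    return {"valid": not ids, "matched_record_ids": list(ids)}
```

Here `valid` reflects this step only; a name that passes still has to clear the remaining checks.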
4) Semantic Similarity Search
• Convert input name into a semantic vector using multilingual Sentence-BERT [1]
• Query Elasticsearch’s vector index [2] using cosine similarity
• Retrieve top N semantically similar names
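The ranking that the vector index performs can be sketched in plain Python. This is an in-memory stand-in for the Elasticsearch query, not the query itself: it assumes the embeddings have already been computed, and the names `cosine_similarity` and `top_n_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, indexed, n=5):
    """Return the n (name, score) pairs with the highest cosine similarity.

    `indexed` maps each existing name to its precomputed embedding.
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

The highest score in the returned list is the value that the later thresholding step would compare against the defined threshold.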
5) Prohibited Words Check
• Validate against a list of banned or sensitive words (e.g., political groups, terrorist
organizations)
• Reject name if it contains any prohibited terms
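A minimal sketch of the prohibited-words check follows. It assumes whole-word (or whole-phrase) matching on text that has already been normalized; the source does not say whether partial matches should also count, and the function name is hypothetical.

```python
import re


def find_prohibited_terms(name: str, prohibited) -> list:
    """Return the prohibited terms that occur in `name` as whole words or phrases."""
    # Reduce the name to lowercase word tokens separated by single spaces.
    normalized = " ".join(re.findall(r"\w+", name.casefold()))
    hits = []
    for term in prohibited:
        term_norm = " ".join(re.findall(r"\w+", term.casefold()))
        # Lookarounds keep "cartel" from matching inside "cartelier".
        if term_norm and re.search(rf"(?<!\w){re.escape(term_norm)}(?!\w)", normalized):
            hits.append(term)
    return hits
```

A non-empty result means the name is rejected; returning the matched terms rather than a bare boolean lets the response explain why.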
6) Policy-Based Validation
Apply name validation rules, such as:
• Domain-specific checks: Flag names that closely resemble ministry functions or other
restricted domains
7) Decision Logic & Thresholding
• The name is valid if all of the following hold:
• No exact match is found
• No prohibited words are present
• The semantic similarity score is below the defined threshold
• The name complies with all policies
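The decision rule can be written as a single function that combines the outputs of the earlier checks. The threshold value of 0.85, the reason labels, and the function signature are illustrative assumptions; the source does not state the threshold that was used.

```python
def is_name_valid(exact_match_ids, prohibited_hits, max_similarity,
                  policy_violations, threshold=0.85):
    """Return (valid, reasons); the name is valid only when no check fails.

    `threshold` is a placeholder value, not one taken from the source.
    """
    reasons = []
    if exact_match_ids:
        reasons.append("exact_match")
    if prohibited_hits:
        reasons.append("prohibited_words")
    if max_similarity >= threshold:
        reasons.append("too_similar")
    if policy_violations:
        reasons.append("policy_violation")
    return (not reasons, reasons)
```

Collecting every failed check, instead of stopping at the first, gives the recommendation pipeline a complete reason for rejection to work from.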
8) Response Generation: Return a structured response
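The source does not specify the fields of the structured response, so the following is only one plausible shape. Every field name in `ValidationResponse` is an assumption chosen to mirror the outputs of the preceding steps.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ValidationResponse:
    """Hypothetical response schema; field names are not taken from the source."""
    name: str
    valid: bool
    reasons: list = field(default_factory=list)
    matched_record_ids: list = field(default_factory=list)
    similar_names: list = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps Arabic names readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False)
```

For example, `ValidationResponse(name="Gulf Trading", valid=False, reasons=["exact_match"], matched_record_ids=[1]).to_json()` yields a JSON object that a caller can parse without knowing the internals of the pipeline.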
Name Recommendation Pipeline
1) Input Capture
• Input name
• Language
• Reason for rejection
2) Text Cleaning & Normalization
• Remove legal terms, trim spaces, standardize text
• Normalize Arabic: remove diacritics, normalize alef/hamza, etc.
3) Rule-Based Variants
• Apply rule-based methods as before: synonym replacement, reordering, suffixes
• Generate a base pool of candidate names
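The three rule-based methods named above (synonym replacement, reordering, suffixes) can be sketched as a small generator of candidates. The synonym table, the suffix list, the four-word cap on reordering, and the function name are all illustrative assumptions.

```python
from itertools import permutations


def generate_variants(name, synonyms, suffixes, max_candidates=20):
    """Build a base pool of candidate names from simple rewrite rules."""
    words = name.split()
    base = " ".join(words)
    candidates = []

    def add(candidate):
        # Skip the original name and duplicates while preserving order.
        if candidate != base and candidate not in candidates:
            candidates.append(candidate)

    # Synonym replacement: swap one word at a time.
    for i, word in enumerate(words):
        for synonym in synonyms.get(word.lower(), []):
            add(" ".join(words[:i] + [synonym] + words[i + 1:]))

    # Reordering: only for short names, to keep the pool small.
    if len(words) <= 4:
        for perm in permutations(words):
            add(" ".join(perm))

    # Suffixes: append each suffix to the original name.
    for suffix in suffixes:
        add(f"{base} {suffix}")

    return candidates[:max_candidates]
```

With the inputs `"Gulf Trading"`, `{"trading": ["Commerce"]}`, and `["Group"]`, this produces the pool `["Gulf Commerce", "Trading Gulf", "Gulf Trading Group"]`.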
4) Sentence-BERT Embedding
• Use multilingual SBERT model like paraphrase-multilingual-MiniLM-L12-v2 [3]
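Embedding the candidate pool might look like the sketch below, which assumes the `sentence-transformers` Python package is installed. The helper names `load_encoder` and `embed_names` are hypothetical, and normalizing the embeddings to unit length is an assumption made so that a dot product equals cosine similarity.

```python
def load_encoder(model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Load the SBERT model; imported lazily so the rest works without the package."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)


def embed_names(names, encoder):
    """Return {name: embedding as a list of floats} for each name.

    `encoder` is any object with an `encode(sentences, normalize_embeddings=...)`
    method, such as the model returned by `load_encoder()`.
    """
    names = list(names)
    vectors = encoder.encode(names, normalize_embeddings=True)
    return {name: [float(x) for x in vec] for name, vec in zip(names, vectors)}
```

Passing the encoder in as an argument means the model is loaded once and reused across requests, and a stub encoder can be substituted when testing.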

