Page 397 - AI for Good Innovate for Impact



               •    Normalize Arabic text (e.g., remove diacritics, standardize characters)

               3)    Exact Match Validation

               •    Check for exact matches in existing names (from SQL or Elasticsearch)
               •    If an exact match is found, flag it as not valid and return related record IDs
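
The exact-match step can be sketched as follows. This is a minimal illustration, assuming the existing names have already been loaded from the SQL or Elasticsearch store into an in-memory dictionary; `normalize_key`, `find_exact_matches`, the sample names, and the record IDs are hypothetical and not taken from the described system.

```python
from typing import Dict, List


def normalize_key(name: str) -> str:
    """Collapse whitespace and case-fold so that comparisons are uniform."""
    return " ".join(name.split()).casefold()


def find_exact_matches(name: str, existing: Dict[str, List[int]]) -> List[int]:
    """Return the record IDs whose normalized name equals the input name."""
    return existing.get(normalize_key(name), [])


# Hypothetical lookup table: normalized name -> related record IDs.
existing = {
    normalize_key("Gulf Star Trading"): [101, 245],
    normalize_key("Al Noor Bakery"): [318],
}

ids = find_exact_matches("  gulf  star TRADING ", existing)
is_valid = not ids  # an exact match flags the name as not valid
```

In a deployed system the lookup would more likely be a keyed query against the name store than a Python dictionary; the dictionary only keeps the sketch self-contained.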

               4)    Semantic Similarity Search
               •    Convert input name into a semantic vector using multilingual Sentence-BERT [1]
               •    Query Elasticsearch’s vector index [2] using cosine similarity
               •    Retrieve top N semantically similar names
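
The ranking performed in this step can be illustrated without a search cluster. The sketch below assumes the embeddings have already been produced by the Sentence-BERT model and uses toy three-dimensional vectors; in the described system the cosine-similarity search is delegated to Elasticsearch's vector index. The function names and sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], n: int
) -> List[Tuple[str, float]]:
    """Score every indexed name against the query and keep the n best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy embeddings standing in for real sentence vectors (assumption).
index = {
    "Gulf Star Trading": [0.9, 0.1, 0.0],
    "Al Noor Bakery": [0.0, 0.2, 0.9],
    "Gulf Stars Commerce": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]
results = top_n_similar(query_vector, index, n=2)
```

The highest score in `results` is the value that would later be compared against the similarity threshold.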

               5)    Prohibited Words Check
               •    Validate against a list of banned or sensitive words (e.g., political groups, terrorist
                    organizations)
               •    Reject name if it contains any prohibited terms
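
A minimal sketch of the prohibited-words check, assuming the input has already been normalized (in particular, diacritics removed) and that terms should match on whole words or whole phrases only. The entries in `PROHIBITED` are placeholders, not an actual list, and `find_prohibited_terms` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Placeholder entries; the real list would be maintained by the authority.
PROHIBITED = {"forbidden", "banned group"}


def _tokens(text: str) -> str:
    """Lower-case the text and join its word tokens with single spaces."""
    return " ".join(re.findall(r"\w+", text.casefold()))


def find_prohibited_terms(name: str, prohibited: Iterable[str]) -> List[str]:
    """Return the prohibited terms that occur in the name as whole words or phrases."""
    padded = f" {_tokens(name)} "
    hits = []
    for term in prohibited:
        term_norm = _tokens(term)
        if term_norm and f" {term_norm} " in padded:
            hits.append(term)
    return sorted(hits)


hits = find_prohibited_terms("The Banned Group Cafe", PROHIBITED)
rejected = bool(hits)  # any hit rejects the name
```

Matching on padded token strings avoids flagging a name merely because a prohibited term appears inside a longer word.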

               6)    Policy-Based Validation

               Apply name validation rules, such as:
               •    Domain-specific checks: Flag names that closely resemble ministry functions or other
                    restricted domains

               7)    Decision Logic & Thresholding
               •    The name is valid only if all of the following hold:

                    •  No exact match
                    •  No prohibited words
                    •  Semantic similarity score is below the defined threshold
                    •  Name complies with all policies

               8)    Response Generation: Return a structured response
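
The decision logic and the structured response can be sketched together. The source does not specify the threshold value or the response fields, so the default of 0.85, the `ValidationResult` fields, and the reason codes below are assumptions chosen for illustration only.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Hypothetical response shape; the actual schema is not given in the source."""
    is_valid: bool
    reasons: List[str] = field(default_factory=list)
    related_record_ids: List[int] = field(default_factory=list)


def decide(
    exact_match_ids: List[int],
    prohibited_hits: List[str],
    max_similarity: float,
    policy_violations: List[str],
    threshold: float = 0.85,  # assumed value, not taken from the source
) -> ValidationResult:
    """Combine the individual checks; the name is valid only if none of them fails."""
    reasons = []
    if exact_match_ids:
        reasons.append("exact_match")
    if prohibited_hits:
        reasons.append("prohibited_words")
    if max_similarity >= threshold:
        reasons.append("too_similar")
    if policy_violations:
        reasons.append("policy_violation")
    return ValidationResult(
        is_valid=not reasons,
        reasons=reasons,
        related_record_ids=list(exact_match_ids),
    )


response = asdict(decide([], [], 0.40, []))  # plain dict, ready for serialization
```

Collecting every failed check, rather than stopping at the first one, gives the recommendation pipeline a complete reason for rejection to work from.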

               Name Recommendation Pipeline

               1)    Input Capture

               •    Input name 
               •    Language
               •    Reason for rejection

               2)    Text Cleaning & Normalization
               •    Remove legal terms, trim spaces, standardize text
               •    Normalize Arabic: remove diacritics, normalize alef/hamza, etc.
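
The cleaning and normalization step might look like the sketch below. It assumes that removing the tashkeel marks (U+064B to U+0652), the superscript alef (U+0670), and the tatweel (U+0640), and folding the hamza-carrying alef forms to a bare alef, is sufficient; the source does not list the exact rules. `LEGAL_TERMS` holds placeholder entries and `normalize_name` is a hypothetical helper.

```python
import re
import unicodedata

# Tashkeel marks, superscript alef, and tatweel (kashida).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Alef with hamza above, hamza below, or madda -> bare alef.
_ALEF_VARIANTS = str.maketrans(
    {"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"}
)

# Placeholder legal terms; the real list is an assumption of this sketch.
LEGAL_TERMS = {"llc", "ltd"}


def normalize_name(name: str) -> str:
    """Standardize Unicode, strip diacritics, fold alef forms, drop legal terms."""
    text = unicodedata.normalize("NFKC", name)
    text = _DIACRITICS.sub("", text)
    text = text.translate(_ALEF_VARIANTS)
    # split() also trims leading/trailing spaces and collapses inner runs.
    tokens = [
        tok for tok in text.casefold().split()
        if tok.replace(".", "") not in LEGAL_TERMS
    ]
    return " ".join(tokens)


cleaned = normalize_name("  شَرِكَة   الأَمَل  L.L.C. ")
```

Applying the same function to both the stored names and the incoming name keeps exact-match and similarity comparisons consistent.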

               3)    Rule-Based Variants 

               •    Apply rule-based methods as before: synonym replacement, reordering, suffixes
               •    Generate a base pool of candidate names
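
The three rule-based methods named above can be sketched as a single generator of candidates. The synonym table, the suffix list, and the choice of word rotation as the reordering rule are assumptions made for illustration; `generate_variants` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical resources; a real system would load curated lists.
SYNONYMS = {"trading": ["commerce", "trade"]}
SUFFIXES = ["Group", "International"]


def generate_variants(
    name: str,
    synonyms: Dict[str, List[str]],
    suffixes: List[str],
    limit: int = 20,
) -> List[str]:
    """Build a de-duplicated pool of candidate names from simple rewrite rules."""
    words = name.split()
    candidates = []

    # 1. Synonym replacement: swap one word at a time.
    for i, word in enumerate(words):
        for syn in synonyms.get(word.casefold(), []):
            candidates.append(" ".join(words[:i] + [syn.title()] + words[i + 1:]))

    # 2. Reordering: rotate the word sequence.
    for k in range(1, len(words)):
        candidates.append(" ".join(words[k:] + words[:k]))

    # 3. Suffixes: append each suffix to the full name.
    for suffix in suffixes:
        candidates.append(f"{name} {suffix}")

    # Drop the original name and duplicates while preserving order.
    seen = {name}
    pool = []
    for cand in candidates:
        if cand not in seen:
            seen.add(cand)
            pool.append(cand)
    return pool[:limit]


pool = generate_variants("Gulf Star Trading", SYNONYMS, SUFFIXES)
```

Each candidate in the pool would still need to pass the validation pipeline before being offered as a recommendation.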
               4)    Sentence-BERT Embedding

               •    Use multilingual SBERT model like paraphrase-multilingual-MiniLM-L12-v2 [3]


