



               expert networks, enabling superior performance while
               maintaining computational efficiency.

                M-7 = SMoE(N_experts, N_params, f_routing, f_experts)

               Where:
               M-7 is the Mistral-7B model
               SMoE is the Sparse Mixture-of-Experts model
               N_experts is the number of expert models
               N_params ≈ 7 × 10^9 is the total number of parameters
               f_routing is the routing function that assigns inputs to experts
               f_experts are the expert models that process the inputs
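To make the SMoE formulation concrete, the following is a minimal sketch of top-k expert routing in PyTorch: the gate plays the role of f_routing and the feed-forward networks the role of f_experts. The expert count, top-k value, and layer sizes are illustrative assumptions, not Mistral-7B's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sparse mixture-of-experts layer: route each token to its top-k experts."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # f_routing: a linear gate scoring every expert for each token
        self.gate = nn.Linear(d_model, n_experts)
        # f_experts: independent feed-forward expert networks
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                          # x: (tokens, d_model)
        top_w, top_idx = self.gate(x).topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)           # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():                     # only routed tokens pay for expert e
                    w = top_w[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(10, 64)                            # a batch of 10 token embeddings
print(SparseMoE()(x).shape)                        # torch.Size([10, 64])

Because only the top-k experts run per token, compute cost scales with k rather than with the total number of experts, which is the efficiency property described above.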
             • Model Size and Capabilities: Mistral-7B is
               a large-scale language model capable of capturing
               intricate patterns and relationships within natural
               language data. Its substantial parameter count endows it
               with remarkable linguistic understanding and generation
               abilities, making it well-suited for a wide range of
               natural language processing tasks like text generation,
               summarization, question answering, and language
               translation.
             • Multilingual Support: One of Mistral-7B’s notable
               features is its multilingual support. The model has
               been trained on data from multiple languages, including
               English, French, Italian, German, and Spanish. This
               multilingual capability enables the model to understand
               and generate text in various languages, facilitating
               cross-lingual applications and enhancing its utility in
               diverse linguistic contexts.




Figure 5 – Long-range performance of Mistral. (Left) Mistral has 100% retrieval accuracy on the Passkey task regardless of the location of the passkey and the length of the input sequence. (Right) The perplexity of Mistral on the proof-pile dataset decreases monotonically as the context length increases [13].

Figure 4 – On all metrics, Mistral-7B significantly outperforms Llama 2 13B and is on par with Llama 34B [17].

4. METHODOLOGY

Mistral offers open-weight models (Mistral 7B, Mixtral 8x7B, Mixtral 8x22B) under the Apache 2 license for easy customization and deployment. We use Mistral-7B because it is accurate and well suited to fine-tuning, offering portability, control, and fast performance.
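The paper does not name its fine-tuning stack, but one common way to customize an open-weight 7B model on domain data is parameter-efficient LoRA fine-tuning, sketched below with the Hugging Face transformers and peft libraries. The model ID is Mistral's public release; the LoRA hyperparameters are illustrative assumptions.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"             # public open-weight release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# LoRA freezes the 7B base weights and trains small low-rank adapters,
# which is what makes a model of this size practical to fine-tune.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()                 # only the adapter weights train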



4.1 Dataset Collection

• A real-world dataset of security event logs has been collected from an ELK (Elasticsearch, Logstash, and Kibana) stack based SIEM solution implemented in a large enterprise organization.

4.2 Dataset Exploration

• The dataset contains 1 million security events and human-generated responses for detecting, diagnosing, and mitigating cyber threats, using alerts, textual content, and entity relationships [18].
• It includes firewall logs, endpoint security logs, access logs, audit logs, and intrusion detection system logs. A brief exploration sketch follows this list.
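A short pandas sketch of this exploration step; the file name and column names (log_source, severity, alert, description) are assumptions for illustration, not the paper's actual export schema.

import pandas as pd

# Load a hypothetical JSON-lines export of the SIEM events.
events = pd.read_json("siem_events.jsonl", lines=True)

print(len(events))                              # ~1 million security events
print(events["log_source"].value_counts())      # firewall, endpoint, access, audit, IDS
print(events["severity"].value_counts())        # distribution of severity levels
print(events[["alert", "description"]].head())  # textual content per event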
4.3 Data Preprocessing

• Extracted relevant fields (event type, severity level, technique, timestamp, description) from the SIEM logs.
• Cleaned the dataset by removing irrelevant records and sensitive information (usernames, IP addresses), as sketched after this list.
• Approximately 10% of the dataset was removed during this cleaning step.
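A minimal sketch of this preprocessing step, assuming hypothetical field and file names (the paper does not publish its schema): select the relevant fields, drop incomplete or irrelevant records, and mask usernames and IP addresses with regular expressions.

import re
import pandas as pd

# Hypothetical schema; the actual SIEM export fields may differ.
FIELDS = ["event_type", "severity", "technique", "timestamp", "description"]

USER_RE = re.compile(r"\buser(?:name)?[=:]\s*\S+", re.IGNORECASE)
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def scrub(text: str) -> str:
    """Mask usernames and IP addresses embedded in free-text log fields."""
    return IP_RE.sub("<IP>", USER_RE.sub("user=<USER>", text))

raw = pd.read_json("siem_events.jsonl", lines=True)     # assumed export file
clean = raw[FIELDS].dropna()                            # drop incomplete records
clean["description"] = clean["description"].astype(str).map(scrub)

# The paper reports that roughly 10% of records are removed by cleaning.
print(f"removed {1 - len(clean) / len(raw):.0%} of records")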



