



               expert networks, enabling superior performance while
               maintaining computational efficiency.

                M-7 = SMoE(N_experts, N_params, f_routing, f_experts)

               Where:
               M-7 is the Mistral-7B model
               SMoE is the Sparse Mixture-of-Experts model
               N_experts is the number of expert models
               N_params ≈ 7 × 10^9 is the total number of parameters
               f_routing is the routing function that assigns inputs to experts
               f_experts are the expert models that process the inputs
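To make the SMoE formulation concrete, the following is a minimal sketch of top-k expert routing in PyTorch: the gate plays the role of f_routing and the feed-forward networks the role of f_experts. The expert count, top-k value, and layer sizes are illustrative assumptions, not Mistral-7B's actual configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sparse mixture-of-experts layer: route each token to its top-k experts."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # f_routing: a linear gate scoring every expert for each token
        self.gate = nn.Linear(d_model, n_experts)
        # f_experts: independent feed-forward expert networks
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                          # x: (tokens, d_model)
        top_w, top_idx = self.gate(x).topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)           # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():                     # only routed tokens pay for expert e
                    w = top_w[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(10, 64)                            # a batch of 10 token embeddings
print(SparseMoE()(x).shape)                        # torch.Size([10, 64])

Because only the top-k experts run per token, compute cost scales with k rather than with the total number of experts, which is the efficiency property described above.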
             • Model Size and Capabilities: Mistral-7B is
               a large-scale language model capable of capturing
               intricate patterns and relationships within natural
               language data. Its substantial parameter count endows it
               with remarkable linguistic understanding and generation
               abilities, making it well-suited for a wide range of
               natural language processing tasks like text generation,
               summarization, question answering, and language
               translation.
             • Multilingual Support: One of Mistral-7B’s notable
               features is its multilingual support. The model has
               been trained on data from multiple languages, including
               English, French, Italian, German, and Spanish. This
               multilingual capability enables the model to understand
               and generate text in various languages, facilitating
               cross-lingual applications and enhancing its utility in
               diverse linguistic contexts.




Figure 5 – Long-range performance of Mistral. (Left) Mistral has 100% retrieval accuracy on the Passkey task regardless of the location of the passkey and the length of the input sequence. (Right) The perplexity of Mistral on the proof-pile dataset decreases monotonically as the context length increases [13].

Figure 4 – On all metrics, Mistral-7B significantly outperforms Llama 2 13B and is on par with Llama 34B [17].

4. METHODOLOGY

Mistral offers open-weight models (Mistral 7B, Mixtral 8x7B, Mixtral 8x22B) under the Apache 2 license for easy customization and deployment. We use Mistral-7B because it is accurate and well suited to fine-tuning, offering portability, control, and fast performance.
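The paper does not name its fine-tuning stack, but one common way to customize an open-weight 7B model on domain data is parameter-efficient LoRA fine-tuning, sketched below with the Hugging Face transformers and peft libraries. The model ID is Mistral's public release; the LoRA hyperparameters are illustrative assumptions.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"             # public open-weight release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# LoRA freezes the 7B base weights and trains small low-rank adapters,
# which is what makes a model of this size practical to fine-tune.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()                 # only the adapter weights train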



4.1 Dataset Collection

• A real-world dataset of security event logs has been collected from an ELK (Elasticsearch, Logstash, and Kibana) stack based SIEM solution implemented in a large enterprise organization.

4.2 Dataset Exploration

• The dataset contains 1 million security events and human-generated responses for detecting, diagnosing, and mitigating cyber threats, using alerts, textual content, and entity relationships [18].
• It includes firewall logs, endpoint security logs, access logs, audit logs, and intrusion detection system logs. A brief exploration sketch follows this list.
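A short pandas sketch of this exploration step; the file name and column names (log_source, severity, alert, description) are assumptions for illustration, not the paper's actual export schema.

import pandas as pd

# Load a hypothetical JSON-lines export of the SIEM events.
events = pd.read_json("siem_events.jsonl", lines=True)

print(len(events))                              # ~1 million security events
print(events["log_source"].value_counts())      # firewall, endpoint, access, audit, IDS
print(events["severity"].value_counts())        # distribution of severity levels
print(events[["alert", "description"]].head())  # textual content per event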
4.3 Data Preprocessing

• Extracted relevant fields (event type, severity level, technique, timestamp, description) from the SIEM logs.
• Cleaned the dataset by removing irrelevant records and sensitive information (usernames, IP addresses), as sketched after this list.
• Approximately 10% of the dataset was removed during this cleaning step.
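A minimal sketch of this preprocessing step, assuming hypothetical field and file names (the paper does not publish its schema): select the relevant fields, drop incomplete or irrelevant records, and mask usernames and IP addresses with regular expressions.

import re
import pandas as pd

# Hypothetical schema; the actual SIEM export fields may differ.
FIELDS = ["event_type", "severity", "technique", "timestamp", "description"]

USER_RE = re.compile(r"\buser(?:name)?[=:]\s*\S+", re.IGNORECASE)
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def scrub(text: str) -> str:
    """Mask usernames and IP addresses embedded in free-text log fields."""
    return IP_RE.sub("<IP>", USER_RE.sub("user=<USER>", text))

raw = pd.read_json("siem_events.jsonl", lines=True)     # assumed export file
clean = raw[FIELDS].dropna()                            # drop incomplete records
clean["description"] = clean["description"].astype(str).map(scrub)

# The paper reports that roughly 10% of records are removed by cleaning.
print(f"removed {1 - len(clean) / len(raw):.0%} of records")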



