2024 ITU Kaleidoscope Academic Conference
preprocessing module is completed, further optimizations and categorizations are performed by the optimization module. This step is necessary to bring the data into a state optimal for generating vulnerability intelligence, and it includes data categorization, data organization, and parameter optimization. The data augmentation module is responsible for collecting data from the CAPEC repository and extending it with the CWE list so that it can be linked with the aggregated vulnerability repository. The web user interface module provides the means of interaction between the system's users and the system itself. Finally, the collaboration module provides interoperability features for external systems that wish to work with the aggregated data and/or the vulnerability intelligence generated by the system without using the web user interface module. This provides the flexibility to enhance the requested data without requiring changes to the system itself.

5.2 Interoperability Aspects

The developed vulnerability intelligence platform has two chief capabilities that allow it to interoperate with external systems. First, it has a standardized data format, called the unified schema, for consuming vulnerability data in future integrations. Second, it exposes well-defined RESTful APIs (APIs that follow the Representational State Transfer architectural style for exchanging data over the Internet) through which external software systems can consume the aggregated vulnerability repository. Further, the main point of access to the platform is a web user interface built in compliance with the latest web standards, ensuring cross-browser interoperability.

5.3 Collaboration Aspects

The system supports collaboration with both kinds of users: those who directly use the vulnerability intelligence user interface and those who interact with the data consumed by external software systems. Additionally, it provides a feature for adding feedback on integrations with external software systems, so that the vulnerability intelligence can be re-organized to reflect the priorities of that specific usage.

5.4 Configurability Aspects

The developed system provides the ability to configure or modify several parameters through its user interface. The default ecosystem for which vulnerability intelligence is generated can be customized for each user. The aggregated vulnerability repository can be extended with another data source by configuring API information that is consumed periodically to append data to the overall system. Further, collaboration with external software systems can also be managed through a configuration user interface.

5.5 Data Aggregation and Augmentation Strategies

For collecting data from OSV, it was found feasible to periodically download the complete zipped archive of JSON files from the data dumps provided through OSV's GCS (Google Cloud Storage) bucket, maintained by OSV at gs://osv-vulnerabilities [15], and to process it. This can be done in a timely manner through a scheduled application process over an encrypted web channel. For integration with NVD, the API mode of integration [15] was best suited to mirroring the data, first completely and then incrementally.

To enhance the aggregated vulnerability data so that it can aid the forensic analysis process for software systems, the CAPEC repository has been utilized for data augmentation. Since the CAPEC repository and the CWE list are both available for download in CSV (comma-separated values) format, their latest versions were imported directly into the local database and further processed as needed for mapping and linking.

5.6 The Unified Schema for Seamless Integration

The OSV database transforms and stores data from multiple open-source databases in a custom schema that grew out of the vulnerability interchange schema and has gone through several iterations of change [5]. Similarly, the NVD database maintains a standard format for its vulnerability records. The records from these two sources differ in several aspects, so in order to integrate the data and present converged vulnerability insights, the schemas needed to be unified. The new unified schema developed for the system not only provides the relevant information without any data loss from either source, but also reduces noise from unnecessary data fields, or from fields that are not needed in the current context.

Under the unified schema, all records are aggregated under a unique identifier for every vulnerability record, called the VIP ID (represented by vip_id in the format). This unified schema forms the crux of the convergence of vulnerability insights from multiple sources. It is structured in JSON format and is composed of the fields summarized in Table 1.

Table 1 – The Unified Schema Format

Field Name                   Requirement   Field Value Type
vip_id                       mandatory     string
source_name                  mandatory     string
source_vuln_id               mandatory     string
ecosystem                    optional      string
vulnerability_description    mandatory     object
source_published             mandatory     string
response_version             mandatory     string
response_timestamp           mandatory     string
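As an illustration of the unified schema in Table 1, the following minimal Python sketch models one unified record, checks its mandatory fields, and maps an OSV-format record onto it. Only the field names and requirement levels come from Table 1; the rest is assumed. The names `UnifiedRecord` and `from_osv` are hypothetical. The layout of the `vulnerability_description` object (`summary`, `details`), the meaning given to `response_version` (a schema version string) and `response_timestamp` (an ISO-8601 UTC time), and taking the ecosystem from the first affected package are illustrative choices. The VIP ID is passed in by the caller because its assignment scheme is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional

# Mandatory fields as listed in Table 1 ("ecosystem" is the only optional one).
MANDATORY_FIELDS = (
    "vip_id", "source_name", "source_vuln_id", "vulnerability_description",
    "source_published", "response_version", "response_timestamp",
)


@dataclass
class UnifiedRecord:
    vip_id: str
    source_name: str
    source_vuln_id: str
    vulnerability_description: dict[str, Any]  # Table 1: object
    source_published: str
    response_version: str
    response_timestamp: str
    ecosystem: Optional[str] = None  # Table 1: optional

    def validate(self) -> None:
        """Raise ValueError if a mandatory field is empty or the description is not an object."""
        for name in MANDATORY_FIELDS:
            if not getattr(self, name):
                raise ValueError(f"missing mandatory field: {name}")
        if not isinstance(self.vulnerability_description, dict):
            raise ValueError("vulnerability_description must be an object")

    def to_json(self) -> str:
        """Serialize the validated record as a JSON document."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


def from_osv(osv: dict[str, Any], vip_id: str,
             response_version: str = "1.0") -> UnifiedRecord:
    """Hypothetical mapping of one OSV-format record onto the unified schema."""
    affected = osv.get("affected") or []
    package = (affected[0].get("package") or {}) if affected else {}
    record = UnifiedRecord(
        vip_id=vip_id,  # assigned by the caller; the ID scheme is assumed
        source_name="OSV",
        source_vuln_id=osv.get("id", ""),
        # Assumed internal layout of the description object.
        vulnerability_description={
            "summary": osv.get("summary", ""),
            "details": osv.get("details", ""),
        },
        source_published=osv.get("published", ""),
        response_version=response_version,  # assumed: schema version string
        response_timestamp=datetime.now(timezone.utc).isoformat(),
        ecosystem=package.get("ecosystem"),
    )
    record.validate()
    return record


if __name__ == "__main__":
    sample = {
        "id": "EXAMPLE-0001",
        "summary": "Example summary",
        "published": "2024-01-01T00:00:00Z",
        "affected": [{"package": {"ecosystem": "PyPI", "name": "example"}}],
    }
    print(from_osv(sample, vip_id="VIP-0001").to_json())
```

Giving `ecosystem` the only default value mirrors Table 1, where it is the single optional field. Under the same assumptions, an NVD mapper would be a second `from_*` function that returns the same record type, which is what lets records from both sources converge under one VIP ID.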
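The OSV collection strategy in Section 5.5 is to periodically download the complete zipped JSON archive and process it. A minimal Python sketch of that strategy follows, under stated assumptions. It reads the bucket through its public HTTPS endpoint rather than the `gs://` URI. The per-ecosystem `all.zip` path follows OSV's public data-dump documentation, not anything stated in this paper. `refresh_store` is a hypothetical stand-in for the local database, implemented as an in-memory dict keyed by the OSV `id`. Scheduling is left out.

```python
import io
import json
import urllib.request
import zipfile
from typing import Any, Iterator

# Assumed public HTTPS endpoint of the gs://osv-vulnerabilities bucket.
OSV_BUCKET_URL = "https://osv-vulnerabilities.storage.googleapis.com"


def osv_archive_url(ecosystem: str) -> str:
    """URL of the complete zipped JSON dump for one ecosystem (assumed layout)."""
    return f"{OSV_BUCKET_URL}/{ecosystem}/all.zip"


def download_archive(ecosystem: str, timeout: float = 60.0) -> bytes:
    """Fetch the whole archive over HTTPS, i.e. an encrypted web channel."""
    with urllib.request.urlopen(osv_archive_url(ecosystem), timeout=timeout) as resp:
        return resp.read()


def iter_osv_records(archive: bytes) -> Iterator[dict[str, Any]]:
    """Yield every JSON vulnerability record contained in the zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):  # skip any non-record files
                with zf.open(name) as fh:
                    yield json.load(fh)


def refresh_store(store: dict[str, dict[str, Any]], archive: bytes) -> int:
    """Hypothetical local-database stand-in: upsert each record by its OSV id.

    Returns the number of records processed from this full dump.
    """
    count = 0
    for record in iter_osv_records(archive):
        store[record["id"]] = record  # a newer dump overwrites older copies
        count += 1
    return count
```

A scheduled job (for example, a cron entry) would then run `refresh_store(store, download_archive("PyPI"))` for each configured ecosystem. Re-processing the full dump and upserting by ID keeps the sketch simple: every run leaves the local copy equal to the latest archive, without tracking which records changed.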