Page 110 - Big data - Concept and application for telecommunications
DS Data Supplier
H/W Hardware
OS Operating System
PI Provenance Information
PII Personally Identifiable Information
URI Uniform Resource Identifier
5 Conventions
In this Recommendation:
The keywords "is required to" indicate a requirement which must be strictly followed and from which no
deviation is permitted if conformance to this document is to be claimed.
The keywords "is recommended" indicate a requirement which is recommended but which is not absolutely
required. Thus this requirement need not be present to claim conformance.
The keywords "can optionally" indicate an optional requirement which is permissible, without implying any
sense of being recommended. This term is not intended to imply that the vendor's implementation must
provide the option and the feature can be optionally enabled by the network operator/service provider.
Rather, it means the vendor may optionally provide the feature and still claim conformance with the
specification.
In the body of this document and its annexes, the words shall, shall not, should, and may sometimes appear,
in which case they are to be interpreted, respectively, as is required to, is prohibited from, is recommended,
and can optionally. Where such phrases or keywords appear in an appendix or in material explicitly
marked as informative, they are to be interpreted as having no normative intent.
6 Introduction to data provenance
6.1 General concept of data provenance
The reliability of the data used is an important factor in determining the trustworthiness of a data analysis
outcome. Indeed, data can be manipulated and transformed according to the intent of the analyst, and
distorted in order to extract a desired result. In this sense, data provenance aims to ensure the
reliability of data and analysis results by providing transparency about the historical path of the data.
Provenance is information pertaining to any source of information, including the party or parties involved in
generating it, introducing it and/or vouching for it. In the field of data management, data provenance is
information about the origin and creation process of data, including:
– the data product;
NOTE 1 – A data product is the data output produced for distribution (open release or sale).
– the processes that enable the creation of data;
NOTE 2 – A process is described by the functions applied to the data source, the intermediate outputs and
their order.
– metadata recording the workflow process, annotations and notes about processes; and
– information that helps determine the derivation history of a data product, starting from its original
sources.
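The elements listed above can be illustrated with a minimal sketch. This Recommendation does not define a data model at this point, so every name below (ProvenanceRecord, ProcessStep, apply, derivation_history and the two example functions) is a hypothetical choice made only for illustration. The sketch keeps the original source, records each applied function with its intermediate output in order, attaches a free-text note as metadata, and exposes the derivation history of the resulting data product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProcessStep:
    """One applied function and the intermediate output it produced."""
    function_name: str
    output: Any
    note: str = ""  # free-text annotation about this step (metadata)


@dataclass
class ProvenanceRecord:
    """Hypothetical container; not a structure defined by this Recommendation."""
    source: Any                                              # original data source
    steps: List[ProcessStep] = field(default_factory=list)   # ordered process

    @property
    def data_product(self) -> Any:
        """The latest output, or the source if no function has been applied."""
        return self.steps[-1].output if self.steps else self.source

    def apply(self, func: Callable[[Any], Any], note: str = "") -> "ProvenanceRecord":
        """Apply func to the current data product and record the step."""
        result = func(self.data_product)
        self.steps.append(ProcessStep(func.__name__, result, note))
        return self

    def derivation_history(self) -> List[str]:
        """Names of the applied functions, in order, starting from the source."""
        return [step.function_name for step in self.steps]


# Illustrative processing functions (assumptions, not part of the source text).
def drop_negatives(values):
    return [v for v in values if v >= 0]


def total(values):
    return sum(values)


record = ProvenanceRecord(source=[3, -1, 4])
record.apply(drop_negatives, note="remove invalid readings").apply(total)

print(record.data_product)
print(record.derivation_history())
```

In this sketch the original source is never overwritten, so the path from source to data product can be replayed or inspected step by step, which is the transparency that data provenance is intended to provide.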
Data provenance is useful for:
– managing the derivation history of a data product, starting from its original sources;
– ascertaining the quality of data based on ancestral data and derivation;
102 Static data – Data provenance, data formats and trust