Page 110 - Big data - Concept and application for telecommunications




            DS          Data Supplier

            H/W         Hardware
            OS          Operating System

            PI          Provenance Information
            PII         Personally Identifiable Information

            URI         Uniform Resource Identifier


            5       Conventions
            In this Recommendation:
            The keywords "is required to" indicate a requirement which must be strictly followed and from which no
            deviation is permitted if conformance to this document is to be claimed.
            The keywords "is recommended" indicate a requirement which is recommended but which is not absolutely
            required. Thus this requirement need not be present to claim conformance.

            The keywords "can optionally" indicate an optional requirement which is permissible, without implying any
            sense of being recommended. This term is not intended to imply that the vendor's implementation must
            provide the option and the feature can be optionally enabled by the network operator/service provider.
            Rather, it means the vendor may optionally provide the feature and still claim conformance with the
            specification.

            In the body of this document and its annexes, the words shall, shall not, should, and may sometimes appear,
            in which case they are to be interpreted, respectively, as is required to, is prohibited from, is recommended,
            and can optionally. The appearance of such phrases or keywords in an appendix or in material explicitly
            marked as informative is to be interpreted as having no normative intent.
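As an illustration only, the interpretation rules of this clause can be sketched as a small Python lookup, for example inside a hypothetical conformance-checklist tool. The function `classify`, the tables `EQUIVALENT_KEYWORD` and `STRENGTH`, and the strength labels are assumptions made for this sketch and are not defined by this Recommendation. The matching is deliberately naive: it uses a whole-word search and returns the first keyword it finds.

```python
import re
from typing import Optional

# Informal words and their interpretation per clause 5. "shall not" is listed
# before "shall" so that the longer phrase is matched first.
EQUIVALENT_KEYWORD = {
    "shall not": "is prohibited from",
    "shall": "is required to",
    "should": "is recommended",
    "may": "can optionally",
}

# Hypothetical labels for the normative strength of each keyword.
STRENGTH = {
    "is required to": "mandatory",
    "is prohibited from": "prohibited",
    "is recommended": "recommended",
    "can optionally": "optional",
}


def classify(statement: str, informative: bool = False) -> Optional[str]:
    """Return the normative strength of a statement, or None if it has none.

    Set informative=True for text in an appendix or in material explicitly
    marked as informative: such text carries no normative intent.
    """
    if informative:
        return None
    text = statement.lower()
    # Formal keywords take precedence over the informal equivalents.
    for keyword, strength in STRENGTH.items():
        if re.search(rf"\b{keyword}\b", text):
            return strength
    for word, keyword in EQUIVALENT_KEYWORD.items():
        if re.search(rf"\b{word}\b", text):
            return STRENGTH[keyword]
    return None


print(classify("The DS shall not disclose PII."))           # prohibited
print(classify("The DS may record PI.", informative=True))  # None
```

Checking the formal keywords before the informal words means that text already written with "is required to" or "can optionally" is classified directly, without relying on the shall/should/may mapping.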


            6       Introduction to data provenance

            6.1     General concept of data provenance

            The reliability of the data used is an important factor in determining the trustworthiness of a data
            analysis outcome. Indeed, data can be manipulated and transformed according to the intent of the analyst,
            or even distorted to extract a desired result. In this sense, data provenance aims to ensure the
            reliability of data and analysis results by making the historical path of the data transparent.
            Provenance is information pertaining to any source of information, including the party or parties involved in
            generating it, introducing it and/or vouching for it. In the field of data management, data provenance is
            information about the origin and creation process of data, including:
            –       data product;
                    NOTE 1 – A data product is output data produced for distribution, whether released openly or sold.
            –       processes that enable the creation of data;
                    NOTE 2 – A process is described by the functions applied to the data sources and intermediate outputs,
                    and by the order in which they are applied.
            –       metadata recording the workflow process, together with annotations and notes about processes; and
            –       information that helps determine derivation history of a data product, starting from its original
                    sources.
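The elements listed above can be illustrated with a minimal Python sketch, assuming a hypothetical model in which every data item is identified by a URI and the process is an ordered list of steps, each applying one function to its inputs. The classes `Step` and `ProvenanceRecord`, their fields, and the `urn:example:` identifiers are illustrative assumptions, not a data model defined by this Recommendation. Walking the steps backwards from the data product recovers its derivation history down to the original sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One function applied to its inputs; its position in the process gives its order."""
    function: str       # name of the applied function
    inputs: List[str]   # URIs of data sources or intermediate outputs
    output: str         # URI of the data this step produces
    note: str = ""      # annotation or note about the step (metadata)


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for a single data product."""
    product: str                                      # URI of the data product
    steps: List[Step] = field(default_factory=list)   # the process, in order

    def producer_of(self, uri: str) -> Optional[Step]:
        """Return the step that produced `uri`, or None if it is an original source."""
        for step in self.steps:
            if step.output == uri:
                return step
        return None

    def original_sources(self) -> List[str]:
        """Walk the derivation history back from the product to its original sources."""
        sources, stack, seen = [], [self.product], set()
        while stack:
            uri = stack.pop()
            if uri in seen:
                continue
            seen.add(uri)
            step = self.producer_of(uri)
            if step is None:
                sources.append(uri)        # not produced by any step: original source
            else:
                stack.extend(step.inputs)  # keep tracing through the step's inputs
        return sorted(sources)

    def derivation_history(self) -> List[str]:
        """Describe each step in the recorded order."""
        return [f"{i}: {s.function}({', '.join(s.inputs)}) -> {s.output}"
                for i, s in enumerate(self.steps, start=1)]


record = ProvenanceRecord(
    product="urn:example:report",
    steps=[
        Step("anonymize", ["urn:example:cdr"], "urn:example:cdr-anon", "PII removed"),
        Step("join", ["urn:example:cdr-anon", "urn:example:cells"], "urn:example:joined"),
        Step("aggregate", ["urn:example:joined"], "urn:example:report"),
    ],
)
print(record.original_sources())  # ['urn:example:cdr', 'urn:example:cells']
```

Keeping the steps in an ordered list records the order of the applied functions, as NOTE 2 describes. An established provenance model such as W3C PROV could be used instead of this ad hoc structure.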
            Data provenance is useful for:
            –       managing the derivation history of a data product, starting from its original sources;
            –       ascertaining the quality of data based on its ancestral data and derivation;



            102      Static data – Data provenance, data formats and trust