Page 119 - Big data - Concept and application for telecommunications

                    NOTE 1 – The big data provenance information model includes the function name and its uses, the
                    computational environment, the data type and format of input and output data, input parameters,
                    responsible party information, etc.
                    NOTE 2 – Examples of computational environment information include the OS, H/W description,
                    locale settings and time zone.
            –       (common format for exchange) It is recommended that BDSP supports encoding and decoding
                    provenance information in a common format for use on different systems;

                    NOTE 3 – In this Recommendation, encoding means converting provenance information into a
                    specialized format; decoding is the reverse process.
            –       (provenance recording initiation) It is required that BDSP records a provenance unit when data are
                    stored;
                    NOTE 4 – The information contained in the metadata (from DP:DB or generated by BDSP) can be
                    used for recording provenance units.
            –       (storing provenance unit) It is required that BDSP supports a cost-efficient storing mechanism for
                    provenance units;
                    NOTE 5 – When recording provenance information for streaming data, storage can be used more
                    efficiently by recording a provenance unit once per predetermined period of time rather than every
                    time data are stored. Data compression techniques can also be considered.

            –       (storing provenance information) BDSP can optionally support pre-storing provenance information
                    before it is requested, to reduce retrieval time;
            –       (searching provenance unit) It is required that BDSP supports searching for a provenance unit;

            –       (combining provenance units) It is required that BDSP supports combining provenance units;
                    NOTE 6 – When data are deleted, provenance units may need to be combined (see clause 7.3.2).
            –       (retrieving provenance information) It is required that BDSP supports aggregating provenance units
                    to retrieve provenance information;
            –       (deleting provenance unit) It is required that BDSP provides a provenance unit deletion mechanism.
                    NOTE 7 – When data are deleted, BDSP applies one of three mechanisms to the provenance unit (keep,
                    combine or delete), depending on the context (see clause 7.3.2).
                    NOTE 8 – Subject to management policy, BDSP can maintain the associated provenance unit even
                    after the data are deleted.
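The provenance unit model in NOTE 1 and the common-format requirement can be pictured with a small data structure. The sketch below is illustrative only: the class and field names are hypothetical mappings of the NOTE 1 items, and JSON is assumed as the common exchange format because this clause does not name one.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class ProvenanceUnit:
    """One provenance unit (hypothetical field names mirroring NOTE 1)."""
    unit_id: str
    function_name: str
    function_use: str
    environment: dict[str, str]   # OS, H/W, locale, time zone (NOTE 2)
    input_data: list[str]         # identifiers of input data
    output_data: list[str]        # identifiers of output data
    data_formats: dict[str, str]  # data identifier -> data type/format
    parameters: dict[str, Any] = field(default_factory=dict)
    responsible_party: str = ""
    timestamp: float = 0.0


def encode(unit: ProvenanceUnit) -> str:
    """Encode a unit into the assumed common format (JSON text)."""
    return json.dumps(asdict(unit), sort_keys=True)


def decode(text: str) -> ProvenanceUnit:
    """Decode JSON text produced by encode() back into a ProvenanceUnit."""
    return ProvenanceUnit(**json.loads(text))


unit = ProvenanceUnit(
    unit_id="pu-1",
    function_name="normalize",
    function_use="scale sensor readings",
    environment={"os": "Linux", "locale": "en_US", "time_zone": "UTC"},
    input_data=["raw-42"],
    output_data=["clean-42"],
    data_formats={"raw-42": "csv", "clean-42": "parquet"},
    parameters={"method": "z-score"},
    responsible_party="analytics-team",
)
text = encode(unit)           # portable text another system can decode
restored = decode(text)       # equal to the original unit
```

Sorting the keys makes the encoded text deterministic, which helps when units produced on different systems are compared or hashed.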

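The recording, searching, aggregation and deletion requirements can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not a BDSP design: the `Unit` and `ProvenanceStore` names, the 60-second streaming period (standing in for the predetermined period of NOTE 5) and the interpretation of "combine" are all hypothetical, since clause 7.3.2 is not reproduced here. Aggregation is modelled as walking the lineage upstream from a data item.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Unit:
    """Minimal provenance unit (hypothetical fields): one function run."""
    unit_id: str
    function_name: str
    inputs: list[str]   # identifiers of input data
    outputs: list[str]  # identifiers of output data
    timestamp: float


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store illustrating the clause 8.1 operations."""
    stream_interval: float = 60.0  # assumed recording period for streaming data
    units: dict[str, Unit] = field(default_factory=dict)
    last_stream_record: dict[str, float] = field(default_factory=dict)

    def record(self, unit: Unit, stream_id: Optional[str] = None) -> bool:
        """Record a unit when data are stored; for a stream, at most once per period."""
        if stream_id is not None:
            last = self.last_stream_record.get(stream_id)
            if last is not None and unit.timestamp - last < self.stream_interval:
                return False  # still inside the current recording period: skip
            self.last_stream_record[stream_id] = unit.timestamp
        self.units[unit.unit_id] = unit
        return True

    def search(self, data_id: str) -> list[Unit]:
        """Search: return the units whose outputs include the given data item."""
        return [u for u in self.units.values() if data_id in u.outputs]

    def provenance(self, data_id: str) -> list[Unit]:
        """Aggregate units into provenance information: upstream lineage, oldest first."""
        seen: dict[str, Unit] = {}
        pending = [data_id]
        while pending:
            for u in self.search(pending.pop()):
                if u.unit_id not in seen:
                    seen[u.unit_id] = u
                    pending.extend(u.inputs)
        return sorted(seen.values(), key=lambda u: u.timestamp)

    def delete_data(self, data_id: str, action: str) -> None:
        """Apply keep, combine or delete to the provenance of a deleted data item."""
        if action == "keep":
            return  # retain provenance, e.g. as required by management policy
        if action not in ("combine", "delete"):
            raise ValueError(f"unknown action: {action}")
        producers = self.search(data_id)
        if action == "combine":
            # Assumed meaning of "combine": merge the producers into each consumer
            # so the lineage skips the deleted item instead of breaking at it.
            upstream = [i for p in producers for i in p.inputs]
            for c in self.units.values():
                if data_id in c.inputs:
                    c.inputs = [i for i in c.inputs if i != data_id] + upstream
                    names = [p.function_name for p in producers] + [c.function_name]
                    c.function_name = "+".join(names)
        for p in producers:
            p.outputs.remove(data_id)
            if not p.outputs:  # the unit no longer describes any existing data
                del self.units[p.unit_id]


store = ProvenanceStore()
store.record(Unit("u1", "ingest", ["sensor"], ["raw"], 0.0))
store.record(Unit("u2", "clean", ["raw"], ["clean"], 1.0))
store.record(Unit("u3", "aggregate", ["clean"], ["report"], 2.0))
print([u.unit_id for u in store.provenance("report")])  # ['u1', 'u2', 'u3']
store.delete_data("clean", "combine")
print([u.unit_id for u in store.provenance("report")])  # ['u1', 'u3']
```

Splicing the deleted item out of the lineage keeps the provenance of `report` connected back to `ingest`; with `"delete"` the lineage would instead stop at the consumer of the removed data.
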
            8.2     Analysis support requirements
            Analysis support requirements include:

            –       (extracting workflow) It is required that BDSP provides extraction of workflow information from
                    provenance information;
            –       (storing workflow) It is recommended that BDSP supports storing workflow;

                    NOTE 1 – The workflow is stored in the form of a graph, organized by the usage frequency of the
                    analysis functions and the sequential relationships among them.
            –       (retrieving workflow) It is recommended that BDSP supports workflow retrieval;

            –       (providing data list on function) It is recommended that BDSP provides a list of data related to a
                    given function recorded in a given workflow;
            –       (providing function list on data) It is recommended that BDSP provides a list of functions related to
                    given data recorded in a given workflow;
            –       (data analysis automation) It is recommended that BDSP supports analysis automation based on
                    workflow;
            –       (user annotation) BDSP can optionally support annotation on provenance information;
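Workflow extraction and the graph described in NOTE 1 can be sketched by counting how often each function is used and how often one function directly follows another. The input format (one ordered list of function names per piece of provenance information) and the `suggest_next` helper, a very simple stand-in for workflow-based analysis automation, are assumptions for illustration only.

```python
from collections import Counter
from typing import Optional


def extract_workflow(chains: list[list[str]]) -> tuple[Counter, Counter]:
    """Build a workflow graph from provenance chains.

    Each chain lists the function names of one piece of provenance
    information, oldest first. Returns (usage, edges): how often each function
    was used, and how often function a was directly followed by function b.
    """
    usage: Counter = Counter()
    edges: Counter = Counter()
    for chain in chains:
        usage.update(chain)                  # node weight: usage frequency
        edges.update(zip(chain, chain[1:]))  # edge weight: sequential relationship
    return usage, edges


def suggest_next(edges: Counter, function_name: str) -> Optional[str]:
    """Return the most frequent successor of a function, or None if it has none."""
    successors = {b: n for (a, b), n in edges.items() if a == function_name}
    if not successors:
        return None
    return max(successors, key=successors.get)


chains = [
    ["ingest", "clean", "aggregate"],
    ["ingest", "clean", "train"],
    ["ingest", "clean", "aggregate"],
]
usage, edges = extract_workflow(chains)
print(usage["clean"], edges[("clean", "aggregate")])  # 3 2
print(suggest_next(edges, "clean"))                   # aggregate
```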

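The two list requirements (data list on function, function list on data) amount to a two-way index between the functions and the data recorded in a workflow. A minimal sketch, assuming each recorded step is a hypothetical `(function_name, input_ids, output_ids)` tuple:

```python
from collections import defaultdict


def build_indexes(records):
    """Index (function_name, input_ids, output_ids) records in both directions.

    Returns (data_by_function, functions_by_data), each mapping a key to a
    sorted list of related identifiers.
    """
    data_by_function: dict[str, set[str]] = defaultdict(set)
    functions_by_data: dict[str, set[str]] = defaultdict(set)
    for function_name, inputs, outputs in records:
        for data_id in [*inputs, *outputs]:
            data_by_function[function_name].add(data_id)
            functions_by_data[data_id].add(function_name)
    return (
        {f: sorted(d) for f, d in data_by_function.items()},
        {d: sorted(f) for d, f in functions_by_data.items()},
    )


workflow_records = [
    ("ingest", ["sensor-a"], ["raw-1"]),
    ("clean", ["raw-1"], ["clean-1"]),
    ("aggregate", ["clean-1", "calendar"], ["report-1"]),
]
data_by_function, functions_by_data = build_indexes(workflow_records)
print(data_by_function["aggregate"])  # ['calendar', 'clean-1', 'report-1']
print(functions_by_data["raw-1"])     # ['clean', 'ingest']
```

This sketch does not distinguish whether a data item was an input or an output of a function; a fuller index could keep the two roles separate.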


                                                   Static data – Data provenance, data formats and trust   111