Page 119 - Big data - Concept and application for telecommunications
NOTE 1 – The big data provenance information model includes the function name and its usage, the
computational environment, the data type and format of input and output data, input parameters,
responsible party information, etc.
NOTE 2 – Examples of computational environment information include the OS, H/W description, locale
settings and time zone.
– (common format for exchange) It is recommended that BDSP supports encoding and decoding
provenance information in a common format for use across different systems;
NOTE 3 – In this Recommendation, encoding means the process of converting provenance
information into a specialized format. Decoding is the opposite process.
– (provenance recording initiation) It is required that BDSP records a provenance unit when data are
stored;
NOTE 4 – The information contained in the metadata (from DP:DB or generated by BDSP) can be
used for recording provenance units.
– (storing provenance unit) It is required that BDSP supports a cost-efficient storage mechanism for
provenance units;
NOTE 5 – When recording provenance information for streaming data, storage can be used more
efficiently by recording provenance units at a predetermined time interval rather than every time
data are stored. Data compression techniques can also be considered.
– (storing provenance information) BDSP can optionally support pre-storing provenance information
before it is requested, to reduce retrieval time;
– (searching provenance unit) It is required that BDSP supports searching for provenance units;
– (combining provenance units) It is required that BDSP supports combining provenance units;
NOTE 6 – When data are deleted, the associated provenance units may need to be combined (see clause 7.3.2).
– (retrieving provenance information) It is required that BDSP supports aggregating provenance units
to retrieve provenance information;
– (deleting provenance unit) It is required that BDSP provides a provenance unit deletion mechanism.
NOTE 7 – When data are deleted, BDSP applies one of three mechanisms (keep, combine or delete) to
the associated provenance unit, depending on the context (see clause 7.3.2).
NOTE 8 – Subject to management policy, BDSP can retain the associated provenance unit even after
the data are deleted.
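The provenance-unit lifecycle in clause 8.1 (record, encode/decode, search, aggregate, and keep/combine/delete on data deletion) can be illustrated with a minimal Python sketch. All names (ProvenanceUnit, ProvenanceStore, record, search, aggregate, on_data_deleted) are hypothetical and not defined by this Recommendation. JSON stands in for the common exchange format and zlib compression for a cost-efficient storage mechanism. The "combine" behaviour is an assumption: the unit that produced the deleted data is folded into each unit that consumed it, so the lineage chain stays connected; clause 7.3.2 remains authoritative. Periodic recording for streaming data (NOTE 5) is not shown.

```python
import json
import zlib
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceUnit:
    # Hypothetical fields loosely following NOTE 1: one unit per function execution.
    unit_id: str
    function_name: str
    inputs: list = field(default_factory=list)       # IDs of input data
    outputs: list = field(default_factory=list)      # IDs of output data
    parameters: dict = field(default_factory=dict)   # input parameters
    environment: dict = field(default_factory=dict)  # e.g. OS, H/W, locale, time zone
    responsible_party: str = ""


def encode(unit: ProvenanceUnit) -> bytes:
    # Assumed common exchange format: UTF-8 JSON with sorted keys.
    return json.dumps(asdict(unit), sort_keys=True).encode("utf-8")


def decode(blob: bytes) -> ProvenanceUnit:
    # Inverse of encode().
    return ProvenanceUnit(**json.loads(blob.decode("utf-8")))


class ProvenanceStore:
    def __init__(self) -> None:
        # unit_id -> zlib-compressed encoding (compression as a storage-cost measure).
        self._units: dict = {}

    def record(self, unit: ProvenanceUnit) -> None:
        # Intended to be called whenever data are stored; overwrites a unit with the same ID.
        self._units[unit.unit_id] = zlib.compress(encode(unit))

    def _load(self, unit_id: str) -> ProvenanceUnit:
        return decode(zlib.decompress(self._units[unit_id]))

    def search(self, data_id: str) -> list:
        # Units that produced or consumed the given data.
        units = [self._load(uid) for uid in self._units]
        return [u for u in units if data_id in u.outputs or data_id in u.inputs]

    def aggregate(self, data_id: str) -> list:
        # Retrieve provenance information by walking back from the data to its sources.
        result, pending, seen = [], [data_id], set()
        while pending:
            current = pending.pop()
            for unit in self.search(current):
                if current in unit.outputs and unit.unit_id not in seen:
                    seen.add(unit.unit_id)
                    result.append(unit)
                    pending.extend(unit.inputs)
        return result

    def on_data_deleted(self, data_id: str, action: str) -> None:
        # action is "keep", "combine" or "delete", chosen by the caller from context/policy.
        producers = [u for u in self.search(data_id) if data_id in u.outputs]
        if action == "keep":
            return  # provenance retained even though the data are gone
        if action == "combine":
            # ASSUMED semantics: fold the producer(s) into each consumer so that
            # the consumer's lineage skips the deleted intermediate data.
            upstream = [i for p in producers for i in p.inputs]
            for c in self.search(data_id):
                if data_id in c.inputs:
                    c.inputs = [i for i in c.inputs if i != data_id] + upstream
                    names = [p.function_name for p in producers] + [c.function_name]
                    c.function_name = "+".join(names)
                    self.record(c)
        elif action != "delete":
            raise ValueError(f"unknown action: {action}")
        for p in producers:
            del self._units[p.unit_id]
```

For example, after recording a `clean` unit (`raw` to `d1`) and a `train` unit (`d1` to `model`), `aggregate("model")` returns both units; after `on_data_deleted("d1", "combine")`, the lineage of `model` is a single combined unit whose inputs point directly to `raw`.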
8.2 Analysis support requirements
Analysis support requirements include:
– (extracting workflow) It is required that BDSP provides extraction of workflow information from
provenance information;
– (storing workflow) It is recommended that BDSP supports storing workflows;
NOTE 1 – The workflow is stored in the form of a graph that captures the usage frequency of the
analysis functions and the sequential relationships among them.
– (retrieving workflow) It is recommended that BDSP supports workflow retrieval;
– (providing data list on function) It is recommended that BDSP provides a list of data related to a
given function recorded in a given workflow;
– (providing function list on data) It is recommended that BDSP provides a list of functions related to
given data recorded in a given workflow;
– (data analysis automation) It is recommended that BDSP supports data analysis automation based on
workflows;
– (user annotation) BDSP can optionally support user annotation of provenance information;
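A workflow graph of the kind described in NOTE 1 of clause 8.2 can be sketched in Python as follows. The `Step` and `Workflow` names are hypothetical. Steps are assumed to arrive in execution order, and an edge from function A to function B is counted whenever B consumes data that A produced. Node counts give usage frequency, edge counts give the sequential relationships, and the two index maps answer the "data list on function" and "function list on data" queries. `successors()` shows one possible basis for analysis automation (suggesting a likely next function); it is an illustration only, not a mechanism defined by this Recommendation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical minimal view of a provenance unit: one function execution.
    function_name: str
    inputs: tuple = ()
    outputs: tuple = ()


class Workflow:
    """Directed graph of analysis functions extracted from provenance records."""

    def __init__(self) -> None:
        self.usage = Counter()  # function name -> number of executions
        self.edges = Counter()  # (upstream function, downstream function) -> count
        self.data_by_function = defaultdict(set)   # function name -> related data IDs
        self.functions_by_data = defaultdict(set)  # data ID -> related function names

    @classmethod
    def extract(cls, steps) -> "Workflow":
        # Assumes steps are ordered so that producers appear before consumers.
        wf = cls()
        producer = {}  # data ID -> function that last produced it
        for s in steps:
            wf.usage[s.function_name] += 1
            for d in s.inputs + s.outputs:
                wf.data_by_function[s.function_name].add(d)
                wf.functions_by_data[d].add(s.function_name)
            for d in s.inputs:
                if d in producer:
                    wf.edges[(producer[d], s.function_name)] += 1
            for d in s.outputs:
                producer[d] = s.function_name
        return wf

    def data_list(self, function_name: str) -> list:
        # Data related to a given function (inputs and outputs), sorted.
        return sorted(self.data_by_function.get(function_name, set()))

    def function_list(self, data_id: str) -> list:
        # Functions related to given data (producers and consumers), sorted.
        return sorted(self.functions_by_data.get(data_id, set()))

    def successors(self, function_name: str) -> list:
        # Functions observed to follow function_name, most frequent first.
        ranked = [(dst, n) for (src, dst), n in self.edges.items() if src == function_name]
        return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

For five steps in which `clean` feeds `train` twice and `plot` once, `Workflow.extract(steps).successors("clean")` returns `[("train", 2), ("plot", 1)]`. Counting edges through shared data, rather than through timestamps alone, keeps unrelated functions that merely ran one after another from being linked.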
Static data – Data provenance, data formats and trust 111