Page 112 - Big data - Concept and application for telecommunications
– reproduce an execution from provenance for big data applications: In some cases of big data
execution, the environment information (e.g., hardware (H/W) information and the parameter
configuration of big data engines) is an important factor.
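As an illustration of recording environment information for later re-execution, the following is a minimal sketch in Python. It is not part of the Recommendation. The function names `capture_environment` and `environment_matches`, the record layout, and the `engine_params` dictionary are all assumptions made for this example.

```python
import os
import platform
from datetime import datetime, timezone


def capture_environment(engine_params):
    """Return a provenance entry describing the execution environment.

    engine_params is a hypothetical dict of big data engine settings.
    """
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "hardware": {
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
        },
        "software": {
            "os": platform.system(),
            "python": platform.python_version(),
        },
        # Copy so later changes to the caller's dict do not alter the record.
        "engine_params": dict(engine_params),
    }


def environment_matches(recorded, current):
    """Compare the fields that matter for re-execution, ignoring the timestamp."""
    keys = ("hardware", "software", "engine_params")
    return all(recorded[k] == current[k] for k in keys)
```

Before an execution is reproduced, the recorded entry could be compared with a freshly captured one so that differences in hardware or engine parameters are noticed first.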
The application areas of big data provenance and their benefits are:
– collaborative big data analysis: Big data provenance allows collaboration on big data analysis among
multiple domains or applications by providing information about data sources and their process steps;
– reuse of data processing: Generally, a big data analysis has complex process steps. Thus, a well-
defined analysis model which can be derived from provenance information is helpful for a similar
case of big data processing;
NOTE 3 – In a data processing system, data processing means a course of events occurring according
to an intended purpose or effect.
– automating big data analysis process: Provenance gives a context in which to use the data, and
allows automated validation and revision of derived data when the base data is updated;
– audit and protect intellectual property: Provenance gives a lineage of data, and it allows auditing
and tracing of digital rights on mash-up data.
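The automation and audit items above can be sketched with a small lineage graph. This is only an illustration in Python, not a method defined by the Recommendation. The data set names in `DERIVED_FROM` and the functions `downstream` and `upstream` are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each derived data set maps to the data sets it was built from.
DERIVED_FROM = {
    "cleaned_calls": ["raw_calls"],
    "usage_report": ["cleaned_calls", "subscriber_info"],
    "churn_model": ["usage_report"],
}


def downstream(base, derived_from):
    """Return every data set derived, directly or indirectly, from base."""
    # Invert the lineage so each source points at the data sets built from it.
    children = {}
    for target, sources in derived_from.items():
        for source in sources:
            children.setdefault(source, []).append(target)

    stale, queue = set(), deque([base])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


def upstream(target, derived_from):
    """Return every data set that target was derived from, for auditing."""
    seen, queue = set(), deque([target])
    while queue:
        for source in derived_from.get(queue.popleft(), []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen
```

When a base data set is updated, `downstream` lists the derived data that may need validation or revision. `upstream` lists the sources whose digital rights may apply to a mash-up data set.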
7 Overview of big data provenance
This clause presents an overview of big data provenance. It describes data provenance in a big data
ecosystem, a conceptual model, provenance operations, and logical components for big data provenance.
7.1 Data provenance in big data ecosystem
According to [ITU-T Y.3600], a big data service provider (BDSP) supports data provenance as a part of data
management by managing information about the origin and generation process methods of data, including
the party or parties involved in the generation, introduction and/or mash-up processes for data.
Figure 7-1 – Using data provenance in big data ecosystem
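The provenance information described in this clause (origin, generation process methods, and the parties involved) could be held in a record such as the one sketched below in Python. The class and field names are assumptions made for illustration and are not defined by [ITU-T Y.3600].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessStep:
    method: str  # generation process method, e.g. "generation" or "mash-up"
    party: str   # party involved in this step
    inputs: List[str] = field(default_factory=list)  # identifiers of input data


@dataclass
class ProvenanceRecord:
    data_id: str
    origin: str
    steps: List[ProcessStep] = field(default_factory=list)

    def parties(self):
        """Return the parties involved, in first-appearance order, without duplicates."""
        seen = []
        for step in self.steps:
            if step.party not in seen:
                seen.append(step.party)
        return seen


# Hypothetical usage: one generation step followed by one mash-up step.
record = ProvenanceRecord(data_id="traffic_summary", origin="provider_a")
record.steps.append(ProcessStep("generation", "provider_a"))
record.steps.append(ProcessStep("mash-up", "bdsp", inputs=["traffic_raw", "map_tiles"]))
```

Keeping the steps in an ordered list preserves the sequence of generation, introduction and mash-up processes, which a flat set of attributes would lose.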
104 Static data – Data provenance, data formats and trust