Page 17 - Big data - Concept and application for telecommunications


            The keywords "is recommended" indicate a requirement which is recommended but which is not absolutely
            required. Thus this requirement need not be present to claim conformance.

            The keywords "can optionally" indicate an optional requirement which is permissible, without implying any
            sense of being recommended. This term is not intended to imply that the vendor's implementation must
            provide the option and the feature can be optionally enabled by the network operator/service provider.
            Rather, it means the vendor may optionally provide the feature and still claim conformance with the
            specification.

            In the body of this document and its annexes, the words "shall", "shall not", "should" and "may" sometimes
            appear, in which case they are to be interpreted, respectively, as "is required to", "is prohibited from",
            "is recommended" and "can optionally". The appearance of such phrases or keywords in an appendix or in
            material explicitly marked as informative is to be interpreted as having no normative intent.


            6       Overview of big data

            6.1     Introduction to big data

            With the rapid development of information and communications technology (ICT), Internet technologies and
            services, huge amounts of data are generated, transmitted and stored at an explosive rate of growth. Data
            are generated by many sources: not only by sensors, cameras and network devices, but also by web pages,
            email systems, social networks and many others. Datasets are becoming so large and so complex, or are
            arriving so fast, that traditional data processing methods and tools are inadequate. Efficient analysis
            of data within tolerable elapsed times becomes very challenging. The paradigm being developed to
            resolve these issues is called big data.
            For the purpose of this Recommendation, it is understood that, within the big data ecosystem, data types
            include structured, semi-structured and unstructured data. Structured data are often stored in databases,
            which may be organized in different models, such as relational models, document models, key-value models,
            graph models, etc. Semi-structured data do not conform to the formal structure of data models, but
            contain tags or markers to identify data. Unstructured data do not have a pre-defined data model and are
            not organized in any defined manner. Within all data types, data can exist in various formats, such as
            text, spreadsheet, video, audio, image, map, etc.
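            As an informative illustration only, the three data types described above can be sketched in Python.
            The sample records, the field names (subscriber_id, duration_s, device) and the helper total_duration
            are hypothetical and are not defined by this Recommendation.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (hypothetical call records).
structured_csv = "subscriber_id,duration_s\n1001,42\n1002,310\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: no fixed schema, but keys (tags) identify each value.
semi_structured = json.loads('{"subscriber_id": 1001, "device": {"os": "android"}}')

# Unstructured: free text with no pre-defined data model.
unstructured = "Customer called to report dropped calls near the station."


def total_duration(records):
    """Sum the duration_s column; possible only because the schema is known."""
    return sum(int(record["duration_s"]) for record in records)
```

            The structured rows can be aggregated directly by column name, the semi-structured record has to be
            navigated through its keys, and the unstructured text would need further processing before any
            comparable analysis.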
            Big data are successfully used in many fields where traditional methods and tools have become inefficient
            and where data processing is characterized by scale (volume), diversity (variety), high speed (velocity)
            and possibly other criteria such as credibility (veracity) or business value. These characteristics,
            usually called the Vs, can be explained as follows:
            –       Volume: refers to the amount of data collected, stored, analysed and visualized, which big data
                    technologies need to handle;
            –       Variety: refers to the different data types and data formats that are processed by big data technologies;
            –       Velocity: refers to both how fast the data are being collected and how fast the data are processed by
                    big data technologies to deliver expected results.

            NOTE – Additionally, veracity refers to the uncertainty of the data, and value refers to the business
            results gained from new information obtained using big data technologies. Other Vs can be considered as well.
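
            As an informative illustration only, the two aspects of velocity named above (how fast data are collected
            and how fast they are processed) can be related by a simple backlog calculation. The function name
            backlog_after and the assumption of constant rates are hypothetical simplifications, not part of this
            Recommendation.

```python
def backlog_after(seconds, arrival_rate, processing_rate, initial_backlog=0):
    """Return the number of records still waiting after `seconds`.

    Assumes constant rates, both expressed in records per second.
    """
    growth = (arrival_rate - processing_rate) * seconds
    # A backlog cannot be negative: spare processing capacity is simply idle.
    return max(0, initial_backlog + growth)


# Collection outpaces processing by 1000 records/s for one minute.
growing = backlog_after(60, arrival_rate=5000, processing_rate=4000)

# Processing keeps up, so nothing accumulates.
stable = backlog_after(60, arrival_rate=4000, processing_rate=5000)
```

            Under these assumptions, the unprocessed backlog grows without bound whenever data are collected faster
            than they are processed, and expected results can then no longer be delivered in time.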

            Given the characteristics described by the Vs above, big data technologies and services allow many new
            challenges to be resolved and also create more opportunities than ever before:

            –       Heterogeneity and incompleteness: Data processed using big data technologies can be missing some
                    attributes or can contain noise introduced during data transmission. Even after data cleaning and
                    error correction, some incompleteness and some errors in data are likely to remain. These
                    challenges can be managed during data analysis [b-CRA-BDWP].
            –       Scale: Processing of large and rapidly increasing volumes of data is a challenging task. Previously,
                    the data scale challenge was mitigated by the evolution of processing and storage resources.
                    Nowadays, however, data volumes are scaling faster than resources can evolve.

