Page 52 - Big data - Concept and application for telecommunications

            7.1.1   Data collection functional component

            The  data  collection  functional  component  performs  data  collection  based  on  various  data  collection
            configurations. The data collection functional component provides:
            –       setting up various data collection configurations, such as data amount, traffic volume, collection
                    period, collection method;

                    NOTE 1 – Examples of collection methods include crawling, rich site summary collecting and
                    log/sensor collecting.
                    NOTE  2  –  Rich  site  summary  is  used  to  aggregate  syndicated  web  content,  such  as  online
                    newspapers,  blogs, podcasts and video blogs in one location.
                    NOTE 3 – Crawling is used to gather data from the World Wide Web, especially for web indexing.
                    NOTE 4 – Log collecting is used to collect data from log files generated by web servers.

            –       gathering data based on the established data collection configurations. The collected data is
                    stored in appropriate storage according to its data type.
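
            As an illustration only, the two capabilities above (setting up a collection configuration, then
            gathering data according to it and storing it by data type) could be sketched as follows. The names
            CollectionConfig, collect_log_lines, COLLECTORS and run_collection, and all field names, are
            hypothetical and are not defined by this document. Only a log collector is shown, and an in-memory
            dictionary stands in for the storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CollectionConfig:
    """Hypothetical collection configuration; field names are illustrative."""
    method: str          # collection method, e.g. "log"
    period_seconds: int  # collection period (not scheduled in this sketch)
    max_records: int     # cap on the data amount gathered per run
    data_type: str       # used to choose where collected data is stored


def collect_log_lines(source: Iterable[str], config: CollectionConfig) -> List[str]:
    """Gather non-empty log lines, up to the configured data amount."""
    collected: List[str] = []
    for line in source:
        if len(collected) >= config.max_records:
            break
        line = line.strip()
        if line:
            collected.append(line)
    return collected


# Dispatch table from a configured collection method to its collector.
# Crawling or rich site summary collectors would be added here in the same way.
COLLECTORS: Dict[str, Callable[[Iterable[str], CollectionConfig], List[str]]] = {
    "log": collect_log_lines,
}


def run_collection(source: Iterable[str], config: CollectionConfig,
                   storage: Dict[str, List[str]]) -> int:
    """Collect according to the configuration and store records by data type."""
    collector = COLLECTORS.get(config.method)
    if collector is None:
        raise ValueError(f"unsupported collection method: {config.method}")
    records = collector(source, config)
    storage.setdefault(config.data_type, []).extend(records)
    return len(records)
```

            In this sketch the storage is keyed by the configured data type, which mirrors the requirement that
            collected data is stored according to its type; a real deployment would presumably route to
            different storage systems instead.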

            7.1.2   Data visualization functional component
            The data visualization functional component makes data more intuitive and easier to understand for big data
            service users (e.g., CSC: big data service user (BDSU)) by using various data visualization tools. It also supports
            multiple user interactive reporting tools.
            This functional component provides:
            –       presenting data with multiple styles such as statistical graphics, forms, diagrams, charts and reports;
            –       reporting tools that can be configured by CSC:BDSU.

            7.1.3   Data pre-processing functional component
            The data pre-processing functional component is responsible for preparing data for further processing such
            as  data  analysis.  This  functional  component  provides  support  for  data  cleaning,  data  integration,  data
            transformation, data discretization and data extraction to improve data analysis efficiency.

            This functional component provides:
            –       cleaning data, which includes smoothing noisy data and identifying and removing outliers to
                    improve data quality;
                    NOTE – An outlier is an abnormal value in a dataset. If outliers are not removed, data quality
                    may be degraded.
            –       combining and integrating data from multiple sources to remove duplicated and redundant data;
            –       transforming the data collected in different formats and types;
            –       converting continuous data into discrete interval data;
            –       extracting the representative features from a large number of data features for data analysis.
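
            Two of the operations listed above, outlier removal and conversion of continuous data into discrete
            intervals, can be sketched as below. This document does not prescribe any particular technique: the
            interquartile-range rule and equal-width binning used here are assumptions chosen for brevity, and
            the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def remove_outliers_iqr(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def discretize_equal_width(values: Sequence[float], bins: int) -> List[int]:
    """Map each continuous value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they fall into a single interval.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value is clamped into the last interval.
    return [min(int((v - lo) / width), bins - 1) for v in values]


if __name__ == "__main__":
    raw = [10, 12, 11, 13, 12, 11, 100]
    cleaned = remove_outliers_iqr(raw)
    print(cleaned)
    print(discretize_equal_width(cleaned, bins=3))
```

            Cleaning is applied before discretization here because a single extreme value would otherwise
            stretch the interval width and push most values into the first interval.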

            7.1.4   Data analysis functional component
            The data analysis functional component is responsible for extracting useful information or valuable insights
            from  big  data.  This  functional  component  provides  support  for  multiple  data  analysis  methods.  This
            functional component also supports customization of specific analysis methods.
            This functional component provides:
            –       registration of the data analysis methods used for data analysis. Typical data analysis methods
                    are classification analysis, clustering analysis, association analysis, regression analysis,
                    customized analysis, etc.;
                    NOTE 1 – Classification analysis: This supports decision tree, support vector machine, neural
                    network and other algorithms to identify which of a set of categories the data belongs to.




