Page 723 - Cloud computing: From paradigm to operation

XaaS                                                     3


            7.1.1   Data collection functional component

            The data collection functional component performs data collection based on various data collection
            configurations. The data collection functional component provides:
            –       setting up various data collection configurations, such as data amount, traffic volume, collection
                    period and collection method;

                    NOTE 1 – Examples of collection methods include crawling, rich site summary collecting and
                    log/sensor collecting.
                    NOTE 2 – Rich site summary is used to aggregate syndicated web content, such as online newspapers,
                    blogs, podcasts and video blogs, in one location.
                    NOTE 3 – Crawling is used to gather data from the world wide web, especially for web indexing.
                    NOTE 4 – Log collecting is used to collect data from log files generated by web servers.
            –       gathering data based on the established data collection configurations. The collected data is
                    stored in appropriate storage according to the data type.
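As an illustration only, configuration-driven collection for the log collecting method might be sketched as follows. The `CollectionConfig` fields, the `collect_logs` function and the rule that routes `key=value` lines to "structured" storage are hypothetical choices made for this sketch; they are not defined by this Recommendation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CollectionConfig:
    """Hypothetical data collection configuration (field names are illustrative)."""
    method: str            # collection method, e.g. "log", "crawl", "rss"
    max_records: int       # cap on the data amount collected per period
    period_seconds: int    # collection period (not enforced in this sketch)


def collect_logs(lines: Iterable[str], config: CollectionConfig) -> dict[str, list[str]]:
    """Gather up to config.max_records non-empty log lines and store each one
    in a bucket chosen by a crude, assumed data-type rule."""
    if config.method != "log":
        raise ValueError(f"unsupported collection method: {config.method}")
    storage: dict[str, list[str]] = {"structured": [], "unstructured": []}
    collected = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if collected >= config.max_records:
            break  # configured data amount reached
        # Assumption: lines containing key=value pairs are treated as structured data.
        bucket = "structured" if "=" in line else "unstructured"
        storage[bucket].append(line)
        collected += 1
    return storage
```

For example, with `max_records=2`, collecting `["status=200 path=/", "", "free text line", "status=404"]` skips the empty line, stores one record in each bucket and stops before the fourth line.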

            7.1.2   Data visualization functional component

            The data visualization functional component makes data more intuitive and easier to understand for big data
            service users (e.g., CSC: big data service user (BDSU)) by using various data visualization tools. It also supports
            multiple interactive reporting tools for users.
            This functional component provides:
            –       presenting data in multiple styles, such as statistical graphics, forms, diagrams, charts and reports;
            –       reporting tools that can be configured by the CSC:BDSU.
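A minimal sketch of a user-configurable reporting tool follows, assuming plain-text output and two hypothetical styles ("table" and "bar"). `ReportConfig` and `render_report` are illustrative names only; a real deployment would more likely rely on a dedicated visualization library.

```python
from dataclasses import dataclass


@dataclass
class ReportConfig:
    """Hypothetical report settings that a CSC:BDSU could adjust."""
    title: str
    style: str = "table"   # "table" or "bar" (assumed styles)
    bar_width: int = 20    # characters used for the largest bar


def render_report(data: dict[str, float], config: ReportConfig) -> str:
    """Render a label -> value mapping as a plain-text table or bar chart."""
    lines = [config.title]
    if config.style == "table":
        for label, value in data.items():
            lines.append(f"{label:<10}{value:>10.1f}")
    elif config.style == "bar":
        # Scale bars relative to the largest value; avoid division by zero.
        peak = max(data.values(), default=0) or 1
        for label, value in data.items():
            bar = "#" * round(config.bar_width * value / peak)
            lines.append(f"{label:<10}{bar} {value:g}")
    else:
        raise ValueError(f"unknown style: {config.style}")
    return "\n".join(lines)
```

The same data can thus be presented in different styles by changing only the configuration object, for example `render_report({"a": 10, "b": 5}, ReportConfig("Traffic", style="bar", bar_width=4))`.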

            7.1.3   Data pre-processing functional component

            The data pre-processing functional component is responsible for preparing data for further processing, such
            as data analysis. This functional component provides support for data cleaning, data integration, data
            transformation, data discretization and data extraction to improve data analysis efficiency.
            This functional component provides:
            –       cleaning data, which includes smoothing noisy data and identifying and removing outliers to
                    improve data quality;
                    NOTE – An outlier is an abnormal data point in a dataset. If outliers are not removed, data quality
                    may be degraded.
            –       combining and integrating data from multiple sources to remove duplicated and redundant data;
            –       transforming data collected in different formats and types;
            –       converting continuous data into discrete interval data;
            –       extracting representative features from a large number of data features for data analysis.
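The pre-processing steps listed above can be sketched with the Python standard library. The specific techniques are assumptions chosen for illustration, not methods mandated by this Recommendation: Tukey's interquartile-range fences for outlier removal, exact-match de-duplication for integration, equal-width binning for discretization and a variance filter for feature extraction.

```python
import statistics


def remove_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Cleaning: drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def integrate(*sources: list[dict]) -> list[dict]:
    """Integration: merge records from several sources, dropping exact duplicates.
    Assumes record values are hashable."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


def discretize(values: list[float], bins: int) -> list[int]:
    """Discretization: map continuous values to equal-width interval indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # all-equal input falls into bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(rows: list[dict[str, float]], top_k: int) -> list[str]:
    """Extraction: keep the top_k features with the highest population variance."""
    names = rows[0].keys()
    variances = {n: statistics.pvariance([r[n] for r in rows]) for n in names}
    return sorted(variances, key=variances.get, reverse=True)[:top_k]
```

For example, `remove_outliers([10, 11, 12, 13, 14, 100])` drops only the value 100, and `discretize([0, 5, 10], 2)` returns `[0, 1, 1]`.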

            7.1.4   Data analysis functional component

            The data analysis functional component is responsible for extracting useful information or valuable insights
            from big data. This functional component provides support for multiple data analysis methods and also
            supports customization of specific analysis methods.
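One way to support both built-in and customized analysis methods is a name-based registry, sketched below under stated assumptions: the `register` decorator, the `ANALYSIS_METHODS` table and the `analyze` entry point are hypothetical, and the "clustering" entry is a plain one-dimensional k-means used only as an example of a registered method.

```python
from typing import Callable, Sequence

# Hypothetical registry: analysis method name -> callable taking a numeric series.
ANALYSIS_METHODS: dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that registers an analysis method under the given name."""
    def wrap(fn):
        if name in ANALYSIS_METHODS:
            raise KeyError(f"method already registered: {name}")
        ANALYSIS_METHODS[name] = fn
        return fn
    return wrap


@register("clustering")
def kmeans_1d(data: Sequence[float], k: int = 2, iters: int = 20) -> list[int]:
    """Plain k-means on 1-D data; returns a cluster index for each point."""
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= len(data)")
    # Initialise centres from evenly spaced points of the sorted data.
    centers = sorted(data)[:: max(1, len(data) // k)][:k]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: nearest centre for each point.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


@register("mean")
def mean_value(data: Sequence[float]) -> float:
    """A user-customized method: arithmetic mean of the series."""
    return sum(data) / len(data)


def analyze(method: str, data: Sequence[float], **params):
    """Look up a registered analysis method by name and run it."""
    try:
        fn = ANALYSIS_METHODS[method]
    except KeyError:
        raise ValueError(f"unknown analysis method: {method}") from None
    return fn(data, **params)
```

Under these assumptions, `analyze("clustering", [1.0, 1.2, 0.9, 8.0, 8.3], k=2)` returns `[0, 0, 0, 1, 1]`, and the customized "mean" method is invoked through the same entry point.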

            This functional component provides:
            –       registration of the data analysis methods used for data analysis. Typical data analysis methods
                    include classification analysis, clustering analysis, association analysis, regression analysis,
                    customized analysis, etc.;
                    NOTE 1 – Classification analysis: This supports decision trees, support vector machines, neural
                    networks and other algorithms to identify the category to which data belongs.
                    NOTE 2 – Clustering analysis: This supports k-means, k-center point, overlapping clustering, fuzzy
                    clustering, etc., to group data into different classes or clusters according to their similarity.

