Page 723 - Cloud computing: From paradigm to operation
7.1.1 Data collection functional component
The data collection functional component performs data collection based on various data collection
configurations. The data collection functional component provides:
– setting up various data collection configurations, such as data amount, traffic volume, collection
period and collection method;
NOTE 1 – Examples of collection methods include crawling, rich site summary collecting and log/sensor
collecting.
NOTE 2 – Rich site summary is used to aggregate syndicated web content, such as online newspapers,
blogs, podcasts and video blogs in one location.
NOTE 3 – Crawling is used to gather data from the world wide web, especially for web indexing.
NOTE 4 – Log collecting is used to collect data from log files generated by web servers.
– gathering data based on the established data collection configurations. The collected data is stored
in appropriate storage according to the data type.
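
As an illustration only, the two capabilities above (setting up a collection configuration and gathering data according to it) can be sketched as follows. The names CollectionConfig, collect_log_lines and store_by_type, and the configuration fields, are hypothetical and are not defined by this text; log collecting is used as the example method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CollectionConfig:
    """Hypothetical collection configuration; field names are illustrative."""
    method: str          # collection method, e.g. "log", "crawl", "rss"
    max_records: int     # cap on the data amount per collection run
    period_seconds: int  # collection period


def collect_log_lines(source: Iterable[str], config: CollectionConfig) -> List[str]:
    """Gather up to config.max_records non-empty lines from a log source."""
    collected: List[str] = []
    for line in source:
        if len(collected) >= config.max_records:
            break
        line = line.strip()
        if line:
            collected.append(line)
    return collected


def store_by_type(records: List[str], stores: Dict[str, List[str]], data_type: str) -> None:
    """Append records to the store selected by data type (assumed in-memory stores)."""
    stores.setdefault(data_type, []).extend(records)
```

In this sketch the store is a plain dictionary keyed by data type; a real deployment would presumably route each data type to a suitable storage back end.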
7.1.2 Data visualization functional component
The data visualization functional component makes data more intuitive and easier to understand for big data
service users (e.g., CSC: big data service user (BDSU)) by using various data visualization tools. It also supports
multiple user interactive reporting tools.
This functional component provides:
– presenting data with multiple styles such as statistical graphics, forms, diagrams, charts and reports;
– reporting tools that can be configured by CSC:BDSU.
7.1.3 Data pre-processing functional component
The data pre-processing functional component is responsible for preparing data for further processing such
as data analysis. This functional component provides support for data cleaning, data integration, data
transformation, data discretization and data extraction to improve data analysis efficiency.
This functional component provides:
– cleaning data, which includes smoothing noisy data and identifying and removing
outliers, to improve data quality;
NOTE – An outlier is an abnormal data value in a dataset. If outliers are not removed, data quality may be
degraded.
– combining and integrating data from multiple sources to remove duplicated and redundant data;
– transforming the data collected in different formats and types;
– converting continuous data into discrete interval data;
– extracting the representative features from a large number of data features for data analysis.
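
A minimal sketch of three of the pre-processing steps listed above (outlier removal, integration with duplicate removal, and discretization) is given below. The function names, the z-score criterion and the equal-width binning are assumptions made for illustration; this text does not prescribe any particular technique.

```python
from statistics import mean, stdev
from typing import List


def remove_outliers(values: List[float], z_max: float = 2.0) -> List[float]:
    """Drop values more than z_max sample standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_max]


def merge_sources(*sources: List[float]) -> List[float]:
    """Combine several sources, keeping only the first occurrence of each value."""
    seen = set()
    merged: List[float] = []
    for source in sources:
        for v in source:
            if v not in seen:
                seen.add(v)
                merged.append(v)
    return merged


def discretize(values: List[float], n_bins: int) -> List[int]:
    """Map continuous values to equal-width interval indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

The z-score threshold and the number of bins are tuning choices; other outlier criteria or binning schemes would fit the same description.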
7.1.4 Data analysis functional component
The data analysis functional component is responsible for extracting useful information or valuable insights
from big data. This functional component provides support for multiple data analysis methods. This
functional component also supports customization of specific analysis methods.
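
One possible way to support multiple analysis methods together with customized ones is a simple registry that maps a method name to a callable. The sketch below is an assumption for illustration: register_method, run_analysis and the toy threshold_classifier are hypothetical names that do not come from this text.

```python
from typing import Callable, Dict, List, Sequence

# A registered analysis method takes a sequence of numbers and returns a result.
AnalysisMethod = Callable[[Sequence[float]], object]

_registry: Dict[str, AnalysisMethod] = {}


def register_method(name: str, method: AnalysisMethod) -> None:
    """Register an analysis method under a unique name."""
    if name in _registry:
        raise ValueError(f"analysis method already registered: {name}")
    _registry[name] = method


def run_analysis(name: str, data: Sequence[float]) -> object:
    """Look up a registered method by name and apply it to the data."""
    return _registry[name](data)


def threshold_classifier(data: Sequence[float]) -> List[str]:
    """Toy customized method: label each value 'high' or 'low' (cut-off 50 is arbitrary)."""
    return ["high" if v >= 50 else "low" for v in data]


register_method("threshold_classification", threshold_classifier)
```

A customized method is registered in the same way as a built-in one, so callers only need the method name to invoke it.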
This functional component provides:
– registration of data analysis methods which are used for data analysis. Typical data analysis methods
are classification analysis, clustering analysis, association analysis, regression analysis, customized
analysis, etc.;
NOTE 1 – Classification analysis: This supports decision tree, support vector machine, neural network and
other algorithms, to identify which of a set of categories the data belongs to.
NOTE 2 – Clustering analysis: This supports k-means, k-center point, overlapping clustering, fuzzy clustering,
etc., to group data into different classes or clusters according to their similarity.