Page 742 - Shaping smarter and more sustainable cities - Striving for sustainable development goals
Table 8 – Anonymized medical record
Birth   Gender   ID    Problem
1970    female   121   Cold               Alice
1970    female   121   Cold
1970    female   121   Cold
198*    human    12*   poor circulation
198*    human    12*   poor circulation
198*    human    12*   Headache           Bob
198*    human    12*   Headache
Demand for the secondary use of data such as medical records is increasing, because such use may
enable the estimation of infection routes. However, medical data frequently includes sensitive and
private information. Medical data providers should therefore define the anonymization methods and
the related privacy protection levels when publishing the data. In addition, when a data provider
permits several methods of anonymization, consumers of the data must select a method that matches
their requirements. Consumers of the anonymized data should also avoid obtaining private data that
exceeds their requirements, even when the data provider permits a lower protection level and would
therefore provide that private data. The anonymization data infrastructure should thus provide a
way to define anonymization methods and protection levels that fulfil the requirements of both data
providers and data consumers.
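The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protection level is modelled as a k value (larger k means stronger protection), and the function name `select_protection_level` and both of its parameters are hypothetical, not part of any specified protocol.

```python
def select_protection_level(permitted_ks, max_acceptable_k):
    """Pick the most protective permitted level the consumer can still use.

    permitted_ks: k values the provider allows to be published (assumed model).
    max_acceptable_k: coarsest anonymization the consumer's analysis tolerates.
    Returns None when no permitted level meets the consumer's requirement.
    """
    usable = [k for k in permitted_ks if k <= max_acceptable_k]
    return max(usable) if usable else None


# The provider permits k=2, k=3 and k=5; the consumer can work with data up to k=4.
# The consumer takes k=3 rather than k=2, even though k=2 is permitted.
print(select_protection_level([2, 3, 5], 4))  # 3
print(select_protection_level([2, 3, 5], 1))  # None
```

Choosing the largest usable k reflects the requirement that consumers should not obtain more private data than they need, even when a lower protection level is on offer.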
To meet these requirements, data publishing with anonymization is required. However, PPDP
utilizing anonymization has numerous problems. One problem is that no protocols or formats
currently exist to enable secure data publishing, as described in the introduction. Another is the
loss of anonymity caused by publishing the same data multiple times. Table 6 is an example of a
medical record data table. Table 7 is an anonymized data table derived from Table 6, and Table 8 is
another anonymized data table derived from Table 6. In this case, anyone who can obtain both the
anonymized data of Table 7 and the k=3 data of Table 8 can recover the k=1 data, even when the data
provider did not permit the publishing of k=1 data. This results in a leak of private information.
One cause of this problem is that previously published data is not referenced in the anonymization
process; as a result, the coherence between the Table 7 data and the k=3 data is severed. Table 9 is
another example of a k=3 data table. Utilizing Table 9 instead of Table 8 avoids the problem
described above. Table 9 was generated by anonymizing Table 7 instead of Table 6, so that masking
and generalization remain coherent. This anonymizing process can prevent further leaks of private
information.
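The effect of publishing the same records twice can be checked with a short Python sketch. It rests on assumptions that are not in the report: Birth, Gender and ID are treated as the quasi-identifiers, both releases are assumed to list the same records in the same order, and the three six-record releases (`first`, `independent`, `derived`) are hypothetical stand-ins, not Tables 6, 7 or 9. Only `TABLE_8` copies data printed above.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing all quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def combined_k(earlier, later, quasi_identifiers):
    """k seen by someone holding both releases (same records, same order)."""
    groups = Counter(
        (tuple(a[q] for q in quasi_identifiers), tuple(b[q] for q in quasi_identifiers))
        for a, b in zip(earlier, later)
    )
    return min(groups.values())


# Table 8 as printed above.
TABLE_8 = (
    [{"birth": "1970", "gender": "female", "id": "121", "problem": "Cold"}] * 3
    + [{"birth": "198*", "gender": "human", "id": "12*", "problem": "poor circulation"}] * 2
    + [{"birth": "198*", "gender": "human", "id": "12*", "problem": "Headache"}] * 2
)
print(k_anonymity(TABLE_8, ("birth", "gender", "id")))  # 3


def release(birth_ranges):
    """Build a hypothetical release that carries only a generalized birth range."""
    return [{"birth": b} for b in birth_ranges]


QI = ("birth",)
first = release(["1970-1979"] * 2 + ["1980-1989"] * 2 + ["1990-1999"] * 2)
# Anonymized again from the raw data, ignoring `first`: the groups cut across it.
independent = release(["1970-1984"] * 3 + ["1985-1999"] * 3)
# Anonymized from `first`: groups of `first` are only ever merged, never split.
derived = release(["1970-1999"] * 6)

print(k_anonymity(first, QI), k_anonymity(independent, QI))  # 2 3
print(combined_k(first, independent, QI))  # 1
print(combined_k(first, derived, QI))      # 2
```

In this sketch each release is acceptable on its own, but holding `first` and `independent` together isolates single records, whereas a release derived from `first` reveals nothing beyond what `first` already did.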
To address these problems, a data‐publishing infrastructure is presented as a solution. It manages
previously published data so that anonymization does not cause a loss of anonymity, and it provides
safe secondary use and anonymization. Encryption technology, by comparison, utilizes public key
infrastructure (PKI), in which a certificate authority serves as an authorized organization for
certifying the public keys of servers on the Internet. For this discussion, the anonymization
technology and this infrastructure can be associated with the encryption technology and PKI,
respectively.
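One way to picture an infrastructure that manages previously published data is a registry that always anonymizes from its most recent release. The Python sketch below is a hypothetical illustration only: the class `PublicationRegistry`, its `publish` method and the two masking functions are invented for this example and do not describe the interfaces of the infrastructure itself.

```python
class PublicationRegistry:
    """Hypothetical store that derives each release from the previous one."""

    def __init__(self, original_rows):
        self._original = [dict(row) for row in original_rows]  # never published as-is
        self._releases = []                                    # oldest release first

    def publish(self, generalize):
        # Start from the latest release if one exists, otherwise from the raw data.
        base = self._releases[-1] if self._releases else self._original
        new_release = [generalize(dict(row)) for row in base]
        self._releases.append(new_release)
        return new_release


def mask_last_digit(row):
    row["birth"] = row["birth"][:3] + "*"
    return row


def mask_last_two_digits(row):
    row["birth"] = row["birth"][:2] + "**"
    return row


registry = PublicationRegistry([{"birth": "1972"}, {"birth": "1975"}, {"birth": "1981"}])
first = registry.publish(mask_last_digit)        # built from the raw rows
second = registry.publish(mask_last_two_digits)  # built from `first`, not the raw rows
print([row["birth"] for row in first])   # ['197*', '197*', '198*']
print([row["birth"] for row in second])  # ['19**', '19**', '19**']
```

Because every new release is computed only from the values of the previous one, rows that were indistinguishable in an earlier release stay indistinguishable in later ones, which is the coherence property discussed above.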
732 ITU‐T's Technical Reports and Specifications