About the tricky handling of anonymized data, and an amazingly simple solution.
Personal data is a sensitive thing. We all want it to be used only for the purpose for which we voluntarily provided it. The legal framework for this can be found, for example, in the German Federal Data Protection Act and the European General Data Protection Regulation (GDPR).
For many companies, this results in an inner conflict. After all, those who meticulously comply with the legal requirements can get themselves into serious trouble precisely because of this: often a requirement can only be met by using exactly the data whose protection is at stake.
Here are a few examples:
… for sales and marketing
A customer objects to a company using his data, demands that all personal data be deleted and no longer wishes to be contacted. The company must comply with this request. However, a literal implementation means that the company cannot even store the fact that the customer no longer wishes to be contacted. It is therefore possible that the customer will receive unsolicited mail in the future. A countermeasure would be to add the customer data in anonymized form to a blacklist and to check future mailings against it before they are sent. However, with today’s standard procedures, this check is not error-tolerant. Even minor deviations in the spelling of the name or address mean that the customer is not found on the blacklist.
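The following minimal sketch illustrates why such a blacklist is not error-tolerant. It assumes that "anonymized form" means a keyed hash of the normalized record, which is one common approach but not necessarily the one used in any particular product. The key, the function name and the sample record are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept inside the company.
SECRET_KEY = b"example-key"

def anonymize(name: str, address: str) -> str:
    """Normalize a record and replace it with a keyed hash (HMAC-SHA256)."""
    normalized = f"{name.strip().lower()}|{address.strip().lower()}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# The blacklist stores only hashes, never the plain-text customer data.
blacklist = {anonymize("Erika Mustermann", "Hauptstrasse 12, Stuttgart")}

# Identical spelling: the customer is found.
exact_hit = anonymize("Erika Mustermann", "Hauptstrasse 12, Stuttgart") in blacklist

# One missing letter: the hash is completely different, so the check fails.
typo_hit = anonymize("Erika Musterman", "Hauptstrasse 12, Stuttgart") in blacklist

print(exact_hit, typo_hit)  # prints: True False
```

Because a cryptographic hash changes completely when a single character changes, similar spellings cannot be recognized as similar once the data has been hashed this way.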
… for PEP and terrorism checks
Legal regulations require the regular checking of creditors and debtors against so-called terror and sanctions lists, as well as the identification of politically exposed persons (PEP). For small and medium-sized enterprises (SMEs), this obligation poses a problem because it is not worthwhile to procure the corresponding software. Instead, such companies can use PEP and sanctions list checks as a service.
For data protection reasons, however, many companies are reluctant to hand over their customer data. The alternative of releasing only anonymized data is not very practical because anonymization usually rules out error-tolerant checks. But error tolerance is precisely what matters for sanctions lists, because they often contain typing and transmission errors.
… from research
Let’s assume that data on the same person are available in different data sets, e.g. in results of medical studies. Linking these data sets on a person-by-person basis would enable further insights, but the legal obstacles for this are very high, especially in Germany. A procedure accepted under data protection law for such cases uses a data trustee who identifies which records in the different data sets belong to the same person, but passes them on to the data user only in anonymized or pseudonymized form. In this case, the data providers must extend their trust to the data trustee. Even further protection of personal data is achieved if it is handed over to the data trustee only in anonymized form. The extension of the area of trust is not necessary in this case. However, the data trustee would then not be able to perform error-tolerant matching with the procedures commonly used today. Even minor typing errors would prevent the desired findings from being obtained when comparing the various data sets.
All examples have in common that error-tolerant data matching would be helpful, but this is difficult due to the required anonymization.
TOLERANT Software offers a procedure for error-tolerant matching of anonymized data – even if they contain minor errors. Persons are thus found despite anonymization even if there are deviations in the spelling.
We have integrated a procedure described in the literature under the keyword “Privacy preserving record linkage” into our product TOLERANT Match, where it is now available alongside other matching procedures. In the future, users of TOLERANT Match will be able to easily combine the anonymized search with the non-anonymized search in a single matching process, e.g. anonymized search for personal data and non-anonymized search for other, less sensitive information.
In addition to the integration in TOLERANT Match, an independent tool was developed for the anonymization of data. This allows a party A (data provider) to perform the anonymization and to pass the thus anonymized data to a party B for further processing.
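To give an idea of how error-tolerant matching on anonymized data can work at all, here is a minimal sketch of one technique frequently described in the privacy-preserving record linkage literature: character bigrams are hashed into a Bloom filter, and two filters are compared with the Dice coefficient. This post does not state which procedure TOLERANT Match actually implements, so the sketch is only an illustration. The filter length, the number of hash functions, the shared key, the function names and the sample names are all assumptions.

```python
import hashlib
import hmac

FILTER_BITS = 1000            # assumed Bloom filter length
NUM_HASHES = 10               # assumed number of hash functions per bigram
SECRET_KEY = b"shared-secret" # hypothetical key known only to the data providers

def bigrams(text: str) -> set[str]:
    """Split a normalized, padded string into its character bigrams."""
    padded = f"_{text.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def encode(text: str) -> frozenset[int]:
    """Party A: return the Bloom filter as the set of bit positions that are set."""
    positions = set()
    for gram in bigrams(text):
        for k in range(NUM_HASHES):
            digest = hmac.new(SECRET_KEY, f"{k}:{gram}".encode("utf-8"),
                              hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return frozenset(positions)

def dice(a: frozenset[int], b: frozenset[int]) -> float:
    """Party B: Dice coefficient of two filters (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Party A anonymizes its records and hands over only the encoded filters.
reference = encode("Erika Mustermann")
with_typo = encode("Erika Musterman")
unrelated = encode("Hans Schmidt")

# Party B compares filters without ever seeing the names themselves.
print(f"typo:      {dice(reference, with_typo):.2f}")
print(f"unrelated: {dice(reference, unrelated):.2f}")
```

Similar spellings share most of their bigrams and therefore most of their set bits, so the misspelled name scores close to 1 while the unrelated name scores much lower. Party B can then treat scores above a chosen threshold as matches. Whether such an encoding is sufficiently secure for a given use case depends on the parameters and has to be assessed separately.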
All the tasks described above can be realized with the new feature of TOLERANT Match in such a way that personal data of customers, patients, etc. do not have to leave the trust domain of an organization in order to be matched against other data sets – and yet such matching can now be performed in an error-tolerant manner.
Dr. Markus Eberspächer