How are duplicates created?

Socks disappear in washing machines; duplicates appear in computers. The ghostly life of matter is probably one of the last great mysteries of existence. But while missing socks usually stay lost forever in the underworld of the unfathomable, modern science can already explain quite well how duplicates arise.

The appearance of duplicates often has to do with the origin of the addresses. Most databases are fed from three sources:

  • Users entering addresses individually (e.g. web shop or data entry dialogue)
  • Processes that are supposed to update the dataset regularly (e.g. matching against relocation data or checking whether street, postcode or city are still current)
  • Third-party data that is imported into the dataset by file matching (e.g. purchase of third-party addresses or transfer from other systems)

Let’s start with the user. Duplicates often occur because users do not search specifically for existing records before creating a new one, either out of convenience or because the necessary intelligent search procedures are lacking. Consider an order taken by telephone:

A long-standing customer calls, let’s call him Kowalczik. His name is misheard or mistyped. The system compares the entry with the address database and reports: no hit. So the real customer gets a virtual doppelganger that nobody notices, and the disaster takes its course …
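The telephone scenario can be sketched in a few lines of Python. This is only an illustration, not the matching method of any particular product: the customer list, the misheard spelling "Kowalczyk", the function names and the similarity cutoff of 0.8 are all assumptions made for the example. It uses `difflib.get_close_matches` from the standard library to contrast an exact lookup with a tolerant one.

```python
from difflib import get_close_matches

# Hypothetical customer list; only "Kowalczik" comes from the example above.
customers = ["Kowalczik", "Schneider", "Meyer"]


def exact_lookup(name, names):
    """Return only the names that match character for character."""
    return [n for n in names if n == name]


def fuzzy_lookup(name, names, cutoff=0.8):
    """Return names that are similar to `name`, ignoring upper/lower case.

    The cutoff of 0.8 is an assumed value and would need tuning on real data.
    """
    by_lowercase = {n.lower(): n for n in names}
    hits = get_close_matches(name.lower(), list(by_lowercase), n=3, cutoff=cutoff)
    return [by_lowercase[h] for h in hits]


misheard = "Kowalczyk"  # assumed typo: "y" entered instead of "i"

print(exact_lookup(misheard, customers))  # no hit, so a double would be created
print(fuzzy_lookup(misheard, customers))  # the existing customer is suggested
```

With an exact comparison the operator sees an empty result and creates a new record; with the tolerant comparison the existing entry is offered as a candidate first.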

Data also goes out of date. Anyone who does not maintain it quickly ends up with a disastrous jumble of half-right, half-wrong information. If, for example, a municipality renames streets, new postal addresses appear, and a customer who has been known and recorded for a long time is given an electronic double by mistake.

Or, similarly tricky: someone moves house and informs his business partners, who mistakenly enter his data under the heading »new customer«. Add an automatically generated data record from a match against relocation data, and the person in question already exists three times in the same database. One would rather not imagine the resulting chaos.

The problem can only be avoided by combining fuzzy search with regular duplicate matching of the entire address database, especially after every update against external reference data such as street directories or relocation data.
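What a duplicate run over the whole database might look like can be sketched as follows. Everything here is an assumption made for illustration: the record layout (name, street, postcode and city), the sample addresses, the function names and the threshold of 0.85. The comparison uses `difflib.SequenceMatcher` from the Python standard library, not the procedure of any specific product.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(record):
    """Join all fields, lowercase them and collapse repeated whitespace."""
    return " ".join(" ".join(record).lower().split())


def similarity(a, b):
    """Similarity between two records as a value from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find_duplicate_candidates(records, threshold=0.85):
    """Compare every record with every other and return suspicious pairs.

    Returns (index_a, index_b, score) tuples. The threshold is an assumed
    value; candidates should be reviewed before anything is merged.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical sample data: the first two rows describe the same person.
records = [
    ("Jan Kowalczik", "Hauptstrasse 5", "12345 Musterstadt"),
    ("Jan Kowalczyk", "Hauptstr. 5", "12345 Musterstadt"),
    ("Anna Schneider", "Bahnhofstrasse 12", "54321 Beispielhausen"),
]

for i, j, score in find_duplicate_candidates(records):
    print(f"records {i} and {j} look like duplicates (score {score})")
```

Comparing every pair grows quadratically with the number of records, so for a large database one would first group records by a coarse key such as the postcode and only compare within each group.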

The beauty of it is that you don’t have to manage this on your own – our software does most of the work for you.