Take Care When Merging - Public Datasets And Privacy
The manipulation of data from two different government data sets has created an educational opportunity for the wider community.
The creators of the website whatdoesmylandlordown.org (WDMLO), a site that listed property owners and their addresses, built their web platform by merging two different data sets sourced from publicly available information held by Toitū Te Whenua Land Information New Zealand (LINZ).
"The WDMLO algorithm used to manipulate the data for publishing on the website created information that was inaccurate. As a result, people were identified as owners of properties that they did not own. Our office received complaints from people reporting emotional and reputational harm", says Deputy Privacy Commissioner Liz MacPherson.
"This is an example of the perils of merging data without taking into consideration that what you end up with might no longer represent facts. Two data sets don’t always add up to what you think they should."
Launched early in February, whatdoesmylandlordown.org quickly garnered attention from the public.
"My Office started receiving complaints about information on the WDMLO website identifying individuals as owners of properties soon after," says the Deputy Commissioner.
"The key message here for anyone using data from other providers is that they’re responsible for ensuring the data they’re creating is accurate."
"While the source agency for the information you use has responsibilities, you must take care to ensure any data manipulation you may complete to get the data ready for your own use, doesn’t then alter the accuracy of the data.," says the Deputy Commissioner.
The problem in this case lay in the algorithm WDMLO used to combine the data sets: it identified people as owners of properties they did not own.
"The accuracy issues arose when people shared common names. The way the information was presented did not differentiate between them.
"We note WDMLO did try and remedy the situation, but we were not satisfied that the steps taken were enough to address this problem They have been found in breach of the Privacy Act."
The Privacy Commissioner decided to comment publicly on the details of this case because it provides a cautionary example in an increasingly data-driven world.
The decision was made in the context of our Compliance and Regulatory Action Framework and Naming Policy.
"This is a valuable lesson for everyone who uses data," says Liz.
For more information, please read the case note.