Dirty Data Vs Human Data

 

One issue that shows how differently people view things, and it is very, very common, particularly when financial industries merge, is how people view dirty data.

I’m gonna go out on a limb here. Dirty data is not the same as human-centric data. Human-centric data is what you get when something that is meant to work with people is forced into situations where people only look at data in the way it works with systems.

The best contrast is when you compare banking transactions and insurance transactions. Yes, yes, I know. It’s very boring, BUT banking transactions are very, very pure. They’re very strict, money-orientated things that are run by accountants and can be nailed hard to the wall. Such data structures are very precise and clean. So when there are bits missing, they are provably dirty.

But if you take someone from a strictly banking financial place and put them in insurance, they often think insurance data is hideously dirty. It isn’t. It’s human data. Think of this scenario:

You have someone who is working out an imprecise insurance policy. How do you get imprecise insurance, you ask? Because it’s rooted in fear. Insurance is how, when you are frightened of something that you need protection from, you talk with a human about how to protect yourself from said fear. Obviously with common fears like a car theft we have an automated way of dealing with it. But in a lot of cases it is a discussion and negotiation over things that people value, and for that you tend to involve a human, which is why you still have a lot of human brokers.

Such negotiations are often done away from computers, away from the little screens. And when they are done, someone signs a bit of paper, and the broker who has worked out the price takes everything back and only then tries to work out how he’s going to get everything that he agreed and signed up for into the computer. Now the computer has very definite things in a financial transaction that it REALLY wants, many of them multiple thought sets away from a negotiation (audit and regulation stuff, for example). So the broker, or whoever is doing the data input, tries to work out the best way of getting the data in.

Often, in retrospect, they’ve missed getting stuff, but they’ve already agreed, got the signature and taken the money, so they’re not going to be able to go back (not without losing face). They do their best and they enter the data, but very often to them it’s merely the paperwork. The real work has been done; this is just naff stuff.

Unfortunately, when this is then taken further up the line, analytics are done and there are fields that don’t make a lot of sense because they’re out of context. And that data is classed as dirty. In this situation it isn’t dirty; it has been taken out of its original context. It’s human data that is too broad and too changeable to fit in the neat boxes.
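To make the contrast a bit more concrete, here is a minimal sketch of how a check might treat the two kinds of record differently. Everything in it is made up for illustration: the field names, the `triage` function and the `needs_context` label are assumptions of mine, not anything from a real banking or insurance system.

```python
# Hypothetical field lists, chosen only for illustration.
BANKING_REQUIRED = {"account_id", "amount", "currency", "timestamp"}
INSURANCE_REQUIRED = {"policy_id", "premium", "insured_item", "risk_category"}


def missing_fields(record: dict, required: set) -> set:
    """Return the required fields that are absent or blank in the record."""
    return {name for name in required if record.get(name) in (None, "")}


def triage(record: dict, source: str) -> str:
    """Label a record 'clean', 'dirty' or 'needs_context'."""
    if source == "banking":
        # Strict, precise structure: a gap is provably dirty.
        return "dirty" if missing_fields(record, BANKING_REQUIRED) else "clean"
    if source == "insurance":
        # A gap may reflect a negotiation the form never captured,
        # so send it to a person instead of calling it dirty.
        if missing_fields(record, INSURANCE_REQUIRED):
            return "needs_context"
        return "clean"
    raise ValueError(f"unknown source: {source}")
```

The only point of the sketch is that the same gap gets a different label depending on where the record came from. What you then do with a `needs_context` record is still a conversation with the people who entered it.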

So before you take data, say it is rubbish and just shout that people should take more care, try and think of the context in which the data is being provided. Because ultimately, to make data ‘non dirty’, you’re gonna have to go back to how the data was put in and change those methods. If you don’t, it will result in never-ending data clean-ups [1], and you’re not going to get buy-in from the people that do it. You’re not going to make people happy. And ultimately they will just complain, and as soon as you’re gone they’ll go back to their old ways, because you’ve not taken into consideration what they do and how they do it.

So not all dirty data is dirty. Some of it just involves humans.

  1. or in the worst cases, data lakes that try and fix it in an automated way going forward
