Dirty data
Data that is incomplete, incorrect, or irrelevant to the problem you're trying to solve
Clean data
Data that is complete, correct, and relevant to the problem you're trying to solve
Data engineers
Transform data into a useful format for analysis and give it a reliable infrastructure
Types of dirty data

Duplicate data
| Description | Possible causes | Potential harm to businesses |
| --- | --- | --- |
| Any data record that shows up more than once | Manual data entry, batch data imports, or data migration | Skewed metrics or analyses, inflated or inaccurate counts or predictions, or confusion during data retrieval |
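
As a minimal sketch of how duplicate records might be detected and removed, the example below counts records by a chosen set of key fields. The sample records, the field names, and the helper names `find_duplicates` and `drop_duplicates` are hypothetical and only illustrate the idea; real datasets often need fuzzier matching (for example, ignoring case or whitespace).

```python
from collections import Counter

# Hypothetical customer records; the first and third rows are the same person.
records = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "Ben Lee", "email": "ben@example.com"},
    {"name": "Ana Ruiz", "email": "ana@example.com"},
]


def find_duplicates(rows, key_fields):
    """Return the key tuples that appear more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def drop_duplicates(rows, key_fields):
    """Keep only the first occurrence of each key, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


duplicates = find_duplicates(records, ["name", "email"])
deduplicated = drop_duplicates(records, ["name", "email"])
print(duplicates)
print(len(records), "->", len(deduplicated))
```

Choosing which fields define "the same record" is a judgment call: matching on too few fields merges distinct records, while matching on too many misses true duplicates.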
Outdated data
| Description | Possible causes | Potential harm to businesses |
| --- | --- | --- |
| Any data that is old and should be replaced with newer and more accurate information | People changing roles or companies, or software and systems becoming obsolete | Inaccurate insights, decision-making, and analytics |
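
One way outdated data might be flagged is by comparing a last-updated date against a maximum allowed age. The sketch below assumes each record carries a `last_updated` field and that one year is an acceptable threshold; both are assumptions for illustration, as are the sample records and the helper name `stale_records`.

```python
from datetime import date, timedelta

# Assumption: records older than one year are treated as outdated.
MAX_AGE = timedelta(days=365)

# Hypothetical contact records with the date each was last confirmed.
contacts = [
    {"name": "Ana Ruiz", "title": "Analyst", "last_updated": date(2023, 6, 1)},
    {"name": "Ben Lee", "title": "Manager", "last_updated": date(2021, 2, 15)},
]


def stale_records(rows, today, max_age=MAX_AGE):
    """Return the rows whose last update is older than max_age."""
    return [row for row in rows if today - row["last_updated"] > max_age]


# A fixed reference date keeps the example reproducible.
stale = stale_records(contacts, today=date(2024, 1, 1))
print([row["name"] for row in stale])
```

A check like this only flags candidates for review. Whether an old record is actually wrong still has to be confirmed against a current source.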
Incomplete data
| Description | Possible causes | Potential harm to businesses |
| --- | --- | --- |
| Any data that is missing important fields | Improper data collection or incorrect data entry | Decreased productivity, inaccurate insights, or inability to complete essential services |
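
A simple completeness check can be sketched by listing the fields each record is required to have and reporting which ones are absent or blank. The required-field list, the sample records, and the helper name `missing_fields` are hypothetical.

```python
# Assumption: these are the fields the analysis cannot do without.
REQUIRED_FIELDS = ["name", "email", "phone"]

# Hypothetical records; the second has a blank email, the third has no phone.
customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Ben Lee", "email": "  ", "phone": "555-0101"},
    {"name": "Cy Park", "email": "cy@example.com"},
]


def missing_fields(row, required):
    """Return the required fields that are absent, None, or blank."""
    missing = []
    for field in required:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Map each incomplete record's index to the fields it is missing.
incomplete = {}
for index, row in enumerate(customers):
    gaps = missing_fields(row, REQUIRED_FIELDS)
    if gaps:
        incomplete[index] = gaps

print(incomplete)
```

Treating whitespace-only values as missing is a design choice here; some datasets use other placeholders such as "N/A" that would need the same treatment.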
Incorrect/inaccurate data
| Description | Possible causes | Potential harm to businesses |
| --- | --- | --- |
| Any data that is complete but inaccurate | Human error inserted during data input, fake information, or mock data | Inaccurate insights or decision-making based on bad information resulting in revenue loss |
Inconsistent data
| Description | Possible causes | Potential harm to businesses |
| --- | --- | --- |
| Any data that uses different formats to represent the same thing | Data stored incorrectly or errors inserted during data transfer | Contradictory data points leading to confusion or inability to classify or segment customers |
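
To illustrate inconsistent formats, the sketch below takes the same date written three different ways and converts each to a single ISO 8601 form. The list of accepted formats and the helper name `to_iso_date` are assumptions for this example. Note that a value such as 03/04/2021 is ambiguous between month-first and day-first conventions, so the format list encodes an assumption about the data source.

```python
from datetime import datetime

# Assumption: the source only uses these three formats, and slash-separated
# dates are month-first.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def to_iso_date(text):
    """Return text as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next format.
    return None


# Hypothetical raw values: three spellings of one date plus one unparseable entry.
raw_dates = ["2021-03-04", "03/04/2021", "04.03.2021", "4th of March"]
normalized = [to_iso_date(value) for value in raw_dates]
print(normalized)
```

Returning `None` for unrecognized values, rather than guessing, keeps the unparseable entries visible so they can be reviewed by hand.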
Validity
Definition