Fixing misspellings
Inconsistent capitalization
Incorrect punctuation and other typos
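As a rough illustration of the three fixes above, here is a minimal Python sketch that normalizes capitalization, strips stray punctuation, and corrects known misspellings. The function names and the KNOWN_MISSPELLINGS table are hypothetical and only for illustration; a real project would build its own correction list from the data.

```python
import re

# Hypothetical lookup table: maps known misspellings to the correct value.
KNOWN_MISSPELLINGS = {
    "Untied States": "United States",
    "Cananda": "Canada",
}

def normalize_text(value: str) -> str:
    """Trim whitespace, drop stray trailing punctuation, and title-case a value."""
    cleaned = re.sub(r"\s+", " ", value.strip())  # collapse repeated spaces
    cleaned = cleaned.rstrip(".,;:!").strip()      # remove trailing punctuation typos
    return cleaned.title()                         # enforce consistent capitalization

def clean_value(value: str) -> str:
    """Normalize a value, then replace it if it is a known misspelling."""
    normalized = normalize_text(value)
    return KNOWN_MISSPELLINGS.get(normalized, normalized)
```

For example, clean_value("  untied   states. ") would be normalized to "Untied States" and then corrected to "United States". Title-casing is only one possible convention; the point is to apply the same rule to every value.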
Compatibility
How well two or more datasets can work together
- Do I have all the data I need?
- Does the data I need exist within these datasets?
- Does the data need to be cleaned, or is it ready for me to use?
- Are the datasets cleaned to the same standard?
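Some of the compatibility questions above can be checked mechanically. The sketch below assumes each dataset is a list of dictionaries (one per row) and compares column names; the function names and the "required" argument are hypothetical. It only answers whether the needed fields exist, not whether both datasets were cleaned to the same standard.

```python
def columns_of(rows):
    """Collect every column name that appears in any row of a dataset."""
    columns = set()
    for row in rows:
        columns.update(row.keys())
    return columns

def compatibility_report(left, right, required):
    """Summarize how the columns of two datasets overlap.

    left, right: lists of dicts, one dict per row (an assumed layout).
    required: names of the fields the analysis needs.
    """
    left_cols = columns_of(left)
    right_cols = columns_of(right)
    return {
        "shared": sorted(left_cols & right_cols),
        "only_left": sorted(left_cols - right_cols),
        "only_right": sorted(right_cols - left_cols),
        # Fields the analysis needs that neither dataset provides.
        "missing_required": sorted(set(required) - (left_cols | right_cols)),
    }
```

An empty "missing_required" list suggests the data you need exists somewhere in the two datasets; a non-empty one means you do not yet have all the data you need.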
Common mistakes to avoid
- Not checking for spelling errors
- Forgetting to document errors
- Not checking for misfielded values
- Overlooking missing values
- Only looking at a subset of the data
- Losing track of business objectives
- Not fixing the source of the error
- Not analyzing the system prior to data cleaning
- Not backing up your data prior to data cleaning
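Two of the mistakes above, skipping the backup and overlooking missing values, are easy to guard against in code. The following is a minimal sketch, not a complete workflow: backup_file, count_missing, and the MISSING_MARKERS set are hypothetical names, and which strings count as "missing" is an assumption that depends on the dataset.

```python
import shutil
from pathlib import Path

# Assumption: strings that should be treated as missing values.
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}

def backup_file(path):
    """Copy a data file to '<name>.bak' in the same folder before cleaning it."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)  # copy2 also preserves file timestamps
    return backup

def count_missing(rows):
    """Count missing values per column across every row, not just a sample."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or str(value).strip().lower() in MISSING_MARKERS
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts
```

Running count_missing over the full dataset, rather than a subset, also helps with the "only looking at a subset of the data" mistake, and saving its output is one simple way to document the errors you found.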