Catalog
#data-quality
2 entries tagged “data-quality”
A017
79%
Model Collapse from Self-Training
“More training data always improves AI model quality, regardless of source.”
-40%: output diversity (unique patterns)
Eliminated: tail distribution representation
T027
85%
Data Lake Swamp
“Centralizing all data in a data lake enables organization-wide analytics and insights.”
60-70% unused: data lake data actually used for analytics
Inverted: data scientist time on data prep vs. analysis