Screencap: "When we rely on ever larger datasets we risk incurring documentation debt, 18 i.e. putting ourselves in a situation where the datasets are both undocumented and too large to document post hoc. While documentation allows for potential accountability [13, 52, 86], undocumented training data perpetuates harm without recourse. Without documentation, one cannot try to understand training data characteristics in order to mitigate some of these attested issues or even unknown ones. The solution, we propose, is to budget for" and footnote 18 "On the notion of documentation debt as applied to code, rather than data, see [154]."
https://cdn.masto.host/daircommunitysocial/media_attachments/files/110/067/706/634/894/053/original/6c6969c1be15c411.png