Screencap: "When we rely on ever larger datasets we risk incurring documentation debt, 18 i.e. putting ourselves in a situation where the datasets are both undocumented and too large to document post hoc. While documentation allows for potential accountability [13, 52, 86], undocumented training data perpetuates harm without recourse. Without documentation, one cannot try to understand training data characteristics in order to mitigate some of these attested issues or even unknown ones. The solution, we propose, is to budget for" and footnote 18 "On the notion of documentation debt as applied to code, rather than data, see [154]."

https://cdn.masto.host/daircommunitysocial/media_attachments/files/110/067/706/634/894/053/original/6c6969c1be15c411.png

Notices where this attachment appears

Emily M. Bender (she/her) (emilymbender@dair-community.social)'s status on Thursday, 23-Mar-2023 03:01:05 JST

Apropos #openai refusing to disclose any information about the training data for #GPT4 and #Google being similarly cagey about #Bard...
From the Stochastic Parrots paper, written in late 2020 and published in March 2021:
@timnitGebru @meg

In conversation Thursday, 23-Mar-2023 03:01:05 JST from dair-community.social