Understanding the provenance of data
Data ethnography is the in-depth study of data. Just as traditional ethnography is concerned with people, and their habits and cultures, data ethnography seeks to get to grips with the facts behind the figures. Who or what does a dataset represent? What does it tell us about an event, person or group? How has it been collected, manipulated and stored? These kinds of questions are central to a clear understanding of the provenance – and existence – of data.
Data ethnography is becoming more important than ever as we move further into a data-driven world. Most of our modern technologies are powered by data, so improving transparency around the data we feed into our computers, machines and devices is crucial. The most obvious example of this is the training of AI. At the most basic level, if we want AI to perform a certain task for us, then we need to give it data that is relevant to that task. It’s no use training AI on language data, for example, if we want it to recognise faces. Going deeper, a thorough understanding of data can explain why an AI programme behaves in a certain way. This includes highlighting any inherent bias present in the system.
The data ethnographer has an important role to play in connecting the individual operations of technology to its original data. It is the data ethnographer’s job to label and test data, so we can all have a better understanding of why technology behaves in the way it does, and the origins of the results we see in the real world. Currently, a lack of data ethnography – of deep insight into the data that technology relies on – is resulting in the perpetuation of unwanted bias in our search engines, chatbots, and predictive analytics. If we want a more diverse society in the future, we need data ethnographers to open our eyes to the bias we are programming into our technology now, before it is too late.