We’re creating an open dataset that collects diverse statements from the LGBTIQ+ community, such as “I’m gay and I’m proud to be out” or “I’m a fit, happy lesbian who has just retired from a wonderful career,” to help reclaim positive identity labels. These statements from the LGBTIQ+ community and our supporters will be made available in an open dataset, which coders, developers and technologists all over the world can use to help teach machine learning models how we in the LGBTIQ+ community speak about ourselves.
It’s easy to say that algorithms are biased, because they are. It’s much harder to ask why. There are many reasons, but one of the biggest is that we simply don’t have diverse and inclusive datasets to train them on. Human bias and prejudice are reflected in our online interactions: the way we speak to each other on social media, the things we write about on blogs, the videos we watch on YouTube, the stories we share and promote. Project Respect is an attempt to grow the set of inclusive and diverse training data for better, less biased machine learning.
Algorithms are biased because human beings are biased, and the way those biases are reflected back to us may be why we find them so offensive. Maybe we don’t like machine bias because of what it says about us.