Characteristics that make up human identity have become increasingly embedded in technological systems. Human characteristics like age, gender, race, and sexuality are being folded into the categorical structures of automated systems, such as algorithmic computer vision methods. However, these characteristics are often complex, nuanced, and fluid, and they are linked to social and historical patterns of bias and discrimination. Reducing them to simple, discrete categories creates tensions that can clash with human values and identities, with serious consequences for already marginalized populations.
To mitigate these risks, we are researching how to develop algorithms that are sensitive to the nuanced identities held and expressed by the people they classify. Our aim is to inform design approaches that are empowering and safe for all users.
Anthony Pinter, Katie Gach, Aaron Jiang, Morgan Klaus Scheuerman, Jed Brubaker
@article{scheuermanDataWorkers2025,
author = {Scheuerman, Morgan Klaus and Woodruff, Allison and Brubaker, Jed R.},
title = {How Data Workers Shape Datasets: The Role of Positionality in Data Collection and Annotation for Computer Vision},
year = {2025},
journal = {Proceedings of the ACM on Human-Computer Interaction},
volume = {9},
number = {CSCW},
articleno = {300},
doi = {10.1145/3757481},
url = {https://doi.org/10.1145/3757481},
keywords = {data work, positionality, computer vision, machine learning, annotation},
tags = {positionalML, identity-and-algorithms}
}
Data workers play a key role in the big data industry. Clients hire data workers to collect and annotate data with human identity concepts, like demographic categories or clothing items. Through interviews and ethnographic observations of data workers, we show how worker positionality influences decisions during data work. We propose positional (il)legibility as an approach to data work that embraces the reality of positionality in classification practices.
@article{scheuermanTransphobiaLLM2025,
author = {Scheuerman, Morgan Klaus and Weathington, Katy and Petterson, Adrian and Doyle, Dylan Thomas and Das, Dipto and DeVito, Michael Ann and Brubaker, Jed R.},
title = {Transphobia Is in the Eye of the Prompter: Trans-Centered Perspectives on Large Language Models},
year = {2025},
journal = {ACM Transactions on Computer-Human Interaction},
volume = {32},
number = {5},
articleno = {52},
numpages = {42},
doi = {10.1145/3743676},
url = {https://doi.org/10.1145/3743676},
keywords = {LLMs, transgender, bias, transphobia, generative AI},
tags = {transGPT, identity-and-algorithms}
}
Large language models (LLMs) are rapidly being integrated into products and services, often in chatbots. LLM-powered chatbots are expected to respond to any number of topics, including topics central to gender identity. In light of rising anti-trans discourse, we examined how two popular LLMs responded to real-world English-language questions about trans identity taken from Quora. We employed reflexive analysis that centered our situated knowledges of the trans community. We found that LLMs return pro-trans responses, even when presented with highly transphobic user prompts. While we also found highly transphobic LLM responses, anti-trans sentiment in LLMs was often subtle, requiring a deep positional understanding from diverse trans stakeholders to interpret.
@incollection{pinterConservateur2025,
author = {Pinter, Anthony T. and Margaret, Annie and Brubaker, Jed R.},
title = {Conservateur of a Former Self: Algorithmic Curation, Identity Exhibition, and Digital Well-Being},
year = {2025},
booktitle = {Oxford Intersections: Social Media in Society and Culture},
publisher = {Oxford University Press},
editor = {Khan, Laeeq},
doi = {10.1093/9780198945253.003.0113},
url = {https://doi.org/10.1093/9780198945253.003.0113},
keywords = {online identity, algorithms, self-presentation, life transitions, relationship dissolution, death},
tags = {identity-and-algorithms, breakups}
}
This research explores how people do identity in online spaces, building on Goffman’s dramaturgical approach and Hogan’s exhibitional approach. We introduce the conservateur role to describe how people make choices about their identity collection and what data the algorithmic curator can access to create identity exhibitions.
@inproceedings{scheuerman2024,
author = {Scheuerman, Morgan Klaus and Brubaker, Jed R.},
title = {Products of Positionality: How Tech Workers Shape Identity Concepts in Computer Vision},
year = {2024},
isbn = {9798400703300},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3613904.3641890},
doi = {10.1145/3613904.3641890},
booktitle = {Proceedings of the CHI Conference on Human Factors in Computing Systems},
articleno = {762},
numpages = {18},
keywords = {Tech work, computer vision, identity, machine learning, positionality, work studies},
location = {Honolulu, HI, USA},
series = {CHI '24},
tags = {identity-and-algorithms}
}
There has been a great deal of scholarly attention on issues of identity-related bias in machine learning. Much of this attention has focused on data and data workers, the workers who complete annotation tasks. Yet tech workers—like engineers, data scientists, and researchers—introduce their own “biases” when defining “identity” concepts. More specifically, they instill their own positionalities: the ways they understand and are shaped by the world around them. Through interviews with industry tech workers who focus on computer vision, we show how workers embed their own positional perspectives into products and how positional gaps can lead to unforeseen and undesirable outcomes. We discuss how worker positionality is mutually shaped by the contexts in which workers are embedded. We provide implications for researchers and practitioners to engage with the positionalities of tech workers, as well as those outside of development who influence tech workers.
@inproceedings{das2024,
author = {Das, Dipto and Guha, Shion and Brubaker, Jed R. and Semaan, Bryan},
title = {The ``Colonial Impulse'' of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases},
year = {2024},
isbn = {9798400703300},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3613904.3642669},
doi = {10.1145/3613904.3642669},
booktitle = {Proceedings of the CHI Conference on Human Factors in Computing Systems},
articleno = {769},
numpages = {18},
keywords = {Algorithmic audit, Bias, Colonial, Identity, Sentiment analysis tools},
location = {Honolulu, HI, USA},
series = {CHI '24},
tags = {identity-and-algorithms}
}
While colonization has sociohistorically impacted people’s identities across various dimensions, colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems, sentiment analysis tools, can also perpetuate colonial values and biases, yet little attention has been paid to how such tools may be complicit in perpetuating coloniality, even though they are often used to guide various practices (e.g., content moderation). In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities, who have experienced and continue to experience the impacts of colonialism. Drawing on the identity categories most impacted by colonialism among local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali available on the Python Package Index (PyPI) and GitHub. Despite similar semantic content and structure, our analyses showed that, in addition to inconsistencies in output across tools, Bengali sentiment analysis tools exhibit bias between identity categories and respond differently to different ways of expressing identity. Connecting our findings with the colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias in sentiment analysis tools.
@article{Scheuerman2023a,
author = {Scheuerman, Morgan Klaus and Weathington, Katy and Mugunthan, Tarun and Denton, Emily and Fiesler, Casey},
title = {From Human to Data to Dataset: Mapping the Traceability of Human Subjects in Computer Vision Datasets},
year = {2023},
issue_date = {April 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {7},
number = {CSCW1},
url = {https://doi.org/10.1145/3579488},
doi = {10.1145/3579488},
journal = {Proc. ACM Hum.-Comput. Interact.},
month = apr,
articleno = {55},
numpages = {33},
keywords = {machine learning, datasets, data subjects, data ethics, computer vision},
tags = {identity-and-algorithms}
}
Computer vision is a "data hungry" field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including "public" data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data (not only from websites, but also from public settings and public records) make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers current data practices present to providing an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects. We also offer suggestions for an improved ethics of traceability to enable both awareness and control for individual subjects in dataset curation practices.
@inproceedings{Katzman2023,
author = {Katzman, Jared and Wang, Angelina and Scheuerman, Morgan Klaus and Blodgett, Su Lin and Laird, Kristen and Wallach, Hanna and Barocas, Solon},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
title = {{Taxonomizing and Measuring Representational Harms: A Look at Image Tagging}},
year = {2023},
doi = {10.1609/aaai.v37i12.26670},
url = {https://doi.org/10.1609/aaai.v37i12.26670},
tags = {identity-and-algorithms}
}
@article{Scheuerman2021-bigdata-autoessentalization,
title = {Auto-essentialization: Gender in automated facial analysis as extended colonial project},
author = {Scheuerman, Morgan Klaus and Pape, Madeleine and Hanna, Alex},
journal = {Big Data \& Society},
volume = {8},
number = {2},
pages = {20539517211053712},
year = {2021},
publisher = {SAGE Publications},
doi = {10.1177/20539517211053712},
url = {https://doi.org/10.1177/20539517211053712},
tags = {identity-and-algorithms}
}
@article{Scheuerman2020-cscw-databaseidentity,
title = {How {We’ve} {Taught} {Algorithms} to {See} {Identity}: {Constructing} {Race} and {Gender} in {Image} {Databases} for {Facial} {Analysis}},
author = {Scheuerman, Morgan Klaus and Wade, Kandrea and Lustig, Caitlin and Brubaker, Jed R.},
doi = {10.1145/3392866},
journal = {Proc. ACM Hum.-Comput. Interact.},
number = {CSCW1},
pages = {Article 58},
volume = {4},
year = {2020},
tags = {identity-and-algorithms, marginalization-and-safety, positionalML},
note = {Best Paper Award}
}
@article{Scheuerman2019-cscw-gender,
title = {How {Computers} {See} {Gender}: {An} {Evaluation} of {Gender} {Classification} in {Commercial} {Facial} {Analysis} and {Image} {Labeling} {Services}},
author = {Scheuerman, Morgan Klaus and Paul, Jacob M and Brubaker, Jed R.},
doi = {10.1145/3359246},
journal = {Proc. ACM Hum.-Comput. Interact.},
number = {CSCW},
pages = {Article 144},
volume = {3},
year = {2019},
tags = {marginalization-and-safety, identity-and-algorithms}
}
@article{Pinter2019-upsetcon,
title = {"Am {I} {Never} {Going} to {Be} {Free} of {All} {This} {Crap}?": {Upsetting} {Encounters} {With} {Algorithmically} {Curated} {Content} {About} {Ex-Partners}},
volume = {3},
number = {CSCW},
journal = {Proc. ACM Hum.-Comput. Interact.},
author = {Pinter, Anthony T. and Jiang, Jialun "Aaron" and Gach, Katie Z. and Sidwell, Melanie M. and Dykes, James E. and Brubaker, Jed R.},
year = {2019},
pages = {Article 70},
doi = {10.1145/3359172},
tags = {identity-and-algorithms, breakups}
}
@inproceedings{Scheuerman2018a,
title = {Gender is not a {Boolean}: {Towards} {Designing} {Algorithms} to {Understand} {Complex} {Human} {Identities}},
author = {Scheuerman, Morgan Klaus and Brubaker, Jed R.},
booktitle = {Participation+Algorithms Workshop at CSCW 2018},
year = {2018},
tags = {identity-and-algorithms}
}
Algorithmic methods are increasingly used to identify and categorize human characteristics. A range of human identities, such as gender, race, and sexual orientation, are becoming interwoven with systems. We discuss the case of automatic gender recognition technologies that algorithmically assign binary gender categories. Based on our previous work with transgender participants, we discuss the ways current gender recognition systems misrepresent complex gender identities and undermine safety. We describe plans to build on this by conducting participatory design workshops with designers and potential users to develop improved methods for conceptualizing gender identity in algorithms.
@inproceedings{Hamidi2018,
author = {Hamidi, Foad and Scheuerman, Morgan Klaus and Branham, Stacy M.},
title = {Gender Recognition or Gender Reductionism?: The Social Implications of Embedded Gender Recognition Systems},
booktitle = {Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems},
series = {CHI '18},
year = {2018},
isbn = {978-1-4503-5620-6},
location = {Montreal QC, Canada},
pages = {8:1--8:13},
articleno = {8},
numpages = {13},
url = {http://doi.acm.org/10.1145/3173574.3173582},
doi = {10.1145/3173574.3173582},
acmid = {3173582},
publisher = {ACM},
address = {New York, NY, USA},
tags = {identity-and-algorithms}
}
Automatic Gender Recognition (AGR) refers to various computational methods that aim to identify an individual’s gender by extracting and analyzing features from images, video, and/or audio. Applications of AGR are increasingly being explored in domains such as security, marketing, and social robotics. However, little is known about stakeholders’ perceptions and attitudes towards AGR and how this technology might disproportionately affect vulnerable communities. To begin to address these gaps, we interviewed 13 transgender individuals, including three transgender technology designers, about their perceptions and attitudes towards AGR. We found that transgender individuals have overwhelmingly negative attitudes towards AGR and fundamentally question whether it can accurately recognize such a subjective aspect of their identity. They raised concerns about privacy and potential harms that can result from being incorrectly gendered, or misgendered, by technology. We present a series of recommendations on how to accommodate gender diversity when designing new digital systems.
@article{pinterWorkingErasingYou2024,
title = {I'm Working on Erasing You, Just Don't Have the Proper Tools: Supporting Online Identity Management After the End of Romantic Relationships},
author = {Pinter, Anthony T. and Brubaker, Jed R.},
year = {2024},
journal = {Proceedings of the ACM on Human-Computer Interaction},
volume = {8},
pages = {66:1--66:32},
doi = {10.1145/3637343},
number = {CSCW1},
keywords = {digital identity, empirical work, journal, life transitions, relationship dissolution, social media},
tags = {identity-and-algorithms, breakups}
}
After a break-up, people are left with data representative of their lost relationship: pictures, posts, and connections that exist because of that relationship. As part of breaking up and moving on, people often make decisions about managing that data. Prior work has identified two broad curatorial philosophies people adopt in data management: archivists and revisionists. However, what drives individuals to one approach remains unknown, making it difficult to design sociotechnical systems that support these decisions. Through focus group interviews with couples still together, we present a decision-making framework for data management. We outline factors that can influence an individual's decision to act as an archivist or a revisionist in the wake of a break-up. From our data and framework, we identify six implications for design to improve user experiences in the wake of a break-up and offer concrete design suggestions for social media platforms.
@article{baumerAlgorithmicSubjectivities2024,
title = {Algorithmic Subjectivities},
author = {Baumer, Eric P. S. and Taylor, Alex S. and Brubaker, Jed R. and McGee, Micki},
year = {2024},
journal = {ACM Transactions on Computer-Human Interaction},
doi = {10.1145/3660344},
keywords = {algorithms, journal, reflective HCI, subjectivity},
tags = {post-userism, identity-and-algorithms}
}
This paper considers how subjectivities are enlivened in algorithmic systems. We first review related literature to clarify how we see "subjectivities" as emerging through a tangled web of processes and actors. We then offer two case studies exemplifying the emergence of algorithmic subjectivities: one involving computational topic modeling of blogs written by parents with children on the autism spectrum, and one involving algorithmic moderation of social media content. Drawing on these case studies, we then articulate a series of qualities that characterizes algorithmic subjectivities. We also compare and contrast these qualities with a number of related concepts from prior literature to articulate how algorithmic subjectivities constitutes a novel theoretical contribution, as well as how it offers a focal lens for future empirical investigation and for design. In short, this paper points out how certain worlds are being made and/or being made possible via algorithmic systems, and it asks HCI to consider what other worlds might be possible.
How We’ve Taught Algorithms to See Identity
Break-ups Suck. They Could Suck Less.
Even after blocking an ex on Facebook, the platform promotes painful reminders
How social media makes breakups that much worse
The Problem With Putting Social Media in Charge of Our Memories