Research Projects

Plurals: A conceptual framework and python package for pluralistic AI deliberation

Note: Presented at IC2S2 2024; Full paper coming soon.

Generative AI usage is widespread and growing, but most usage is concentrated around a small number of foundation or 'generalist' models. This creates a tension: a handful of 'generalist' models must serve a large number of diverse users, each with their own values and preferences. One solution is to build pluralistic AI systems (Sorensen et al. 2024). I created such a system, Plurals: a Python package that makes it easy to conduct persona-based, multi-agent deliberation.

Plurals is composed of three abstractions: Agents, Structures, and Moderators. Agents are LLMs with system instructions and tasks. These system instructions can be created from user inputs, government datasets such as ANES, and persona templates. Agents complete tasks within Structures, which control how Agents see the responses of other Agents. For example, Structures differ in how much information is shared (e.g., in an Ensemble, Agents complete tasks in parallel; in a Debate, Agents go back and forth) and in the directionality of information-sharing (e.g., reciprocal vs. non-reciprocal). We support other Structure parameters, too, such as controls over the order in which Agents respond. Finally, Moderators are special Agents who summarize or aggregate multi-agent deliberation. We also support AutoModerators, who bootstrap their own moderation instructions.
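A minimal sketch of how the three abstractions fit together (illustrative; class names and arguments are written from memory, so the package documentation is the authoritative reference):

    # Sketch of a Plurals-style workflow: persona-based Agents complete a
    # task in an Ensemble Structure, and a Moderator aggregates the output.
    # Exact class names and arguments should be checked against the docs.
    from plurals.agent import Agent
    from plurals.deliberation import Ensemble, Moderator

    task = "Suggest ways to make public transit more accessible."

    # Agents: LLMs with persona-based system instructions
    agents = [
        Agent(persona="a rural commuter without a car", model="gpt-4o"),
        Agent(persona="a city transit planner", model="gpt-4o"),
        Agent(persona="a parent of young children", model="gpt-4o"),
    ]

    # Moderator: a special Agent that summarizes the deliberation
    moderator = Moderator(persona="default", model="gpt-4o")

    # Structure: an Ensemble has Agents complete the task in parallel
    ensemble = Ensemble(agents, moderator=moderator, task=task)
    ensemble.process()
    print(ensemble.final_response)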

 

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs’ capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia’s Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors’ simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
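As an illustration, Task 1 (bias detection) can be framed as zero-shot classification scored for accuracy on a balanced dataset. The sketch below uses illustrative prompt wording and a placeholder LLM call, not the exact setup from the paper:

    # Illustrative framing of bias detection as zero-shot classification.
    # `call_llm` is a placeholder for any chat-completion function; the
    # prompt wording here is illustrative, not the paper's.
    def detect_bias(edit_text: str, call_llm) -> bool:
        prompt = (
            "Wikipedia's Neutral Point of View (NPOV) policy requires content "
            "to be written without editorial bias. Does the following edit "
            "violate NPOV? Answer 'yes' or 'no'.\n\n"
            f"Edit: {edit_text}"
        )
        return call_llm(prompt).strip().lower().startswith("yes")

    def detection_accuracy(examples, call_llm) -> float:
        """examples: list of (edit_text, is_biased) pairs, balanced across labels."""
        correct = sum(detect_bias(text, call_llm) == label for text, label in examples)
        return correct / len(examples)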

 

How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas: Evidence From a Large, Dynamic Experiment

Exposure to large language model output is rapidly increasing. How will seeing AI-generated ideas affect human ideas? We conducted an experiment (800+ participants, 40+ countries) where participants viewed creative ideas that were from ChatGPT or from prior experimental participants and then brainstormed their own idea. We varied the number of AI-generated examples (none, low, or high exposure) and whether the examples were labeled as 'AI' (disclosure). Our dynamic experiment design -- ideas from prior participants in an experimental condition are used as stimuli for future participants in the same experimental condition -- speaks to the interdependent process of cultural creation: creative ideas are built upon prior ideas. Hence, we capture the compounding effects of having LLMs 'in the culture loop'. We found that high AI exposure (but not low AI exposure) did not affect the creativity of individual ideas but did increase the average amount and rate of change of collective idea diversity. AI made ideas different, not better. There were no main effects of disclosure. We also found that self-reported creative people were less influenced by knowing an idea was from AI and that participants may knowingly adopt AI ideas when the task is difficult. Our findings suggest that introducing AI ideas may increase collective diversity but not individual creativity.
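A minimal sketch of the dynamic design, with illustrative names and a simplified stimulus pool rather than the study's exact parameters:

    # Sketch of the 'culture loop': within a condition, each participant sees
    # ideas sampled from that condition's evolving pool, and their new idea
    # is added back so later participants build on earlier ones.
    # Names and sampling details are illustrative, not the study's exact setup.
    import random

    def run_condition(seed_ideas, participants, n_examples):
        pool = list(seed_ideas)            # e.g., AI-generated or prior human ideas
        for participant in participants:
            shown = random.sample(pool, min(n_examples, len(pool)))
            new_idea = participant(shown)  # participant brainstorms after exposure
            pool.append(new_idea)          # idea feeds back into the pool
        return pool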

 

Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways

Manually annotating data for computational social science (CSS) tasks can be costly, time-consuming, and emotionally draining. While recent work suggests that LLMs can perform such annotation tasks in zero-shot settings, little is known about how prompt design impacts LLMs' compliance and accuracy. We conduct a large-scale multi-prompt experiment to test how model selection (ChatGPT, PaLM2, and Falcon7b) and prompt design features (definition inclusion, output type, explanation, and prompt length) impact the compliance and accuracy of LLM-generated annotations on four CSS tasks (toxicity, sentiment, rumor stance, and news frames). Our results show that LLM compliance and accuracy are highly prompt-dependent. For instance, prompting for numerical scores instead of labels reduces all LLMs' compliance and accuracy. The overall best prompting setup is task-dependent, and minor prompt changes can cause large changes in the distribution of generated labels. By showing that prompt design significantly impacts the quality and distribution of LLM-generated annotations, this work serves as both a warning and practical guide for researchers and practitioners.
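As an illustration, the multi-prompt design can be thought of as a grid over these design features; the wording below is illustrative, not the prompts used in the study:

    # Illustrative grid of prompt variants for one CSS task (toxicity),
    # crossing definition inclusion, output type, and explanation requests.
    # Prompt length varies as a byproduct of the included components.
    from itertools import product

    TOXICITY_DEFINITION = "Toxicity means rude, disrespectful, or hateful language."

    def build_prompt(text, include_definition, output_type, ask_explanation):
        """Assemble one prompt variant for a toxicity annotation task."""
        parts = []
        if include_definition:
            parts.append(TOXICITY_DEFINITION)
        if output_type == "label":
            parts.append("Label the text as 'toxic' or 'not toxic'.")
        else:
            parts.append("Rate the toxicity of the text on a scale from 0 to 100.")
        if ask_explanation:
            parts.append("Briefly explain your reasoning before giving the answer.")
        parts.append(f"Text: {text}")
        return "\n".join(parts)

    # One prompt per combination of design features for the same text
    variants = [
        build_prompt("example social media post", d, o, e)
        for d, o, e in product([True, False], ["label", "score"], [True, False])
    ]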

 

The Dynamics of (Not) Unfollowing Misinformation Spreaders

Many studies explore how people "come into" misinformation exposure. But much less is known about how people "come out of" misinformation exposure. Do people organically sever ties to misinformation spreaders? And what predicts doing so? Over six months, we tracked the frequency and predictors of ~900K followers unfollowing ~5K health misinformation spreaders on Twitter. We found that misinformation ties are persistent. Monthly unfollowing rates are just 0.52%. In other words, 99.5% of misinformation ties persist each month. Users are also 31% more likely to unfollow non-misinformation spreaders than they are to unfollow misinformation spreaders. Although unfollowing is generally infrequent, the factors most associated with unfollowing misinformation spreaders are (1) redundancy and (2) ideology. First, users initially following many spreaders, or who follow spreaders that tweet often, are most likely to unfollow later. Second, liberals are more likely to unfollow than conservatives. Overall, we observe a strong persistence of misinformation ties. The fact that users rarely unfollow misinformation spreaders suggests a need for external nudges and the importance of preventing exposure from arising in the first place.
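For concreteness, the monthly unfollow rate can be read as the share of follower ties present at the start of a month that are gone by the end of it (illustrative computation, not the study's exact code):

    # Illustrative rate computation: each tie is a (follower_id, spreader_id)
    # pair observed in monthly snapshots. Variable names are illustrative.
    def monthly_unfollow_rate(ties_start: set, ties_end: set) -> float:
        unfollowed = ties_start - ties_end
        return len(unfollowed) / len(ties_start)

    # A 0.52% monthly rate means roughly 99.5% of ties persist each month.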

 

Environmental scan of sleep health in early childhood programs

We employed a multi-component environmental scan to investigate the translation of sleep health knowledge into early care and education (ECE) programs. Our methodology included a website scan of organizations' sleep content, surveys of ECE staff, and interviews with stakeholders from ECE, pediatric, and sleep communities. Results revealed gaps in sleep-related content on websites, with half lacking information on developmental links, optimal duration, or scientific background. While ECE staff reported comfort in addressing sleep issues, stakeholders identified sleep health as a high-relevance but lower-priority concern, noting poor knowledge of specific health and developmental links in ECE settings. Despite recognition of sleep's importance for school readiness, there is a lack of specific, actionable information in ECE training, programs, and policies. We offer specific recommendations to address this gap.