Elias Stengel-Eskin
I am a Postdoctoral Research Associate at the University of North Carolina at Chapel Hill in the MURGe-Lab led by Mohit Bansal. I received my Ph.D. in 2023 from Johns Hopkins University, where I was supervised by Benjamin Van Durme and supported by an NSF GRFP.
I aim to develop AI agents that can intelligently communicate and collaborate with people and each other. A central focus of my work is multi-agent communication via text, which has led to work on multi-LLM discussions/debates, pragmatic/verbalized uncertainty, persuasion, and semantic parsing/text-to-code (transforming text into representations of its meaning). I believe that intelligent agents will be multimodal: another line of my work covers multimodal tasks, focusing especially on underspecified settings (ambiguous examples, long-context inputs). As we scale up tasks, underspecification and ambiguity will become increasingly relevant. I have a long-standing interest in implicit phenomena such as vagueness, underspecification, and ambiguity. While I’ve mostly explored these topics through a linguistic lens, I am interested in their importance to intelligence more broadly.
Concretely, some of the areas I’ve been publishing on recently are:
- Confidence Estimation and Calibration:
- Multi-Agent/Multi-Model Reasoning:
  - on training models to accept good and resist bad persuasion (Stengel-Eskin et al., 2024)
  - on structured distillation for learning from multiple LLM reasoning agents (Chen et al., ICML 2024)
  - on a new benchmark to assess game-theoretic abilities for LLM agents (Duan et al., NeurIPS 2024)
  - on multi-agent iterative coarse-to-fine refinement for reasoning tasks (Chen et al., 2024)
  - on using bandits to select instance-level reward models for LLM alignment (Nguyen et al., 2024)
- Learning Skills and Abstractions for Agents/Coding/Planning:
- Ambiguity and Underspecification:
- Improving Multimodal Models and LLM Agents:
  - on building and testing data generation agents for creating training data (Khan et al., 2024)
  - on a tree-based representation for LLM-based video reasoning (Wang et al., 2024)
  - on improving visual prompting/object grounding without training (Wan et al., ECCV 2024)
  - on a more effective/efficient self-consistency method for LLM agents (Wang et al., ACL 2024)
  - on Western cultural bias in VLMs and the effect of pretraining language (Ananthram et al., 2024)
  - on visual commonsense in unimodal and multimodal models (Zhang et al., 2022)
Before starting my Ph.D., I received my B.A.&Sc. with First Class Honours in Cognitive Science from McGill University, focusing on computer science and linguistics. While at McGill, I worked as a research assistant at the Montreal Language Modeling Lab (MLML), now MCQLL, supervised by Morgan Sonderegger. I wrote my honours thesis (supervised by Timothy O’Donnell) on a variational inference algorithm for a model of language acquisition.
news
Nov 6, 2024 | Our philosophy collaboration on challenges in model editing was accepted to TMLR! Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Oct 21, 2024 | New paper out! Teaching Models to Balance Resisting and Accepting Persuasion, where we use multi-agent recursive dialogue trees to teach models to accept and resist persuasion when appropriate. Our method reduces susceptibility to misinformation and flip-flopping while also improving LLMs’ ability to act together as a team through multi-agent dialogue!
Oct 11, 2024 | New preprint! DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback, led by Zaid Khan with Jaemin Cho and Mohit Bansal, on a novel testbed for creating data generation agents. These agents produce synthetic data for teaching student models based on their errors and weaknesses!
Oct 3, 2024 | New preprint! LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits, led by Duy Nguyen and Archiki Prasad with Mohit Bansal, on using bandit methods to pick the best-suited RM to optimize at an instance level, improving LLMs on reasoning, instruction-following, and long-context understanding.
Oct 2, 2024 | Two papers accepted to NeurIPS 2024! LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models uses pragmatics to calibrate LLMs, and GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations introduces a new game-theoretic benchmark.
Sep 19, 2024 | New preprint! MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, led by Justin Chih-Yao Chen with Swarnadeep Saha, Archiki Prasad, and Mohit Bansal, introduces a novel method for refinement that improves math reasoning by selectively refining only hard instances and by treating refinement as an iterative, multi-agent problem.
Sep 14, 2024 | New preprint! AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge, led by Han Wang with Archiki Prasad and Mohit Bansal, introduces a dynamic decoding strategy to deal with variable amounts of knowledge conflict.
Jul 1, 2024 | Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training has been accepted to ECCV 2024!
Jun 3, 2024 | New preprint! LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models tackles implicit and explicit calibration in LLMs by using insights from pragmatics!
May 28, 2024 | New project on videos+LLMs! VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos uses a tree-based structure to help LLMs reason over long videos efficiently and effectively. Joint work with Ziyang Wang and Shoubin Yu.
May 15, 2024 | Soft Self-Consistency Improves Language Model Agents has been accepted to ACL 2024!
May 4, 2024 | Three papers accepted to ICML 2024! ReGAL: Refactoring Programs to Discover Generalizable Abstractions, which uses refactoring to discover program abstractions for LLM-based code generation, MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models, which introduces a structured distillation method for learning from discussions between multiple LLMs, and Language-guided Skill Learning with Temporal Variational Inference, which learns reusable skills from trajectories of demonstrations.
Mar 22, 2024 | Excited to be giving a keynote at the UncertaiNLP workshop at EACL 2024, titled Confidence-based Rephrasing, Refinement, and Selection. I’ll cover a wide range of topics including calibration in semantic parsing, using calibrated models to improve usability, underspecified visual question answering and much more!
Mar 5, 2024 | New work with David Wan and Jaemin Cho on improving visual tasks (especially grounding) through region-based guidance in Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training.
Feb 3, 2024 | New work led by Justin Chen and Swarnadeep Saha on distilling multi-agent LLM interactions into smaller models: MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models. MAGDi uses a graph structure on top of LLM dialogues to distill reasoning from several large teacher models into a single, lightweight student.
Jan 30, 2024 | New preprint! ReGAL: Refactoring Programs to Discover Generalizable Abstractions introduces a new refactoring-based method for learning abstractions for LLM program prediction, improving performance on a variety of tasks. Joint work with Archiki Prasad as part of my postdoc at UNC.
Jan 17, 2024 | Two papers accepted to ICLR 2024. Zero and Few-shot Semantic Parsing with Ambiguous Inputs introduces a new benchmark for semantic parsing with ambiguity and tests a variety of models on how they handle five common linguistic ambiguities. Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models is the first paper from my new postdoc position and introduces RepARe, a method for augmenting and rephrasing VQA questions (especially underspecified ones) to make them easier for zero-shot VL models to answer.
Jan 16, 2024 | My thesis is now publicly available: Modeling Meaning for Description and Interaction. Many thanks to my advisor Benjamin Van Durme for all of your guidance over the last five years and to my thesis committee Jacob Andreas and Kyle Rawlins for your feedback!
Jun 3, 2023 | I’m incredibly excited to announce that I will be starting a Postdoc with Mohit Bansal at the University of North Carolina at Chapel Hill! Looking forward to lots of collaborations with the amazing students and faculty of UNC NLP and UNC CS!
Jun 1, 2023 | Calibrated Interpretation: Confidence Estimation in Semantic Parsing has just been accepted to TACL! We examine the calibration of common semantic parsing models, including LLMs using in-context learning. Check out the paper for results across a number of tasks and datasets!
May 3, 2023 | Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA has been accepted to ACL 2023! We introduce a brand-new dataset of ambiguous questions in VQA, along with a disambiguation model and plenty of linguistic analysis. See you in Toronto!
Mar 31, 2023 | I’ve restructured a previous preprint into two papers. The first focuses on cataloguing calibration in popular semantic parsing systems, and the second looks at what we can do with a well-calibrated model.
Feb 28, 2023 | Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning, an exciting new benchmark for generalization in vision tasks led by Zhuowan Li, has been accepted to CVPR 2023 as a highlight (~2% of submissions)!
Nov 30, 2022 | I am on the job market for faculty, postdoc, and industry positions! Please reach out if you know of a role that would be a good fit for me: elias.stengel@gmail.com
Nov 29, 2022 | Two new preprints out! On ambiguity in VQA and on calibration in semantic parsing.
Oct 7, 2022 | Two new papers accepted to EMNLP 2022. Preprints out on arXiv! On subject and object control in LLMs and on a troubling quirk in NLU.
Mar 6, 2022 | I am starting a year-long internship at MSR Montreal with Marc-Alexandre Côté, Eric Yuan, and Pierre-Yves Oudeyer.
Aug 31, 2021 | I have completed an internship at Microsoft Semantic Machines, supervised by Yu Su.