Elias Stengel-Eskin

I am a Postdoctoral Research Associate at the University of North Carolina, Chapel Hill in the MURGe-Lab led by Mohit Bansal. I received my Ph.D. in 2023 from Johns Hopkins University, where I was supervised by Benjamin Van Durme and supported by an NSF GRFP.

I aim to develop AI agents that can intelligently communicate and collaborate with people and each other. My work addresses three key problems:

  1. A central focus of my work is multi-agent communication and collaboration, which has led to work on multi-LLM multi-round discussions/debates, distilling multi-agent behavior, pragmatic/verbalized uncertainty, and persuasion.
  2. Agents must be grounded to the world through their inputs and actions: another line of my work covers multimodal grounding and converting language to action through semantic parsing, text-to-code, and learning abstractions and skills.
  3. Developing safe and robust agents means handling uncertainty, ambiguity, and underspecification. As we scale up tasks, underspecification and ambiguity will become increasingly relevant, especially when predicting actions/grounding to the world. My work covers calibration and uncertainty, especially in connection with implicit phenomena such as vagueness, underspecification, and ambiguity. While I’ve mostly explored these topics through a linguistic lens, I am interested in their importance to intelligence more broadly.

Concretely, some of the areas I’ve been publishing on recently are:

Before starting my Ph.D., I received my B.A.&Sc. with First Class Honours in Cognitive Science from McGill University, focusing on computer science and linguistics. While at McGill, I worked as a research assistant at the Montreal Language Modeling Lab (MLML), now MCQLL, supervised by Morgan Sonderegger. I wrote my honours thesis (supervised by Timothy O’Donnell) on a variational inference algorithm for a model of language acquisition.

news

Nov 6, 2024 Our philosophy collaboration on challenges in model editing was accepted to TMLR! Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Oct 21, 2024 New paper out! Teaching Models to Balance Resisting and Accepting Persuasion, where we use multi-agent recursive dialogue trees to teach models to accept and resist persuasion when appropriate. Our method reduces susceptibility to misinformation and flip-flopping while also improving LLMs’ ability to act together in a team through multi-agent dialogue!
Oct 11, 2024 New preprint! DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback led by Zaid Khan with Jaemin Cho and Mohit Bansal on a novel testbed for creating data generation agents. These agents produce synthetic data for teaching student models based on their errors and weaknesses!
Oct 3, 2024 New preprint! LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits led by Duy Nguyen and Archiki Prasad with Mohit Bansal on using bandit methods to pick the best-suited RM to optimize at an instance level, improving LLMs on reasoning, instruction-following, and long-context understanding.
Oct 2, 2024 Two papers accepted to NeurIPS 2024! LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models uses pragmatics to calibrate LLMs, and GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations introduces a new game-theoretic benchmark.
Sep 19, 2024 New preprint! MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning led by Justin Chih-Yao Chen with Swarnadeep Saha, Archiki Prasad, and Mohit Bansal introduces a novel method for refinement that improves math reasoning by selectively refining only hard instances and by treating it as an iterative, multi-agent problem.
Sep 14, 2024 New preprint! AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge led by Han Wang with Archiki Prasad and Mohit Bansal introduces a dynamic decoding strategy to deal with variable amounts of knowledge conflict.
Jul 1, 2024 Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training has been accepted to ECCV 2024!
Jun 3, 2024 New preprint! LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models tackles implicit and explicit calibration in LLMs by using insights from pragmatics!
May 28, 2024 New project on videos+LLMs! VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos uses a tree-based structure to help LLMs reason over long videos efficiently and effectively. Joint work with Ziyang Wang and Shoubin Yu.
May 15, 2024 Soft Self-Consistency Improves Language Model Agents has been accepted to ACL 2024!
May 4, 2024 Three papers accepted to ICML 2024! ReGAL: Refactoring Programs to Discover Generalizable Abstractions, which uses refactoring to discover program abstractions for LLM-based code generation, MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models, which introduces a structured distillation method for learning from discussions between multiple LLMs, and Language-guided Skill Learning with Temporal Variational Inference, which learns reusable skills from trajectories of demonstrations.
Mar 22, 2024 Excited to be giving a keynote at the UncertaiNLP workshop at EACL 2024, titled Confidence-based Rephrasing, Refinement, and Selection. I’ll cover a wide range of topics including calibration in semantic parsing, using calibrated models to improve usability, underspecified visual question answering and much more!
Mar 5, 2024 New work with David Wan and Jaemin Cho on improving visual tasks (especially grounding) through region-based guidance in Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
Feb 3, 2024 New work led by Justin Chen and Swarnadeep Saha on distilling multi-agent LLM interactions into smaller models: MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models. MAGDi uses a graph structure on top of LLM dialogues to distill reasoning from several large teacher models into a single, lightweight student.
Jan 30, 2024 New preprint! ReGAL: Refactoring Programs to Discover Generalizable Abstractions introduces a new refactoring-based method for learning abstractions for LLM program prediction, improving performance on a variety of tasks. Joint work with Archiki Prasad as part of my postdoc at UNC.
Jan 17, 2024 Two papers accepted to ICLR 2024. Zero and Few-shot Semantic Parsing with Ambiguous Inputs introduces a new benchmark for semantic parsing with ambiguity and tests a variety of models on how they handle five common linguistic ambiguities. Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models is the first paper from my new postdoc position and introduces RepARe, a method for augmenting and rephrasing VQA questions (especially underspecified ones) to make them easier for zero-shot VL models to answer.
Jan 16, 2024 My thesis is now publicly available: Modeling Meaning for Description and Interaction. Many thanks to my advisor Benjamin Van Durme for all of your guidance over the last five years and to my thesis committee Jacob Andreas and Kyle Rawlins for your feedback!
Jun 3, 2023 I’m incredibly excited to announce that I will be starting a Postdoc with Mohit Bansal at the University of North Carolina, Chapel Hill! Looking forward to lots of collaborations with the amazing students and faculty of UNC NLP and UNC CS!
Jun 1, 2023 Calibrated Interpretation: Confidence Estimation in Semantic Parsing has just been accepted to TACL! We examine the calibration of common semantic parsing models, including LLMs using in-context learning. Check out the paper for results across a number of tasks and datasets!
May 3, 2023 Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA has been accepted to ACL 2023! We introduce a brand new dataset of ambiguous questions in VQA, with a disambiguation model and plenty of linguistic analysis. See you in Toronto!
Mar 31, 2023 I’ve restructured a previous pre-print into two different papers. The first focuses on cataloguing calibration in popular semantic parsing systems, and the second looks at what we can do with a well-calibrated model.
Feb 28, 2023 Super-CLEVR, an exciting new benchmark for generalization in vision tasks led by Zhuowan Li, has been accepted to CVPR 2023 as a highlight (~2% of submissions)! Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Nov 30, 2022 I am on the job market for faculty, postdoc, and industry positions! Please reach out if you know of a role that would be a good fit for me: elias.stengel@gmail.com
Nov 29, 2022 Two new preprints out! On ambiguity in VQA and on calibration in semantic parsing
Oct 7, 2022 Two new papers accepted to EMNLP 2022. Preprints out on arxiv! On subject and object control in LLMs and on a troubling quirk in NLU
Mar 6, 2022 I am starting a year-long internship at MSR Montreal with Marc-Alexandre Côté, Eric Yuan, and Pierre-Yves Oudeyer
Aug 31, 2021 I have completed an internship at Microsoft Semantic Machines, supervised by Yu Su