North East AI Agents Day

🗓️ May 8th, 2026
📍 Jane Street Offices, New York

This workshop offers a comprehensive overview of AI agents and brings the ML, Systems, and HCI research communities together to share progress, discuss common problems and evaluation setups, and identify opportunities for collaboration. We welcome attendees from diverse disciplines to discuss open research questions.

Call for Papers

We welcome contributions on a broad range of topics in the design, development, systems support, and applications of AI agents.

Interfaces and Embodiment

How should agents perceive and act within their environments (e.g., through tool use, exploration, or interactions)?

Training Recipes and Data for Agentic Behavior

Enhancing agent capabilities through more effective training algorithms, better data mixtures, and evaluation-driven practices.

Safety and Societal Implications

Guardrails necessary to ensure AI agents operate safely in real-world settings, plus ethical and economic considerations.

Human-Agent Interaction, Alignment, and Personalization

How to personalize AI agents to adapt their behavior to dynamic user preferences across diverse contexts.

Applications

Computer use, data science, applications for science (e.g., scientific discovery and deep research), education, healthcare, and software development.

Benchmarking and Evaluation

Development of robust environments and evaluation metrics that capture the diversity of real-world settings.

Scalable Architectures for Training and Running Agents

Infrastructure to train, evaluate, and deploy agents reliably at scale under strict latency, cost, data, and quality requirements.

Context, State, and Orchestration of Agentic Systems

Mechanisms for managing agent state representation and coordination through shared data, memory, or environments.

Governance, Observability, Debugging, and Reproducibility

Building tools for tracing, provenance, replay, and root-cause analysis supporting agent inspection and reproducibility.

Sponsors

Contact ewu@cs.columbia.edu if interested in supporting the event.

Program

9:00–10:00    Arrivals
10:00–10:40
10:40–11:10   Lightning Talks 1A: Systems / MLSys
11:10–11:25   Break
11:25–12:25   Lightning Talks 1B: Machine Learning + Multi-Agents
12:25–1:25    Lunch
1:25–2:00     Lightning Talks 2A: Data and Governance
2:00–2:55     Faculty Panel (55 min)
2:55–3:20     Lightning Talks 2B: HCI
3:20–4:00     Mentoring Session + Coffee
4:00–6:00     Posters

Lightning Talks (22)

Session 1A: Systems / MLSys

  1. Staggered Training: Efficient RL for Agents
     Morgan Borjigin-Wang, Deepti Raghavan, Malte Schwarzkopf
  2. Impact of Scheduling for Terminal Agent Workloads on Unified-Memory Workstations
     Yuanli Wang, Vasiliki Kalavri
  3. A Simple and Fast Way to Handle Semantic Errors in Transactions
     Jinghan Zeng, Eugene Wu, Sanjay Krishnan, Wyatt Lloyd, Jialin Ding
  4. Cause and (Energy) Effect: Closing the Causal Loop in Smart Buildings
     Taqiya Ehsan, Shuren Xia, Jorge Ortiz
  5. Interactive, Interpretable, and Optimized Execution of Deep Research Queries
     Matthew Russo, Yash Agarwal, Tianyu Li, Zhuohan Gu, Mike Cafarella, Omar Khattab, Tim Kraska, Samuel Madden
  6. Towards Understanding Challenges Surrounding Agentic System Design
     Michael Shen, Xiaoyue Zhou, Ishan Patwardhan, Yueying Li, Udit Gupta

Session 1B: Machine Learning + Multi-Agents

  1. Subtask Decomposition Improves Cross-Task Skill Transfer in LLM Agents
     Yiyang Feng, Biddut Sarker Bijoy, Niranjan Balasubramanian, Jiawei Zhou
  2. RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering
     Deniz Qian, Hung-Ting Chen, Eunsol Choi
  3. Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
     Gabriel Grand, Valerio Pepe, Joshua B. Tenenbaum, Jacob Andreas
  4. OpenApps: computer-use agent reliability
     Karen Ullrich, Jingtong Su, Claudia Shi, Arjun Subramonian, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, Julia Kempe, Mark Ibrahim
  5. Orla: A Library for Serving LLM-Based Multi-Agent Systems
     Rana Shahout, Hayder Tirmazi, Minlan Yu, Michael Mitzenmacher
  6. DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning
     Zhuoyang Zou, Abolfazl Ansari, Delvin Ce Zhang, Dongwon Lee, Wenpeng Yin
  7. Towards Efficient Communication In Multi-Agent Deliberation
     Weifan Jiang, Rana Shahout, Zhenting Qi, Yilun Du, Michael Mitzenmacher, Minlan Yu

Session 2A: Data and Governance

  1. Data Flow Control: Ensuring Safety for Data-Centric AI Agents
     Charlie Summers
  2. Syscon: Kernel-Level Syscall Monitoring for AI Agent Control in Sandboxed Environments
     Jade Sheffey, Joyce Levine, Joyce Werhane, Amir Houmansadr
  3. Measuring Legal Compliance in LLMs: A Causal Benchmark for Privacy
     Michael Morgenthal, Sungjun Lee, Trinav Bhattacharyya, Pooyan Jamshidi, Baishakhi Ray, Sebastian Zimmeck
  4. Utility-Aware Human–LLM Agent Orchestration for Data Science Pipelines
     Sohrab Namazi Nia, Senjuti Basu Roy
  5. Agentic Exploration of Large-Scale Graphs with Arkouda and Arachne
     David A. Bader

Session 2B: HCI

  1. VibeJam: An Open Platform for User Studies on Agentic Vibe Coding
     Nishant Balepur, Connor Baumler, Valerie Chen, Eunsol Choi, Rachel Rudinger, Jordan Lee Boyd-Graber
  2. Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
     Mingqian Zheng, Malia Morgan, Liwei Jiang, Carolyn Rose, Maarten Sap
  3. Beyond Benchmarks: What Might User-Centered Evaluation of Agentic AI Look Like?
     Tao Long, Bingcan Guo
  4. Vibe Debugging with autopsy_report
     Jeffrey Tao

Poster Presentations (55)

  • "Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews
    Ruyuan Wan, Changye Li, Ting-Hao Kenneth Huang
  • A Locally Agentic AI for 3D Neurosurgical Visualization and Segmentation
    Mohammad Peivandi, John Acosta, Lijing Wang, Zhifeng Kou
  • AgentFlow: Iterative Prompt Refinement for AI-Human Product Management Automation
    Adrian Dsouza
  • Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
    Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen
  • Agents as Auditors: Detecting Malicious UI Flows via Trajectory-Level Analysis
    Aryan Kaul, Zhuo Zhang, Yuhang Chen, Tianlong Chen
  • Agents on the Edge: Designing Agentic Systems for Real-Time Analysis Across Edge and Cloud
    Kausar Patherya, Francisco Alejandro Romero
  • An Agentic Framework for Therapeutic Reasoning in Oncology
    Sujoy Banik, Koushik Howlader, Zainab Ghafoor, Ushashi Bhattacharjee, Tanusree Bhattacharjee, Sayantan Chakraborty, Adrito Roy, Tirtho Roy
  • AutoMetas: Automated Metadata Design for Natural Language Retrieval
    Zhuohan Gu, Tianyu Li, Matthew Russo, Mike Cafarella, Tim Kraska, Omar Khattab, Samuel Madden
  • Bayes Data Agents: A Decentralized Approach to Balanced Synthetic Data
    Victor Perotti
  • Bayesian Deep Reinforcement Learning for Multi-Agent Crowds
    Bilas Talukdar, Tomer Weiss
  • Bridging Semantic and Structural Manifolds: A Zero-Shot Vision-Language Agent for Earth Observation and Anomaly Detection
    SHIH-CHIH LIN, Jia-Xian Jian, YunTung Chu, Wei-Chieh Sun, Fang-Yi Lin
  • CEOBench: Evaluating Language Model Agents as Long-Horizon Strategic Decision-Makers
    Haozhe Chen, Zhuang Liu
  • Characterizing the Solver-Like Behavior in LLM-as-Formalizer
    Ceyhun Efe Kayan, Li Zhang
  • Corpus-Controlled Agent Specialization: Hyper-Selective Document Ingestion as a Zero-Training Path to Domain Expertise
    Abdul Azeez Shaik, Thomas Gammer, Anthony Caruso, Sean Roche, Joshua Lanjwal
  • CullinAIre: A Multi-Agent Framework for Nutrition-Grounded Intelligent Cooking
    Meghana Bangari, Keith Anderson, Aritra Dasgupta
  • Demonstration of Doc*: Access Control for Information-Theoretically Secure Data
    Komal Kumari, Yin Li, Sharad Mehrotra, Shantanu Sharma, Venkata Krishna Sastry Sreeramakavacham
  • Designing for Doubt: The Case for Informed Refusal in Autonomous Agents
    Victor Ojewale, Suresh Venkatasubramanian
  • Discovera: A Workflow-Aligned AI Agent for Signature-to-Mechanisms Analysis
    Daniela Pinto Veizaga, Aécio Santos, Eden Wu, Sarah Keegan, Wenke Liu, David Fenyo, Juliana Freire
  • Do Human Organizational Principles Transfer to AI Agents? An Empirical Test Using Organizational Design Metrics
    Jessica Ezemba, Christopher McComb, Conrad Tucker
  • Document-Native Orchestration for Long-Horizon LLM Agents
    Levi Lian, Yining Hua
  • DoubleAgents: Iterative Human–Agent Alignment through Distributed Cognition
    Tao Long, Xuanming Zhang, Sitong Wang, Zhou Yu, Lydia Chilton
  • Dynamic Agentic Workflow for Multi-Objective Alloy Discovery
    Satanu Ghosh, Collin Holgate, Neal R Brodnik, Doug Downey, Samantha Daly, Tresa Pollock, Samuel Carton
  • E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
    Puneet Singh Bagga, Vivek Farias, Tamar Korkotashvili, Tianyi Peng, Yuhang Wu
  • Escaping the AI Agent Trap: On a Defense via Differentiable Modal Logic
    Antonin Sulc
  • Explainable Agentic Runtime Control for Cache-Coherence Orchestration in Heterogeneous SoCs
    Je Yang, Luca Carloni
  • Facet: Decomposing Prompts into Reusable Primitives for Transfer
    Jixian Su, Yusen Zhang, Eugene Wu, Kexin Rong
  • FactorLens: Dissecting AI Trading Agents with Progressive Factor Stripping
    Hangyu Zhou, Jung Hun Phee
  • FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
    Quang Hieu Pham, Yang He, Ping Nie, Canwen Xu, Davood Rafiei, Yuepeng Wang, Xi Ye, Jocelyn Qiaochu Chen
  • Formalization of Partially Observable Environments
    Cassie Huang, Li Zhang
  • From Individual Expertise to Distributed Cognition: Towards Community-Centered Agentic AI for Environmental Sensemaking
    Avina Nakarmi, Subhodeep Ghosh, Aritra Dasgupta
  • Hygieia: A Multi-Modal Agent for Rare Disease Diagnosis and Gene Detection
    Tianyu Liu, Wangjie Zheng, Rui Yang, Botao Yu, Weihao Xuan, Kexin Huang, Nan Liu, James Zou, Hua Xu, Hongyu Zhao
  • Kourai Khryseai: Transparent Human-on-the-Loop Multi-Agent Software Development
    Arnaldo Barea
  • KramaBench: Benchmarking Agentic Data Science
    Gerardo Vitagliano, Eugenie Lai, Ziyu Zhang, Mike Cafarella, Tim Kraska, Samuel Madden
  • LakeAgents: An LLM-Based Multi-Agent Framework for Tabular Dataset Discovery
    Yao Tang, Guillaume Lachaud, Fatemeh Nargesian
  • Learning Dynamic Coalition Formation for AI Agents in Multi-Agent Reinforcement Learning
    Adel Abusitta
  • MetaSymbO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Optimization
    Jianpeng Chen, Wangzhi Zhan, Dawei Zhou
  • Morpheus: A Graph-Native Reasoning Substrate for LLM Agents
    Hao Shi, Iori Oikawa
  • OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning
    Taiting Lu, Kaiyuan Lin, Runze Liu, Zhenghao Li, Mahanth Gowda
  • Ping: A Social Network Mediated by Always-on Agents
    Naveen Venkat, Shuze Chen
  • Privileged Information Distillation for Language Models
    Emiliano Penaloza, Dheeraj Vattikonda, Nicolas Gontier, Alexandre Lacoste, Laurent Charlin, Massimo Caccia
  • Prove2Me: An Open Agentic Platform for Scaling Math Formalization
    Shuze Chen, Tianyi Peng
  • PuzRian: Puzzle Environments for Reasoning in Visual-Language Agents
    Jiajun Hong, Jiawei Zhou
  • Rest Easy: Constraining Untrusted Agent Tool Calls with Haven
    Justus Adam, Yuchen Lu, Deepti Raghavan, Malte Schwarzkopf, Nikos Vasilakis
  • RuleSmith: Multi-Agent LLMs for Automated Game Balancing
    Ziyao Zeng, Chen Liu, Tianyu Liu, Hao Wang, Xiatao Sun, Fengyu Yang, Xiaofeng Liu, Zhiwen Fan
  • SAGA: Accelerating Scientific Discovery with Autonomous Goal-Evolving Agents
    Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G. Rittig, Kunyang Sun, Yikun Zhang, Cassandra Masschelein, Yingze Wang, Haorui Wang, Haojun Jia, Chao Zhang, Hongyu Zhao, Martin Ester, Teresa Head-Gordon, Carla P Gomes, Huan Sun, Chenru Duan, Philippe Schwaller, Wengong Jin
  • SemaTune: Semantic-Aware Online OS Tuning with LLM Agents
    Georgios Liargkovas, Mihir Nitin Joshi, Hubertus Franke, Kostis Kaffes
  • Sherlock: Efficient and Reliable Execution of Agentic Workflows
    Yeonju Ro, Haoran Qiu, Íñigo Goiri, Rodrigo Fonseca, Aditya Akella, Zhangyang Wang, Mattan Erez, Esha Choukse
  • SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
    Yashwanth YS, Ruichen Wang, Shihua Zeng, Xuhui Zhou, Koichi Onoue, Vasudha Varadarajan, Maarten Sap
  • Stop Guessing Joins: Deterministic Schema Linking for Text-to-SQL Agents
    Mayur Kulkarni, Prajwal Raghunath, Charlie Summers
  • We Let Our Agent Hack the Kernel and It Only Panicked Twice!
    Tal Zussman, Jeremy Carin
  • Weights to Workflows: Large-Scale Agentic Synthesis of Verified Inference Pipelines
    Yunqi Li, Yongjoo Park
  • What Does It Take to Detect an AI Agent? Minimal Feature Sets for Behavioral Detection under Browser Automation
    Vishisht Choudhary, Lukas Schmidt, Anne Zoe Kenntner, Feras Skhab, Michel Osswald, Jens Ernstberger
  • Who Holds the Reins in Agent-Assisted Systems Biology?
    Noga Aharony
  • Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
    Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

Organizers

Steering Committee

Deepti Raghavan, Brown University
Dom Moritz, Carnegie Mellon University
Eugene Wu, Columbia University
Kexin Rong, Georgia Institute of Technology
Kostis Kaffes, Columbia University
Ofir Press, Princeton University
Ravi Netravali, Princeton University
Ryan Marcus, University of Pennsylvania
Shuyan Zhou, Duke University
Tianyi Peng, Columbia Business School
Zhou Yu, Columbia University
Zhuang Liu, Princeton University

Student Committee

Haonan Wang, Columbia University
Jeffrey Cho, University of Pennsylvania
Siyan Sylvia Li, Columbia University
Tao Long, Columbia University
Yanzhe Zhang, Georgia Institute of Technology
Yusen Zhang, Columbia University
Billy Zhang, Columbia University