North East AI Agents Day
🗓️ May 8th, 2026
📍 Jane Street Offices, New York
This workshop offers a comprehensive overview of AI agents and brings the ML, Systems, and HCI research communities together to share progress, discuss common problems and evaluation setups, and identify opportunities for collaboration. We welcome attendees from diverse disciplines and will devote time to open research questions.
Call for Papers
We welcome contributions on a broad range of topics covering the design, development, systems support, and applications of AI agents.
Interfaces and Embodiment
How should agents perceive and act within their environments (e.g., through tool use, exploration, or interactions)?
Training Recipes and Data for Agentic Behavior
Enhancing agent capabilities through more effective training algorithms, better data mixtures, and evaluation-driven practices.
Safety and Societal Implications
Guardrails necessary to ensure AI agents operate safely in real-world settings, plus ethical and economic considerations.
Human-Agent Interaction, Alignment, and Personalization
How to personalize AI agents to adapt their behavior to dynamic user preferences across diverse contexts.
Applications
Computer use, data science, applications for science (e.g., scientific discovery and deep research), education, healthcare, and software development.
Benchmarking and Evaluation
Development of robust environments and evaluation metrics that capture the diversity of real-world settings.
Scalable Architectures for Training and Running Agents
Infrastructure to train, evaluate, and deploy agents reliably at scale under strict latency, cost, data, and quality requirements.
Context, State, and Orchestration of Agentic Systems
Mechanisms for managing agent state representation and coordination through shared data, memory, or environments.
Governance, Observability, Debugging, and Reproducibility
Tools for tracing, provenance, replay, and root-cause analysis that support agent inspection and reproducibility.
Sponsors
Contact ewu@cs.columbia.edu if you are interested in supporting the event.
Program
Vahab Mirrokni
Kostis Kaffes, Ryan Marcus, Nathan Malkin, Zhuang Liu, and Hale Sirin.
Mentors: Tianyu Liu, Jialin Ding, Francisco Romero, Adel Abu Sitta, and Kexin Rong.
Session 1A: Systems / MLSys
- Staggered Training: Efficient RL for Agents
- Impact of Scheduling for Terminal Agent Workloads on Unified-Memory Workstations
- A Simple and Fast Way to Handle Semantic Errors in Transactions
- Cause and (Energy) Effect: Closing the Causal Loop in Smart Buildings
- Interactive, Interpretable, and Optimized Execution of Deep Research Queries
- Towards Understanding Challenges Surrounding Agentic System Design
Session 1B: Machine Learning + Multi-Agents
- Subtask Decomposition Improves Cross-Task Skill Transfer in LLM Agents
- RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering
- Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
- OpenApps: computer-use agent reliability
- Orla: A Library for Serving LLM-Based Multi-Agent Systems
- DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning
- Towards Efficient Communication In Multi-Agent Deliberation
Session 2A: Data and Governance
- Data Flow Control: Ensuring Safety for Data-Centric AI Agents
- Syscon: Kernel-Level Syscall Monitoring for AI Agent Control in Sandboxed Environments
- Measuring Legal Compliance in LLMs: A Causal Benchmark for Privacy
- Utility-Aware Human–LLM Agent Orchestration for Data Science Pipelines
- Agentic Exploration of Large-Scale Graphs with Arkouda and Arachne
Session 2B: HCI
- VibeJam: An Open Platform for User Studies on Agentic Vibe Coding
- Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
- Beyond Benchmarks: What Might User-Centered Evaluation of Agentic AI Look Like?
- Vibe Debugging with autopsy_report
- "Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews
- A Locally Agentic AI for 3D Neurosurgical Visualization and Segmentation
- AgentFlow: Iterative Prompt Refinement for AI-Human Product Management Automation
- Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
- Agents as Auditors: Detecting Malicious UI Flows via Trajectory-Level Analysis
- Agents on the Edge: Designing Agentic Systems for Real-Time Analysis Across Edge and Cloud
- An Agentic Framework for Therapeutic Reasoning in Oncology
- AutoMetas: Automated Metadata Design for Natural Language Retrieval
- Bayes Data Agents: A Decentralized Approach to Balanced Synthetic Data
- Bayesian Deep Reinforcement Learning for Multi-Agent Crowds
- Bridging Semantic and Structural Manifolds: A Zero-Shot Vision-Language Agent for Earth Observation and Anomaly Detection
- CEOBench: Evaluating Language Model Agents as Long-Horizon Strategic Decision-Makers
- Characterizing the Solver-Like Behavior in LLM-as-Formalizer
- Corpus-Controlled Agent Specialization: Hyper-Selective Document Ingestion as a Zero-Training Path to Domain Expertise
- CullinAIre: A Multi-Agent Framework for Nutrition-Grounded Intelligent Cooking
- Demonstration of Doc*: Access Control for Information-Theoretically Secure Data
- Designing for Doubt: The Case for Informed Refusal in Autonomous Agents
- Discovera: A Workflow-Aligned AI Agent for Signature-to-Mechanisms Analysis
- Do Human Organizational Principles Transfer to AI Agents? An Empirical Test Using Organizational Design Metrics
- Document-Native Orchestration for Long-Horizon LLM Agents
- DoubleAgents: Iterative Human–Agent Alignment through Distributed Cognition
- Dynamic Agentic Workflow for Multi-Objective Alloy Discovery
- E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
- Escaping the AI Agent Trap: On a Defense via Differentiable Modal Logic
- Explainable Agentic Runtime Control for Cache-Coherence Orchestration in Heterogeneous SoCs
- Facet: Decomposing Prompts into Reusable Primitives for Transfer
- FactorLens: Dissecting AI Trading Agents with Progressive Factor Stripping
- FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
- Formalization of Partially Observable Environments
- From Individual Expertise to Distributed Cognition: Towards Community-Centered Agentic AI for Environmental Sensemaking
- Hygieia: A Multi-Modal Agent for Rare Disease Diagnosis and Gene Detection
- Kourai Khryseai: Transparent Human-on-the-Loop Multi-Agent Software Development
- KramaBench: Benchmarking Agentic Data Science
- LakeAgents: An LLM-Based Multi-Agent Framework for Tabular Dataset Discovery
- Learning Dynamic Coalition Formation for AI Agents in Multi-Agent Reinforcement Learning
- MetaSymbO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Optimization
- Morpheus: A Graph-Native Reasoning Substrate for LLM Agents
- OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning
- Ping: A Social Network Mediated by Always-on Agents
- Privileged Information Distillation for Language Models
- Prove2Me: An Open Agentic Platform for Scaling Math Formalization
- PuzRian: Puzzle Environments for Reasoning in Visual-Language Agents
- Rest Easy: Constraining Untrusted Agent Tool Calls with Haven
- RuleSmith: Multi-Agent LLMs for Automated Game Balancing
- SAGA: Accelerating Scientific Discovery with Autonomous Goal-Evolving Agents
- SemaTune: Semantic-Aware Online OS Tuning with LLM Agents
- Sherlock: Efficient and Reliable Execution of Agentic Workflows
- SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
- Stop Guessing Joins: Deterministic Schema Linking for Text-to-SQL Agents
- We Let Our Agent Hack the Kernel and It Only Panicked Twice!
- Weights to Workflows: Large-Scale Agentic Synthesis of Verified Inference Pipelines
- What Does It Take to Detect an AI Agent? Minimal Feature Sets for Behavioral Detection under Browser Automation
- Who Holds the Reins in Agent-Assisted Systems Biology?
- Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
Photos
Selected photos from North East AI Agents Day 2026.