Seattle · UW CS + Physics · AI security + research systems

I like building ambitious things with real technical depth and startup speed.

I’m Divij Chawla. I work on AI security, model evaluation, research infrastructure, and technical systems that are actually hard to build. Right now I’m especially interested in the kinds of ideas that could become important labs, tools, or companies.

Current focus

AI security benchmarks · Automated red teaming · High-stakes AI evaluation · Research infrastructure that actually ships

Positioning

I’m probably most useful where research taste and founder speed both matter.

I like hard technical work, but I also care a lot about pace. I don’t want to just study systems from a distance. I want to build things, run them, break them, and learn fast.

The YC version of that is simple: I’m drawn to technical problems with real surface area, where the right person can move from idea to artifact to insight very quickly.

Selected Work

A few things I’ve built that show the kind of work I want to keep doing.

AI safety · Benchmarks

Shutdown-Bench

I’m currently building a benchmark for LLM and agent shutdownability in realistic tool-use settings, with scenario design, failure-mode taxonomy, automated red-team harnesses, and trace scoring for resistance patterns.

LLM evaluation · Industry research

FINRISKEVAL at Walled AI

Built FINRISKEVAL at Walled AI: 1,720 profiles, 8 models, 13k+ outputs, and a full eval pipeline for correctness, consistency, and intent alignment in financial decisions.

Open-source infra · Training systems

OLMo-core contribution

Added data-mixture monitoring and source-level token statistics to AllenAI’s pretraining stack, improving visibility into how training data composition behaves at runtime.

Security · Operational systems

Honeypot and telemetry systems at Emsec

Built and operated a honeypot fleet simulating 7k+ vulnerable applications, investigated attacker behavior across 10+ honeypots, and turned raw telemetry into evidence and detection signals.

Experience

Most of my edge comes from doing real work early.

Now

SPAR Research Fellow, Shutdown-Bench

Building a benchmark for shutdownability in tool-using agents: scenario suite, instruction hierarchy, automated harnesses, and trace-based scoring for delay, deflection, evasion, and goal-preservation behavior.

2025

AI Security Researcher, Walled AI

First-authored an EMNLP Industry paper and worked on high-stakes LLM evals, red-teaming playbooks, domain safety assessments, and guardrail strategy for production deployment.

2024–2025

AI Research Intern, University of Bristol

Built eval harnesses for latent alignment failures via representation backdoor attacks, then measured reliability and mitigation limits under white-box and black-box conditions.

2023–2024

Software Development Intern, Emsec

Operated a large honeypot fleet, investigated live attacker activity, and built telemetry pipelines that made defensive automation more grounded in actual adversarial behavior.

2025

DSP Intern, Emsec

Optimized C++ DSP modules for SDR pipelines and worked on real-time ingestion and low-latency processing paths where performance was not optional.

Builder signal

Lavin + student leadership

I’ve also spent time in venture-building and technical communities at UW, which is probably part of why I think about research in terms of leverage, momentum, and what could become a company.

Proof

The short version of why I think I’m worth betting on.

Taste

I pick problems with real depth

I’m drawn to hard technical areas where there’s still room to move fast: AI security, evals, autonomy, research tooling, and systems that interact with the real world.

Execution

I’ve already done work with actual consequences

My experience is not just class-project-clean. It’s eval pipelines, adversarial settings, attacker telemetry, training infrastructure, and systems where correctness matters.

Trajectory

I’m still early, but the slope is the interesting part

I’m early enough to still be compounding quickly, but I already have enough signal that the main question feels less like “can he do it?” and more like “what should he go all in on?”

How I Think

I care most about problems that are both technically real and directionally big.

I’m most useful when a problem is hard to fake, still open enough for good judgment to matter, and important enough that solving it could unlock a lab, a platform, or a company.

I like research that turns into infrastructure, infrastructure that turns into leverage, and products that are honest about what they can actually do.

If you’re building something ambitious in frontier AI, security, developer tools, autonomy, robotics, or research software, I’d probably love to talk.

Contact

Open to research, internships, collaborations, and the right company-building conversations.

Email is the best way to reach me. If you want the cleanest overview, start with the resume. If you want to understand how I think through technical work, GitHub and Scholar are the right tabs.