I am a Computer Science and Artificial Intelligence PhD student at UC Berkeley, advised by
Prof. Matei Zaharia and
Prof. Dan Klein, affiliated with
Sky Lab,
BAIR Lab, and
Berkeley NLP Group. My research interests span Artificial Intelligence, Software Engineering, and Programming Languages.
Prior to joining UC Berkeley, I was an
AI4Code Research Fellow
at
Microsoft Research, where I worked with
Dr. Aditya Kanade,
Dr. Navin Goyal,
Dr. Shuvendu Lahiri, and
Dr. Sriram Rajamani, focusing on improving the code generation capabilities of Large Language Models (LLMs) and exploring how generative AI can automate software engineering tasks.
My research focuses on improving the quality and correctness of code generated by Large Language Models (LLMs), aiming to make them more reliable for software engineering and reasoning tasks. Most recently, I have been focusing on repository-level reasoning for code generation with LLMs. I have also explored long-context usage, tool usage, better tokenization, prompting for code, and decoding techniques with LLMs. Previously, I worked in Programming Languages and Systems, developing language runtimes, IDE/debugger support for languages, and source-to-source transpilers.
dspy.GRPO
- Along with Noah Ziems, Dilara Soylu, and Omar Khattab, I led the development of dspy.GRPO, the first GRPO pipeline for tuning modular agents, including complex compound AI systems that compose multiple structured and specialized LM calls and tool invocations. It uses a server-client abstraction that decouples the GRPO policy-gradient updates to the model from the complex multi-stage agentic rollouts, running them in separate processes and allowing much greater flexibility in tuning modular agentic systems.
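The group-relative advantage at the heart of GRPO can be sketched as follows. This is a minimal illustration of the idea, not the dspy.GRPO API: for each task, a group of rollouts is sampled, and each rollout's reward is normalized against its own group's statistics, removing the need for a learned value model.

```python
# Illustrative sketch (hypothetical function name, not the dspy.GRPO API):
# GRPO normalizes each rollout's reward against its own group's mean and
# standard deviation to obtain a per-rollout advantage.

def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one group of rollout rewards."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # Rollouts above the group mean get positive advantage, below get negative.
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts of the same agent on one task.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

In the dspy.GRPO setting, each "reward" would come from scoring a full multi-stage agentic rollout, while the policy-gradient update weighted by these advantages runs in a separate server process.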
Reliable code generation with LLMs:
- Led the work on "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" (OpenReview, GitHub), which proposes Monitor-Guided Decoding (MGD), a novel decoding technique combining dynamic constrained decoding with external tool usage based on the Language Server Protocol (LSP). MGD was accepted at NeurIPS '23 and won first place in the Microsoft Global Hackathon on improving productivity. MGD prevents hallucinated symbols and methods, ensures methods are called in the correct order at runtime (following a typestate specification), and enforces the correct number of arguments to function calls, thereby preventing various compilation, runtime, and security errors in LLM-generated code at minimal overhead. With MGD, we show that even small LMs of 350M parameters can achieve a better compilation rate and ground-truth match than a much larger 175B LM, with 20-25% improvements in compilation rate for generated code across all model sizes from 350M to 175B.
- Developed multilspy, an OSS library that makes it easy to launch and use different language servers, easing the creation of language-server clients for various applications, including AI-for-Code scenarios. Language servers are tools that perform a variety of static analyses on source code and provide useful information such as type-directed code-completion suggestions, symbol definition locations, and symbol references. multilspy abstracts the setup of the language servers, performs language-specific configuration, and handles communication with the server over the JSON-RPC-based protocol, while exposing a simple interface that allows LSP use in just 3 lines of code!
- Curated PragmaticCode and DotPrompts, a large benchmark of buildable Java repositories that provides a unified harness to compile a diverse set of Java projects while abstracting over multiple build systems, allowing for pragmatic evaluations of Code-LMs. The datasets consist of 10,000+ prompts that require repository-level understanding to complete; each prompt is a method-completion task.
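The core step of monitor-guided decoding can be sketched as follows. This is a minimal illustration with hypothetical names; the real system drives a language server (via multilspy) and operates at the LM's tokenizer level. A static-analysis "monitor" supplies the set of symbols that are valid at the current position, and the decoder masks out every vocabulary entry that cannot begin a valid continuation before choosing the next token.

```python
# Illustrative sketch of one monitor-guided decoding step (hypothetical
# names, not the MGD codebase): mask logits so only tokens that prefix a
# monitor-approved symbol survive, then pick greedily.
import math

def monitor_guided_step(logits, vocab, valid_symbols):
    """Return the greedy next token after masking monitor-invalid tokens."""
    masked = []
    for logit, token in zip(logits, vocab):
        # Keep the token only if some valid symbol could start with it.
        ok = any(sym.startswith(token) for sym in valid_symbols)
        masked.append(logit if ok else -math.inf)
    best = max(range(len(masked)), key=lambda i: masked[i])
    return vocab[best]

# The LM prefers the hallucinated "getVlaue", but the monitor only permits
# members that actually exist on the receiver's type, so decoding is
# steered to a symbol that will compile.
vocab = ["getVlaue", "getValue", "size"]
logits = [3.2, 2.9, 0.4]
chosen = monitor_guided_step(logits, vocab, {"getValue", "size"})
# → "getValue"
```

Because the mask only removes options, any code the LM could legally generate is still reachable, which is why the approach helps small and large models alike.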
Programming Languages and Systems:
- As a research intern under Prof. James Larus at VLSC Lab, EPFL, I developed StreamBlocks GraalVM, the CPU runtime for CAL (a dataflow programming language), along with IDE and debugger support for it (Google Slides, GitHub).
- As a Google Summer of Code student, I developed Pytranslate, a programming language transpiler to convert Maxima (computer algebra system) to Python, implemented in Common Lisp. It is now a part of all Maxima installations.
- I am passionate about open-source software, and apart from my own open-source projects, I have contributed to projects such as INRIA/spoon, lucidrains/memorizing-transformers-pytorch, and mozilla/bugbug. You can find more about my open-source work on GitHub: LakshyAAAgrawal.
Keywords:
Large Language Models, AI4Code, Code Generation, Static Analysis, Software Engineering, LLM Tool Usage, LLM Decoding Techniques
For more details about my background, refer to my
CV. If you'd like to chat with me about my work or research in general, feel free to reach out
via email!
If you would like to contact me anonymously, kindly fill
this anonymous form.
LangProBe: a Language Programs Benchmark
Shangyin Tan, Lakshya A Agrawal, Arnav Singhvi, Liheng Lai, Michael J Ryan, Dan Klein, Omar Khattab, Koushik Sen, Matei Zaharia
Association for Computational Linguistics (ACL) ARR, February 2025
Why Do Multiagent Systems Fail?
Melissa Z Pan, Mert Cemri, Lakshya A Agrawal, Shuyi Yang, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Kannan Ramchandran, Dan Klein, Joseph E Gonzalez, Matei Zaharia, Ion Stoica
ICLR 2025 Workshop on Building Trust in Language Models and Applications
IIIT-D
2018 - 2022
Google Summer of Code
2019
Summer@EPFL
2020, 2021
Microsoft SDE Intern
2021
Microsoft Research Fellow
2022 - 2024
PhD, UC Berkeley Sky Lab
2024 - Present
Presenting MGD at NeurIPS '23
Receiving certificate from the Chief Minister of Chhattisgarh
Presenting MGD at NeurIPS '23
Receiving Institute Medals at IIIT-D Convocation
Presenting MGD at Microsoft Global Hackathon