I am a Computer Science and Artificial Intelligence PhD student at UC Berkeley, advised by
Prof. Matei Zaharia and
Prof. Dan Klein, affiliated with
Sky Lab,
BAIR Lab, and
Berkeley NLP Group. My research interests span Artificial Intelligence, Software Engineering, and Programming Languages.
Prior to joining UC Berkeley, I was an
AI4Code Research Fellow
at
Microsoft Research, where I worked with
Dr. Aditya Kanade,
Dr. Navin Goyal,
Dr. Shuvendu Lahiri, and
Dr. Sriram Rajamani, focusing on improving the code generation capabilities of Large Language Models (LLMs) and exploring how generative AI can automate software engineering tasks.
My research focuses on improving the quality and correctness of code generated by Large Language Models (LLMs), aiming to make them more reliable for software engineering and reasoning tasks. Most recently, I have been focusing on repository-level reasoning for code generation with LLMs. I have also explored long-context usage, tool usage, better tokenization, prompting for code, and decoding techniques with LLMs. Previously, I worked in Programming Languages and Systems, developing language runtimes, IDE/debugger support for languages, and source-to-source transpilers.
dspy.GRPO
- Along with Noah Ziems, Dilara Soylu, and Omar Khattab, I led the development of dspy.GRPO, the first GRPO pipeline for tuning modular agents, including complex compound AI systems that compose multiple structured and specialized LM calls and tool invocations. It uses a server-client abstraction that decouples the GRPO policy-gradient updates to the model from the complex multi-stage agentic rollouts, running them in separate processes and allowing much greater flexibility in tuning modular agentic systems.
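The group-relative advantage at the heart of GRPO can be sketched as follows. This is a minimal illustration of the idea, not the dspy.GRPO API: for each task, a group of rollouts is sampled, and each rollout's reward is normalized against its own group's statistics, removing the need for a learned value model.

```python
# Illustrative sketch (hypothetical function name, not the dspy.GRPO API):
# GRPO normalizes each rollout's reward against its own group's mean and
# standard deviation to obtain a per-rollout advantage.

def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one group of rollout rewards."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # Rollouts above the group mean get positive advantage, below get negative.
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts of the same agent on one task.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

In the dspy.GRPO setting, each "reward" would come from scoring a full multi-stage agentic rollout, while the policy-gradient update weighted by these advantages runs in a separate server process.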
Reliable code generation with LLMs:
- Led the work on "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" (OpenReview, GitHub), which proposes Monitor-Guided Decoding (MGD), a novel decoding technique combining dynamic constrained decoding with external tool usage based on the Language Server Protocol (LSP). MGD was accepted at NeurIPS '23 and won first place in the Microsoft Global Hackathon on improving productivity. MGD prevents hallucinated symbols and methods, ensures methods are called in the correct order at runtime (following a typestate specification), and enforces the correct number of arguments to function calls, thereby preventing various compilation, runtime, and security errors in LLM-generated code at minimal overhead. With MGD, we show that even small LMs of 350M parameters can achieve a better compilation rate and ground-truth match than a much larger 175B LM, with 20-25% improvements in compilation rate for generated code across all model sizes from 350M to 175B.
- Developed multilspy, an OSS library that makes it easy to launch and use different language servers, easing the creation of language-server clients for various applications, including AI-for-Code scenarios. Language servers are tools that perform a variety of static analyses on source code and provide useful information such as type-directed code-completion suggestions, symbol definition locations, and symbol references. multilspy abstracts the setup of the language servers, performs language-specific configuration, and handles communication with the server over the JSON-RPC-based protocol, while exposing a simple interface that allows LSP use in just 3 lines of code!
- Curated PragmaticCode and DotPrompts, a large benchmark of buildable Java repositories that provides a unified harness to compile a diverse set of Java projects while abstracting over multiple build systems, allowing for pragmatic evaluations of Code-LMs. The datasets consist of 10,000+ prompts that require repository-level understanding to complete; each prompt is a method-completion task.
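The core step of monitor-guided decoding can be sketched as follows. This is a minimal illustration with hypothetical names; the real system drives a language server (via multilspy) and operates at the LM's tokenizer level. A static-analysis "monitor" supplies the set of symbols that are valid at the current position, and the decoder masks out every vocabulary entry that cannot begin a valid continuation before choosing the next token.

```python
# Illustrative sketch of one monitor-guided decoding step (hypothetical
# names, not the MGD codebase): mask logits so only tokens that prefix a
# monitor-approved symbol survive, then pick greedily.
import math

def monitor_guided_step(logits, vocab, valid_symbols):
    """Return the greedy next token after masking monitor-invalid tokens."""
    masked = []
    for logit, token in zip(logits, vocab):
        # Keep the token only if some valid symbol could start with it.
        ok = any(sym.startswith(token) for sym in valid_symbols)
        masked.append(logit if ok else -math.inf)
    best = max(range(len(masked)), key=lambda i: masked[i])
    return vocab[best]

# The LM prefers the hallucinated "getVlaue", but the monitor only permits
# members that actually exist on the receiver's type, so decoding is
# steered to a symbol that will compile.
vocab = ["getVlaue", "getValue", "size"]
logits = [3.2, 2.9, 0.4]
chosen = monitor_guided_step(logits, vocab, {"getValue", "size"})
# → "getValue"
```

Because the mask only removes options, any code the LM could legally generate is still reachable, which is why the approach helps small and large models alike.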
Programming Languages and Systems:
- As a research intern under Prof. James Larus at VLSC Lab, EPFL, I developed StreamBlocks GraalVM, the CPU runtime for CAL (a dataflow programming language), along with IDE and debugger support for it (Google Slides, GitHub).
- As a Google Summer of Code student, I developed Pytranslate, a programming language transpiler to convert Maxima (computer algebra system) to Python, implemented in Common Lisp. It is now a part of all Maxima installations.
- I am passionate about open-source software, and apart from my own open-source projects, I have contributed to projects such as INRIA/spoon, lucidrains/memorizing-transformers-pytorch, and mozilla/bugbug. You can find more about my open-source work on GitHub: LakshyAAAgrawal.
Keywords:
Large Language Models, AI4Code, Code Generation, Static Analysis, Software Engineering, LLM Tool Usage, LLM Decoding Techniques
For more details about my background, refer to my
CV. If you'd like to chat with me about my work or research in general, feel free to reach out
via email!
If you would like to contact me anonymously, kindly fill
this anonymous form.
LangProBe: a Language Programs Benchmark
Shangyin Tan, Lakshya A Agrawal, Arnav Singhvi, Liheng Lai, Michael J Ryan, Dan Klein, Omar Khattab, Koushik Sen, Matei Zaharia
Association for Computational Linguistics (ACL) ARR, February 2025
Why Do Multiagent Systems Fail?
Melissa Z Pan, Mert Cemri, Lakshya A Agrawal, Shuyi Yang, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Kannan Ramchandran, Dan Klein, Joseph E Gonzalez, Matei Zaharia, Ion Stoica
ICLR 2025 Workshop on Building Trust in Language Models and Applications
IIIT-D
2018 - 2022
Google Summer of Code
2019
Summer@EPFL
2020, 2021
Microsoft SDE Intern
2021
Microsoft Research Fellow
2022 - 2024
PhD, UC Berkeley Sky Lab
2024 - Present
Presenting MGD at NeurIPS '23
Receiving certificate from the Chief Minister of Chhattisgarh
Presenting MGD at NeurIPS '23
Receiving Institute Medals at IIIT-D Convocation
Presenting MGD at Microsoft Global Hackathon