I am an AI4Code Research Fellow at Microsoft Research, where I work with Dr. Aditya Kanade, Dr. Navin Goyal, Dr. Shuvendu Lahiri, and Dr. Sriram Rajamani. My research interests span Artificial Intelligence, Software Engineering, and Programming Languages. I am currently working on improving the code generation capabilities of Large Language Models (LLMs) and exploring how generative AI can automate software engineering tasks. I will be joining the Sky Lab and BAIR Lab at UC Berkeley as a PhD student starting August 2024.

My research focuses on improving the quality and correctness of code generated by Large Language Models (LLMs), aiming to make them more reliable for software engineering and reasoning tasks. Most recently, I have been focusing on repository-level reasoning for code generation with LLMs. I have also explored long-context usage, tool usage, better tokenization, prompting for code, and decoding techniques with LLMs. Previously, I worked in Programming Languages and Systems, developing language runtimes, IDE/debugger support for languages, and source-to-source transpilers.


Keywords: Large Language Models, AI4Code, Code Generation, Static Analysis, Software Engineering, LLM Tool Usage, LLM Decoding Techniques

For more details about my background, refer to my CV. If you'd like to chat with me about my work or research in general, feel free to reach out via email! If you would like to contact me anonymously, please fill out this anonymous form.
Publications

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context
Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu Lahiri, Sriram Rajamani
Neural Information Processing Systems (NeurIPS), 2023
PDF Poster Slides NeurIPS Page with Virtual Talk OpenReview GitHub

A SPARQL to Cypher Transpiler: Proposal and Initial Results (Extended Abstract)
Lakshya A Agrawal, Nikunj Singhal, Raghava Mutharaju
International Conference on Data Science and Management of Data (CODS-COMAD), 2022
PDF ACM DL Page Google Scholar

A SPARQL to Cypher Transpiler
Lakshya A Agrawal, Nikunj Singhal, Raghava Mutharaju
Undergraduate Thesis, Computer Science and Applied Mathematics, IIIT-Delhi, 2021-22
PDF Poster IIIT-D Archive Google Scholar

A novel sentiment analysis engine for preliminary depression status estimation on social media
Sudhir Kumar Suman, Hrithwik Shalu, Lakshya A Agrawal, Archit Agrawal, Juned Kadiwala
Preprint arXiv:2011.14280, 2020
PDF Google Scholar

Details on my past projects
Reliable code generation with LLMs:
  • Led the work on "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" (OpenReview, GitHub), which proposes Monitor-Guided Decoding (MGD), a novel decoding technique combining dynamic constrained decoding with language-server-protocol (LSP) based external tool usage; an illustrative sketch of the core idea follows this list. MGD was accepted at NeurIPS '23 and won first place in the Microsoft Global Hackathon on improving productivity. MGD can prevent hallucinated symbols and methods, ensure methods are called in the correct order at runtime (following a typestate specification), and ensure the correct number of arguments in function calls, thus preventing various compilation, runtime, and security errors in LLM-generated code at minimal overhead. With MGD, we show that even small LMs of size 350M can achieve a better compilation rate and ground-truth match than a much larger 175B LM, and achieve 20-25% improvements in compilation rate for generated code across all model sizes from 350M to 175B.
  • Developed multilspy, an OSS library to easily launch and use different language servers, easing the process of creating language server clients for various applications, including AI for Code scenarios. Language servers are tools that perform a variety of static analyses on source code and provide useful information such as type-directed code completion suggestions, symbol definition locations, symbol references, etc. multilspy abstracts the setup of the language servers, performs language-specific configuration, and handles communication with the server over the JSON-RPC based protocol, while exposing a simple interface to the user, allowing LSP use in just 3 lines of code (see the usage sketch after this list)!
  • Curated PragmaticCode and DotPrompts, a large benchmark of buildable Java repositories with a unified harness that compiles a diverse set of Java projects while abstracting over multiple build systems, thus allowing for pragmatic evaluation of Code-LMs. The datasets comprise 10,000+ method-completion prompts that require repository-level understanding to complete.
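
To make the constrained-decoding idea behind MGD concrete, here is a deliberately simplified, hypothetical sketch: at steps where a static-analysis monitor is active, it supplies the set of token ids it considers legal (e.g., type-correct member names after a dereference), and the model's next-token distribution is masked to that set. This is not the actual NeurIPS '23 implementation; it assumes a HuggingFace-style causal LM, and the monitor, triggering logic, and tokenization details are placeholders.

```python
import torch

def monitor_guided_step(model, input_ids, allowed_token_ids):
    """One greedy decoding step restricted to monitor-approved tokens.

    `allowed_token_ids` is assumed to come from a static-analysis monitor
    (e.g., type-correct identifier continuations); all other tokens are masked.
    """
    with torch.no_grad():
        logits = model(input_ids).logits[:, -1, :]        # scores for the next token
    mask = torch.full_like(logits, float("-inf"))
    mask[:, allowed_token_ids] = 0.0                       # keep only monitor-approved tokens
    next_token = torch.argmax(logits + mask, dim=-1)       # greedy pick among them
    return torch.cat([input_ids, next_token.unsqueeze(-1)], dim=-1)
```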
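
And a minimal usage sketch for multilspy, adapted from the library's README; exact class and method names may differ across versions, and the repository path, file, and position below are placeholders.

```python
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

# Configure and create a language server client for a Java repository (placeholder path).
config = MultilspyConfig.from_dict({"code_language": "java"})
lsp = SyncLanguageServer.create(config, MultilspyLogger(), "/abs/path/to/repo/")

with lsp.start_server():
    # Query static-analysis results at a (file, line, column) position.
    definitions = lsp.request_definition("src/Main.java", 42, 10)
    completions = lsp.request_completions("src/Main.java", 42, 10)
```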
Programming Languages and Systems:
Talks
Guiding Language Models of Code with Global Context using Monitors
Venues: Microsoft Research RiSE Group, Microsoft Research India, Microsoft DevDiv (July, August 2023)
Abstract Slides

CAL Implementation of GraalVM
Venues: Very Large Scale Computing Lab, Data Center Systems Laboratory (September 2021)
GitHub Slides
Past and Present Affiliations
IIIT-D
2018 - 2022
Google Summer of Code
2019
Summer@EPFL
2020, 2021
Microsoft SDE Intern
2021
Microsoft Research Fellow
2022 - Present





Website source and acknowledgements