Rithesh Murthy

Research

20+ papers · 10+ patents · ICLR, NeurIPS, IEEE and more

Featured Work

Open Source 900+ Stars

Promptomatix: Automated Prompt Optimization Framework

An open-source framework for systematic, automated prompt optimization that makes prompt engineering reproducible and scalable. Starred by 900+ developers on GitHub.

View on GitHub →

arXiv

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

First comprehensive framework for evaluating LLMs and LMMs on mobile devices — spanning NLP, safety, and multimodal tasks across performance and efficiency dimensions.

Read paper →

Technical Report #1 Berkeley FCL

xLAM: A Family of Large Action Models to Empower AI Agent Systems

The model family that ranked #1 on the Berkeley Function-Calling Leaderboard, outperforming GPT-4 and Claude-3 on tool-use and agent benchmarks.

Read paper →

NeurIPS 2024 Top-3 HuggingFace

APIGen: Automated Pipeline for Generating Function-Calling Datasets

An automated pipeline for generating verifiable, diverse function-calling datasets. The associated dataset, xlam-function-calling-60k, was among the top-3 trending datasets on HuggingFace in July 2024.

View on Scholar →

ICLR 2024 Spotlight

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Policy gradient-based framework that reinforces language agents through retrospective reflection. Selected as an ICLR 2024 Spotlight — top ~5% of submissions.

Read paper →

ICLR 2025

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Multi-agent framework that leverages the complementary strengths of diverse specialized agents to outperform any single framework on complex software engineering tasks.

Read paper →

All Publications

2026

ICLR 2026

LoCoBench: A Benchmark for Long-Context LLMs in Complex Software Engineering

Jielin Qiu, Zuxin Liu, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang, Rithesh Murthy

2025

ICLR 2025

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Rithesh Murthy, Caiming Xiong

arXiv →

arXiv

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Chien-Sheng Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

arXiv →

BuildingTrust @ ICLR 2025

ToolScan: A Benchmark For Characterizing Errors In Tool-Use LLMs

Shirley Kokane, Ming Zhu, Tulika Manoj Awalgaonkar, Jianguo Zhang, Akshara Prabhakar, Thai Quoc Hoang, Zuxin Liu, Liangwei Yang, Weiran Yao, Juntao Tan, Zhiwei Liu, Huan Wang, Juan Carlos Niebles, Shelby Heinecke, Rithesh Murthy, Caiming Xiong, Silvio Savarese

2024

Technical Report #1 Berkeley FCL

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

arXiv →

NeurIPS 2024 Top-3 HuggingFace

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Zuxin Liu, Rithesh Murthy, Thai Quoc Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

LLMAgents @ ICLR 2024

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Jianguo Zhang, Tian Lan, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Quoc Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Ming Zhu, Tulika Manoj Awalgaonkar, Rithesh Murthy, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

arXiv →

LLMAgents @ ICLR 2024

BOLAA: Benchmarking and Orchestrating LLM Autonomous Agents

Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Yihao Feng, Rithesh Murthy, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil L Mui, Huan Wang, Caiming Xiong, Silvio Savarese

arXiv →

LLMAgents @ ICLR 2024

REX: Rapid Exploration and eXploitation for AI Agents

Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Rithesh Murthy, Devansh Arpit, Ran Xu, Phil L Mui, Huan Wang, Caiming Xiong, Silvio Savarese

arXiv →

ICLR 2024 Spotlight

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Zeyuan Chen, Rithesh Murthy, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil L Mui, Huan Wang, Caiming Xiong, Silvio Savarese

arXiv →

2018

IJECE 2018

Autonomous Traffic Signal Control Using Decision Tree

Rithesh R N, R Vignesh, Dr. Anala M R

View →

2017

IEEE ICIIP 2017 Best Paper Award

Direction and Gender Classification Using CNN For Side-View Images

Tarun Choubisa, Mohan Kashyap, Rithesh R N, Sampad B. Mohanty

View →

IRJCS 2017

SVM-KNN: A Novel Approach to Classification Based on SVM and KNN

Rithesh R N

View →

Full list on Google Scholar and OpenReview.

Experience

2022 – Present

Senior Applied Scientist

Salesforce AI Research · Palo Alto, CA

Building and shipping AI systems across LLMs, agents, and enterprise products. Work spans research, engineering, and product integration — from concept to production at scale.

Building a customer simulation platform that models realistic customer behavior for agent training and evaluation
Led MobileAIBench, the first framework to evaluate LLMs and LMMs on mobile devices
Optimized XGen-Small for Salesforce workflows via quantization, prompt engineering, and synthetic data pipelines
Authored 20+ papers and filed 10+ patents in LLMs, agents, and applied ML

2020 – 2021

Applied ML Engineer / Jr. Applied ML Engineer

Embodied, Inc.

Built NLP models for text classification, question answering, and abstractive text summarization. Developed and deployed ML solutions for a social companion robot platform.

2019 – 2020

Graduate Research & Teaching Assistant

University of California, San Diego

Research on Personalized Dialogue Agents with Prof. Julian McAuley. Taught Data Science for Finance, Probability & Statistics, and Big Data Analytics using PySpark.

2019

Data Science Intern

Q-Centrix

Built ML models that predict CathPCI Registry answers from electronic medical records (EMRs). Built deep learning extractive summarization models for EMRs using ELMo and BERT.

Education

2018 – 2020

MS, Computer Science · AI Specialization

University of California, San Diego

2014 – 2018

BE, Computer Science

R.V. College of Engineering, Bangalore
