Latest AI Papers On LLMs And Reinforcement Learning - July 8, 2025
Welcome to the latest roundup of cutting-edge research papers in Large Language Models (LLMs) and Reinforcement Learning (RL), covering work published as of July 3, 2025. This compilation, brought to you by CoderBak and DailyArXiv, covers the most recent advancements, innovative approaches, and insightful findings in these rapidly evolving domains. For an enhanced reading experience and access to more papers, be sure to visit the GitHub page.
Large Language Models
The field of large language models (LLMs) continues to surge forward, pushing the boundaries of what's possible in natural language processing and artificial intelligence. This week's selection of papers highlights a diverse range of topics, from improving model adaptation and reasoning capabilities to addressing safety concerns and exploring novel applications. Understanding the latest developments in LLMs is crucial for researchers, developers, and anyone interested in the future of AI.
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
This paper, accepted at ICCV 2025, explores data-efficient model adaptation in multimodal LLMs through a technique called Bootstrapping Grounded Chain-of-Thought. The authors propose a method to enhance the reasoning capabilities of LLMs by grounding them in visual data, allowing for more accurate and context-aware responses. This approach is particularly relevant in scenarios where training data is limited, as it enables models to learn more effectively from fewer examples. The implications of this research are significant, as it could lead to the development of more robust and adaptable LLMs that can handle a wide range of tasks and modalities.
Requirements Elicitation Follow-Up Question Generation
Effective requirements elicitation is crucial for successful software development. This paper, accepted at the 33rd IEEE International Requirements Engineering Conference (RE 2025), addresses the challenge of generating relevant follow-up questions to clarify and refine user needs. The authors present a novel approach to Requirements Elicitation Follow-Up Question Generation, leveraging the power of LLMs to automate and improve this critical process. The paper, spanning 13 pages with 2 figures, delves into the technical details of their method and presents empirical results demonstrating its effectiveness. This research has the potential to streamline the requirements gathering process, leading to more efficient and effective software development workflows.
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Reasoning is a core capability for any intelligent system. The paper on MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs introduces a novel approach to enhance the reasoning abilities of LLMs. The key idea is to fine-tune LLMs using reinforcement learning, guiding them to adopt a modular thinking process. By breaking down complex problems into smaller, more manageable modules, LLMs can reason more effectively and arrive at accurate solutions. This research has significant implications for tasks that require logical deduction, problem-solving, and decision-making.
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
As LLMs become increasingly integrated into various applications, ensuring their safety and security is paramount. This paper, spanning 16 pages, delves into the vulnerabilities of Multimodal Large Language Models (MLLMs) by introducing a novel attack strategy called Visual Contextual Attack. The authors demonstrate how image-driven context injection can be used to jailbreak MLLMs, potentially leading to undesirable or harmful outputs. This research highlights the importance of developing robust defense mechanisms to protect MLLMs from malicious attacks and ensure their responsible use.
LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding
Estimating treatment effects accurately is crucial in various domains, including healthcare and social sciences. This paper tackles the challenge of LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding. The authors propose a method that leverages LLMs to mitigate the effects of text confounding, where textual information can bias the estimation of treatment effects. This research has the potential to improve the reliability and validity of treatment effect estimations, leading to more informed decision-making in various fields.
Improved Unbiased Watermark for Large Language Models
Watermarking is a crucial technique for detecting and preventing the misuse of LLMs. This paper, presented at ACL 2025 Main Conference, introduces an Improved Unbiased Watermark for Large Language Models. The authors present a novel watermarking scheme that is both effective and unbiased, ensuring that the watermark does not compromise the quality or performance of the LLM. This research contributes to the ongoing efforts to develop robust and reliable methods for identifying and tracking the outputs of LLMs.
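The paper's improved unbiased construction is not reproduced here, but the general idea behind statistical LLM watermarks can be illustrated with the widely used "green-list" scheme it builds on: at each step, the previous token seeds a pseudorandom split of the vocabulary, generation is biased toward the "green" half, and a detector counts how often tokens land on their predecessor's green list. The hashing scheme and thresholds below are illustrative assumptions, not the paper's method.

```python
import hashlib
import math

def is_green(prev_token: int, token: int) -> bool:
    """Pseudorandomly assign ~half the vocabulary to a 'green list'
    keyed on the previous token (hash-byte parity)."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[int]) -> float:
    """Fraction of tokens that land on their predecessor's green list."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t) for p, t in pairs) / len(pairs)

def detection_z(tokens: list[int]) -> float:
    """z-score against the null hypothesis of unwatermarked text,
    where each token is green with probability 0.5."""
    n = len(tokens) - 1
    hits = green_fraction(tokens) * n
    return (hits - n / 2) / math.sqrt(n / 4)
```

Unwatermarked text hovers near a green fraction of 0.5, while watermarked generations push the z-score far above typical detection thresholds; the "unbiased" property the paper improves concerns doing this without distorting the model's output distribution.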
StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason
This paper focuses on enhancing the reasoning capabilities of reinforcement learning agents by providing multi-level stepwise hints. The StepHint approach allows agents to learn more effectively by breaking down complex tasks into smaller, more manageable steps. This research has implications for developing AI systems that can reason and solve problems in a more human-like manner.
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents
The ability to conduct deep research is crucial for advancing knowledge and solving complex problems. This paper explores the concept of Agentic Deep Research, where reasoning agents are used to conduct web searches and synthesize information. The authors propose a framework for incentivizing search with reasoning agents, enabling them to effectively explore the vast amount of information available on the web. This research has the potential to revolutionize the way we conduct research, making it more efficient and comprehensive.
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Hard reasoning tasks often require agents to explain their reasoning process. This paper introduces ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning, a novel approach that leverages self-explanations to guide reinforcement learning. By encouraging agents to explain their reasoning, this method can improve their ability to solve complex problems and make accurate decisions. This research has implications for developing AI systems that are not only intelligent but also transparent and interpretable.
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations
Unmanned Aerial Vehicles (UAVs) have the potential to transform various industries, from logistics to surveillance. This paper, spanning 9 pages with 7 figures, explores the use of LLMs to drive closed-loop UAV operations. The authors propose a system that uses semantic observations to guide the UAV's actions, enabling it to perform complex tasks in dynamic environments. This research demonstrates the potential of LLMs to enhance the autonomy and capabilities of UAVs.
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
This paper introduces SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model, a novel approach for improving the efficiency and robustness of LLMs. The framework leverages a dual-state LLM architecture, allowing for dynamic routing of information and improved performance. This research has implications for optimizing the design and implementation of LLMs.
Multimodal Mathematical Reasoning with Diverse Solving Perspective
Mathematical reasoning is a challenging task for AI systems. This paper, spanning 8 pages, explores Multimodal Mathematical Reasoning with Diverse Solving Perspective. The authors propose a method that leverages multimodal information and diverse problem-solving perspectives to enhance mathematical reasoning abilities. This research has implications for developing AI systems that can solve complex mathematical problems.
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models
While reasoning is a crucial capability for LLMs, it's important to ensure that these models are not biased. This paper delves into the issue of bias in reasoning language models, questioning whether reasoning is all that's needed for fair and ethical AI systems. The authors propose methods for probing bias and mitigating its effects, highlighting the importance of responsible AI development.
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Video editing is a complex task that requires understanding narrative structure and visual content. This paper presents a Human-Inspired Video Editing Framework with Multimodal Narrative Understanding, leveraging LLMs to automate and improve the video editing process. The framework uses multimodal information to understand the narrative of a video and generate engaging clips, demonstrating the potential of LLMs in creative applications.
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
Pretraining LLMs is a computationally intensive process. This paper introduces GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling, a novel technique for accelerating the convergence of LLM pretraining. The authors propose a gradient-preserving activation scaling method that improves training efficiency without compromising model performance. This research has implications for reducing the cost and time required to train LLMs.
Reinforcement Learning
Reinforcement learning (RL) is another dynamic field within AI, focused on training agents to make decisions in an environment to maximize a reward. This collection of papers showcases the breadth and depth of current RL research, covering topics from modular thinking and multi-level hints to dynamic pricing and real-world applications. Keeping up with the latest in RL is essential for researchers and practitioners seeking to develop intelligent agents for diverse applications.
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
As previously mentioned, this paper also falls under the category of Reinforcement Learning, as it leverages RL techniques to fine-tune LLMs for modular thinking. This highlights the growing intersection between LLMs and RL, where RL is used to improve the reasoning and decision-making capabilities of LLMs.
StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason
This paper, also discussed in the LLM section, is relevant to Reinforcement Learning as it proposes a method to enhance RL agents' reasoning abilities by providing multi-level stepwise hints. This approach can be particularly useful in complex environments where agents need to learn intricate strategies.
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
This paper, also featured in the LLM section, showcases the use of self-explanations to guide reinforcement learning, demonstrating a novel approach to improve reasoning in RL agents. The ability of agents to explain their decisions is crucial for building trust and understanding in AI systems.
Generalizing Verifiable Instruction Following
Ensuring that RL agents follow instructions correctly is crucial for their safe and reliable deployment. This paper, spanning 11 pages, addresses the challenge of Generalizing Verifiable Instruction Following. The authors propose a method that enables RL agents to generalize their ability to follow instructions across different environments and tasks. This research has implications for developing RL systems that can be used in real-world applications where instruction following is critical.
Multimodal Mathematical Reasoning with Diverse Solving Perspective
This paper, previously covered in the LLM section, also appears in the Reinforcement Learning listing. Its combination of multimodal information and diverse problem-solving perspectives is relevant to reinforcement-based approaches for training models to reason about complex mathematical problems.
A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control
Scaling deep reinforcement learning algorithms to complex continuous control tasks is a significant challenge. This paper introduces A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control. The authors propose a novel approach that allows RL agents to selectively forget irrelevant information and focus on learning new skills, enabling them to scale to more complex tasks.
Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions
Dynamic pricing in supply chains is a complex problem well suited to multi-agent reinforcement learning. This paper benchmarks strategic agent behaviours under realistically simulated market conditions, providing insights into the effectiveness of RL for optimizing pricing strategies in supply chains.
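The paper's simulated market is far richer, but the kind of strategic interaction such benchmarks study can be sketched with a toy duopoly: two independent epsilon-greedy learners repeatedly price a homogeneous good, and the cheaper seller captures the demand. The demand curve, price grid, and learning parameters are invented for illustration.

```python
import random

PRICES = [1, 2, 3, 4, 5]

def demand(price: int) -> float:
    """Hypothetical linear demand curve: quantity sold at a given price."""
    return max(0.0, 10 - 2 * price)

def profits(p0: int, p1: int) -> tuple[float, float]:
    """The cheaper seller captures the whole market; a tie splits it."""
    if p0 < p1:
        return p0 * demand(p0), 0.0
    if p1 < p0:
        return 0.0, p1 * demand(p1)
    half = demand(p0) / 2
    return p0 * half, p1 * half

def train(episodes: int = 5000, eps: float = 0.1, lr: float = 0.1, seed: int = 0):
    """Two independent epsilon-greedy bandit learners, each tracking an
    action-value estimate per price."""
    rng = random.Random(seed)
    q = [{p: 0.0 for p in PRICES} for _ in range(2)]
    for _ in range(episodes):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.choice(PRICES))      # explore
            else:
                acts.append(max(q[i], key=q[i].get))  # exploit
        rewards = profits(*acts)
        for i in range(2):
            q[i][acts[i]] += lr * (rewards[i] - q[i][acts[i]])
    return q
```

In this toy market the learners undercut each other down to the lowest price, a competitive-equilibrium outcome; the paper's contribution lies in benchmarking richer strategic behaviours under realistic supply chain conditions.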
RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes
Predicting the next activity in business processes is crucial for process optimization and automation. This paper, spanning 15 pages with 7 figures, introduces RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes. The authors combine reinforcement learning with heterogeneous graph neural networks to predict the next activity in a business process.
TUC-PPO: Team Utility-Constrained Proximal Policy Optimization for Spatial Public Goods Games
This paper focuses on the problem of spatial public goods games, where agents need to cooperate to achieve a common goal. The authors propose TUC-PPO: Team Utility-Constrained Proximal Policy Optimization, a novel RL algorithm that encourages agents to cooperate while respecting team utility constraints. This research has implications for developing RL systems that can effectively solve cooperative tasks.
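The TUC-PPO algorithm itself is not reproduced here, but the payoff structure of the underlying public goods game, the social dilemma the agents face, is simple to state: each agent keeps whatever it does not contribute, the common pool is multiplied, and the pool is shared equally. The parameter values below are illustrative.

```python
def public_goods_payoffs(contributions: list[float],
                         multiplier: float = 1.6,
                         endowment: float = 1.0) -> list[float]:
    """Standard public goods game payoff: each agent keeps its
    uncontributed endowment plus an equal share of the multiplied pool."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]
```

With four agents and a multiplier of 1.6, full cooperation pays 1.6 each, but a lone free-rider among three cooperators earns 2.2, more than the cooperators' 1.2, which is exactly why unconstrained self-interested learners defect and why a team-utility constraint on the policy update can help sustain cooperation.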
On Efficient Bayesian Exploration in Model-Based Reinforcement Learning
Exploration is a crucial aspect of reinforcement learning, allowing agents to discover new and potentially rewarding strategies. This paper delves into the topic of Efficient Bayesian Exploration in Model-Based Reinforcement Learning. The authors propose a method that leverages Bayesian techniques to efficiently explore the environment, leading to faster and more effective learning.
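The paper's model-based method is more sophisticated, but the core Bayesian-exploration idea it builds on can be sketched with Thompson sampling on a Bernoulli bandit: maintain a Beta posterior per arm, sample a success rate from each posterior, and pull the arm whose sample is highest, so uncertain arms get explored automatically. Arm probabilities and step counts are illustrative.

```python
import random

def thompson_sampling(true_probs: list[float], steps: int = 2000,
                      seed: int = 0) -> list[int]:
    """Beta-Bernoulli Thompson sampling: posterior sampling drives
    the explore/exploit trade-off without an explicit epsilon."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1] * n  # Beta(1, 1) uniform priors
    beta = [1] * n
    pulls = [0] * n
    for _ in range(steps):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward          # posterior update on success
        beta[arm] += 1 - reward       # posterior update on failure
        pulls[arm] += 1
    return pulls
```

After a few thousand steps the posterior concentrates and the best arm absorbs most of the pulls; model-based variants apply the same posterior-sampling logic to learned environment dynamics rather than to bandit arms.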
VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Video recommendation systems are an important application of reinforcement learning. This paper introduces VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning. The authors propose a novel approach that combines multimodal large language models with reinforcement learning to improve video recommendations.
Direct Preference Optimization Using Sparse Feature-Level Constraints
This paper explores the use of sparse feature-level constraints in direct preference optimization, a technique for learning from human preferences. The authors demonstrate that using sparse constraints can improve the efficiency and effectiveness of preference learning, contributing to the development of more user-centric AI systems.
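The paper's sparse feature-level constraints are not reproduced here, but the standard DPO objective they are added to (from Rafailov et al.) is compact enough to sketch for a single preference pair: the loss rewards the policy for widening the chosen-over-rejected log-probability margin relative to a frozen reference model.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy matches the reference the margin is zero and the loss is log 2; constraining this objective at the level of sparse internal features, as the paper proposes, aims to steer alignment while touching fewer model directions.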
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
Humanoid robot soccer is a challenging task that requires robots to learn complex motor skills and strategic decision-making. This paper introduces SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings. The authors propose a method that leverages diffusion models to learn soccer skills from gameplay recordings, paving the way for more capable and autonomous humanoid robots.
Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
Job shop scheduling is a classic optimization problem with applications in manufacturing and logistics. This paper, accepted in the journal Machine Learning, explores the use of Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling. The authors demonstrate that offline RL can learn effective dispatching policies for job shop scheduling, leading to improved efficiency and productivity.
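The paper learns dispatching policies from offline data; as a minimal illustration of what a "dispatching rule" is, the kind of hand-crafted baseline such learned policies compete with, here is the classic shortest-processing-time (SPT) rule on a single machine, which provably minimizes total completion time in that simplified setting.

```python
def spt_schedule(processing_times: list[int]) -> tuple[list[int], int]:
    """Shortest-processing-time-first dispatching on one machine.
    Returns the job order and the total completion time (sum of each
    job's finish time), which SPT minimizes in this setting."""
    order = sorted(range(len(processing_times)),
                   key=lambda j: processing_times[j])
    clock, total = 0, 0
    for job in order:
        clock += processing_times[job]  # job finishes at current clock
        total += clock
    return order, total
```

For jobs with processing times [3, 1, 2], SPT runs them in order of length and yields a total completion time of 1 + 3 + 6 = 10; a learned dispatcher generalizes this idea to full job shops with many machines and routing constraints.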
Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning
This paper focuses on the problem of process reinforcement learning, where agents need to learn optimal policies for sequential decision-making. The authors propose Self-Guided Process Reward Optimization with Redefined Step-wise Advantage, a novel approach that improves the efficiency and effectiveness of process reinforcement learning.
This concludes our roundup of the latest research papers in Large Language Models and Reinforcement Learning. Stay tuned for more updates on the cutting edge of AI research!