Week 4 — SARSA Agents, Q-Tables & Realistic Reward Modelling
This week, we integrated Reinforcement Learning (RL) into our agents' decision-making using the SARSA algorithm, giving each agent the ability to learn from past actions and adapt its future behavior accordingly.
Why SARSA?
The SARSA algorithm (State–Action–Reward–State–Action) lets agents learn on-policy: each agent improves its value estimates while following its current strategy, updating toward the action it actually takes next.
Key Formula:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma\, Q(s', a') - Q(s, a) \right]$$
This update rule ensures:
- Continuous adaptation from environment feedback
- More realistic learning compared to static logic
- Personalized behavioral evolution per agent
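Concretely, the update rule comes down to a few lines of Python. This is a minimal sketch, not our exact implementation: the Q-table shape mirrors the JSON example later in the post, and names like `sarsa_update` are illustrative.

```python
from collections import defaultdict

# Q-table shaped like the JSON example later in this post: {state: {action: value}}
def make_q_table():
    return defaultdict(lambda: defaultdict(float))

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One on-policy SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    td_target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (td_target - q[state][action])

# Example: an agent succeeds on a bug task (+3) and sees another bug task next
q = make_q_table()
sarsa_update(q, ("bug",), "do_task", reward=3,
             next_state=("bug",), next_action="do_task")
print(q[("bug",)]["do_task"])  # 0.3 after a single update
```

Because the next action is chosen by the same policy being evaluated, exploration directly shapes the learned values, which is exactly the on-policy behavior we wanted.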
Agent Architecture Overhaul
We introduced three role-driven SARSA agents:
🛠️ Contributor
- Specializes in bug fixes
- Gets higher reward for successfully completing bug tasks (+3)
💡 Innovator
- Excels at feature development
- Receives higher reward for feature tasks (+3)
📚 Knowledge Curator
- Strong with documentation
- Earns a higher reward for docs tasks (+3)
All other successful tasks yield +1, failures -1, and skipping gives 0 — ensuring realistic feedback loops.
Rewards Summary
| Agent Type | Task Type | Reward (Success) | Reward (Failure) | Skip |
|---|---|---|---|---|
| Contributor | bug | +3 | -1 | 0 |
| Innovator | feature | +3 | -1 | 0 |
| Knowledge Curator | docs | +3 | -1 | 0 |
| Any agent | non-specialty task | +1 | -1 | 0 |
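In code, the whole scheme reduces to a small lookup. Here is a minimal sketch of a reward function consistent with the table above; the name `compute_reward` and the dict layout are illustrative assumptions, not our exact code.

```python
SPECIALTIES = {
    "Contributor": "bug",
    "Innovator": "feature",
    "Knowledge Curator": "docs",
}

def compute_reward(agent_role: str, task_type: str, outcome: str) -> int:
    """Map (role, task type, outcome) to the reward values in the table above."""
    if outcome == "skip":
        return 0
    if outcome == "failure":
        return -1
    # Success: +3 on the agent's specialty task type, +1 on everything else
    return 3 if SPECIALTIES.get(agent_role) == task_type else 1

assert compute_reward("Contributor", "bug", "success") == 3
assert compute_reward("Innovator", "docs", "success") == 1
```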
Real-Time Q-Table Learning
Agents now maintain Q-Tables with values for every (state, action) pair:
```json
{
  "('bug',)": { "do_task": 1.62, "skip_task": 0.01 }
}
```
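Because JSON keys must be strings, tuple states end up stringified on save, which is why the key above reads "('bug',)". A small persistence sketch under that assumption; `save_q_table` is a hypothetical helper, not necessarily how we persist state:

```python
import json

def save_q_table(q_table: dict, path: str) -> None:
    """Persist {state_tuple: {action: value}} as JSON by stringifying tuple keys."""
    serializable = {repr(state): dict(actions) for state, actions in q_table.items()}
    with open(path, "w") as f:
        json.dump(serializable, f, indent=2)

save_q_table({("bug",): {"do_task": 1.62, "skip_task": 0.01}}, "q_table.json")
# q_table.json now contains: "('bug',)": {"do_task": 1.62, "skip_task": 0.01}
```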