Week 4 — SARSA Agents, Q-Tables & Realistic Reward Modelling

Week 4: SARSA-Driven Agents and Smarter Learning Dynamics

This week, we focused on integrating Reinforcement Learning (RL) into our agents' decision-making using the SARSA algorithm, giving each contributor the ability to learn from past actions and adapt its future behavior accordingly.

Why SARSA?

The SARSA algorithm (State–Action–Reward–State–Action) enables agents to learn optimal policies on-policy, meaning they learn while following their current strategy.

Key Formula:

\[ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right] \]

This update rule ensures:

  • Continuous adaptation from environment feedback
  • More realistic learning compared to static logic
  • Personalized behavioral evolution per agent
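The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not our actual agent code; names like `q_table` and `sarsa_update`, and the hyperparameter values, are assumptions for the example.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value for illustration)
GAMMA = 0.9  # discount factor (assumed value for illustration)

# Q-table: maps (state, action) pairs to learned values, default 0.0
q_table = defaultdict(float)

def sarsa_update(state, action, reward, next_state, next_action):
    """Apply the on-policy SARSA update to Q(s, a)."""
    td_target = reward + GAMMA * q_table[(next_state, next_action)]
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error

# One transition: the agent completed a bug task (+3 reward) and
# plans to do another bug task next.
sarsa_update(("bug",), "do_task", 3, ("bug",), "do_task")
print(round(q_table[(("bug",), "do_task")], 2))  # 0.3 after one update
```

Because SARSA is on-policy, the update uses the action the agent will actually take next (`next_action`), not the greedy maximum as Q-learning would.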

Agent Architecture Overhaul

We introduced three role-driven SARSA agents:

🛠️ Contributor

  • Specializes in bug fixes
  • Gets higher reward for successfully completing bug tasks (+3)

💡 Innovator

  • Excels at feature development
  • Receives higher reward for feature tasks (+3)

📚 Knowledge Curator

  • Strong with documentation
  • Prioritized for docs tasks (+3)

All other successful tasks yield +1, failures yield -1, and skipping yields 0, ensuring realistic feedback loops.


Rewards Summary

| Agent Type        | Task Type | Reward (Success) | Reward (Failure) | Skip |
| ----------------- | --------- | ---------------- | ---------------- | ---- |
| Contributor       | bug       | +3               | -1               | 0    |
| Innovator         | feature   | +3               | -1               | 0    |
| Knowledge Curator | docs      | +3               | -1               | 0    |
| All (others)      | any other | +1               | -1               | 0    |
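The reward scheme in the table can be captured by a single function. This is a hedged sketch; the function name `compute_reward` and the `SPECIALTY` mapping are assumptions made for illustration, though the values mirror the table above.

```python
# Maps each agent type to the task type it specializes in (per the table)
SPECIALTY = {
    "Contributor": "bug",
    "Innovator": "feature",
    "Knowledge Curator": "docs",
}

def compute_reward(agent_type, task_type, outcome):
    """Return the reward for an agent's outcome on a task.

    outcome is one of "success", "failure", or "skip".
    """
    if outcome == "skip":
        return 0
    if outcome == "failure":
        return -1
    # Success: +3 on the agent's specialty, +1 on any other task
    return 3 if SPECIALTY.get(agent_type) == task_type else 1

print(compute_reward("Contributor", "bug", "success"))    # 3
print(compute_reward("Innovator", "docs", "success"))     # 1
print(compute_reward("Knowledge Curator", "docs", "skip"))  # 0
```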

Real-Time Q-Table Learning

Agents now maintain Q-Tables with values for every (state, action) pair:

```json
{
  "('bug',)": {
    "do_task": 1.62,
    "skip_task": 0.01
  }
}
```
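A Q-table like this drives action selection. Below is a sketch of epsilon-greedy selection over such a table; the mechanics are assumed for illustration (the project's actual policy code may differ), with `EPSILON` and the action names taken from the example values above.

```python
import random

ACTIONS = ["do_task", "skip_task"]
EPSILON = 0.1  # exploration rate (assumed value)

# Example Q-values matching the snippet above
q_table = {(("bug",), "do_task"): 1.62, (("bug",), "skip_task"): 0.01}

def choose_action(state):
    """Explore with probability EPSILON; otherwise pick the best-known action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))

# With the values above, exploitation favors "do_task" for state ("bug",),
# since 1.62 > 0.01.
```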

This post is licensed under CC BY 4.0 by the author.