Week 4 — SARSA Agents, Q-Tables & Realistic Reward Modelling
This week, we integrated Reinforcement Learning (RL) into our agents' decision-making using the SARSA algorithm, giving each agent the ability to learn from past actions and adapt its future behavior accordingly.
Why SARSA?
The SARSA algorithm (State–Action–Reward–State–Action) lets agents learn on-policy: each agent improves its value estimates while following its current strategy, updating toward the action it actually takes next.
Key Formula:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma\, Q(s', a') - Q(s, a) \right]$$
This update rule ensures:
- Continuous adaptation from environment feedback
- More realistic learning compared to static logic
- Personalized behavioral evolution per agent
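Concretely, the update rule comes down to a few lines of Python. This is a minimal sketch, not our exact implementation: the Q-table shape mirrors the JSON example later in the post, and names like `sarsa_update` are illustrative.

```python
from collections import defaultdict

# Q-table shaped like the JSON example later in this post: {state: {action: value}}
def make_q_table():
    return defaultdict(lambda: defaultdict(float))

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One on-policy SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    td_target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (td_target - q[state][action])

# Example: an agent succeeds on a bug task (+3) and sees another bug task next
q = make_q_table()
sarsa_update(q, ("bug",), "do_task", reward=3,
             next_state=("bug",), next_action="do_task")
print(q[("bug",)]["do_task"])  # 0.3 after a single update
```

Because the next action is chosen by the same policy being evaluated, exploration directly shapes the learned values, which is exactly the on-policy behavior we wanted.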
Agent Architecture Overhaul
We introduced three role-driven SARSA agents:
🛠️ Contributor
- Specializes in bug fixes
- Gets higher reward for successfully completing bug tasks (+3)
💡 Innovator
- Excels at feature development
- Receives higher reward for feature tasks (+3)
📚 Knowledge Curator
- Strong with documentation
- Earns a higher reward for docs tasks (+3)
All other successful tasks yield +1, failures -1, and skipping gives 0 — ensuring realistic feedback loops.
Rewards Summary
| Agent Type | Task Type | Reward (Success) | Reward (Failure) | Skip |
|---|---|---|---|---|
| Contributor | bug | +3 | -1 | 0 |
| Innovator | feature | +3 | -1 | 0 |
| Knowledge Curator | docs | +3 | -1 | 0 |
| Any agent | non-specialty task | +1 | -1 | 0 |
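In code, the whole scheme reduces to a small lookup. Here is a minimal sketch of a reward function consistent with the table above; the name `compute_reward` and the dict layout are illustrative assumptions, not our exact code.

```python
SPECIALTIES = {
    "Contributor": "bug",
    "Innovator": "feature",
    "Knowledge Curator": "docs",
}

def compute_reward(agent_role: str, task_type: str, outcome: str) -> int:
    """Map (role, task type, outcome) to the reward values in the table above."""
    if outcome == "skip":
        return 0
    if outcome == "failure":
        return -1
    # Success: +3 on the agent's specialty task type, +1 on everything else
    return 3 if SPECIALTIES.get(agent_role) == task_type else 1

assert compute_reward("Contributor", "bug", "success") == 3
assert compute_reward("Innovator", "docs", "success") == 1
```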
Real-Time Q-Table Learning
Agents now maintain Q-Tables with values for every (state, action) pair:
```json
{
  "('bug',)": { "do_task": 1.62, "skip_task": 0.01 }
}
```
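Because JSON keys must be strings, tuple states end up stringified on save, which is why the key above reads "('bug',)". A small persistence sketch under that assumption; `save_q_table` is a hypothetical helper, not necessarily how we persist state:

```python
import json

def save_q_table(q_table: dict, path: str) -> None:
    """Persist {state_tuple: {action: value}} as JSON by stringifying tuple keys."""
    serializable = {repr(state): dict(actions) for state, actions in q_table.items()}
    with open(path, "w") as f:
        json.dump(serializable, f, indent=2)

save_q_table({("bug",): {"do_task": 1.62, "skip_task": 0.01}}, "q_table.json")
# q_table.json now contains: "('bug',)": {"do_task": 1.62, "skip_task": 0.01}
```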