🕹️ Reinforcement Learning 🤖

Play Tic-Tac-Toe Against a Q-Learning Agent

This project implements a console-based Tic-Tac-Toe game in which a human plays against an AI that improves with every game through Q-learning.

Reinforcement learning is a branch of machine learning in which an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. Unlike supervised learning, where a model trains on labeled examples, an RL agent discovers optimal behavior through trial and error. This makes it particularly well suited for problems involving sequential decision-making, game playing, and autonomous control. The approach has powered breakthroughs from AlphaGo's mastery of the board game Go to robotic manipulation in warehouse environments.
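The interaction loop described above can be sketched in a few lines. This is a toy illustration, not code from the project: the two-action environment and the names `environment_step` and `value` are invented for the example, and the agent simply keeps a running estimate of each action's reward.

```python
import random

def environment_step(action):
    """Toy environment: action 1 yields a reward, action 0 a penalty."""
    return 1.0 if action == 1 else -1.0

rng = random.Random(42)
value = {0: 0.0, 1: 0.0}  # agent's running estimate of each action's reward
for _ in range(100):
    action = rng.choice([0, 1])              # trial-and-error exploration
    reward = environment_step(action)        # feedback from the environment
    value[action] += 0.1 * (reward - value[action])  # learn from the feedback

best_action = max(value, key=value.get)      # the agent discovers action 1
```

Even with purely random exploration, the running estimates separate quickly, and the agent can read off the better action from them.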

The projects in this section focus on tabular Q-learning, one of the foundational algorithms in reinforcement learning. Q-learning maintains a table of state-action values and updates them using the Bellman equation after each interaction with the environment. You will see how an agent can start with no knowledge of a game and gradually converge on a winning strategy by balancing exploration of new moves with exploitation of known good ones, a tradeoff controlled by the epsilon-greedy policy. The Tic-Tac-Toe environment provides a compact yet complete testbed: the state space is small enough to visualize the full Q-table, yet rich enough to demonstrate convergence, policy improvement, and the effect of opponent strategies on learned behavior.
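To make the pieces concrete, here is a minimal, self-contained sketch of tabular Q-learning on Tic-Tac-Toe against a random opponent. It is not the project's actual code: the hyperparameter values, the string-based board encoding, and names such as `q_table`, `choose_action`, and `q_update` are all illustrative. The epsilon-greedy choice and the Bellman backup are the two ingredients the paragraph above describes.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

q_table = defaultdict(float)  # maps (state, action) -> estimated value

def available_moves(board):
    return [i for i, c in enumerate(board) if c == " "]

def winner(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def choose_action(board, rng):
    """Epsilon-greedy: explore a random move, else exploit the best known one."""
    moves = available_moves(board)
    if rng.random() < EPSILON:
        return rng.choice(moves)
    return max(moves, key=lambda m: q_table[(board, m)])

def q_update(state, action, reward, next_state, done):
    """One Bellman backup: Q(s,a) += alpha * (target - Q(s,a))."""
    target = reward
    if not done:
        target += GAMMA * max(q_table[(next_state, m)]
                              for m in available_moves(next_state))
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])

def play_episode(rng):
    """Agent plays X against a random O; returns +1 win, 0 draw, -1 loss."""
    board = " " * 9
    while True:
        state, action = board, choose_action(board, rng)
        board = board[:action] + "X" + board[action + 1:]
        if winner(board) == "X":
            q_update(state, action, 1.0, board, done=True)
            return 1
        if not available_moves(board):  # draw
            q_update(state, action, 0.0, board, done=True)
            return 0
        opp = rng.choice(available_moves(board))  # random opponent plays O
        board = board[:opp] + "O" + board[opp + 1:]
        if winner(board) == "O":
            q_update(state, action, -1.0, board, done=True)
            return -1
        if not available_moves(board):  # draw
            q_update(state, action, 0.0, board, done=True)
            return 0
        q_update(state, action, 0.0, board, done=False)

rng = random.Random(0)
results = [play_episode(rng) for _ in range(5000)]
late_win_rate = results[-1000:].count(1) / 1000
```

Because every state is a nine-character string, the Q-table stays small enough to inspect directly, which is exactly what makes Tic-Tac-Toe a good testbed for visualizing convergence.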

Beyond the core algorithm, these projects explore reward shaping, the practice of designing reward signals that guide the agent toward desirable behavior more efficiently. You will also examine how hyperparameters such as the learning rate, discount factor, and exploration decay schedule affect convergence speed and final performance. The notebook includes experiments that vary each parameter independently and plot the resulting win-rate curves, so you can build intuition about how these choices interact in practice.
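One such hyperparameter experiment is the exploration decay schedule. A common choice, shown here as a hedged sketch with illustrative constants rather than the project's actual values, is exponential decay with a floor so the agent never stops exploring entirely:

```python
# Illustrative constants: start fully exploratory, decay each episode,
# but never drop below a small floor.
EPS_START, EPS_MIN, DECAY = 1.0, 0.05, 0.999

def epsilon_at(episode):
    """Exploration rate after `episode` decay steps, floored at EPS_MIN."""
    return max(EPS_MIN, EPS_START * DECAY ** episode)

eps_early = epsilon_at(10)      # early training: mostly exploration
eps_late = epsilon_at(10_000)   # late training: mostly exploitation
```

Decaying too fast locks the agent into whatever strategy it stumbled on first; decaying too slowly wastes episodes on random moves, which is the tradeoff the win-rate curves make visible.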

Each notebook includes fully commented Python code, training logs, and visualizations of the learning curve so you can reproduce and extend the experiments on your own. The code is structured to make it straightforward to swap in a different game environment or modify the reward function, giving you a reusable foundation for exploring more advanced reinforcement learning ideas such as deep Q-networks (DQN) and policy gradient methods. All dependencies are pinned and the notebooks run on Google Colab with a single click, so you can start experimenting immediately without any local setup. Whether you are new to reinforcement learning or looking to solidify your understanding of Q-learning fundamentals, these projects offer a practical, hands-on path from theory to working code.