Introduction to AI · Lesson 4: Reinforcement Learning (1 hour)

Definition: Reinforcement learning (RL) is a type of machine learning in which an agent learns by interacting with an environment and receiving rewards or penalties.
Key Concept: It's like training a pet: you reward good behavior, and the pet learns to repeat it.
Key Idea: The agent tries actions, gets feedback (reward or punishment), and learns what works.

Real-World Examples:

Why RL?

The Four Key Components:

Agent: The learner (AI system)
- Example: Game-playing AI, robot, self-driving car
- Makes decisions and takes actions
Environment: The world the agent interacts with
- Example: Game board, road, room
- Changes based on agent's actions
Actions: What the agent can do
- Example: Move pieces, steer car, move robot arm
- Agent chooses actions to take
Rewards: Feedback (positive or negative)
- Positive reward: Good outcome (score points, reach goal)
- Negative reward (penalty): Bad outcome (lose points, crash)
- The agent learns to maximize its total reward over time
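
For teachers who want to show the four components in code, here is a minimal Python sketch, not a definitive implementation. The corridor world, the names CorridorEnvironment, RandomAgent, and run_episode, and the reward values (+10 at the goal, -1 per step) are all assumptions made up for illustration; none of them come from a specific library.

```python
import random


class CorridorEnvironment:
    """Hypothetical world: a corridor with positions 0..4 and a goal at 4."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # The environment changes based on the agent's action
        # (-1 = move left, +1 = move right), staying inside the corridor.
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        # Reward: +10 for reaching the goal, -1 for every other step.
        reward = 10 if done else -1
        return self.position, reward, done


class RandomAgent:
    """An agent that has not learned anything yet: it acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def choose_action(self, state):
        return self.rng.choice([-1, +1])


def run_episode(env, agent, max_steps=100):
    """Run the agent-environment loop once and return the total reward."""
    state, total_reward = env.position, 0
    for _ in range(max_steps):
        action = agent.choose_action(state)      # agent picks an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward                   # feedback accumulates
        if done:
            break
    return total_reward


print("Total reward:", run_episode(CorridorEnvironment(), RandomAgent()))
```

The random agent only acts; it does not yet learn. A learning agent would use the rewards it receives to change how it chooses actions.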

The Learning Process:

Simple Analogy: Training a Dog

Game Example: Pac-Man

Agent: Pac-Man AI
Environment: Game maze
Actions: Move up, down, left, right
Rewards: +10 for eating dot, +200 for eating ghost, -1 for each step (encourages efficiency)
Penalties: -500 if caught by ghost
The AI learns the best strategies by playing many games
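
To make the reward scheme concrete, the sketch below adds up the rewards listed above for two short, made-up games. The event names ("dot", "ghost_eaten", "step", "caught") and the function episode_return are hypothetical labels chosen for this example; only the point values come from the lesson.

```python
# Reward values from the Pac-Man example; the event names are assumptions.
REWARDS = {"dot": 10, "ghost_eaten": 200, "step": -1, "caught": -500}


def episode_return(events):
    """Add up the reward for every event that happened in one game."""
    return sum(REWARDS[event] for event in events)


# Three moves, each one eating a dot.
careful_game = ["step", "dot", "step", "dot", "step", "dot"]
# Two moves, one dot, then caught by a ghost.
reckless_game = ["step", "dot", "step", "caught"]

print(episode_return(careful_game))   # 27
print(episode_return(reckless_game))  # -492
```

Comparing the two totals shows why the agent learns to avoid ghosts: a single capture wipes out the reward from many dots.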

Activity 1: Simple RL Demo (if available online)

Activity 2: Human RL Simulation

Game: "Find the Treasure"
- Draw simple grid/maze on board
- One student is "agent"
- Other students give rewards (clap for good moves, "boo" for bad)
- Agent learns the best path over repeated attempts
- Compare: First attempt vs. later attempts
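
For classes that want to see a computer play the same game, here is a hedged sketch using tabular Q-learning (the algorithm named for advanced learners in this lesson). It is one common way to learn from rewards, not the only one. The 3x3 grid, the start and treasure positions, the reward values (+10 for the treasure as the "clap", -1 for every other move as a mild "boo"), and the settings alpha, gamma, and epsilon are all assumptions chosen for illustration.

```python
import random

SIZE = 3                                  # assumed 3x3 grid
START, TREASURE = (0, 0), (2, 2)          # assumed positions (row, column)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Move on the grid (walls block movement) and return the feedback."""
    d_row, d_col = MOVES[action]
    row = min(SIZE - 1, max(0, state[0] + d_row))
    col = min(SIZE - 1, max(0, state[1] + d_col))
    next_state = (row, col)
    found = next_state == TREASURE
    return next_state, (10 if found else -1), found


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Play many attempts, updating a table of action values (Q-table)."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in MOVES}
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = START, 0, False
        while not done and steps < 100:
            if rng.random() < epsilon:    # explore: try a random move
                action = rng.choice(list(MOVES))
            else:                         # exploit: use the best known move
                action = max(MOVES, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in MOVES)
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next move.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


def greedy_path(q):
    """Follow the best known move from the start (no exploring)."""
    state, path = START, [START]
    while state != TREASURE and len(path) < 20:
        action = max(MOVES, key=lambda a: q[(state, a)])
        state, _, _ = step(state, action)
        path.append(state)
    return path


q_table, steps_per_episode = train()
print("Steps in first attempt:", steps_per_episode[0])
print("Steps in last attempt:", steps_per_episode[-1])
print("Learned path:", greedy_path(q_table))
```

As in the classroom version, the interesting comparison is between the first attempt and later ones: early attempts tend to wander, while later attempts usually head straight for the treasure.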

Activity 3: Online RL Games (if available)

Activity 4: Design Your Own RL Scenario

In pairs, students design a simple RL problem:
- Agent: What is learning?
- Environment: Where is it?
- Actions: What can it do?
- Rewards: What feedback does it get, and what is the goal?
Share examples with the class
Examples: Robot learning to sort objects, AI learning to recommend movies, etc.

Reflection Questions:

How is RL different from supervised learning?
Why might RL be useful for games?
What makes a good reward system?
How does the agent balance exploring (trying new things) vs. exploiting (using what works)?
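
For the last question, one simple and widely used answer is the epsilon-greedy rule: most of the time the agent exploits the action it currently believes is best, and with a small probability (epsilon) it explores a random action instead. The sketch below is illustrative only; the function name epsilon_greedy, the three made-up reward estimates, and the value epsilon = 0.1 are assumptions.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))     # explore: any action
    # exploit: the action with the highest estimated reward
    return max(range(len(estimates)), key=lambda i: estimates[i])


rng = random.Random(0)
estimates = [1.0, 5.0, 2.0]    # made-up reward estimates for three actions
choices = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
print("Times each action was chosen:", [choices.count(i) for i in range(3)])
```

With a small epsilon, the action with the highest estimate is chosen most often, but the other actions are still tried occasionally, so the agent can discover if one of them is better than it thought.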

Three types of ML: Supervised (with labels), Unsupervised (find patterns), Reinforcement (learn from rewards)
When would you use each type?
Preview: Next lesson - Putting it all together, training vs. testing

Younger students: Focus on game examples, simpler analogies, hands-on activities
Older students: Explore more complex RL concepts, research AlphaGo or similar, analyze reward functions
Struggling learners: Provide more structure, use very simple examples, more guidance
Advanced learners: Research Q-learning, explore policy gradients, analyze exploration vs. exploitation trade-offs