Lesson 4: Reinforcement Learning (1 hour)
Learning Objectives
- Understand what reinforcement learning is
- Recognize reinforcement learning examples
- Understand the concepts of agents, environments, rewards, and actions
- Experience reinforcement learning through interactive demos
Materials Needed
- Internet connection
- Reinforcement learning demos/games
- Examples of RL in action
- Student notebooks
- Optional: Simple game or simulation
Time Breakdown
- Review previous learning types (5 min)
- Introduction to reinforcement learning (15 min)
- RL concepts: Agent, Environment, Rewards (15 min)
- Hands-on: RL demos and games (20 min)
- Wrap-up (5 min)
Activities
1. Review Previous Learning Types (5 min)
- Supervised learning: Learning with labeled examples
- Unsupervised learning: Finding patterns without labels
- Today: Learning through trial and error with rewards
2. Introduction to Reinforcement Learning (15 min)
- Definition: In reinforcement learning, an agent learns by interacting with an environment and receiving rewards or penalties
- Key Concept: It's like training a pet: you reward good behavior, and the pet learns to repeat it
- Key Idea: Agent tries actions, gets feedback (reward/punishment), learns what works
Real-World Examples:
- Game-playing AI (chess, Go, video games)
- Self-driving cars (reward: staying on road, penalty: crash)
- Robot learning to walk
- Recommendation systems (reward: user clicks, penalty: user ignores)
- Trading algorithms (reward: profit, penalty: loss)
Why RL?
- When you can't provide labeled data
- When the best action depends on the situation
- When you need to learn through experience
- When exploration is important
3. RL Concepts: Agent, Environment, Rewards (15 min)
The Four Key Components:
- Agent: The learner (AI system)
  - Example: Game-playing AI, robot, self-driving car
  - Makes decisions and takes actions
- Environment: The world the agent interacts with
  - Example: Game board, road, room
  - Changes based on agent's actions
- Actions: What the agent can do
  - Example: Move pieces, steer car, move robot arm
  - Agent chooses actions to take
- Rewards: Feedback (positive or negative)
  - Positive reward: Good outcome (score points, reach goal)
  - Negative reward (penalty): Bad outcome (lose points, crash)
  - Agent learns to maximize rewards
The Learning Process:
- Agent observes environment
- Agent chooses action
- Environment responds
- Agent receives reward/penalty
- Agent learns from experience
- Repeat - agent gets better over time
Simple Analogy: Training a Dog
- Agent: Dog
- Environment: Living room
- Actions: Sit, stay, come, fetch
- Rewards: Treats (positive), "No" (negative)
- Dog learns which actions get treats
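The learning process and the dog analogy above can be sketched in a few lines of Python. This is a minimal illustration, not a real training method: the reward values (+1 for a treat, -1 for "No"), the exploration rate `epsilon`, and the names `trainer_reward` and `train_dog` are all assumptions made up for this example.

```python
import random

COMMANDS = ["sit", "stay", "come", "fetch"]   # what the trainer says (the observation)
ACTIONS = ["sit", "stay", "come", "fetch"]    # what the dog can do

def trainer_reward(command, action):
    """Environment's response: a treat (+1) for the matching action, "No" (-1) otherwise."""
    return 1 if action == command else -1

def train_dog(rounds=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The dog's current guess of how good each action is for each command.
    value = {(c, a): 0.0 for c in COMMANDS for a in ACTIONS}
    count = {key: 0 for key in value}
    for _ in range(rounds):
        command = rng.choice(COMMANDS)                  # agent observes environment
        if rng.random() < epsilon:                      # sometimes try something random
            action = rng.choice(ACTIONS)
        else:                                           # otherwise do what has worked best
            action = max(ACTIONS, key=lambda a: value[(command, a)])
        reward = trainer_reward(command, action)        # environment responds with a reward
        count[(command, action)] += 1                   # agent learns: running average of rewards
        value[(command, action)] += (reward - value[(command, action)]) / count[(command, action)]
    return value

value = train_dog()
learned = {c: max(ACTIONS, key=lambda a: value[(c, a)]) for c in COMMANDS}
print(learned)
```

After training, `learned` maps each command to the action the dog now prefers for it. In this simplified setup the right action is always rewarded, so the dog ends up matching every command.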
Game Example: Pac-Man
- Agent: Pac-Man AI
- Environment: Game maze
- Actions: Move up, down, left, right
- Rewards (example values): +10 for eating a dot, +200 for eating a ghost
- Penalties: -1 for each step (encourages efficiency), -500 if caught by a ghost
- AI learns best strategies through playing
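To make the scoring concrete, the point values from the Pac-Man example can be added up for a whole game. The sketch below is only an illustration: the event names (`"dot"`, `"ghost_eaten"`, `"caught"`, `"nothing"`) and the function `episode_return` are hypothetical, and it assumes the -1 step cost applies to every move, including moves that eat something.

```python
# Point values from the lesson's Pac-Man example; event names are made up.
REWARDS = {
    "dot": 10,           # ate a dot
    "ghost_eaten": 200,  # ate a ghost
    "caught": -500,      # caught by a ghost
}
STEP_COST = -1           # charged on every move

def episode_return(events):
    """Total score for one game; each entry in `events` is what happened on one move."""
    total = 0
    for event in events:
        total += STEP_COST              # every move costs one point
        total += REWARDS.get(event, 0)  # "nothing" (or any unlisted event) adds 0
    return total

short_game = ["dot", "dot", "nothing", "ghost_eaten"]
unlucky_game = ["dot", "nothing", "caught"]

print(episode_return(short_game))    # 10 + 10 + 0 + 200 - 4 steps = 216
print(episode_return(unlucky_game))  # 10 + 0 - 500 - 3 steps = -493
```

Students can change the numbers and discuss how the agent's best strategy might change, for example if the step cost were much larger than the reward for a dot.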
4. Hands-On: RL Demos and Games (20 min)
Activity 1: Simple RL Demo (if available online)
- Show reinforcement learning visualization
- Watch agent learn to navigate maze or play game
- Observe: Starts poorly, improves over time
- Discuss: What is the agent learning? What are the rewards?
Activity 2: Human RL Simulation
- Game: "Find the Treasure"
- Draw simple grid/maze on board
- One student is "agent"
- Other students give rewards (clap for good moves, "boo" for bad)
- Agent learns best path
- Compare: First attempt vs. later attempts
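For classes that want to see a computer play "Find the Treasure", here is one possible sketch using tabular Q-learning, a common RL method chosen here only as an example. Everything specific is an assumption: the 4x4 grid, the start and treasure positions, the rewards (+10 for the treasure, -1 per extra step), and the settings `alpha`, `gamma`, and `epsilon`.

```python
import random

SIZE = 4                                  # 4x4 grid (assumed size)
START, TREASURE = (0, 0), (3, 3)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTIONS = list(MOVES)

def step(state, action):
    """Move one cell; walls keep the agent in place. Returns (next_state, reward, done)."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    next_state = (row, col)
    if next_state == TREASURE:
        return next_state, 10, True       # "clap": treasure found
    return next_state, -1, False          # small "boo" for every extra step

def run_episode(q, rng, epsilon, alpha=0.5, gamma=0.9, max_steps=200):
    """Play one game, updating the table q in place. Returns the number of steps taken."""
    state = START
    for steps in range(1, max_steps + 1):
        if rng.random() < epsilon:        # explore: random move
            action = rng.choice(ACTIONS)
        else:                             # exploit: best-known move
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
        state = next_state
        if done:
            return steps
    return max_steps                      # gave up before finding the treasure

def train(episodes=3000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    steps_per_episode = [run_episode(q, rng, epsilon) for _ in range(episodes)]
    return q, steps_per_episode

q, steps_per_episode = train()
# One more game with no random moves (epsilon=0); the table still updates as it plays.
greedy_steps = run_episode(q, random.Random(0), epsilon=0.0)
print("First attempt:", steps_per_episode[0], "steps")
print("After training:", greedy_steps, "steps")
```

The shortest possible path on this grid is 6 steps, which gives students a target to compare against, just as they compare the human agent's first and later attempts.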
Activity 3: Online RL Games (if available)
- A Snake game played by an RL agent (if an accessible demo is available)
- Or other browser-based RL demos
- Students watch agent learn
- Discuss observations
Activity 4: Design Your Own RL Scenario
- In pairs, students design a simple RL problem:
  - Agent: What is learning?
  - Environment: Where is it?
  - Actions: What can it do?
  - Rewards: What outcomes earn rewards or penalties?
- Share examples with class
- Examples: Robot learning to sort objects, AI learning to recommend movies, etc.
Reflection Questions:
- How is RL different from supervised learning?
- Why might RL be useful for games?
- What makes a good reward system?
- How does the agent balance exploring (trying new things) vs. exploiting (using what works)?
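The last question, exploring vs. exploiting, can be demonstrated with a small experiment. The sketch below assumes three slot machines with made-up win chances that the agent cannot see, and uses an "epsilon-greedy" rule (one simple strategy among many): with probability `epsilon` the agent tries a random machine, otherwise it picks the machine with the best average so far. The names `WIN_CHANCE` and `play` are hypothetical.

```python
import random

WIN_CHANCE = [0.2, 0.5, 0.8]   # hidden from the agent; made-up numbers

def play(epsilon, pulls=5000, seed=0):
    """Return (total reward, number of pulls per machine) for an epsilon-greedy agent."""
    rng = random.Random(seed)
    estimate = [0.0] * len(WIN_CHANCE)   # agent's average reward per machine so far
    count = [0] * len(WIN_CHANCE)
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(WIN_CHANCE))                          # explore
        else:
            arm = max(range(len(WIN_CHANCE)), key=lambda i: estimate[i])  # exploit
        reward = 1 if rng.random() < WIN_CHANCE[arm] else 0
        count[arm] += 1
        estimate[arm] += (reward - estimate[arm]) / count[arm]           # running average
        total += reward
        # Ties in `estimate` go to the lowest-numbered machine.
    return total, count

greedy_total, greedy_count = play(epsilon=0.0)      # never explores
explorer_total, explorer_count = play(epsilon=0.1)  # explores 10% of the time

print("Never explores:", greedy_total, greedy_count)
print("Explores 10%:  ", explorer_total, explorer_count)
```

With `epsilon=0.0` the agent in this sketch never leaves the first machine: its average there can never drop below 0, which is the starting estimate of the machines it has not tried. A little exploration lets the agent discover the better machines, at the cost of some pulls spent on worse ones.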
5. Wrap-Up (5 min)
- Three types of ML: Supervised (with labels), Unsupervised (find patterns), Reinforcement (learn from rewards)
- When would you use each type?
- Preview: Next lesson - Putting it all together, training vs. testing
Differentiation Strategies
- Younger students: Focus on game examples, simpler analogies, hands-on activities
- Older students: Explore more complex RL concepts, research AlphaGo or similar, analyze reward functions
- Struggling learners: Provide more structure and guidance, use very simple examples
- Advanced learners: Research Q-learning, explore policy gradients, analyze exploration vs. exploitation trade-offs
Assessment
- Understanding of reinforcement learning concepts
- Participation in RL activities
- Quality of designed RL scenarios
- Ability to distinguish the three ML types