17-313: Foundations of Software Engineering
Recitation 7: Machine Learning
Setup Instructions (10 min):
Go to this GitHub repo
- Click the "Code" button, switch to the "Codespaces" tab, and click the green "Create codespace on main" button.
Overview
During this recitation, students will have the opportunity to experiment with several machine learning frameworks and tools, such as pandas, LIME, and Jupyter Notebooks. Students will work with a partner.
Context
We can use the Titanic dataset to predict whether a passenger survived, given the features in the dataset. We saw that gender was one feature that predicted survival, but several other ideas were proposed during class as well. For example, one might ask whether the fare a passenger paid is a good predictor of survival.
Activity 1: Examine the Titanic dataset (10 min)
- This dataset contains detailed information on the passengers aboard the Titanic. Our goal is to create a model that can predict whether a passenger survived. Before we start training a machine learning model, however, let us first explore the dataset. Use the pandas methods we went over earlier to see which features the dataset contains.
- Then choose one feature and explore its correlation with passenger survival rate.
- Hypothesize why this feature has the effect you observed.
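The exploration steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the recitation's notebook: the rows are made-up toy values rather than real passenger records, and the column names (Survived, Sex, Pclass, Fare) are an assumption based on the common Kaggle layout of the Titanic data. In the codespace you would load the provided file with pd.read_csv instead of building the frame by hand.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (NOT real passengers).
# Column names are assumed; check df.columns on the actual file.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3, 2, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 30.07, 51.86],
})

# First look: which features exist, and what do their values look like?
print(df.dtypes)
print(df.describe())

# Survival rate per category: the mean of a 0/1 column is the fraction of 1s.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
print(rate_by_sex)

# Pearson correlation between a numeric feature and the survival label.
fare_corr = df["Fare"].corr(df["Survived"])
print(f"Fare vs. Survived correlation: {fare_corr:.2f}")

# Binning a numeric feature often makes the relationship easier to read.
fare_band = pd.qcut(df["Fare"], q=2, labels=["low", "high"])
rate_by_fare_band = df.groupby(fare_band, observed=True)["Survived"].mean()
print(rate_by_fare_band)
```

Any numbers printed here reflect only the toy rows; the interesting part is repeating the same groupby and corr calls on the real data and asking why the pattern appears.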
Activity 2: Train your model (20 min)
Using what we have learned earlier about decision trees and random forest classifiers, work with your partner to train your own model to predict whether a passenger with given features will survive. Be sure to calculate the accuracy of your model using the given test dataset.
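One possible shape for this activity, using scikit-learn, is sketched below. It is only an illustration under stated assumptions: the train and test frames are made-up toy rows, the column names follow the common Kaggle layout, and the prepare helper, the chosen features, and the hyperparameters (max_depth, n_estimators) are hypothetical choices, not the ones the recitation prescribes. With the real files you would replace the inline frames with the provided training and test datasets.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def prepare(df):
    """Hypothetical helper: pick features and encode Sex as a 0/1 column."""
    X = pd.DataFrame({
        "Pclass": df["Pclass"],
        "Fare": df["Fare"],
        "IsFemale": (df["Sex"] == "female").astype(int),
    })
    y = df["Survived"]
    return X, y


# Toy stand-ins for the provided train/test data (NOT real passengers).
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3, 2, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 30.07, 51.86],
})
test_df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Sex": ["female", "male", "female", "male"],
    "Pclass": [2, 3, 1, 2],
    "Fare": [26.00, 7.75, 80.00, 13.00],
})

X_train, y_train = prepare(train_df)
X_test, y_test = prepare(test_df)

# Fixing random_state keeps runs repeatable, which helps when comparing models.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # learn from the training set only
    predictions = model.predict(X_test)    # predict on held-out passengers
    accuracies[name] = accuracy_score(y_test, predictions)
    print(f"{name}: test accuracy = {accuracies[name]:.2f}")
```

Accuracy is computed only on the test set, so it measures how well the model generalizes rather than how well it memorized the training rows. Trying different feature subsets in prepare is a quick way to compare ideas with your partner.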
Activity 3: Present your findings to the class (10 min)
Each pair should share how they trained their model (e.g., which features they considered) and what accuracy they obtained on the test set.