Project 2: Titanic Dataset Analysis

Overview

This project performs exploratory data analysis and machine learning preparation on the Titanic dataset. The goal is to analyze passenger data and prepare features for predicting survival outcomes.

Dataset

The Titanic dataset is loaded directly from the seaborn library and contains information about passengers including demographics, ticket class, fare, and survival status.

Project Structure

  • ml02_yourname.ipynb: Main Jupyter notebook containing the analysis
  • README.md: Project documentation

Key Steps

1. Data Import and Inspection

  • Load and inspect the Titanic dataset
  • Identify missing values and data types
  • Calculate summary statistics and correlations
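
The inspection steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is a hypothetical stand-in that borrows column names from seaborn's Titanic dataset so the example runs offline. In the notebook, the data would presumably come from `sns.load_dataset("titanic")`.

```python
import pandas as pd

# Hypothetical stand-in rows (assumption); column names follow seaborn's Titanic dataset.
# In the notebook the real data would be loaded with: sns.load_dataset("titanic")
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass": [3, 1, 3, 1, 2, 3],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, None, 54.0, 27.0, None],
    "fare": [7.25, 71.28, 7.92, 51.86, 11.13, 8.05],
})

df.info()                      # data types and non-null counts per column
missing = df.isnull().sum()    # number of missing values per column
summary = df.describe()        # summary statistics for numeric columns

# Correlations are only defined for numeric columns, so select those first.
corr = df.select_dtypes(include="number").corr()

print(missing)
print(corr)
```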

2. Data Exploration and Preparation

  • Visualize data patterns using scatter plots, histograms, and count plots
  • Handle missing values through imputation and cleaning
  • Engineer new features like family size
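
A minimal sketch of imputation and feature engineering, under stated assumptions: median imputation for `age`, mode imputation for `embarked`, and `family_size` defined as siblings/spouses plus parents/children plus the passenger. The README does not say which strategies the notebook uses, so treat these as one common choice. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows (assumption); column names follow seaborn's Titanic dataset.
df = pd.DataFrame({
    "age": [22.0, None, 26.0, 35.0],
    "embarked": ["S", "C", None, "S"],
    "sibsp": [1, 1, 0, 0],   # siblings/spouses aboard
    "parch": [0, 2, 0, 0],   # parents/children aboard
})

# Assumed strategy: fill missing ages with the median age.
df["age"] = df["age"].fillna(df["age"].median())

# Assumed strategy: fill missing embarkation ports with the most frequent value.
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# Assumed definition: relatives aboard plus the passenger themselves.
df["family_size"] = df["sibsp"] + df["parch"] + 1

print(df)
```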

3. Feature Selection

  • Select relevant input features for prediction
  • Define target variable (survival status)
  • Justify feature choices based on exploratory analysis
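
Selecting inputs and a target might look like the sketch below. The feature list is hypothetical, chosen only for illustration; the notebook's actual choices should follow from its exploratory analysis.

```python
import pandas as pd

# Hypothetical prepared rows (assumption); column names follow seaborn's Titanic dataset.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 1],
    "age": [22.0, 38.0, 26.0, 35.0],
    "family_size": [2, 2, 1, 1],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Hypothetical feature choice, for illustration only.
features = ["pclass", "age", "family_size"]

X = df[features]      # input features (2-D DataFrame)
y = df["survived"]    # target variable (1-D Series)

print(X.shape, y.shape)
```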

4. Data Splitting

  • Compare basic train/test split with stratified splitting
  • Evaluate class distribution across splits
  • Assess which method better preserves the original class proportions in the train and test sets
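
The comparison above can be sketched with scikit-learn's `train_test_split` and `StratifiedShuffleSplit`. The data, the 80/20 split, and the `random_state` value are assumptions made for illustration; the notebook may use different settings.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Hypothetical data (assumption): 100 passengers, 40% of whom survived.
df = pd.DataFrame({
    "fare": range(100),
    "survived": [1, 0, 1, 0, 0] * 20,
})
X = df[["fare"]]
y = df["survived"]

# Basic split: rows are shuffled without regard to class membership.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Stratified split: each side keeps roughly the original class proportions.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=123)
train_idx, test_idx = next(splitter.split(X, y))
strat_train, strat_test = y.iloc[train_idx], y.iloc[test_idx]

# Compare the share of each class across the original data and both splits.
print("Original:        ", y.value_counts(normalize=True).to_dict())
print("Basic test:      ", y_test.value_counts(normalize=True).to_dict())
print("Stratified test: ", strat_test.value_counts(normalize=True).to_dict())
```

The closer a split's proportions are to the original distribution, the better it preserves class balance. Stratification enforces this by construction, while a basic split only matches by chance.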

Technologies Used

  • Python 3.x
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Requirements

All dependencies from Project 1 are reused for this analysis.