
πŸš€ Exciting News! πŸ†

Finished in the Top 5% of the Rainfall Prediction Kaggle competition! 🌧️

I combined CatBoost, feature engineering, oversampling, and Optuna hyperparameter tuning into a powerful pipeline, and even explored embeddings & clustering for deeper insight.

Write-up coming soon – stay tuned! ⭐


πŸ‘‹ Hello, I’m Autumn. I hold a Master’s in Analytics from Georgia Tech, where I developed deep expertise in data science, machine learning, and strategy. I have a strong curiosity for uncovering patterns in complex data, turning insights into action, and communicating results in a way that drives meaningful, high-value impact.


πŸŽ’ Backpack Price Modeling and Prediction with ML

GitHub Repo

This project focuses on predicting product prices in the Backpack Price Prediction Kaggle competition. Rather than applying a basic regression model, the pipeline leverages feature engineering, real-world intuition, and model optimization to improve predictive accuracy in a noisy commercial dataset.


πŸ”Ή Key Highlights:

  • πŸ“Œ Feature Engineering: Constructed product-specific features such as weight-to-compartment interactions, log transformations for skewed fields, and multi-way categorical combinations (e.g., brand + material + size)
  • πŸ“Œ Modeling: Benchmarked XGBoost, LightGBM, and CatBoost, incorporating Optuna for tuning and a stacked ensemble with Ridge regression for final predictions
  • πŸ“Œ Performance Metrics: Evaluated with RMSE on both local validation and Kaggle leaderboard submissions to track generalization
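The product-specific features above can be sketched in pandas. The column names here are illustrative stand-ins, not the competition's actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical backpack rows; column names are made up for illustration.
df = pd.DataFrame({
    "brand": ["Acme", "Trailio", "Acme"],
    "material": ["nylon", "canvas", "nylon"],
    "size": ["M", "L", "S"],
    "weight_kg": [1.2, 2.5, 0.9],
    "compartments": [3, 5, 2],
})

# Interaction term: average weight carried per compartment.
df["weight_per_compartment"] = df["weight_kg"] / df["compartments"]

# Log transform to tame right-skewed numeric fields.
df["log_weight"] = np.log1p(df["weight_kg"])

# Multi-way categorical combination (brand + material + size) as one feature.
df["brand_material_size"] = df["brand"] + "_" + df["material"] + "_" + df["size"]
```

Combined categories like `brand_material_size` let tree models split on whole product configurations that the individual columns cannot express.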


πŸ“Š Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: Stacked Ensemble (XGBoost + LightGBM + CatBoost)
  • Feature Engineering & Preprocessing (One-hot encoding, interaction terms, outlier removal)
  • Hyperparameter Tuning (Optuna, Cross-Validation)
  • GitHub for Version Control πŸ› 
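The stacked-ensemble idea can be sketched with scikit-learn's StackingRegressor. In this sketch, sklearn's own gradient-boosted trees and a random forest stand in for XGBoost / LightGBM / CatBoost, and the data is synthetic:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for the backpack price table.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base learners produce out-of-fold predictions; Ridge blends them.
stack = StackingRegressor(
    estimators=[
        ("gbm_slow", GradientBoostingRegressor(learning_rate=0.05, random_state=0)),
        ("gbm_fast", GradientBoostingRegressor(learning_rate=0.2, random_state=1)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=2)),
    ],
    final_estimator=Ridge(alpha=1.0),
)
stack.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, stack.predict(X_test)))
```

A linear meta-learner such as Ridge is a common choice here because it weights the base models without overfitting to their correlated errors.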

πŸ”¬ Innovative Methods Used:
While many tabular models focus solely on boosting performance, this project highlights the value of domain-aware feature construction and rigorous evaluation across multiple modeling pipelines. A shared preprocessing module ensured fairness across models and streamlined experimentation.
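A shared preprocessing module of this kind might look like the following scikit-learn sketch; the column names and pipeline steps are assumptions for illustration, not the project's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# One preprocessing definition shared by every model under comparison.
numeric = ["weight_kg", "compartments"]
categorical = ["brand", "material"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

# Each candidate model gets an identical, independent copy of the transformer,
# so no model benefits from a different preprocessing recipe.
shared = clone(preprocess)
X = pd.DataFrame({
    "weight_kg": [1.2, np.nan, 2.5],
    "compartments": [3, 2, 5],
    "brand": ["Acme", "Trailio", np.nan],
    "material": ["nylon", "canvas", "nylon"],
})
Xt = shared.fit_transform(X)
```

`clone` gives each model a fresh, unfitted copy, which keeps the comparison fair while avoiding fold-to-fold state leakage.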

πŸ”— Check out the full write-up in the repository!


πŸ₯ Predicting Cirrhosis Patient Outcomes with Multi-Class Classification

GitHub Repo

This project focuses on predicting patient outcomes in the Cirrhosis Outcome Prediction Kaggle competition. Instead of applying a basic classification model, I utilized feature engineering, domain knowledge, and model optimization techniques to improve multi-class prediction accuracy.

πŸ”Ή Key Highlights:

  • πŸ“Œ Feature Engineering: Created domain-specific features such as the bilirubin-to-albumin ratio, log transformations for skewed fields, and binary indicators for critical clinical thresholds
  • πŸ“Œ Modeling: Compared XGBoost, LightGBM, and CatBoost, fine-tuning hyperparameters and using stacking ensembles for performance gains
  • πŸ“Œ Performance Metrics: Evaluated using multi-class log loss with cross-validation to ensure model generalization
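The cross-validated multi-class log-loss evaluation can be sketched with scikit-learn; here `GradientBoostingClassifier` stands in for XGBoost / LightGBM / CatBoost, and synthetic three-class data stands in for the patient outcomes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic three-class data in place of the cirrhosis outcome labels.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=3, random_state=0,
)

# Stratified folds preserve the class balance in every split, which matters
# when one outcome class is much rarer than the others.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, y,
    cv=cv, scoring="neg_log_loss",
)
mean_log_loss = -scores.mean()  # lower is better
```

Log loss penalizes confident wrong predictions heavily, so it rewards well-calibrated class probabilities rather than just correct argmax labels.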

πŸ“Š Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: XGBoost
  • Feature Engineering & Data Preprocessing (One-hot encoding, ratio calculations, outlier removal)
  • Hyperparameter Tuning (Randomized Search, Stratified K-Fold Validation)
  • GitHub for Version Control πŸ› 

πŸ”¬ Innovative Methods Used:
Many classification models for medical datasets rely on direct correlations or minimal preprocessing. This project takes a more data-driven and clinical approach, engineering features that reflect real-world liver disease progression. This improves both interpretability and predictive power.

πŸ”— Check out the full write-up in the repository!


🏠 A Different Approach to Feature Engineering for Predicting House Prices in Ames

GitHub Repo

This project is an in-depth analysis of the Ames Housing dataset, where I applied machine learning models to predict house sale prices. Instead of merely running standard models, I leveraged feature engineering, domain knowledge, and advanced model comparison techniques to improve prediction accuracy.

πŸ”Ή Key Highlights:

  • πŸ“Œ Feature Engineering: Grouped related features to enhance predictive power
  • πŸ“Œ Modeling: Compared Decision Trees, Random Forests, Gradient Boosting, and Linear Regression
  • πŸ“Œ Performance Metrics: Evaluated RMSE and R² to measure model effectiveness
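Grouping related features can be sketched in pandas. The porch columns below follow the public Ames dataset's naming convention, but the rows are made up:

```python
import pandas as pd

# Toy rows using Ames-style porch column names.
df = pd.DataFrame({
    "OpenPorchSF": [40, 0],
    "EnclosedPorch": [0, 120],
    "3SsnPorch": [0, 0],
    "ScreenPorch": [60, 0],
})

# Collapse the four sparse porch variants into one grouped feature.
porch_cols = ["OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch"]
df["TotalPorchSF"] = df[porch_cols].sum(axis=1)
df["HasPorch"] = (df["TotalPorchSF"] > 0).astype(int)
```

Each porch column alone is mostly zeros, so a single combined area plus a has-porch flag gives the model a denser, more interpretable signal.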

πŸ“Š Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
  • Machine Learning Models (Linear Regression, Gradient Boosting, Random Forest, Decision Trees)
  • Feature Engineering & Data Preprocessing
  • GitHub for Version Control πŸ› 

πŸ”¬ Innovative Methods Used: Most approaches to this dataset focus on either raw correlations or brute-force feature selection. My approach leverages real-estate knowledge to construct meaningful categories (e.g., grouping porch types, analyzing basement features separately), which improved interpretability and, in some cases, prediction accuracy; the write-up explains where the grouping helped and where it did not.

πŸ”— Check out the full write-up in the repository!


🎯 Medley Relay Optimization

GitHub Repo

This project tackles the challenge of optimizing a medley relay lineup, where swimmers often excel in multiple strokes, creating trade-offs in event selection. Instead of guessing or manually shuffling times, I developed an Excel Solver-based optimization model to automatically determine the fastest possible relay combination.
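A Python equivalent of the Excel Solver model can be sketched as an assignment problem: pick one swimmer per stroke so the summed time is minimal. This sketch uses SciPy's Hungarian-algorithm solver with made-up split times:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 50m split times in seconds: rows = swimmers, cols = strokes
# (back, breast, fly, free). Several swimmers are strong in multiple strokes,
# which is exactly the trade-off the optimizer resolves.
times = np.array([
    [29.1, 34.0, 28.5, 26.2],
    [30.4, 32.1, 29.9, 27.0],
    [28.7, 35.2, 27.8, 26.5],
    [31.0, 33.5, 30.2, 25.9],
])

# One swimmer per stroke, one stroke per swimmer, minimizing total time.
swimmers, strokes = linear_sum_assignment(times)
total_time = times[swimmers, strokes].sum()
```

Excel Solver reaches the same result with binary assignment variables and row/column sum constraints; `linear_sum_assignment` solves that formulation directly.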

πŸ”— Check out the full write-up in the repository!


πŸš€ ML Decoded

GitHub Repo


Machine learning terms with simple, intuitive explanations.

πŸ”— View the repository: ML Decoded on GitHub


πŸš€ Technical Skills

πŸ€– Machine Learning & Predictive Modeling

  • Developing and optimizing models using:
    • Linear Regression, Decision Trees, Random Forests
    • Gradient Boosting (LightGBM, XGBoost, CatBoost)
    • Support Vector Machines (SVM), Neural Networks (TensorFlow, PyTorch)
    • Clustering (K-Means, DBSCAN), Principal Component Analysis (PCA)

🧠 Data Analysis & Feature Engineering

  • Data wrangling, preprocessing, and feature engineering with:
    • Pandas, NumPy, Scikit-learn, Statsmodels
    • Handling missing values, scaling, encoding categorical variables
    • Engineering domain-specific features to enhance model performance

πŸ“Š Data Visualization & Storytelling

  • Communicating insights using:
    • Matplotlib, Seaborn, Plotly, Tableau
    • Creating interactive and high-impact visualizations for stakeholder engagement

πŸ’Ύ Big Data & Scalable Computing

  • Working with large-scale datasets using:
    • Amazon S3, Google BigQuery, Apache Spark, SQL
    • Optimizing storage and query performance for large datasets

πŸ“ˆ Business Intelligence & Data-Driven Strategy

  • Applying data science for:
    • Forecasting, market analysis, and strategic decision-making
    • Business intelligence tools: Power BI, Looker
    • Automating reporting and dashboarding solutions

πŸŽ“ Education

  • Master of Science in Analytics 🐝
    Georgia Institute of Technology 🌐

  • Web Development Professional Certificate πŸ“œ
    University of California, Davis 🌟

  • Bachelor of Science in Business Finance πŸ’Ή
    California State University, Sacramento 🌳


πŸ“š Bookshelf:


🌐 Let's Connect!

