Finished in the Top 5% of the Rainfall Prediction Kaggle competition! 🌧️
I combined CatBoost, feature engineering, oversampling, and Optuna hyperparameter tuning into a powerful pipeline, and even explored embeddings & clustering for deeper insight.
Write-up coming soon, stay tuned! ⏳
👋 Hello, I'm Autumn. I hold a Master's in Analytics from Georgia Tech, where I developed deep expertise in data science, machine learning, and strategy. I have a strong curiosity for uncovering patterns in complex data, turning insights into action, and communicating results in a way that drives meaningful, high-value impact.
This project focuses on predicting product prices in the Backpack Price Prediction Kaggle competition. Rather than applying a basic regression model, the pipeline leverages feature engineering, real-world intuition, and model optimization to improve predictive accuracy in a noisy commercial dataset.
🔹 Key Highlights:
📊 Feature Engineering: Constructed product-specific features such as weight-to-compartment interactions, log transformations for skewed fields, and multi-way categorical combinations (e.g., brand + material + size).
🤖 Modeling: Benchmarked XGBoost, LightGBM, and CatBoost, incorporating Optuna for tuning and using a stacked ensemble with Ridge regression for final predictions.
📈 Performance Metrics: Evaluated using RMSE on both notebook and Kaggle leaderboard submissions to track generalization.
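As a minimal illustration of the feature engineering above (the column names here are hypothetical; the real competition schema may differ):

```python
import numpy as np
import pandas as pd

# Hypothetical schema standing in for the competition data.
df = pd.DataFrame({
    "weight_kg": [0.8, 1.2, 2.5],
    "compartments": [2, 3, 5],
    "brand": ["Acme", "Trek", "Acme"],
    "material": ["nylon", "canvas", "leather"],
    "size": ["S", "M", "L"],
})

# Interaction: weight per compartment captures how densely packed a design is.
df["weight_per_compartment"] = df["weight_kg"] / df["compartments"]

# Log transform tames right-skewed numeric fields.
df["log_weight"] = np.log1p(df["weight_kg"])

# Multi-way categorical combination (brand + material + size).
df["brand_material_size"] = df["brand"] + "_" + df["material"] + "_" + df["size"]
```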
🛠 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn)
- Winning Model: Stacked Ensemble (XGBoost + LightGBM + CatBoost)
- Feature Engineering & Preprocessing (One-hot encoding, interaction terms, outlier removal)
- Hyperparameter Tuning (Optuna, Cross-Validation)
- GitHub for Version Control 📂
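The stacking pattern looks roughly like this sketch, which uses scikit-learn base learners as stand-ins for XGBoost/LightGBM/CatBoost so it runs with scikit-learn alone; Ridge blends the base models' out-of-fold predictions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge

# Stand-ins for the boosting libraries used in the actual project.
base_learners = [
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Ridge regression is the meta-learner over out-of-fold base predictions.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=Ridge(alpha=1.0), cv=3)

# Synthetic data in place of the real price dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
stack.fit(X, y)
preds = stack.predict(X)
```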
🔬 Innovative Methods Used:
While many tabular models focus solely on boosting performance, this project highlights the value of domain-aware feature construction and rigorous evaluation across multiple modeling pipelines. A shared preprocessing module ensured fairness across models and streamlined experimentation.
📖 Check out the full write-up in the repository!
This project focuses on predicting patient outcomes in the Cirrhosis Outcome Prediction Kaggle competition. Instead of applying a basic classification model, I utilized feature engineering, domain knowledge, and model optimization techniques to improve multi-class prediction accuracy.
🔹 Key Highlights:
📊 Feature Engineering: Created domain-specific features like the bilirubin-to-albumin ratio, log transformations for skewed features, and binary indicators for critical thresholds.
🤖 Modeling: Compared XGBoost, LightGBM, and CatBoost, fine-tuning hyperparameters and using stacking ensembles for performance gains.
📈 Performance Metrics: Evaluated using multi-class log loss and cross-validation to ensure model generalization.
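A small sketch of the ratio feature and the multi-class log loss metric (column names, the threshold, and the probabilities are illustrative, not values from the competition data):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss

# Hypothetical clinical columns; real units and cutoffs come from the dataset.
df = pd.DataFrame({
    "bilirubin": [1.2, 14.5, 3.3],
    "albumin": [3.5, 2.1, 2.9],
})

# Domain-aware ratio: high bilirubin relative to albumin signals worse liver function.
df["bili_albumin_ratio"] = df["bilirubin"] / df["albumin"]

# Binary indicator for a critical threshold (illustrative cutoff).
df["bili_high"] = (df["bilirubin"] > 2.0).astype(int)

# Multi-class log loss on predicted probabilities for the three outcome classes.
y_true = ["C", "D", "CL"]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.2, 0.6, 0.2]])
loss = log_loss(y_true, y_prob, labels=["C", "CL", "D"])
```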
🛠 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn)
- Winning Model: XGBoost
- Feature Engineering & Data Preprocessing (One-hot encoding, ratio calculations, outlier removal)
- Hyperparameter Tuning (Randomized Search, Stratified K-Fold Validation)
- GitHub for Version Control 📂
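Randomized search with stratified folds looks roughly like this (a scikit-learn gradient-boosting classifier stands in for XGBoost, and the data is synthetic):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic 3-class data in place of the cirrhosis outcomes.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)

# Stratified folds keep the class balance of a skewed medical target in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={"max_depth": randint(2, 6),
                         "n_estimators": randint(50, 150)},
    n_iter=4,
    cv=cv,
    scoring="neg_log_loss",  # matches the competition metric
    random_state=0,
)
search.fit(X, y)
```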
🔬 Innovative Methods Used:
Many classification models for medical datasets rely on direct correlations or minimal preprocessing. This project takes a more data-driven and clinical approach, engineering features that reflect real-world liver disease progression. This improves both interpretability and predictive power.
📖 Check out the full write-up in the repository!
This project is an in-depth analysis of the Ames Housing dataset, where I applied machine learning models to predict house sale prices. Instead of merely running standard models, I leveraged feature engineering, domain knowledge, and advanced model comparison techniques to improve prediction accuracy.
🔹 Key Highlights:
- 📊 Feature Engineering: Grouped related features to enhance predictive power
- 🤖 Modeling: Compared Decision Trees, Random Forests, Gradient Boosting, and Linear Regression
- 📈 Performance Metrics: Evaluated RMSE and R² to measure model effectiveness
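The two metrics above can be computed like this (the prices below are illustrative, not actual Ames values):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative predictions vs. actual sale prices.
y_true = np.array([200_000, 155_000, 310_000, 180_000])
y_pred = np.array([195_000, 160_000, 300_000, 185_000])

# RMSE: typical dollar error; R²: share of price variance explained.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
```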
🛠 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
- Machine Learning Models (Linear Regression, Gradient Boosting, Random Forest, Decision Trees)
- Feature Engineering & Data Preprocessing
- GitHub for Version Control 📂
📢 🔬 Innovative Methods Used: Most approaches to this dataset focus on either raw correlations or brute-force feature selection. My approach leverages real-estate knowledge to construct meaningful categories (e.g., grouping porch types, analyzing basement features separately), which led to better model interpretability and, in some cases, stronger predictions. See the write-up for where and why.
📖 Check out the full write-up in the repository!
This project tackles the challenge of optimizing a medley relay lineup, where swimmers often excel in multiple strokes, creating trade-offs in event selection. Instead of guessing or manually shuffling times, I developed an Excel Solver-based optimization model to automatically determine the fastest possible relay combination.
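The project uses Excel Solver, but the underlying model is a classic assignment problem; the Python sketch below solves the same model with the Hungarian algorithm, using made-up swim times.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical best times (seconds) for 4 swimmers across the 4 medley strokes.
# Rows: swimmers; columns: back, breast, fly, free.
times = np.array([
    [29.1, 34.0, 27.5, 25.9],
    [28.4, 33.2, 28.0, 26.3],
    [30.0, 32.1, 27.9, 25.5],
    [29.5, 35.0, 26.8, 26.0],
])

# Assign each swimmer to exactly one stroke, minimizing total relay time --
# the same trade-off the Excel Solver model resolves.
rows, cols = linear_sum_assignment(times)
total = times[rows, cols].sum()
```

Note that the optimal lineup is not just "each stroke's fastest swimmer": here the fastest freestyler is also the fastest breaststroker, so the solver trades freestyle away to cover breaststroke.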
📖 Check out the full write-up in the repository!
Machine learning terms with simple, intuitive explanations.
📂 View the repository: ML Decoded on GitHub
- Developing and optimizing models using:
- Linear Regression, Decision Trees, Random Forests
- Gradient Boosting (LightGBM, XGBoost, CatBoost)
- Support Vector Machines (SVM), Neural Networks (TensorFlow, PyTorch)
- Clustering (K-Means, DBSCAN), Principal Component Analysis (PCA)
- Data wrangling, preprocessing, and feature engineering with:
- Pandas, NumPy, Scikit-learn, Statsmodels
- Handling missing values, scaling, encoding categorical variables
- Engineering domain-specific features to enhance model performance
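A compact example of the wrangling steps above (imputation, scaling, and categorical encoding) chained into one scikit-learn preprocessor, on a toy frame:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with a missing value and a categorical column.
df = pd.DataFrame({"age": [25, None, 40], "city": ["NY", "LA", "NY"]})

# Numeric path: fill missing values with the median, then standardize.
numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])

prep = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
X = prep.fit_transform(df)  # one scaled column + one-hot city columns
```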
- Communicating insights using:
- Matplotlib, Seaborn, Plotly, Tableau
- Creating interactive and high-impact visualizations for stakeholder engagement
- Working with large-scale datasets using:
- Amazon S3, Google BigQuery, Apache Spark, SQL
- Optimizing storage and query performance for large datasets
- Applying data science for:
- Forecasting, market analysis, and strategic decision-making
- Business intelligence tools: Power BI, Looker
- Automating reporting and dashboarding solutions
- Master of Science in Analytics 🎓
  Georgia Institute of Technology 🐝
- Web Development Professional Certificate 💻
  University of California, Davis 🎓
- Bachelor of Science in Business Finance 💹
  California State University, Sacramento 🌳
- Probabilistic Machine Learning for Finance and Investing: A Primer to Generative AI with Python 📕
  🏆 I am listed in the Acknowledgements section 🎉
- LinkedIn: https://linkedin.com/in/autumnpeters 🔗












