
text-preprocessing

Here are 8 public repositories matching this topic...

Language: Jupyter Notebook

A comprehensive set of Jupyter notebooks that take you from NLP fundamentals to advanced techniques. Covers text preprocessing, POS tagging, NER, sentiment analysis (with VADER), text classification, word embeddings, and transformer models like BERT. Built with real-world datasets using NLTK, spaCy, scikit-learn, and Hugging Face Transformers.

  • Updated Oct 11, 2025
  • Python

This repository contains a Jupyter notebook that implements the Multinomial Naive Bayes algorithm from scratch to classify emails as spam or ham. The notebook also compares its results with those of scikit-learn's Multinomial Naive Bayes implementation.

  • Updated May 26, 2023
  • Jupyter Notebook
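To illustrate the algorithm that the notebook describes, here is a minimal from-scratch sketch of Multinomial Naive Bayes with additive (Laplace) smoothing. It is not the repository's code. The sketch assumes pre-tokenized documents (lists of words), and the class name `MultinomialNaiveBayes` and the toy training sentences are hypothetical. Each class is scored as log P(c) + Σ log P(w | c), where P(w | c) = (count(w, c) + α) / (total words in c + α·|V|).

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over pre-tokenized documents (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, docs, labels):
        class_sizes = Counter(labels)
        self.classes = sorted(class_sizes)
        self.vocab = {tok for doc in docs for tok in doc}
        # Per-class word counts: count(w, c)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # Log prior: log P(c) = log(N_c / N)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        return self

    def _log_likelihood(self, tok, c):
        # log P(w | c) with add-alpha smoothing over the training vocabulary
        num = self.word_counts[c][tok] + self.alpha
        den = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            # Every occurrence of a word contributes, as in the multinomial model;
            # words never seen during training are skipped.
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(tok, c) for tok in doc if tok in self.vocab
            )
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


# Toy usage (made-up data for illustration only)
train = ["win money now", "cheap money offer", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNaiveBayes(alpha=1.0).fit([t.split() for t in train], labels)
print(model.predict([["money", "offer"], ["meeting", "tomorrow"]]))  # ['spam', 'ham']
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long emails. `alpha` plays the same role as the `alpha` parameter of scikit-learn's `MultinomialNB` (default 1.0).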

Corpus building and NLP analysis for Persian Telegram channel messages. The repository includes a notebook that parses and cleans channel_messages.json, applies stopword normalization, draws a word cloud with a silhouette mask, and writes CSV outputs (filtered_messages.csv, final_results.csv). It provides a reproducible pipeline for EDA and basic modeling.

  • Updated Oct 14, 2025
  • Jupyter Notebook
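The following is a minimal sketch of the kind of cleaning such a pipeline might perform. It is not the repository's code. The sketch assumes the messages have already been loaded with `json.load` into a list of dicts whose `text` field is either a string or a list of strings and `{"text": ...}` entity dicts, similar to Telegram Desktop's JSON export. The actual layout of channel_messages.json is not stated. The Arabic-to-Persian character mappings are standard normalizations. The function names and the short stopword list are hypothetical.

```python
import re
from collections import Counter

# Arabic code points that often appear in Persian text, mapped to Persian forms.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})
_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # short-vowel marks (fathatan .. sukun)
_NON_WORD = re.compile(r"[^\w\u200c\s]")      # punctuation/symbols; keeps ZWNJ (half-space)


def normalize(text):
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    text = text.replace("\u0640", "")  # drop tatweel (kashida) stretching
    return _NON_WORD.sub(" ", text)    # turn punctuation into spaces


# Illustrative, far-from-complete stopword list (hypothetical). It is normalized
# with the same function so that lookups match normalized tokens.
STOPWORDS = {normalize(w) for w in ["و", "در", "به", "از", "که", "این", "را", "با"]}


def extract_texts(messages):
    """Return non-empty message texts (assumed export-style message layout)."""
    texts = []
    for msg in messages:
        raw = msg.get("text", "")
        if isinstance(raw, list):  # rich text split into plain strings and entity dicts
            raw = "".join(p if isinstance(p, str) else p.get("text", "") for p in raw)
        if isinstance(raw, str) and raw.strip():
            texts.append(raw)
    return texts


def tokenize(text):
    return [tok for tok in normalize(text).split() if tok not in STOPWORDS]


def word_frequencies(texts):
    return Counter(tok for text in texts for tok in tokenize(text))
```

The resulting `Counter` could be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies`. A silhouette image would be supplied through the `mask` parameter and a font that covers Persian script through `font_path`.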

The repository contains notebooks for collecting and preprocessing a corpus of diary entries, and for experiments with models that predict an author's gender and age group and the time period in which a text was written.

  • Updated Jun 12, 2024
  • Jupyter Notebook
