CodeCutTech/Data-science

Data Science Articles from CodeCut

About CodeCut

CodeCut is the platform that helps data scientists stay productive and current by delivering short, practical code examples that highlight modern tools in action.

It's the resource you wish you had when learning a new library: clean, concise, and instantly applicable.

Article Collection

This repository is a curated collection of data science articles from CodeCut, covering topics like MLOps, data management, testing, visualization, and more. Each article comes with practical examples, code repositories, and video tutorials to help you quickly implement these tools and practices in your own projects.

| Category | Title | Article, Repository, Video |
| --- | --- | --- |
| MLOps | Goodbye Pip and Poetry. Why UV Might Be All You Need | 🔗 |
| MLOps | Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | 🔗 🔗 🔗 |
| MLOps | Poetry: A Better Way to Manage Python Dependencies | 🔗 🔗 |
| MLOps | Git for Data Scientists: Learn Git through Practical Examples | 🔗 🔗 |
| MLOps | 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 🔗 🔗 |
| MLOps | How to Structure a Data Science Project for Maintainability | 🔗 🔗 🔗 |
| MLOps | Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 🔗 🔗 |
| MLOps | Automate Machine Learning Deployment with GitHub Actions | 🔗 🔗 🔗 |
| MLOps | How to Build a Fully Automated Data Drift Detection Pipeline | 🔗 🔗 🔗 |
| Data Management Tools | Version Control for Data and Models Using DVC | 🔗 🔗 🔗 |
| Data Management Tools | What is dbt (data build tool) and When should you use it? | 🔗 🔗 🔗 |
| Data Management Tools | Streamline dbt Model Development with Notebook-Style Workspace | 🔗 🔗 🔗 |
| Testing | Pytest for Data Scientists | 🔗 🔗 🔗 |
| Python Helper Tools | Write Clean Python Code Using Pipes | 🔗 🔗 🔗 |
| Python Helper Tools | Introducing FugueSQL: SQL for Pandas, Spark, and Dask DataFrames | 🔗 🔗 |
| Python Helper Tools | Fugue and DuckDB: Fast SQL Code in Python | 🔗 🔗 |
| Python Helper Tools | Marimo: A Modern Notebook for Reproducible Data Science | 🔗 🔗 |
| Feature Engineering | Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames | 🔗 🔗 |
| Visualization | Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 🔗 |
| Python | Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 🔗 🔗 |
| Logging and Debugging | Loguru: Simple as Print, Flexible as Logging | 🔗 🔗 🔗 |
| LLM | Enforce Structured Outputs from LLMs with PydanticAI | 🔗 🔗 |
| LLM | Run Private AI Workflows with LangChain and Ollama | 🔗 🔗 |
| Speed-up Tools | Writing Safer PySpark Queries with Parameters | 🔗 🔗 |
| Speed-up Tools | Narwhals: Unified DataFrame Functions for pandas, Polars, and PySpark | 🔗 🔗 |
| Speed-up Tools | Eager to Lazy DataFrames with Narwhals | 🔗 🔗 |
| Speed-up Tools | Scaling Pandas Workflows with PySpark's Pandas API | 🔗 🔗 |

Contributing

If you're passionate about data science and want to share your knowledge about open-source tools for data processing and LLM applications in Python, we'd love to have you contribute!

To contribute:

  1. Create a GitHub issue:
    • Click on the "Issues" tab
    • Click "New issue"
    • Select the "Article Topic Suggestion" template
    • Fill in the template with your article proposal
  2. Read our contribution guidelines