A collection of useful methods and tools for handling and processing data

horacemtb/data-engineering-kit

data-engineering-kit

de-job-market-analysis

Analysis of data engineering roles on the Russian and worldwide job markets, conducted by parsing web pages, extracting role information and top skills with spaCy, and generating summaries with the TextRank algorithm.
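To make the summarization step more concrete, here is a minimal sketch of extractive TextRank in plain Python. It is not the code used in this project: the function names (`split_sentences`, `similarity`, `textrank_summary`), the naive regex sentence splitter, and the parameter defaults are all assumptions chosen for illustration. Sentences are ranked by a PageRank-style iteration over a word-overlap similarity graph, and the top-ranked ones are returned in their original order.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # log(1) == 0 would make the denominator zero
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style update: each sentence receives score from its neighbours
    # in proportion to the edge weight.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(ranked)]
```

A sentence that shares no words with the rest of the text ends up with the minimum score and is the first to be dropped from the summary.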

create-vm-install-docker

Create a virtual machine with a given configuration in Yandex Cloud, install Docker, and run the hello-world image.

install-hadoop-sandbox

Pull the Hortonworks Hadoop Sandbox image and run it as a Docker container on a VM. Launch the Hadoop services, verify that they work, and configure security policies.

analyze-weblogs-elk

Analyze weblogs with the ELK stack by preparing the data, loading it into Elasticsearch, and building a dashboard in Kibana.
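As an illustration of the "prepare and load" step, the sketch below parses access-log lines into documents and serializes them in the newline-delimited format expected by the Elasticsearch Bulk API (an action line followed by a source line). Several things here are assumptions rather than details of this project: the logs are taken to be in Apache Common Log Format, the index name `weblogs` and the field names are hypothetical, and the result is only a request body that would still need to be sent to the cluster.

```python
import json
import re
from datetime import datetime

# Assumed input: Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "client_ip": m["ip"],
        "@timestamp": ts.isoformat(),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "bytes": 0 if m["size"] == "-" else int(m["size"]),
    }


def to_bulk_body(lines, index="weblogs"):
    """Build an NDJSON bulk body; unparseable lines are skipped."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is None:
            continue
        out.append(json.dumps({"index": {"_index": index}}))  # action line
        out.append(json.dumps(doc))                           # source line
    # The Bulk API requires every line, including the last, to end with "\n".
    return "".join(row + "\n" for row in out)
```

Storing the timestamp as ISO 8601 under `@timestamp` and the status as an integer makes time-based and numeric visualizations in Kibana straightforward.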

dwh-data-vault

Design a data warehouse using the Data Vault methodology, then transform and prepare the data for analytical use.
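One recurring building block in Data Vault models is the hash key that identifies hub and link rows. The following sketch shows one common convention: normalize the business keys, join them with a delimiter, and hash the result with MD5. The helper names (`hash_key`, `hub_row`, `link_row`), the column names, and the `||` delimiter are hypothetical, and the actual model in this project may be built differently.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Deterministic key: trim and upper-case each part, join, then MD5."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hub_row(business_key, record_source, load_date=None):
    """A hub stores one row per unique business key."""
    return {
        "hub_hash_key": hash_key(business_key),
        "business_key": business_key,
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,
    }


def link_row(business_keys, record_source, load_date=None):
    """A link relates two or more hubs through their hash keys."""
    return {
        "link_hash_key": hash_key(*business_keys),
        "hub_hash_keys": [hash_key(k) for k in business_keys],
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,
    }
```

Because the key depends only on the normalized business key, the same entity arriving from different source systems resolves to the same hub row, and hubs, links, and satellites can be loaded independently of one another.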

nifi-load-raw-data

Build a data loading pipeline using Apache NiFi and save the data to MinIO S3.

airflow-postgresql

Build a data loading pipeline using Apache Airflow and a PostgreSQL database.

spark-api

Aggregate crime statistics by district in Boston using Apache Spark, including total crimes, monthly medians, and top crime types.
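The three metrics can be pinned down with a small plain-Python stand-in. This is deliberately not Spark code and not the project's implementation: it only spells out the aggregation logic on an in-memory list. The record fields (`district`, `year`, `month`, `crime_type`) and the output names are assumptions, and the monthly median is taken over months that have at least one recorded crime.

```python
from collections import Counter, defaultdict
from statistics import median


def aggregate_by_district(records, top_n=3):
    """Per district: total crimes, median of monthly counts, top crime types."""
    monthly = defaultdict(Counter)  # district -> (year, month) -> count
    types = defaultdict(Counter)    # district -> crime type -> count

    for r in records:
        monthly[r["district"]][(r["year"], r["month"])] += 1
        types[r["district"]][r["crime_type"]] += 1

    result = {}
    for district, per_month in monthly.items():
        result[district] = {
            "crimes_total": sum(per_month.values()),
            # Months with zero crimes never appear in the data,
            # so they are not part of this median.
            "crimes_monthly_median": median(per_month.values()),
            "top_crime_types": [
                name for name, _ in types[district].most_common(top_n)
            ],
        }
    return result
```

In Spark the same shape would typically come from grouping by district and month first, then aggregating those monthly counts per district.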

metabase-ui

Develop a business intelligence solution with Metabase and PostgreSQL to visualize and interact with business data.
