A collection of useful methods and tools for handling and processing data.
Analyze data engineering roles on the Russian and worldwide job markets by parsing web pages, extracting role information and top skills with spaCy, and generating summaries with the TextRank algorithm.
Create a virtual machine with a given configuration in Yandex Cloud, install Docker, and run the hello-world image.
Pull the Hortonworks Hadoop Sandbox image and run it as a Docker container on the VM. Launch the Hadoop services, verify that they work, and configure security policies.
Analyze web logs using the ELK stack by preparing and loading data into Elasticsearch and building a dashboard in Kibana.
Design a comprehensive Data Warehouse using the Data Vault methodology, then transform and prepare the data for analytical use.
Build a data loading pipeline using Apache NiFi and save the data to MinIO S3.
Build a data loading pipeline using Airflow and a PostgreSQL database.
Aggregate crime statistics by district in Boston using Apache Spark, including total crimes, monthly medians, and top crime types.
Develop a business intelligence solution with Metabase and PostgreSQL to visualize and interact with business data.
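
As an illustration of the Boston crime aggregation described above, here is a minimal sketch of the three per-district metrics (total crimes, monthly median, top crime types). It uses plain Python rather than Apache Spark so that it runs without a cluster. The record layout, the function name `aggregate_crimes`, and the output field names are assumptions made for this example, not the project's actual schema.

```python
from collections import Counter, defaultdict
from statistics import median


def aggregate_crimes(records, top_n=3):
    """Aggregate crime records per district.

    `records` is an iterable of (district, year_month, crime_type) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    totals = Counter()               # district -> number of crimes
    monthly = defaultdict(Counter)   # district -> {year_month: count}
    types = defaultdict(Counter)     # district -> {crime_type: count}

    for district, year_month, crime_type in records:
        totals[district] += 1
        monthly[district][year_month] += 1
        types[district][crime_type] += 1

    return {
        district: {
            "crimes_total": totals[district],
            # Median over months that have at least one record;
            # months with zero crimes are not represented here.
            "crimes_monthly": median(monthly[district].values()),
            "frequent_crime_types": [
                name for name, _ in types[district].most_common(top_n)
            ],
        }
        for district in totals
    }


# Made-up sample data, for illustration only.
sample = [
    ("A1", "2018-01", "Larceny"),
    ("A1", "2018-01", "Larceny"),
    ("A1", "2018-01", "Vandalism"),
    ("A1", "2018-02", "Larceny"),
    ("B2", "2018-01", "Assault"),
]

result = aggregate_crimes(sample)
for district, stats in result.items():
    print(district, stats)
```

In Spark the same shape would typically be expressed as a group-by on district followed by per-group aggregations; the sketch only shows what each metric means.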