Tidy Data with Java & Jupyter

6 min read · Nov 25, 2021

Exercising Tidy Data concepts and practices using Java on Google Colab (Jupyter)

Tidy Data For Efficiency, Reproducibility, And Collaboration, Julie Lowndes And Allison Horst, October 12, 2020

This article does not include a thorough explanation of Tidy Data. Countless blog posts, articles, and videos already do an exceptional job of that, including the original paper on Tidy Data written by Hadley Wickham. At the time of Wickham’s publication, it was fair to say that…

A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible.

Tidy Data by Hadley Wickham

To the benefit of the data enthusiast community at large, this is arguably no longer the case.

For a more thorough explanation of Tidy Data, though, I suggest reading the original paper, or any of the following articles published on Medium:

What is this Article About?

What this article hopes to add to the existing body of work on the subject is a set of examples of the most common Tidy Data concepts and practices, written in Java and run in Jupyter Notebooks.
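To give a flavor of what such an example looks like, here is a minimal sketch of one common tidying step: reshaping a "wide" table, whose column headers are really values, into a "long" table with one observation per row. It deliberately uses only the Java standard library rather than a dataframe library, and the class name `MeltSketch`, the `Observation` type, the `melt` method, and the sample data are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MeltSketch {

    /** One tidy observation: an identifier, the former column header, and the cell value. */
    static final class Observation {
        final String id;
        final String key;
        final String value;

        Observation(String id, String key, String value) {
            this.id = id;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return id + ", " + key + ", " + value;
        }
    }

    /** Turns every non-id column of each wide row into its own observation. */
    static List<Observation> melt(List<Map<String, String>> wideRows, String idColumn) {
        List<Observation> tidy = new ArrayList<>();
        for (Map<String, String> row : wideRows) {
            String id = row.get(idColumn);
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (!cell.getKey().equals(idColumn)) {
                    tidy.add(new Observation(id, cell.getKey(), cell.getValue()));
                }
            }
        }
        return tidy;
    }

    public static void main(String[] args) {
        // Hypothetical wide row: the headers "2019" and "2020" are values (years), not variable names.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("city", "Springfield");
        row.put("2019", "10");
        row.put("2020", "12");

        List<Map<String, String>> wide = new ArrayList<>();
        wide.add(row);

        // Each (city, year, value) triple is printed on its own line.
        for (Observation o : melt(wide, "city")) {
            System.out.println(o);
        }
    }
}
```

A `LinkedHashMap` is used for each row so that the columns keep their insertion order when they are turned into observations. A dataframe library would normally handle this reshaping for you; the sketch only shows the idea behind it.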

Prerequisites & Getting Started

In previous articles, I demonstrated how to code with Java in Jupyter Notebooks on Google Colab, how to use ‘Magics’ to import libraries, and how to start working with dataframes using Tablesaw.

Assuming you are now at least minimally familiar with the steps outlined in those articles, you can copy this Jupyter Notebook Template, install the kernel by executing the included code block, and then connect to a hosted runtime to get started.

TL;DR for this section: Sorry, there is no shortcut. You need to get familiar with the steps involved in configuring and running Java in Google Colab.

When you are, you can start with this Jupyter Notebook (Java Kernel).

Common Problems with…

Written by Gary Sharpe

I build back-end systems for moving, munging and synchronizing data from one end of the enterprise to the other.