
This notebook was prepared by Donne Martin. Source and license info is on GitHub.

HDFS

Run the hdfs command with no arguments to print its usage:

In [ ]:
!hdfs

Run a file system shell (FsShell) command on the file systems supported in Hadoop:

In [ ]:
!hdfs dfs
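Outside a notebook the `!` shell escape is not available. The following is a minimal sketch of issuing the same FsShell commands from plain Python, assuming the `hdfs` executable is on the PATH. The helper names `build_dfs_command` and `run_dfs` are hypothetical and are not part of Hadoop.

```python
import subprocess


def build_dfs_command(*args):
    """Return the argument list for `hdfs dfs <args...>` (hypothetical helper)."""
    return ["hdfs", "dfs"] + [str(a) for a in args]


def run_dfs(*args):
    """Run `hdfs dfs <args...>` and return its standard output as text.

    Assumes the `hdfs` executable is on the PATH. Raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    output = subprocess.check_output(build_dfs_command(*args))
    return output.decode("utf-8")
```

With these helpers, `run_dfs("-ls", "/")` would mirror `!hdfs dfs -ls /`. Passing arguments as a list avoids shell quoting problems with paths that contain spaces.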

List the user's home directory:

In [ ]:
!hdfs dfs -ls

List the HDFS root directory:

In [ ]:
!hdfs dfs -ls /
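The listing is plain text with eight whitespace-separated columns: permissions, replication, owner, group, size, date, time, and path. As a rough sketch, it can be parsed into records like this. The `HdfsEntry` type, the `parse_ls` helper, and the sample text are illustrative assumptions, not output from a real cluster.

```python
from collections import namedtuple

HdfsEntry = namedtuple(
    "HdfsEntry", "permissions replication owner group size date time path"
)


def parse_ls(output):
    """Parse the text printed by `hdfs dfs -ls` into HdfsEntry records.

    All fields are kept as strings; convert `size` with int() if needed.
    """
    entries = []
    for line in output.splitlines():
        fields = line.split(None, 7)  # the path is last and may contain spaces
        # Skip the "Found N items" header and anything that is not an entry.
        if len(fields) != 8 or fields[0][0] not in "-dl":
            continue
        entries.append(HdfsEntry(*fields))
    return entries


# Made-up sample in the usual `-ls` layout, not output from a real cluster.
sample = """Found 2 items
drwxr-xr-x   - hdfs supergroup          0 2015-01-01 12:00 /tmp
-rw-r--r--   3 alice supergroup       1024 2015-01-02 08:30 /user/alice/file.txt
"""

for entry in parse_ls(sample):
    print(entry.path, entry.owner, entry.size)
```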

Copy a local file to the user's home directory on HDFS:

In [ ]:
!hdfs dfs -put file.txt file.txt

Display the contents of the specified HDFS file:

In [ ]:
!hdfs dfs -cat file.txt

Print the last 10 lines of the file to the terminal:

In [ ]:
!hdfs dfs -cat file.txt | tail -n 10
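The pipe above streams the whole file through `tail`. FsShell also has a `-tail` option, but it prints the last kilobyte of the file rather than a fixed number of lines. The sketch below does the line-based version in Python and keeps only the final lines in memory. It assumes the `hdfs` executable is on the PATH, and `last_lines` and `hdfs_tail` are hypothetical helper names.

```python
import collections
import subprocess


def last_lines(lines, n=10):
    """Return the final n items of an iterable without storing all of it."""
    return list(collections.deque(lines, maxlen=n))


def hdfs_tail(path, n=10):
    """Return the last n lines of an HDFS file by streaming `hdfs dfs -cat`.

    Assumes the `hdfs` executable is on the PATH and the file is UTF-8 text.
    """
    proc = subprocess.Popen(["hdfs", "dfs", "-cat", path], stdout=subprocess.PIPE)
    try:
        return [line.decode("utf-8") for line in last_lines(proc.stdout, n)]
    finally:
        proc.stdout.close()
        proc.wait()
```

A bounded `deque` discards older lines as new ones arrive, so memory use depends on `n` rather than on the size of the file.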

View the concatenated contents of all files in a directory:

In [ ]:
!hdfs dfs -cat dir/* | less

Copy an HDFS file to local:

In [ ]:
!hdfs dfs -get file.txt file.txt

Create a directory on HDFS:

In [ ]:
!hdfs dfs -mkdir dir

Recursively delete the specified directory and all of its contents:

In [ ]:
!hdfs dfs -rm -r dir

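Specify an HDFS file in Spark. A full hdfs:// URI such as the one below is absolute; a path given without a scheme or leading slash is resolved relative to the user's HDFS home directory when HDFS is the default file system: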

In [ ]:
data = sc.textFile("hdfs://hdfs-host:port/path/file.txt")
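To illustrate how a relative path maps to a full URI, the sketch below mimics the resolution rule in plain Python. It assumes the default home directory layout `/user/<username>`. The function name `to_hdfs_uri`, the host `hdfs-host`, and the port `8020` are hypothetical placeholders, so substitute the NameNode address of the actual cluster.

```python
import posixpath


def to_hdfs_uri(path, user, host="hdfs-host", port=8020, home_prefix="/user"):
    """Build a full hdfs:// URI, resolving relative paths against the home directory.

    Assumptions: the home directory is <home_prefix>/<user> (the Hadoop default
    prefix is /user), and host/port are placeholders for the NameNode address.
    """
    if not path.startswith("/"):
        # Relative path: resolve it against the user's HDFS home directory.
        path = posixpath.join(home_prefix, user, path)
    return "hdfs://{}:{}{}".format(host, port, posixpath.normpath(path))


print(to_hdfs_uri("file.txt", "alice"))
print(to_hdfs_uri("/data/file.txt", "alice"))
```

Given a live SparkContext `sc`, the result could be passed straight to `sc.textFile(...)`.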
