This notebook was prepared by Donne Martin. Source and license info is on GitHub.

HDFS

Run the hdfs command with no arguments to print its usage and available subcommands:

In [ ]:

!hdfs

Run the file system shell (FsShell); with no arguments it prints the supported file system commands:

In [ ]:

!hdfs dfs
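The same FsShell commands can also be driven from plain Python instead of the `!` shell escape. The following is a minimal sketch, assuming the `hdfs` executable is on the PATH; `hdfs_dfs_args` and `run_hdfs_dfs` are hypothetical helper names, not part of Hadoop.

```python
import shutil
import subprocess


def hdfs_dfs_args(*args):
    """Build the argument list for an `hdfs dfs` call (hypothetical helper)."""
    return ["hdfs", "dfs", *args]


def run_hdfs_dfs(*args):
    """Run `hdfs dfs <args>` and return the CompletedProcess.

    stdout and stderr are captured as text; the caller is expected to
    inspect `returncode` because no exception is raised on failure.
    """
    return subprocess.run(hdfs_dfs_args(*args), capture_output=True, text=True)


# Only call out to the cluster when the `hdfs` client is actually installed.
if shutil.which("hdfs") is not None:
    completed = run_hdfs_dfs("-ls", "/")
    print(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.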

List the user's home directory:

In [ ]:

!hdfs dfs -ls

List the HDFS root directory:

In [ ]:

!hdfs dfs -ls /
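When the listing needs to be processed rather than read, each entry can be split into fields. This is a rough sketch, assuming the usual eight-column `-ls` layout (permissions, replication, owner, group, size, date, time, path); `parse_ls_line` is a hypothetical helper and the sample line is made up for illustration.

```python
def parse_ls_line(line):
    """Split one `hdfs dfs -ls` entry into named fields.

    Returns None for lines that do not have eight columns, such as the
    "Found N items" header.
    """
    # maxsplit=7 keeps a path that contains spaces in one piece.
    parts = line.strip().split(None, 7)
    if len(parts) < 8:
        return None
    perms, replicas, owner, group, size, date, time, path = parts
    return {
        "is_dir": perms.startswith("d"),
        "permissions": perms,
        "replicas": replicas,  # "-" for directories
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }


# Made-up sample entry, not real cluster output.
sample = "-rw-r--r--   3 alice supergroup       1024 2015-01-01 12:00 /user/alice/file.txt"
print(parse_ls_line(sample))
```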

Copy a local file to the user's home directory on HDFS:

In [ ]:

!hdfs dfs -put file.txt file.txt

Display the contents of the specified HDFS file:

In [ ]:

!hdfs dfs -cat file.txt

Print the last 10 lines of the file to the terminal:

In [ ]:

!hdfs dfs -cat file.txt | tail -n 10
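Piping through tail streams the whole file to the client before the last lines are kept. FsShell also has a -tail option, but it prints the last kilobyte of the file rather than a fixed number of lines. The sketch below shows the same "keep the last n lines" idea in Python for any iterable of lines, for example the output of a subprocess; `last_lines` is a hypothetical helper and the sample data is made up.

```python
from collections import deque


def last_lines(lines, n=10):
    """Return the final n items of an iterable of lines.

    A bounded deque discards older lines as new ones arrive, so at most
    n lines are held in memory at a time.
    """
    return list(deque(lines, maxlen=n))


# Stand-in for the lines of an HDFS file (made-up sample data).
sample = (f"line {i}" for i in range(1, 26))
for line in last_lines(sample):
    print(line)  # prints "line 16" through "line 25"
```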

Concatenate the contents of all files in a directory and page through the output:

In [ ]:

!hdfs dfs -cat dir/* | less

Copy an HDFS file to local:

In [ ]:

!hdfs dfs -get file.txt file.txt

Create a directory on HDFS:

In [ ]:

!hdfs dfs -mkdir dir

Recursively delete the specified directory and all of its contents:

In [ ]:

!hdfs dfs -rm -r dir

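Specify an HDFS file in Spark. The fully qualified URI below is absolute; when HDFS is the default file system, a path given without a scheme or leading slash is resolved relative to the user's HDFS home directory: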

In [ ]:

data = sc.textFile("hdfs://hdfs-host:port/path/file.txt")
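To make the three path forms concrete, the sketch below mimics how a path string would be interpreted. It is an illustration only, not Hadoop's or Spark's actual resolution code: `resolve_hdfs_path` is a hypothetical helper, the default file system `hdfs://hdfs-host:8020` is an assumed placeholder value, and the home directory is assumed to follow the default `/user/<name>` layout.

```python
import posixpath
from urllib.parse import urlparse


def resolve_hdfs_path(path, user, default_fs="hdfs://hdfs-host:8020"):
    """Illustrate how a path string maps to a full HDFS URI.

    default_fs is a hypothetical placeholder, and /user/<user> is the
    assumed home directory layout.
    """
    if urlparse(path).scheme:
        # Fully qualified URI: used as given.
        return path
    if path.startswith("/"):
        # Absolute path: resolved against the default file system root.
        return default_fs + path
    # Relative path: resolved against the user's home directory.
    return default_fs + posixpath.join("/user", user, path)


for p in ("file.txt", "/data/file.txt", "hdfs://other-host:9000/path/file.txt"):
    print(p, "->", resolve_hdfs_path(p, user="alice"))
```

Using a fully qualified URI, as in the cell above, avoids any dependence on the default file system setting or on which user runs the job.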