Introduction
Parquet (http://parquet.io/) is an ecosystem-wide columnar format for Hadoop. Read "Dremel made simple with Parquet" for a good introduction to the format; the Parquet project has an in-depth description of the format, including motivations and diagrams. At the time of this writing, Parquet supports the following engines and data description languages:
Engines
- Apache Hive
- Apache Drill
- Cloudera Impala
- Apache Crunch
- Apache Pig
- Cascading
- Apache Spark
Data description
- Apache Avro
- Apache Thrift
- Google Protocol Buffers
For the latest information on Parquet engine and data description support, please visit the Parquet-MR project's feature matrix.
Version
Hive 0.10, 0.11, and 0.12
To use Parquet with Hive 0.10 through 0.12, you must download the Parquet Hive package from the Parquet project. The artifact you need is the parquet-hive-bundle jar, which is available in Maven Central.
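One way to make the downloaded bundle available is to register it in the Hive session before creating or querying Parquet tables. This is a minimal sketch, not the only option; the path and the version placeholder are hypothetical and must be replaced with the location and version of the jar you actually downloaded.

```sql
-- Hypothetical path and version placeholder: point this at the downloaded jar.
ADD JAR /path/to/parquet-hive-bundle-<version>.jar;
```

ADD JAR applies only to the current session, so it has to be repeated in each new session unless the jar is configured on the Hive classpath some other way.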
Hive 0.13
Native Parquet support was added to Hive 0.13 via HIVE-5783. Please note that not all Parquet data types are supported yet. Support for the remaining data types is being added through HIVE-6384.
Hive QL Syntax
A CREATE TABLE statement can specify the Parquet storage format with syntax that depends on the Hive version.
Hive 0.10 - 0.12
CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
Hive 0.13 and later
CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
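Once the table exists, it can be populated and queried like any other Hive table. The sketch below assumes a pre-existing table named staging_text (a hypothetical name, not part of this page) whose columns have the same types as those of parquet_test; the partition value 'example' is likewise illustrative.

```sql
-- staging_text is a hypothetical source table with matching column types.
INSERT OVERWRITE TABLE parquet_test PARTITION (part = 'example')
SELECT id, str, mp, lst, strct
FROM staging_text;

-- Read back a couple of columns from the partition that was just written.
SELECT id, str
FROM parquet_test
WHERE part = 'example';
```

Hive writes the rows through the Parquet output format chosen in the CREATE TABLE statement, so no Parquet-specific syntax is needed in the INSERT or SELECT.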
Limitations
- Binary and date support are pending (HIVE-6384).
- Timestamp, decimal, char and varchar are not supported in Hive releases 0.10.0 through 0.13.1, but Hive 0.14.0 adds support for timestamp (HIVE-6394), decimal (HIVE-6367), char and varchar (HIVE-7735).
- Column rename is supported with use of the flag parquet.column.index.access, starting with release 0.14.0 (HIVE-6938).
- CREATE TABLE AS SELECT (CTAS) is supported starting with release 0.13.0 (HIVE-6375).
- Before Hive 0.14.0, Parquet column names were case sensitive (a query had to use column names whose case exactly matched the metastore); they are case insensitive as of Hive 0.14.0 (HIVE-7554).
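The column rename and CTAS items above could look roughly like the following in practice. This is a sketch under stated assumptions: it assumes the parquet.column.index.access flag can be enabled as a session property with SET, and the names label and parquet_copy are hypothetical, chosen only for illustration.

```sql
-- Assumption: the flag is enabled per session (requires Hive 0.14.0 or later).
SET parquet.column.index.access=true;

-- Rename the column str to label (label is a hypothetical new name).
ALTER TABLE parquet_test CHANGE str label string;

-- CTAS into a new Parquet table (Hive 0.13.0 or later); parquet_copy is hypothetical.
CREATE TABLE parquet_copy
STORED AS PARQUET
AS SELECT id, label FROM parquet_test;
```

Check HIVE-6938 for how the flag is meant to be configured in your release before relying on the rename step.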

