Types of tables in Apache Hive

Overview

  • Get an overview of Internal and External table.
  • Key differences between Internal and External Table.

Table of Contents

  1. Managed Tables.
  2. External Tables.
  3. Managed vs External Table.
  4. Identify the Type of Table.

Managed Tables

In a managed table both the table data and the table schema are managed by Hive. The data will be located in a folder named after the table within the Hive data warehouse, which is essentially just a file location in HDFS.

The location is user configurable when Hive is installed. By managed or controlled we mean that if you drop (delete) a managed table, then Hive will delete both the Schema (the description of the table) and the data files associated with the table. Default location is /user/hive/warehouse).

Create Table

CREATE TABLE IF NOT EXISTS stocks (exchange STRING,

symbol STRING,

price_open FLOAT,

price_high FLOAT,

price_low FLOAT,

price_adj_close FLOAT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ ;

As for managed tables, you can also copy the schema (but not the data) of an existing table:

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3

LIKE mydb.employees

LOCATION ‘/path/to/data’;

External Tables

An external table is one where only the table schema is controlled by Hive. In most cases the user will set up the folder location within HDFS and copy the data file(s) there. This location is included as part of the table definition statement.

When an external table is deleted, Hive will only delete the schema associated with the table. The data files are not affected.

Create External Table

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (exchange STRING,

symbol STRING,

price_open FLOAT,

price_high FLOAT,

price_low FLOAT,

price_adj_close FLOAT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’

LOCATION ‘/data/stocks’;

Managed vs External Table

Internal Table External Table
Hive assumes that it owns the data for managed tables. For external tables Hive assumes that it does not manage the data.
If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. Dropping the table does not delete the data, although the metadata for the table will be deleted.
For Internal tables, Hive stores data into its warehouse directory For External Tables, Hive stores the data in the LOCATION specified during creation of the table(generally not in warehouse directory)
Internal table provides ACID/transnational actions support. External Table does not provide ACID/transactional actions support.
Statements: ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE supported
Not supported.
Query Result Caching supported(saves the results of an executed Hive query for reuse ) Not Supported

Identify the Type of Table

You can tell whether or not a table is managed or external using the output of DESCRIBE EXTENDED tablename .

Near the end of the Detailed Table Information output, you will see the following for managed tables:

… tableType: MANAGED_TABLE)

For external tables, you will see the following:

… tableType: EXTERNAL_TABLE)

Note: If you omit the EXTERNAL keyword and the original table is external, the new table will also be external. If you omit EXTERNAL and the original table is managed, the new table will also be managed. However, if you include the EXTERNAL keyword and the original table is managed, the new table will be external. Even in this scenario, the LOCATION clause will still be optional.

You can also read this article on our Mobile APP Get it on Google Play

Related Articles

Author: admin

Leave a Reply

Your email address will not be published.