Hive tables can be created as EXTERNAL or INTERNAL. This is a choice that affects how data is loaded, controlled, and managed.
Use EXTERNAL tables when:
Use INTERNAL tables when:
Use EXTERNAL tables when:
- The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
- Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
- You want to use a custom location such as ASV.
- Hive should not own data and control settings, dirs, etc., you have another program or process that will do those things.
- You are not creating table based on existing table (AS SELECT).
Use INTERNAL tables when:
- The data is temporary.
- You want Hive to completely manage the lifecycle of the table and data.