How to optimize loading 150GB data into hive table?

I have a 150 GB file in a Hive stage table that uses the following table properties:

    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
       "separatorChar" = "|",
       "quoteChar"     = "'",
       "escapeChar"    = "\\"
    )  
    STORED AS TEXTFILE;

When I load this data into my main table, the job fails with a Java heap error after running for about an hour. The main table is partitioned, and the data contains about 12,000 partitions. To load the data I am using a simple HQL statement:

    -- dynamic-partition insert; the partition column (date) must be the last column selected
    insert overwrite table mainTable partition(date)
    select * from stageTable;

I have also tried increasing the map memory to 15 GB, but it still fails. Is there any way to optimize this? Any solution that uses Spark or Hive would work.
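For context, a direction often suggested for dynamic-partition loads of this shape is sketched below. It is an assumption, not a confirmed fix for this job. The idea is to route all rows for a given partition value to the same reducer, so that each task keeps only a few partition writers open instead of potentially thousands. The table names `mainTable` and `stageTable` are stand-ins for the real tables, and the limits of 20000 are illustrative values chosen only to exceed the roughly 12,000 partitions mentioned above.

```sql
-- Sketch only: table names and limit values are assumptions, not taken from the real job.

-- Allow fully dynamic partitioning (no static partition value supplied).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Raise the caps on how many dynamic partitions one job / one task may create.
SET hive.exec.max.dynamic.partitions=20000;
SET hive.exec.max.dynamic.partitions.pernode=20000;

-- Sort rows by partition key so each writer handles one partition at a time.
SET hive.optimize.sort.dynamic.partition=true;

-- DISTRIBUTE BY sends every row with the same date to the same reducer.
-- The partition column must be the last column produced by the SELECT.
INSERT OVERWRITE TABLE mainTable PARTITION (`date`)
SELECT * FROM stageTable
DISTRIBUTE BY `date`;
```

Whether this helps depends on where the heap error actually occurs (map side, reduce side, or the client), which the error output above does not show.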

hadoop apache-spark
