
Dataproc Serverless - Batch Jobs

Step-01: Introduction

  • Create a Dataproc Serverless Batch Job

Step-02: Pre-requisite-1: Enable Private Google Access in VPC Subnet

  • Goto VPC Networks -> default -> SUBNETS -> PICK us-central1 region SUBNET
  • Edit Subnet
  • Private Google Access: ON
  • Click on SAVE
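The console steps above can also be done from the CLI. A minimal sketch, assuming the default network's subnet in us-central1 is named `default`:

```shell
# Enable Private Google Access on the us-central1 subnet of the default VPC.
gcloud compute networks subnets update default \
  --region=us-central1 \
  --enable-private-ip-google-access
```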

Step-03: Pre-requisite-2: Verify firewall rule default-allow-internal

  • Goto VPC Networks -> default -> FIREWALLS
  • Verify that the default-allow-internal rule is present
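The same check can be done from the CLI; this is a sketch that prints the rule's details if it exists:

```shell
# Describe the default-allow-internal firewall rule; errors if it does not exist.
gcloud compute firewall-rules describe default-allow-internal
```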

Step-04: Create Dataproc Serverless Batch Job

  • Goto Dataproc -> Serverless -> Batches -> Create

Batch Info

  • Batch ID: sort-words-101
  • Region: us-central1

Container

  • Batch Type: PySpark
  • Runtime version: use the latest 2.1 runtime (Spark 3.4, Java 17, Scala 2.13)
  • Main Python file: gs://mybucket1071/sort-words.py
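The tutorial does not show the contents of sort-words.py. A minimal sketch of what such a PySpark job might look like; the input and output paths below are assumptions, not from the tutorial:

```python
# sort-words.py: hypothetical PySpark job that reads lines of text
# and writes them back out in sorted order.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sort-words").getOrCreate()

# Input/output GCS paths are placeholders; adjust to your bucket layout.
words = spark.read.text("gs://mybucket1071/input/words.txt")
words.orderBy("value").write.mode("overwrite").text("gs://mybucket1071/output/sorted")

spark.stop()
```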

Network Configuration

  • Network option: select "Networks in this project"
  • Primary network: default
  • Subnetwork: default
  • Leave all other settings at their defaults
  • Click on SUBMIT
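The console submission above can be sketched as an equivalent gcloud command, using the same batch ID, region, runtime version, and subnet:

```shell
# Submit the PySpark batch job from the CLI (equivalent to the console steps above).
gcloud dataproc batches submit pyspark gs://mybucket1071/sort-words.py \
  --batch=sort-words-101 \
  --region=us-central1 \
  --version=2.1 \
  --subnet=default
```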

Step-05: Verify Job Logs

  • Goto Dataproc -> Serverless -> Batches -> sort-words-101
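The batch's state and details can also be inspected from the CLI. A sketch:

```shell
# Show the batch's state, runtime info, and output URI.
gcloud dataproc batches describe sort-words-101 --region=us-central1
```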

Step-06: Delete the Batch Job after completion

  • Goto Dataproc -> Serverless -> Batches -> sort-words-101 -> DELETE
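Deletion can likewise be done from the CLI:

```shell
# Delete the completed batch job.
gcloud dataproc batches delete sort-words-101 --region=us-central1
```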