Cloud Dataflow - Batch Job
Step-01: Introduction
- Create a simple batch job
- Batch job: processes a bounded input and runs to completion
- Streaming job: runs continuously on unbounded input until it is stopped
- Prerequisite: create a Cloud Storage bucket
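The prerequisite bucket can also be created from the command line. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated against your project. `create_bucket` is a hypothetical helper name, and the bucket name and region are the ones used later in Step-02. Bucket names are globally unique, so substitute your own.

```shell
# Assumed values, taken from the console steps in this guide; replace with your own.
BUCKET="mybucket1071"
REGION="us-central1"

# Hypothetical helper: create the bucket that will hold the job's output and temp files.
create_bucket() {
  gcloud storage buckets create "gs://${BUCKET}" --location="$REGION"
}
```

Run `create_bucket` once before creating the job.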
Step-02: Create a Job from template
- Go to Dataflow -> Create job from template
- Job name: wordcount-batch-job
- Regional endpoint: us-central1
- Dataflow template: Word count
- Required Parameters
  - Input Files in Cloud Storage: gs://dataflow-samples/shakespeare/kinglear.txt
  - Output Cloud Storage file prefix: gs://mybucket1071/wordcounts/
  - Temporary location: gs://mybucket1071/wordcountstemp/
- Click on RUN JOB
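The same job can likely be launched without the console. The sketch below assumes the Google-provided Word Count template is published at `gs://dataflow-templates-<region>/latest/Word_Count` and that the console's "Temporary location" corresponds to the `--staging-location` flag. `run_wordcount_job` is a hypothetical helper name. The job name, region, and paths are the values from the steps above.

```shell
# Values copied from the console steps above; replace the bucket with your own.
JOB_NAME="wordcount-batch-job"
REGION="us-central1"
BUCKET="mybucket1071"

# Hypothetical helper: launch the Word Count template as a batch job.
run_wordcount_job() {
  gcloud dataflow jobs run "$JOB_NAME" \
    --gcs-location="gs://dataflow-templates-${REGION}/latest/Word_Count" \
    --region="$REGION" \
    --staging-location="gs://${BUCKET}/wordcountstemp/" \
    --parameters="inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://${BUCKET}/wordcounts/"
}
```

Calling `run_wordcount_job` submits the job and returns immediately. The job itself keeps running in Dataflow until the input has been fully processed.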
Step-03: Verify the following
- Go to Dataflow -> Jobs -> wordcount-batch-job
- Job Graph
- Execution Details
- Job Metrics
- Cost
- Logs
Step-04: Verify the output in Cloud Storage Bucket
- Go to Cloud Storage -> mybucket1071 -> wordcounts
- Review output file
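The output can also be inspected from a terminal. This is a sketch, assuming the output prefix from Step-02 (`gs://mybucket1071/wordcounts/`). `list_output` and `show_output` are hypothetical helper names, and the exact names of the output shards are not assumed, hence the wildcard.

```shell
# Bucket used for the output prefix in Step-02; replace with your own.
BUCKET="mybucket1071"

# Hypothetical helper: list the output shards written by the job.
list_output() {
  gcloud storage ls "gs://${BUCKET}/wordcounts/"
}

# Hypothetical helper: print the first 20 lines across all output shards.
show_output() {
  gcloud storage cat "gs://${BUCKET}/wordcounts/*" | head -n 20
}
```

Run `list_output` first to confirm the job wrote files, then `show_output` to preview the word counts.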
Step-05: gcloud: Dataflow commands
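The commands for this step are not listed in the source, so the following is only a sketch of commonly used `gcloud dataflow jobs` subcommands, assuming the region from Step-02. The helper names are hypothetical, and the job ID argument is a placeholder for the value shown in the list output.

```shell
# Region used when the job was created in Step-02.
REGION="us-central1"

# Hypothetical helper: list active and finished jobs in the region.
list_jobs() {
  gcloud dataflow jobs list --region="$REGION" --status=all
}

# Hypothetical helper: show details for one job; pass the job ID from the list output.
describe_job() {
  gcloud dataflow jobs describe "$1" --region="$REGION"
}

# Hypothetical helper: cancel a running job; pass the job ID from the list output.
cancel_job() {
  gcloud dataflow jobs cancel "$1" --region="$REGION"
}
```

A batch job normally finishes on its own, so `cancel_job` is mainly useful for streaming jobs or for stopping a misconfigured run early.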
Step-06: Review kinglear.txt file
# Open it in a browser
https://storage.googleapis.com/dataflow-samples/shakespeare/kinglear.txt
# Download kinglear.txt
gcloud storage cp gs://dataflow-samples/shakespeare/kinglear.txt .
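Once the file is downloaded, a rough local count gives something to compare the job output against. This is a sketch using standard Unix tools, not the template's own logic: `count_words` is a hypothetical helper, it treats every run of letters as a word and is case-sensitive, and the template may tokenize differently, so small differences in counts are possible.

```shell
# Hypothetical helper: print "word: count" lines for a text file, most frequent first.
count_words() {
  tr -cs '[:alpha:]' '\n' < "$1" |   # put each run of letters on its own line
    sed '/^$/d' |                     # drop the empty line left by leading non-letters
    sort | uniq -c | sort -rn |       # count identical words, highest count first
    awk '{ print $2 ": " $1 }'        # reorder "count word" into "word: count"
}

# Show the ten most frequent words if the file was downloaded above.
if [ -f kinglear.txt ]; then
  count_words kinglear.txt | head -n 10
fi
```

Comparing a few of these lines with the files under `wordcounts/` is a quick sanity check that the batch job processed the whole input.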