Simulate Apache Spark Workloads Without a Cluster Using Fauxspark
Posted October 9, 2025

FauxSpark is a tool for simulating Apache Spark workloads without a cluster.

The current version includes:
- DAG scheduling with stages, tasks, and dependencies (though perhaps designing around RDDs would have been the right call)
- Modeling input, output, and shuffle partition sizes as probability distributions (see the sketch after this list)
- Automatic retries on executor or shuffle-fetch failures
- Single-job execution
- A simple CLI to tweak cluster configuration, simulate failures, and scale executors up
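
To make the moving parts concrete, here is a minimal, self-contained sketch of how such a discrete-event simulation could work: stages with dependencies, task durations driven by partition sizes sampled from a distribution, and bounded retries on simulated failures. Everything in it (the stage names, the log-normal distribution, the failure and retry parameters) is an illustrative assumption, not FauxSpark's actual model or code.

```python
# Sketch of a discrete-event simulation of a Spark-like DAG.
# All names and parameters below are assumptions for illustration,
# not FauxSpark's real implementation.
import heapq
import random

random.seed(42)

# A stage becomes runnable only after its parent stages finish.
STAGES = {
    "map":     {"parents": [],          "tasks": 8},
    "shuffle": {"parents": ["map"],     "tasks": 4},
    "reduce":  {"parents": ["shuffle"], "tasks": 2},
}

EXECUTORS = 4        # simulated executor slots (assumed)
FAILURE_RATE = 0.05  # probability a task attempt fails (assumed)
MAX_RETRIES = 3      # bounded automatic retries on failure (assumed)

def task_duration():
    # Partition size drawn from a log-normal distribution (assumed),
    # with duration proportional to partition size.
    size_mb = random.lognormvariate(4.0, 0.5)  # ~55 MB median
    return size_mb / 20.0                      # 20 MB/s processing rate

def run():
    clock, events = 0.0, []        # event heap keyed by completion time
    done = {s: 0 for s in STAGES}  # finished task count per stage
    launched = set()               # stages whose tasks were submitted
    pending = []                   # queued (stage, attempt) tasks
    free = EXECUTORS

    def ready(stage):
        return all(done[p] == STAGES[p]["tasks"]
                   for p in STAGES[stage]["parents"])

    while len(launched) < len(STAGES) or events or pending:
        # Submit tasks for any stage whose parents have all finished.
        for s in STAGES:
            if s not in launched and ready(s):
                pending += [(s, 0)] * STAGES[s]["tasks"]
                launched.add(s)
        # Assign queued tasks to free executor slots.
        while free and pending:
            stage, attempt = pending.pop()
            free -= 1
            heapq.heappush(events, (clock + task_duration(), stage, attempt))
        # Jump the clock to the next task completion.
        clock, stage, attempt = heapq.heappop(events)
        free += 1
        if random.random() < FAILURE_RATE and attempt < MAX_RETRIES:
            pending.append((stage, attempt + 1))  # retry the failed attempt
        else:
            done[stage] += 1  # (after MAX_RETRIES, count it as done anyway)
    print(f"simulated job finished at t={clock:.1f}s")

run()
```

The heap-ordered event queue is what makes this discrete-event rather than tick-based: the clock jumps straight to the next task completion, so simulating hours of cluster time costs only as many steps as there are task attempts.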
This tool might be relevant to the following folks:
- Data & Infrastructure engineers running Apache Spark who want to experiment with cluster configurations
- Anyone curious about Spark internals
I'd appreciate feedback from anyone with experience in discrete-event simulation, particularly on the planned features, as well as from anyone who might find this useful and wants to help shape its development.
A walkthrough section in the README demonstrates how it can be used.
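
As a rough illustration of the kind of knobs the CLI bullet above implies, the snippet below sketches a hypothetical configuration. None of these keys come from FauxSpark; they only stand in for the features the list describes (executor count, failure injection, partition-size distributions), and the README walkthrough documents the real interface.

```python
# Hypothetical simulation knobs; illustrative assumptions only,
# not FauxSpark's actual configuration schema.
cluster_config = {
    "executors": 8,                 # number of simulated executors
    "cores_per_executor": 4,
    "executor_failure_rate": 0.02,  # inject random executor failures
    "shuffle_fetch_failure_rate": 0.01,
    # Partition sizes as probability distributions, not fixed values.
    "input_partition_mb": {"dist": "lognormal", "mu": 4.0, "sigma": 0.5},
    "shuffle_partition_mb": {"dist": "uniform", "low": 16, "high": 128},
}
```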
GitHub repo: https://github.com/fhalde/fauxspark