Streaming Data Engineering Project

Overview

This project is a Scala application that simulates a data engineering pipeline for streaming data. It generates fake bank transaction data and processes it in real time with Apache Spark Structured Streaming. The application computes running metrics (the total transaction amount, the number of transactions, and the average transaction amount) and tracks the balance of a specific bank account. Results are printed to the terminal at regular intervals.

Features

Data Generation: Continuously generates fake bank transaction data with fields such as transaction ID, timestamp, account ID, amount, and transaction type.
Real-Time Processing: Uses Apache Spark Structured Streaming to process and analyze transaction data as it arrives.
Metrics Calculation:
- Total amount of transactions.
- Total number of transactions.
- Average transaction amount.
- Real-time tracking of a specific account's balance.
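
To make the four metrics concrete, here is a minimal sketch in plain Scala (no Spark) that computes them over an in-memory batch. The `Transaction` field names, the `"deposit"`/`"withdrawal"` type values, and the `MetricsSketch` object are assumptions for illustration; the repository's actual schema and code may differ.

```scala
// Hypothetical record shape based on the fields listed above.
final case class Transaction(
  transactionId: String,
  timestamp: Long,          // epoch milliseconds (assumed representation)
  accountId: String,
  amount: BigDecimal,
  transactionType: String   // assumed values: "deposit" or "withdrawal"
)

final case class Metrics(
  total: BigDecimal,
  count: Int,
  average: BigDecimal,
  balance: BigDecimal
)

object MetricsSketch {
  // Computes the four metrics for one batch of transactions.
  def compute(txs: Seq[Transaction], trackedAccount: String): Metrics = {
    val total = txs.map(_.amount).sum
    val count = txs.size
    // Guard against division by zero on an empty batch.
    val average = if (count == 0) BigDecimal(0) else total / count
    // Deposits add to the tracked account's balance; anything else subtracts.
    val balance = txs
      .filter(_.accountId == trackedAccount)
      .map(t => if (t.transactionType == "deposit") t.amount else -t.amount)
      .sum
    Metrics(total, count, average, balance)
  }
}
```

In the streaming application the same quantities would be maintained incrementally by Spark rather than recomputed from a full in-memory collection.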

Technologies Used

Scala: Programming language used for application development.
Apache Spark 3.5.2: Framework for large-scale data processing and streaming.
sbt: Build tool for Scala projects.
Play JSON: Library for JSON handling in Scala.
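
For orientation, a build definition that pulls in these technologies could look roughly like the sketch below. It is not the repository's actual build.sbt: the project name, the Scala patch version, and the Play JSON version are assumptions, and only the Spark version (3.5.2) comes from the list above.

```scala
// build.sbt (illustrative sketch, not the project's real file)
ThisBuild / scalaVersion := "2.12.18" // assumed; any Scala version supported by Spark 3.5.2 should work

lazy val root = (project in file("."))
  .settings(
    name := "spark-streaming-bank", // hypothetical project name
    libraryDependencies ++= Seq(
      // Structured Streaming ships as part of the spark-sql module.
      "org.apache.spark" %% "spark-sql" % "3.5.2",
      // Play JSON version is an assumption; check the real build.sbt.
      "com.typesafe.play" %% "play-json" % "2.9.4"
    )
  )
```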

Files

build.sbt: Project configuration and dependency management.
Main.scala: Entry point of the application.
TransactionGenerator.scala: Generates fake transaction data.
StreamingProcessor.scala: Processes streaming data using Spark.
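
As a rough idea of what a generator such as TransactionGenerator.scala might do, the sketch below produces an endless iterator of random transactions using only the Scala standard library. The names `FakeTransaction` and `GeneratorSketch`, the account ID format, and the amount range are all hypothetical; the real file may be structured differently and serializes with Play JSON, which is left out here to keep the example dependency-free.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical record type; field names are assumptions.
final case class FakeTransaction(
  id: String,
  timestamp: Long,
  accountId: String,
  amount: Double,
  kind: String
)

object GeneratorSketch {
  private val kinds = Vector("deposit", "withdrawal")

  // Builds one random transaction; `accounts` is the number of distinct account IDs.
  def next(rng: Random, accounts: Int = 5): FakeTransaction =
    FakeTransaction(
      id = new UUID(rng.nextLong(), rng.nextLong()).toString,
      timestamp = System.currentTimeMillis(),
      accountId = s"ACC-${rng.nextInt(accounts) + 1}",
      // Amount between 0.00 and 1000.00, rounded to two decimals.
      amount = math.round(rng.nextDouble() * 100000) / 100.0,
      kind = kinds(rng.nextInt(kinds.length))
    )

  // Lazily yields transactions forever; callers decide how many to take.
  def stream(rng: Random): Iterator[FakeTransaction] =
    Iterator.continually(next(rng))
}
```

For a quick look at the output, `GeneratorSketch.stream(new Random()).take(3).foreach(println)` prints three sample records. Passing the `Random` instance in explicitly makes it possible to seed the generator in tests.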

Prerequisites

Local Development

Java 8 or Java 11: Required for running Scala and Spark.
Scala 2.12+: Programming language.
sbt: Scala build tool.
Apache Spark 3.x: Data processing framework.

Installation

Setting Up Development Environment on Ubuntu

Update Package Index

sudo apt update

Install Java

sudo apt install openjdk-11-jdk -y

Install Scala

sudo apt install scala -y

Verify installation:

scala -version

Install sbt

echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 99E82A75642AC823
sudo apt update
sudo apt install sbt -y

Verify installation:

sbt sbtVersion

Install Apache Spark

Download Apache Spark 3.5.2 (the version this project targets) from the Apache archive and extract it:

wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
tar xvf spark-3.5.2-bin-hadoop3.tgz
sudo mv spark-3.5.2-bin-hadoop3 /opt/spark

Configure environment variables:

echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export PATH=\$PATH:\$SPARK_HOME/bin:\$SPARK_HOME/sbin" >> ~/.bashrc
source ~/.bashrc

Verify installation:

spark-shell --version

Running the Application

Clone the Repository

git clone https://github.com/Hisqkq/Spark-Streaming-Bank.git
cd Spark-Streaming-Bank

Compile the Application

sbt compile

Run the Application

sbt run

The application will start generating fake transactions and processing them in real time. Metrics are printed to the terminal every 10 seconds.
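
The 10-second reporting interval can be expressed in Structured Streaming with a processing-time trigger. The sketch below is one way to do it and is not taken from StreamingProcessor.scala: it substitutes Spark's built-in `rate` source and a random `amount` column for the project's generator, and the object name, column names, and rows-per-second setting are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, lit, rand, sum}
import org.apache.spark.sql.streaming.Trigger

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingSketch")
      .master("local[*]") // run locally on all cores
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // Stand-in input: the "rate" source emits (timestamp, value) rows;
    // a random amount column plays the role of the transaction amount.
    val transactions = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("amount", rand() * 1000)

    // Global running aggregates over everything seen so far.
    val metrics = transactions.agg(
      sum("amount").as("total_amount"),
      count(lit(1)).as("transaction_count"),
      avg("amount").as("average_amount")
    )

    // Complete mode re-emits the full aggregate on every trigger.
    val query = metrics.writeStream
      .outputMode("complete")
      .format("console")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()

    query.awaitTermination()
  }
}
```

The job runs until interrupted (Ctrl+C). Changing the string passed to `Trigger.ProcessingTime` adjusts how often a new result table is printed.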

Monitoring and Debugging

Spark Web UI: Access the Spark Web UI at http://localhost:4040 to monitor streaming queries, batch progress, and resource usage.
Logs: Check application logs for detailed error messages and stack traces to identify issues.
