SIM-PIPE generates and simulates a deployment configuration that conforms to the hardware requirements and includes any middleware code needed for inter-step communication. Finally, the tool provides pipeline testing functionality, including a sandbox for evaluating the performance of individual pipeline steps and a simulator for determining the performance of the overall Big Data pipeline. Specifically, SIM-PIPE provides the following high-level features:
- Deploying each step of a pipeline and running it in a sandbox by providing sample input
- Evaluating pipeline step performance by recording and analysing metrics about its execution in order to identify bottlenecks and steps to be optimized
- Identifying the resource requirements of a pipeline by calculating step performance per resource used
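
To make "step performance per resource used" concrete, the sketch below divides a step's throughput by the resources it consumed during a sandbox run. The `StepMetrics` fields and the `performance_per_resource` helper are hypothetical names chosen for this illustration; they are not part of the SIM-PIPE API, and the metrics SIM-PIPE actually records may differ.

```python
from dataclasses import dataclass


@dataclass
class StepMetrics:
    """Hypothetical metrics for one sandboxed run of a pipeline step."""
    input_bytes: int        # size of the sample input
    duration_s: float       # wall-clock run time in seconds
    cpu_seconds: float      # CPU time consumed by the step
    max_memory_bytes: int   # peak memory usage


def performance_per_resource(m: StepMetrics) -> dict:
    """Return throughput, and throughput normalised by CPU time and memory."""
    if m.duration_s <= 0 or m.cpu_seconds <= 0 or m.max_memory_bytes <= 0:
        raise ValueError("duration, CPU time and memory must be positive")
    throughput = m.input_bytes / m.duration_s  # bytes per second
    return {
        "throughput_bytes_per_s": throughput,
        "bytes_per_cpu_second": m.input_bytes / m.cpu_seconds,
        "throughput_per_gib": throughput / (m.max_memory_bytes / 2**30),
    }
```

Comparing these ratios across steps is one way to spot a bottleneck: a step with low throughput per CPU-second is a candidate for optimisation or for a larger resource allocation.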
Supported by the scripts: Debian/Ubuntu (k3s) and macOS (Colima). For other distros, use the manual Helm flow in the Kubernetes section.
python3 install.py
python3 start.py

What these do:
- install.py: installs prerequisites (Ansible, Docker, k3s on Linux; brew tools on macOS), installs/updates the Helm chart with charts/simpipe/values.yaml, and ensures required secrets using your KUBECONFIG (falls back to ~/.kube/config or /etc/rancher/k3s/k3s.yaml when they exist).
- start.py: re-runs installer checks, ensures secrets, and waits for the cluster. On macOS it also starts Colima+Kubernetes; on Linux it expects your cluster reachable via KUBECONFIG (no permissions are changed).
Important:
- Set KUBECONFIG to point to your cluster. The scripts will fall back to ~/.kube/config or /etc/rancher/k3s/k3s.yaml if they exist, but they no longer change kubeconfig permissions or add you to the docker group.
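
The fallback behaviour described above can be sketched as follows. This is an illustration only, not the code in install.py or start.py: the `resolve_kubeconfig` name, the order in which the two fallback paths are tried, and the choice to return an explicit KUBECONFIG without checking that it exists are all assumptions.

```python
import os
from typing import Callable, Mapping, Optional

# Fallback locations named in the text, tried in the order they are listed
# (the order is an assumption).
FALLBACK_PATHS = ("~/.kube/config", "/etc/rancher/k3s/k3s.yaml")


def resolve_kubeconfig(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the kubeconfig path to use, or None if nothing is found.

    Assumption: KUBECONFIG holds a single path. kubectl also accepts a
    list of paths, which this sketch does not handle.
    """
    explicit = env.get("KUBECONFIG")
    if explicit:
        return explicit
    for candidate in FALLBACK_PATHS:
        path = os.path.expanduser(candidate)
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(resolve_kubeconfig() or "No kubeconfig found; set KUBECONFIG")
```

Passing `env` and `exists` as parameters keeps the function easy to test without touching the real environment or filesystem.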
Port-forward helper (optional, local dev):
python3 forwarding.py

After starting SIM-PIPE, and while the forwarding.py script is running, browse to http://localhost:8088/ to access the SIM-PIPE GUI.
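
If the GUI does not load, a quick check that something is listening on the forwarded port can help narrow the problem down. The helper below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical name and is not part of forwarding.py.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8088, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Port 8088 is reachable; open http://localhost:8088/")
    else:
        print("Nothing is listening on localhost:8088; is forwarding.py running?")
```

A successful connection only shows that the port is open; it does not confirm that the SIM-PIPE GUI itself is healthy.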
Build the hello-world software container image locally:
# Example using Docker
docker buildx build -t hello-world examples/hello-world
# or (if the previous command fails)
docker-buildx build -t hello-world examples/hello-world

Run the hello-world pipeline:
argo submit --watch examples/hello-world.yaml

Check the logs of the hello-world pipeline:
argo logs @latest

On macOS, brew must be installed first. python3 start.py will install tools via brew, start Colima with Kubernetes (simpipe profile), and then install/upgrade the chart using the bundled values and secrets helper.
On Linux, python3 install.py installs prerequisites (Ansible, Docker, k3s, helm, argo CLI), ensures SIM-PIPE secrets, and installs/updates the Helm chart using your KUBECONFIG (falling back to ~/.kube/config or /etc/rancher/k3s/k3s.yaml if present). python3 start.py re-runs checks and waits for pods but does not start k3s for you; start it manually if it is down.
If you prefer manual Ansible steps:
sudo ansible-galaxy install -r ./ansible/requirements.yaml --roles-path ./ansible/roles
sudo ansible-playbook -i localhost, -c local -b -K ./ansible/install-everything.yaml
sudo ansible-playbook -i localhost, -c local -b -K ./ansible/install-simpipe.yaml

You can install SIM-PIPE on any Kubernetes cluster using the Helm chart in the charts/simpipe folder, or a released Helm chart from the OCI registry at oci://ghcr.io/datacloud-project/sim-pipe.
Please note that it is recommended to use a clean Kubernetes cluster for the installation.
SIM-PIPE has been developed and tested on Kubernetes 1.27 with the K3s distribution. The default configuration
uses the default namespace and has opinionated settings for Argo Workflows and the various secrets and role bindings.
You may want to change the configuration of the Helm chart to match your needs. If installing manually, create the required secrets first (see secrets_manager.py), then apply the chart with the bundled values file:
# Using the latest release
helm install simpipe oci://ghcr.io/datacloud-project/sim-pipe -f charts/simpipe/values.yaml
# or using the local folder
helm install simpipe ./charts/simpipe -f charts/simpipe/values.yaml

Chart defaults to note (edit in charts/simpipe/values.yaml):
- Controller runs on hostNetwork, privileged, port 9000; Argo endpoint http://simpipe-argo-workflows-server:2746/.
- MinIO enabled with default creds simpipe/simpipe1234, buckets artifacts, logs, registry.
- Carbontracker, cadvisor bridge, and Argo Workflows enabled by default. Change or secure before exposing externally.
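
Because the default MinIO credentials are published in the chart, one step towards securing a deployment is to replace them with a random password. The sketch below generates one with the Python standard library and checks whether a pair still matches the defaults quoted above. The helper names are hypothetical, and the values.yaml keys that hold these credentials are not shown here because they are not stated in this document; check charts/simpipe/values.yaml for where to set them.

```python
import secrets

# Default credentials quoted in the chart notes above.
DEFAULT_CREDS = ("simpipe", "simpipe1234")


def new_minio_credentials(user: str = "simpipe", nbytes: int = 24) -> tuple:
    """Return (user, password) with a random URL-safe password."""
    return user, secrets.token_urlsafe(nbytes)


def uses_default_credentials(user: str, password: str) -> bool:
    """Return True if the pair matches the published chart defaults."""
    return (user, password) == DEFAULT_CREDS


if __name__ == "__main__":
    user, password = new_minio_credentials()
    print(f"user={user} password={password}")
```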
SIM-PIPE runs on any machine that runs Linux. If you are using Windows, you can install SIM-PIPE using version 2 of the Windows Subsystem for Linux (WSL2). Select a Debian-based Linux distribution and proceed as normal.
You may have to follow the instructions in microsoft/WSL#6655 (comment) to make Docker work. The installation script attempts to fix it for you.
Please consult the ARCHITECTURE.md document for more details on the SIM-PIPE architecture.
SIM-PIPE is designed to only allow trusted users to deploy pipelines.
DO NOT expose the SIM-PIPE API to the public Internet without authenticating and authorising your users.
The default installation of SIM-PIPE IS NOT secure. You need to configure the authentication and authorisation mechanisms yourself.
In practice, SIM-PIPE is best run on your local machine. When port forwarding, make sure you do not expose the SIM-PIPE API to an untrusted network. The defaults are set to localhost only.
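
If you change the port-forwarding setup, a small guard like the one below can confirm that a bind address stays local to your machine. This is a sketch under stated assumptions: `is_loopback_only` is a hypothetical helper, not part of forwarding.py, and it deliberately treats any hostname other than localhost as unsafe because it does not resolve names.

```python
import ipaddress


def is_loopback_only(bind_address: str) -> bool:
    """Return True if bind_address is reachable from the local machine only."""
    if bind_address.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_address).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat it as unsafe,
        # since this sketch does not resolve names.
        return False
```

For example, 127.0.0.1 and ::1 pass the check, while 0.0.0.0 (all interfaces) does not.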
Before raising a pull request, please read our contributing guide.
SIM-PIPE is released as open source software under the Apache License 2.0.
