Posts: Spark

PVC Operator; Creating Persistent Volume on Kubernetes made simple

At Banzai Cloud we work hard on our platform, Pipeline, built on Kubernetes. Recently we teamed up with Red Hat and CoreOS to work on Kubernetes Operators using the recently released Operator SDK, moving human operational knowledge into code, and we have already open sourced quite a few operators. This post dives deep into the PVC Operator. If you are looking for a complete guide on how to use the Operator SDK, or are just interested in Kubernetes Operators, please check our comprehensive guide.

Sandor Magyari

Mon, Apr 16, 2018

Collecting Spark History Server event logs in the cloud

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes.

Toader Sebastian

Fri, Apr 13, 2018

Apache Spark application resilience on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes; Collecting Spark History Server event logs in the cloud.

Janos Matyas

Thu, Mar 15, 2018

Monitoring Spark with Prometheus, reloaded

Monitoring series: Monitoring Apache Spark with Prometheus; Monitoring multiple federated clusters with Prometheus - the secure way; Application monitoring with Prometheus and Pipeline; Building a cloud cost management system on top of Prometheus; Monitoring Spark with Prometheus, reloaded.

At Banzai Cloud we deploy large distributed applications to Kubernetes and operate these clusters as well. We don’t like to get PagerDuty notifications during the night, so we try to get ahead of issues by operating these clusters as efficiently as we can.

Balint Molnar

Wed, Feb 21, 2018

Spark Streaming Checkpointing on Kubernetes

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streaming Checkpointing on Kubernetes; Deep dive into monitoring Spark and Zeppelin with Prometheus; Apache Spark application resilience on Kubernetes.

Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Laszlo Puskas

Wed, Feb 14, 2018

CI/CD flow for Zeppelin notebooks


Toader Sebastian

Thu, Feb 1, 2018

Spark scheduling on Kubernetes demystified


Sandor Magyari

Wed, Jan 24, 2018

Spark application logs - History Server setup on Kubernetes


Sandor Magyari

Mon, Jan 15, 2018

Amazon Elastic File System on Kubernetes

At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin and, most recently, TensorFlow, all running on our Pipeline PaaS (built on Kubernetes). One of Pipeline’s early adopters runs a TensorFlow Training Controller using GPUs on AWS EC2, wired into our CI/CD pipeline, and needed significant parallelization for reading training data. We have introduced support for Amazon Elastic File System and will make it publicly available in the forthcoming release of Pipeline.

Sandor Magyari

Mon, Jan 8, 2018

Running Zeppelin Spark notebooks on Kubernetes - deep dive


Toader Sebastian

Tue, Jan 2, 2018

The anatomy of Spark applications on Kubernetes


Janos Matyas

Wed, Dec 27, 2017

Top 3 blogs of 2017 and what’s next

As 2017 comes to an end, we are looking back at the three blog posts that were most popular with our readers. We can’t really look too far back (though we have had 13 posts and one release already), as we started our startup just a little over one month ago (November 20, 2017, to be more precise). During this short period we achieved quite a lot and laid the foundation for some exciting new projects we plan to ship early next year.

Miklos Csendes

Wed, Dec 20, 2017

Introduction to spotguides

Last week we released the first version of Pipeline - with end-to-end support for cloud native apps, starting from a GitHub commit hook and deployed into the cloud in minutes using a fully customizable CI/CD workflow. The core part of the Pipeline PaaS is spotguides - a collection of workflow/pipeline steps defined in a .pipeline.yml file and a few Drone plugins. In this post we would like to demystify spotguides and describe step by step how they work; the next post will be a tutorial on how to write a custom spotguide and an associated plugin.

Balint Molnar

Mon, Dec 18, 2017

Monitoring Apache Spark with Prometheus on Kubernetes


Laszlo Puskas

Thu, Dec 14, 2017

Apache Spark CI/CD workflow howto

Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Apache Spark CI/CD workflow howto; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified.

Apache Zeppelin on Kubernetes series: Running Zeppelin Spark notebooks on Kubernetes; Running Zeppelin Spark notebooks on Kubernetes - deep dive; CI/CD flow for Zeppelin notebooks.

Janos Matyas

Tue, Dec 12, 2017

Pipeline PaaS - the first release

Banzai Pipeline, or simply Pipeline, is a tabletop reef break located on Oahu’s North Shore in Hawaii. The most famous and infamous reef on the planet, it forms the benchmark by which all other waves are measured. Pipeline is a PaaS with a built-in CI/CD engine to deploy cloud native microservices in the public cloud and on-premise. It simplifies and abstracts all the details of provisioning the cloud infrastructure, installing or reusing the Kubernetes cluster and deploying the application.

Sandor Magyari

Tue, Dec 5, 2017

Running Zeppelin Spark notebooks on Kubernetes


Toader Sebastian

Fri, Dec 1, 2017

Scaling Spark made simple on Kubernetes


Janos Matyas

Mon, Nov 27, 2017

Introduction to Spark on Kubernetes

