Faster, fail-safe insights with a Yellowfin cluster


In Yellowfin 8, we have enhanced how our clustering infrastructure functions.

If you already run Yellowfin in a clustered environment, you may know how advantageous this can be.

Why should you have a clustered environment?

A cluster is a group of interconnected nodes that together run an application. This enables parallel processing, because several types of tasks, such as web requests and background tasks, can be executed simultaneously across the nodes.

As opposed to a single instance, where one node is solely responsible for running the entire application, a cluster allows for load balancing and offers high availability. Load balancing splits tasks between nodes, so the application processes work faster. High availability means that even if a node goes offline, another node is always available to take over its tasks, keeping your business environment stable. So if you use something advanced, such as Yellowfin Signals, which continuously performs background analysis to deliver instant insights, it’s best to deploy Yellowfin in a cluster for faster results and fewer failures.

Yellowfin clustering basics


In a Yellowfin cluster, there are a few main concepts to consider: Application Level Messaging, Load Balancing, and Background Processing. We’ll explain the purpose of each, and how they should be implemented.

Application Level Messaging keeps the cluster operating properly by allowing the nodes to communicate with each other. Every time the infrastructure changes, such as a node going offline or a new one joining the cluster, every node is informed of the update. The same holds for application tasks: if a report is flushed from one node’s cache, it must be flushed from all nodes, and if you update your license on one node, it is updated on all the others. Application Level Messaging is responsible for keeping data consistent and up to date across the entire cluster.
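To make the broadcast idea concrete, here is a minimal sketch in Python. It is not Yellowfin's implementation: the `Node` and `Cluster` classes and the message names (`flush_report`, `update_license`, `member_joined`, `member_left`) are hypothetical, and the network transport is replaced by direct method calls. What it shows is the principle described above: one message, delivered to every live node, keeps caches, license and membership consistent everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with a local report cache, license and membership view."""

    name: str
    report_cache: dict = field(default_factory=dict)
    license_key: str = ""
    members: set = field(default_factory=set)

    def handle(self, message: dict) -> None:
        # Apply one cluster message to this node's local state.
        kind = message["type"]
        if kind == "flush_report":
            self.report_cache.pop(message["report_id"], None)
        elif kind == "update_license":
            self.license_key = message["license_key"]
        elif kind == "member_joined":
            self.members.add(message["node"])
        elif kind == "member_left":
            self.members.discard(message["node"])


class Cluster:
    """Hypothetical messaging layer: every message is delivered to every live node."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def broadcast(self, message: dict) -> None:
        # Deliver to all nodes, including the sender, so no node is left stale.
        for node in self.nodes.values():
            node.handle(message)

    def join(self, node: Node) -> None:
        self.nodes[node.name] = node
        node.members = set(self.nodes)  # the newcomer learns the current membership
        self.broadcast({"type": "member_joined", "node": node.name})

    def leave(self, name: str) -> None:
        self.nodes.pop(name, None)
        self.broadcast({"type": "member_left", "node": name})


# Usage: a report cached on every node is flushed everywhere with one message.
cluster = Cluster()
for name in ("node-a", "node-b", "node-c"):
    cluster.join(Node(name))
    cluster.nodes[name].report_cache["sales-report"] = "cached rows"
cluster.broadcast({"type": "flush_report", "report_id": "sales-report"})
```

In a real cluster the delivery step is where the clustering modes differ (web services, UDP or TCP); the per-node handling logic stays the same.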

How this messaging works depends on the clustering mode in use. Our clustering infrastructure supports three modes, which differ in how the nodes communicate with one another:

  • Legacy: The original Yellowfin cluster setup, which uses web services to communicate between the nodes. Now that the faster Dynamic and Repository modes are available, this mode is no longer recommended.
  • Dynamic: This mode uses the JGroups library to let nodes communicate over the UDP protocol. It may be restricted in environments that block UDP traffic, such as AWS.
  • Repository: Our recommended mode. It is similar to Dynamic mode, but uses the industry-standard TCP protocol (the TCP of TCP/IP fame) instead of UDP.

The load balancing act

As the name suggests, the purpose of a load balancer is to distribute the load across the group of nodes it is connected to. These nodes are web-facing: they are the only ones that process web requests initiated by users, as well as web service calls. You can think of web requests as any UI actions the user performs, such as dragging a field into a report, along with any calls to a web service. Besides these requests, the application also runs background tasks, but more on that later.

Of the two ways you can configure the load balancer to allocate requests, we recommend Sticky Sessions. This means that all traffic from one user’s session is handled by the same node until the session ends. This is ideal, because only the assigned node needs to create and maintain the user’s session object required to execute the requests. The risk is that if the assigned node fails, the session terminates, which could result in the loss of unsaved work.
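The routing behaviour behind Sticky Sessions can be sketched in a few lines of Python. This is an illustrative model, not the configuration of any particular load balancer: the `StickyLoadBalancer` class is hypothetical, and picking the least-busy node for a new session is an assumed policy (real balancers typically use cookies or hashing). It shows both the benefit (one node owns a session) and the risk (the session disappears with its node).

```python
class StickyLoadBalancer:
    """Hypothetical load balancer that pins every session to a single node."""

    def __init__(self, nodes: list[str]) -> None:
        self.live_nodes = list(nodes)
        self.assignments: dict[str, str] = {}  # session id -> node name

    def _load(self, node: str) -> int:
        # Number of sessions currently pinned to this node.
        return sum(1 for n in self.assignments.values() if n == node)

    def route(self, session_id: str) -> str:
        node = self.assignments.get(session_id)
        if node is None:
            # First request of a new session: choose the least-busy live node
            # and remember the choice for the rest of the session.
            node = min(self.live_nodes, key=self._load)
            self.assignments[session_id] = node
        return node

    def node_failed(self, node: str) -> list[str]:
        """Drop a failed node and return the sessions that died with it."""
        self.live_nodes.remove(node)
        lost = [s for s, n in self.assignments.items() if n == node]
        for s in lost:
            # The session object existed only on the failed node, so any unsaved
            # state is gone; the user's next request starts a new session elsewhere.
            del self.assignments[s]
        return lost


lb = StickyLoadBalancer(["node-a", "node-b"])
home = lb.route("session-1")  # later requests for session-1 go to the same node
lost = lb.node_failed(home)   # session-1 is lost along with its node
```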

The alternative is Container Level Session Replication. With this implementation, the load balancer randomly allocates a node to process each request. This uses extra memory, as every node must store the session objects of all users. Every time a user’s information is updated, Container Level Session Replication replicates and synchronizes the changes across all the other nodes, which adds processing overhead.
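For contrast, here is a minimal Python model of session replication. The `ReplicatedNode` and `ReplicatingCluster` classes are hypothetical and stand in for what a servlet container does internally; the point is the trade-off described above. Any node can serve any request and sessions survive a node failure, but every node holds every session and every update is copied to all other nodes.

```python
import random


class ReplicatedNode:
    """Hypothetical node that keeps a copy of every user's session."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, dict] = {}


class ReplicatingCluster:
    """Hypothetical container-level replication across all nodes."""

    def __init__(self, names: list[str]) -> None:
        self.nodes = [ReplicatedNode(n) for n in names]

    def handle_request(self, session_id: str, updates: dict) -> str:
        # Any node may serve any request, because every node holds every session.
        node = random.choice(self.nodes)
        node.sessions.setdefault(session_id, {}).update(updates)
        # Replicate the changed session to all other nodes: this is the extra
        # memory (one copy per node) and extra processing (one copy per update).
        for other in self.nodes:
            if other is not node:
                other.sessions[session_id] = dict(node.sessions[session_id])
        return node.name

    def node_failed(self, name: str) -> None:
        # Surviving nodes already hold the session, so nothing is lost.
        self.nodes = [n for n in self.nodes if n.name != name]
```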

If you choose not to deploy a load balancer, a single node will process all web requests. We discourage this because a) it can slow down processing, especially with a large number of users, and b) it creates a single point of failure: if that node crashes, no replacement node automatically takes over its tasks, and you will have to redirect your web traffic to another node manually. A load balancer is therefore highly recommended, so you can rest easy knowing your web requests are processed faster and more reliably.


Mastering the cluster

Besides web requests, there are other types of tasks processed in parallel in a cluster. These include:

  • Background tasks: any scheduled Yellowfin tasks that run in the background, such as report broadcasts, Signals analysis, data transformations, and access filter refreshes.
  • Cluster maintenance: tasks that keep the internal caches up to date, such as reloading the license, flushing view, dashboard, or report content from the cache, and coordinating tasks.


As of Yellowfin 8, background processing has been restructured in a major way: it now follows the master-slave concept (this does not include cluster maintenance). In brief, a single node is selected as the “master”, and the rest of the nodes become its “slaves” (computer terminology at its worst). The master node has two main responsibilities:

  • To figure out which tasks require execution, and,
  • To allocate these tasks to eligible nodes, including itself.
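The master's two responsibilities map naturally onto two small functions. The Python sketch below is an assumption-laden illustration, not Yellowfin code: `ScheduledTask`, `due_tasks` and `allocate` are hypothetical names, "due" simply means the scheduled time has passed, and round-robin allocation is a placeholder policy. Note that the master appears in its own node list, so it can receive tasks too.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ScheduledTask:
    """Hypothetical background task record."""

    name: str
    task_type: str  # e.g. "broadcast", "signals", "transformation"
    next_run: datetime


def due_tasks(tasks: list[ScheduledTask], now: datetime) -> list[ScheduledTask]:
    """Responsibility 1: work out which tasks need to run now."""
    return [t for t in tasks if t.next_run <= now]


def allocate(tasks: list[ScheduledTask], nodes: list[str]) -> dict[str, list[str]]:
    """Responsibility 2: spread due tasks across nodes, master included (round-robin)."""
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        plan[nodes[i % len(nodes)]].append(task.name)
    return plan
```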


We have also added functionality that lets you control the cluster nodes. Administrators can designate master and slave nodes in a Yellowfin cluster. When the master allocates a task to a slave node, that node must be able to run that task type. You can also configure a node not to process tasks at all, in which case the master allocates it no tasks. This applies to the master as well, leaving it to only coordinate task execution. Finally, nodes can be configured so that they never become the master node.
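The per-node options above can be modelled as a small configuration record plus an eligibility filter. In this Python sketch, `NodeConfig`, its field names and the least-loaded choice are all hypothetical; Yellowfin's actual settings and allocation policy may differ. It shows how a node that does not process tasks (here, the master) is skipped, and how a task type no node supports ends up unassigned.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeConfig:
    """Hypothetical per-node settings mirroring the options described above."""

    name: str
    processes_tasks: bool = True  # False: the master never assigns it work
    task_types: set = field(default_factory=set)  # task types this node can run
    can_be_master: bool = True  # consulted during master election, not here


def eligible_nodes(nodes: list[NodeConfig], task_type: str) -> list[NodeConfig]:
    # A node qualifies only if it processes tasks and supports this task type.
    return [n for n in nodes if n.processes_tasks and task_type in n.task_types]


def assign(task_type: str, nodes: list[NodeConfig], load: dict[str, int]) -> Optional[str]:
    """Give a task to the least-loaded eligible node, or None if none qualifies."""
    candidates = eligible_nodes(nodes, task_type)
    if not candidates:
        return None
    chosen = min(candidates, key=lambda n: load.get(n.name, 0))
    load[chosen.name] = load.get(chosen.name, 0) + 1
    return chosen.name
```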

A cluster can have only one master node at any point in time. If that node fails, the cluster selects another eligible node as the master. The master should not be confused with the load balancer: each manages a different type of task across the cluster.
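Failover of the master amounts to re-running an election over the surviving, master-eligible nodes. The Python sketch below assumes a simple rule, "earliest-joined live node wins", purely for illustration; the `Member` class and `joined_order` tie-breaker are hypothetical, and the article does not specify how Yellowfin actually picks the new master. The rule always yields at most one master, which matches the single-master guarantee described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical view of a cluster member for election purposes."""

    name: str
    joined_order: int  # assumed tie-breaker: the earliest-joined node wins
    alive: bool = True
    can_be_master: bool = True


def elect_master(members: list[Member]) -> Optional[str]:
    """Pick exactly one master from the live, master-eligible nodes."""
    candidates = [m for m in members if m.alive and m.can_be_master]
    if not candidates:
        return None  # no eligible node left to coordinate background tasks
    return min(candidates, key=lambda m: m.joined_order).name


members = [Member("a", 0), Member("b", 1, can_be_master=False), Member("c", 2)]
master = elect_master(members)
members[0].alive = False               # the current master fails
new_master = elect_master(members)     # "b" is skipped because it cannot be master
```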

You can even designate certain nodes to only process background tasks (and be managed by the master node), and some to only execute web requests (to be managed by the load balancer).

With the recommendations and tips shared in this article, you can assemble your ideal clustering infrastructure, and the new enhancements make your cluster faster and more fail-safe.

Imagine the power of your business on a clustered system running Yellowfin’s automated data discovery capabilities.

Clustering setup guide

Want to know more about setting up clustering? Read our handy guide on the wiki.