Yellowfin Evaluation Guide
Yellowfin is used for both enterprise analytics and embedded analytics use cases and for building bespoke analytical applications. Use this guide to ensure Yellowfin is the right technical fit for your requirements.
Managing Data
Updated 31 May 2022
Yellowfin does not require you to import data; rather, it works with data in your existing data sources. Four main types of data store are supported:
- JDBC compliant databases such as SQLServer, Oracle, Exasol, Snowflake, Redshift and many more
- XML/A compliant cubes such as Microsoft Analysis Services, SAP BW, Oracle Essbase
- CSV files (which are then loaded into a relational database)
- Third-Party sources including applications like Salesforce.com, Google Analytics and many others
Yellowfin uses meta-data to generate the appropriate, optimised queries for each type of data source. Data can even be obtained from a stored procedure that outputs a table object. Further, reports can be written that combine data from multiple data sources.
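The idea of metadata-driven query generation can be sketched in a few lines. This is an illustrative sketch only, not Yellowfin's actual implementation; the `METADATA` structure and `generate_sql` function are hypothetical names invented for the example.

```python
# Illustrative sketch (not Yellowfin's implementation): a metadata layer maps
# business names to physical columns, and the query generator emits SQL that
# selects only the fields a report asks for.

METADATA = {
    "view": "sales_fact",
    "fields": {
        "Customer Name": "cust_name",
        "Region":        "region",
        "Revenue":       "SUM(revenue)",
    },
}

def generate_sql(requested_fields, metadata):
    """Build a SELECT over only the requested fields, grouping on dimensions."""
    cols, dims = [], []
    for name in requested_fields:
        expr = metadata["fields"][name]
        cols.append(f'{expr} AS "{name}"')
        if "(" not in expr:          # crude dimension vs. aggregate test
            dims.append(expr)
    sql = f'SELECT {", ".join(cols)} FROM {metadata["view"]}'
    if dims and len(dims) < len(cols):
        sql += f' GROUP BY {", ".join(dims)}'
    return sql

print(generate_sql(["Region", "Revenue"], METADATA))
```

The key point the sketch shows: the generated SQL retrieves only the requested fields, never the full data model.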
Yellowfin ships with a range of connectors to popular third-party applications – such as Salesforce, Google Analytics, Facebook and more. Further, Yellowfin’s extensible Plug-in framework allows for data to be extracted from any source. In that way custom connectors can be written to connect to data sources that are not currently supported.
-
Data Connections and Query Management
Which query languages does Yellowfin support?
Yellowfin automatically generates query syntax based on the meta-data layer definitions, and the data source you are connecting to. Most data sources use SQL. Yellowfin will adjust SQL syntax where necessary for differences between database vendors.
Yellowfin can also generate XML/A for accessing data stored in cubes such as MS Analysis Services, Oracle Essbase or SAP BW.
How can I optimize my query performance?
The Yellowfin meta-data layer has query optimization built into it, so that for each report built, the SQL generated is optimised to only retrieve the fields required versus returning the entire data model exposed in the view.
If you want to further optimise query performance within Yellowfin you can:
- Expose native database functions via calculated fields at the view level
- Use Virtual tables to create freehand SQL queries within a view
- Write freehand SQL views for specific reporting use cases – note these tend to be less flexible than the options above.
- Create multiple charts from a single report data set to minimize the number of queries being run.
External to Yellowfin you can:
- Create database views that get called by the Yellowfin meta-data layer
- Use stored procedures as a data provider – although this limits query flexibility
- Optimise database data structures and indexes for reporting
- Use a high-performance analytical database such as Exasol, Redshift or Google BigQuery, to name a few.
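The first external option – a database-level view called by the meta-data layer – can be illustrated with a small self-contained example. This is a generic sketch using SQLite, with invented table and view names; the same pattern applies to any JDBC database.

```python
# Hypothetical example of the "database view" approach: the view pre-joins and
# pre-aggregates at the database level, so any tool pointed at it (such as a
# meta-data layer) issues simpler, cheaper SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EMEA', 100.0), (2, 'EMEA', 50.0), (3, 'APAC', 70.0);
    -- Reporting view: the database does the aggregation once, centrally.
    CREATE VIEW v_sales_by_region AS
        SELECT region, SUM(amount) AS total_sales
        FROM orders GROUP BY region;
""")
rows = conn.execute(
    "SELECT region, total_sales FROM v_sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 70.0), ('EMEA', 150.0)]
```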
How can I control connection pooling?
Yellowfin uses a Connection Pool to manage database connections, and reuse database connections where possible. Each connection can have separate connection pool limits and sizes. The Connection Pool can be used to protect the database from being inundated with report queries from Yellowfin. Connection pool settings are managed via the administration console.
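The protective effect of a bounded connection pool can be sketched as follows. This is a minimal generic sketch, not Yellowfin's pool implementation; `ConnectionPool` and its methods are hypothetical names.

```python
# Minimal sketch of the connection-pool idea: a bounded queue of reusable
# connections caps how many queries can hit the database concurrently.
import queue
import sqlite3

class ConnectionPool:
    def __init__(self, factory, max_size=4):
        self._pool = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._pool.put(factory())

    def acquire(self, timeout=5):
        # Blocks when all connections are busy, protecting the database
        # from being inundated with report queries.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse instead of closing it.
        self._pool.put(conn)

pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1 + 1").fetchone()[0]
pool.release(conn)
print(result)  # 2
```

In Yellowfin, per-connection pool limits and sizes play the role of `max_size` here, configured through the administration console rather than in code.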
What data sources can I connect to?
Four main types of data store are supported:
- JDBC compliant databases such as SQLServer, Oracle, Exasol, Snowflake, Redshift and many more
- XML/A compliant cubes such as Microsoft Analysis Services, SAP BW, Oracle Essbase
- CSV files (which are then loaded into a relational database)
- Third-Party sources including applications like Salesforce.com, Google Analytics and many others
Does Yellowfin take a copy of my data?
In most cases Yellowfin connects directly to your data store of choice. Yellowfin uses its meta-data definitions to generate the appropriate query syntax for your data source. Data is returned live in the report for the user to view.
There are some exceptions to this; however, these are all options used at your discretion:
- Yellowfin can read and report on data files such as CSVs. The files are configured as new data sources and the data is read in and stored in a data store of your choice. By default, Yellowfin will store files in its own repository.
- Yellowfin has the ability to cache data for faster performance (see below). This is particularly useful for large or complex data stores where query performance is not adequate to support real-time reporting. Yellowfin can be configured to store cached data in its own repository.
- When running reports in batch mode, copies of reports will be retained in the repository. Archiving rules can be setup to automatically purge data.
Can I connect directly to my application data via APIs?
Yes. Yellowfin ships with a range of connectors to popular applications. If a particular application is not supported, a custom connector can be built using the connector plug-in framework.
Can I add additional JDBC drivers into Yellowfin?
Yes, you can. You can upload an unlisted JDBC driver via the plug-in manager. Yellowfin will generate standard SQL syntax against that data source. For vendor-specific SQL extensions, freehand SQL can be used, or you can request that the database be certified by Yellowfin.
Does Yellowfin have any sort of caching layer?
There are a number of methods that can be used to accelerate query performance:
- View caching – the data from an entire view definition can be cached. All data will be pre-joined and stored in a single, flattened table. A variety of options can be configured including which database to cache the data in, the frequency of data refreshes, and whether incremental or full refreshes are required. Queries against the view will automatically be redirected to use the cached data where available.
- Periodic report refresh – when publishing a report, it can be configured to automatically refresh data on a periodic basis. This means when the report is opened, the report is already populated with data.
- Report broadcast – reports can be set up to automatically run based on a predefined schedule, with the report distributed either through the Yellowfin system, delivered as an attachment to an email, or saved to a file system location.
- Data caching – Yellowfin can cache data for commonly run reports in memory. This increases performance when concurrent users are viewing the same content. Data caching stops the same query from running against the source database within a particular time-frame.
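The data-caching behaviour in the last bullet – suppressing repeat runs of the same query within a time window – can be sketched like this. This is an illustrative sketch, not Yellowfin's caching code; `QueryCache` is a hypothetical name.

```python
# Sketch of time-windowed query caching: results for an identical query within
# the TTL are served from memory instead of re-hitting the source database.
import time

class QueryCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}      # sql text -> (timestamp, rows)
        self.misses = 0       # count of actual database round trips

    def run(self, sql, execute):
        entry = self._store.get(sql)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]                 # cache hit: no database round trip
        self.misses += 1
        rows = execute(sql)                 # cache miss: query the source
        self._store[sql] = (time.monotonic(), rows)
        return rows

cache = QueryCache(ttl_seconds=60)
fake_db = lambda sql: [("EMEA", 150.0)]     # stand-in for a real data source
cache.run("SELECT region, revenue FROM sales", fake_db)
cache.run("SELECT region, revenue FROM sales", fake_db)  # served from cache
print(cache.misses)  # 1
```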
Can Yellowfin report on data streams?
There are two methods that can be used to connect to and create visualizations from streaming data.
First, Yellowfin can connect to a streaming database that provides a JDBC driver. Reports can be scheduled to refresh periodically to ensure up-to-date data is always shown to the user. In this style, the chart is updated at the frequency of the pull schedule.
Second, Yellowfin's Code mode can be used to receive data pushed from a server and to update a JavaScript chart in real time. In this style, the chart is updated at the frequency of the push.
Further reading on Data Connections
The full list of data source connection methods is described in detail here
Information on how to develop your own connector can be found here
Currently supported third-party connectors can be found on our marketplace
-
Data Storage
Does Yellowfin store a copy of my data?
In most cases Yellowfin connects directly to your data store of choice. Yellowfin uses its meta-data definitions to generate the appropriate query syntax for your data source. Data is returned live in the report for the user to view.
There are some exceptions to this, however these are all options used at your discretion:
- Yellowfin can read and report on data files such as CSVs. The files are configured as new data sources and the data is read in and stored in a data store of your choice. By default, Yellowfin will store files in its own repository.
- Yellowfin has the ability to cache data for faster performance. This is particularly useful for large or complex data stores where query performance is not adequate to support real time reporting.
- Yellowfin can be configured to store cached data in its own repository.
- When running reports in batch mode, copies of reports will be retained in the repository. Archiving rules can be set up to automatically purge data.
- Yellowfin can store a snapshot of report data. This is important in situations where the underlying data is updated regularly, but a user wants to preserve a copy of report data at a particular point in time (for example, when inserting it into a story).
How can Yellowfin accelerate my report performance?
For high-volume or complex database schemas, Yellowfin recommends the use of high-performance database technology in order to provide the optimal user experience. Where query performance falls short, Yellowfin can help accelerate it using the following options:
- ETL – where data is stored in a non-optimal fashion, Yellowfin's ETL capability can be used to transform the data into an optimized format. This could include simply aggregating detailed transaction data into smaller summary tables, through to completely reformatting normalized data into a report-optimized schema such as a star schema.
- View caching – the data from an entire view definition can be cached. All data will be pre-joined and stored in a single, flattened table. A variety of options can be configured including which database to cache the data in, the frequency of data refreshes, and whether incremental or full refreshes are required. Queries against the view will automatically be redirected to use the cached data where available.
- Periodic report refresh – when publishing a report, it can be configured to automatically refresh data on a periodic basis. This means when the report is opened, the report is already populated with data.
- Report broadcast – reports can be set up to automatically run based on a pre-defined schedule, with the report distributed either through the Yellowfin system, delivered as an attachment to an email, or saved to a file system location.
- Data caching – Yellowfin can cache data for commonly run reports in memory. This increases performance when concurrent users are viewing the same content. Data caching stops the same query from running against the source database within a particular time-frame.
Does Yellowfin store sensitive customer data?
By default, Yellowfin does not store your data.
As described above, there are options that can be configured at your discretion that will result in your data being stored. You have full control over this, including which database the data is stored in.
Does Yellowfin support a hybrid cloud model?
Yes, Yellowfin can be deployed on-premise or in the cloud, and can connect to any data source whether on-premise or in the cloud. In a clustered deployment, different Yellowfin nodes can be placed in different locations (i.e. clustered across an on-premise site and the cloud).
How can I use a stored procedure within Yellowfin?
Stored procedures can be used as a data source provided the stored procedure returns a table. While we always recommend building views within the drag-and-drop interface, it is also possible to create views from stored procedures: when creating a new view, select the “Stored Procedure” option. If your stored procedure or function returns a single value instead, certain stored procedures and most database functions can be called within a standard report using freehand SQL calculated fields.
Can I use my database views as a meta-data layer?
Database views are presented as standard tables in Yellowfin’s metadata layer. These are often used in application databases, providing a reporting structure on top of a transactional data model.
What Databases does Yellowfin support?
Four main types of data store are supported:
- JDBC compliant databases such as SQLServer, Oracle, Exasol, Snowflake, Redshift and many more
- XML/A compliant cubes such as Microsoft Analysis Services, SAP BW, Oracle Essbase
- CSV files (which are then loaded into a relational database)
- Third-Party sources including applications like Salesforce.com, Google Analytics and many others
-
Preparing Data for Analysis
Do I have to move my data into a proprietary format?
No, Yellowfin will report directly off the data in whatever database you have chosen – you don’t need to move or transform it into any specific format in order to report off it.
Do I have to prepare my data to be used in Yellowfin?
Yellowfin can generate queries to access data stored in any supported format. Some database architectures are more optimized for reporting – such as data warehouses, data lakes and data marts. These architectures employ different modelling and optimization techniques to simplify and speed up the querying of large volumes of analytical data. This could include transforming data into a specialized model such as a star or snowflake schema, or summarizing large numbers of records into aggregate tables.
For customers wanting to analyze large volumes of data and provide fast response times to end-users, Yellowfin highly recommends investigating and investing in one of these architectures, combined with a special-purpose analytical database management system.
How does Yellowfin help in preparing data for Analysis?
Where data is stored in a non-optimal fashion, Yellowfin's ETL capability can be used to transform the data into an optimised format. Yellowfin's ETL module allows you to extract data from any supported source, transform that data using a variety of transformation steps, and write the output to any supported writable database target. For example, this could involve:
- Blending data from disparate data sources together into a single physical table (for example, transforming data from a 3NF model into a star-schema, or aggregating data for faster performance)
- Converting data elements into a more useful form, such as transforming the data type of columns, manipulating the data in columns, creating calculated fields
- Enhancing data by calling custom transform steps – for example to add external data (such as weather or geocoding points) or calling a data science model to calculate a prediction
Completed ETL jobs are called Transformation flows. These jobs can be scheduled to run on a regular recurring basis and can be started from an external event using a Yellowfin web service.
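The extract-transform-load pattern described above can be sketched end-to-end in a few lines. This is a generic illustration using SQLite and invented table names, not Yellowfin's ETL module; a real transformation flow would be built in the product's visual designer.

```python
# Hypothetical ETL sketch: extract detailed transactions, aggregate them into
# a smaller summary, and load the result into a writable target database.
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Extract: detailed transaction data in the source system.
source.executescript("""
    CREATE TABLE transactions (product TEXT, qty INTEGER);
    INSERT INTO transactions VALUES ('A', 2), ('A', 3), ('B', 1);
""")

# Transform: aggregate detail rows into a summary (done here in Python; a
# real flow might push this step to the database or use a merge transform).
summary = {}
for product, qty in source.execute("SELECT product, qty FROM transactions"):
    summary[product] = summary.get(product, 0) + qty

# Load: write the report-optimized summary table to the target database.
target.execute("CREATE TABLE product_summary (product TEXT, total_qty INTEGER)")
target.executemany("INSERT INTO product_summary VALUES (?, ?)", sorted(summary.items()))
target.commit()

result = target.execute("SELECT * FROM product_summary ORDER BY product").fetchall()
print(result)  # [('A', 5), ('B', 1)]
```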
Once the physical structure of the data is finalised, Yellowfin's metadata modelling layer can be used to further prepare data for analytical use. Yellowfin provides a comprehensive modelling layer to capture the technical and business information about your underlying data. Yellowfin uses this information to provide a business-friendly layer to end-users, whilst using the technical information to generate the relevant queries.
Meta-data modelling is performed using a friendly drag and drop interface. The data relevant for reporting can be identified, technical information such as data types, join conditions and so on can be defined, business names, definitions and default formatting can be applied, and additional information can be derived such as data grouping or complex calculations.
Yellowfin can analyze the underlying data and generate recommendations on what steps should be performed to prepare data for analysis.
Once defined, this meta-data underpins all other Yellowfin processes. It need only be defined once, and shared by everyone. This ensures a consistent approach to using data across your whole system.
Can I create reports from spreadsheets?
Yellowfin can upload files in a delimited format (comma, tab or custom). If not already in this format, simply save the spreadsheet as a CSV file from the source application (Save As in Excel or Sheets). CSV files can be uploaded via the Data Source Manager or when you are building a report. The user can control which writable data source the file is written to. From that point, the data can be used for reporting or combined with other data in a view.
As an alternative, an ETL job could be created to upload the file. The advantage of this is that it provides greater flexibility in terms of how the data is transformed, and a job can be set up to perform incremental loading of the file on a regular basis.
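The CSV-upload path described above follows a simple pattern: parse the delimited file and write its rows into a writable relational store, after which the data queries like any other table. The sketch below illustrates that pattern generically with Python's stdlib and SQLite; the table name `uploaded` is invented for the example.

```python
# Sketch of the CSV-upload path: a delimited file is parsed and written to a
# writable relational store, then queried like any other table.
import csv
import io
import sqlite3

csv_text = "region,revenue\nEMEA,100\nAPAC,70\n"   # stand-in for an uploaded file
reader = csv.DictReader(io.StringIO(csv_text))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploaded (region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO uploaded VALUES (?, ?)",
    [(row["region"], float(row["revenue"])) for row in reader],
)

# The uploaded data is now just another relational table.
total = conn.execute("SELECT SUM(revenue) FROM uploaded").fetchone()[0]
print(total)  # 170.0
```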
What tools does Yellowfin provide to help me transform and prepare my data for use in Analytics?
There are two main categories of tools available.
Where data requires physical transformation – that is data is read from a data source, manipulated or transformed in some way, and then stored in the new format – Yellowfin’s Data Transformation tool should be used.
Where data is to remain stored in its current format, Yellowfin's Meta-Data layer can be used to create a logical business layer on top of the data. This can include renaming fields, formatting values and creating new on-the-fly calculations. The Meta-Data layer provides the ability to make your data more suited for analytics without having to physically move or duplicate the data.
Can I run scheduled jobs to prepare my data?
Yes – ETL jobs can be scheduled to run on a regular basis. Yellowfin provides very flexible scheduling options. A schedule can also be started via a web service, thus allowing integration into third-party scheduling tools.
Can I use 3rd party tools for data preparation and ETL?
Yes. Yellowfin fully supports and partners with a variety of ETL and data integration providers. Provided the end output of your ETL process results in the data being stored in one of the many supported Yellowfin data sources, then any tool or programming language can be used to transform data for analytical purposes.
Can I use database functions?
If your database supports specific functions, these can be added by either:
- Using the calculated field builder. The calculated field builder has a freehand SQL option that enables you to enter any syntax compatible with your database; or
- Using Yellowfin's custom analytic functions capability.
Further reading:
Access our ETL user-guide here
Information on how to develop your own custom transformation step can be found here
For information on working with files access the user guide here
-
Meta-Data Modelling
How can Yellowfin help to achieve a single version of the truth?
Yellowfin incorporates a single, shared meta-data repository. This repository is a central store of all information relating to database connections and the structure of relevant data stored in those databases. Business rules such as how to join tables, how to aggregate data, various calculations and so on, are defined here. The metadata is established once, and is then shared across all permitted Yellowfin users and Yellowfin functions. This ensures that all users are working from a consistent set of definitions.
Further, when authority to create or edit meta-data is distributed across multiple users, approval rules can be defined so that any new or changed definitions must pass through an approval workflow prior to being published for broader use. This creates necessary checks and balances and guarantees that system experts have reviewed and signed off on all metadata being used.
How does the metadata layer promote re-use and consistency?
All reporting content in Yellowfin must be built on top of the established metadata layer. This prevents different users across an organisation from defining inconsistent sets of rules and definitions that result in conflicting, irreconcilable reports.
How does Yellowfin handle complex schemas?
Yellowfin can accommodate the most complex schema definitions. There are no limits in terms of the number of tables or columns that can be defined into a single view, or how many views can be created against a given data source. Care should be taken to plan and design the reporting views, and to model and structure the reporting database itself, using best-practice design principles.
Complex join conditions such as correlated sub-queries and unions can be designed using the drag and drop view builder – or if desired, freehand SQL can be directly entered.
Complex schema types such as Star-schemas with multiple fact tables can result in ambiguous join paths. These can be managed in Yellowfin using the Virtual Table object. This object is available at the view builder and can be used to combine fact tables virtually and force the correct and optimal join paths for these types of tables.
Further, subqueries can be used to join data from different data sources. Where all parts of a sub-query are from the same data source, the SQL and subsequent process is pushed to the database. Where components of a query are from different sources, Yellowfin will pull the intermediate result sets into memory in order to complete the query.
How can I combine data across different sources?
There are multiple ways to do this:
- ETL – a transformation flow can be built with multiple input steps. Each input step can be from a completely different data source or file. Data can then be combined using a merge transform step. Results from a transformation flow are always written to a new or existing table in a writable data source. That resultant table can be used for reporting in Yellowfin.
- Reporting – data can be combined from different sources using the advanced sub-query feature. Multiple types of sub-query are supported – including append, union, minus and intersect. The sub-query data can come from any view and that view can be from a different data source. Data from different data sources is combined in memory in the Yellowfin server.
- Dashboards – data from different data sources can be combined as different visualization objects on a dashboard. These objects can be wired together such that they all operate off a common set of filters, and can respond automatically to drill and brush events in another object.
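The in-memory combination described in the reporting bullet can be sketched generically: pull an intermediate result set from each source, then join them in application memory. The sketch below uses two separate SQLite databases as stand-in sources; all table names are invented for the example.

```python
# Sketch of combining result sets from two different data sources in memory,
# analogous to a report-level sub-query join across sources.
import sqlite3

crm = sqlite3.connect(":memory:")       # source 1: customer attributes
crm.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
""")

billing = sqlite3.connect(":memory:")   # source 2: billing facts
billing.executescript("""
    CREATE TABLE invoices (customer_id INTEGER, amount REAL);
    INSERT INTO invoices VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Each source runs its own query; the intermediate result sets are then
# joined in application memory rather than in either database.
names = dict(crm.execute("SELECT id, name FROM customers"))
totals = dict(
    billing.execute("SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id")
)
combined = sorted((names[cid], total) for cid, total in totals.items())
print(combined)  # [('Acme', 150.0), ('Globex', 75.0)]
```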
Can I use my own SQL to generate a metadata layer?
Yes, freehand SQL and stored procedures can be used to define the data in a view. Additional metadata formatting and preparation is still available when using this method to obtain data in a view.
What are the best practices for creating meta-data layers?
The purpose of a Yellowfin view is to support self-service data analysis, automated analysis and signal detection. As such, when approaching the development of a view, you should consider the breadth and depth of analysis you want to undertake.
In principle, views should:
- Support many possible queries, not just a single query.
- Enable self-service reporting by simplifying the data complexity
- Ensure governance and data security
Typically, we would recommend that a view is broad enough to cover the needs of a particular subject area. Within finance, for example, this could be billings analysis, in which you would include attributes about the customer, the product sold, the store or channel, and so on. This gives end-users the flexibility to analyse their billings data across many dimensions.
It is not recommended that you attempt to model your entire information domain into a single view. Whilst technically feasible, any view with hundreds of attributes becomes incredibly difficult for end-users to work with. Where users need to bring together data from one or more views, this can be achieved using advanced subqueries within the Yellowfin report builder.
To make self-service reporting easier, do as much work up-front as possible so that creating new content is quick and easy. Ways to do this include:
- Translate attributes into common business terms
- Categorise fields into folders that business users relate to
- Define default aggregations and formats for metric fields
- Create commonly used calculated fields – such as financial ratios
- Include commonly used filters
- Create reference codes that translate database codes into human-readable values
- Apply security to sensitive fields, or data
How do I join two fact tables in a meta-data layer?
In some cases, a star schema data model may have been defined and a user wants to combine data from across two fact tables into a single report. This may occur when fact tables are arbitrarily split, such as by a transaction type; or when you want to combine different types of data from separate fact tables that have common dimensions.
There are multiple ways to address this:
- Data from both fact tables could be physically combined into a single table
- A database-level view could be created that unions the tables together
- A virtual table can be used in the Yellowfin view builder to union the two tables together and make them appear as a single table to the view builder. This will always force the union process to take place first and prevent incorrect SQL being generated
- The two tables could be combined using advanced subquery capability in the report builder
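The union approaches above share one idea: combine the fact tables first, so downstream SQL sees a single table and ambiguous join paths never arise. The sketch below illustrates this generically with a SQLite view playing the role of the virtual table; all table names are invented for the example.

```python
# Sketch of the virtual-table / database-view approach: two fact tables with
# common dimensions are unioned first, so downstream SQL sees one table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_orders  (region TEXT, amount REAL);
    CREATE TABLE fact_returns (region TEXT, amount REAL);
    INSERT INTO fact_orders  VALUES ('EMEA', 100.0), ('APAC', 70.0);
    INSERT INTO fact_returns VALUES ('EMEA', -10.0);

    -- Forcing the union to happen first avoids ambiguous join paths and
    -- prevents incorrect SQL from being generated against either fact table.
    CREATE VIEW v_fact_combined AS
        SELECT region, amount, 'order'  AS fact_type FROM fact_orders
        UNION ALL
        SELECT region, amount, 'return' AS fact_type FROM fact_returns;
""")
rows = conn.execute(
    "SELECT region, SUM(amount) FROM v_fact_combined GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 70.0), ('EMEA', 90.0)]
```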
Further reading: