Explore Yellowfin now with our sample datasetGet Started
Getting your meta-data layer right is a key step to enabling business user self-service, generating automated analysis and speeding up the creation of all of your content. Spending a little extra time on creating a great meta-data layer is highly recommended, as it will pay huge dividends later on. The 4 key outcomes of your metadata layer include:
Anything that you can do to reduce the amount of steps it takes to create and format a report by end-users is worthwhile. This section covers off the best practices for the creation of meta-data and preparing data for analysis.
There is a balance between having too many or too few meta-data layers. Some try to create a monolithic model where all of the organization’s key metrics and dimensions are in a single view. This can be incredibly unwieldy and complicated for end users to use.
On the other hand, you can also go the other way and create a view for almost every query you want to run. In this scenario you end up with too many views which creates confusion for your end users when they need to choose a view for creating a new report.
The ideal approach is to have a balance between the two. Build your meta-data layers around a single subject domain. As an example, for marketing this could be website traffic (not the entire marketing function). Include all the fields you would need for analyzing traffic. This would enable users to do comprehensive analysis of this subject domain.
Where you need to analyze data across subject domains you could either choose to create a new view or if the number of reports is relatively small you could choose to build those using subquery functionality.
The entity model you build will define all the joins and logic required to generate the correct database query when creating new reports. The aim is to create relatively simple models so that performance is assured.
If you need to create highly complex entity models you should test them for query speed. If the speed is unacceptable, then consider options as described in the performance section above.
Critical to enabling self-service is that all your fields use common business terminology that is well understood by your end users. Often field names in data source are not commonly used or have been abbreviated. Rename all your fields and provide field descriptions so there is no ambiguity for your end users or for new data analysis who need to create analysis from your meta-data layer.
By default, fields are added to generic folders = Metric, Dimension and Time. Whilst somewhat useful it does not provide a lot of detail about your fields for your end users. If your data set has a lot of fields use field categories rather than the default Metric & Dimension folders. In this way you can group related fields into their own specific folders. Good examples of this would be Time, Location, Product Details, Customer Details etc. This will help your end-users who are creating new reports to make sense of the fields and what their purpose is.
Field hierarchies need to be defined to enable drill down interactions. Common hierarchies include Date (Year, Quarter Month) and Product or Location (Region, Country, State). Define the hierarchies you believe your end-users will want to use for their analysis. Users can always use drill anywhere functionality, but if common drill paths exist, set these up at the view so they are ready to use.
Ensure that all your date and metric fields have default formats applied. These default formats should be formats that are most commonly used for that field by the business. For example, for Currency, do you want to include a $ prefix and two decimal places (.00). By having default formats set up correctly, you will reduce the time it takes to create new reports that are formatted correctly and ready for consumption.
All metric fields will be summed by default at the report layer. If, however, your metric should not be aggregated, or any other aggregation other than SUM is a better aggregation to use, then you should set the default aggregation at the meta-data layer. Doing so will make reporting creation simpler and more efficient for your end-users.
Often values are stored in a database in all UPPER or all lower case. For dimensions it is more customary in a business setting to use them in Camel Case. Set your preferred case formatting as default, so that your end users do not need to include an additional step in their reporting creation process.
Some fields have values with specific sort orders that your users will expect. When categorical fields have this type of specific sort order, it is used to create an ‘ordinal’ axis on a chart. An example of this is day of week or performance criteria good, average poor. In cases such as this, it is best to define the sort at the meta-data layer so that users do not have to manually sort the data when creating a report.
If you use common metrics or categories across multiple reports and dashboards, you may want to assign specific colors to them. For example, the sales budget could always be grey versus sales actuals are always blue. This makes all your content visually consistent and easy for your end users to interpret.
Often data is stored in a database as codes or abbreviations. This is especially true of ISO codes such as country codes. These should be converted into values that are human readable, either at the ETL stage or at the meta-data layer.
Where common calculations are used in the business, such as profit calculations or conversion calculations, it is best practice to create these at the meta-data layer so that end0users can use them quickly and without error.
In addition, if your calculator logic were to change it is much easier to update them once at the meta-data layer, rather than identify them in each and every report and change them there.
Ensure that you have a counter field where 1 = each row in the data set. This enables users to do simple calculations, without needing to do a ‘count distinct’ on an ID field.
To support a range of data visualizations easily, create both dimensional and well as metric versions of commonly used subject areas. For example, if your data concerns crime and analysis of narcotic crime is common, you can create:
A metric – Narcotic crime count
A Dimension – Narcotic Crime – values = Yes / No
With these two fields, your users can simply aggregate the number of narcotic crimes and analyze trends, as well as use the dimensionality to split total crime by non-narcotic and narcotic crimes.
Commonly used filters should be created at the meta-data layer. These often reflect common analysis that is performed, such as analyzing ‘Organic Visitors’. In this way, your end-users can simply select a pre-defined filter rather than having to create a unique filter each and every time. If your filters are more complex, you will have the added benefit of having a less error prone environment.
The meta-data layer is used to auto generate assisted insights and run automated business monitoring (ABM), such as with tools like Yellowfin Signals. Make sure that the fields to be used for automated analysis are defined and are the most appropriate for the types of analysis your end-users are likely to want to undertake.