A really interesting development I’ve seen in the data and analytics space lately is the rise of the data catalog. You may know it by another name, such as a semantic layer or metadata layer, but they’re all fundamentally the same thing. Data catalogs aren’t new; they’ve been around for a long time. While some vendors like Yellowfin and Cognos have always had them, others like Tableau and Qlik are only now getting to them.
What is a data catalog (or metadata layer)?
A data catalog is basically a map of a data set that hides the complexity of the underlying data source. Take relational databases, for example. They have lots of tables, and if you want to write reports or do analysis you have to join those tables together. The data catalog does this for you, and if your data has transactional field names that don’t make sense, you can rename the fields to business terminology so everyone understands what they mean.
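To make the idea concrete, here is a minimal sketch in Python of a catalog that joins two tables and renames their fields. The table contents, column names, and the `CATALOG` and `build_view` names are all hypothetical and chosen for illustration only. This is not how any particular product implements its metadata layer.

```python
# Hypothetical raw tables with transactional column names.
orders = [
    {"ord_id": 1, "cust_fk": 10, "amt_net": 120.0},
    {"ord_id": 2, "cust_fk": 11, "amt_net": 80.0},
]
customers = [
    {"cust_id": 10, "cust_nm": "Acme"},
    {"cust_id": 11, "cust_nm": "Globex"},
]

# The "catalog" (an assumed structure): how the tables join, and what
# each raw column is called in business terms.
CATALOG = {
    "join": ("cust_fk", "cust_id"),  # orders.cust_fk = customers.cust_id
    "fields": {
        "ord_id": "Order ID",
        "cust_nm": "Customer Name",
        "amt_net": "Net Sales",
    },
}


def build_view(orders, customers, catalog):
    """Join the two tables and expose only the renamed business fields."""
    left_key, right_key = catalog["join"]
    customers_by_key = {c[right_key]: c for c in customers}
    view = []
    for order in orders:
        customer = customers_by_key.get(order[left_key])
        if customer is None:
            continue  # inner join: skip orders with no matching customer
        merged = {**customer, **order}
        view.append(
            {label: merged[col] for col, label in catalog["fields"].items()}
        )
    return view


business_view = build_view(orders, customers, CATALOG)
for row in business_view:
    print(row)
```

Someone working with `business_view` sees only "Order ID", "Customer Name", and "Net Sales", and never has to know which tables or keys sit underneath.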
Create business logic
If you want to control the kind of analysis people do and ensure that it’s broadly consistent, then a data catalog is perfect because it holds all the business logic. It brings together the field names and the calculations you use and delivers them to the business user pre-packaged, so they can do their own self-service data discovery.
Once a data catalog is built you can use it over and over again and even publish it so people who know nothing about the underlying database can use it and understand what the names mean. This is a huge advantage for organizations that build lots of analytic content as it ensures that everyone starts from a similar point and their analysis can be understood by others.
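A rough Python sketch of the "define once, reuse everywhere" idea follows. The metric names, the `METRICS` table, and the `with_metrics` helper are assumptions made up for this example. A real catalog would store such definitions in its own format.

```python
# Hypothetical metric definitions, written once and shared by every report.
# The percentage metric assumes Revenue is never zero.
METRICS = {
    "Gross Margin": lambda row: row["Revenue"] - row["Cost"],
    "Margin %": lambda row: 100 * (row["Revenue"] - row["Cost"]) / row["Revenue"],
}


def with_metrics(rows, metric_names):
    """Return copies of the rows with the requested catalog metrics added."""
    return [
        {**row, **{name: METRICS[name](row) for name in metric_names}}
        for row in rows
    ]


sales = [
    {"Region": "North", "Revenue": 200.0, "Cost": 150.0},
    {"Region": "South", "Revenue": 100.0, "Cost": 60.0},
]

# Two different reports reuse the same definitions instead of re-deriving them.
report_a = with_metrics(sales, ["Gross Margin"])
report_b = with_metrics(sales, ["Gross Margin", "Margin %"])
```

Because both reports pull "Gross Margin" from the same definition, two analysts cannot end up with two slightly different versions of the number.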
Get governance and security
The data catalog also gives you governance and security. You can define who gets to see which data and how much of it they can view. This is so important now because we’re moving away from desktop analytics and towards centralized analytics in the cloud, and that shift requires tighter governance around data.
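As an illustration only, access rules of this kind might be sketched in Python as below. The role names, the `ACCESS` structure, and the `visible_rows` function are hypothetical. Real products enforce this inside the platform rather than in user code.

```python
# Hypothetical access rules kept alongside the catalog: which columns and
# which regions each role is allowed to see.
ACCESS = {
    "analyst": {"columns": {"Region", "Revenue"}, "regions": {"North"}},
    "finance": {
        "columns": {"Region", "Revenue", "Cost"},
        "regions": {"North", "South"},
    },
}


def visible_rows(rows, role):
    """Apply row-level and column-level restrictions for the given role."""
    rule = ACCESS.get(role)
    if rule is None:
        return []  # unknown roles see nothing
    return [
        {col: val for col, val in row.items() if col in rule["columns"]}
        for row in rows
        if row["Region"] in rule["regions"]
    ]


data = [
    {"Region": "North", "Revenue": 200.0, "Cost": 150.0},
    {"Region": "South", "Revenue": 100.0, "Cost": 60.0},
]

print(visible_rows(data, "analyst"))
print(visible_rows(data, "finance"))
```

Here the analyst role is limited both by row (only the North region) and by column (no cost figures), while an unrecognized role gets nothing by default.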
At Yellowfin, we saw the value of the metadata layer from the outset and have found that enterprise customers in particular find it very valuable. It’s like the beating heart of our application because all the business logic, and the way end users interact with their data, are driven through it.
Guide to Governed Data Discovery Best Practices
The only way you can be certain of trustworthy data is by implementing robust data discovery governance. And part of that is having a solid metadata layer. Here's a guide to best practices.