Before we get into metadata management, we must understand what metadata really is. To put it in simple terms, it is data about data. Metadata started as walls full of drawers, filled with little cards that showed library users where to find specific information. Librarians would meticulously catalogue information from each book onto cards categorised by keyword and subject. Today, this same practice is called Metadata Management, and it is responsible for maintaining a system of record of the information contained in each enterprise data lake.
Any time someone tries to tell you that metadata is ‘meaningless, don’t worry, it’s just who you call, it’s just phone records, it’s not a big deal’ – realize we kill people based on metadata. So they must be pretty darn certain that they think they know something based on metadata.
– Rand Paul
For enterprises, though, the question of metadata is a little less obvious. Data is
Metadata here could be defined as a system of record of every transaction made on enterprise data.
Why metadata
If you walked into a library with a thought in your head and nothing else, what you would experience is the weight of all the information contained in the library. That is not something you would like, which is why search engines became so popular: they help put things into perspective before they display 4 million search results for your keyword or key phrase.
What metadata does is:
Metadata management
Now that its importance is established, it follows that metadata needs to be managed effectively for enterprise data to be productive and efficient. But before we step into what that encompasses, here is how metadata management is defined:
the end-to-end process and governance framework for creating, controlling, enhancing, attributing, defining and managing a metadata schema, model or other structured aggregation system, either independently or within a repository and the associated supporting processes (often to enable the management of content)
Imagine terabytes, or rather petabytes, of data in the form of maps, forms, whitepapers, reports, machine data and a host of other formats that make up enterprise data. Now imagine trying to unearth insights from this data hoard. With metadata, however, each of these data sets is categorised and catalogued under logical search strings that allow search engines to find the proverbial needle in the haystack. So when you are looking to identify the most productive location for drilling, the search engine can pull information from seismic data, drilling data, contract reports, logistics data and a host of other reports to help you make that decision quickly and effectively. However, it all starts with creating the metadata registry.
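To make the card-catalogue idea concrete, here is a minimal sketch of a metadata registry in Python. It is an illustration only, not how any particular metadata management product works: the class names (`MetadataRecord`, `MetadataRegistry`), the field names and the sample data sets are all hypothetical. Each record describes a data set without holding its contents, and a keyword index plays the role of the drawer of cards.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One catalogue card: describes a data set, but does not contain it."""
    dataset_id: str
    title: str
    data_format: str
    keywords: set[str] = field(default_factory=set)


class MetadataRegistry:
    """Hypothetical in-memory registry with a keyword index."""

    def __init__(self) -> None:
        self._records: dict[str, MetadataRecord] = {}
        self._index: dict[str, set[str]] = {}  # keyword -> data set ids

    def register(self, record: MetadataRecord) -> None:
        """File the record and add its id under each of its keywords."""
        self._records[record.dataset_id] = record
        for keyword in record.keywords:
            self._index.setdefault(keyword.lower(), set()).add(record.dataset_id)

    def search(self, *keywords: str) -> list[MetadataRecord]:
        """Return records tagged with every given keyword, sorted by id."""
        if not keywords:
            return []
        id_sets = [self._index.get(k.lower(), set()) for k in keywords]
        matching = set.intersection(*id_sets)
        return [self._records[i] for i in sorted(matching)]


# Sample entries are invented for illustration.
registry = MetadataRegistry()
registry.register(MetadataRecord("seis-001", "Seismic survey, block A", "segy", {"seismic", "block-a"}))
registry.register(MetadataRecord("drill-007", "Drilling log, block A", "csv", {"drilling", "block-a"}))
registry.register(MetadataRecord("contract-042", "Lease contract, block B", "pdf", {"contract", "block-b"}))

hits = registry.search("block-a")
print([r.dataset_id for r in hits])  # ['drill-007', 'seis-001']
```

The search only ever touches the index, never the underlying files, which is why a question about one drilling location can be answered across seismic, drilling and contract data without scanning the data sets themselves.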
Metadata discovery or harvesting
As a process, metadata management starts with metadata discovery, where we search enterprise data for logical associations within data sets and map them to search criteria. The result forms the metadata registry. It works at three levels:
Metadata management tools
There is a multitude of metadata management tools, from IBM to Informatica to Esquire; the list is endless, and each has its specific benefits and challenges. For each enterprise, it is important that the tool is selected based on:
Each of the available tools offers automated metadata discovery and diverse features that help in metadata management. Ultimately, it is the enterprise-specific business case that will identify the most efficient metadata management solution to implement.