Metadata is defined as “data about data.” This is a terrible definition; let’s see if we can make better sense of it.
If we examine a typical file in a folder listing, what can we deduce?
It’s a Word file, created on 24/11/2011 and 26 KB in size, and based on the file name, it’s a performance review for employee 1244. All of this information about the file is metadata. Do we care about this information? No, we don’t; we care about the performance review, the content inside the Word document. However, we need this information so we can find the performance review for employee 1244 that we want to read.
Metadata is also known as tags, descriptions, or properties. It is essentially any label or piece of data that helps us identify the information we are looking for.
Okay, so what? Metadata identifies the content of a file; big deal. Well, that is exactly the problem with the definition: it is the application of the metadata that matters.
Imagine you are running a company with 1,000 employees that has been in business for 10 years. With one review per employee per year, you would have about 10,000 employee reviews on file! All of a sudden, the labels identifying the content of files become much more important, because you need this information to find the review you are looking for.
Now multiply this across the tens of thousands of other files a company keeps and its thousands of employees, and we have millions of files; the only practical way to organize them is to identify them with tags. We don’t really care about the metadata; we care about the content inside the files. But the metadata helps us locate that content, so we need to apply it in order to find the files.
Let’s take a look at traditional options we have to help us organize our files:
We can see folders for various years holding multiple versions of performance reviews for various employees. We are creating metadata to organize files through folder names and file names. Unfortunately, this is unmanaged information: there is no control over the folder structure and no control over the file naming convention. Multiply this by millions of files and you have an uncontrollable mess (take a look at the LAN drive in your company).
How can we fix this mess? We need a metadata manager.
One aspect of SharePoint is that it lets you define the metadata you want to maintain; SharePoint then provides tools that use this metadata so you can organize and find the information you are looking for.
In the employee review example, we can identify what we want to keep track of: for example, employee name, employee ID, review date, and department. When we save an employee’s review in SharePoint, SharePoint prompts us for this information so it can organize the file.
This is called managed information. In addition to the information we already had (file name, creation date, and file size in the example above), SharePoint manages and controls the data we need to keep track of, using common, enforced terms to help us identify and organize our information.
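To make the idea of “managed information” concrete, here is a minimal Python sketch of a metadata manager that refuses to store a file until the required fields are supplied. It is a conceptual model only, not SharePoint’s API: the classes `ContentType`, `Document` and `Library`, the method `save`, and the sample values (file name, employee name, department) are all hypothetical. The field names come from the employee review example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentType:
    """Hypothetical: a kind of managed document and the metadata fields it carries."""
    name: str
    required: set[str]
    optional: set[str] = field(default_factory=set)


@dataclass
class Document:
    file_name: str
    metadata: dict[str, str]


class Library:
    """Hypothetical store that accepts documents only when their metadata is complete."""

    def __init__(self, content_type: ContentType) -> None:
        self.content_type = content_type
        self.documents: list[Document] = []

    def save(self, file_name: str, metadata: dict[str, str]) -> Document:
        provided = set(metadata)
        missing = self.content_type.required - provided
        if missing:
            # This is the point at which a real system would prompt the user.
            raise ValueError(f"missing required metadata: {sorted(missing)}")
        # Reject field names that are not part of the agreed vocabulary.
        unknown = provided - self.content_type.required - self.content_type.optional
        if unknown:
            raise ValueError(f"unrecognized metadata: {sorted(unknown)}")
        doc = Document(file_name, dict(metadata))
        self.documents.append(doc)
        return doc


# Usage with made-up sample values.
review = ContentType(
    "Performance Review",
    required={"employee_name", "employee_id", "review_date", "department"},
)
reviews = Library(review)
reviews.save(
    "Review_1244.docx",
    {"employee_name": "J. Smith", "employee_id": "1244",
     "review_date": "2011-11-24", "department": "Finance"},
)
```

The design choice worth noticing is that validation happens at save time, so an untagged file never enters the store; with folders and file names, nothing stops that from happening.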
So we have a cop enforcing file tagging; is that all SharePoint is?
That is one aspect of SharePoint, and a major one. SharePoint enforces the use of metadata before a file is saved, so we can find it later, and it uses metadata in searching, for example:
- Find all documents tagged with Employee ID 1244
- Identify employee reviews by year or department
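The two searches listed above can be sketched as filtering and grouping on metadata rather than on folder paths. This is a minimal Python illustration, not how SharePoint search is implemented; the records, file names, departments, and the helper functions `find` and `group_by` are hypothetical.

```python
# Hypothetical metadata records; in practice these would come from the document store.
documents = [
    {"file": "review_2010.docx", "employee_id": "1244", "year": 2010, "department": "Finance"},
    {"file": "review_2011.docx", "employee_id": "1244", "year": 2011, "department": "Finance"},
    {"file": "review_2011b.docx", "employee_id": "0871", "year": 2011, "department": "Sales"},
]


def find(docs, **criteria):
    """Return the documents whose metadata matches every given field exactly."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def group_by(docs, field_name):
    """Map each value of a metadata field to the documents that carry it."""
    groups = {}
    for d in docs:
        groups.setdefault(d.get(field_name), []).append(d)
    return groups


# Find all documents tagged with Employee ID 1244.
print([d["file"] for d in find(documents, employee_id="1244")])
# → ['review_2010.docx', 'review_2011.docx']

# Identify reviews by year or by department, without any folder structure.
print({year: len(docs) for year, docs in group_by(documents, "year").items()})
# → {2010: 1, 2011: 2}
```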
The application of metadata in SharePoint is also much more robust than simple file names and folders. For example, if we want to keep track of an employee’s region but the employee can belong to multiple regions, SharePoint can easily allow for that.
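The advantage of a multi-valued field over folders can be shown in a few lines: a file sits in exactly one folder, but it can carry several region tags at once. The Python sketch below models this with a set of regions per document; the data and the `in_region` helper are hypothetical and only illustrate the concept.

```python
# Hypothetical: each document carries a *set* of regions instead of a single folder path.
# With folders, review_1244.docx would have to be copied into both North and East.
documents = {
    "review_1244.docx": {"employee_id": "1244", "regions": {"North", "East"}},
    "review_0871.docx": {"employee_id": "0871", "regions": {"West"}},
}


def in_region(docs, region):
    """Return the names of documents tagged with the given region, sorted."""
    return sorted(name for name, meta in docs.items() if region in meta["regions"])


print(in_region(documents, "East"))   # → ['review_1244.docx']
print(in_region(documents, "North"))  # → ['review_1244.docx']
```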
How can we use metadata to organize files?
Because we can define the information we need to organize our files and enforce its use, we can build content management solutions which are much more flexible than folders with deep nesting.
One approach would be to start with 3 major categories (but of course this can be more or less depending on organizational / information needs).
In each of these areas, we can then identify the metadata we need to manage our content. For example, at the lower department level we may need to know who is working on the files, but at the upper levels we simply need to identify which department created the content and what its purpose is.
SharePoint allows you to define metadata terms at the project level, department level and corporate / intranet level and to configure which terms are optional, mandatory or shared across the company. This aids in creating a common vocabulary across the organization.
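One way to picture layered terms is as a chain of scopes, where each level inherits the mandatory terms and shared vocabulary of the levels above it. The Python sketch below is a conceptual model under that assumption, not SharePoint’s term store; the `Scope` class, its methods, and the sample departments and terms are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """Hypothetical metadata scope: corporate/intranet, department or project level."""
    name: str
    parent: Optional[Scope] = None
    mandatory: set[str] = field(default_factory=set)
    optional: set[str] = field(default_factory=set)
    # Shared vocabulary: term -> allowed values (terms not listed here are free text).
    vocabulary: dict[str, set[str]] = field(default_factory=dict)

    def chain(self):
        """Yield this scope, then each parent up to the corporate level."""
        scope = self
        while scope is not None:
            yield scope
            scope = scope.parent

    def mandatory_terms(self) -> set[str]:
        return set().union(*(s.mandatory for s in self.chain()))

    def known_terms(self) -> set[str]:
        return set().union(*(s.mandatory | s.optional for s in self.chain()))

    def allowed_values(self, term: str) -> Optional[set[str]]:
        for s in self.chain():
            if term in s.vocabulary:
                return s.vocabulary[term]
        return None

    def validate(self, metadata: dict[str, str]) -> list[str]:
        """Return a list of problems; an empty list means the metadata is acceptable."""
        problems = [f"missing {t}" for t in sorted(self.mandatory_terms() - set(metadata))]
        problems += [f"unknown term {t}" for t in sorted(set(metadata) - self.known_terms())]
        for term, value in metadata.items():
            allowed = self.allowed_values(term)
            if allowed is not None and value not in allowed:
                problems.append(f"{term}={value!r} not in shared vocabulary")
        return problems


corporate = Scope("Corporate", mandatory={"department"},
                  vocabulary={"department": {"Finance", "Sales", "HR"}})
hr = Scope("HR", parent=corporate, mandatory={"employee_id"})
project = Scope("Review project", parent=hr, mandatory={"review_date"}, optional={"reviewer"})

print(project.validate({"department": "Finanse", "employee_id": "1244"}))
# → ['missing review_date', "department='Finanse' not in shared vocabulary"]
```

Defining the `department` values once at the corporate level is what gives every department and project the same spelling, which is the “common vocabulary” described above.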
There are tools in SharePoint that allow users to move files between these areas and that then require the files to carry the correct metadata for organizing the information in the new area.
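The rule that a moved file must satisfy its destination’s metadata can be sketched as a check performed at move time. This Python example is illustrative only: the area names mirror the project, department and corporate levels mentioned above, while the `REQUIRED` table, its terms (`owner`, `department`, `purpose`), and the `move` function are hypothetical.

```python
from typing import Optional

# Hypothetical areas and the metadata each one requires.
REQUIRED = {
    "project": {"owner"},                    # who is working on the file
    "department": {"department"},            # which department created it
    "corporate": {"department", "purpose"},  # what the content is for
}


def move(metadata: dict, target_area: str, extra: Optional[dict] = None) -> dict:
    """Return the file's metadata for the target area, refusing the move if terms are missing."""
    updated = {**metadata, **(extra or {})}
    missing = REQUIRED[target_area] - set(updated)
    if missing:
        raise ValueError(f"cannot move to {target_area}: missing {sorted(missing)}")
    return updated


draft = {"owner": "jsmith", "department": "HR"}
published = move(draft, "corporate", extra={"purpose": "Annual review"})
```

Calling `move(draft, "corporate")` without supplying `purpose` raises a `ValueError`, which is the point at which a real system would ask the user for the missing information.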
Can you deploy SharePoint without metadata?
Yes, you can; unfortunately, you won’t have a much better system than you would have with your LAN drive. The SharePoint tools that depend on metadata to identify your information would not have what they need to organize your content.
Do we need to identify all the metadata in our organization?
No, I think that would be a huge task. We can start by identifying common terms used across the company, identifying local terms used by departments and projects, and applying metadata to shared information. The application of metadata will be different for each area of the company and will grow and change over time as managed sites are created and files are migrated from unmanaged LAN drives.
SharePoint and Business Intelligence