Published on developer.* Blogs (http://www.developerdotstar.com/community)

It's All In the Metadata

By Mario Van Damme
Created 2006-03-21 02:13

Introduction

Many people don't worry about metadata; rather they are only interested in their data. However, metadata is extremely important as it describes the semantics of 'what' the data represents.

In this article, I'll briefly try to convince you of the importance of metadata. After that, I'll discuss some metadata approaches.

Why we need metadata

The common thread running through this article is my example of digital pictures. At first sight, there seems to be no reason to treat them any differently from other files.

However, let's take a step back, outside of the digital era. In the past, we all took pictures using a film camera and sent the film to a developer. The prints that came back from the photo finisher would last many decades.

In those days, we typically wrote some text on the back of the picture to indicate who was in it, where the picture was taken, and so on.

This type of 'data about data' is what we call metadata in computer science terms.

Metadata approaches

Metadata can be stored together with the data. An example of this is the JPEG format with its EXIF metadata header.

EXIF is a Japanese standard for storing exposure-related metadata.
This includes the exposure time, whether the flash was used, the camera manufacturer and model, the lens used, etc.

The main advantage of this approach is the beauty of the 'self-containment' principle. This means that when you copy or move the file, the included metadata travels with it.
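
To make the idea of metadata travelling inside the file concrete, here is a minimal Python sketch that locates the EXIF block in a JPEG byte stream. It relies on the general JPEG layout (a start marker followed by length-prefixed segments, with EXIF in an APP1 segment that begins with "Exif" and two zero bytes). The function name find_exif_segment is my own, and the sketch deliberately ignores rarer cases such as padding bytes between segments, so treat it as an illustration rather than a robust parser.

```python
import struct
from typing import Optional


def find_exif_segment(jpeg_bytes: bytes) -> Optional[bytes]:
    """Return the raw EXIF payload embedded in a JPEG, or None if absent.

    Simplified sketch: assumes every segment before the image data
    carries a 2-byte big-endian length, which holds for typical files.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":  # a JPEG stream starts with the SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        marker, length = struct.unpack(">HH", jpeg_bytes[pos:pos + 4])
        if marker >> 8 != 0xFF or marker == 0xFFDA:
            break  # malformed data, or start of the compressed image data
        # The length field counts its own two bytes but not the marker.
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]  # skip the "Exif\0\0" identifier
        pos += 2 + length
    return None
```

Because the EXIF bytes are part of the file itself, a plain file copy carries them along without any extra bookkeeping.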

Another approach is to store the metadata in a separate file or repository. It is tempting to use XML to describe this metadata. Note that the semantic web also falls into this category.
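
As a rough illustration of the separate-file approach, the Python sketch below writes metadata to an XML "sidecar" document and reads it back. The element and attribute names (metadata, field, file, name) are invented for this example and do not follow any standard; the field names and values in the usage note are likewise just sample data.

```python
import xml.etree.ElementTree as ET


def build_sidecar(image_name: str, fields: dict) -> str:
    """Serialise metadata for one image as a small XML document.

    The <metadata>/<field> vocabulary is hypothetical, not a standard.
    """
    root = ET.Element("metadata", {"file": image_name})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_sidecar(xml_text: str) -> dict:
    """Parse a sidecar document back into a name -> value dictionary."""
    root = ET.fromstring(xml_text)
    return {field.get("name"): field.text for field in root.iter("field")}
```

For example, build_sidecar("Winter.jpg", {"UserComment": "This picture was taken in the winter."}) produces a string that could be saved next to the picture, and read_sidecar recovers the same dictionary from it. The picture file itself is never touched.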

Physically separating the metadata from the data has the following advantages:

  1. The file format doesn't need to have a metadata header (not all file formats have a metadata facility). Formats such as JPEG, MP3, and MS-Word do have one, but each format has its own metadata standard.
  2. The referenced file doesn't need to be altered in order to add metadata to it. Consider the situation where multiple people can add metadata to pictures on the web. Different people evaluate the same picture differently (some will add metadata related to the colour composition, others related to the location, still others related to the exposure). If this metadata were stored in the file itself, concurrent updates would be difficult to handle.
  3. The metadata can be organised for faster access. Using indexes and similar techniques, the metadata can be organised so that it can be queried much faster.
  4. Metadata can also be added to files that don't include a metadata header.
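
The points above can be sketched with a small repository built on SQLite from the Python standard library. This is only one possible design: the table name picture_metadata, its columns, and the helper functions are assumptions made for the example. Each annotation is a separate row, so several people can describe the same picture without the file being rewritten, and an index on the field name and value supports fast lookups.

```python
import sqlite3


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a metadata repository; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS picture_metadata (
               file_path   TEXT NOT NULL,
               author      TEXT NOT NULL,
               field_name  TEXT NOT NULL,
               field_value TEXT NOT NULL
           )"""
    )
    # The index lets queries by field name and value avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_field "
        "ON picture_metadata (field_name, field_value)"
    )
    return conn


def annotate(conn, file_path, author, field_name, field_value):
    """Record one person's annotation; the picture file is not modified."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO picture_metadata VALUES (?, ?, ?, ?)",
            (file_path, author, field_name, field_value),
        )


def find_pictures(conn, field_name, field_value):
    """Return the paths of all pictures carrying the given annotation."""
    rows = conn.execute(
        "SELECT DISTINCT file_path FROM picture_metadata "
        "WHERE field_name = ? AND field_value = ? ORDER BY file_path",
        (field_name, field_value),
    )
    return [row[0] for row in rows]
```

The file path is used as the link between a picture and its metadata here, which means the link breaks when the file is moved; that is the price paid for giving up self-containment.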

A third approach is to keep the metadata in the files, while duplicating it in a separate repository outside the file. This combines the advantage of self-containment with efficient querying. This approach is used in picture management software such as ASee/DSee.
The sad thing about it is that there is no standard storage format for this type of information.

Why should I be concerned

As I mentioned earlier with my example related to digital imaging, you typically want to keep images for multiple decades.

We should be concerned about how we can make sure that we can still look at our digital pictures and their metadata many years from now.

So how can you back up your pictures with metadata?

Even though it's not ideal, I would advise using the last option: keep the metadata in the file and extract a copy of the EXIF metadata header.

What would be the ideal situation?

You might wonder whether or not you will be able to display your images after all those years (metadata without the data would not make sense, right?).

To answer this, let's look at recent history:

GIF 87a is an image format that was created in 1987 and is still supported on multiple platforms. It's almost 20 years old, so I definitely think that JPEG will last at least as long as the GIF format.

Extending the EXIF metadata

The EXIF metadata can be extended in many ways:

An example of such an extension is the ability to add GPS coordinates to the image's EXIF metadata header when you take a picture. With GPS becoming a commodity, prices are dropping tremendously; it may not be long before some vendor of digital cameras sees this as a competitive feature.

Can you imagine? You travel around for a couple of weeks, visit some places and take a lot of pictures. You come home, and your PC software is able to show where you took each picture, based on the GPS coordinates in the EXIF metadata.
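
For software to place a picture on a map, the stored coordinates usually have to be converted first. The EXIF GPS fields hold latitude and longitude as degrees, minutes and seconds, each a rational number, plus a reference letter (N/S or E/W). The Python sketch below converts such a value to signed decimal degrees; the function name and the (numerator, denominator) input convention are my own choices for this example, and reading the values out of the file is left to whatever EXIF library you use.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert an EXIF-style GPS coordinate to signed decimal degrees.

    Each of degrees, minutes and seconds is a (numerator, denominator)
    pair; ref is "N", "S", "E" or "W".
    """
    value = (
        Fraction(*degrees)
        + Fraction(*minutes) / 60
        + Fraction(*seconds) / 3600
    )
    if ref in ("S", "W"):  # south and west are negative by convention
        value = -value
    return float(value)
```

A latitude stored as 50 degrees, 51 minutes, 0 seconds with reference "N" becomes 50.85, which mapping software can use directly.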

Another example is the weather: you are looking at a picture; it looks warm, but how warm was it? This could be determined in one of the following ways:

  1. Indirectly, using the GPS coordinates together with the date/time that the picture was taken (for example, to look up weather records for that place and time).
  2. Directly, by incorporating a thermometer into the digital camera and storing the result in the EXIF metadata.

Metadata rules

Let me end with some metadata rules:

  1. Don't misuse the semantics of metadata fields.
    In the context of the EXIF standard and what Windows XP shows you by default, it might be tempting to misuse the 'Model make' field (manufacturer of the device that captured the image) to hold your own comments, for example. This field is shown by default, so you would always have access to your comments in Windows Explorer. However, any software that relies on the field's real meaning would then be misled.
  2. The metadata standard used needs to be extensible. You can add additional metadata fields to the EXIF format for example.

Resources on EXIF metadata

  1. The unofficial EXIF site: http://www.exif.org
  2. EXIF reader: http://www.takenet.or.jp/~ryuuji/minisoft/exifread/english/
  3. .NET API: http://www.eggheadcafe.com/articles/20030706.asp
  4. Java API: http://sourceforge.net/projects/libexif

Exif command line tool (examples):

  1. exiv2.exe -M"add Exif.Image.Make Mario" "D:\Documents and Settings\All Users\Documents\My Pictures\Winter.jpg" : Add the image creator
  2. exiv2.exe -M"add Exif.Photo.UserComment This picture was taken in the winter." "D:\Documents and Settings\All Users\Documents\My Pictures\Winter.jpg" : Add user comment

About the author

Mario Van Damme is a software architect who has worked for a number of years in the medical industry, and prior to that in the insurance and banking industry.
He can be contacted by e-mail at mvandamme@sopragroup.com and mario.vandamme.mv@belgacom.net.


Source URL:
http://www.developerdotstar.com/community/community/node/450