Image Metadata and Comparison

Published 17 May 04 10:34 AM | despos

The world of digital pictures is expanding exponentially, and so is the number of pictures on everybody's hard disk. As a Cutting Edge reader pointed out, people increasingly end up with several different files for a single physical photo: for example, a small thumbnail, a larger thumbnail, a black-and-white copy, a slightly blurred one, one with some touch-ups, and so on. The question is,

Is there any way with .NET that I can find similar photos?

It goes without saying that the point here is not finding identical photos, but identifying related photos after resizing, color correction, and the like.
This has very little to do with .NET itself, I'm afraid. The .NET platform contains thousands of classes, but it's an application framework rather than a graphics library. So there's no built-in class to compare images (and there isn't supposed to be one). There are algorithms in computer graphics that try to compare images, and they can be found in specialized books. I don't know whether any advanced commercial graphics libraries implement them. They might, though.

My point here is different and, I believe, closer to reality: metadata.

I'm not (yet?) a Longhorn super-expert and I don't know the nitty-gritty details of the Longhorn storage mechanism. However, at its core the Longhorn storage engine makes intensive use of metadata to categorize, index, and correlate data. Longhorn creates and maintains a database of references and links metadata to files. When you query for certain files (e.g., pictures of the kids when they were 6; all copies of a given picture; pictures of your wife smiling; ...) the engine runs a plain old SQL query on internal tables and returns a list of JPG files.

There's no apparent magic behind it, but if you don't know anything about the underlying tables it might look like magic. On the other hand, there have been tables behind the file system since the very first PCs.
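To make the "plain old SQL query" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Longhorn's actual schema, which I don't know; the table `picture_meta`, its columns, and the file names are all hypothetical. It only shows how "all copies of a given picture" turns into an ordinary join once each file carries a shared identifier in its metadata.

```python
import sqlite3

def build_catalog(conn):
    # Hypothetical metadata table: one row per image file.
    # original_id is an assumed tag shared by every copy of the same photo.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS picture_meta (
               path        TEXT PRIMARY KEY,
               original_id TEXT NOT NULL,
               variant     TEXT NOT NULL
           )"""
    )

def add_picture(conn, path, original_id, variant):
    # Record (or update) the metadata for a single file.
    conn.execute(
        "INSERT OR REPLACE INTO picture_meta VALUES (?, ?, ?)",
        (path, original_id, variant),
    )

def copies_of(conn, path):
    # Self-join: find every other file that shares this file's original_id.
    rows = conn.execute(
        """SELECT other.path
             FROM picture_meta AS me
             JOIN picture_meta AS other
               ON other.original_id = me.original_id
            WHERE me.path = ? AND other.path <> me.path
            ORDER BY other.path""",
        (path,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
build_catalog(conn)
add_picture(conn, "beach.jpg", "photo-001", "original")
add_picture(conn, "beach_thumb.jpg", "photo-001", "thumbnail")
add_picture(conn, "beach_bw.jpg", "photo-001", "black-and-white")
add_picture(conn, "party.jpg", "photo-002", "original")

print(copies_of(conn, "beach.jpg"))  # ['beach_bw.jpg', 'beach_thumb.jpg']
```

The query itself is trivial; the hard part is getting `original_id` (or any equivalent tag) attached to every file in the first place.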

For metadata to be consumed, it must first be added to each file, in a semi- or fully automatic way. This is something that can be accomplished with a smarter folder. The folder must be able to recognize image files and "force" you to enter some metadata whenever you save, edit, copy, or delete a file.

Doable today; maybe easier tomorrow with Longhorn, because it will be integrated with the system shell.

Thoughts?

Comments

# Guido Domenici said on May 17, 2004 07:53 AM:

Indeed, Longhorn will surely offer more palatable options when it comes to associating metadata with files of any kind. As you point out, however, NTFS offers a similar facility today by means of the "OLE structured storage", which does the trick but has the disadvantage of being extremely cumbersome to use, especially from .NET.

To accomplish that, one may look at the IPropertySetStorage interface in the Win32 documentation (good luck). However, to make things simpler, there's a COM DLL one can interop with in order to set/retrieve a file's metadata (described in KB article http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q224/3/51.asp&NoWebContent=1&NoWebContent=1, "Dsofile.exe Lets You Edit Office Document Properties from Visual Basic and ASP").

# John Schroedl said on May 18, 2004 10:26 AM:

I agree!

I've been looking for a while for a relatively simple score (or scores) that could be assigned to an image representing visual aspects of the image. Perhaps a weighted avg of R,G,B? H,L,V? Storing this number/numbers in metadata could facilitate a "find similar pictures" operation quickly -- something I've wanted for a long time.
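As a rough illustration of such a score, the sketch below averages the R, G and B channels of an image into a three-number signature and compares two signatures by Euclidean distance. It assumes the pixels have already been decoded into (r, g, b) tuples by some imaging library; the function names and the threshold of 30 are hypothetical, not a recommendation. It is a very coarse measure: two unrelated pictures can share the same average color, and a black-and-white copy will shift the average noticeably.

```python
import math

def mean_rgb(pixels):
    # Average each channel over an iterable of (r, g, b) tuples.
    totals = [0, 0, 0]
    count = 0
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return tuple(total / count for total in totals)

def signature_distance(sig_a, sig_b):
    # Euclidean distance between two (R, G, B) signatures.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))

def looks_similar(pixels_a, pixels_b, threshold=30.0):
    # threshold is an arbitrary, hypothetical cut-off for this sketch.
    return signature_distance(mean_rgb(pixels_a), mean_rgb(pixels_b)) <= threshold

# Tiny made-up "images": a photo, a crude downsample of it, and an unrelated one.
photo = [(200, 120, 40), (180, 100, 20), (220, 140, 60), (200, 120, 40)]
thumbnail = photo[::2]          # keeps every second pixel
other = [(20, 40, 200)] * 4     # a mostly blue picture

print(mean_rgb(photo))                  # (200.0, 120.0, 40.0)
print(looks_similar(photo, thumbnail))  # True
print(looks_similar(photo, other))      # False
```

The signature is only three numbers, so it is cheap to store in a file's metadata and to compare in bulk, which is what would make a quick "find similar pictures" pass feasible.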

As for assigning keywords, perhaps there could be keyword "bins" to drag and drop images onto (manually assigning keywords is obviously tedious and needs to be done in bulk if possible). Another idea would be an integrated IntelliSense-like checkable keyword list that pops up on hover in Explorer.

What about a rating system? I like to grade things 1-5 and come back later and query those. Needs to be easy!

John
