Flickr Strips Copyright Metadata

Flickr has a wee problem. While it loves the metadata that is in your photographs when you upload them, merrily adding the camera information to its database, adding keywords as tags, and the like, it then turns around and does something absolutely horrible. It proceeds to strip most of the metadata out of every resized image it makes from your originally uploaded photograph, including the all-important copyright and other IPTC metadata that describe who created the photograph and potentially carry along a caption, title, and keywords for the image.

Boiler Bay

This means that the images you see on Flickr as you browse somebody's photo stream, sized to 500 pixels on the longest side, have no identity built into them as to who made them. Nor does any other size besides the original uploaded file. Sure, when you're viewing the image on the Flickr page, you don't need that information. But when a user saves that image out to their local disk, or embeds that photo in their own website even if it's being served out of Flickr's farm, there's no way to link that photo back to the maker.

The problem gets worse as the use of the image spreads. For example, if you upload an image and then it gets uploaded to Wikipedia and then it makes its way into a news story on CNET or appears on a blog somewhere, the ability to find out who took that photo gets harder and harder with each successive step and often involves asking humans "Where'd you get that image?" With the number of images we all see in a day, you can see that this particular approach doesn't scale.

Furthermore, with data going every which way on the Internet today, not carrying along IPTC information is a recipe for creating orphan works. An orphan work is defined as a work for which the copyright owner may be impossible to identify and locate. Orphan works have become such a big problem that the U.S. Copyright Office has prepared a report on orphan works. According to the Center for the Study of the Public Domain at Duke, orphan works probably comprise the majority of the record of 20th century culture.

By serving as one of the huge repositories of online imagery, and by stripping metadata on almost every image it serves, Flickr is unintentionally pouring fuel onto the fire of the orphan works problem. Sure, creator and copyright metadata take up a wee bit of space in the file format, but in today's Internet environment, the relatively few bytes it takes to preserve this information are negligible, especially compared to the value of being able to pass along this all-too-critical information.

Try it for yourself. Grab two versions of an image from Flickr, such as the image above. Grab the originally uploaded file and then grab any of the resized images made from that file. Take a look in any tool that lets you examine a file's metadata. Here's an example from one of my images. The info box on the left is from the original size file; the info box on the right is from a resized version, specifically the image embedded above.

[Screenshots: flickr-meta-original.png (left), flickr-meta-resized.png (right)]

As you can see, the panel on the left has all sorts of information about the photo: its caption, title, and the creator information. If somebody finds this image and wants to find me, they can type my name into any search engine and they'll find my contact info very quickly. On the other hand, the image that the panel on the right belongs to (the resized image) doesn't have any IPTC data embedded into it at all. Imagine you got ahold of this image from somewhere out there on the net, maybe a Facebook page where somebody had included it to show to a friend, and you wanted to contact me. Could you?
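The same comparison can be made without a metadata viewer. Here's a minimal Python sketch, assuming both files are JPEGs and that the metadata lives where it conventionally does: Exif and XMP in APP1 segments, and IPTC in an APP13 "Photoshop 3.0" segment under resource ID 0x0404. The function names are made up for illustration, and the parser is deliberately simple: it only walks the header segments and doesn't handle fill bytes or other unusual marker layouts.

```python
import struct
import sys


def jpeg_segments(data):
    """Yield (marker, payload) for each header segment before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data starts here, stop scanning
            return
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def metadata_summary(data):
    """Report which kinds of embedded metadata a JPEG byte string carries."""
    found = {"exif": False, "xmp": False, "iptc": False}
    for marker, payload in jpeg_segments(data):
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found["exif"] = True
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found["xmp"] = True
        elif (marker == 0xED and payload.startswith(b"Photoshop 3.0\x00")
              and b"8BIM\x04\x04" in payload):
            # 8BIM resource 0x0404 is the IPTC record inside the APP13 segment.
            found["iptc"] = True
    return found


if __name__ == "__main__":
    # Pass the file paths on the command line, e.g. the original and a resized copy.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, metadata_summary(f.read()))
```

Run it against the original file and one of the resized versions you saved. If the resized copy reports no IPTC while the original does, you're looking at the problem described here.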

What should Flickr do to fix this problem? There are two steps, one of which should be relatively easy, the other of which is hard.

  1. Flickr needs to change its resize engine so that the IPTC metadata in uploaded files is carried over into every resized image it generates. This should be straightforward.
  2. Flickr needs to reprocess its collection of images so that IPTC metadata in original files is propagated back to all images on Flickr. This is the tough one since there is so much data in Flickr at this point.
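To give a sense of what the first step involves, here's a rough Python sketch of carrying IPTC over from an original into a resized copy. It is not how Flickr's resize engine works (I have no visibility into that); it assumes both images are JPEGs, that the IPTC data sits in APP13 "Photoshop 3.0" segments, and that the resized file has no APP13 segment of its own. The names `header_segments` and `copy_iptc` are hypothetical.

```python
import struct

SOI = b"\xff\xd8"
APP0, APP13, SOS = 0xE0, 0xED, 0xDA


def header_segments(data):
    """Yield (marker, start, end) byte offsets for segments before image data."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == SOS:  # compressed image data follows, stop scanning
            return
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, pos + 2 + length
        pos += 2 + length


def copy_iptc(original, resized):
    """Return the resized JPEG bytes with the original's APP13 segments added."""
    iptc = b"".join(
        original[start:end]
        for marker, start, end in header_segments(original)
        if marker == APP13
        and original[start + 4:end].startswith(b"Photoshop 3.0\x00")
    )
    # Keep any leading APP0 (JFIF/JFXX) segments first, then add the IPTC data.
    insert_at = len(SOI)
    for marker, start, end in header_segments(resized):
        if marker != APP0:
            break
        insert_at = end
    return resized[:insert_at] + iptc + resized[insert_at:]
```

The segments are copied byte for byte, so the cost per resized image is exactly the size of the original's IPTC block. The same function, run over stored originals and their existing resized copies, is also the core of the second step; the hard part there is the sheer volume, not the logic.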

Making sure that creator, copyright and other IPTC information, including titles and captions, is preserved in every image sourced from Flickr—especially the 500px default sizes—will be a huge benefit to all of us as the years go on and the images that people source from Flickr are used in more places than we can imagine. Otherwise, all of those images that are being served from Flickr right now will eventually become orphans, causing problems for both creators and consumers of that work.

I dig Flickr a lot, and Flickr has served me very well over the years, including just over the last few days when I asked readers of this blog to mark photos for possible inclusion in a catalog of images to be sold as prints. The stripping of metadata, however, is a big problem for me. I've chimed in on a feature request at Flickr for preserving IPTC metadata. If you think that preserving IPTC data in photographs is important, and you're a Flickr user, please consider joining in and making your voice heard.

9 Comments

Wow - that's an incredible oversight/problem.

jerakeen.org on April 8, 2008 6:48 AM

I'm going to point out the obvious problem here. 2 billion photos times 5 resized views of every image times (say) 1k of metadata for titles is just shy of 10TB of extra storage that has to come from _somewhere_. Which might be an overestimate, of course - not everyone _has_ metadata on their photos, but nowadays I'd expect most photos to have at least whatever their cameras added.

More worryingly, there are plenty of photos in the wild with embedded colour profiles that are several times larger in bytes than the photo itself. Hanging onto those for the small thumbnails rapidly becomes silly - I've seen 2 megabyte 64px square thumbnails served from twitter, for instance. So you have to start picking and choosing which bits of metadata you're going to keep. And that'll just lead to _more_ arguments.

You're still right, of course. The underlying problem of orphaned data is too huge to dismiss on perfectly solvable technical grounds. But it's worth thinking about those technical grounds while considering the solution.

This would be a nice feature. Do any of the online services preserve metadata in resized images? I'm pretty sure SmugMug only keeps it in the original.

You are obviously right; alas, it seems that most services that provide an outlet for photographers (professional or amateur) are not concerned about protecting their users' rights (or preserving the relevant photo metadata), especially if it means that additional programming and processing resources would need to be committed to this.

A very recent Adobe Photoshop Express TOS fiasco (see my longer comment) is a good example of companies not only ignoring photographers' rights, but indirectly (even if perhaps unintentionally) encouraging the abuses you described in your recent "Copyright Conspiracy" post. I guess the only way to try and change it is to talk and blog about it, until the importance of those issues "registers" among more people, including the casual users who are the majority of people using services like flickr, Picasa, or Adobe's new PS-Express site.

Jerakeen: With Flickr now taking in, hosting, and serving video--I'm not really thinking the extra n'th percentage of storage space that keeping full metadata around will require is going to make any kind of economic impact. Zero. Zippo. Nada. As far as what kind of metadata to keep, sure, you can make useful arguments about big ICC profiles--and I'm happy to convert thumbs and micro images to sRGB and let them be if that's what it takes--but IPTC metadata is pretty darn insignificant, and IPTC Creator and Copyright at a minimum should be a no brainer.

Ben: I'm not sure. I've not tried out some of the others to see. Good question.

George: indeed, it's another one of those "Community Content" dark sides. It's all good when it looks good for the aggregator, but if there's something that needs to be done that's not in the aggregator's interest, it takes more oomph for things to happen. Talking about it is one way. This is why I encourage everyone to go hit up that Flickr page and make their voice heard.

jerakeen.org on April 9, 2008 12:56 AM

Yah. 12 hours after I worry about harddisk space, video turns up. Now I look _really_ foolish.

Wonder what Microsoft's lawyers think about this? IMHO, they'll probably make Flickr add the IPTC metadata to every resized image.

Ben: I just checked out the resized info in Zenfolio images and they do preserve Copyright and other EXIF/IPTC information in their resized images. You have to try pretty hard to grab an image thanks to some of their anti-copying behavior, but even once you do snag the image, it's got everything there.

Failures such as these are why the costs of technology still outweigh its benefits in so many cases. Weren't we all promised the world of the future? But now that it's here we spend what seems like five-fold the time and energy to maintain it. This is the perfect example: Flickr is an amazing pool of imagery (especially Creative Commons-licensed photos), but in order to adhere to the terms of the licensing agreements to use the photos we must spend more time manually recording URLs of access and the creators of each image. No wonder sites like iStockPhoto with their rock-bottom pricing on RF images are so popular.
