Thursday, May 14, 2009

Hidden Data in JPEG Files







Metadata in JPEG Files

The JPEG file format is the format most used for storing and transmitting photographs on the Internet. In addition, a large number of digital cameras store pictures as JPEG files. However, many users are likely to be unfamiliar with the fact that a JPEG file can contain other data besides the actual photograph.
The JPEG file format allows additional information called "metadata" to be embedded in the file header. (Other image file formats can contain metadata, too.) The purpose of these metadata is to provide additional and useful information along with the picture. Image manipulation programs and especially digital cameras take advantage of this feature.
Metadata can be embedded in different ways. A common way is to store them according to the Exif specification, which was created by the Japan Electronic Industries Development Association (JEIDA). Other popular specifications are the IPTC headers defined by the International Press Telecommunications Council (IPTC) and XMP developed by Adobe Systems. More detailed information about these metadata formats as well as descriptions of other metadata formats can be found on ExifTool's Tag Names page.
Among other things, the metadata section of a file can contain information about:
make and model of the digital camera
time and date the picture was taken
distance the camera was focused at
location information (GPS) where the picture was taken
small preview image (thumbnail) of the picture
firmware version, serial numbers, name and version of the image manipulation program, etc.
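To see which of these metadata blocks a particular file actually carries, one can walk the segments in the JPEG header. The following is a minimal sketch in Python, not a full parser: the function names are made up for this illustration, it only looks at the segments before the compressed image data, and it guesses the content of a segment from its leading bytes (Exif and XMP usually sit in APP1 segments, IPTC data usually in a Photoshop APP13 segment).

```python
import struct

# Marker codes from the JPEG standard: APP0..APP15 are 0xE0..0xEF,
# COM (comment) is 0xFE, SOS (start of the compressed image data) is 0xDA.
SOS, COM = 0xDA, 0xFE


def describe(marker, payload):
    """Best-effort guess at what a segment holds, based on its leading bytes."""
    if marker == 0xE0 and payload.startswith(b"JFIF\x00"):
        return "JFIF header"
    if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
        return "Exif"
    if marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
        return "XMP"
    if marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
        return "Photoshop/IPTC"
    if marker == COM:
        return "comment"
    return "other"


def list_metadata_segments(data):
    """Return (segment name, description, payload size) for each APPn/COM segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    found, pos = [], 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == SOS:
            break  # everything after this is image data
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF or marker == COM:
            name = "COM" if marker == COM else "APP%d" % (marker - 0xE0)
            found.append((name, describe(marker, payload), len(payload)))
        pos += 2 + length
    return found
```

Calling list_metadata_segments(open("photo.jpg", "rb").read()) on a picture straight from a camera (the file name is only a placeholder) would typically report at least one Exif segment. The payload sizes also give an idea of how many bytes the metadata occupy.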
Should Metadata Be Removed?
If you intend to publish JPEG files on the Internet, you might want to remove all metadata to reduce the file size of the JPEG files. Depending on what kinds of metadata are stored in the file, the reduction can range between a few bytes and several kilobytes. For example, if you have a website with metered bandwidth or if you have visitors with dialup modems, you might be interested in saving as many bytes as possible.
Another reason why you might want to consider removing all metadata beforehand is that metadata can give away potentially sensitive information. This information can pose a threat to your privacy or to other legitimate interests (e.g. the interest of journalists to protect their sources). The following fictitious and real-life examples try to illustrate the problematic nature of metadata information:
Many digital cameras embed a small preview image (thumbnail) of the picture in the header of each JPEG file. This makes it possible to quickly browse the pictures. Not all image manipulation programs update this thumbnail along with the main picture. The consequence could be that an edited picture retains the original unmodified version of the picture as an Exif datum. In some cases, this may only be inconvenient; in other cases, this could create a significant information leak. For example, a supposedly anonymized picture of a person could still reveal his or her identity in the thumbnail. Another, more embarrassing example is the case of television personality Cat Schwartz (known from TechTV, among others). Schwartz had published a photograph of herself on her personal blog. Because the program she had used to edit the picture did not update the thumbnail, the thumbnail revealed more nude facts than originally intended.
The following real-life case happened in February 2006: The Washington Post published an interview with a computer hacker, "Invasion of the Computer Snatchers". The hacker had agreed to be interviewed only if he was not identified by name or hometown. In addition to the interview, a disguised picture of the hacker was published. Unfortunately, the picture contained IPTC metadata about the city and state where it was taken. Combined with all the details mentioned in the article, this could make it possible to track down the hacker. Other kinds of metadata could have posed a comparable threat: The Exif datum "location information (GPS) where the picture was taken" makes it possible to locate exactly the place where the picture was taken. The Exif datum "distance the camera was focused at" at least reveals how far the photographer stood from the photographed object, which narrows down the photographer's position if one knows the location of that object.
A fictitious example: Bill does not want to go to uncle Linus' birthday party. He would rather go to a concert of the Rolling Stones. He tells his uncle that his boss wants him to work overtime to finish an important project. At the concert, Bill's friend Steve takes a picture of Bill. Bill publishes the picture on his homepage. Weeks later, uncle Linus visits Bill's homepage. He examines the Exif data "time and date the picture was taken" and discovers that Bill did not work overtime, but went to a concert on the day of the birthday party.
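The thumbnail problem from the first example can be checked before publishing. The sketch below, again in Python and with made-up function names, looks for an Exif block in a JPEG file and returns the embedded thumbnail if there is one, so that it can be saved and compared with the edited picture. It assumes the usual Exif layout, in which the thumbnail is described by the second image directory (IFD1) through the tags 0x0201 (offset) and 0x0202 (length). Thumbnails stored in other places, such as maker-specific fields, are not found.

```python
import struct


def exif_thumbnail(data):
    """Return the thumbnail stored in a JPEG's Exif block as bytes, or None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    # Walk the header segments until the compressed image data (SOS, 0xDA).
    while pos + 4 <= len(data) and data[pos] == 0xFF and data[pos + 1] != 0xDA:
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if data[pos + 1] == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return _thumbnail_from_tiff(payload[6:])
        pos += 2 + length
    return None


def _thumbnail_from_tiff(tiff):
    """Follow IFD0 to IFD1 and read the thumbnail offset and length tags."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little or big endian
    if order is None:
        return None
    try:
        (ifd0,) = struct.unpack_from(order + "I", tiff, 4)
        (count,) = struct.unpack_from(order + "H", tiff, ifd0)
        # The link to the next directory follows the 12-byte entries.
        (ifd1,) = struct.unpack_from(order + "I", tiff, ifd0 + 2 + 12 * count)
        if ifd1 == 0:
            return None  # no second directory, hence no thumbnail
        (count,) = struct.unpack_from(order + "H", tiff, ifd1)
        tags = {}
        for i in range(count):
            tag, _type, _n, value = struct.unpack_from(
                order + "HHII", tiff, ifd1 + 2 + 12 * i)
            tags[tag] = value
    except struct.error:
        return None  # truncated or malformed Exif block
    if 0x0201 not in tags or 0x0202 not in tags:
        return None
    start, size = tags[0x0201], tags[0x0202]
    return tiff[start:start + size] or None
```

If exif_thumbnail() returns data for a picture that has been cropped or anonymized, writing these bytes to a separate .jpg file and opening it shows what the thumbnail still reveals.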
How to Remove Metadata
If you want to remove metadata from JPEG files, you should keep in mind that these data can provide useful information (such as focal length, shutter speed, etc.). Therefore, you should never remove them from your original pictures. Instead, you should make a copy of the picture you want to publish and then remove the metadata from this copy.
Several software tools support the removal of metadata. One recommended tool is jhead, a free open source program that runs on a variety of operating systems (Windows/Linux/BSD/Mac). Because jhead is a command line tool, it is particularly well suited for use in batch files.
Less experienced users should adhere to this step-by-step guide (Windows):
Create the directory C:\jpeg.
Copy all pictures whose metadata you want to remove to C:\jpeg.
Download the program file jhead.exe to C:\jpeg.
Click on "Start" and then "Run...".
Type the name of the command interpreter and click on "OK": command.com on Windows 95/98/Me, cmd.exe on Windows NT/2000/XP.
Change to C:\jpeg. To do so, type each of the following three commands and press return after each one: "c:", "cd\" and "cd jpeg".
To remove all metadata of all JPEG files in C:\jpeg, type the following and press return: jhead -purejpg *
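What such a cleaning step does to a file can be illustrated with a few lines of Python. The sketch below is not a reimplementation of jhead, and its function names are invented for this example. It copies a JPEG file segment by segment and leaves out the comment segment and the application segments that typically hold Exif, IPTC and XMP data. As a deliberate choice of this sketch, the JFIF header (APP0) and the Adobe color segment (APP14) are kept, because image viewers may need them to display the picture correctly. The compressed image data are copied unchanged, so no image quality is lost.

```python
import struct
from pathlib import Path


def _keep(marker, payload):
    """Decide whether a header segment is needed to render the picture."""
    if marker == 0xFE:                      # COM: free-text comment
        return False
    if marker == 0xE0:                      # APP0: keep only a JFIF header
        return payload.startswith(b"JFIF\x00")
    if marker == 0xEE:                      # APP14: keep Adobe color info
        return payload.startswith(b"Adobe")
    return not (0xE1 <= marker <= 0xEF)     # drop all other APPn segments


def strip_metadata(data):
    """Return a copy of the JPEG bytes without metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out, pos = [b"\xff\xd8"], 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:                  # SOS: image data starts here
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if _keep(marker, data[pos + 4:pos + 2 + length]):
            out.append(data[pos:pos + 2 + length])
        pos += 2 + length
    out.append(data[pos:])                  # copy the image data unchanged
    return b"".join(out)


def strip_directory(source, target):
    """Write cleaned copies of all *.jpg files in source into target."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(source).glob("*.jpg")):
        (target / path.name).write_bytes(strip_metadata(path.read_bytes()))
```

A call such as strip_directory("C:/jpeg", "C:/jpeg/clean") (the second directory name is only an example) leaves the files in C:\jpeg untouched and writes the cleaned copies to a separate directory, in line with the advice never to remove metadata from the original pictures.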
If you prefer programs with a graphical user interface, you should try IrfanView. It is a free program for Windows that allows you to view and edit images. Besides IrfanView itself, you need the plug-in "Lossless JPG Transformations". More detailed information about how to install this plug-in can be found on IrfanView's homepage.
Less experienced users should stick to these instructions:
Download IrfanView and its plug-ins and install them.
Create the directory C:\jpeg.
Copy all pictures whose metadata you want to remove to C:\jpeg.
Start IrfanView. In its menu bar, click on "File" and then "Thumbnails".
A new window has opened. In its left column, select C:\jpeg. The right column will now show small preview images of all pictures of C:\jpeg.
Select all pictures. To do so, click on "Options" in the menu bar of the new window and then "Select all".
In the same menu bar, click on "File", "JPG Lossless Operations" and then "Lossless transformation with selected thumbs...".
A new window has opened. In the section "Transformation", select "None (can be used for optimizing and cleaning)".
In the section "JPG APP marker options", select "Clean all APP markers".
Finally, click on "Start".
Fingerprint of Digital Cameras

Many users may also not know that digital cameras leave an individual fingerprint in each picture. This makes it possible to reliably link pictures to the camera with which they were taken -- in much the same way that forensic examiners can link bullets to the gun that fired them.
Professor Jessica Fridrich and two members of her Binghamton University research team exploit the fact that every digital camera produces tiny imperfections (noise) within a picture. Each camera has a characteristic way of producing noise (even cameras of the same make and model) due to inevitable irregularities during the manufacturing process of the camera and its sensors. Although the digital noise is largely invisible to the human eye, Fridrich's team has developed algorithms to analyze the noise and thus to determine the individual fingerprint. According to Fridrich, the technique is accurate 99.99 percent of the time. A limitation is that it requires multiple pictures taken by the same camera to determine the fingerprint; a single picture is not sufficient.
With the help of the fingerprint, it is possible to tell if a picture was taken by a certain camera. It is even possible to detect image tampering. While unchanged regions of a picture keep their digital fingerprint, regions that have been tampered with lose their characteristic noise. Even if a picture has been compressed to a smaller file size (e.g. to send it by email), the fingerprint remains detectable.
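The basic idea of matching a picture to a camera by its noise can be illustrated with a toy example in Python. This is explicitly not the algorithm of Fridrich's team. It is a simplified sketch under several assumptions: pictures are small grayscale grids of numbers, the "camera" is simulated by adding a fixed random pattern to every picture, and the noise is extracted with a plain 3x3 mean filter. All function names are invented for this illustration.

```python
import random


def residual(img):
    """Noise residual: each pixel minus the mean of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            nb = [img[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(img[y][x] - sum(nb) / len(nb))
        out.append(row)
    return out


def estimate_fingerprint(images):
    """Average the residuals of several pictures taken by one camera."""
    res = [residual(img) for img in images]
    h, w = len(res[0]), len(res[0][0])
    return [[sum(r[y][x] for r in res) / len(res) for x in range(w)]
            for y in range(h)]


def correlation(a, b):
    """Pearson correlation between two equally sized grids."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = (sum((x - mx) ** 2 for x in xs)
            * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / norm if norm else 0.0


def make_camera(rng, size=16, strength=2.0):
    """Simulated sensor: a fixed random pattern (an assumption of this toy)."""
    return [[rng.gauss(0, strength) for _ in range(size)] for _ in range(size)]


def take_picture(rng, pattern, brightness):
    """Flat gray scene plus the camera's pattern plus fresh random noise."""
    return [[brightness + p + rng.gauss(0, 1.0) for p in row] for row in pattern]


rng = random.Random(0)
camera_a, camera_b = make_camera(rng), make_camera(rng)
# Several pictures from camera A are needed to estimate its fingerprint.
fingerprint_a = estimate_fingerprint(
    [take_picture(rng, camera_a, 100 + 10 * i) for i in range(8)])
score_same = correlation(residual(take_picture(rng, camera_a, 120)), fingerprint_a)
score_other = correlation(residual(take_picture(rng, camera_b, 120)), fingerprint_a)
print("same camera: %.2f, other camera: %.2f" % (score_same, score_other))
```

In this simulation, a new picture from camera A correlates strongly with the estimated fingerprint, while a picture from camera B does not. Averaging over several pictures is what suppresses the random noise and leaves the fixed pattern, which illustrates why a single picture is not enough to determine the fingerprint.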
Whereas Fridrich needs multiple pictures for her analysis, a technique developed by Nasir Memon of Polytechnic University in Brooklyn requires only a single picture. Memon's technique relies on the fact that different digital camera manufacturers use different interpolation algorithms. An interpolation algorithm is used by digital cameras to give each pixel of a digital photograph the correct color. As these algorithms leave telltale traces in the pictures and vary from company to company, Memon can match a picture to a camera brand with an accuracy of 90 percent.
Software tools that are capable of removing digital fingerprints do not seem to exist.
Digital Watermarks
Something that should be distinguished from digital fingerprints is digital watermarking. Among other things, digital watermarking is used to prevent -- or at least expose -- picture altering. Digital cameras equipped with digital watermarking technology append an extra stream of identifying data to each picture, which is usually invisible. If the picture is changed, these data and therefore the digital watermark are corrupted.
Cameras with watermarking technology are mainly purchased by professionals who need to prove that the pictures they have taken are unaltered (e.g. crime scene investigators). Just like digital fingerprints, digital watermarks could make it possible to determine if a picture was taken by a certain camera.
Although software tools that can remove digital watermarks do not seem to exist, digital watermarks are not really a problem. Simple countermeasures are to use only digital cameras without watermarking technology or with the option to disable watermarking.
