
Tuesday, February 12, 2013

Fix corrupted JPEGs made by the Samsung Galaxy S2

While organizing some of the pictures I took with my Samsung Galaxy S2, I encountered one file that couldn't be opened. Being a nerd, I couldn't resist and had to investigate the issue. I spent more than two hours searching the not-so-all-knowing internet for solutions, but in the end it came down to using my brain and reading the EXIF specification (PDF).

tl;dr: The Samsung Galaxy S2 can occasionally create corrupted JPEGs, i.e., files that don't follow the specifications.

The Problem

Standard (Linux) picture viewer applications would just say that they can't open the file. That's obviously not sufficient to get to the bottom of this, so I used GIMP and ImageMagick's convert, which both gave me the same information:
$ convert test.jpg out.jpg
convert: Corrupt JPEG data: 1072 extraneous bytes before marker 0xd8 `test.jpg' @ warning/jpeg.c/EmitMessage/231.
convert: Invalid JPEG file structure: two SOI markers `test.jpg' @ error/jpeg.c/EmitMessage/236.
convert: missing an image filename `out.jpg' @ error/convert.c/ConvertImageCommand/3011.
So the two valuable pieces of information were:
1072 extraneous bytes before marker 0xd8
and
two SOI markers

The EXIF standard

The only helpful advice that googling those error messages brought up was to use a hex editor. (Yay!)
My corrupt file starts with
FF D8 FF E1  00 0E 45 78  69 66 00 00  49 49 2A 00 ...
What you see here is a JPEG file (FF D8) followed by some EXIF information (FF E1). From the EXIF specification (PDF), we learn that FFE1 marks the start of application segment 1 (APP1).
Offset (Hex)  Name                          Code (Hex)
0000          SOI (Start Of Image) Marker   FFD8
0002          APP1 Marker                   FFE1
0004          APP1 Length                   xxxx
0006          Identifier                    4578 6966 00 ("Exif"00)
000B          Pad                           00
000C          APP1 Body
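To make the table a bit more tangible, here is a minimal Python sketch that splits the first bytes of my corrupt file into those fields. The function name parse_exif_header is made up for this post, and the sketch assumes the file really starts with SOI followed directly by APP1, as mine does.

```python
import struct

# First 16 bytes of the corrupt file, copied from the hex dump above.
header = bytes.fromhex("FF D8 FF E1 00 0E 45 78 69 66 00 00 49 49 2A 00")

def parse_exif_header(data):
    """Split the start of a JPEG/Exif file into the fields of the table.

    Hypothetical helper: it only understands the layout SOI + APP1.
    """
    # Markers and the APP1 length are stored big-endian, two bytes each.
    soi, app1, length = struct.unpack(">HHH", data[:6])
    return {
        "soi": soi,                  # offset 0x0000
        "app1": app1,                # offset 0x0002
        "length": length,            # offset 0x0004
        "identifier": data[6:11],    # offset 0x0006, "Exif" plus a NUL byte
        "pad": data[11],             # offset 0x000B
    }

fields = parse_exif_header(header)
print(hex(fields["soi"]), hex(fields["app1"]), fields["length"], fields["identifier"])
# 0xffd8 0xffe1 14 b'Exif\x00'
```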
Okay, so FFD8 is a SOI marker and the error message says that the file has two of them, which apparently is a bad thing. So I searched for another occurrence of FFD8 and found one at 0x442 = 1090. The warning also mentioned 1072 extraneous bytes before marker 0xd8, which is only slightly smaller than the area between the APP1 header and the next SOI marker. So, is the SOI marker here wrong?
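In hindsight, the number 1072 can be reproduced with a bit of arithmetic. The sketch below assumes that the decoder skips the APP1 segment using the length stored in its header (a JPEG segment length counts its own two bytes, but not the marker) and then reports everything up to the next marker as extraneous. That is my reading of the warning, not something I checked in the decoder's source, but the numbers line up exactly.

```python
SOI_SIZE = 2                 # FF D8
APP1_MARKER_SIZE = 2         # FF E1
declared_length = 0x000E     # length field of the corrupt file
second_soi_offset = 0x442    # where the second FF D8 was found

# Where the decoder believes the APP1 segment ends.
declared_end = SOI_SIZE + APP1_MARKER_SIZE + declared_length

# Bytes between that point and the second SOI marker.
extraneous = second_soi_offset - declared_end

print(declared_end, extraneous)  # 18 1072
```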

A valid JPEG file

Since I didn't have the slightest idea of what exactly was wrong here, I opened another JPEG that works and was taken only minutes before the corrupt one. Comparing them by fast-switching between the console tabs (exploiting low-level visual processing and attention guidance of the brain is fun), I noticed three things:
  1. FFD8 can be found at the same position in both files, so that is not the problem.
  2. The first difference is in the APP1 length.
  3. The difference is huge!
See for yourselves:

Corrupt file:
FF D8 FF E1  00 0E 45 78  69 66 00 00  49 49 2A 00 ...
Valid file:
FF D8 FF E1  E0 42 45 78  69 66 00 00  49 49 2A 00 ...
The length of the APP1 segment in the corrupt file is only 0xE = 14? That has to be far too small.
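The same comparison can be done without squinting at console tabs. This is a small sketch using only the first six bytes of each file from the dumps above; app1_length is a helper name I made up.

```python
# First six bytes of each file, copied from the hex dumps above.
corrupt = bytes.fromhex("FF D8 FF E1 00 0E")
valid = bytes.fromhex("FF D8 FF E1 E0 42")

def app1_length(data):
    """Read the big-endian APP1 length stored at offset 4 (hypothetical helper)."""
    return int.from_bytes(data[4:6], "big")

# Offset of the first byte where the two headers differ.
first_diff = next(i for i, (a, b) in enumerate(zip(corrupt, valid)) if a != b)

print(first_diff, app1_length(corrupt), app1_length(valid))  # 4 14 57410
```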

I then started to increase the length in the corrupted file at random to see what error messages convert would give me, but that is like being in a completely dark room with a metal bucket and throwing rocks until you hear that you've hit the bucket.
But let's see what is at the end of the APP1 segment in the valid file:
0xE042: FD CF FF D9 FF DB 00 84
At 0xE044, which is 0xE042 plus the two bytes of the SOI marker before the APP1 segment, we find FFD9, which the EXIF specification identifies as the EOI (End Of Image) marker. It is followed by FFDB, the DQT (Define Quantization Table) marker; see Table 40 of the specification. As far as I can tell, everything is where it should be.
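That observation can be turned into a quick plausibility check: the byte right after the APP1 segment, as announced by its length field, should be the 0xFF that starts the next marker. In the valid file that position is 4 + 0xE042 = 0xE046, exactly where FFDB sits in the dump above. The sketch below uses two tiny made-up byte strings instead of real photos, and both function names are my own.

```python
def app1_end(data):
    """Offset of the first byte after APP1, according to its length field."""
    length = int.from_bytes(data[4:6], "big")
    return 4 + length  # the length field starts at offset 4 and counts itself

def length_looks_valid(data):
    """True if a marker (FF followed by neither 00 nor FF) follows APP1."""
    end = app1_end(data)
    return (end + 1 < len(data)
            and data[end] == 0xFF
            and data[end + 1] not in (0x00, 0xFF))

# Synthetic examples: SOI, APP1 marker, length, "Exif\0\0", then a DQT marker.
good = bytes.fromhex("FFD8 FFE1 0008 457869660000 FFDB 0084")
bad = bytes.fromhex("FFD8 FFE1 0004 457869660000 FFDB 0084")  # length too small

print(length_looks_valid(good), length_looks_valid(bad))  # True False
```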

Overflow

Now back in the corrupt file, I searched for FFD9FFDB and found it at 0x10010. Do you see it already?
Minus the two bytes for the SOI marker, the length of the APP1 segment should be 0x1000E, which unfortunately can't be stored in only two bytes. What CAN be stored in two bytes is the lower part, 0x000E, which we see as length in the APP1 segment header. A classic example of an integer overflow, the first one I've observed in the wild!
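The truncation is easy to reproduce. This sketch only illustrates the arithmetic; I have no idea what the camera firmware actually does internally.

```python
import struct

true_length = 0x1000E            # what the APP1 length should have been

# Keeping only the lower 16 bits gives the value found in the header.
stored = true_length & 0xFFFF
print(hex(stored))               # 0xe

# A two-byte unsigned field cannot hold the real value at all.
try:
    struct.pack(">H", true_length)
    fits = True
except struct.error:
    fits = False
print(fits)                      # False
```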

The EXIF specification is clear:
Interoperability, APP1 consists of the APP1 marker, Exif identifier code, and the attribute information itself. The size of APP1 including all these elements shall not exceed the 64 Kbytes specified in the JPEG standard.
Oops.

Solution

From my understanding, the APP1 segment contains the thumbnail at the end. I reckon that it can be recalculated and stored properly by most image processing applications, so let's try shortening the data there to get under 64 Kbytes. I removed 20 bytes directly before the FFD9FFDB, which yields a new APP1 segment length of 0x1000E - 0x14 = 0xFFFA, and stored this new length at 0x0004.
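For the record, here is a rough Python sketch of that manual fix. It rests on several assumptions: the file starts with SOI directly followed by APP1, the first FFD9FFDB in the file really is the end of the APP1 segment, and losing the tail of the thumbnail is acceptable. The function name shrink_app1 is made up, nothing inside the EXIF data is updated to reflect the shorter thumbnail, and the demo runs on synthetic bytes rather than on a real photo.

```python
def shrink_app1(data, cut=20):
    """Drop `cut` bytes before FF D9 FF DB and rewrite the APP1 length."""
    if data[:4] != b"\xff\xd8\xff\xe1":
        raise ValueError("file does not start with SOI + APP1")
    end = data.find(b"\xff\xd9\xff\xdb")   # offset of the FF D9
    if end < 0:
        raise ValueError("FF D9 FF DB not found")
    # Same arithmetic as in the post: 0x10010 - 2 - 0x14 = 0xFFFA.
    new_length = end - 2 - cut
    if cut < 0 or new_length > 0xFFFF or end - cut < 12:
        raise ValueError("cut does not yield a usable APP1 length")
    patched = bytearray(data[:end - cut] + data[end:])
    patched[4:6] = new_length.to_bytes(2, "big")   # length field at 0x0004
    return bytes(patched)

# Synthetic stand-in for the corrupt file: FF D9 FF DB lands at 0x10010.
demo = (b"\xff\xd8\xff\xe1\x00\x0e" + b"Exif\x00\x00"
        + bytes(0x10004) + b"\xff\xd9\xff\xdb\x00\x84")
fixed = shrink_app1(demo)
print(fixed[4:6].hex())  # fffa

# Usage on a real file (file names are placeholders):
# with open("test.jpg", "rb") as f:
#     repaired = shrink_app1(f.read())
# with open("fixed.jpg", "wb") as f:
#     f.write(repaired)
```

Writing to a new file instead of overwriting the original keeps the corrupt picture around in case the cut turns out to be too aggressive.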

It seems like this works! The JPEG can now be opened again without any errors, not even regarding the thumbnail, which I've truncated and which isn't that important to me anyway.

This is the only time I've encountered this problem with pictures taken using my Samsung Galaxy S2, so this should be a one-time fix. If it happens again, I think I'll have to write a little script to do that for me.