tl;dr: The Samsung Galaxy S2 can occasionally create corrupted JPEGs, i.e., files that don't follow the specifications.
The Problem
Standard (Linux) picture viewer applications would just say that they can't open the file. That's obviously not sufficient to get to the bottom of this, so I used GIMP and ImageMagick's convert, which both gave me the same information:
$ convert test.jpg out.jpg
convert: Corrupt JPEG data: 1072 extraneous bytes before marker 0xd8 `test.jpg' @ warning/jpeg.c/EmitMessage/231.
convert: Invalid JPEG file structure: two SOI markers `test.jpg' @ error/jpeg.c/EmitMessage/236.
convert: missing an image filename `out.jpg' @ error/convert.c/ConvertImageCommand/3011.
So, the two valuable pieces of information were:
- 1072 extraneous bytes before marker 0xd8
- two SOI markers
The EXIF standard
The only helpful thing that googling those error messages brought up was to use a hex editor. (Yay!)
My corrupt file starts with
FF D8 FF E1 00 0E 45 78 69 66 00 00 49 49 2A 00 ...
What you see here is a JPEG file (FF D8) followed by some EXIF information (FF E1). Using the EXIF specification (PDF), we learn that FFE1 marks the start of an application segment 1 (APP1).
Offset (Hex) | Name | Code (Hex) |
---|---|---|
0000 | SOI (Start Of Image) Marker | FFD8 |
0002 | APP1 Marker | FFE1 |
0004 | APP1 Length | xxxx |
0006 | Identifier | 4578 6966 00 ("Exif"00) |
000B | Pad | 00 |
000C | APP1 Body |
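
The layout in the table can be read with a few lines of Python. This is only a sketch: the function name parse_exif_header is made up for illustration, it looks at nothing beyond the first twelve bytes, and it assumes that all multi-byte fields are big-endian, as JPEG markers and segment lengths are.

```python
import struct


def parse_exif_header(data: bytes) -> dict:
    """Split the first 12 bytes of an Exif JPEG into the fields from the table.

    Hypothetical helper for illustration; it does no validation at all.
    """
    # Offsets 0x0000, 0x0002 and 0x0004: three big-endian 16-bit values.
    soi, app1_marker, app1_length = struct.unpack(">HHH", data[0:6])
    return {
        "soi": soi,                  # expected: 0xFFD8
        "app1_marker": app1_marker,  # expected: 0xFFE1
        "app1_length": app1_length,  # two bytes, so at most 0xFFFF
        "identifier": data[6:11],    # expected: b"Exif\x00"
        "pad": data[11],             # expected: 0x00
    }


# The first bytes of the corrupt file, as shown in the hex editor.
corrupt_start = bytes.fromhex("FF D8 FF E1 00 0E 45 78 69 66 00 00 49 49 2A 00")
header = parse_exif_header(corrupt_start)
for name, value in header.items():
    print(name, hex(value) if isinstance(value, int) else value)
```
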
FFD8 is an SOI marker, and the error message says that the file has two of them, which apparently is a bad thing. So I searched for another occurrence of FFD8 and found one at 0x442 = 1090. The error message also said that there were 1072 extraneous bytes before marker 0xd8, which is only slightly smaller than the area between the APP1 header and the next SOI marker. So, is the SOI marker here wrong?
A valid JPEG file
Since I don't have the slightest idea of what exactly is wrong here, I opened another JPEG that works and was taken only minutes before the corrupt one. Comparing them by fast-switching between the console tabs (exploiting low-level visual processing and attention guidance of the brain is fun), I noticed two things:
- FFD8 can be found at the same position in both files, so that is not the problem.
- The first difference is in the APP1 length. The difference is huge!
Corrupt file:
FF D8 FF E1 00 0E 45 78 69 66 00 00 49 49 2A 00 ...
Valid file:
FF D8 FF E1 E0 42 45 78 69 66 00 00 49 49 2A 00 ...
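
For those who don't trust their visual cortex, the same comparison can be sketched in Python. The two byte strings are the file starts shown above; the helper names first_difference and app1_length are invented for this example.

```python
import struct

corrupt = bytes.fromhex("FF D8 FF E1 00 0E 45 78 69 66 00 00 49 49 2A 00")
valid = bytes.fromhex("FF D8 FF E1 E0 42 45 78 69 66 00 00 49 49 2A 00")


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if there is none."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None


def app1_length(data: bytes) -> int:
    """Read the big-endian 16-bit APP1 length stored at offset 0x0004."""
    (length,) = struct.unpack(">H", data[4:6])
    return length


print(first_difference(corrupt, valid))  # 4, the first byte of the APP1 length
print(app1_length(corrupt))              # 14
print(app1_length(valid))                # 57410
```
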
The length of the APP1 segment in the corrupt file is only 0xE = 14? That should be far too small.
I then started to increase the length in the corrupted file randomly to see what error messages convert would give me, but that's more like being in a completely dark room with a metal bucket and throwing rocks until I hear that I've hit the bucket.
But let's see what is at the end of the APP1 segment in the valid file:
0xE042: FD CF FF D9 FF DB 00 84
At 0xE044, which is 0xE042 plus the two bytes of the SOI marker before the APP1 segment, it says FFD9, and the EXIF specification tells us that this is the EOI (End Of Image) marker. It is followed by FFDB, which is the DQT (Define Quantization Table) marker; see Table 40 of the specification. As far as I can tell, everything is where it should be.
Overflow
Now back in the corrupt file, I searched for FFD9FFDB and found it at 0x10010. Do you see it already?
Minus the two bytes for the SOI marker, the length of the APP1 segment should be 0x1000E, which unfortunately can't be stored in only two bytes. What CAN be stored in two bytes is the lower part, 0x000E, which we see as the length in the APP1 segment header. A classic example of an integer overflow, the first one I've observed in the wild!
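
The arithmetic can be replayed in Python. This is a sketch under assumptions: I don't know what the camera firmware actually does, so masking with 0xFFFF is just the simplest way to imitate a value being squeezed into two bytes. The last part is my interpretation of the first error message: a decoder that trusts the stored length resumes reading right after SOI, the APP1 marker and 14 bytes of "segment", and treats everything up to the next marker it finds as extraneous.

```python
import struct

true_length = 0x1000E                 # what the APP1 length should have been
stored_length = true_length & 0xFFFF  # what fits into the two length bytes
print(f"{stored_length:#06x}")        # 0x000e

# struct refuses to pack the real value into an unsigned 16-bit field.
try:
    struct.pack(">H", true_length)
except struct.error as exc:
    print("does not fit:", exc)

# Interpretation of "1072 extraneous bytes before marker 0xd8":
resume_offset = 2 + 2 + stored_length  # SOI + APP1 marker + declared length
second_soi = 0x442                     # offset of the second FFD8
extraneous = second_soi - resume_offset
print(extraneous)                      # 1072
```
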
The EXIF specification is clear:
"Interoperability, APP1 consists of the APP1 marker, Exif identifier code, and the attribute information itself. The size of APP1 including all these elements shall not exceed the 64 Kbytes specified in the JPEG standard."
Oops.
Solution
From my understanding, the APP1 segment contains the thumbnail at the end. I reckon that it can be recalculated and stored properly by most image processing applications, so let's try shortening the data there to get under 64 Kbytes. I removed 20 bytes directly before the FFD9FFDB, which yields a new APP1 segment length of 0x1000E - 0x14 = 0xFFFA, and stored this new length at 0x0004.
It seems like this works! The JPEG can now be opened again without any errors, not even regarding the thumbnail, which I've truncated and which is not so important to me.
This is the only time I've encountered this problem with pictures taken using my Samsung Galaxy S2, so this should be a one-time fix. If it happens again, I think I'll have to write a little script to do that for me.
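
Such a script could look roughly like the following Python sketch. It automates exactly the manual steps from above and nothing more. The function name repair_app1_overflow is made up, and the code rests on assumptions that held for my file but may not hold for others: the file starts with SOI directly followed by APP1, the first FFD9FFDB in the file marks the end of the APP1 segment, and cutting bytes from the end of the thumbnail data is acceptable. The default target length 0xFFFA is simply the value I ended up with by hand.

```python
import struct

SOI = b"\xff\xd8"
APP1 = b"\xff\xe1"
THUMBNAIL_END = b"\xff\xd9\xff\xdb"  # EOI directly followed by DQT


def repair_app1_overflow(data: bytes, new_length: int = 0xFFFA) -> bytes:
    """Shorten an oversized APP1 segment and rewrite its length field.

    Hypothetical helper; it truncates the tail of the thumbnail data.
    """
    if data[:4] != SOI + APP1:
        raise ValueError("file does not start with SOI followed by APP1")
    eoi = data.find(THUMBNAIL_END)
    if eoi < 0:
        raise ValueError("no FFD9FFDB sequence found")
    # The length counts from the length field (offset 4) up to the DQT
    # marker (offset eoi + 2), which is the same as "offset of FFD9 minus 2".
    true_length = eoi + 2 - 4
    if true_length <= 0xFFFF:
        return data  # the length fits into two bytes, nothing to repair
    cut = true_length - new_length  # number of bytes to drop before FFD9
    return (
        data[:4]
        + struct.pack(">H", new_length)  # new length at offset 0x0004
        + data[6 : eoi - cut]            # APP1 body without the cut bytes
        + data[eoi:]                     # FFD9, FFDB and the rest of the file
    )


# Possible usage (file names are placeholders):
# from pathlib import Path
# Path("fixed.jpg").write_bytes(repair_app1_overflow(Path("test.jpg").read_bytes()))
```

Files whose APP1 length already fits into two bytes are returned unchanged, so running the script over a whole folder of pictures should not touch the healthy ones. Keep a copy of the original anyway.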