Introduction
On 19 November 2024, I downloaded the film In good company1 from the site of a national European TV station. I watched it a couple of weeks later, late at night, and was very surprised to see, in a flash (maybe a second long), the image of a Palestinian girl in the middle of the film. She was a schoolgirl, about 8 years old, with an intense look. I wasn’t quite sure whether I had been dozing or awake, whether the girl’s image was real or somehow invented by my brain based on the current news about the Israeli operation2 in Gaza, or even possibly found by my media player vlc in a remote corner of my hard disk3. I made a mental note of the time of the image instead of writing it down, with the result that I could not find it again.
As I have now discovered, the image was real (Figure 2 and Video 1). It lasted for 14 frames, which is 0.56 seconds at 25 fps4. This probably does not qualify as subliminal, as the word means “below perception”: I saw the image clearly enough to see that it represented a Palestinian girl, but I do not remember seeing any text, let alone reading it. The “text part” below the image included a QR code, the logo and the name of a Charity, as well as a link that can be used to make a contribution to fund the Charity’s operations in Gaza5.
Remember that I downloaded the film, which was free to stream until 19 November. I don’t know when the 14 frames were added. Whoever streamed the film may have seen the flashed image of the girl, but it seems unlikely that they could have read the text, unless, maybe, the blue logo of the Charity is well known. I would probably have recognized a quasi-subliminal red cross or red crescent, but I had never seen the Charity’s logo before. I exchanged emails with the Charity and the TV station. The latter confirmed that mid-film advertisements (commercials) did indeed include the image, which had thus been imperfectly removed from the version of the film that remained online6.
Some words about image formats and sizes
The duration of the film In good company is one hour and 45 minutes (6300 seconds), i.e. 157500 frames (images) at 25 fps. The frames are rectangular images (tables of pixels) of 1280×720 pixels, each pixel coded as 3 bytes, one per colour (Red, Green, Blue). When coded as a bitmap (BMP, the rectangular table just mentioned), each frame is a file of 2764854 bytes (2.8 MB): 1280 pixels across × 720 pixels down × 3 colours per pixel, plus 54 bytes of header. Decomposing the whole film into 157500 images of 2.8 MB each would occupy about 435 gigabytes.
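The 2764854-byte figure can be checked with a short Python sketch. It assumes the standard uncompressed 24-bit BMP layout: a 14-byte file header plus a 40-byte info header (54 bytes in all), with each pixel row padded to a multiple of 4 bytes. The function name bmp_size is hypothetical.

```python
def bmp_size(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of an uncompressed BMP file (assumes a 54-byte header)."""
    header = 14 + 40                      # file header + BITMAPINFOHEADER
    row = width * bytes_per_pixel
    padded_row = (row + 3) // 4 * 4       # rows are padded to a multiple of 4 bytes
    return header + padded_row * height

FPS = 25
frames = (1 * 3600 + 45 * 60) * FPS       # 1 h 45 min at 25 fps
frame_bytes = bmp_size(1280, 720)
print(f"{frames} frames x {frame_bytes} bytes = {frames * frame_bytes / 1e9:.0f} GB")
# 157500 frames x 2764854 bytes = 435 GB
```

A 1280-pixel row is 3840 bytes, already a multiple of 4, so no padding is added for this film's frames.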
Someone may suggest using compressed images such as jpg or webp instead. Compression takes advantage of regularities in the images to reduce their size. For instance, instead of storing 1000 black pixels (3000 bytes at 3 bytes per pixel7), the coding scheme can say “next come 1000 identical black pixels”, which occupies 37 bytes (the 37 characters from the n of next to the s of pixels). As a result of compression, however, successive images are no longer comparable on a pixel-by-pixel basis8.
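The “next come 1000 identical black pixels” idea is known as run-length encoding. Below is a minimal Python sketch that stores each run as a (count, pixel) pair. It only illustrates the principle; real formats such as jpg and webp use considerably more elaborate schemes.

```python
from itertools import groupby

Pixel = tuple[int, int, int]  # (Red, Green, Blue), each 0-255

def rle_encode(pixels: list[Pixel]) -> list[tuple[int, Pixel]]:
    """Collapse runs of identical pixels into (count, pixel) pairs."""
    return [(len(list(run)), pixel) for pixel, run in groupby(pixels)]

def rle_decode(runs: list[tuple[int, Pixel]]) -> list[Pixel]:
    """Expand (count, pixel) pairs back into the original pixel list."""
    return [pixel for count, pixel in runs for _ in range(count)]

black = (0, 0, 0)
row = [black] * 1000 + [(255, 255, 255)] * 3
runs = rle_encode(row)
print(runs)  # [(1000, (0, 0, 0)), (3, (255, 255, 255))]
assert rle_decode(runs) == row
```

Because one run can stand for any number of pixels, the position of a given pixel in the encoded data depends on everything before it. This is why compressed frames cannot simply be compared value by value.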
The principle of the method
It is impossible to visually scan 157500 frames one by one. But a film is made of a succession of scenes9 during which the images change little by little, so successive images are usually similar. The basic idea is to extract the first image of each scene: there are only 500 or so scenes in a film, and 500 images can easily be examined one by one. Two images can be compared by extracting the same pixel values (i.e. colour values) from both and measuring how much they differ. Of course, this is possible only if the images are comparable pixel by pixel, i.e. uncompressed. I eventually found that it is not necessary to compare all 2,764,800 pixel values of each image (921,600 pixels × 3 colours): it is sufficient to randomly select 10,000 pixel values (always the same ones, e.g. the Blue value of the top left pixel, the Red value at row 5, column 123, etc.) and correlate them using the Pearson correlation coefficient r. The statistic r is close to 1 when the 10,000 sampled values10 are similar in two images and close to 0 when there is no relation between the two images. Two images are considered different if r<0.05; this threshold11 was chosen arbitrarily.
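For reference, here is the standard formula for the Pearson coefficient between two frames A and B, computed over the same n sampled pixel values (n = 10,000 here). The symbols are introduced only for this illustration: a_i and b_i are the i-th sampled value in each frame, and \bar a, \bar b are their means.

```latex
r = \frac{\sum_{i=1}^{n} (a_i - \bar a)(b_i - \bar b)}
         {\sqrt{\sum_{i=1}^{n} (a_i - \bar a)^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar b)^2}},
\qquad \bar a = \frac{1}{n}\sum_{i=1}^{n} a_i,\quad \bar b = \frac{1}{n}\sum_{i=1}^{n} b_i
```

Note that r is undefined when one of the frames is uniform (for example entirely black), since one of the square roots in the denominator is then zero.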
As mentioned above, only uncompressed images can be used. Unfortunately, uncompressed 24-bit BMP images (i.e. three 8-bit colour values per pixel) are huge. As a result, the film has to be decomposed into batches of BMP images of manageable size. I adopted a two-minute time step, which corresponds to 3000 images12 (120 seconds × 25 fps). Each image is compared with the previous one using the r statistic. If r<0.05, the image itself is copied into a separate directory13, and its average pixel value, standard deviation, r and the image name are written to a file. The 3000 files are then deleted and the next two minutes are examined.
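The scanning loop can be sketched as follows. This is a minimal Python illustration, not the author's FreeBASIC programme. It assumes the frames have already been decoded into flat sequences of 8-bit R, G, B values (for instance read from the BMP files). The names scan_frames, pearson and THRESHOLD are hypothetical, and so is the use of the population standard deviation, since the text does not say which variant was used.

```python
import math
import random
import statistics

THRESHOLD = 0.05  # frames with r below this start a new "scene"

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # e.g. an all-black frame: r is undefined
    return sxy / math.sqrt(sxx * syy)

def scan_frames(frames, k=10_000, seed=1):
    """Return (frame index, mean, std dev, r) for every frame with r < THRESHOLD.

    The same k randomly chosen pixel values are sampled in every frame.
    `frames` may be a generator, so a whole chunk need not sit in memory.
    """
    frames = iter(frames)
    first = next(frames)
    idx = random.Random(seed).sample(range(len(first)), k)
    prev = [first[i] for i in idx]
    hits = []
    for n, frame in enumerate(frames, start=1):
        vals = [frame[i] for i in idx]
        r = pearson(prev, vals)
        if r < THRESHOLD:  # NaN compares False, so uniform frames are not flagged
            hits.append((n, statistics.fmean(vals), statistics.pstdev(vals), r))
        prev = vals
    return hits
```

The fixed seed is what keeps the sampled pixels identical from frame to frame and from chunk to chunk, as the method requires.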
Most processing of the film was done with ffmpeg. I am very much indebted to Eric Brasseur14 for his competent and patient assistance with ffmpeg, which, given my level of ignorance of video processing, ranks somewhere between babysitting and spoon feeding. ffmpeg was run under Linux Mint 22 “Wilma” (Xfce edition) on a LENOVO 82EY system (IdeaPad Gaming 3 15ARH05). Most data handling, including the ffmpeg calls, was done with an ad hoc FreeBASIC 1.10.1 programme. The complete list of ffmpeg commands used is given at the end of this post, before the notes.
Results
The ratio of frames with r<0.05 to the total number of frames is 0.002537, i.e. about 400 images out of the 157500 frames of In good company. In other words, there are about 400 sequences of similar images (scenes), each starting with a “different” first image (frame). These 400 first images, viewed as thumbnails, can easily and rapidly be inspected visually, and this is how the Palestinian girl was located (Figure 2 and Figure 3).
Figure 4 shows the average value and the standard deviation of the 10000 randomly sampled pixel values for all the frames between minutes 32 and 62 of In good company. Figure 5 shows the coefficient of correlation between each frame and the previous one. Average and standard deviation are easy to compute but are not of much help in identifying the interpolated static image of the Palestinian girl at 45:23. Low averages mostly point to dark images, as black is obtained with low R, G and B pixel values.
The standard deviation is potentially more useful than the average, in that it remains constant when images do not change. This suggests an alternative approach: scan the 157500 images to identify sequences of constant standard deviation. Beyond that, a low standard deviation indicates “homogeneous” images. The sequence around minute 34 (high standard deviation, fluctuating average) shows a conversation between two people in an office, one of them in front of the window in daytime, with the camera switching between the two at regular intervals to convey the animated nature of the discussion. At minute 48, there is a discussion in a bar at night, with two people sitting at a round table: the dim light changes little and so do the colours, so both standard deviation and average are “quiet”.
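The alternative mentioned above, looking for runs of constant standard deviation, could be sketched in Python as follows. The helper constant_runs is hypothetical and was not used in this post. The tolerance and minimum run length are assumptions that would need tuning, and the standard deviations in the example are made-up values.

```python
def constant_runs(values, min_len=5, tol=1e-9):
    """Return (start, length) of every run of at least min_len consecutive
    values that differ from the run's first value by no more than tol."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # a run ends at the end of the list or when the value changes
        if i == len(values) or abs(values[i] - values[start]) > tol:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = i
    return runs

stds = [40.1, 38.7, 52.3, 52.3, 52.3, 52.3, 52.3, 21.0]  # made-up values
print(constant_runs(stds))  # [(2, 5)]
```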
Figures 5, 6 and 7 all show the coefficient of correlation, at different time scales: 30 minutes, about one and a half minutes, and just one second respectively. Figure 5 shows an accumulation of high values just under 1, corresponding to the film’s main scenes, which are characterised by the presence of the same people in the same environment. Each drop marks the beginning of a new scene, but few drops reach the threshold of 0.05.
A close look at Figure 6 shows a short period of constant r at 1 (i.e. identical frames). Such a plateau is hard to spot in a graph, but it can easily be located in a table or with an ad hoc programme.
Finally, Figure 7 covers just 1 second. The first image of the Palestinian girl occurs at second 23.28 of minute 45. r drops to just under 0 (r=-0.01058), then stays at 1 (images perfectly identical) for the rest of the 14-frame still. A night scene follows (see Video 1 or Figure 2) with little contrast between pixels, which results in a drop of the standard deviation. The slight drop in r at second 23.80 can only be explained if the last image of the Palestinian girl differs slightly from the previous ones. As I added the logo and the text myself, this is probably my own doing (it also shows in the average and the standard deviation). Note that at 23.84 seconds, the first frame after the Palestinian girl sequence, r drops only to 0.096351, which is above the 0.05 threshold, so this frame is not registered as the start of a new sequence.
Conclusions
Still images interpolated in a film can be spotted by looking at the sequence of coefficients of correlation between successive images. The interpolated frames are characterised by a low r value for the first interpolated frame, followed by r constant at 1 while the still frame is displayed. The method also suggests that a subliminal or quasi-subliminal frame or series of frames would be difficult to spot if it combined a fixed part, such as a message or a very recognizable logo, with a variable or random part, or possibly if the logo faded in and out.
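The signature described above, a drop of r below the threshold followed by r at 1 for as long as the still is shown, can be searched for automatically. Below is a minimal Python sketch. It assumes the per-frame r values are available as a list in which r[n] compares frame n with frame n-1. The function name find_stills, the minimum length and the tolerance are hypothetical, and the r values in the example are made up.

```python
def find_stills(r, threshold=0.05, min_frames=5, tol=1e-6):
    """Yield (first frame, number of frames) for each candidate inserted still.

    r[n] is the correlation between frame n and frame n-1 (r[0] is unused).
    A still starts with r < threshold and continues while r stays at 1.
    """
    n = 1
    while n < len(r):
        if r[n] < threshold:
            end = n + 1
            while end < len(r) and abs(r[end] - 1.0) <= tol:
                end += 1
            if end - n >= min_frames:
                yield n, end - n
            n = end
        else:
            n += 1

FPS = 25
r = [float("nan"), 0.98, 0.97, -0.01, 1, 1, 1, 1, 1, 0.9, 0.98]  # made-up values
for first, length in find_stills(r):
    print(f"still at {first / FPS:.2f} s, {length} frames")  # still at 0.12 s, 6 frames
```

A final still frame that differs slightly from the others, as observed at second 23.80, ends the run one frame early under a strict tolerance. Loosening tol trades that off against false hits on very static scenes.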
ffmpeg commands used
- Convert film IGC249.mkv framerate from 24.975 fps to 25.000 fps: ffmpeg -r 25 -i IGC249.mkv -c:v libx264 -preset slow -tune fastdecode -crf 15 -c:a copy IGC.mkv
- Remove sound from IGC.mkv: ffmpeg -i IGC.mkv -map v -c copy IGC_no-sound.mkv
- Insert Wergosum logo logo_232x52.png at pixel (1000,40) over the original TV station logo: ffmpeg -i IGC.mkv -i logo_232x52.png -filter_complex "overlay=x=1000:y=40" IGC_wergologo.mkv
- Extract from IGC.mkv the 3000 BMP images corresponding to the frames from second 1000 to 1120 (2 minutes): ffmpeg -ss 1000 -t 120 -i IGC.mkv out%d.BMP
- Insert frame work_humcris.BMP into video wrk25v.mkv at time 7.52 (seconds), replacing the existing frame, and create wrk25w.mkv: ffmpeg -i wrk25v.mkv -i work_humcris.BMP -filter_complex "[1]setpts=07.52/TB[im];[0][im]overlay=eof_action=pass" -c:a copy wrk25w.mkv
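The chunk-by-chunk extraction described in the method can be driven from a script. The post used FreeBASIC. The Python sketch below only builds the argument lists for the extraction command above, one per two-minute chunk, and converts an output file number back into film time. The names are hypothetical. The commands would still have to be run (e.g. with subprocess.run), and the BMP files deleted after each chunk.

```python
FPS = 25
CHUNK_S = 120      # two-minute step: 3000 frames per chunk
FILM_S = 6300      # 1 h 45 min

def chunk_commands(film="IGC.mkv", film_s=FILM_S):
    """Yield (start second, ffmpeg argument list) for each two-minute chunk."""
    for start in range(0, film_s, CHUNK_S):
        yield start, ["ffmpeg", "-ss", str(start), "-t", str(CHUNK_S),
                      "-i", film, "out%d.BMP"]

def film_time(chunk_start_s, file_number):
    """Film time (mm:ss.ss) of outN.BMP; ffmpeg numbers the files from 1 by default."""
    t = chunk_start_s + (file_number - 1) / FPS
    minutes, seconds = divmod(t, 60)
    return f"{int(minutes):02d}:{seconds:05.2f}"

cmds = list(chunk_commands())
print(len(cmds))               # 53 chunks (the last one is only 60 s long)
print(film_time(2640, 2083))   # 45:23.28
```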
Notes
- I downloaded the film using yt-dlp for Linux. yt-dlp is available from SourceForge for Windows, Mac and Linux. It is a fork of youtube-dl. ↩︎
- I am horrified by the brutality of the Israeli “operation”. I cannot understand how a highly cultured Jewish society, better known for its philosophical, artistic and scientific achievements, can have produced people who label their opponents “animals”. Retaliation and even vengeance are one thing. But remaining humane (i.e. human) should never be given up. Not even by military personnel. ↩︎
- I am told that these things do happen! ↩︎
- FPS, frames (= images) per second. ↩︎
- Which I did! ↩︎
- It cannot be ascertained whether the forgotten 14 frames result from someone’s intentional action. But it cannot be excluded either. ↩︎
- Black is no colour, i.e. Red=0, Green=0 and Blue=0. ↩︎
- There are many types of image formats, including some historical ones everyone is familiar with (jpeg, gif, tif) and more modern ones (webp, avif) developed for web applications. There are also different types of compression, some lossy and others lossless. The degree of compression can often be chosen by the user: a jpg or webp image compressed 10% loses much less detail than the same image compressed 80%. Lossy formats occupy little hard disk and memory space, but are degraded compared with the uncompressed version of the image. In other words, when a BMP image is compressed to jpg, it loses detail and the original full-resolution image cannot be reconstructed.
Here is a quick comparison of frames extracted from In good company for the 250 images (10 seconds) that start at second 2400 (minute 40). The sample was selected arbitrarily. BMP: all images are 2,764,854 bytes, for a total of 691.2 MB; jpg: images are between 15,376 and 63,637 bytes, for a total of 4.8 MB (some jpg images are significantly larger, up to 1 MB, in other parts of the film); png: images are between 577,395 and 1,160,225 bytes, for a total of 214.7 MB; tif: images are between 1,267,536 and 1,361,712 bytes, for a total of 329.9 MB. ↩︎
- A scene (in theater or film) is “a single piece of action, a part of a play during which there is no change in time or place”. Refer to Longman Dictionary of Contemporary English. ↩︎
- Other values were tested, from 5000 to 100000; they yield basically the same results. 10000 was eventually selected as not too low but reasonably representative of the whole image. Pixels were selected randomly, but could also have been taken from points forming a regularly spaced grid. The same random set of pixels was used for all frames. ↩︎
- r can vary from -1 (perfect negative correlation) to +1 (perfect positive correlation). In practice, values are mostly close to 1 for relatively similar frames, but drop close to 0 when there is a change in the narrative. For example, two people sit in a kitchen and talk about the weather; successive images are similar for 5 seconds. Then the cat jumps into the aquarium, and the scene changes completely. The first image of the aquarium is uncorrelated with the previous one. There is a drop in r every time the action changes. ↩︎
- This is 3000 images of about 2.8 MB each, i.e. about 8.3 GB. The processing of the last images is about 10 times slower than that of the first ones. ↩︎
- The image is renamed to include the time at which it appears in the film, as illustrated in Figure 3. ↩︎
- Eric’s vast image processing experience was mostly acquired preparing and publishing YouTube videos captured with his ultra-light home-made mini drones. Additional technical details about Eric and his drones can be found on his homepage, especially under the heading “quadcopters”. ↩︎