While trying to match an audio track to a video track, I noticed an interesting difficulty in telling when sound and sight are out of sync. I used to use a program that displayed a waveform view of the audio, and I never had any problem spotting mismatched timing and aligning the sound with the image.
Recently I have been using software that does not display a waveform of the sound. I find that I can tell when the sound and image are not aligned (for instance, a mouth opening out of time with the speech, or an object hitting the ground out of time with the crash), but I seem unable to tell whether the sound or the image comes first, or how far apart they are. I have to progressively add or subtract delay, a second or more at a time, until the mismatch is obviously worse, and then back off until I find an acceptable match.
(The problem I have is with video digitized from a VHS cassette, where the audio and the video register different amounts of time for skipped frames and unrecorded segments of tape.)
I guess that this is a sensory integration problem. Does anyone else have this problem, or is it very common?