A question that has emerged over recent years is whether audiovisual (AV) speech perception is a special case of multi-sensory perception. Electrophysiological (ERP) studies have found that auditory neural activity (N1 component of the ERP) induced by speech is suppressed and speeded up when a speech sound is accompanied by concordant lip movements. In Experiment 1, we show that this AV interaction is not speech-specific. Ecologically valid nonspeech AV events (actions performed by an actor such as handclapping) were associated with a similar speeding-up and suppression of auditory N1 amplitude as AV speech (syllables). Experiment 2 demonstrated that these AV interactions were not influenced by whether A and V were congruent or incongruent. In Experiment 3 we show that the AV interaction on N1 was absent when there was no anticipatory visual motion, indicating that the AV interaction only occurred when visual anticipatory motion preceded the sound. These results demonstrate that the visually induced speeding-up and suppression of auditory N1 amplitude reflect multisensory integrative mechanisms of AV events that crucially depend on whether vision predicts when the sound occurs.