Rapidly alternating (ABAB...) tones are usually perceived, at least initially, as a single "trill"-like sound, but after a while the single auditory stream may appear to break into two, with either the A-A- or the -B-B sequence dominating the percept and the other tone sequence becoming a "background" sound. The wider the frequency separation, the sooner the sequence generally breaks up into two streams. Here are two examples: 6-second-long ABAB sequences with frequency separations of either 1 semitone (heard by most listeners as one stream throughout) or 10 semitones (heard by most as two streams after only a second or so).
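To make the frequency separations concrete: a separation of n semitones corresponds to a frequency ratio of 2^(n/12), so 1 semitone is a ratio of about 1.06 and 10 semitones a ratio of about 1.78. The sketch below computes the B tone frequency and lays out an alternating ABAB onset schedule. The function names, the 500 Hz A tone and the 100 ms onset interval are illustrative assumptions, not values taken from the examples on this page.

```python
def semitones_to_ratio(delta_semitones):
    """Frequency ratio for a separation given in (equal-tempered) semitones."""
    return 2.0 ** (delta_semitones / 12.0)


def abab_schedule(f_a_hz, delta_semitones, duration_s, onset_interval_s):
    """Return a list of (onset_time_s, frequency_hz) pairs for an ABAB... sequence.

    Assumption: tones alternate strictly, A first, with a fixed onset interval.
    """
    f_b_hz = f_a_hz * semitones_to_ratio(delta_semitones)
    n_tones = int(round(duration_s / onset_interval_s))
    schedule = []
    for i in range(n_tones):
        # Even positions are A tones, odd positions are the higher B tones.
        freq = f_a_hz if i % 2 == 0 else f_b_hz
        schedule.append((i * onset_interval_s, freq))
    return schedule


# Hypothetical parameters: 500 Hz A tone, one tone onset every 100 ms, 6 s long.
narrow = abab_schedule(500.0, 1, 6.0, 0.1)   # 1 semitone separation
wide = abab_schedule(500.0, 10, 6.0, 0.1)    # 10 semitone separation
```

Turning such a schedule into sound would additionally require synthesizing a short tone burst at each onset, which is omitted here.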

Wider frequency separation is not the only factor that increases the likelihood that two streams are perceived instead of one. Another factor that seems to play a role is temporal irregularity. Here you can try 3-second-long sequences of varying frequency separation, and you can also introduce varying degrees of temporal irregularity ("jitter") into the higher-frequency tone sequence.

Delta F (semitones):
RMS Jitter (ms):

The first five jitter values on offer replicate the values used in the experiment by Rajendran et al., in which B tones were expected once every 200 ms and the temporal jitter of each B tone was drawn from a uniform distribution centered on its "expected" time. The Rajendran experiment still imposed relatively tight limitations on the timing of the B tones (for example, there always had to be exactly one B tone between any two A tones). As an example of "extreme jitter" we also include a 200 ms jitter option in which the intervals between B tones are drawn from an exponential, rather than a uniform, distribution. Among all distributions of positive intervals with a given mean, the exponential distribution has maximum entropy, and it is memoryless: the time already elapsed since the last B tone says nothing about when the next one will occur. Exponentially distributed intervals therefore represent "maximal possible uncertainty" about the timing of the B tones, and lie at the opposite end of the spectrum from a zero-jitter, precisely clock-like rhythm.
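The two kinds of jitter described above can be sketched as follows. This is a minimal illustration, not the code used on this page or by Rajendran et al. The function names are hypothetical, and the sketch assumes that A tones fall on multiples of the 200 ms period with each B tone expected midway between them. It also assumes that a quoted RMS jitter refers to the standard deviation of the uniform distribution, in which case the half-width of that distribution is the RMS value times the square root of 3.

```python
import math
import random


def uniform_jitter_onsets(n_tones, period_s, rms_jitter_s, rng):
    """B tone onsets jittered uniformly around their expected times.

    Assumption: expected times lie midway between A tones, at (k + 0.5) * period.
    A uniform distribution on [-a, a] has standard deviation a / sqrt(3),
    so the half-width is a = rms * sqrt(3).
    """
    half_width = rms_jitter_s * math.sqrt(3.0)
    return [(k + 0.5) * period_s + rng.uniform(-half_width, half_width)
            for k in range(n_tones)]


def exponential_onsets(n_tones, mean_interval_s, rng):
    """B tone onsets whose successive intervals are exponentially distributed."""
    t = 0.0
    onsets = []
    for _ in range(n_tones):
        # expovariate takes the rate (1 / mean interval) as its argument.
        t += rng.expovariate(1.0 / mean_interval_s)
        onsets.append(t)
    return onsets


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clock_like = uniform_jitter_onsets(15, 0.2, 0.0, rng)   # zero jitter
mild = uniform_jitter_onsets(15, 0.2, 0.02, rng)        # 20 ms RMS jitter (illustrative value)
extreme = exponential_onsets(15, 0.2, rng)              # exponential intervals, mean 200 ms
```

Note that the uniform version keeps every B tone within a bounded window around its expected time, whereas the exponential version places no such bound on the intervals, so B tones can cluster or leave long gaps.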