I’ll write more about this, but speaker separation (or differentiation) is part of the bigger problem of identifying who’s speaking. It’s also known as the cocktail party problem: people can tune out the other voices in a room, but machines have a hard time doing it. The usual strategies, such as DSP-based beamforming or blind source separation, are resource intensive.
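To give a feel for what beamforming involves, here is a toy delay-and-sum sketch. It is only an illustration of the general idea, not the method from the paper. The function names, the four-microphone setup, and the delays are all made up for the example, and it assumes the delays are known in advance, are whole numbers of samples, and that wrapping the signal around at the edges is acceptable.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Undo each channel's (assumed known) integer sample delay, then average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            # Circular shift keeps the toy example simple.
            out[i] += ch[(i + d) % n]
    return [v / len(channels) for v in out]


def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB, given the clean reference signal."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((x - c) ** 2 for c, x in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


random.seed(0)
n = 8000
sample_rate = 16000  # Hz, hypothetical
target = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]

# Hypothetical arrival delays (in samples) of the target voice at four mics.
delays = [0, 3, 6, 9]

# Each mic hears the delayed target plus its own independent noise.
mics = [
    [target[(i - d) % n] + random.gauss(0, 1.0) for i in range(n)]
    for d in delays
]

enhanced = delay_and_sum(mics, delays)
snr_single = snr_db(target, mics[0])
snr_beam = snr_db(target, enhanced)
print(f"one mic: {snr_single:.1f} dB, beamformed: {snr_beam:.1f} dB")
```

Averaging the aligned channels keeps the target intact while the independent noise partly cancels, so the beamformed signal is cleaner than any single microphone. The cost is that every microphone stream has to be processed for every output sample, which hints at why this gets expensive on small devices.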
In this paper, the team came up with a very low-power method of figuring out who’s who from mixed audio.