Funny. I read this quote from the weak-to-strong paper:
We cannot directly study this problem today, but we can study a simple analogy: can small models supervise larger models (right)?
As: “We can use small models to supervise larger models. Right?”