“Are Emergent Abilities of Large Language Models a Mirage?”

—

A couple of thoughts on a fascinating 2023 paper, “Are Emergent Abilities of Large Language Models a Mirage?”:

Maybe Frank Harrell could feel a bit smug about this? From https://www.fharrell.com/post/class-damage/: “Estimating tendencies is usually a more appropriate goal than classification, and classification leads to the use of discontinuous accuracy scores which give rise to misleading results.”
Looking for a continuous variable into which a “yes/no” question can be embedded seems like a generally good idea:
- “How much better or worse is the treatment?” vs “Is the treatment better?”
- “How wrong is it?” vs “Is it morally wrong?” (though what would be the units?)
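
To make the point about discontinuous scores concrete, here is a toy sketch of my own, not code from the paper. It assumes an answer is `answer_length` tokens long, that each token is correct independently with the same probability, and that "exact match" means every token is right. The function names and the numbers are made up for illustration.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Chance that all tokens are right, assuming independent, equally likely errors."""
    return per_token_accuracy ** answer_length


def accuracy_table(per_token_accuracies, answer_length):
    """Pair each (continuous) per-token accuracy with its all-or-nothing score."""
    return [(p, exact_match_rate(p, answer_length)) for p in per_token_accuracies]


if __name__ == "__main__":
    # Hypothetical, steadily improving per-token accuracies for a 10-token answer.
    for p, em in accuracy_table([0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99], 10):
        print(f"per-token accuracy {p:.2f} -> exact match {em:.3f}")
```

Under these assumptions, a per-token accuracy of 0.5 gives an exact-match rate of about 0.1%, while 0.9 gives about 35%. The underlying quantity improves steadily, but the yes/no score sits near zero for a long time and then appears to take off.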
