Do I understand LLMs?

GardenVarietyAnxiety@lemmy.world · 2 months ago

howrar@lemmy.ca · 2 months ago

mathematically “correct” sounding output

It’s hard to say because that’s a rather ambiguous way of describing it (“correct” could mean anything), but it is a valid way of describing how an LLM works.

“Correct” in the context of LLMs would be a token that is likely to follow the preceding sequence of tokens. At each step, the model computes a probability for every possible token, takes a random sample according to that distribution* to choose the next token, and repeats until some termination condition is met. The model learns that distribution through what we call maximum likelihood estimation (MLE) in machine learning (ML): training looks for the distribution that makes the training data as likely as possible. MLE is indeed the basis of a lot of ML, but not all.

*Oversimplification.
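
If it helps to see the generation loop written out, here is a rough Python sketch of "compute a probability for every token, sample one, repeat". Everything in it is made up for illustration: `VOCAB`, `toy_logits`, and `generate` are hypothetical names, and `toy_logits` is just a stand-in for the neural network a real LLM would use to score tokens.

```python
import math
import random

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "<eos>"]


def toy_logits(tokens):
    """Stand-in for the model: one raw score per vocabulary entry.

    A real LLM computes these scores with a neural network conditioned
    on `tokens`. Here "<eos>" simply gets more likely as the sequence
    grows, so the loop tends to stop on its own.
    """
    return [1.0, 0.5, 0.5, 0.3 * len(tokens)]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=20, rng=None):
    """Sample one token at a time until "<eos>" or the length limit."""
    rng = rng or random.Random()
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))  # probability for every token
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]  # random sample
        if next_token == "<eos>":  # termination condition
            break
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(generate(["the"], rng=random.Random(0)))
```

Note that this only covers generation. The MLE part happens earlier, during training, when the model's parameters are adjusted so that the tokens actually seen in the training data get high probability.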