Large Models ~ Small Code

Posted on Oct 17, 2024

When I describe running a model in production as “50 lines of code”, people smile or laugh outright. Look at the screenshot below. It is a full implementation of Llama 3.2 1B inference in under 100 lines (the screenshot shows 61, and there are a few additional lines above it).

Machine learning in production is not a large piece of software.

Llama 3.2 1B inference
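
The loop that turns a model's forward pass into generated text is smaller still. Here is a minimal sketch of greedy decoding, not the code from the screenshot: `greedy_generate`, `toy_next_logits`, and `VOCAB_SIZE` are hypothetical names, and the toy function stands in for a real forward pass, which would return one score per vocabulary entry.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Greedy autoregressive decoding: keep appending the highest-scoring token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)  # one forward pass over the sequence so far
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the end-of-sequence token is produced
            break
    return tokens


# Hypothetical stand-in for a real model: it always prefers (last token + 1) mod VOCAB_SIZE.
VOCAB_SIZE = 8


def toy_next_logits(tokens: Sequence[int]) -> List[float]:
    target = (tokens[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


print(greedy_generate(toy_next_logits, prompt=[2], max_new_tokens=10, eos_id=5))
# prints [2, 3, 4, 5]
```

Swapping `toy_next_logits` for a real model's forward pass, plus a tokenizer on either side, is roughly all that separates this sketch from a working text generator. It recomputes the whole sequence at every step; real implementations usually cache keys and values to avoid that.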

Source code