llama : Metal inference by ggerganov · Pull Request #1642 · ggerganov/llama.cpp
Add full GPU inference of LLaMA on Apple Silicon using Metal
The initial idea was proposed and explained in discussion #915: https://github.com/ggerganov/llama.cpp/discussions/915
A basic PoC was demonstrated h...