llama : Metal inference by ggerganov · Pull Request #1642 · ggerganov/llama.cpp

Add full GPU inference of LLaMA on Apple Silicon using Metal.

The initial idea was proposed and explained here: https://github.com/ggerganov/llama.cpp/discussions/915

A basic PoC was demonstrated h...