LET'S DEFINE THE PIECES
Everything you need to know falls into one of three categories:
1. MODELS (LLMs)
- These are the brains.
- They are NOT software.
- They are NOT programs.
- They are NOT plugins.
- They are neural network weights stored in one file (large models are sometimes split into a few shard files), usually named something like:
model.gguf
model.safetensors
model.bin
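The extension is only a hint; the first bytes of the file are more reliable. Below is a minimal sketch, assuming only two publicly documented layouts: GGUF files start with the ASCII magic `GGUF`, and safetensors files start with an 8-byte little-endian header length followed by a JSON header. The function name `sniff_model_format` and the size cap are illustrative choices, not part of any tool mentioned here.

```python
import json
import struct


def sniff_model_format(path):
    """Guess a model file's format from its leading bytes, not its extension."""
    with open(path, "rb") as f:
        head = f.read(8)

        # GGUF files begin with the 4-byte ASCII magic "GGUF".
        if head[:4] == b"GGUF":
            return "gguf"

        # safetensors files begin with an unsigned 64-bit little-endian
        # integer giving the length of the JSON header that follows.
        if len(head) == 8:
            (header_len,) = struct.unpack("<Q", head)
            # Safety cap (an assumption of this sketch) so that a random
            # binary file cannot trigger a huge read.
            if 0 < header_len <= 100_000_000:
                raw = f.read(header_len)
                if len(raw) == header_len:
                    try:
                        if isinstance(json.loads(raw), dict):
                            return "safetensors"
                    except ValueError:
                        # Not valid UTF-8 or not valid JSON.
                        pass

    # Generic .bin checkpoints and everything else end up here.
    return "unknown"
```

Calling `sniff_model_format("model.gguf")` on a real GGUF file should return `"gguf"`; a plain `.bin` checkpoint has no single agreed layout, so this sketch reports it as `"unknown"`.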
Examples of models:
LLaMA-3 8B
LLaMA-3 70B
Mistral 7B
Mixtral 8x7B
Qwen 2 7B
Phi-3
A model file contains:
Weights (billions of learned numbers, loosely the "neurons" and "synapses")
Every learned pattern
All the intelligence
A model does NOT:
- Have a UI
- Open PDFs
- Connect to NASA
- Run indexing
- Provide a chat window
It only takes text in → text out.
2. MODEL RUNTIMES (ENGINES)
These are the programs that LOAD and RUN the model's brain.
Think of a “runtime” as the machine that runs a model file.
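The split between model and runtime can be pictured in a few lines of Python. This is a toy sketch, not how any real engine is written: `load_toy_model` and `ToyRuntime` are hypothetical names, and the "model" just upper-cases its input instead of doing inference. The point is only that the model is a text-in, text-out function, while loading it and keeping chat state belong to the runtime.

```python
from typing import Callable

# In this sketch a "model" is nothing more than a function from text to text.
Model = Callable[[str], str]


def load_toy_model(path: str) -> Model:
    """Hypothetical loader; a real runtime would read the weights at `path`."""

    def model(prompt: str) -> str:
        # Stand-in for inference: echo the prompt in upper case.
        return prompt.upper()

    return model


class ToyRuntime:
    """Hypothetical runtime: loads the model and owns everything around it."""

    def __init__(self, model_path: str) -> None:
        self.model = load_toy_model(model_path)
        # Conversation history lives in the runtime, not in the model.
        self.history = []

    def chat(self, user_text: str) -> str:
        self.history.append(user_text)
        reply = self.model(user_text)
        self.history.append(reply)
        return reply
```

With this sketch, `ToyRuntime("model.gguf").chat("hello")` returns `"HELLO"`, and the runtime object is the only place that remembers the exchange.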
Runtimes include:
✔ Ollama
- Terminal-based
- Local API
- Does not fine-tune, but can customize a model through a Modelfile (system prompt, parameters, adapters)
- Good for automation
- Good for pipelines
- Very flexible
- Acts like a backend server
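The "Local API" and "backend server" points can be sketched as a small Python client. This assumes an Ollama server is already running on its default address (`http://localhost:11434`) and that a model has been pulled; the model name `llama3` in the usage note and the helper names `build_request` and `generate` are illustrative assumptions. Only the standard library is used.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build the HTTP POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read())
    # With "stream": False the reply is one JSON object whose "response"
    # field holds the full generated text.
    return body["response"]
```

With the server running, `generate("llama3", "Why is the sky blue?")` would return the model's reply as a string, which is what makes Ollama convenient for scripts and pipelines.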
✔ LM Studio
- GUI desktop app
- Easy model downloading
- Drag-and-drop PDFs
- File chat
- Rudimentary RAG
- Easy to test many models
- Great for tinkering
✔ GPT4All
- GUI
- Also a runtime
- Similar to LM Studio
- Not as modern
✔ koboldcpp
- Runtime specialized for story-writing/roleplay
- GUI
- Inference only (no built-in fine-tuning tools)
----