This is an old revision of the document!

LET'S DEFINE THE PIECES

Everything you need to know falls into one of three categories:

1. MODELS (LLMs)

These are the brains.
They are NOT software.
They are NOT programs.
They are NOT plugins.
They are neural networks stored as weight files (one file, or several shards for large models), usually:

model.gguf
model.safetensors
model.bin
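
To make the "data, not a program" point concrete, here is a minimal sketch that reads the fixed-size header at the start of a .gguf file. It assumes GGUF version 2 or later stored little-endian (version 1 used 32-bit counts), and the function name read_gguf_header is just an illustrative choice. All it finds is a magic marker and a few counters, followed by metadata and tensors: nothing executable.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes GGUF v2+ in little-endian byte order.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 + two uint64 values
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

Calling read_gguf_header("model.gguf") on a real download would report how many tensors and metadata entries the file holds; the rest of the file is those entries and the raw weights.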

Examples of models:
Llama 3 8B
Llama 3 70B
Mistral 7B
Mixtral 8x7B
Qwen2 7B
Phi-3

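A model file contains:
Weights (billions of learned numbers, the "neurons" and "synapses")
Every learned pattern, encoded in those weights
All the intelligence
Often some metadata, such as tokenizer and architecture info (GGUF bundles these)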

A model does NOT:

Have a UI
Open PDFs
Connect to NASA
Run indexing
Provide a chat window

It only takes text in → text out.

2. MODEL RUNTIMES (ENGINES)

These are the programs that LOAD and RUN the model's brain.
Think of a “runtime” as the machine that runs a model file.
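
The split between model and runtime can be sketched with a toy example. Everything here is hypothetical and vastly simplified: the "model" is a tiny JSON table of next-word scores standing in for real weights, and ToyRuntime stands in for a real engine. The point is only that the file is passive data and the runtime is the code that loads it and loops over it.

```python
import json

# A toy "model": nothing but learned numbers (here, next-word scores).
TOY_WEIGHTS = {
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 1.0},
    "sat": {"down": 1.0},
}

def save_model(path, weights):
    """Write the weights to disk; the file holds data, not instructions."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(weights, f)

class ToyRuntime:
    """The 'engine': loads a weights file and runs the generation loop."""

    def __init__(self, model_path):
        with open(model_path, encoding="utf-8") as f:
            self.weights = json.load(f)

    def generate(self, prompt, max_new_words=5):
        words = prompt.split()
        if not words:
            return ""
        for _ in range(max_new_words):
            choices = self.weights.get(words[-1])
            if not choices:
                break  # nothing learned for this word, so stop
            # Greedy decoding: always take the highest-scoring next word.
            words.append(max(choices, key=choices.get))
        return " ".join(words)
```

After save_model("toy.json", TOY_WEIGHTS), the call ToyRuntime("toy.json").generate("the") walks the table word by word. Swapping in a different weights file changes the output without touching the runtime code, which is the same relationship a real runtime has with a .gguf file.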

Runtimes include:
✔ Ollama

Terminal-based
Local API
Can customize models with a Modelfile (it does not train or fine-tune them)
Good for automation
Good for pipelines
Very flexible
Acts like a backend server
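
As a rough illustration of the "local API" and "backend server" points, the sketch below sends a prompt to Ollama over HTTP using only the Python standard library. It assumes Ollama is running on its default address (localhost:11434) and that a model tagged "llama3" has already been pulled; the model tag and the helper names build_request and generate are illustrative assumptions.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3"):
    """Build the HTTP POST request; "stream": False asks for one JSON reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to the local Ollama server and return the reply text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]  # the generated text lives in the "response" field
```

With the server running, print(generate("Explain what a GGUF file is in one sentence.")) would print the model's answer. Because this is plain HTTP, any script or pipeline step can call it the same way.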

✔ LM Studio

GUI desktop app
Easy model downloading
Drag-and-drop PDFs
File chat
Rudimentary RAG
Easy to test many models
Great for tinkering

✔ GPT4All

GUI
Also a runtime
Similar to LM Studio
Not as modern

✔ koboldcpp

Runtime specialized for story-writing/roleplay
GUI
Lots of generation settings to tweak (inference only, no model fine-tuning)
