Running large language models (LLMs) directly on Linux systems gives users full control over their data, eliminates recurring API costs, and allows for offline access. Two standout tools for accomplishing this are Ollama and LM Studio. Each offers a distinct approach: Ollama provides a powerful command-line interface and API, while LM Studio delivers a streamlined graphical desktop experience. Both support a wide range of open-source models, including Llama, Mistral, DeepSeek, and more.
Using Ollama to Run Local LLMs
Ollama operates as a lightweight command-line tool that manages, downloads, and runs LLMs locally. It is cross-platform, but Linux users benefit from its straightforward installation and robust performance, especially when leveraging GPU acceleration.
Step 1: Install Ollama on your Linux machine. To do this, open a terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
This script downloads and installs Ollama. After installation, verify by running:
ollama --version
This command displays the installed version, confirming a successful setup.
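On most systemd-based distributions, the official install script also registers Ollama as a background service (ollama.service), so the server that hosts your models usually starts automatically. You can check it with:
systemctl status ollama
If the service is not active, start it with systemctl start ollama, or run the server manually as shown in Step 4.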
Step 2: Download a model using the ollama pull command. For example, to get the Llama 2 7B chat model, run:
ollama pull llama2:7b-chat
This fetches the model weights to your machine. Download times vary depending on model size and internet speed.
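To confirm which models are already available locally, list them with:
ollama list
This prints each downloaded model along with its size and when it was last modified.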
Step 3: Run the model interactively with:
ollama run llama2:7b-chat
This launches an interactive prompt where you can enter questions and receive responses. For one-off queries, you can append the prompt directly:
ollama run llama2:7b-chat "What is the capital of Poland?"
Step 4: Integrate Ollama into your applications using its local REST API. The server listens on port 11434; if it is not already running in the background, start it with:
ollama serve
Then, send requests using curl or any HTTP client. For example:
curl http://localhost:11434/api/generate -d '{
"model": "llama2:7b-chat",
"prompt": "List three Linux distributions.",
"stream": false
}'
The API returns generated responses as JSON. This setup allows developers to build custom chatbots, assistants, or integrate LLMs into other software without relying on external services.
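If you only need the generated text, you can pull the response field out of the JSON with jq (assuming jq is installed):
curl -s http://localhost:11434/api/generate -d '{
"model": "llama2:7b-chat",
"prompt": "List three Linux distributions.",
"stream": false
}' | jq -r '.response'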
Step 5: Optimize performance by using GPU acceleration, if available. To check if Ollama is utilizing your NVIDIA GPU, run:
OLLAMA_DEBUG=true ollama run llama2
Look for log messages indicating CUDA usage. GPU acceleration significantly speeds up inference for larger models.
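In recent Ollama releases you can also check where a loaded model is running, or watch GPU memory directly, without parsing debug logs:
ollama ps               # lists loaded models and whether they are running on the CPU or GPU
watch -n 1 nvidia-smi   # live view of GPU memory use while the model answers prompts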
Step 6: Manage and customize models with Modelfiles. Ollama supports a Dockerfile-like syntax for creating custom model variants, injecting system prompts, and setting parameters. Save a Modelfile in a directory, then create a new model with:
ollama create my-custom-model -f /path/to/Modelfile
This feature is valuable for automating workflows or tailoring models to specific tasks.
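As a sketch, a minimal Modelfile that builds on the model pulled earlier might look like this (the system prompt and parameter values are purely illustrative):
FROM llama2:7b-chat
SYSTEM "You are a concise assistant that answers questions about Linux administration."
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
After running ollama create, the new variant appears in ollama list and can be started with ollama run my-custom-model.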
Using LM Studio to Run Local LLMs
LM Studio offers a graphical desktop application for running LLMs locally on Linux, Windows, and macOS. It’s especially suitable for users who prefer not to use the command line or want a more interactive experience.
Step 1: Download the LM Studio AppImage for Linux from the official website. The file is typically named something like LM-Studio-0.3.9-6-x64.AppImage and is about 1GB in size.
Step 2: Open a terminal and navigate to the directory where you downloaded the AppImage. Make the file executable with:
chmod u+x LM-Studio-0.3.9-6-x64.AppImage
Step 3: Launch LM Studio by running:
./LM-Studio-0.3.9-6-x64.AppImage
This command starts the application, which automatically opens the graphical interface.
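If the AppImage refuses to start, the usual culprit on newer distributions is a missing FUSE 2 runtime, which AppImages depend on. One workaround (the package name shown is for Debian/Ubuntu and varies by distribution and release):
sudo apt install libfuse2
Alternatively, run the application without FUSE by unpacking the AppImage first:
./LM-Studio-0.3.9-6-x64.AppImage --appimage-extract
./squashfs-root/AppRun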
Step 4: On first launch, LM Studio prompts you to download a model. Use the built-in catalog to search, filter, and select one suited to your hardware. LM Studio displays estimated RAM and VRAM requirements, helping you avoid incompatible downloads.
Step 5: Once the model is downloaded, start a chat session in the app. Enter prompts in the chat window and receive responses directly in the interface. LM Studio tracks chat history, shows token usage, and displays system resource statistics, providing transparency into performance.
Step 6: Adjust advanced settings as needed. The side panel allows you to:
- Set system prompts to guide model behavior.
- Customize parameters such as temperature, top-p, top-k, and max tokens.
- Control GPU offload settings to balance speed and memory usage. For example:
- 4GB–8GB VRAM: Use partial offload (10–50 layers).
- 10GB–16GB VRAM: Use higher offload (50–80%).
- 24GB+ VRAM: Use full GPU offload if available.
Step 7: Enable the local API server from the Developer tab. This feature exposes an OpenAI-compatible API endpoint on localhost, allowing other programs to interact with your chosen model. Developers can point their OpenAI API clients to this endpoint, making integration with existing tools simple.
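For example, with the server enabled and a model loaded, a request like the following should work (LM Studio's local server defaults to port 1234; adjust the port and the model identifier to match what the Developer tab shows):
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "loaded-model-identifier",
"messages": [{"role": "user", "content": "List three Linux distributions."}]
}'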
Step 8: Manage multiple models and chat sessions. LM Studio allows you to download, update, and switch between various models, supporting both small and large LLMs in GGUF format. This flexibility lets users experiment with different models for diverse tasks.
Managing Model Storage and Sharing Between Ollama and LM Studio
Both Ollama and LM Studio store models in separate directories and use different on-disk layouts. Ollama keeps model weights as hashed blobs alongside manifests in its own library, while LM Studio works with plain GGUF files compatible with llama.cpp. To avoid redundant downloads and conserve disk space, users can:
- Identify where each application stores its models: ~/.ollama/models for Ollama and ~/.cache/lm-studio/models/ for LM Studio.
- If a model is in GGUF format, create a symbolic link from the shared model location to the other application's directory. For example:
mkdir -p /store/MyModels
cd /store/MyModels
# Download or move the GGUF model file here
ln -s /store/MyModels/the-model-file ~/.cache/lm-studio/models/
Alternatively, use community tools like gollama or llamalink to automate symlinking models between Ollama and LM Studio. Note that not all models are cross-compatible; conversion to GGUF may be required for LM Studio.
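For a manual approach, the sketch below locates the GGUF blob behind an Ollama model and links it into LM Studio's directory. It assumes Ollama's current on-disk layout (JSON manifests under ~/.ollama/models/manifests and blobs named sha256-<hash>), which may change between releases, and it places the link in a publisher-style subfolder of your choosing for LM Studio to pick up:
# Find the weights layer for llama2:7b-chat in Ollama's manifest (requires jq; layout assumed)
MANIFEST=~/.ollama/models/manifests/registry.ollama.ai/library/llama2/7b-chat
DIGEST=$(jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' "$MANIFEST")
# Blobs are stored as sha256-<hash>, so swap the colon in the digest for a dash
BLOB=~/.ollama/models/blobs/${DIGEST/:/-}
# Expose the blob to LM Studio under a readable name
mkdir -p ~/.cache/lm-studio/models/shared/llama2-7b-chat
ln -s "$BLOB" ~/.cache/lm-studio/models/shared/llama2-7b-chat/llama2-7b-chat.gguf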
Choosing the Right Tool for Your Needs
LM Studio is ideal for users who want a graphical, plug-and-play experience with minimal setup. Its interface simplifies model discovery, parameter tuning, and multi-model management, making it suitable for beginners and those preferring a visual workflow.
Ollama excels for developers and advanced users who need automation, scripting, or integration into larger workflows. Its command-line and API features provide granular control over model usage, customization, and performance optimization.
Both tools empower users to run powerful LLMs locally, maintaining privacy, reducing costs, and enabling experimentation without reliance on cloud services. By selecting the approach that matches your technical comfort and project goals, you can deploy advanced AI on your own hardware efficiently.
Running LLMs with Ollama and LM Studio on Linux streamlines local AI deployment, giving you flexibility and privacy without sacrificing performance. Try both methods to see which workflow fits your needs best.