How to run an LLM locally on your desktop
Using Docker Model Runner, here's a step-by-step guide to running LLMs locally without wrestling with dependencies, installations, or confusing setups.
Sep 11, 2025 • 5 Minute Read

A common question these days is: “Is there an easy way to run large language models (LLMs) locally?” As AI continues to gain traction, more people, from hobbyists to IT pros and developers, are looking for ways to experiment with and run LLMs on their own machines. There are plenty of reasons for this, such as:
• Building an application with an LLM in it
• Avoiding the high cost of running an LLM in the cloud or using a hosted service like OpenAI
• Taking advantage of the local GPU in your laptop
• Needing to run LLMs offline
• Ensuring greater privacy when working with GenAI
Some popular local options include Ollama, DeepSeek, and GPT4ALL, along with tools like LM Studio, Llamafile, and Jan. Then there is Hugging Face, which is like the GitHub of open source LLMs. While it is an incredible resource, figuring out how to run models from Hugging Face locally can be confusing and complex.
The challenge with running LLMs locally
Running LLMs locally often requires a patchwork of steps that can feel overwhelming. You may need to download specialized applications, learn the ins and outs of how each one works, load and configure the models, manage complex dependencies, and constantly tweak directories or environment settings. On top of that, you might need to spin up a WebUI server just to access the model, which adds yet another layer of setup and maintenance.
Common challenges include:
• Installation and hardware compatibility issues
• Manual management of models and dependencies
• Developer-centric interfaces that are not beginner-friendly
In short, running LLMs locally can feel like wrestling with complicated setups instead of just experimenting and building AI.
The solution: Docker Model Runner
Enter Docker Model Runner (DMR), a new feature in Docker Desktop that makes running LLMs locally simple and consistent. Docker Model Runner streamlines the local development and testing of AI-powered applications. DMR packages and serves AI and ML models using Docker's familiar tooling: models are distributed as OCI artifacts on Docker Hub, and DMR pulls them and serves them for inference, usually via an HTTP API. Instead of installing frameworks and dependencies manually, you pull a model, run it, and start sending it requests.
This is really exciting to me personally. Why? Because Docker is doing for AI exactly what it did for containers: making complex technology accessible and developer-friendly.
Key benefits of using DMR
• Consistency: “It runs on my machine” becomes “It runs anywhere with Docker.”
• No Dependency Hell: Everything you need lives inside the image.
• Speed: Pull a model from Docker Hub and start inferencing in minutes.
• Integration: Works seamlessly with existing CI/CD pipelines, Kubernetes clusters, and containerized apps.
• Scalability: Spin up multiple containers for load balancing or deploy to Kubernetes.
• Security and Isolation: Each model runs in its own environment with its own dependencies.
System requirements for DMR
DMR will run on a Windows, Mac, or Linux computer; however, it does have some core requirements that you will need to meet. These are:
- Windows: NVIDIA GPU
- Mac: Apple Silicon
- Linux: CPU or NVIDIA GPU
With DMR, it really is as simple as enabling it in Docker Desktop, pulling a model, running it, and prompting. Docker has also made several LLMs available on Docker Hub, and you can push your own custom or fine-tuned models there too.
Here are some key aspects of DMR:
Local execution
It enables running AI models locally on your machine, leveraging your system's resources (CPU or GPU) for inference. Models are loaded into memory only when a request is made and unloaded when not in use, optimizing resource utilization.
Familiar Docker commands
It extends the familiar Docker CLI with commands like docker model pull, docker model run, and docker model logs, allowing users to manage AI models with a consistent interface.
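For instance, pulling a model and checking the runner's logs uses the same verb-style commands you already know from containers. This is a minimal sketch; the model name below is just an illustrative example from Docker Hub's ai/ namespace.

```bash
docker model pull ai/smollm2   # download a model from Docker Hub's ai/ namespace
docker model logs              # view Docker Model Runner logs
```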
OpenAI-compatible API
Docker Model Runner exposes an OpenAI-compatible inference server, making it easy to integrate local models into applications that are already designed to interact with OpenAI's API.
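As a rough sketch, assuming host-side TCP access to the runner is enabled (Docker Desktop defaults to port 12434, but the exact host, port, and path can differ by version and configuration, so check your Model Runner settings), a chat completion request might look like this:

```bash
# The endpoint below is an assumption based on Docker Desktop's default
# TCP port for Model Runner; adjust it to match your own setup.
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Say hello from a local LLM."}]
  }'
```

Because the request and response follow the OpenAI chat completions schema, existing OpenAI client libraries can simply point their base URL at this local endpoint instead of api.openai.com.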
Use cases for running your LLMs locally
• Prototyping AI features without heavy setup
• Deploying models in cloud native environments (AKS, EKS, GKE)
• Standardizing ML model delivery across teams
Using Docker Model Runner to run your local LLM
You can interact with models directly from the Docker Desktop UI or from the CLI. I will take you through some basic DMR workflows, with examples from both the UI and the CLI.
UI example
- Enable Docker Model Runner: First you have to enable the feature in Docker Desktop's settings. Navigate to Settings -> Features in development and check the box for Enable Docker Model Runner.
- Pulling Models: In the Docker Desktop UI, you can access a "Models" tab. This tab allows you to search for models on Docker Hub (specifically in the ai/ namespace) and pull them directly.
- Running Models: Once a model is pulled, you can run it. The model will run as a local service, not in a traditional container, and will be accessible via a local API.
- Interacting with Models: The models are exposed through an OpenAI-compatible API. You can interact with them through the Docker Desktop UI, CLI, or using tools and applications that are designed to work with the OpenAI API.
CLI example
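A minimal end-to-end workflow from the terminal might look like the following. This is a sketch: the model name is illustrative, and the exact output and options depend on your Docker Desktop version.

```bash
# Pull a model from Docker Hub's ai/ namespace
docker model pull ai/smollm2

# Send a one-off prompt to the model
docker model run ai/smollm2 "Summarize what Docker Model Runner does in one sentence."

# Or start an interactive chat session by omitting the prompt
docker model run ai/smollm2

# Check the runner's logs if something is not working
docker model logs
```

From here, any application that can talk to the OpenAI-compatible endpoint described earlier can use the same local model.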
Beyond the basics, you can also integrate LLMs into Docker Compose apps with Model Runner, but that is a topic for another post.
Conclusion
Just like it did with containers, Docker is simplifying a complex technology. With Docker Model Runner, you do not have to wrestle with dependencies, installations, or confusing setups. Instead, you can quickly pull models from Docker Hub, run them locally, and even integrate them into your containerized applications.
AI is exciting, but it can feel intimidating when you are just starting out with LLMs, especially open-source ones. Docker removes that barrier, giving developers and experimenters a straightforward way to get hands-on with AI.
If you have not tried it yet, I highly recommend enabling Docker Model Runner in Docker Desktop and giving it a spin. Thanks for reading and happy prompting.
Want to learn more about using Docker and LLMs?
Pluralsight has an expert-led learning path on using Docker for a wide range of containerization needs, including guided hands-on labs. It also has a path on Large Language Models designed for anyone interested in harnessing the power of LLMs to solve real-world problems.