How to run an LLM locally on your desktop

Here's a step-by-step guide to running LLMs locally with Docker Model Runner, without wrestling with dependencies, installations, or confusing setups.

Sep 11, 2025 • 5 Minute Read

  • Guides
  • AI & Data
  • Beginner

A common question these days is: “Is there an easy way to run large language models (LLMs) locally?” As AI continues to gain traction, more people, from hobbyists to IT pros and developers, are looking for ways to experiment with and run LLMs on their own machines. There are plenty of reasons for this, such as:

• A developer may want to build an application with an LLM in it

• Running an LLM in the cloud or using a hosted service like OpenAI can be expensive

• Taking advantage of the local GPU in your laptop

• The need to run LLMs offline

• Ensuring greater privacy when working with GenAI

Some popular local options include Ollama, DeepSeek, and GPT4ALL, along with tools like LM Studio, Llamafile, and Jan. Then there is Hugging Face, which is like the GitHub of open source LLMs. While it is an incredible resource, figuring out how to run models from Hugging Face locally can be confusing and complex.

The challenge with running LLMs locally

Running LLMs locally often requires a patchwork of steps that can feel overwhelming. You may need to download specialized applications, learn the ins and outs of how each one works, load and configure the models, manage complex dependencies, and constantly tweak directories or environment settings. On top of that, you might need to spin up a WebUI server just to access the model, which adds yet another layer of setup and maintenance.

Common challenges include:

• Installation and hardware compatibility issues

• Manual management of models and dependencies

• Developer-centric interfaces that are not beginner-friendly

In short, running LLMs locally can feel like wrestling with complicated setups instead of just experimenting and building AI.

The solution: Docker Model Runner

Enter Docker Model Runner (DMR), a new feature in Docker Desktop that makes running LLMs locally simple and consistent. DMR streamlines the local development and testing of AI-powered applications by packaging AI and ML models as OCI artifacts and serving them with Docker's existing tooling. Instead of installing frameworks and dependencies manually, you pull a model, and DMR loads it and exposes it for inference via an HTTP API.

This is really exciting to me personally. Why? Because Docker is doing for AI exactly what it did for containers: making complex technology accessible and developer-friendly.

Key benefits of using DMR

Consistency: “It runs on my machine” becomes “It runs anywhere with Docker.”

No Dependency Hell: Everything you need lives inside the image.

Speed: Pull a model from Docker Hub and start inferencing in minutes.

Integration: Works seamlessly with existing CI/CD pipelines, Kubernetes clusters, and containerized apps.

Scalability: Spin up multiple containers for load balancing or deploy to Kubernetes.

Security and Isolation: Each model runs in its own environment with its own dependencies.

System requirements for DMR

DMR will run on a Windows, Mac, or Linux computer; however, it does have some core requirements that you will need to meet:

  • Windows: NVIDIA GPU
  • Mac: Apple Silicon
  • Linux: CPU or NVIDIA GPU

With DMR, it really is as simple as enabling it in Docker Desktop, pulling a model, running it, and prompting. Docker has also made several LLMs available on Docker Hub, and you can push your own custom or fine-tuned models there too.

Here are some key aspects of DMR:

Local execution

It enables running AI models locally on your machine, leveraging your system's resources (CPU or GPU) for inference. Models are loaded into memory only when a request is made and unloaded when not in use, optimizing resource utilization.

Familiar Docker commands

It extends the familiar Docker CLI with commands like docker model pull, docker model run, and docker model logs, allowing users to manage AI models with a consistent interface.
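
For example, a quick session might look like the following sketch. The model name ai/smollm2 is just an illustrative example from the ai/ namespace on Docker Hub; substitute whichever model you want to try:

  docker model pull ai/smollm2
  docker model run ai/smollm2 "Summarize what a container is."
  docker model logs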

OpenAI-compatible API

Docker Model Runner exposes an OpenAI-compatible inference server, making it easy to integrate local models into applications that are already designed to interact with OpenAI's API.
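
As a rough sketch, once a model is available you could call the chat completions endpoint with curl. This assumes host-side TCP access is enabled on DMR's default port (12434) and again uses ai/smollm2 as a placeholder model name; check your Docker Desktop settings for the exact endpoint on your machine:

  # Assumes DMR host-side TCP access is enabled on the default port 12434
  curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "ai/smollm2",
          "messages": [
            {"role": "user", "content": "Explain Docker Model Runner in one sentence."}
          ]
        }'

Because the API shape matches OpenAI's, existing OpenAI client libraries can generally be pointed at this endpoint by changing their base URL.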

Use cases for running your LLMs locally

• Prototyping AI features without heavy setup

• Deploying models in cloud native environments (AKS, EKS, GKE)

• Standardizing ML model delivery across teams

Using Docker Model Runner to run your local LLM

You can interact with models directly from the Docker Desktop UI or from the CLI. I will walk you through some basic DMR workflows, with examples from both the UI and the CLI.

UI example

  • Enable Docker Model Runner: First, you have to enable the feature in Docker Desktop's settings. Navigate to Settings -> Features in development and check the box for Enable Docker Model Runner.
  • Pulling Models: In the Docker Desktop UI, you can access a "Models" tab. This tab allows you to search for models on Docker Hub (specifically in the ai/ namespace) and pull them directly. 
  • Running Models: Once a model is pulled, you can run it. The model will run as a local service, not in a traditional container, and will be accessible via a local API.
  • Interacting with Models: The models are exposed through an OpenAI-compatible API. You can interact with them through the Docker Desktop UI, CLI, or using tools and applications that are designed to work with the OpenAI API.

CLI example
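
Here is a minimal end-to-end sketch of the CLI workflow. As above, ai/smollm2 is only an example model name, and the comments describe what each command does:

  # Pull a model from the ai/ namespace on Docker Hub
  docker model pull ai/smollm2

  # See which models you have locally
  docker model list

  # Send a one-shot prompt to the model
  docker model run ai/smollm2 "Write a haiku about containers."

  # Or start an interactive chat session by omitting the prompt
  docker model run ai/smollm2

  # Remove the model when you are done with it
  docker model rm ai/smollm2

As described earlier, the model is loaded into memory when a request comes in and unloaded when it is not in use, so there is no separate server process for you to manage.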

Beyond the basics, you can also integrate LLMs into Docker Compose apps with Model Runner, but that is a topic for another post.

Conclusion

Just like it did with containers, Docker is simplifying a complex technology. With Docker Model Runner, you do not have to wrestle with dependencies, installations, or confusing setups. Instead, you can quickly pull models from Docker Hub, run them locally, and even integrate them into your containerized applications.

AI is exciting, but it can feel intimidating when you are just starting out with LLMs, especially open source ones. Docker removes that barrier, giving developers and experimenters a straightforward way to get hands on with AI.

If you have not tried it yet, I highly recommend enabling Docker Model Runner in Docker Desktop and giving it a spin. Thanks for reading and happy prompting.


Want to learn more about using Docker and LLMs?

Pluralsight has an expert-led learning path on using Docker for a wide range of containerization needs, including guided hands-on labs. It also has a path on Large Language Models designed for anyone interested in harnessing the power of LLMs to solve real-world problems.

Steve Buchanan

Steve Buchanan is a Principal PM Manager with a leading global tech giant focused on improving the cloud. He is a Pluralsight author, the author of eight technical books, one of Onalytica's Who’s Who in Cloud top 50, and a former 10-time Microsoft MVP. He has presented at tech events including DevOps Days, Open Source North, Midwest Management Summit (MMS), Microsoft Ignite, BITCon, Experts Live Europe, OSCON, Inside Azure management, a keynote at Minnebar 18, and user groups. He has been a guest on over a dozen podcasts and has been featured in several publications, including the Star Tribune (the 5th largest newspaper in the US). He stays active in the technical community and enjoys blogging about his adventures in the world of IT at www.buchatech.com.
