The big AI players – Gemini, ChatGPT, DeepSeek – get the spotlight, but Small Language Models (SLMs) are attracting the attention of AI leaders in the workplace. Lightweight and secure, they run on your internal data, speak your company’s language, and stay fully within your domain and control. For less money and compute, SLMs deliver speed, privacy, and precision for tasks like HR, legal, or IT support. Build one in-house or start with a simple RAG setup. Not as powerful as Copilot, but faster, safer, and more focused. Here’s what they are, why they might be right for you, and what you need to build one.
Fiona Passantino, AI Leadership, 13 MAY 2025
What’s a Small Language Model (SLM)?
For the past year or more, we’ve been generating away, using Large Language Models for professional and personal use, for generalized tasks and cognitive work. They are extremely powerful, putting their whopping multi-billion-parameter brains to work to read our emails, summarize long documents and more.
Small Language Models (SLMs) are lightweight versions of traditional language models, designed to operate efficiently in resource-constrained environments such as smartphones, embedded systems, or low-power computers. While large language models have hundreds of billions of parameters, SLMs typically have fewer than 10 billion[i]. SLMs might not have the mountains of training data that the big guys have, but they can still engage in convincing text generation, summarization, translation and research.
A growing number of forward-thinking companies are choosing to build and run their own mini-models in-house. Optimized for narrow, domain-specific tasks, the SLMs don’t have the computing power of a Gemini, but offer something potentially more valuable: speed, relevant data, process control, cost savings, and data privacy. By keeping sensitive information within your infrastructure, you bypass the risk and regulatory overhead of sending customer, employee, or proprietary data to third-party models. This is particularly relevant for organizations operating under GDPR, HIPAA, or other data protection regimes.

Why do I want one?
Because SLMs are trained on your internal documents, they can handle your organizational context and stick to your brand guidelines. Where frontier LLMs may stumble on your unique jargon, your SLM can be fluent in it, producing relevant, accurate and useful responses for specific parts of the business. There are no external rate fluctuations, fluid API pricing schemes or third-party disruptions.
Because SLMs are much smaller than their giant cousins, they respond faster and can often be run in real time, even on the hardware you have in-house. You control when it updates, how it behaves, and what data it can access.
Not every organization can pull off a small model build in-house. It requires talent: a small team of Python coders; well-organized, structured, labeled and cleaned data; and the right IT infrastructure. How do you even start?

Building a Small Language Model
1. Define your use case
SLMs are best for narrow, repetitive, well-defined tasks. What will your in-house AI be used for? Think of HR, legal, IT support or compliance teams, where multiple users query the same dataset and expect ease of use and a low-tech interface. One internal SLM can support multiple use cases, giving you reusability across departments.
2. Choose a suitable open-source model
Pick an open-source SLM that fits your needs; these are open-source model weights that your team can download and fine-tune in-house. Choose a model that balances accuracy with speed and memory requirements. Frameworks such as Hugging Face Transformers, LMDeploy or Ollama allow you to set up quickly and easily[ii].
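A quick way to sanity-check the accuracy-versus-memory trade-off is to shortlist candidates by footprint before benchmarking quality. A minimal sketch: the model names below are real open-weight SLMs, but the memory figures are rough 4-bit-quantized estimates for illustration, not measured numbers.

```python
# Illustrative shortlist of open-weight SLMs. Parameter counts are public;
# the memory figures are rough 4-bit-quantized estimates, not benchmarks.
candidates = [
    {"name": "Phi-3-mini", "params_b": 3.8, "approx_gb_4bit": 2.3},
    {"name": "Llama-3.2-3B", "params_b": 3.0, "approx_gb_4bit": 1.8},
    {"name": "Mistral-7B", "params_b": 7.0, "approx_gb_4bit": 4.1},
]

def fits_budget(model, ram_gb):
    """Keep only models whose quantized weights fit the available memory."""
    return model["approx_gb_4bit"] <= ram_gb

# With ~3 GB of memory to spare, the 7B model drops off the shortlist.
shortlist = [m for m in candidates if fits_budget(m, ram_gb=3.0)]
print([m["name"] for m in shortlist])
```

Only the survivors of this filter are worth the time it takes to evaluate answer quality on your own documents.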
3. Set up the back-end
The next question is where to install your new system. The closer your SLM is to your data, the better: safer, faster and less error-prone. The first option is to run the system in-house. If you have servers that can host powerful GPUs and/or CPUs, stable power and internet, and the staff to maintain it all, keeping your SLM at home offers the most control.
Or, install your SLM in your private cloud (Azure, AWS, GCP), which is what most lean businesses run these days. Setting this up requires VMs or “Virtual Machines”. These are computers within computers that act like independent machines, like a safe sandbox for you to run your systems within a rented (shared) space. Most cloud services allow you to set up VMs with the CPU, memory and storage you need.
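Whether on-premises or in a VM, the key sizing question is how much memory the model weights need. The back-of-envelope rule: weight memory equals parameter count times bits per parameter, divided by eight (activations and runtime overhead come on top). A small sketch:

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Raw weight storage: parameters x bits per parameter, in gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7B-parameter model in 16-bit floats vs. 4-bit quantized weights.
fp16 = weight_memory_gb(7, 16)   # 14.0 GB
q4   = weight_memory_gb(7, 4)    #  3.5 GB
print(f"fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
```

This is why quantized sub-10B models can run on a single modest GPU or even a well-provisioned VM, while full-precision frontier models cannot.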
4. Build the front-end
With the back-end gearing up, it’s time to think about the front-end. Your team will have to interact with your SLM in a way that’s accessible, convenient, and easy to understand. For these tools, the simpler the better: one prompt input window, one large output area, and a few buttons to vary and refine the results; that’s all that’s needed.
A simple user interface (UI) can take the form of a chatbot (most popular, easy to build), a plugin (living on a desktop or company phone), a dashboard (think: Power BI), or an integration into existing tools used by the organization (Slack, CRM).
Alternatively, use RAG (Retrieval-Augmented Generation) to plug in internal documents without retraining. Or fine-tune the model on your internal datasets (emails, SOPs, manuals). Fine-tuning requires labeled data and compute, while RAG is faster and often sufficient.
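The RAG idea itself is simple: retrieve the most relevant internal snippet, then hand it to the model alongside the question. A toy sketch, using naive word overlap in place of the embedding search and vector store a real setup would use (the document contents here are invented examples):

```python
# Minimal RAG sketch: find the most relevant internal snippet for a query,
# then prepend it to the prompt sent to the model. The scoring is naive
# word overlap; a real setup would use embeddings and a vector store.
documents = {
    "leave_policy": "Employees accrue 25 vacation days per year.",
    "expense_policy": "Submit expense claims within 30 days of purchase.",
}

def retrieve(query, docs):
    """Return the document whose words overlap most with the query."""
    q_words = set(query.lower().split())
    return max(docs.values(),
               key=lambda text: len(q_words & set(text.lower().split())))

def build_prompt(query, docs):
    """Bundle the retrieved context and the question into one prompt."""
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How many vacation days do I get?", documents))
```

Because the model only ever sees the retrieved snippet, updating the knowledge base is as simple as editing the documents; no retraining required, which is exactly the appeal of RAG over fine-tuning.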
To connect the front end to the back end, use a lightweight API framework (see “Building APIs”) such as FastAPI, or an orchestration library like LangChain. These APIs will not be taking orders from outside the organization, so they don’t need to be complex or meet commercial-grade security requirements; the only people querying the model will be internal employees, and the only data accessible to the model will be internal documents.
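The shape of such an internal endpoint is small. FastAPI would be the common choice; as a dependency-free illustration, the same shape can be sketched with Python’s standard-library HTTP server, with the actual model call stubbed out by a placeholder function:

```python
# Sketch of the internal API layer: one POST endpoint that takes a JSON
# prompt and returns the model's reply. The model call is a stub here.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(prompt):
    """Stand-in for the real SLM call (e.g. a local model runtime)."""
    return f"(model reply to: {prompt})"

class AskHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body {"prompt": "..."} sent by the front-end.
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        reply = run_model(body["prompt"])
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"reply": reply}).encode())

    def log_message(self, *args):
        pass  # keep the console quiet; wire this to real logging instead

# Port 0 lets the OS pick any free port; pin a fixed port in production.
server = HTTPServer(("127.0.0.1", 0), AskHandler)
# server.serve_forever()  # uncomment to start serving requests
```

A chatbot or dashboard front-end then just POSTs the user’s prompt to this address and renders the reply; FastAPI gives you the same thing with request validation and auto-generated docs on top.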
5. Monitor and Iterate
Launch with a stripped-down version and get it out there as soon as possible to track performance, monitor hallucination rates and gather user feedback. Your IT crew should set up logging, usage analytics and feedback loops built into the delivery release train. If the project is a success, it can be developed and iterated further.
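The feedback loop can start as something very simple: log every query together with a user signal, and report the rates you care about. A sketch with illustrative field names (not a standard schema):

```python
# Toy feedback log: record each interaction with a user signal, then
# report the hallucination rate the team should be watching.
from collections import Counter

log = []

def record(query, answer, feedback):
    """feedback: 'up', 'down', or 'hallucination' (flagged by a reviewer)."""
    log.append({"query": query, "answer": answer, "feedback": feedback})

def report():
    counts = Counter(entry["feedback"] for entry in log)
    total = len(log)
    return {"total": total, "hallucination_rate": counts["hallucination"] / total}

record("How many vacation days?", "25 per year", "up")
record("Who approved the Q3 budget?", "An invented name", "hallucination")
print(report())
```

Even this crude counter tells you whether a new model version or an expanded document set made answers better or worse; swap the in-memory list for your existing analytics stack when you industrialize it.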
Yes, you can build and run your own mini-AI, fed with internal documents, and used by your employees. It won’t beat GPT-4 in terms of sheer computational muscle, but it’s fast, private, highly focused… and all yours.

What does it cost?
Expect €10,000–€40,000 for a functioning Minimum Viable Product, with operating (maintenance) costs of €500–€3,000 per month depending on scale and usage. Plan for three to six weeks from idea to launch, depending on the complexity of the interface and the scope of internal documents. At minimum you will need one dedicated AI engineer, one to two back-end developers and a part-time product owner, who also operate and maintain the bot over time.
Start with one focused use case for one part of the organization, such as an internal HR chatbot. A RAG interface is cheaper and faster; once this is in place as your 1.0 version, you can always build out the functionality by investing in fine-tuning. Leverage open-source models (widely available on Hugging Face) to avoid license fees. Measure ROI with employee hours saved, cost per API call avoided, and data privacy value.
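That ROI measurement can start as a back-of-envelope formula: monthly benefit (hours saved times loaded hourly rate) minus operating cost, with payback as build cost divided by the monthly net. Every number below is a placeholder assumption, not a benchmark:

```python
# Back-of-envelope ROI; every figure is a placeholder assumption.
build_cost = 25_000          # one-off build (EUR)
monthly_ops = 1_500          # hosting + maintenance (EUR per month)
hours_saved_per_month = 200  # across all users
hourly_rate = 45             # loaded cost of an employee hour (EUR)

monthly_benefit = hours_saved_per_month * hourly_rate   # 9,000 EUR
monthly_net = monthly_benefit - monthly_ops             # 7,500 EUR
payback_months = build_cost / monthly_net               # ~3.3 months
print(f"Payback in about {payback_months:.1f} months")
```

Plug in your own hours-saved estimate from a pilot rather than a guess; the hours-saved figure dominates the result, so it is the one worth measuring carefully.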
Need help with AI Integration?
Reach out to me for advice – I have a few nice tricks up my sleeve to help guide you on your way, as well as a few “insiders’ links” I can share to get you that free trial version you need to get started.

No eyeballs to read or watch? Just listen.
Working Humans is a bi-monthly podcast focusing on the AI and Human connection at work. Available on Apple and Spotify.

About Fiona Passantino
Fiona helps empower working Humans with AI integration, leadership and communication. Maximizing connection, engagement and creativity for more joy and inspiration in the workplace. A passionate keynote speaker, trainer, facilitator and coach, she is a prolific content producer, host of the podcast “Working Humans” and award-winning author of the “Comic Books for Executives” series. Her latest book is “The AI-Powered Professional”.
[i] Kerner (2025) “What is a small language model (SLM)?” Tech Target. https://www.techtarget.com/whatis/definition/small-language-model-SLM
[ii] Johnson (2025) “Small Language Models (SLM): A Comprehensive Overview” Hugging Face. https://huggingface.co/blog/jjokah/small-language-model