Building Your AI Team in 2025
by Stephen M. Walker II, Co-Founder / CEO
AI teams come in many shapes and sizes. I'm going to share our insights from 1:1 conversations with over a hundred AI teams this year.
What is common about the teams
- Comfortable with ambiguity
- Familiarity with models and prompt engineering
- Collaborative across disciplines
- Systematically iterative and experimental
What are common shapes of the teams
- collaborative founders (founder + eng + pm) / for experiences core to the brand and product
- eng pairs (full stack engineer + data engineer) / for experiences that require great data to work
- product trio (pm + eng + des) / for experiences extending existing features
- eng trio (pm + eng + data) / for technical (codegen) experiences
- domain squad (pm/domain expert + eng + data) / for expert generations requiring strong domain expertise
what's common across these patterns
- founder, PM, and domain expert play similar roles in the team
- product-minded engineers drive success
- data engineers necessary when generative experience relies on existing data
What hard skills are needed on the team
- deep understanding of domain, customer, and use case
- full-stack engineering to move across data, backend, and frontend
- data engineering for improving generative and retrieval performance
/ can start with 1-2 people, scale to 3-4 once you have traction
What hard technical skills are needed on the team
- interaction design, or familiarity with frameworks like shadcn
- SSE, you will need to get used to assembling streamed data
- caching and query optimization, LLMs are slow, you will need to find workarounds
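To make the SSE point concrete, here is a minimal sketch of assembling a streamed response, assuming each `data:` line carries a JSON object with a `delta` text field. Real providers nest their payloads differently, so treat the parsing as illustrative:

```python
import json

def assemble_sse(lines):
    """Concatenate text deltas from SSE `data:` lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # common end-of-stream sentinel
        parts.append(json.loads(payload).get("delta", ""))
    return "".join(parts)

# Simulated stream; in production these lines arrive incrementally over HTTP.
stream = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
print(assemble_sse(stream))  # Hello
```

In a real UI you would render each delta as it arrives rather than waiting for the full string, which is exactly the "get used to assembling streamed data" habit above.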
What hard LLM skills are needed on the team
- prompt engineering
- retrieval
- fine-tuning
- evaluation
What the team is doing that is new with LLMs
- prompt engineering
- evaluating prompts and models
/ what's not new, but adapted to LLMs
- gathering user feedback on generations
- tracking usage on generations and second-order activity
- running a/b experiments
- running code tests that call evals as teams check in changes
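The evals-at-check-in idea can start as a single assertion in the test suite. A toy sketch, with `generate` as a hard-coded stub standing in for a real model call:

```python
# Toy CI eval gate: fail the build when the pass rate on a golden set drops.
GOLDEN = [("2+2=", "4"), ("Capital of France?", "Paris")]

def generate(prompt):
    # Stub for illustration; swap in your actual LLM client here.
    canned = {"2+2=": "4", "Capital of France?": "Paris"}
    return canned[prompt]

def eval_pass_rate(golden):
    """Fraction of golden prompts whose generation matches exactly."""
    return sum(generate(p) == want for p, want in golden) / len(golden)

# Run as part of the test suite so regressions block the check-in.
assert eval_pass_rate(GOLDEN) >= 0.9, "eval regression, blocking check-in"
```

Exact-match scoring is the simplest possible grader; most teams graduate to fuzzier scoring or model-graded evals, but the CI wiring stays the same.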
prompt engineering
By now, most people have experience interacting with models via ChatGPT or Bard. We don't recommend adding people to your AI team without this hands-on experience.
how most teams work / organize prompts in code, google sheets, notion, or klu / collaborate by sharing versions and feedback across the team
getting to v1 / early evaluation / start with a few prompt cases / run each 10-20 times and take notes on how things are working
v2 and beyond / iterations / keep a golden dataset / compare previous and current versions
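Comparing prompt versions against a golden dataset can start as a one-function script. A sketch, assuming exact-match scoring and saved generations from each version:

```python
def match_rate(outputs, golden):
    """Fraction of generations that exactly match the golden answers."""
    return sum(o == g for o, g in zip(outputs, golden)) / len(golden)

golden = ["4", "Paris", "blue"]
v1_outputs = ["four", "Paris", "blue"]  # saved generations from prompt v1
v2_outputs = ["4", "Paris", "blue"]     # saved generations from prompt v2

print(match_rate(v1_outputs, golden))  # about 0.67
print(match_rate(v2_outputs, golden))  # 1.0
```

Once this loop exists, "did v2 beat v1" becomes a number you can log next to the prompt diff instead of a gut feeling.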
domain experts
Teams working in accredited or regulated industries will want an expert in the loop. We see experts helping on projects related to strategy, marketing, code, and legal.
In most cases, engineering cannot keep the measure-learn-build loop moving quickly without this expert / the outlier is codegen, where engineers are often the only experts.
the pmf team
The most productive teams shipping quickly and finding PMF are not building their own models. Unless you're in the 1% of AI teams, you should not spend your time here. All of the teams we've seen build custom models have gone back to using a best-in-class model.
next-level collaboration
Working on GenAI introduces a new level of collaboration we've never seen before. Engineers and product teams work closer to founders and domain experts than ever, and the most successful teams see each other as true collaborators, not their LinkedIn job titles.
emerging behaviors
- prompt versions and forks across team members
- rapid collaboration, all hands on deck at release time
- methodical notes tracking learnings between prompt versions
how klu fits into their stack
- klu replaces notion and google sheets
- teams collaborate on and evaluate prompts in one place, with real-time insights
- engineers replace or augment calls by using the klu SDK
- data teams adopt klu for dataset labeling with human and AI feedback
Skills
Comfortable with ambiguity — Generative AI models like LLMs often encounter complex and nuanced inputs that don't have clear-cut solutions. Being comfortable with ambiguity is crucial for making thoughtful decisions about how prompts should be written, what trade-offs should be made, and how results should be evaluated. That comfort applies whether you're dealing with frontend design, backend development, or infrastructure optimization.
Familiarity with models and prompt engineering — While some engineers may be less experienced in LLMs than others, it's essential that they have a basic understanding of how these models work and the process of prompt engineering. This familiarity will help them communicate effectively with domain experts, data scientists, and other engineers who specialize in AI.
Collaborative — As mentioned earlier, prompt engineering requires close collaboration between product teams and engineering teams. Engineers need to be comfortable working closely with non-technical domain experts and facilitating their input into the AI development process. They should also be able to communicate effectively with data scientists and other engineers who may be involved in training or fine-tuning models.
Systematic workflow — Generative AI applications require a systematic workflow that prioritizes rigorous evaluation of results, feedback analysis, and continuous improvement. Engineers need to have experience designing such workflows, as well as the ability to think critically about how prompt engineering can be integrated into existing product development processes.
////
Generative AI and Large Language Models (LLMs) are new to most companies. If you're an engineering leader building Gen AI applications, it can be hard to know what skills and types of people are needed. At Klu.ai we've helped hundreds of companies put Large Language Models (LLMs) into production and in this post I'd like to share what we've learned about the skills needed to build a great AI team.
Building AI Teams in 2025: Key Insights
The landscape of AI development has shifted dramatically in recent years. The rise of Large Language Models (LLMs) like GPT-4 and open-source alternatives like LLaMa has reduced the need for specialized machine learning engineers. These models come pre-trained with a general understanding of the world and language, eliminating the need for custom model training from scratch. This shift has opened the door for more companies to adopt AI, as the talent required is likely already in-house.
One of the key skills in this new era of AI application development is "prompt engineering". This involves crafting clear, natural language instructions or "prompts" for the model, replacing the need for annotated datasets. Prompt engineering requires excellent written communication, a willingness to experiment, and a familiarity with the strengths and weaknesses of modern AI models. It doesn't require specific mathematical or technical knowledge, making it an ideal task for domain experts and product managers who understand the end user's needs.
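As a concrete illustration of why no math is required: a prompt is just structured natural language, often kept as a template the whole team can read and edit. The template and placeholder names below are our own illustration, not a standard:

```python
# Illustrative prompt template; domain experts can edit the wording directly.
PROMPT = """You are a support assistant for {product}.
Answer in two sentences, using only the context below.

Context:
{context}

Question: {question}"""

filled = PROMPT.format(
    product="Klu",
    context="Klu stores and versions prompt templates.",
    question="How are prompts versioned?",
)
print(filled)
```

Because the artifact is plain text, a PM or domain expert can propose a change by editing the template, and engineering only gets involved when the surrounding plumbing changes.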
The role of product managers and domain experts has evolved with the advent of LLMs. They are no longer one step removed from implementation but can directly shape AI products through prompt engineering. This not only saves engineering time but also shortens the feedback loop from deployment to improvement. Companies like Twain and Duolingo have successfully employed this approach, utilizing linguists and salespeople as prompt engineers to customize their AI models.
Despite the increased role of AI, the majority of an AI application still consists of traditional code. Full-stack engineers are responsible for building the majority of the application, orchestrating model calls, establishing the infrastructure for prompt engineering, integrating data sources to augment the model's context, and optimizing performance. Techniques like "finetuning" and "retrieval augmented generation" (RAG) are commonly used to optimize LLM performance.
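A toy sketch of the RAG idea mentioned above: retrieve the most relevant snippets, then stuff them into the prompt as context. Word-overlap ranking stands in for a real embedding model here, and the documents are invented for the example:

```python
import re
from collections import Counter

def bow(text):
    """Bag-of-words counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query and keep the top k."""
    overlap = lambda d: sum((bow(query) & bow(d)).values())
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query, docs):
    """Assemble a context-stuffed prompt for the model."""
    context = "\n".join(retrieve(query, docs))
    return f"Using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Streaming uses server-sent events to deliver tokens.",
    "Fine-tuning adapts a base model to your data.",
    "Caching repeated queries cuts latency and cost.",
]
print(build_prompt("How do tokens stream to the client?", docs))
```

Swapping the overlap score for vector similarity over real embeddings turns this sketch into the standard RAG pipeline, but the shape of the code stays the same: retrieve, assemble, generate.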
The 'AI Engineer' role has emerged as a crucial position that requires some familiarity but not deep expertise with AI. This role sits closer to product than research, bridging the gap between technical and non-technical teams.