Building Your AI Team in 2025
by Stephen M. Walker II, Co-Founder / CEO
AI teams come in many shapes and sizes. I'm going to share our insights from 1:1 conversations with over a hundred AI teams this year.
What is common about the teams
- Comfortable with ambiguity
- Familiarity with models and prompt engineering
- Collaborative across disciplines
- Systematically iterative and experimental
What are common shapes of the teams
- collaborative founders (founder + eng + pm) / for experiences core to the brand and product
- eng pairs (full stack engineer + data engineer) / for experiences that require great data to work
- product trio (pm + eng + des) / for experiences extending existing features
- eng trio (pm + eng + data) / for technical (codegen) experiences
- domain squad (pm/domain expert + eng + data) / for expert generations requiring strong domain expertise
what's common across these patterns
- founder, PM, and domain expert play similar roles in the team
- product-minded engineers drive success
- data engineers necessary when generative experience relies on existing data
What hard skills are needed on the team
- deep understanding of domain, customer, and use case
- full-stack engineering to move across data, backend, and frontend
- data engineering for improving generative and retrieval performance
/ can start with 1-2 people, scale to 3-4 once you have traction
What hard technical skills are needed on the team
- interaction design, or familiarity with frameworks like shadcn
- SSE, you will need to get used to assembling streamed data
- caching and query optimization, LLMs are slow, you will need to find workarounds
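To make the SSE point concrete, here is a minimal sketch of assembling a streamed response, assuming each `data:` line carries a JSON object with a `delta` text field. Real providers nest their payloads differently, so treat the parsing as illustrative:

```python
import json

def assemble_sse(lines):
    """Concatenate text deltas from SSE `data:` lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # common end-of-stream sentinel
        parts.append(json.loads(payload).get("delta", ""))
    return "".join(parts)

# Simulated stream; in production these lines arrive incrementally over HTTP.
stream = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
print(assemble_sse(stream))  # Hello
```

In a real UI you would render each delta as it arrives rather than waiting for the full string, which is exactly the "get used to assembling streamed data" habit above.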
What hard LLM skills are needed on the team
- prompt engineering
- retrieval
- fine-tuning
- evaluation
What the team is doing that is new with LLMs
- prompt engineering
- evaluating prompts and models
/ what's not new, but adapted to LLMs
- gathering user feedback on generations
- tracking usage on generations and second-order activity
- running a/b experiments
- running code tests that call evals as teams check in changes
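The evals-at-check-in idea can start as a single assertion in the test suite. A toy sketch, with `generate` as a hard-coded stub standing in for a real model call:

```python
# Toy CI eval gate: fail the build when the pass rate on a golden set drops.
GOLDEN = [("2+2=", "4"), ("Capital of France?", "Paris")]

def generate(prompt):
    # Stub for illustration; swap in your actual LLM client here.
    canned = {"2+2=": "4", "Capital of France?": "Paris"}
    return canned[prompt]

def eval_pass_rate(golden):
    """Fraction of golden prompts whose generation matches exactly."""
    return sum(generate(p) == want for p, want in golden) / len(golden)

# Run as part of the test suite so regressions block the check-in.
assert eval_pass_rate(GOLDEN) >= 0.9, "eval regression, blocking check-in"
```

Exact-match scoring is the simplest possible grader; most teams graduate to fuzzier scoring or model-graded evals, but the CI wiring stays the same.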
prompt engineering
By now, most people have experience interacting with models via ChatGPT or Bard. We don't recommend adding people to your AI team without this hands-on experience.
how most teams work / organize prompts in code, google sheets, notion, or klu / collaborate by sharing versions and feedback across the team
getting to v1 / early evaluation / start with a few prompt cases / run each 10-20 times and take notes on how things are working
v2 and beyond / iterations / keep a golden dataset / compare previous and current versions
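Comparing prompt versions against a golden dataset can start as a one-function script. A sketch, assuming exact-match scoring and saved generations from each version:

```python
def match_rate(outputs, golden):
    """Fraction of generations that exactly match the golden answers."""
    return sum(o == g for o, g in zip(outputs, golden)) / len(golden)

golden = ["4", "Paris", "blue"]
v1_outputs = ["four", "Paris", "blue"]  # saved generations from prompt v1
v2_outputs = ["4", "Paris", "blue"]     # saved generations from prompt v2

print(match_rate(v1_outputs, golden))  # about 0.67
print(match_rate(v2_outputs, golden))  # 1.0
```

Once this loop exists, "did v2 beat v1" becomes a number you can log next to the prompt diff instead of a gut feeling.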
domain experts
Teams working in accredited or regulated industries will want an expert in the loop. We see experts helping on projects related to strategy, marketing, code, and legal.
In most cases, engineering cannot keep the measure-learn-build loop moving quickly without this expert / the outlier is codegen, where engineers are often the only experts.
the pmf team
The most productive teams shipping quickly and finding PMF are not building their own models. Unless you're in the 1% of AI teams, you should not spend your time here. All of the teams we've seen build custom models have gone back to using a best-in-class model.
next-level collaboration
Working on GenAI introduces a new level of collaboration we've never seen before. Engineers and product teams work closer to founders and domain experts than ever, and the most successful teams see each other as true collaborators, not their LinkedIn job titles.
emerging behaviors
- prompt versions and forks across team members
- rapid collaboration, all hands on deck at release time
- methodical notes tracking learnings between prompt versions
how klu fits into their stack
- klu replaces notion and google sheets
- teams collaborate on and evaluate prompts in one place, with real-time insights
- engineers replace or augment calls by using the klu SDK
- data teams adopt klu for dataset labeling with human and AI feedback
Skills
Comfortable with ambiguity — Generative AI models like LLMs often encounter complex and nuanced inputs that don't have clear-cut solutions. Being comfortable with ambiguity is crucial for making thoughtful decisions about how prompts should be written, what trade-offs should be made, and how results should be evaluated. That comfort applies whether you're dealing with frontend design, backend development, or infrastructure optimization.
Familiarity with models and prompt engineering — While some engineers may be less experienced in LLMs than others, it's essential that they have a basic understanding of how these models work and the process of prompt engineering. This familiarity will help them communicate effectively with domain experts, data scientists, and other engineers who specialize in AI.
Collaborative — As mentioned earlier, prompt engineering requires close collaboration between product teams and engineering teams. Engineers need to be comfortable working closely with non-technical domain experts and facilitating their input into the AI development process. They should also be able to communicate effectively with data scientists and other engineers who may be involved in training or fine-tuning models.
Systematic workflow — Generative AI applications require a systematic workflow that prioritizes rigorous evaluation of results, feedback analysis, and continuous improvement. Engineers need to have experience designing such workflows, as well as the ability to think critically about how prompt engineering can be integrated into existing product development processes.
////
Generative AI and Large Language Models (LLMs) are new to most companies. If you're an engineering leader building Gen AI applications, it can be hard to know what skills and types of people are needed. At Klu.ai we've helped hundreds of companies put Large Language Models (LLMs) into production and in this post I'd like to share what we've learned about the skills needed to build a great AI team.
Building AI Teams in 2025: Key Insights
The landscape of AI development has shifted dramatically in recent years. The rise of Large Language Models (LLMs) like GPT-4 and open-source alternatives like LLaMa has reduced the need for specialized machine learning engineers. These models come pre-trained with a general understanding of the world and language, eliminating the need for custom model training from scratch. This shift has opened the door for more companies to adopt AI, as the talent required is likely already in-house.
One of the key skills in this new era of AI application development is "prompt engineering". This involves crafting clear, natural language instructions or "prompts" for the model, replacing the need for annotated datasets. Prompt engineering requires excellent written communication, a willingness to experiment, and a familiarity with the strengths and weaknesses of modern AI models. It doesn't require specific mathematical or technical knowledge, making it an ideal task for domain experts and product managers who understand the end user's needs.
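As a concrete illustration of why no math is required: a prompt is just structured natural language, often kept as a template the whole team can read and edit. The template and placeholder names below are our own illustration, not a standard:

```python
# Illustrative prompt template; domain experts can edit the wording directly.
PROMPT = """You are a support assistant for {product}.
Answer in two sentences, using only the context below.

Context:
{context}

Question: {question}"""

filled = PROMPT.format(
    product="Klu",
    context="Klu stores and versions prompt templates.",
    question="How are prompts versioned?",
)
print(filled)
```

Because the artifact is plain text, a PM or domain expert can propose a change by editing the template, and engineering only gets involved when the surrounding plumbing changes.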
The role of product managers and domain experts has evolved with the advent of LLMs. They are no longer one step removed from implementation but can directly shape AI products through prompt engineering. This not only saves engineering time but also shortens the feedback loop from deployment to improvement. Companies like Twain and Duolingo have successfully employed this approach, utilizing linguists and salespeople as prompt engineers to customize their AI models.
Despite the increased role of AI, the majority of an AI application still consists of traditional code. Full-stack engineers are responsible for building the majority of the application, orchestrating model calls, establishing the infrastructure for prompt engineering, integrating data sources to augment the model's context, and optimizing performance. Techniques like "finetuning" and "retrieval augmented generation" (RAG) are commonly used to optimize LLM performance.
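A toy sketch of the RAG idea mentioned above: retrieve the most relevant snippets, then stuff them into the prompt as context. Word-overlap ranking stands in for a real embedding model here, and the documents are invented for the example:

```python
import re
from collections import Counter

def bow(text):
    """Bag-of-words counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query and keep the top k."""
    overlap = lambda d: sum((bow(query) & bow(d)).values())
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query, docs):
    """Assemble a context-stuffed prompt for the model."""
    context = "\n".join(retrieve(query, docs))
    return f"Using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Streaming uses server-sent events to deliver tokens.",
    "Fine-tuning adapts a base model to your data.",
    "Caching repeated queries cuts latency and cost.",
]
print(build_prompt("How do tokens stream to the client?", docs))
```

Swapping the overlap score for vector similarity over real embeddings turns this sketch into the standard RAG pipeline, but the shape of the code stays the same: retrieve, assemble, generate.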
The 'AI Engineer' role has emerged as a crucial position that requires some familiarity but not deep expertise with AI. This role sits closer to product than research, bridging the gap between technical and non-technical teams.