Creating More Compelling AI Product Experiences

Thinking Beyond a Single Model

LLMs are incredibly powerful and, generally, very capable tools. I tend to think about them as being able to help with almost anything… a little bit. Not only do they require clear direction from someone who understands them, but they also generally require oversight, feedback, and some good old-fashioned work to get the job done right. Now that I say that, they’re not unlike my older kids…

Across the SaaS industry, we're (I hope) wrapping up this initial phase of integrating the “generic” LLM experience into existing products. Many of the tools I've been using for a while now have some sort of AI button or experience built in, and most of them deliver it through some kind of chat interface… which I'm personally really tired of.

There's a wave of new products being built with AI as part of their core technology. You have incredible products like Opus, or one I'm currently working on called Luster, that ship multiple features and experiences powered by AI from day one, designed to feel natural and intuitive based on what the user is trying to do.

The most impressive AI experiences I've seen aren't powered solely by LLMs; they leverage more specialized models or trained agents.

As more companies reach that level of integrated AI, the bar for what AI can do (and do well) will keep rising. Some companies are already there, but we'll see it more and more often with complex tasks that demand a high degree of accuracy and dynamic workflows that call for a range of specialized capabilities.

Interactions Between Models

Models can interact with each other, sharing information and insights, to enhance the overall functionality and user experience. You can see this in action right now if you use ChatGPT Plus and have played around with custom GPTs: you can now @-mention the GPT you want from any thread. So I might ask one of my product-focused GPTs for ideas and then ask Launchy Illustrator to create an image to accompany the topic.

Give it a try and share your Launchy with us!
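If you want that same kind of hand-off programmatically, it's just one model's output feeding the next model's input. Here's a minimal sketch using OpenAI's Python SDK; the model names and prompts are placeholders, and it assumes an API key is already configured in your environment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: ask a text model for an idea (stand-in for a product-focused GPT)
idea = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Pitch one feature idea for an AI-native CRM in two sentences.",
    }],
).choices[0].message.content

# Step 2: hand that idea to an image model (stand-in for an illustrator GPT)
image = client.images.generate(
    model="dall-e-3",
    prompt=f"A clean, friendly illustration for a newsletter about: {idea}",
    size="1024x1024",
)

print(idea)
print(image.data[0].url)
```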

The Shift from Generic to Specialized AI

Specialized AI models, such as text-to-speech (TTS) technologies, are designed to perform specific tasks with a high degree of precision. Unlike generic models, these specialized systems are trained on task-specific data, leading to improved performance and efficiency. For instance, Meta AI's Voicebox represents a significant advancement in generative speech AI: it can produce high-quality audio clips across multiple languages and perform noise removal, content editing, and style conversion. It outperforms previous state-of-the-art models in zero-shot text-to-speech tasks, demonstrating superior intelligibility and audio similarity while being significantly faster.

Training specialized models like Voicebox involves using vast amounts of relevant data (in this case, over 50,000 hours of recorded speech) to learn diverse speech patterns without requiring explicit labeling of variations. Similarly, Amazon's BASE TTS model was trained on over 100,000 hours of speech data and impressed researchers and businesses alike with more dynamic speech, complete with emotion and inflection. Compare the output from these tools to what we're used to hearing from Siri or Alexa and it's easy to see how significantly the end-user experience improves.

The lighter weight and specialized nature of models like Voicebox, Deepgram's Aura, and ElevenLabs (my current fave) significantly impact application performance and architecture, allowing faster response times and more natural interactions that are particularly valuable in real-time applications like IVR systems or AI agents for customer service.

The Power of Specialization

Research and development in specialized AI have led to models achieving tasks with a higher degree of accuracy and confidence. For example, Tortoise TTS excels in voice cloning and long-form content narration, making it ideal for creating virtual assistants or audiobooks with lifelike, customizable voices. This model leverages a unique combination of a Tacotron-style encoder-decoder and an audio compression autoencoder, facilitating efficient voice cloning and high-quality speech synthesis.

The problem with specialization is that it only solves one part of the equation or, in our case, creates one part of the experience. Maybe ElevenLabs is great at converting text to speech, but you still need to generate that text in the first place, right?

This is where multi-agent and multi-model setups shine. Instead of relying on one model’s ability to do all of the tasks at an acceptable level, you can build a specialized “team” to ensure that every piece of the customer/user experience meets your standards for quality, usability, and creativity.
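To make that concrete, here's a rough sketch of a two-model “team”: a general LLM drafts the script, then a specialized TTS model voices it. The `synthesize_speech` function is a hypothetical placeholder for whichever TTS provider you pick (ElevenLabs, Deepgram Aura, etc.); wire in their SDK where noted.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def draft_script(topic: str) -> str:
    """Generalist step: use an LLM to write the words."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Write a 30-second IVR greeting about {topic}."}],
    )
    return response.choices[0].message.content

def synthesize_speech(text: str) -> bytes:
    """Specialist step: hypothetical placeholder for your TTS provider's SDK
    (e.g., ElevenLabs or Deepgram Aura). Swap in their real client call here."""
    raise NotImplementedError("Plug in your chosen TTS provider")

script = draft_script("our new billing portal")
audio = synthesize_speech(script)  # each stage only does the job it's best at
```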

You can also use this approach for complex tasks outside of direct interactions. Maybe you have three models that are all OK at detecting fraud, but their OK isn't good enough to meet your customers' product requirements. You can use an ensemble technique: let each model make its prediction, then look at how many of the models agree. I had the chance to see this used in one of my first projects at EIG, and we watched it shatter the industry standard for accuracy, giving the company a huge competitive advantage and a solid technical moat.
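As a rough illustration of that voting idea, here's a sketch using synthetic data and off-the-shelf scikit-learn classifiers standing in for your three “OK” fraud models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for transaction data: 1 = fraud, 0 = legitimate
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three individually "OK" models each vote; the majority prediction wins
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier()),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```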

All of this is to say that you don't have to settle for the experiences that are possible with a single service like OpenAI. In fact, Logan Kilpatrick said in his interview with Lenny that OpenAI's model is a general-purpose one and that the big opportunity in the AI space is in specialized models and agents doing specific things really well.

Rising to the Challenge with Multi-Model and Multi-Agent AI

The integration of multi-model and multi-agent AI into SaaS products is paving the way for a new generation of intelligent applications. These systems can understand and act upon complex user inputs, offering more personalized and dynamic experiences. The concept of Large Action Models, for example, suggests a future where AI devices not only interpret user commands but also execute tasks on their behalf, moving towards a more proactive and interactive model of user assistance.

Rabbit.tech and OpenInterpreter.com are among the initiatives exploring the possibilities of AI-powered devices capable of taking actions based on complex user inputs. These developments indicate a shift towards more autonomous, intelligent systems that can support users in a wide range of tasks, from everyday activities to specialized professional workflows.

Making It Happen

Creating SaaS products that incorporate multi-model and multi-agent AI requires a detailed and role-specific approach to planning, designing, and building. Our whole goal with this newsletter is to make sure you walk away with ideas that inspire you and a clear path to take action, so here comes the actionable part.

For Product Managers

Focus on AI-Driven User Stories and Requirements:

  • When crafting user stories, emphasize scenarios where AI acts as an assistant or decision-maker, and outline how these interactions should feel to the user — even if it isn’t an explicit step the user takes. In fact, write out every step you expect (or hope) the AI to take.

  • Consider requirements that specify the AI's role in context, and take the extra step to identify what it needs to be successful. It's not unlike putting together an onboarding plan for a new team member: they need information, access, skills, and a whole lot of context.

Resources:

  • AI for Product Managers (Coursera Specialization): Provides a foundation in AI technology, applications, and product development lifecycle.

  • I also really enjoyed the 15-day bootcamp email series from NoCode.ai

For Designers

Designing AI-Interactive UIs:

  • Think about leveraging AI as a way to subtract from the interface.

  • Integrate clear visual feedback for AI actions, such as animations or icons that indicate when AI is processing or has completed a task.

  • Design multimodal interfaces that support voice, text, and touch input, providing flexibility in how users interact with AI features.

Interactive Experience Considerations:

  • Consider the transparency and predictability of AI interactions. Users should understand why the AI is making certain recommendations or decisions. Integrate the need for reasoning into your prompts to ensure consistent output and alignment on logic (see the prompt sketch after this list).

  • Plan for a feedback loop where users can correct or refine AI outputs, enhancing the system's accuracy and relevance.

  • Distinguish between human feedback meant for humans and human feedback meant for the AI. Feedback for the AI needs to be tied to a specific task or output; it doesn't replace the need for product teams to understand how people feel about their overall experience.
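One lightweight way to surface that reasoning is to ask the model for it alongside the recommendation so the UI can show both. This is a sketch, not a prescription; the prompt wording and JSON keys are assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

prompt = (
    "Recommend one next action for a user whose weekly active usage dropped 40%. "
    "Respond as JSON with two keys: 'reasoning' (why you chose it) and 'recommendation'."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # JSON mode, supported on recent OpenAI models
)

result = json.loads(response.choices[0].message.content)
print(result["reasoning"])        # show this to the user for transparency
print(result["recommendation"])   # act on this in the product
```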

For Engineers

There are new models, developer tools, and products for developers launching every day. My best advice to you is, “Start with research.” Check what the open source community has tried, is using, or has created. Hugging Face is a gold mine.

Choosing the Right Tools and Frameworks:

  • For orchestrating multi-model AI, consider using TensorFlow Extended (TFX) for end-to-end machine learning pipelines that can handle multiple models.

  • For multi-agent systems, explore frameworks like OpenAI Gym for developing and comparing reinforcement learning algorithms.

  • LangChain tends to be the standard for developing AI applications, and my team is eagerly awaiting a chance to try out Crew (a minimal LangChain sketch follows this list).
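For a taste of the LangChain approach, here's a minimal chain. It assumes a recent LangChain release with the LCEL pipe syntax and the `langchain-openai` package installed; adjust the imports to your version:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following support ticket in one sentence:\n\n{ticket}"
)
llm = ChatOpenAI(model="gpt-4o")

# Prompt -> model -> plain-string output, composed with the pipe operator
chain = prompt | llm | StrOutputParser()

summary = chain.invoke({"ticket": "Customer can't reset their password after the latest release."})
print(summary)
```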

Technical Strategy for Integration:

  • Utilize microservices architecture to modularly integrate different AI models, allowing for flexibility and scalability.

  • Implement robust APIs for seamless communication between AI components, ensuring data consistency and real-time responsiveness. A sketch of one such service boundary is below.
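Here's what one of those small services could look like, with each model living behind its own API. FastAPI is just one reasonable choice, and the endpoint, payload, and scoring logic are assumptions for illustration:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="fraud-scoring-service")

class Transaction(BaseModel):
    amount: float
    merchant: str
    country: str

@app.post("/score")
def score(txn: Transaction) -> dict:
    """One microservice, one model: swap the stub below for your real model call."""
    risk = 0.9 if txn.amount > 10_000 else 0.1  # placeholder heuristic, not a real model
    return {"risk_score": risk, "model_version": "stub-0.1"}

# Run with: uvicorn service:app --reload
```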

That’s it… What did you think?