Deep Research and Prompt Design

Designing interactions, conversations, and context with the help of LLMs.

Quick note: OpenAI seems to have made an update to custom GPTs that royally f’d up my Launchy Illustrator, so I’ll get back to our normal Launchy visuals when I figure that out.

The PMBS Tester Tool: A Quick Recap

I am having a blast building things with the help of AI. Between tools like Replit, Lovable, and Cursor, I’m able to do so much more than I ever could on my own. I decided that one of the best ways to share the fun with the rest of the world would be to document the process of building something. I picked a use case that is relevant to me and validated by all of the other product leaders (and teams) I’ve worked with… detecting bullsh*t.
The idea is that PMs/product leaders can use this tool to get honest and actionable feedback on their critical artifacts and deliverables like Product Requirements Docs (PRDs), product strategies, and even preliminary analyses of user data.

The initial result is something like:

  1. A structured score (like a “health check”).

  2. Actionable feedback on specific weak spots.

  3. Potential blind spots you might be missing in your rationale.

This simple project gives me a chance to work through requirements gathering and definition, research and prompt design, agentic workflow/orchestration, technical requirements, prototyping, and refining — which are skills and capabilities I think will be expected of all PMs in the near future.

Let’s dig into today’s post.

Measure Twice, Cut Once (a.k.a. Do Your Research!)

Don’t Make the AI Guess

When working with GPT or any AI, you can’t just say, “Review my doc, please!” You’ll get something generic back. Or worse, you’ll get a rambling 10,000-word response that looks official but says basically nothing. Instead, it’s crucial to:

  1. Define Your Goals: Are you looking for a simple review or a thorough critique?

  2. Specify the Format of the Feedback: Do you want bullet points? A table with a “score” and “explanation”? A comedic roast of your PRD?

  3. Provide Essential Context: If the AI doesn’t know why you wrote the doc or who it’s for, it can’t tailor the feedback.

Think of it like wiring a house. The electrician doesn’t show up and haphazardly poke around your walls—she reviews schematics, checks load requirements, and confirms voltage. Any confusion in the plan leads to wires that don’t connect properly later. Likewise, you want your AI to “see” the entire structure before offering the final blueprint.
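
To make those three elements concrete, here’s a minimal sketch of a review prompt that states the goal, the format, and the context up front. The doc type, audience, and section wording are placeholders I’m using for illustration, not the tool’s actual prompt.

```python
# A minimal sketch of a review prompt that bakes in goal, format, and context.
# The doc type, audience, and wording are placeholders for illustration.
def build_review_prompt(doc_text: str) -> str:
    return f"""
You are reviewing a Product Requirements Document (PRD).

## Goal
Give a thorough critique, not a summary. Call out gaps in problem framing,
success metrics, and scope.

## Context
This PRD is for an internal tool. The audience is the engineering team and
a VP who will approve the budget.

## Format
Return bullet points grouped under: Strengths, Weak Spots, Blind Spots.

## Document
{doc_text}
""".strip()


if __name__ == "__main__":
    print(build_review_prompt("...paste your PRD here..."))
```

Even this much structure forces you to answer the three questions above before the model ever sees the doc.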

Crafting Clear Prompts = Defining the Scope

The UI, the Data, and the User Experience

In the YouTube video I shared, I walk through the initial stages of building this PMBS Tester Tool. One big realization? The prompt design actually determines a lot about the functionality you’re building. If you want the AI’s final output to include a numeric score, you have to plan for that:

  • Interface: Provide text fields or file uploads for the doc, plus a “score weighting” section.

  • Data Structuring: Make sure your prompt requests structured output, and include an example of the format you want in the prompt. I like to give it the structure I want and also define my expectation for each item in line, like this:

## Format 

Score: the overall score for the document based on the rubric and normalized to a percentage. 

Feedback summary: One sentence that summarizes the state of the document based on our analysis. 

Recommendations: A list of actions the user can take to improve the artifact based on our analysis. 

Feedback: An explanation of the categories where the artifact did not meet our criteria for awesomeness, the specific criteria that the artifact failed to meet, and examples of content from the artifact that align with our assessment. 

  • Mental Framework: Decide what “good” or “bad” means for each doc type. A “perfect” PRD might differ drastically from a “perfect” user-interview analysis, so you need to decide on the context needed for each and the best way to structure it. In this case, I know that rubrics can be helpful when trying to quantify quality, so I am working on rubrics for each doc type and will pull them in as resources via prompt engineering later (see the sketch after this list for how a rubric and the format block might feed into a model call).
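
To make the Data Structuring and Mental Framework pieces a bit more tangible, here’s a rough sketch of how the format block and a rubric might feed into a single model call. It assumes the OpenAI Python SDK; the model name, reviewer persona, and rubric wording are placeholders, not the tool’s final design.

```python
# A sketch of wiring the format block and a rubric into one model call.
# Assumes the OpenAI Python SDK (`pip install openai`) with OPENAI_API_KEY set;
# the model name and reviewer persona are placeholders, not the final design.
from openai import OpenAI

FORMAT_BLOCK = """
## Format
Score: the overall score for the document based on the rubric, normalized to a percentage.
Feedback summary: One sentence that summarizes the state of the document.
Recommendations: A list of actions the user can take to improve the artifact.
Feedback: The categories where the artifact missed our criteria, the specific criteria
it failed to meet, and examples of content from the artifact that support the assessment.
"""

def score_document(doc_text: str, rubric: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you're prototyping with
        messages=[
            {"role": "system", "content": "You are a blunt but fair reviewer of product artifacts.\n" + FORMAT_BLOCK},
            {"role": "user", "content": f"Rubric:\n{rubric}\n\nDocument to score:\n{doc_text}"},
        ],
    )
    return response.choices[0].message.content
```

Keeping the format block in one place also means the UI, the prompt, and whatever parses the response can’t quietly drift apart.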

In other words, the scope isn’t just the application’s front end or back end anymore. It extends to how many separate AI “agents” or prompts you have, the tools you integrate for advanced analysis, and the potential edge cases.

If you want to score something, you need to answer questions like:

  • What does the score represent?

  • How do you define or identify examples that represent the range of scores? E.g., what does bad, mediocre, and good look like? I’ll probably use fun/cheeky terms like “Sh*t,” “Meh,” and “Pure Awesomeness” (sketched below).
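
Those answers can start life as a tiny lookup. The thresholds below are placeholder guesses I’d tune against real example docs, not settled numbers.

```python
# A sketch of mapping a normalized score (0-100) to the cheeky labels.
# The thresholds are placeholder guesses, to be tuned against real example docs.
SCORE_BANDS = [
    (0, 40, "Sh*t"),               # misses most rubric criteria
    (40, 75, "Meh"),               # covers the basics, thin on evidence and rationale
    (75, 101, "Pure Awesomeness"), # clear problem, evidence, and strategic fit
]

def label_for(score_pct: float) -> str:
    for low, high, label in SCORE_BANDS:
        if low <= score_pct < high:
            return label
    raise ValueError(f"Score out of range: {score_pct}")

print(label_for(62))  # -> Meh
```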

Prompt Design as the New Wireframing

When we think “wireframing,” we usually picture gray boxes with placeholders for text and images. With AI, the “prompt” is the wireframe. It’s where you lay out the functional parts and understand what you actually need to create the experience.

To build on the previous section:

  • If we want the user to get a score, we need to give the AI what it needs to score the document.

  • If we want the user to be able to provide a document, we need UI to upload it.

  • If we want supporting docs to be included, we need:

    • A way to differentiate the doc we are scoring from supporting docs.

    • Instructions/prompting to tell the agent what to do with the supporting docs.

    • Updated scoring criteria that factor in the use of supporting documents (see the sketch after this list).
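
One way to cover the first two needs is to tag the documents explicitly when assembling the prompt, so the model can’t confuse the artifact it is scoring with the reference material. This is a minimal sketch; the tag names are an arbitrary convention I’m assuming for illustration, not something already in the tool.

```python
# A sketch of separating the artifact being scored from its supporting docs.
# The tag names are an arbitrary convention assumed for illustration.
def assemble_context(primary_doc: str, supporting_docs: dict[str, str]) -> str:
    parts = [f"<artifact_to_score>\n{primary_doc}\n</artifact_to_score>"]
    for name, text in supporting_docs.items():
        parts.append(f'<supporting_doc name="{name}">\n{text}\n</supporting_doc>')
    parts.append(
        "Score only the artifact above. Use the supporting docs to check whether its "
        "claims are backed up, and call out places where it ignores them."
    )
    return "\n\n".join(parts)

# Example usage (hypothetical variables):
# prompt_body = assemble_context(prd_text, {"user_research.md": research_notes})
```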

Once you have that structure nailed, you can refine the details. For me, that’s adding a bit of personality or a unique vantage point. I also like to try out my prompts with different models as a sort of prototype inside the tools I already use. In this case, I will spin up a custom GPT or a new prompt in Gemini that just evaluates a doc and see how it feels without all of the agentic pieces added in.

Avoid Generic Criteria: Gather Real Insights

One pitfall in building “intelligent” testers is relying on the same old top-ten list of typical PM mistakes. That can lead to superficial feedback (“Don’t ignore your users!”). Nobody needs another fluffy, obvious tip.

Instead, I’ve been playing with “deep research” approaches in GPT Pro to pull examples from real case studies and interviews—the kind of gritty, genuine stories that highlight where PMs really get stuck. For instance:

  • Over-indexing on Early Feedback: Relying too heavily on a single user’s anecdotal complaint instead of weighting it against metrics and broader usage patterns.

  • Glossing Over Strategic Rationale: Presenting a flashy roadmap without explaining why it matters or how it aligns with the company’s higher-level vision.

I also just found out today that Perplexity lets you select your sources for their “Deep Research” option, and that one of them is conversations from social — which is perfect if I want to get ideas from real people. Click the image to see the results. Here’s where it thinks most PMs go wrong in their PRDs.

So Where Do We Go from Here?

  1. Keep Testing: Before locking your AI “prompts” into code, iterate! My first pass was messy: I tried to cram analysis, PRD critiques, and strategic outlines into one giant prompt. Not surprisingly, the AI’s output was all over the place. Now I’m experimenting with separate prompt templates for each doc type (see the sketch at the end of this post).

  2. Refine the Personality: Much like branding, your AI’s “voice” can shape how feedback is received. It could be encouraging or super blunt—just be sure you craft it intentionally.

  3. Build, Then Re-Build: Once I have a workable prompt design, I’ll move into breaking it down into the pieces the orchestration layer needs, expanding the prompts for each agent, and configuring the different tasks and workflows. This loop might be more time-consuming than you’d expect—don’t rush it.
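
For what it’s worth, here’s a rough sketch of what “separate prompt templates for each doc type” could look like before any of it gets wired into an orchestration layer. The doc types and instructions are placeholders, not the final rubric-backed prompts.

```python
# A sketch of separate prompt templates per doc type, instead of one giant prompt.
# The doc types and instructions are placeholders for illustration.
PROMPT_TEMPLATES = {
    "prd": (
        "Critique this PRD against the PRD rubric. Focus on problem framing, "
        "success metrics, scope, and whether the requirements are testable."
    ),
    "strategy": (
        "Critique this product strategy. Focus on the strategic rationale, the "
        "tradeoffs it makes explicit, and alignment with the company's vision."
    ),
    "user_research": (
        "Critique this user-interview analysis. Focus on sample size, over-indexing "
        "on single anecdotes, and how findings tie back to metrics."
    ),
}

def get_template(doc_type: str) -> str:
    if doc_type not in PROMPT_TEMPLATES:
        raise ValueError(f"No template for doc type: {doc_type}")
    return PROMPT_TEMPLATES[doc_type]
```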