
Remove Ambiguity: The Gallahue Method to Improve LLM Output and Efficiency

Pushing prompts that haven’t been optimized for efficiency and output control into a production environment is an unfortunately common and wasteful practice that drives up token budgets and kills progress.

The Gallahue Method addresses these problems with a simple but effective process for removing wasted tokens and improving overall output.

Scenario:

Let’s imagine I’m creating a website and I want to list the 10 best restaurants in every major city by cuisine type. Lacking the time and budget to visit and catalog these places personally, I write two prompts to pull data from the web and generate JSON output I can quickly loop through with N8N or another tool and then push into a CMS.

Prompt 1:

For New York City, I need a bullet point list of the 10 best restaurants overall and the 10 best restaurants for the following cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with entry for each restaurant under corresponding list.

Prompt 2:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

(Note: For both examples I used a single prompt, but in practice the setup would be a system and user prompt where the city/restaurant value is specified and, of course, web search enabled.)

Leveraging the Gallahue Method to Improve Prompt Output

On the surface, both prompts seem reasonable in the sense that the average person can read them and understand what’s being asked and what the expected output should be.

However, both have multiple flaws that will lead to degraded, poorly articulated output, so let’s use my method to refine them:

The Gallahue Method (DGRC) for Prompt Optimization:

  • Define – Strictly define inputs and outputs
  • Guide – Provide LLMs with easy paths to resolution
  • Review – Check prompt for ambiguity until cleared
  • Compress – Refine optimized prompt into simplest form for token efficiency

Defining Inputs and Outputs

Both prompts suffer from a lack of defined inputs and outputs. Specificity is critical to ensuring you get output that is of value and not a waste of tokens.

Silence of the Lambs is actually a weird but effective way to think about managing definitions. In a pivotal scene, Hannibal Lecter summarizes Marcus Aurelius by saying “Ask of each thing: What is its nature?” and that’s how you should approach writing production ready prompts.

Is this word, ask, input, or output requirement clearly defined to the point that if I run this prompt 100 times on different accounts with no personalization, I will get the same answer format every time? If not, how do I change it so that the output normalizes?

Let’s look at how the define exercise works in practice by highlighting ambiguous or poorly defined terms:

Prompt 1:

For New York City, I need a BULLET POINT LIST of the 10 BEST restaurants OVERALL and the 10 BEST restaurants for THE FOLLOWING cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with ENTRY FOR EACH RESTAURANT under corresponding list.

Bullet point list: While this is a tangible output structure, what specifically is the output? Is it the restaurant name? A restaurant name and address? Do you want an ordered ranking and a summary of each restaurant? If so, what are the criteria?

In this instance, the LLM must make a best guess as to what you want and depending on model and mode, you’ll either get a compact list or a 2,000 word dissertation on fine dining spots.

Best/Overall: Poor word choice is a mistake that even the most skilled prompt engineers make. Ambiguous but well-meaning terms like best or overall are open to interpretation, and in my experience LLMs will infer you’re looking for a composite of brand exposure and high review counts and ratings on Yelp/G2/Trustpilot/Capterra, which may not always align with your thinking.

In this case, New York City has no shortage of highly rated/reviewed restaurants and you could make individual lists for Midtown, Brooklyn, Chelsea etc … and have no degradation in quality of inclusions.

In this example, one might replace best or overall with a concrete requirement, such as a rating of 4.6 or higher on specific site(s) and at least three years in operation, to refine the list.

The Following: Absolutely critical point here … when you present LLMs with a list and do not gate the output, it’s likely you will get additional content that does not conform to what you want.

LLMs are instructed in system logic to be accommodating and because of this, they will often assume you would want additional output unless instructed otherwise. In this case because you want to build a recommended restaurant list … the LLM completing the prompt could naturally assume you want to cover all cuisine types and add Indian, Thai or vegan to the output.

This is why you should always gate your output, using phrases like: “Do not include additional categories,” “Only generate for the items in this list,” etc …

JSON Entry for Each Restaurant: The issue here is the same one we just identified above: we ask for output but don’t define exactly what we want, leaving the LLM to infer the structure from the pattern.

Furthermore, we haven’t specified that we only want the JSON, so depending on the LLM and its settings, we may get summary text, introductions or status confirmations that do not conform to our output requirements.
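To remove that guesswork, one option is to spell out the schema in the prompt itself. Here is a minimal sketch of what a defined output for Prompt 1 could look like; the field names, the single rating source and the example values are my own assumptions for illustration, not part of the original prompt:

{
  "city": "New York City",
  "lists": {
    "overall": [
      {
        "rank": 1,
        "name": "Restaurant Name",
        "neighborhood": "Neighborhood",
        "rating": 4.6,
        "rating_source": "Google"
      }
    ],
    "american": [],
    "chinese": [],
    "sushi": [],
    "pizza": [],
    "steakhouses": []
  }
}

Pairing a skeleton like this with gate lines such as “Return only this JSON, exactly 10 entries per list, no additional categories” gives the model a single, unambiguous target.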

Let’s take our definition exercise to the next prompt:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

I know 95% of the prompt is highlighted but it will make sense as we go line by line:

Eleven Madison Park: The restaurant has a unique name, but if this were run for a common restaurant name like Great Wall, Taste of India or Hibachi Express (all of which have 30+ unrelated restaurants in the United States), you would quickly introduce a best-guess situation where models have to infer which location is correct. In this example, including an address as part of the prompt would remove any chance of location inaccuracy.

Cuisine Type: In this instance we set the cuisine categories in the previous prompt, but we are not asking this subsequent prompt to conform to those same labels, so the value returned here may not match the list the restaurant was filed under.

Average Price Point: The ambiguity here is that we don’t explicitly say whether we want the average cost of a meal (and whether to output a range or a specific dollar amount) or whether we want a tiered metric like $, $$, $$$ (aligned, of course, to cost tiers we would set) that you commonly see in restaurant guides.

Star Rating: When it comes to restaurants, we could be referencing Michelin stars or a traditional 1-5 star value. If we mean the traditional star value, we also need to specify where to pull that value from and whether it’s a single source or a composite.

Executive Chef: Restaurants commonly post this on web, press and social materials, but in the instance they do not, or they have recently parted ways with said individual, we need to give a resolution path so that the LLM doesn’t waste tokens going down a rabbit hole trying to find your answer. For something like this, I’ll commonly add a line like: “If a value cannot be determined, output NA.”

Acclaimed Dishes: In the previous section, we talked about terms like best/overall and what LLMs might infer from them. Acclaimed is another example that, while it seems defined, is still open to vast interpretation. Are you looking for dishes specifically referenced in high-value source reviews, or for the most commonly mentioned dishes across a review set? Again, another point to define.

150 Word Summary: In the request we want a 150 word summary of why the restaurant is exemplary. We need to define exemplary, and to do that we should give the LLM example points to highlight: Does the restaurant have a Michelin star? Has it been voted a top 100/50/10 restaurant in the world or in its local market by the following sources ( ) …

Output as a JSON: Same feedback as above where you need to make sure that only the JSON is output and no intro text, summaries, notes etc …
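Putting those fixes together, a defined version of Prompt 2’s output might look something like the sketch below. The exact keys, the price tiers and the NA fallback are illustrative choices of mine, not a standard:

{
  "restaurant": "Eleven Madison Park",
  "address": "Street address supplied in the prompt",
  "year_opened": "YYYY",
  "cuisine_type": "One of: American, Chinese, Sushi, Pizza, Steakhouses",
  "price_tier": "$ | $$ | $$$ | $$$$",
  "michelin_stars": "0-3 or NA",
  "executive_chef": "Full name or NA",
  "acclaimed_dishes": ["Dishes cited in at least two of the allowed sources"],
  "summary": "140-160 words citing specific accolades from the allowed sources"
}

Every field now states its expected format or its fallback, so two runs on two different accounts should return the same shape even when an underlying fact can’t be found.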

Guiding LLMs

The second component to improving efficiency and quality of LLM output is to guide LLMs to key sources they can rely upon for conclusions.

When it comes to restaurant reviews and best of lists, especially for major markets, there’s no shortage of sources from Michelin Guides to YouTube videos shot in someone’s car of hot takes, dish reviews and tier rankings.

That being said, when you ask an LLM to put together a list like this, it might default to sources that aren’t the most valuable. For instance, LLMs tuned to favor certain sources may prioritize Reddit over Resy, Eater or The New York Times. In that case you would get snippet content from anonymous accounts instead of long-form detail from named authors with a contextual body of work.

Unfortunately most people don’t guide their prompts with preferred sources and that leads to issues like the one above as well as wasted reasoning tokens as LLMs try to work out what sources would be best for a request and then search for relevant blocks on those sources.

How to “Guide”

Guiding is simply appending the sources you believe are best aligned to the output you want to achieve.

If I want to generate content about the 5 best new SUVs for families I can specify that my preferred sources or only sources to consider for composite research are Consumer Reports, Car and Driver and Motor Trend.

Subsequently, I can also list sources to ignore. For instance, let’s say I want to generate a balanced piece on George W Bush’s presidential legacy. I will want to exclude sources that either overly praise or overly criticize his time in office in favor of sources that are more objective and in line with what I want to generate.

By doing this, we take the guesswork out of LLMs trying to infer what the ‘best’ source is for an item and instead go straight to search and processing.
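As a sketch, the SUV example above could carry its guidance as a small allow/deny block appended to the prompt; the key names here are my own convention, not a platform feature:

{
  "preferred_sources": ["consumerreports.org", "caranddriver.com", "motortrend.com"],
  "excluded_sources": ["reddit.com", "quora.com"],
  "source_rule": "Base rankings only on preferred_sources; if a claim cannot be supported by them, output NA rather than citing other sites"
}

The same guidance works as plain sentences in the prompt; the point is that the model never has to decide for itself which sources count.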

Review Prompts with LLMs

Before you even ask an LLM to generate a single piece of output, you should always put the prompt itself in front of the model and ask it to review for points of ambiguity.

Iterating on a prompt that is functionally deficient is a waste of resources, whereas building the right prompt first and then fine-tuning output is the far more efficient path.

Here’s the prompt I use to critically analyze my work:

You are a Prompt Auditor. Evaluate the target prompt strictly and return JSON that follows the schema below.

RUNTIME ASSUMPTIONS

  • Timezone: America/Chicago. Use ISO dates.
  • Reproducibility: temperature ≤ 0.3, top_p = 1.0. If supported, seed = 42.
  • Do NOT execute the target prompt’s task. Only evaluate and produce clarifications/rewrites.

DEFINITIONS

  • Components = { goal, inputs_context, constraints, required_process, websearch_dependencies, output_format, style_voice, examples_test_cases, non_goals_scope }.
  • Ambiguity categories = { vague_quantifier, subjective_term, relative_time_place, undefined_entity_acronym, missing_unit_range, schema_gap, eval_criteria_gap, dependency_unspecified }.
  • “Significant difference” threshold (for variance risk): any of { schema_mismatch, contradictory key facts, length_delta>20%, low_semantic_overlap }.

WEBSEARCH DECISION RULES

  • Flag websearch_required = true if the target prompt asks for: current facts/stats, named entities, “latest/today”, laws/policies, prices, schedules, or anything dated after the model’s known cutoff.
  • If websearch_required = true:
  • Provide candidate_domains (3–8 authoritative sites) and query_templates (2–5).
  • If you actually perform search in your environment, list sources_found with {title, url, publish_date}. If you do not perform search, leave sources_found empty and only provide candidate_domains and query_templates.

OUTPUT FORMAT (JSON only)

{
  "ambiguity_report": [
    {
      "component": "<one of Components>",
      "span_quote": "<exact quote from the target prompt>",
      "category": "<one of Ambiguity categories>",
      "why_it_matters": "<1–2 sentences>",
      "severity": "<low|medium|high>",
      "fix_suggestion": "<concrete rewrite of the span>",
      "clarifying_question": "<question to ask the prompt author>"
    }
    // …repeat for each issue found
  ],
  "variance_risk_summary": {
    "risk_level": "<low|medium|high>",
    "drivers": ["<driver of variance>"],
    "controls": ["Set temperature≤0.3", "Specify schema", "Pin timezone", "Add ranges/units"]
  },
  "resolution_plan": [
    {
      "step": 1,
      "goal": "<what this step resolves>",
      "actions": ["<specific action>"],
      "acceptance_criteria": ["<measurable criterion>"],
      "websearch_required": <true|false>,
      "candidate_domains": ["<domain>", "…"],
      "query_templates": ["<query>", "…"],
      "sources_found": [
        {"title": "<title>", "url": "<url>", "publish_date": "YYYY-MM-DD"}
      ]
    }
    // …continue until the prompt is unambiguous and runnable
  ],
  "final_rewritten_prompt": "<<<A fully clarified, runnable prompt that incorporates fixes, constraints, and exact output schema.>>>"
}

TARGET PROMPT TO EVALUATE

PROMPT_TO_EVALUATE: """[PASTE YOUR PROMPT HERE]"""
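To make the schema concrete, here is the kind of entry the auditor might return for the restaurant prompt earlier in this post (illustrative only; wording and severity will vary by model and run):

{
  "component": "output_format",
  "span_quote": "Output as a JSON with entry for each restaurant under corresponding list",
  "category": "schema_gap",
  "why_it_matters": "The fields for each entry are undefined, so the model must guess the schema and the output will vary from run to run.",
  "severity": "high",
  "fix_suggestion": "List the exact keys each entry must contain and state that only the JSON should be returned.",
  "clarifying_question": "Which fields should each restaurant entry include?"
}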

For the simplest of prompts, I’ll often get a laundry list of output flags and that’s even with years of experience writing and reviewing prompts.

Compress Compress Compress

The final step is to compress the prompt. Compression, much like minifying JavaScript on a site, doesn’t do much for a prompt you’re running a few times, but for a prompt you’ll potentially run millions of times, reducing unnecessary characters, words and phrasing can save thousands of dollars in the long run.

Compression is both removing words and optimizing order of operations so that prompts run quickly and efficiently.

What does compression look like?

Let’s take a basic prompt

“Write a 200 word summary of Apple’s iPhone 17 launch event. Only talk about the new products and updates introduced and don’t talk about influencer or market reactions. Write it from a neutral point of view and don’t overly praise or criticize their launch event”

It’s pretty well defined, but notice how we ask for the output in the first sentence and then add refinements in the following sentences? That ordering can create invisible loops where a reasoning path starts only to be redirected by subsequent instructions.

Second, we use phrases like launch event multiple times, which may not seem important if we’re running a prompt once, but if this is a foundational prompt for a SaaS product that will be run millions of times, it’s going to waste thousands of dollars in tokens reading the same words twice.

Here’s the compressed prompt:

Write 175-225 words in neutral plain English covering only new products and updates from Apple’s iPhone 17 launch event.

It’s to the point, efficient and doesn’t restate words multiple times. One key point is not to over-compress and accidentally remove key guardrails. Always err on the side of caution, and again, if you aren’t going to be running the same prompt many times, compression may not be necessary.

Putting it All Together Using the Gallahue Method

Consider this prompt for someone looking to write a nutrition guide. I’ve started it at a base level with no definition, guiding or review:

“Hey ChatGPT I need you to write a 600 word article about Kale. I want you to cover kale from three different angles. The first is a overview of what kale is and why it’s considered a superfood due to nutrient density, vitamins A,C, and K, fiber, antioxidants and how those are important for heart, bone and eye health. Second can you cover common diets that use a lot of kale like vegan, keto, paleo and mediterranean diets and finally can you give four examples of how someone can use kale in their day to day cooking? Ideally raw salads, sautéed dishes, smoothies and kale chips.”

On the surface it seems basic and self-explanatory, but if I’m generating the same output for 30-40 foods or ingredients, there’s inefficiency that will slow the process down and may not lead to the best output.

Here’s the optimized version that removes ambiguity and provides specific instruction for output:

Write 550–650 words in plain English about kale. Output only the article text with exactly three H2 headings: Overview, Health Benefits, Diet Uses.
Cover: what kale is and common varieties; why it’s considered a “superfood” (nutrient density—vitamins A, C, K; fiber; antioxidants—and key benefits for heart, bone, and eye health); and how to use it across diets (Mediterranean, vegan/vegetarian, keto, paleo) with four brief prep examples (raw salad, sautéed, smoothie, baked chips).
Keep a neutral, informative tone. Avoid fluff, anecdotes, lists, citations, or calls to action.

It’s to the point, specific about output and leaves no doubt as to what the LLM needs to generate in order to satisfy the output requirements. Using my methodology on your prompts going forward will not only improve your understanding of how LLMs reason about output but also improve the overall delivery of material.

In closing …

It’s important to realize no prompt will be perfect the first time you run it and in fact many of the prompts I work on have gone through 30-60 revisions in staging before they ever make it to production.

Think of this process as a debugging step for human language and it will all make sense.


ChatGPT Custom API Call Template for N8N

Maybe it’s a byproduct of Google search getting progressively worse for narrow outcome searches or the fact that it may not yet exist – but I couldn’t find any JSON templates for custom calls to ChatGPT.

For any low/no-coders using N8N to prototype AI wrappers or just play with its call capabilities, you may want to experiment with more settings than the managed call environment offers. I want to make that transition easier for you with this JSON template, which covers the main settings and configs you’d want to pass or experiment with.

Managed Call vs Custom Call

N8N and other no/low-code platforms offer managed ChatGPT flows that make it easy to message models, but as your skillset grows, and maybe your curiosity when it comes to temperature and top_p settings, you’ll find these managed calls won’t work.

In addition to fine-tuning settings, everything you’ll be using in your mini-app will be in JSON payloads so why not have your call in JSON as well?

ChatGPT Custom Call Template

To set this up:

  1. Create an HTTP Request step in your flow and link it
  2. Set Method to POST
  3. Set URL to https://api.openai.com/v1/responses
  4. If you’re using managed calls, you will have already set up your authentication. You can pass your key as part of a custom call, but for this example let’s keep things simple
  5. Set ‘Send Query Parameters’ / ‘Send Headers’ to OFF
  6. Set ‘Send Body’ to ON
  7. Set Body Content Type to JSON
  8. Set Specify Body to JSON
  9. Customize and paste this template

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "System Prompt Text"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "User Prompt Text"
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  },
  "reasoning": {},
  "tools": [
    {
      "type": "web_search_preview",
      "user_location": {
        "type": "approximate",
        "country": "US"
      },
      "search_context_size": "medium"
    }
  ],
  "temperature": 0,
  "max_output_tokens": 2048,
  "top_p": 0,
  "store": true
}

So what are we setting in the above? Let’s talk through it:

  1. System Prompt
    • How you want the system to function
  2. User Prompt
    • What you want to generate in reference to the system prompt (a parameterized sketch follows this list)
  3. Model
    • Which model you want to use
  4. Websearch
    • Whether you want web search enabled
  5. Search_context_size
    • How much web search context is pulled in; for more involved prompts you may want to raise this from medium to high
  6. Temperature
    • A value of 0 makes output nearly deterministic, always favoring the most probable continuation
    • A value of 1 allows far more varied, creative output that may stray from well-supported answers
    • You can set any value in between to fine-tune
  7. Max_output_tokens
    • This value caps output length. The ceiling depends on the model; as a rough rule, 1 token equals about 0.75 words, so the 2,048 in the template is roughly 1,500 words
  8. Top_p
    • Similar to temperature: when the model is choosing the next word, a low value restricts it to the most probable choices for predictable output, while 1 considers the full distribution for more creative responses
    • You can set any value between 0 and 1 to fine-tune output
  9. Store
    • A true value means the conversations will be available in your dashboard
    • A false value means you don’t want to keep the conversations
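If the city or restaurant value comes from an earlier node in your flow, one option is to drop an n8n expression into the text fields. Here’s a sketch of just the input block, assuming the upstream node outputs a field named userPrompt (that field name is my own placeholder, not part of the template above):

"input": [
  {
    "role": "system",
    "content": [
      { "type": "input_text", "text": "System Prompt Text" }
    ]
  },
  {
    "role": "user",
    "content": [
      { "type": "input_text", "text": "{{ $json.userPrompt }}" }
    ]
  }
]

Remember to switch the body field to expression mode so the {{ }} value resolves before the request is sent.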

How Glassdoor and Indeed Reviews Impact AI Brand Perception

I’ve worked with companies that obsessed over their Glassdoor reviews like Daniel Day-Lewis obsesses over every aspect of his character. I’ve also worked at companies that couldn’t care less and saw the platform as a way for disgruntled employees to take parting shots when things didn’t work out.

Whether your brand is on one side of the spectrum or more in the middle, you need to start paying attention to the roles, content and context of your inbound reviews and ratings because as you’ll see in this case study, it can have global implications.

Ranking Global IT Firms

Having come from 2.5 great years at TCS, I decided to test RankBee’s (in development) AI brand intelligence tool for the top global IT firms. Similar to the rating scale in my post on Luxury Brands, I pulled in global brand metrics and then segmented the results for a competitive set of Accenture, TCS, Infosys, Cognizant and Deloitte.

After feeding a myriad of data points into our system, the numbers showed Accenture was the overall category leader and my old firm TCS was 3rd in the group based on scores. Most of the brand factor scores were between 8.5-9.5, but what stuck out was that the lowest score across these factor metrics was on delivery, which is the lifeblood of IT client servicing.

You’ll notice the score sits under the heading ‘IT Firm 5’ because, selfishly, I want to keep some level of mystery/engagement around which brand it could be, and also because I suspect some people would read the name and say “Makes sense” or “I’ve heard that about them” without diving deeper into why LLMs are generating this in the first place.

“Inconsistent Delivery and Slower to Innovate”

I began diving into the reason for the low score and first looked at the sources being pulled in to generate both the positive and negative sides of the score/perception.

On the positive side, external sources like press releases, C level interviews and industry awards for client satisfaction were being pulled in – alongside the client’s own case studies and investor relations pages talking about internal client survey results.

For most brands, this would form a very positive score of at least 8.5 but unfortunately for this IT firm, there was quite a bit of content that subtracted from the score – bringing it nearly a full point below their next competitor.

Glassdoor and Indeed’s Impact on AI Brand Perception

When looking at what was forming the basis for the low delivery score, there was a multitude of citations from Glassdoor and Indeed, and when you think about it, it makes sense why these sites would have such an outsized impact.

The cited sources were review category pages for roles like delivery manager, project manager, delivery lead, account manager, etc., and as I read through page after page of detailed, largely critical employee experiences, it became clear why LLMs would treat these as a rich data source for informing their opinion on delivery.

First you have the context that the material is coming from someone whose role would have been intimately familiar with that particular attribute of the consideration process – in this case project delivery and management.

Second is that reviews, especially on the employment side, tend to highlight specific, contextually relevant items in a long form fashion that would give LLMs even further contextual elaboration as to the company’s ability to meet certain functions.

Finally a single review won’t be enough to establish a trend but with the number of employees these firms have spread across multiple geographies and the levels of turnover natural to these operations, there’s a considerable volume of data and with that potentially enough material for LLMs to start seeing trends.

Could the Score Actually Impact Deal Flow?

IT services at this level are largely sold through relationships where you’ll have 10-15 people on an account constantly meeting and interacting with clients and those connections are largely what drive deal flow vs an enterprise brand going to Google or ChatGPT and trying to find service providers.

However as companies continue to look for ways to further reduce overhead, procurement functions are likely to be more and more AI driven and it’s conceivable that in the future, software will automatically determine which approved vendors can and cannot bid for a contract.

In that reality, which I feel is not too far away, companies like this one would see their lower delivery score begin to impact their sales pipeline in meaningful ways and may not even realize why until it’s too late.

So what should brands do?

Brands need to pay closer attention to the macro trends that emerge from review sites and give them proper consideration, rather than ignoring them or falling into the habit of viewing a site like Glassdoor or Yelp as merely a place for venting frustration.

As seen in this example, a trend established over time with contextually relevant content from contextually relevant individuals can have an outsized impact on a global 100,000 employee firm.

Now that we can see how this impacts LLMs, brands should incorporate public exit “interview” trends as a core part of their annual internal reviews and use them to inform areas of structural and process improvement, bucking those trends and establishing better baselines going forward.

Keep in mind that LLMs won’t laser focus on a single source to form a perception and in this case, there is a considerable amount of positive content talking about this company’s delivery strength – however brands need to do what they can to impact sources of doubt and continue to build on sources of strength.

Post originally written by me on LinkedIn April 1, 2025.


AI’s Most Loved and Loathed Luxury Brands

Since opening our beta, the RankBee team has collected a large amount of prompt data from our early adopters – and one of the emerging trends is how ChatGPT and other LLMs perceive luxury brands.

In the same way that we humans look at luxury as a balance of name and quality, LLMs use that same reasoning when examining whether a consumer is paying for a durable, quality heirloom or simply a label.

Every Brand Starts with a Blank Slate

LLMs don’t have a natural programming for or against luxury goods or their buyers. These systems are blank slates to be filled by the world’s available content and while LLMs ingest the works of Karl Marx, they also take in the latest fashion news and influencer recommendations. That is to say that if a LLM became human, it would be just as likely to go on strike as it would to shop SoHo’s boutiques.

Thus the perceptions below are LLMs drawing broad conclusions from the billions of writeups, posts, reviews, user experiences and ephemeral musings available for them to reference.

Brands focused heavily on quality, utility and making the purchase and ownership experience truly memorable will see that reflected naturally in the content their customers produce. However brands that have focused on building a label first and the rest of the experience second can find themselves exposed.

How RankBee Scored Each Brand

Below is a table comparing three of the top brands in luxury fashion with our RankBee AI score across key consideration points for luxury products. Hermès is the highest rated brand in luxury, whereas Gucci is toward the bottom of the list for major luxury marks.

While Chanel scores very high and is on the cusp of being a top 5 brand, quality concerns raised over the last two years in both news articles and key influencer forums permeated the data and knocked it just barely below the top tier.

(Want to know what LLMs like ChatGPT are saying about your brand and how that impacts your visibility? We can help)

AI’s Highest Rated Luxury Brands

The following brands scored highest in RankBee’s AI brand power metric which takes into account multiple output factors across LLMs and key purchasing considerations to determine their score.

Note: The Brand, Quality and You’re paying for summaries you see below are composites generated by the LLMs based on the total prompt data collected.


Hermès | 9.6

Brand: The ultimate in luxury. Think Birkins, Kellys, and a decades-deep waitlist. Screams quiet power.

Quality: Impeccable. Hand-stitched leather, precision, and heritage craftsmanship at its peak.

You’re paying for: Legacy, exclusivity, and craftsmanship that outlives trends.


Bottega Veneta | 9.5

Brand: Stealth wealth with a modern edge. Known for the signature Intrecciato weave and minimal branding.

Quality: Superb leather and construction, especially under the newer creative direction.

You’re paying for: Texture, taste, and a logo-free flex.


Brunello Cucinelli | 9.5

Brand: The “King of Cashmere.” Understated, refined, and morally polished (literally runs a “humanistic” company).

Quality: Pristine materials, subtle tailoring, and that soft-spoken luxe vibe.

You’re paying for: Feel-good fashion—ethically made, ultra-luxe basics.


Loro Piana | 9.5

Brand: Peak Italian quiet luxury. Whispered among those who know. No logos, just pure fabric elitism.

Quality: Unmatched textiles—cashmere, vicuña, baby camel hair—woven like a dream.

You’re paying for: The feel of luxury. Softness, subtlety, and supreme understatement.


The Row | 9.4

Brand: Olsen twins’ brainchild turned cult minimalist label. Fashion editors’ and insiders’ uniform.

Quality: Tailored to perfection, rich fabrics, and that rare “nonchalant but $$$” vibe.

You’re paying for: Understated elegance with architectural precision.


The most interesting note is that the top 4 are long-established heritage houses, each scoring very high for quality, heritage and status, whereas The Row has only been around since 2006.

AI’s Lower Rated Luxury Brands

What you’ll notice is a mix of old world and new world labels, some of which revel in their perceptions while others are works in progress when it comes to defining the next chapters of their brand story.


Gucci | 8.2

Brand: Lots of mass production, especially under Alessandro Michele’s reign when things got maximalist and heavily logo-driven.

Quality: Accessories and shoes can be solid, but not always commensurate with the price.

You’re paying for: A loud, recognizable label.


Christian Louboutin | 8.1

Brand: Red bottoms have icon status, but they’re known for being painful and not particularly durable.

Quality: They look beautiful but comfort and wearability? Not their strong suit.

You’re paying for: The red sole.


Balenciaga | 7.2

Brand: Wild price tags for items like destroyed sneakers or t-shirts with minimal design.

Quality: Sometimes decent, sometimes questionable — definitely inconsistent.

You’re paying for: Hype, irony, and edgy branding.


Off-White | 6.8

Brand: Basic tees and hoodies with quotation marks and logos for $$$.

Quality: Streetwear-level; decent but not luxury-tier.

You’re paying for: Virgil Abloh’s legacy and street cred.


Supreme | 6.6

Brand: T-shirts and accessories marked up 10x on resale.

Quality: Meh. Often Hanes-quality tees with branding.

You’re paying for: Exclusivity, hype, and resale culture.


How does this impact the bottom line?

For many of the brands on the lower end of the scoring spectrum, this likely won’t change anything among their core customer group. Brands like Supreme and Off-White have communities of raving fans who live and breathe their brand’s ethos and would likely look at some of the negative factors and shrug them off as part of the overall experience.

However these scores, high or low will have three key impacts over time:

Brand Visibility in LLMs: AI looks to provide the most contextually relevant answer for a prompt and low brand scores mean that opportunities to appear consistently for all product lines, even for legacy brands, could be reduced. However smaller labels achieving higher scores over time with the right mix of quality and experience can overindex relative to their reach and revenue.

New to Brand Consumers: People saving for a luxury purchase and paring down their list of brands for that first bag, outfit, etc … could leverage AI as a litmus test for whether buying a specific brand or item will provide the benefits they hope to achieve and poor marks could shift prospective buyers to other labels.

Long term brand trends: AI adds another layer of complexity for brand and PR teams measuring perception, and when LLM data reinforces declines or slides in experience, it will further complicate efforts to get things back on track.

So How Do Brands Fix Their Perception?

ChatGPT is ultimately a reflection of available data and not inherent bias. Brands need to open their eyes to the fact that their customers and their individual experiences in aggregate have the power to significantly shape brand perception to a degree that outweighs what can purely be controlled from a PR, advertising and messaging perspective.

The first step is to understand how LLMs perceive a best in class product for your category and once you have that foundation you can then see how your brand is perceived, what sources drive those perceptions and how high or low scores are ultimately impacting your ability to be found when people search in LLMs.

Fixing perception requires that brand and marketing teams work together to define and execute on specific action items and messaging changes throughout their ecosystem. Brands that can get these groups aligned will find themselves with a significant advantage in AI search.

Exploring RankBee and Bespoke AI Research

The RankBee brand power score will be coming in a future release of the app and in the meantime if your brand is looking for bespoke research on consumer perception, content gaps, reputation blind spots and optimization, feel free to email me will@rankbee.ai and I’ll be happy to develop a custom plan to help your brand take control in LLMs.

This post originally appeared on my own LinkedIn


Generative AI: How Close Are We to a Crossover Point?

When Toys”R”Us debuted their Sora-generated ad last week, it set a new benchmark in terms of how quickly we are approaching a generative AI media future. While the ad itself shows that the underlying technologies still need quite a bit of improvement before they’re ready to generate high fidelity media, it is nonetheless an achievement and a building block for future use.

What does this mean for the future of creative work and how will brands decide what to shoot in real life and what to generate instead?

What is the AI Crossover?

The current creative workflow is to draw, shoot, record and model out creative before using computers to then edit, correct and sequence assets into a final format. The crossover is the point where the majority of media (static images, videos) is generated by AI first and then subsequently edited by humans into a final format.

That crossover represents both a technology and a confidence hurdle for generative AI, and in the Toys”R”Us ad it’s clear that a lot of post-production work had to go into the final product just to make it function. Beyond generative issues with perspective, the clip featured noticeable edits that likely weren’t part of the original generated creative.

For a brand that has access to strong post-production teams, the generative AI crossover will likely be sooner rather than later as they can afford to work with middle of the road assets to build a completed product. However for smaller brands, the level of fidelity needs to be much higher before they can confidently begin building in AI.

Thus while adoption will continue to accelerate in some areas, the technology is still not at a level where it can universally meet the creative needs of all brands.

Cost to Shoot vs Cost to Generate

Once the technology gets to a point where generating high fidelity creative is possible, then the question as to whether creative will follow a traditional workflow vs a generated one will come down to cost.

Depending on the provider, it can cost anywhere from $0.05 to $0.12 to generate a single image based on token pricing. A 30 second commercial at 30 frames per second is 900 frames, which puts a single rendered pass at roughly $45-$110, but that doesn’t include the iterations needed before you have a final product, in addition to any renders you make for storyboards, concepts, etc …

I would venture to suggest that the cost just to render a 30 second ad would easily be $4-5k all in, and that’s before you add any additional costs for pre/post production, sound, music licensing, or human voiceovers. Thus the true cost would be a multiple of your render costs based on how much additional work is needed.

Given these parameters, brands looking to invest in Generative AI for use throughout their creative ecosystem should begin documenting and comparing costs between traditional ad creation and generative to determine which will be the better use in different scenarios.

Starting with the medium, audience and use case, brands should develop T-table analyses of traditional costs like talent, locations, crews and pre/post production versus pure rendering and editing to begin creating decision trees for their projects.

AI is Workflow Ready

While Generative AI may not yet be the right tool for producing final assets, it’s clear that the technology is ready for daily use as a way of creating sketches, proof of concept and storyboard pieces that can be used to inform high fidelity creation.

I know there is a lot of concern about the impact of AI on creative fields but at the very least adopting it as a workflow solution as an individual, agency or department will produce better returns in the long run.

Furthermore creative minds are still an asset that no amount of prompting and prodding will ever be able to replace so even if we get to a point where a 30 second commercial can be rendered in nothing more than a set of prompts, a mind and vision will still be required to create it.

Originally posted on my LinkedIn