
The Real Reason 95 Percent of AI Implementations are Failing

AI is a transformational tool that can lead to significant improvements in key operational metrics. Yet 95% of implementations are failing, according to MIT’s Nanda research group, and that’s because there’s a gigantic skill gap between the people who drive the ideas and the people who actually implement AI systems.

Unless this gap is addressed through meaningful upskilling and training at firms of all sizes, companies will continue to throw good money after bad, chasing returns that will never materialize or trying to rescue projects doomed to fail from the start.

AI Ideas Are a Dime a Dozen

The first issue we need to address is that ideas for implementing AI, whether it’s cleaning up a paper-based accounting workflow, speeding up the creation and delivery of marketing assets or automatically writing custom emails to sales prospects, are a dime a dozen.

This isn’t an insult or a dig at anyone building or selling enterprise-level engagements, but there’s a blind assumption that any process in effect today can magically be transformed by integrating AI, and that’s where the issues begin.

In my previous consulting work, we had thousands of ideas for how AI could impact every aspect of our clients’ operations, and we were constantly pitching AI for (insert business process here) and how the manual, grind-heavy work of today could be completely automated tomorrow.

The disconnect was that we would build this beautiful dream for our prospects … “Imagine this 14-day process condensed into 4 hours and how that would transform your marketing department” … but there was no thought given to how it would actually get done.

It was the art of the possible with just a small footnote of implementation.

Just Plug AI into the System

Generally teams think of AI implementations as:

  • Hooking into ChatGPT
  • Training a custom model based on your assets and brand voice
  • Leveraging an existing proprietary AI
  • Leveraging a 3rd party vendor that says they can do this

The problem is that very few people at the idea level have any concept of how this works in real world environments. In the US, people in charge of AI projects have little to no training in AI deployment.

It’s one thing to understand the basic concept of how AI reasons its way to an output when you ask it a question, but it’s another thing entirely to understand what training a model really entails and how to design a working AI system diagram with inputs, outputs, gating, security protocols and compliance measures.

Teams often sell an idea with the implicit assumption that someone smarter on some remote team can take that input and magically design a working system that meets all of the project’s needs. Deferring the cognitive and implementation workload to unknown people on unknown teams somewhere in the organization is the cardinal sin of project management.

Data is Important?

The one thing teams never discuss is the role that clean, structured data plays in enabling AI’s ‘magic’. Unfortunately, employees on both the consulting and client side are often so caught up in demands from higher-ups for ASAP implementation, or the pipe dream of a single click unleashing a tidal wave of productivity, that rigorous data cleanup and evaluation gets lost in the shuffle.

The selling point to prospects was “We’ll take your data and …” “We can use your existing data to transform your …” “Leveraging your data we can” … when the reality was that client data was often, for compliance reasons, siloed and gated among systems. Projects that needed to tie an interaction in one part of a data ecosystem to an outcome somewhere else were dead in the water because compliance wouldn’t allow those pieces to interact.

Furthermore, a lot of data was missing or incomplete, and training a model on large datasets where 20% of the fields are blank is a gigantic waste of time and money. In fact, in certain instances a data error rate of 0.0001% is enough to poison an entire model.

There was a lofty idea that as long as data exists somewhere in the ether, you just connect it to AI, tell it to predict outcomes or generate content, and voilà, your work is done and the fee can be collected.

Someone Somewhere Can Do It

We need to dive further into the cardinal sin of assuming someone somewhere can do something.

Organizations across every industry instill the mentality of “Be scrappy, make calls, get to someone who can get to someone who can” … and that mentality is the exact opposite of what you need when working with AI.

Greenlighting an idea on the assumption that you can just Google a vendor or throw a requirements document at someone in India or Poland and they’ll magically build an A-team for implementation is an all too common practice that kills progress.

You don’t just make a ‘webhook into a GPT’ or ‘build a local model’ and assume that because it sounds smart to say, it actually does anything. AI implementations require significant planning, team building, data cleanup and testing just to get basic output that conforms to a single acceptance value.

There’s nothing wrong with having an idea and wanting to execute it at scale but without the foundational knowledge of how this happens, your project won’t go anywhere. It’s like having an idea for a new car and instead of partnering with an automotive design studio, you go to a junkyard and try to piece something together.

Where do we go from here?

We need mass investment in upskilling at both the brand and consulting level to get team members up to speed on the mechanics and requirements of implementing and operating AI enhanced systems.

A small course asking someone to write a prompt and use Midjourney to make an image is in no way sufficient for large-scale operations.

We need to train teams on data, compliance, model function, prompting and so much more than they are trained on right now at the same enterprises priding themselves on selling the ideas of the future. Failing to upskill teams will lead to even more wasteful spending, and at some point someone somewhere will pull the plug after yet another dead-in-the-water project launches with no ROI.


Remove Ambiguity: The Gallahue Method to Improve LLM Output and Efficiency

Pushing prompts that haven’t been optimized for efficiency and output controls into a production environment is unfortunately a common and wasteful practice that drives up token budgets and kills progress.

The Gallahue method to fix and address these problems is simple yet effective in removing wasted tokens and improving overall outputs.

Scenario:

Let’s imagine I’m creating a website and I want to list the 10 best restaurants in every major city by cuisine type. Lacking the time and budget to visit and catalog these places personally, I write two prompts to research the web and generate JSON output I can quickly loop through using N8N or another tool and then patch into a CMS.

Prompt 1:

For New York City, I need a bullet point list of the 10 best restaurants overall and the 10 best restaurants for the following cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with entry for each restaurant under corresponding list.

Prompt 2:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

(Note: For both examples I used a single prompt, but the setup in practice would be a system and user prompt where the city/restaurant value is specified and, of course, web search enabled.)
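As a reference point, here is roughly how that split could look as an OpenAI Responses API payload. The structure is the point here; the system prompt text is purely illustrative, the user prompt is just the Prompt 1 draft, and the model name is a placeholder you would swap for your own:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "system",
      "content": [
        { "type": "input_text", "text": "You research restaurants for a city guide website." }
      ]
    },
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "City: New York City. List the 10 best restaurants overall and for each cuisine type: American, Chinese, Sushi, Pizza, Steakhouses. Output as JSON." }
      ]
    }
  ],
  "tools": [ { "type": "web_search_preview" } ]
}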

Leveraging the Gallahue Method to Improve Prompt Output

On the surface, both prompts seem reasonable in the sense that the average person can read them and understand what’s being asked and what the expected output is.

However, both have multiple flaws that will lead to degraded and poorly articulated output, so let’s use my method to refine them:

The Gallahue Method (DGRC) for Prompt Optimization:

  • Define – Strictly define inputs and outputs
  • Guide – Provide LLMs with easy paths to resolution
  • Review – Check prompt for ambiguity until cleared
  • Compress – Refine optimized prompt into simplest form for token efficiency

Defining Inputs and Outputs

Both prompts suffer from a lack of defined inputs and outputs. Specificity is critical to ensuring you get output that is of value and not a waste of tokens.

Silence of the Lambs is actually a weird but effective way to think about managing definitions. In a pivotal scene, Hannibal Lecter summarizes Marcus Aurelius by saying “Ask of each thing: What is its nature?” and that’s how you should approach writing production ready prompts.

Is this word, ask, input or output requirement clearly defined to the point that if I run this prompt 100 times on different accounts with no personalization, I will get the same answer format every time? If not, how do I change it so that the output normalizes?

Let’s look at how the define exercise works in practice by highlighting ambiguous or poorly defined terms:

Prompt 1:

For New York City, I need a BULLET POINT LIST of the 10 BEST restaurants OVERALL and the 10 BEST restaurants for THE FOLLOWING cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with ENTRY FOR EACH RESTAURANT under corresponding list.

Bullet point list: While this is a tangible output structure, what specifically is the output? Is it the restaurant name? Is it a restaurant name and address? Do you want an ordered ranking and a summary of each restaurant? If so, what would the criteria be?

In this instance, the LLM must make a best guess as to what you want and depending on model and mode, you’ll either get a compact list or a 2,000 word dissertation on fine dining spots.

Best/Overall: Poor word choice is a mistake that even the most skilled prompt engineers will make. Ambiguous but well-meaning terms like best or overall are open to interpretation, and in my experience, LLMs will infer you’re looking for a composite of a brand with a high level of exposure and high review counts and ratings on Yelp/G2/Trustpilot/Capterra, which may not always align with your thinking.

In this case, New York City has no shortage of highly rated/reviewed restaurants and you could make individual lists for Midtown, Brooklyn, Chelsea etc … and have no degradation in quality of inclusions.

In this example, one might replace best or overall with the requirement of a rating of 4.6 or higher on a specific site (or sites) and at least 3 years in operation to refine the list.

The Following: Absolutely critical point here … when you present LLMs with a list and do not gate the output, it’s likely you will get additional content that does not conform to what you want.

LLMs are instructed in their system logic to be accommodating, and because of this, they will often assume you want additional output unless instructed otherwise. In this case, because you want to build a recommended restaurant list, the LLM completing the prompt could naturally assume you want to cover all cuisine types and add Indian, Thai or vegan to the output.

This is why you should always gate your output. You can use phrases like “Do not include additional categories” or “Only generate for the items in this list.”

JSON Entry for Each Restaurant: The issue here is the same one we just identified above: we ask for output but don’t define exactly what we want, and thus leave the LLM to infer it from the pattern.

Furthermore, we haven’t specified that we only want the JSON, and depending on the LLM and settings, we may get summary text, introduction text or status confirmations that don’t conform to our needs and output requirements.
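To make that concrete, one way to pin the output down is to paste the exact shape into the prompt and state that nothing outside of it should be returned. Here’s a sketch that assumes, purely for illustration, that we only want a name and neighborhood for each entry:

{
  "overall": [ { "name": "", "neighborhood": "" } ],
  "american": [ { "name": "", "neighborhood": "" } ],
  "chinese": [ { "name": "", "neighborhood": "" } ],
  "sushi": [ { "name": "", "neighborhood": "" } ],
  "pizza": [ { "name": "", "neighborhood": "" } ],
  "steakhouses": [ { "name": "", "neighborhood": "" } ]
}

Each array would be required to hold exactly 10 entries, and because the keys are spelled out, there’s no room for the model to add cuisine types we didn’t ask for.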

Let’s take our definition exercise to the next prompt:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

I know 95% of the prompt is highlighted but it will make sense as we go line by line:

Eleven Madison Park: The restaurant has a unique name, but if this were run for a common restaurant name like Great Wall, Taste of India or Hibachi Express (all of which have 30+ unrelated restaurants in the United States), you would quickly introduce a best-guess situation where the model has to infer which location is correct. In this example, including an address as part of the prompt would remove any chance of location inaccuracy.

Cuisine Type: We set specific cuisine categories in the previous prompt, but we are not asking this subsequent prompt to conform to that list, so the cuisine type it returns may not match the categories we already defined.

Average Price Point: The ambiguity here is that we don’t explicitly say whether we want the average cost of a meal output as a range or a specific dollar amount, or whether we want a tiered metric like $, $$, $$$ (aligned, of course, to cost tiers we would set) that you commonly see in restaurant guides.

Star Rating: When it comes to restaurants, we could be referencing Michelin stars or a traditional 1-5 star value. And if we are referencing the traditional star value, then we need to explain where to pull that value from and whether it’s a single source or a composite.

Executive Chef: Restaurants commonly post this on web, press and social materials, but in the instance they do not, or they have recently parted ways with said individual, we need to give the LLM a resolution path so it doesn’t waste tokens going down a rabbit hole trying to find your answer. For something like this, I’ll commonly add a line like: “If a value cannot be determined then output NA”

Acclaimed Dishes: In the previous section, we talked about terms like best/overall and what LLMs might infer from them. Acclaimed is another example: while it seems defined, it is still open to vast interpretation. Are you looking for dishes specifically referenced in high-value source reviews, or for the most commonly mentioned dishes in a review set? Again, another point to define.

150 Word Summary: In the request we want a 150 word summary of why the restaurant is exemplary. We need to define exemplary, and to do that we should provide the LLM with example points to highlight: Does the restaurant have a Michelin star? Has it been voted a top 100/50/10 restaurant in the world or its local market in the following sources ( ) …

Output as a JSON: Same feedback as above: make sure that only the JSON is output and no intro text, summaries, notes, etc.
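Pulling those definitions together, the detail request might spell out a shape like the one below, with the NA fallback and a named rating source baked in. The field names and tiers here are my own illustration, not a standard:

{
  "restaurant": "Eleven Madison Park",
  "address": "",
  "year_opened": "",
  "cuisine_type": "",
  "price_tier": "",
  "star_rating": "",
  "rating_source": "",
  "executive_chef": "",
  "acclaimed_dishes": [],
  "summary": ""
}

The accompanying prompt text would state that price_tier must be $, $$ or $$$ mapped to cost ranges you define, that star_rating comes from the single source named in rating_source, that any value that can’t be determined is the literal string NA, and that only this JSON is returned.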

Guiding LLMs

The second component to improving efficiency and quality of LLM output is to guide LLMs to key sources they can rely upon for conclusions.

When it comes to restaurant reviews and best of lists, especially for major markets, there’s no shortage of sources from Michelin Guides to YouTube videos shot in someone’s car of hot takes, dish reviews and tier rankings.

That being said, when you ask an LLM to put together a list like this, it might default to sources that aren’t the most valuable. For instance, LLMs tuned to favor certain sources may prioritize Reddit over Resy, Eater or The New York Times. In that case you would get snippet content from anonymous accounts instead of long-form details from named authors with a contextual body of work.

Unfortunately, most people don’t guide their prompts with preferred sources, and that leads to issues like the one above, as well as wasted reasoning tokens as LLMs try to work out which sources would be best for a request and then search for relevant blocks on those sources.

How to “Guide”

Guiding is simply appending the sources you believe are best aligned to the output you want to achieve.

If I want to generate content about the 5 best new SUVs for families I can specify that my preferred sources or only sources to consider for composite research are Consumer Reports, Car and Driver and Motor Trend.

Subsequently, I can also list sources to ignore. For instance, let’s say I want to generate a balanced piece about George W Bush’s presidential legacy. I will want to remove sources that either overly praise or overly criticize his time in office in favor of sources that are more objective and in line with what I want to generate.

By doing this, we take the guesswork out of the LLM trying to infer what the ‘best’ source for an item is, and instead go straight to search and processing.
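In a system/user setup, this guidance can live in the system prompt so every run inherits it. A rough sketch using the SUV example from above (the wording is illustrative, not a required syntax):

{
  "role": "system",
  "content": [
    {
      "type": "input_text",
      "text": "Research only from Consumer Reports, Car and Driver and Motor Trend. Ignore forums, social media and video transcripts. If the preferred sources do not cover an item, output NA for that item rather than substituting another source."
    }
  ]
}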

Review Prompts with LLMs

Before you ask an LLM to generate a single piece of output, you should always put your prompt into the model and ask it to review the prompt for points of ambiguity.

Iterating on a prompt that is functionally deficient is a waste of resources, whereas building the right prompt and then fine-tuning the output is the much more efficient path.

Here’s the prompt I use to critically analyze my work:

You are a Prompt Auditor. Evaluate the target prompt strictly and return JSON that follows the schema below.

RUNTIME ASSUMPTIONS

  • Timezone: America/Chicago. Use ISO dates.
  • Reproducibility: temperature ≤ 0.3, top_p = 1.0. If supported, seed = 42.
  • Do NOT execute the target prompt’s task. Only evaluate and produce clarifications/rewrites.

DEFINITIONS

  • Components = { goal, inputs_context, constraints, required_process, websearch_dependencies, output_format, style_voice, examples_test_cases, non_goals_scope }.
  • Ambiguity categories = { vague_quantifier, subjective_term, relative_time_place, undefined_entity_acronym, missing_unit_range, schema_gap, eval_criteria_gap, dependency_unspecified }.
  • “Significant difference” threshold (for variance risk): any of { schema_mismatch, contradictory key facts, length_delta>20%, low_semantic_overlap }.

WEBSEARCH DECISION RULES

  • Flag websearch_required = true if the target prompt asks for: current facts/stats, named entities, “latest/today”, laws/policies, prices, schedules, or anything dated after the model’s known cutoff.
  • If websearch_required = true:
    • Provide candidate_domains (3–8 authoritative sites) and query_templates (2–5).
    • If you actually perform search in your environment, list sources_found with {title, url, publish_date}. If you do not perform search, leave sources_found empty and only provide candidate_domains and query_templates.

OUTPUT FORMAT (JSON only)
{
  "ambiguity_report": [
    {
      "component": "",
      "span_quote": "",
      "category": "",
      "why_it_matters": "<1–2 sentences>",
      "severity": "",
      "fix_suggestion": "",
      "clarifying_question": ""
    }
    // …repeat for each issue found
  ],
  "variance_risk_summary": {
    "risk_level": "",
    "drivers": [""],
    "controls": ["Set temperature≤0.3", "Specify schema", "Pin timezone", "Add ranges/units"]
  },
  "resolution_plan": [
    {
      "step": 1,
      "goal": "",
      "actions": [""],
      "acceptance_criteria": [""],
      "websearch_required": "<true|false>",
      "candidate_domains": ["", "…"],
      "query_templates": ["", "…"],
      "sources_found": [
        {"title": "", "url": "<url>", "publish_date": "YYYY-MM-DD"}
      ]
    }
    // …continue until the prompt is unambiguous and runnable
  ],
  "final_rewritten_prompt": "<<<A fully clarified, runnable prompt that incorporates fixes, constraints, and exact output schema.>>>"
}

TARGET PROMPT TO EVALUATE

PROMPT_TO_EVALUATE: """[PASTE YOUR PROMPT HERE]"""

For the simplest of prompts, I’ll often get a laundry list of output flags and that’s even with years of experience writing and reviewing prompts.
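To give a sense of what comes back, here’s the kind of entry the audit surfaces for the restaurant prompt above. This is a hand-written illustration of the schema in use, not actual model output:

{
  "component": "constraints",
  "span_quote": "the 10 best restaurants overall",
  "category": "subjective_term",
  "why_it_matters": "Best and overall are undefined, so the ranking criteria will vary from run to run.",
  "severity": "high",
  "fix_suggestion": "Require a 4.6+ rating on a named review site and at least 3 years in operation.",
  "clarifying_question": "Which review sources and thresholds define best?"
}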

Compress Compress Compress

The final step is to compress the prompt. Compression, similar to minifying JavaScript on a site, doesn’t do much if it’s a prompt you’re running a few times, but if it’s a prompt you’ll potentially run millions of times, then reducing unnecessary characters, words and phrasing can save thousands of dollars in the long run.

Compression is both removing words and optimizing order of operations so that prompts run quickly and efficiently.

What does compression look like?

Let’s take a basic prompt

“Write a 200 word summary of Apple’s iPhone 17 launch event. Only talk about the new products and updates introduced and don’t talk about influencer or market reactions. Write it from a neutral point of view and don’t overly praise or criticize their launch event”

It’s pretty well defined, but notice how we ask for the output in the first sentence and then add refinements in the following sentences? Those can create invisible loops where a reasoning path starts only to be redirected by subsequent instructions.

Second, we use phrases like “launch event” multiple times, which may not seem important if we’re running a prompt once, but if this is a foundational prompt for a SaaS product that will be run millions of times, it’s going to waste thousands of dollars in tokens reading the same words over and over.

Here’s the compressed prompt:

Write 175-225 words in neutral plain English covering only new products and updates from Apple’s iPhone 17 launch event.

It’s to the point, efficient and doesn’t restate words multiple times. One key point is not to over-compress and accidentally remove key guardrails. Always err on the side of caution, and again, if you aren’t going to be running the same prompt many times, compression may not be necessary.

Putting it All Together Using the Gallahue Method

Consider this prompt for someone looking to write a nutrition guide. I’ve started it at a base level with no definition, guiding or review:

“Hey ChatGPT I need you to write a 600 word article about Kale. I want you to cover kale from three different angles. The first is a overview of what kale is and why it’s considered a superfood due to nutrient density, vitamins A,C, and K, fiber, antioxidants and how those are important for heart, bone and eye health. Second can you cover common diets that use a lot of kale like vegan, keto, paleo and mediterranean diets and finally can you give four examples of how someone can use kale in their day to day cooking? Ideally raw salads, sautéed dishes, smoothies and kale chips.”

On the surface it seems basic and self-explanatory but if I’m generating the same output for 30-40 foods or ingredients, there’s inefficiency that will slow it down and may not lead to the best output.

Here’s the optimized version that removes ambiguity and provides specific instruction for output:

Write 550–650 words in plain English about kale. Output only the article text with exactly three H2 headings: Overview, Health Benefits, Diet Uses.
Cover: what kale is and common varieties; why it’s considered a “superfood” (nutrient density—vitamins A, C, K; fiber; antioxidants—and key benefits for heart, bone, and eye health); and how to use it across diets (Mediterranean, vegan/vegetarian, keto, paleo) with four brief prep examples (raw salad, sautéed, smoothie, baked chips).
Keep a neutral, informative tone. Avoid fluff, anecdotes, lists, citations, or calls to action.

It’s to the point, specific about output and leaves no doubt as to what the LLM needs to generate in order to satisfy the output requirements. Using my methodology on your prompts going forward will not only improve your own understanding of how LLMs reason their way to output but also improve the overall delivery of material.
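If you do end up running this across the 30-40 foods mentioned earlier, one way to keep output consistent is to park everything fixed in the system prompt and pass only the food name as the user prompt. A sketch of that split, showing just the input portion of the call and with the exact wording being a judgment call:

{
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "Write 550-650 words in plain English about the food named by the user. Output only the article text with exactly three H2 headings: Overview, Health Benefits, Diet Uses. Cover what it is and common varieties; why it's considered nutritious and its key health benefits; and how to use it across diets (Mediterranean, vegan/vegetarian, keto, paleo) with four brief prep examples. Keep a neutral, informative tone. Avoid fluff, anecdotes, lists, citations, or calls to action."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "kale" }
      ]
    }
  ]
}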

In closing …

It’s important to realize that no prompt will be perfect the first time you run it; in fact, many of the prompts I work on have gone through 30-60 revisions in staging before they ever make it to production.

Think of this process as a debugging step for human language and it will all make sense.


ChatGPT Custom API Call Template for N8N

Maybe it’s a byproduct of Google search getting progressively worse for narrow outcome searches or the fact that it may not yet exist – but I couldn’t find any JSON templates for custom calls to ChatGPT.

For any low/no-coders using N8N to prototype AI wrappers or just play with its call capabilities, you may want to experiment with more settings than the managed call environment offers, and I want to make that transition easier for you with this JSON template, which covers all of the main settings and configs you’d want to pass or experiment with.

Managed Call vs Custom Call

N8N and other no/low-code platforms offer managed ChatGPT flows that make it easy to message models, but as your skillset grows, and maybe your curiosity when it comes to temperature and top_p settings, you’ll find these managed calls won’t be enough.

In addition to fine-tuning settings, everything you’ll be using in your mini-app will be in JSON payloads so why not have your call in JSON as well?

ChatGPT Custom Call Template

To set this up:

  1. Create an HTTP Request step in your flow and link it
  2. Set Method to POST
  3. Set URL to https://api.openai.com/v1/responses
  4. If you’re using managed calls, you will have already set up your authentication. You can pass your key as part of a custom call, but for this example let’s keep things simple
  5. ‘Send Query Parameters’ / ‘Send Headers’ should be set to OFF
  6. ‘Send Body’ set to ON
  7. Set Body Content Type to JSON
  8. Set Specify Body to JSON
  9. Customize and paste this template:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "System Prompt Text"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "User Prompt Text"
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  },
  "reasoning": {},
  "tools": [
    {
      "type": "web_search_preview",
      "user_location": {
        "type": "approximate",
        "country": "US"
      },
      "search_context_size": "medium"
    }
  ],
  "temperature": 0,
  "max_output_tokens": 2048,
  "top_p": 0,
  "store": true
}

So what are we setting in the above? Let’s talk through it:

  1. System Prompt
    • How you want the system to function
  2. User Prompt
    • What do you want to generate in reference to the system prompt
  3. Model
    • Which model do you want to use
  4. Websearch
    • Do you want websearch enabled
  5. Search_context_size
    • The accepted values are low, medium and high; depending on how involved the prompt is, you may want to raise this from the template’s medium to high
  6. Temperature
    • A value of 0 means you want ChatGPT to be as deterministic as possible, always favoring the most probable response
    • A value of 1 means you want ChatGPT to be far more creative and varied in its answers, including in ways that may not be well supported by its underlying data
    • You can set any value in between to fine-tune
  7. Max_output_tokens
    • This value essentially determines output length. At the moment, the maximum number of tokens supported is 128,000 which would translate roughly into 96,000 words (1 token roughly equals 0.75 words)
  8. Top_p
    • Similar to temperature, when the model is choosing the next word, do you want it to consider only the most probable words or also less frequent combinations? A value near 0 means more predictable output … 1 means more varied, creative responses.
    • You can set any value between 0 and 1 to fine tune output
  9. Store
    • A true value means you want to have the conversations available in your dashboard
    • A false value means you don’t want to keep the conversations
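As a quick example of adjusting the settings above, here’s the same template trimmed for a creative, non-search use case: the tools array is dropped to disable web search, temperature and top_p are raised, and store is set to false so conversations aren’t kept. Treat the exact values as a starting point rather than a recommendation:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "system",
      "content": [ { "type": "input_text", "text": "System Prompt Text" } ]
    },
    {
      "role": "user",
      "content": [ { "type": "input_text", "text": "User Prompt Text" } ]
    }
  ],
  "text": { "format": { "type": "text" } },
  "temperature": 0.9,
  "top_p": 1,
  "max_output_tokens": 2048,
  "store": false
}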

How Glassdoor and Indeed Reviews Impact AI Brand Perception

I’ve worked with companies that obsessed over their Glassdoor reviews like Daniel Day-Lewis obsesses over every aspect of his character. I’ve also worked at companies that couldn’t care less and saw the platform as a way for disgruntled employees to take parting shots when things didn’t work out.

Whether your brand is on one side of the spectrum or more in the middle, you need to start paying attention to the roles, content and context of your inbound reviews and ratings because as you’ll see in this case study, it can have global implications.

Ranking Global IT Firms

Having come from 2.5 great years at TCS, I decided to test RankBee’s (in development) AI brand intelligence tool for the top global IT firms. Similar to the rating scale in my post on Luxury Brands, I pulled in global brand metrics and then segmented the results for a competitive set of Accenture, TCS, Infosys, Cognizant and Deloitte.

After putting a myriad of data into our system, the numbers showed Accenture was the overall category leader and my old firm TCS was 3rd in the group based on scores. Most of the brand factor scores were between 8.5-9.5, but what stuck out was that the lowest score for any of these factor metrics was on delivery, which is the lifeblood of IT client servicing.

You’ll notice the number is under the heading ‘IT Firm 5’ because, selfishly, I want to keep some level of mystery/engagement as to which brand it could be, and also because if I just named it, I feel like some people would read it and say “Makes sense” or “I’ve heard that about them” without diving deeper into why this is being generated by LLMs in the first place.

“Inconsistent Delivery and Slower to Innovate”

I began diving into the reason for the low score and first looked at the sources being pulled in to generate both the positive and negative sides of the score/perception.

On the positive side, external sources like press releases, C-level interviews and industry awards for client satisfaction were being pulled in, alongside the firm’s own case studies and investor relations pages talking about internal client survey results.

For most brands, this would produce a very positive score of at least 8.5, but unfortunately for this IT firm, there was quite a bit of content that subtracted from the score, bringing it nearly a full point below its next closest competitor.

Glassdoor and Indeed’s Impact on AI Brand Perception

When looking at what was forming the basis for the low delivery score, there was a multitude of citations from Glassdoor and Indeed, and when you think about it, it makes sense why these sites would have the outsized impact they do.

The cited sources were review category pages for roles like delivery manager, project manager, delivery lead, account manager, etc., and as I read through page after page of detailed, largely critical employee experiences, it really hit me why LLMs would treat these as a rich data source for informing their opinion on delivery.

First you have the context that the material is coming from someone whose role would have been intimately familiar with that particular attribute of the consideration process – in this case project delivery and management.

Second is that reviews, especially on the employment side, tend to highlight specific, contextually relevant items in a long form fashion that would give LLMs even further contextual elaboration as to the company’s ability to meet certain functions.

Finally a single review won’t be enough to establish a trend but with the number of employees these firms have spread across multiple geographies and the levels of turnover natural to these operations, there’s a considerable volume of data and with that potentially enough material for LLMs to start seeing trends.

Could the Score Actually Impact Deal Flow?

IT services at this level are largely sold through relationships, where you’ll have 10-15 people on an account constantly meeting and interacting with clients, and those connections are largely what drive deal flow, versus an enterprise brand going to Google or ChatGPT to find service providers.

However as companies continue to look for ways to further reduce overhead, procurement functions are likely to be more and more AI driven and it’s conceivable that in the future, software will automatically determine which approved vendors can and cannot bid for a contract.

In that reality, which I feel is not too far away, companies like this one would see their lower delivery score begin to impact their sales pipeline in meaningful ways and may not even realize why until it’s too late.

So what should brands do?

Brands need to pay closer attention to macro trends that emerge from review sites and give them proper consideration, rather than ignoring them or dismissing a site like Glassdoor or Yelp as a place for venting frustration.

As seen in this example, a trend established over time with contextually relevant content from contextually relevant individuals can have an outsized impact on a global 100,000 employee firm.

Now that we can see how this impacts LLMs, brands should incorporate public exit “interview” trends as a core part of their annual internal reviews and use those trends to better inform areas of structural and process improvement, bucking negative trends and establishing better baselines going forward.

Keep in mind that LLMs won’t laser-focus on a single source to form a perception, and in this case there is a considerable amount of positive content talking about this company’s delivery strength. However, brands need to do what they can to address sources of doubt and continue to build on sources of strength.

Post originally written by me on LinkedIn April 1, 2025.


AI’s Most Loved and Loathed Luxury Brands

Since opening our beta, the RankBee team has collected a large amount of prompt data from our early adopters – and one of the emerging trends is how ChatGPT and other LLMs perceive luxury brands.

In the same way that we humans look at luxury as a balance of name and quality, LLMs use that same reasoning when examining whether a consumer is paying for a durable, quality heirloom or simply a label.

Every Brand Starts with a Blank Slate

LLMs don’t have any natural programming for or against luxury goods or their buyers. These systems are blank slates filled by the world’s available content, and while LLMs ingest the works of Karl Marx, they also take in the latest fashion news and influencer recommendations. That is to say, if an LLM became human, it would be just as likely to go on strike as to shop SoHo’s boutiques.

Thus the conclusions and perceptions below are LLMs drawing broad conclusions from the billions of writeups, posts, reviews, user experiences and ephemeral musings available for them to reference.

Brands focused heavily on quality, utility and making the purchase and ownership experience truly memorable will see that reflected naturally in the content their customers produce. However brands that have focused on building a label first and the rest of the experience second can find themselves exposed.

How RankBee Scored Each Brand

Below is a table comparing three of the top brands in luxury fashion with our RankBee AI score across key consideration points for luxury products. Hermès is the highest rated brand in luxury, whereas Gucci is toward the bottom of the list for major luxury marks.

While Chanel scores very high and is on the cusp of being a top 5 brand, concerns over quality raised in the last two years in both news articles and key influencer forums permeated the data and knocked it just barely below the top tier.

(Want to know what LLMs like ChatGPT are saying about your brand and how that impacts your visibility? We can help)

AI’s Highest Rated Luxury Brands

The following brands scored highest in RankBee’s AI brand power metric which takes into account multiple output factors across LLMs and key purchasing considerations to determine their score.

Note: The ‘Brand’, ‘Quality’ and ‘You’re paying for’ summaries you see below are composites generated by the LLMs based on the total prompt data collected.


Hermès | 9.6

Brand: The ultimate in luxury. Think Birkins, Kellys, and a decades-deep waitlist. Screams quiet power.

Quality: Impeccable. Hand-stitched leather, precision, and heritage craftsmanship at its peak.

You’re paying for: Legacy, exclusivity, and craftsmanship that outlives trends.


Bottega Veneta | 9.5

Brand: Stealth wealth with a modern edge. Known for the signature Intrecciato weave and minimal branding.

Quality: Superb leather and construction, especially under the newer creative direction.

You’re paying for: Texture, taste, and a logo-free flex.


Brunello Cucinelli | 9.5

Brand: The “King of Cashmere.” Understated, refined, and morally polished (literally runs a “humanistic” company).

Quality: Pristine materials, subtle tailoring, and that soft-spoken luxe vibe.

You’re paying for: Feel-good fashion—ethically made, ultra-luxe basics.


Loro Piana | 9.5

Brand: Peak Italian quiet luxury. Whispered among those who know. No logos, just pure fabric elitism.

Quality: Unmatched textiles—cashmere, vicuña, baby camel hair—woven like a dream.

You’re paying for: The feel of luxury. Softness, subtlety, and supreme understatement.


The Row | 9.4

Brand: Olsen twins’ brainchild turned cult minimalist label. Fashion editors’ and insiders’ uniform.

Quality: Tailored to perfection, rich fabrics, and that rare “nonchalant but $$$” vibe.

You’re paying for: Understated elegance with architectural precision.


The most interesting note is that the top 4 brands have existed for 70+ years, with each scoring very high for quality, heritage and status, whereas The Row has only been around since 2006.

AI’s Lower Rated Luxury Brands

What you’ll notice is a mix of old world and new world labels, some of which revel in their perceptions while others are works in progress when it comes to defining the next chapters of their brand story.


Gucci | 8.2

Brand: Lots of mass production, especially under Alessandro Michele’s reign when things got maximalist and heavily logo-driven.

Quality: Accessories and shoes can be solid, but not always commensurate with the price.

You’re paying for: A loud, recognizable label.


Christian Louboutin | 8.1

Brand: Red bottoms have icon status, but they’re known for being painful and not particularly durable.

Quality: They look beautiful but comfort and wearability? Not their strong suit.

You’re paying for: The red sole.


Balenciaga | 7.2

Brand: Wild price tags for items like destroyed sneakers or t-shirts with minimal design.

Quality: Sometimes decent, sometimes questionable — definitely inconsistent.

You’re paying for: Hype, irony, and edgy branding.


Off-White | 6.8

Brand: Basic tees and hoodies with quotation marks and logos for $$$.

Quality: Streetwear-level; decent but not luxury-tier.

You’re paying for: Virgil Abloh’s legacy and street cred.


Supreme | 6.6

Brand: T-shirts and accessories marked up 10x on resale.

Quality: Meh. Often Hanes-quality tees with branding.

You’re paying for: Exclusivity, hype, and resale culture.


How does this impact the bottom line?

For many of the brands on the lower end of the scoring spectrum, this likely won’t change anything among their core customer group. Brands like Supreme and Off-White have communities of raving fans who live and breathe their brand’s ethos and would likely look at some of the negative factors and shrug them off as part of the overall experience.

However these scores, high or low will have three key impacts over time:

Brand Visibility in LLMs: AI looks to provide the most contextually relevant answer for a prompt, and low brand scores mean that opportunities to appear consistently for all product lines, even for legacy brands, could be reduced. Meanwhile, smaller labels achieving higher scores over time with the right mix of quality and experience can over-index relative to their reach and revenue.

New to Brand Consumers: People saving for a luxury purchase and paring down their list of brands for that first bag, outfit, etc. could use AI as a litmus test for whether a specific brand or item will provide the benefits they hope for, and poor marks could shift prospective buyers to other labels.

Long term brand trends: AI adds another layer of complexity for brand and PR teams measuring perception, and when LLM data reinforces declines or slides in experience, it will make it even harder to get things back on track.

So How Do Brands Fix Their Perception?

ChatGPT is ultimately a reflection of available data and not inherent bias. Brands need to open their eyes to the fact that their customers and their individual experiences in aggregate have the power to significantly shape brand perception to a degree that outweighs what can purely be controlled from a PR, advertising and messaging perspective.

The first step is to understand how LLMs perceive a best in class product for your category and once you have that foundation you can then see how your brand is perceived, what sources drive those perceptions and how high or low scores are ultimately impacting your ability to be found when people search in LLMs.

Fixing perception requires that brand and marketing teams work together to define and execute on specific action items and messaging changes throughout their ecosystem. Brands that can get these groups aligned will find themselves with a significant advantage in AI search.

Exploring RankBee and Bespoke AI Research

The RankBee brand power score will be coming in a future release of the app. In the meantime, if your brand is looking for bespoke research on consumer perception, content gaps, reputation blind spots and optimization, feel free to email me at will@rankbee.ai and I’ll be happy to develop a custom plan to help your brand take control in LLMs.

This post originally appeared on my own LinkedIn