Categories: AIO

Why ChatGPT Ads Make AI Rank Tracking More Essential Than Ever

When ChatGPT rolled out its latest Android update, most of the attention went toward performance improvements, UI polish, and model updates. But buried in the code was something far more consequential for marketers: hooks for AI-driven advertising inside ChatGPT responses.

It’s a quiet change with massive implications for brands and the marketers focused on their day-to-day performance.

OpenAI has hinted at an ad revenue model in the past (in fact, one of my friends interviewed for a paid search role at the company earlier this year), but this beta update signals that its generated content will soon integrate paid placements.

As of today, we don’t know the size, timeline, or scope of the rollout. It will likely start with a set of enterprise brands in defined market segments, with budgets and ad teams large enough to provide feedback and insights. But once ads enter conversational interfaces at scale, every brand will be forced to rethink AI visibility in terms of:

  • How visibility is earned
  • How visibility is bought
  • How visibility is measured

This is where AI rank tracking becomes mission-critical and tools like RankBee become indispensable partners for digital marketers. Let’s take a look at how paid placement impacts LLMs and what marketers need to do next.

The New Frontier: Measuring Ad Performance vs. Organic Visibility

With ads entering ChatGPT, brands will soon be able to run side-by-side performance tests comparing how they perform organically inside LLM answers against how they perform through paid placement.

Consider organic LLM prompts like:

  • “Best business checking accounts for freelancers”
  • “Top-reviewed project management tools for agencies”
  • “Affordable home insurance options in Texas”

With rank tracking tools like RankBee, marketers can already see:

  • How often they appear
  • In what context
  • Against which competitors
  • With what frequency and sentiment

But with ChatGPT ads, the question becomes even more strategic:

Is paying for LLM placement accretive or simply cannibalizing the organic visibility my brand already earns?

RankBee provides the data layer needed to answer that:

  • Identify prompts where organic visibility is already high for your brand and competitors, so you can run cannibalization and overlapping-placement tests
  • Segment visibility by customer segment and intent to determine the types of prompts most likely to be disrupted by paid ads
  • Track competitors with low visibility who are likely to bid aggressively where they lack coverage

Without LLM rank tracking, advertising on ChatGPT becomes guesswork and brands won’t get the level of insights they need to truly understand their ROI.

Scraping LLM Output at Scale to Detect Ad Placement

LLM ads won’t be universal. Some prompts will trigger ads. Some won’t. Ad units will vary by model, user context and commercial intent.

However, once ads are officially launched, scraping and monitoring results to compare prompts that trigger ads against those that do not can give marketers valuable insights (a simple monitoring sketch follows the list below), like:

  • Which prompts start showing ads
  • Which categories are becoming commercialized first
  • Which industries see the earliest ad encroachment
  • Where smaller brands still dominate due to better structured content
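
Here is a minimal sketch of that kind of monitoring in Python. It assumes you already capture prompt/answer pairs on a schedule, and the AD_MARKERS strings are hypothetical, since OpenAI hasn’t published how sponsored placements will be labeled:

# Hypothetical sketch: flag prompts whose captured answers contain ad-style markers.
# AD_MARKERS is a placeholder; the real labeling of sponsored content is not yet known.
AD_MARKERS = ("sponsored", "promoted", "advertisement")

def prompts_showing_ads(results: dict[str, str]) -> list[str]:
    """results maps prompt text -> the answer text captured for that prompt."""
    flagged = []
    for prompt, answer in results.items():
        if any(marker in answer.lower() for marker in AD_MARKERS):
            flagged.append(prompt)
    return flagged

Diffing the flagged list from run to run shows which prompts and categories commercialize first.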

And perhaps most importantly:

Where is the lowest-hanging fruit for larger brands to use ad spend to break into existing AI answer patterns?

Imagine you’re a marketer at an enterprise-level firm and you see that a niche software tool appears in 82 percent of prompts for “Best AI customer service chatbots” while your brand appears in only 3 percent. That’s an obvious opportunity for your AI advertising strategy, but it’s made visible only through extensive rank tracking.
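
That 82 percent versus 3 percent gap is just an appearance rate: the share of captured answers for a prompt set that mention a brand at all. A minimal sketch of the calculation, assuming you store the raw answer text for each tracked run (the brand names are placeholders):

def appearance_rate(answers: list[str], brand: str) -> float:
    # Share of captured answers that mention the brand (case-insensitive substring match).
    if not answers:
        return 0.0
    hits = sum(1 for answer in answers if brand.lower() in answer.lower())
    return hits / len(answers)

# e.g. appearance_rate(chatbot_prompt_runs, "NicheTool")  -> 0.82
#      appearance_rate(chatbot_prompt_runs, "YourBrand")  -> 0.03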

This type of intelligence will define winners and losers in the next wave of AI-driven customer discovery and again highlights the importance of implementing LLM rank tracking.

Detecting Sudden Drops in AI Traffic or Visibility Caused by Competitor Ad Spend

As ads roll out, some brands will wake up to sudden drops in AI traffic or visibility because competitors started paying for placement, and those ad placements may reshape how a generated answer is composed.

Think of it like this: let’s say a set of prompts around a key product attribute typically surfaces 5-6 brands per answer, and rankings have been stable enough that URL data can be used to infer a range of monthly referral traffic.

Without an AI rank tracking tool, marketers will be in the dark when:

  • Previously stable prompt rankings begin to fall
  • Competitors suddenly appear in top answer sections
  • Answer units shift in structure
  • Sentiment or brand descriptions change
  • Traffic estimates drop sharply
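
Catching these shifts early is essentially a period-over-period comparison. A minimal sketch, assuming you already store a 0-1 visibility score per prompt per period (the 25 percent threshold is arbitrary):

def flag_visibility_drops(last_period: dict[str, float],
                          this_period: dict[str, float],
                          threshold: float = 0.25) -> list[str]:
    # Return prompts whose visibility fell by more than `threshold` (relative) since last period.
    flagged = []
    for prompt, previous in last_period.items():
        current = this_period.get(prompt, 0.0)
        if previous > 0 and (previous - current) / previous > threshold:
            flagged.append(prompt)
    return flagged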

The intelligence gained from LLM rank tracking tools will soon reveal:

  • Which competitors are now bidding in your category
  • What types of ad units they are testing
  • Whether their spend is affecting your visibility—paid or organic

Even if your brand is not advertising in ChatGPT, you still need to know:

  • Who is
  • Where
  • How often
  • And with what impact

This is where RankBee.ai’s competitor-intelligence engine becomes indispensable. It doesn’t just tell you that visibility changed; it tells you why, where, and because of whom.

AI Rank Tracking as the Attribution Layer for Paid + Organic LLM Placement

The introduction of ads will break the clean, minimalistic UI that made ChatGPT such a novel experience. It will echo historical moments from earlier platforms:

  • Facebook’s first ads, which disrupted drunken party-photo feeds
  • Instagram’s first sponsored posts, which were jarringly off-vibe
  • Google’s evolving ad units, which blur lines between organic and paid results

Every platform eventually adopts monetization. But the shift always changes user behavior and brand performance.

In LLMs, this shift is even more dramatic because:

  • Users see one answer, not multiple pages
  • Ads will likely coexist with AI-generated summaries
  • Paid content will dynamically blend into reasoning chains
  • Organic placement will now compete directly with paid reasoning

To navigate this hybrid environment, RankBee.ai becomes the central attribution layer, capable of:

  • Segmenting prompts by user intent
  • Separating paid vs. organic placement (when ads are launched)
  • Measuring impact on traffic, discovery, and conversion
  • Benchmarking performance across multiple LLM ecosystems
  • Tracking how ads reshape answer structures over time

As AI shifts from “clean command line” to “monetized media channel,” brands will need to build:

  • AI-specific content strategies
  • AI-specific visibility strategies
  • AI-specific attribution models
  • AI-specific competitive scrutiny

Without the right tools, brands will miss the first-mover advantage and pay a premium to catch up to optimized competitors.

Conclusion: LLM Native Ads Change the Game and RankBee.ai Helps You Win It

OpenAI introducing ads into ChatGPT is not a small UI adjustment. It’s a fundamental transformation of the LLM ecosystem.

It means:

  • Brand discovery inside LLMs becomes pay-to-play
  • Organic visibility becomes more precious
  • Competitive bidding moves upstream
  • Attribution must evolve
  • And marketers must rethink how they track performance in AI environments

RankBee.ai is ready to help brands conquer this new era by offering:

  • True AI rank tracking
  • Theme + intent visibility analysis
  • Competitor monitoring
  • Paid vs. organic attribution (when launched)
  • Insights into ad-triggering prompts
  • And the most complete data layer for the AI-first search landscape

As LLMs shift into monetized, competitive discovery engines, the brands that win will be those who have visibility in both senses of the word.

Ads are coming. RankBee.ai helps you see everything they change.

Categories: Artificial Intelligence

The Real Reason 95 Percent of AI Implementations are Failing

AI is a transformational tool that can lead to significant improvements in key operational metrics. Yet 95% of implementations are failing, according to MIT’s NANDA research group, and that’s because there’s a gigantic skill gap between the individuals who drive the ideas and those who actually implement AI systems.

Unless this gap is addressed through meaningful upskilling and training throughout firms of all sizes, companies will continue to throw good money after bad, chasing returns that will never materialize or rescuing projects doomed to fail from the start.

AI Ideas Are a Dime a Dozen

The first issue we need to address is that ideas for implementing AI, whether it’s cleaning up a paper-based accounting workflow, speeding up the creation and delivery of marketing assets, or automatically writing custom emails to sales prospects, are a dime a dozen.

It’s not an insult or a dig at anyone building or selling enterprise-level engagements, but there’s a blind assumption that any process in effect today can magically be transformed by integrating AI, and that’s where the issues begin.

In my previous consulting work, we had thousands of ideas for how AI could impact every aspect of our clients’ operations, and we were constantly pitching AI for (insert business process here) and how the manual, grind-heavy work of today could be completely automated tomorrow.

The disconnect was that we would build this beautiful dream for our prospects … “Imagine this 14-day process condensed into 4 hours and how that would transform your marketing department” … but there was no thought as to how it would actually get done.

It was the art of the possible with just a small footnote of implementation.

Just Plug AI into the System

Generally teams think of AI implementations as:

  • Hooking into ChatGPT
  • Training a custom model based on your assets and brand voice
  • Leveraging an existing proprietary AI
  • Leveraging a 3rd party vendor that says they can do this

The problem is that very few people at the idea level have any concept of how this works in real-world environments. In the US, the people in charge of AI projects often have little to no training in AI deployment.

It’s one thing to understand the basic concept of how AI reasons output when you ask it a question – but it’s another thing entirely to understand what training a model really entails and how to design a working AI system diagram with inputs, outputs, gating, security protocols and compliance measures.

Teams often sell an idea with the implicit assumption that someone smarter on some remote team can take that input and magically design a working system that meets all of the project’s needs. Deferring the cognitive and implementation workload to unknown people on unknown teams somewhere in the organization is the cardinal sin of project management.

Data is Important?

The one thing teams never discuss is the role that clean, structured data plays in enabling AI’s ‘magic’. Unfortunately, employees on both the consulting and client side are often so caught up in demands from higher-ups for rapid, ASAP implementation, or the lucid dream of a single click creating a tidal wave of productivity, that rigorous data cleanup and evaluation gets lost in the shuffle.

The selling point to prospects was “We’ll take your data and …”, “We can use your existing data to transform your …”, “Leveraging your data we can …”, when the reality was that client data was often, for compliance reasons, siloed and gated among systems. Projects that needed to tie an interaction in one part of a data ecosystem to an outcome somewhere else were dead in the water because compliance wouldn’t allow those pieces to interact.

Furthermore, a lot of data was missing or incomplete, and training a model on large datasets where 20% of the fields are blank is a gigantic waste of time and money. In fact, in certain instances a data error rate of 0.0001% is enough to poison an entire model.

There was a lofty idea that as long as data exists somewhere in the ether, you just connect it to AI, tell it to predict outcomes or generate content, and voila, the work is done and the fee can be collected.

Someone Somewhere Can Do It

We need to dive further into the cardinal sin of assuming someone somewhere can do something.

Organizations across every industry instill the mentality of “Be scrappy, make calls, get to someone who can get to someone who can” … and that mentality is the exact opposite of what you need when working with AI.

Greenlighting an idea on the assumption that you can just Google a vendor or throw a requirements document to someone in India or Poland and they’ll magically build an A-team for implementation is an all-too-common practice that kills progress.

You don’t just make a ‘webhook into a GPT’ or ‘build a local model’ and assume that because it sounds smart to say, it actually does anything. AI implementations require significant planning, team building, data cleanup and testing just to get basic output that will conform to a single acceptance criterion.

There’s nothing wrong with having an idea and wanting to execute it at scale but without the foundational knowledge of how this happens, your project won’t go anywhere. It’s like having an idea for a new car and instead of partnering with an automotive design studio, you go to a junkyard and try to piece something together.

Where do we go from here?

We need mass investment in upskilling at both the brand and consulting level to get team members up to speed on the mechanics and requirements of implementing and operating AI enhanced systems.

A small course asking someone to write a prompt or use Midjourney to make an image is in no way sufficient for large-scale operations.

We need to train teams on data, compliance, model function, prompting and much more than how teams are trained right now at the same enterprises priding themselves on selling the ideas of the future. Failing to upskill teams will lead to even more wasteful spending, and at some point someone somewhere will pull the plug after another dead-in-the-water project launches with no ROI.

Categories: Artificial Intelligence

Remove Ambiguity: The Gallahue Method to Improve LLM Output and Efficiency

Pushing prompts that haven’t been optimized for efficiency and output controls into a production environment is unfortunately a common and wasteful practice that drives up token budgets and kills progress.

The Gallahue Method for fixing these problems is simple yet effective at removing wasted tokens and improving overall output.

Scenario:

Let’s imagine I’m creating a website and I want to list the 10 best restaurants in every major city by cuisine type. Lacking the time and budget to visit and catalog these places personally, I write two prompts to source the web and generate JSON output I can quickly loop through using N8N or another tool and then patch into a CMS.

Prompt 1:

For New York City, I need a bullet point list of the 10 best restaurants overall and the 10 best restaurants for the following cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with entry for each restaurant under corresponding list.

Prompt 2:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

(Note: For both examples I used a single prompt, but in practice the setup would be a system and user prompt where the city/restaurant value is specified and, of course, websearch enabled.)
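
For illustration, here is a rough sketch of that system/user setup looping city values through the first prompt, assuming the openai Python SDK’s Responses API with the web search tool enabled; the prompt wording, model and city list are placeholders, not production values:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a restaurant researcher. Output only the requested JSON."
USER_TEMPLATE = ("For {city}, I need the 10 best restaurants overall and the 10 best for: "
                 "American, Chinese, Sushi, Pizza, Steakhouses. Output as JSON only.")

results = {}
for city in ["New York City", "Chicago", "Austin"]:
    response = client.responses.create(
        model="gpt-4.1",
        input=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_TEMPLATE.format(city=city)},
        ],
        tools=[{"type": "web_search_preview"}],  # websearch enabled, per the note above
        temperature=0,
    )
    results[city] = response.output_text  # raw JSON string, to be parsed downstream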

Leveraging the Gallahue Method to Improve Prompt Output

On the surface, both prompts seem reasonable in the sense that the average person can read them and understand what’s being asked and what the expected output is.

However, both have multiple flaws that will lead to degraded and poorly articulated output, so let’s use my method to refine them:

The Gallahue Method (DGRC) for Prompt Optimization:

  • Define – Strictly define inputs and outputs
  • Guide – Provide LLMs with easy paths to resolution
  • Review – Check prompt for ambiguity until cleared
  • Compress – Refine optimized prompt into simplest form for token efficiency

Defining Inputs and Outputs

Both prompts suffer from a lack of defined inputs and outputs. Specificity is critical to ensuring you get output that is of value and not a waste of tokens.

The Silence of the Lambs is actually a weird but effective reference for thinking about managing definitions. In a pivotal scene, Hannibal Lecter summarizes Marcus Aurelius by saying “Ask of each thing: What is its nature?” and that’s how you should approach writing production-ready prompts.

Is this word, ask, input or output requirement clearly defined to the point that if I run this prompt 100 times on different accounts with no personalization, I will get the same answer format every time? If not, how do I change it so that the output normalizes?

Let’s look at how the define exercise works in practice by highlighting ambiguous or poorly defined terms:

Prompt 1:

For New York City, I need a BULLET POINT LIST of the 10 BEST restaurants OVERALL and the 10 BEST restaurants for THE FOLLOWING cuisine types: American, Chinese, Sushi, Pizza, Steakhouses. Output as a JSON with ENTRY FOR EACH RESTAURANT under corresponding list.

Bullet point list: While this is a tangible output structure, what specifically is the output? Is it the restaurant name? Is it a restaurant name and address? Do you want an ordered ranking list and a summary of the restaurant? If so, what would the criteria be?

In this instance, the LLM must make a best guess as to what you want and depending on model and mode, you’ll either get a compact list or a 2,000 word dissertation on fine dining spots.

Best/Overall: Poor word choice is a mistake that even the most skilled prompt engineers will make. Ambiguous but well-meaning terms like best or overall are open to interpretation, and in my experience, LLMs will infer you’re looking for a composite of brands with a high level of exposure and high review counts and ratings on Yelp/G2/Trustpilot/Capterra, which may not always align with your thinking.

In this case, New York City has no shortage of highly rated/reviewed restaurants and you could make individual lists for Midtown, Brooklyn, Chelsea etc … and have no degradation in quality of inclusions.

In this example, one might replace best or overall with the requirement of a rating of 4.6 or higher on a specific site (or sites) and being open at least 3 years to refine the list.

The Following: Absolutely critical point here … when you present LLMs with a list and do not gate the output, it’s likely you will get additional content that does not conform to what you want.

LLMs are instructed in system logic to be accommodating and because of this, they will often assume you would want additional output unless instructed otherwise. In this case because you want to build a recommended restaurant list … the LLM completing the prompt could naturally assume you want to cover all cuisine types and add Indian, Thai or vegan to the output.

This is why you should always gate your output. You can use phrases like “Do not include additional categories” or “Only generate for the items in this list.”

JSON Entry for Each Restaurant: The issue here is the exact same one we just identified in the first example: we ask for output but don’t define exactly what we want, leaving LLMs to infer what we want based on the pattern.

Furthermore, we haven’t specified that we only want the JSON, and depending on the LLM and settings, we may get summary text, introduction text or status confirmations that do not conform to our needs and output requirements.

Let’s take our definition exercise to the next prompt:

For Eleven Madison Park, I need the following details: Year opened, cuisine type, average price point, star rating, executive chef, acclaimed dishes and a 150 word summary detailing why the restaurant is seen as exemplary. Output as a JSON with items for each point.

Nearly the entire prompt needs definition, but it will make sense as we go line by line:

Eleven Madison Park: The restaurant has a unique name, but if this were run for a common restaurant name like Great Wall, Taste of India or Hibachi Express (all of which have 30+ unrelated restaurants in the United States), you would quickly introduce a best-guess situation where models have to infer which location is correct. In this example, including an address as part of the prompt would remove any chance of location inaccuracy.

Cuisine Type: We set specific cuisine categories in the previous prompt, but we are not asking this prompt to conform to that same list, so the returned cuisine type may not match the categories we already defined.

Average Price Point: The ambiguity here is that we don’t explicitly say whether we want the average cost of a meal, whether to output a range or a specific dollar amount, or whether we want a tiered metric like $, $$, $$$ (aligned, of course, to cost tiers we would set) that you commonly see in restaurant guides.

Star Rating: When it comes to restaurants, we could be referencing Michelin stars or a traditional 1-5 star value. Second, if we are referencing the traditional star value, then we need to explain where to pull that value from and whether it’s a single source or a composite.

Executive Chef: Restaurants commonly post this on web, press and social materials, but in the instance they do not, or they have recently parted ways with said individual, we need to give a resolution path so that the LLM doesn’t waste tokens going down a rabbit hole trying to find an answer. For something like this, I’ll commonly add a line like: “If a value cannot be determined, then output NA.”

Acclaimed Dishes: In the previous section, we talked about terms like best/overall and what LLMs might infer from them. Acclaimed is another example of a term that, while it seems defined, is still open to vast interpretation. Are you looking for dishes specifically referenced in high-value source reviews or for the most commonly mentioned dishes in a review set? Again, another point to define.

150 Word Summary: In the request, we want a 150 word summary of why the restaurant is exemplary. We need to define exemplary, and to do that we should provide LLMs with example points to highlight: Does the restaurant have a Michelin star? Has it been voted a top 100/50/10 restaurant in the world or the local market in the following sources ( ) …

Output as a JSON: Same feedback as above: make sure that only the JSON is output and no intro text, summaries, notes etc …
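
One way to enforce that gate downstream is to parse the response as JSON and fail loudly if anything else leaked in. A minimal sketch; the required field names are my own illustration of what the fully defined prompt might ask for:

import json

REQUIRED_KEYS = {"year_opened", "cuisine_type", "average_price_point",
                 "star_rating", "executive_chef", "acclaimed_dishes", "summary"}

def parse_restaurant_payload(raw: str) -> dict:
    # json.loads raises ValueError if intro text, notes or summaries were appended to the JSON.
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Model output is missing fields: {sorted(missing)}")
    return data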

Guiding LLMs

The second component to improving efficiency and quality of LLM output is to guide LLMs to key sources they can rely upon for conclusions.

When it comes to restaurant reviews and best of lists, especially for major markets, there’s no shortage of sources from Michelin Guides to YouTube videos shot in someone’s car of hot takes, dish reviews and tier rankings.

That being said, when you ask an LLM to put together a list like this, it might default to sources that aren’t the most valuable. For instance, LLMs tuned to favor certain sources may prioritize Reddit over Resy, Eater or The New York Times. In that case you would get snippet content from anonymous accounts instead of long-form details from authors with names and a contextual body of work.

Unfortunately, most people don’t guide their prompts with preferred sources, and that leads to issues like the one above as well as wasted reasoning tokens as LLMs try to work out which sources would be best for a request and then search for relevant blocks on those sources.

How to “Guide”

Guiding simply means appending the sources you believe are best aligned to the output you want to achieve.

If I want to generate content about the 5 best new SUVs for families I can specify that my preferred sources or only sources to consider for composite research are Consumer Reports, Car and Driver and Motor Trend.

Subsequently, I can also list sources to ignore. For instance, let’s say I want to generate a balanced content piece about George W. Bush’s presidential legacy. I will want to remove sources that either overly praise or overly criticize his time in office in favor of sources that are more objective and in line with what I want to generate.

By doing this, we take the guesswork out of LLMs trying to infer what the ‘best’ source is for an item and instead go straight to search and processing.
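
In practice, guiding can be as simple as appending the source lists to the prompt before it’s sent. A minimal sketch reusing the SUV example above; the excluded-source wording is illustrative:

PREFERRED_SOURCES = ["Consumer Reports", "Car and Driver", "Motor Trend"]
EXCLUDED_SOURCES = ["anonymous forum threads", "unattributed review snippets"]  # illustrative

def guide(prompt: str) -> str:
    # Append source guidance so the model skips the "which source is best?" reasoning step.
    return (
        f"{prompt}\n"
        f"Only use these sources for research: {', '.join(PREFERRED_SOURCES)}.\n"
        f"Ignore: {', '.join(EXCLUDED_SOURCES)}.\n"
        "If the preferred sources do not cover the topic, output NA."
    )

guided_prompt = guide("List the 5 best new SUVs for families with a one-line rationale each.")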

Review Prompts with LLMs

Before you ask LLMs to generate a single piece of output, you should always put your prompt into their system and ask them to review it for points of ambiguity.

Iterating on a prompt that is functionally deficient is a waste of resources, whereas building the right prompt and then fine-tuning output is the much more efficient path.

Here’s the prompt I use to critically analyze my work:

You are a Prompt Auditor. Evaluate the target prompt strictly and return JSON that follows the schema below.

RUNTIME ASSUMPTIONS

  • Timezone: America/Chicago. Use ISO dates.
  • Reproducibility: temperature ≤ 0.3, top_p = 1.0. If supported, seed = 42.
  • Do NOT execute the target prompt’s task. Only evaluate and produce clarifications/rewrites.

DEFINITIONS

  • Components = { goal, inputs_context, constraints, required_process, websearch_dependencies, output_format, style_voice, examples_test_cases, non_goals_scope }.
  • Ambiguity categories = { vague_quantifier, subjective_term, relative_time_place, undefined_entity_acronym, missing_unit_range, schema_gap, eval_criteria_gap, dependency_unspecified }.
  • “Significant difference” threshold (for variance risk): any of { schema_mismatch, contradictory key facts, length_delta>20%, low_semantic_overlap }.

WEBSEARCH DECISION RULES

  • Flag websearch_required = true if the target prompt asks for: current facts/stats, named entities, “latest/today”, laws/policies, prices, schedules, or anything dated after the model’s known cutoff.
  • If websearch_required = true:
    • Provide candidate_domains (3–8 authoritative sites) and query_templates (2–5).
    • If you actually perform search in your environment, list sources_found with {title, url, publish_date}. If you do not perform search, leave sources_found empty and only provide candidate_domains and query_templates.

OUTPUT FORMAT (JSON only)

{
  "ambiguity_report": [
    {
      "component": "<one of Components>",
      "span_quote": "<exact text quoted from the target prompt>",
      "category": "<one of Ambiguity categories>",
      "why_it_matters": "<1–2 sentences>",
      "severity": "<low|med|high>",
      "fix_suggestion": "<concrete rewrite or constraint to add>",
      "clarifying_question": "<question to resolve the ambiguity>"
    }
    // …repeat for each issue found
  ],
  "variance_risk_summary": {
    "risk_level": "<low|med|high>",
    "drivers": ["<driver>"],
    "controls": ["Set temperature≤0.3", "Specify schema", "Pin timezone", "Add ranges/units"]
  },
  "resolution_plan": [
    {
      "step": 1,
      "goal": "<goal>",
      "actions": ["<action>"],
      "acceptance_criteria": ["<criterion>"],
      "websearch_required": <true|false>,
      "candidate_domains": ["<domain>", "…"],
      "query_templates": ["<template>", "…"],
      "sources_found": [
        {"title": "<title>", "url": "<url>", "publish_date": "YYYY-MM-DD"}
      ]
    }
    // …continue until the prompt is unambiguous and runnable
  ],
  "final_rewritten_prompt": "<<<A fully clarified, runnable prompt that incorporates fixes, constraints, and exact output schema.>>>"
}

TARGET PROMPT TO EVALUATE

PROMPT_TO_EVALUATE: """[PASTE YOUR PROMPT HERE]"""

For the simplest of prompts, I’ll often get a laundry list of output flags and that’s even with years of experience writing and reviewing prompts.
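
If you’d rather run the auditor programmatically than paste it into a chat window, here’s a minimal sketch assuming the openai Python SDK; AUDITOR_PROMPT stands for the full prompt above, and the field names (ambiguity_report, severity, span_quote, fix_suggestion) come from its output schema:

import json
from openai import OpenAI

client = OpenAI()

AUDITOR_PROMPT = "..."  # paste the full Prompt Auditor prompt from above
prompt_to_evaluate = "For New York City, I need a bullet point list of the 10 best restaurants ..."

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "system", "content": AUDITOR_PROMPT},
        {"role": "user", "content": f'PROMPT_TO_EVALUATE: """{prompt_to_evaluate}"""'},
    ],
    temperature=0.2,  # within the auditor's own reproducibility guidance
)

report = json.loads(response.output_text)
for issue in report["ambiguity_report"]:
    print(issue["severity"], "|", issue["span_quote"], "->", issue["fix_suggestion"])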

Compress Compress Compress

The final step is to compress the prompt. Compression, similar to minifying JavaScript on a site, doesn’t do much for a prompt you’re running a few times, but if it’s a prompt you’ll potentially run millions of times, then reducing unnecessary characters, words and phrasing can save thousands of dollars in the long run.

Compression is both removing words and optimizing order of operations so that prompts run quickly and efficiently.

What does compression look like?

Let’s take a basic prompt:

“Write a 200 word summary of Apple’s iPhone 17 launch event. Only talk about the new products and updates introduced and don’t talk about influencer or market reactions. Write it from a neutral point of view and don’t overly praise or criticize their launch event”

It’s pretty well defined, but notice how we ask for output in the first sentence and then add refinements in the following sentences? Those can create invisible loops where a reasoning path starts only to be redirected by subsequent instructions.

Second, we use phrases like launch event multiple times, which may not seem important if we’re running a prompt once, but if this is a foundational prompt for a SaaS product that will be run millions of times, it’s going to waste thousands of dollars in tokens reading the same words twice.

Here’s the compressed prompt:

Write 175-225 words in neutral plain English covering only new products and updates from Apple’s iPhone 17 launch event.

It’s to the point, efficient and doesn’t restate words multiple times. One key point is not to over-compress and accidentally remove key guardrails. Always err on the side of caution, and again, if you aren’t going to be running the same prompt many times, compression may not be necessary.
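
To put a number on the savings, count tokens before and after compression with a tokenizer such as tiktoken and multiply by your expected run volume and your model’s per-token input price. A minimal sketch; the cl100k_base encoding is an approximation and the run count is illustrative:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # approximate tokenizer for recent OpenAI models

original = ("Write a 200 word summary of Apple's iPhone 17 launch event. Only talk about the new "
            "products and updates introduced and don't talk about influencer or market reactions. "
            "Write it from a neutral point of view and don't overly praise or criticize their launch event")
compressed = ("Write 175-225 words in neutral plain English covering only new products and updates "
              "from Apple's iPhone 17 launch event.")

saved_per_run = len(enc.encode(original)) - len(enc.encode(compressed))
runs = 1_000_000  # illustrative volume for a prompt baked into a SaaS product
print(f"{saved_per_run} input tokens saved per run, {saved_per_run * runs:,} across {runs:,} runs")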

Putting it All Together Using the Gallahue Method

Consider this prompt for someone looking to write a nutrition guide. I’ve started it at a base level with no definition, guiding or review:

“Hey ChatGPT I need you to write a 600 word article about Kale. I want you to cover kale from three different angles. The first is a overview of what kale is and why it’s considered a superfood due to nutrient density, vitamins A,C, and K, fiber, antioxidants and how those are important for heart, bone and eye health. Second can you cover common diets that use a lot of kale like vegan, keto, paleo and mediterranean diets and finally can you give four examples of how someone can use kale in their day to day cooking? Ideally raw salads, sautéed dishes, smoothies and kale chips.”

On the surface it seems basic and self-explanatory but if I’m generating the same output for 30-40 foods or ingredients, there’s inefficiency that will slow it down and may not lead to the best output.

Here’s the optimized version that removes ambiguity and provides specific instruction for output:

Write 550–650 words in plain English about kale. Output only the article text with exactly three H2 headings: Overview, Health Benefits, Diet Uses.
Cover: what kale is and common varieties; why it’s considered a “superfood” (nutrient density—vitamins A, C, K; fiber; antioxidants—and key benefits for heart, bone, and eye health); and how to use it across diets (Mediterranean, vegan/vegetarian, keto, paleo) with four brief prep examples (raw salad, sautéed, smoothie, baked chips).
Keep a neutral, informative tone. Avoid fluff, anecdotes, lists, citations, or calls to action.

It’s to the point, specific about output and leaves no doubt as to what the LLM needs to generate in order to satisfy the output requirements. Using my methodology on your prompts going forward will not only improve your understanding of how LLMs reason output but also improve the overall delivery of material.

In closing …

It’s important to realize no prompt will be perfect the first time you run it and in fact many of the prompts I work on have gone through 30-60 revisions in staging before they ever make it to production.

Think of this process as a debugging step for human language and it will all make sense.

Categories: AIO, Artificial Intelligence

ChatGPT Custom API Call Template for N8N

Maybe it’s a byproduct of Google search getting progressively worse for narrow, outcome-specific searches, or maybe it just doesn’t exist yet, but I couldn’t find any JSON templates for custom calls to ChatGPT.

For any low/no-coders using N8N to prototype AI wrappers or just play with its call capabilities, you may want to experiment with more settings than the managed call environment offers, and I want to make that transition easier for you with this JSON template that covers all of the main settings and configs you’d want to pass or experiment with.

Managed Call vs Custom Call

N8N and other no/low-code platforms offer managed ChatGPT flows that make it easy to message models, but as your skillset grows, and maybe your curiosity when it comes to temperature and top_p settings, you’ll find these managed calls won’t be enough.

In addition to fine-tuning settings, everything you’ll be using in your mini-app will be in JSON payloads so why not have your call in JSON as well?

ChatGPT Custom Call Template

To set this up:

  1. Create an HTTP Request step in your flow and link it
  2. Set Method to POST
  3. Set URL to https://api.openai.com/v1/responses
  4. If you’re using managed calls, you will have already set up your authentication. You can pass your key as part of a custom call, but for this example let’s keep things simple
  5. ‘Send Query Parameters’ / ‘Send Headers’ should be set to OFF
  6. ‘Send Body’ set to ON

  7. Set Body Content Type to JSON
  8. Set Specify Body to JSON
  9. Customize and paste this template

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "System Prompt Text"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "User Prompt Text"
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  },
  "reasoning": {},
  "tools": [
    {
      "type": "web_search_preview",
      "user_location": {
        "type": "approximate",
        "country": "US"
      },
      "search_context_size": "medium"
    }
  ],
  "temperature": 0,
  "max_output_tokens": 2048,
  "top_p": 0,
  "store": true
}

So what are we setting in the above? Let’s talk through it:

  1. System Prompt
    • How you want the system to function
  2. User Prompt
    • What do you want to generate in reference to the system prompt
  3. Model
    • Which model do you want to use
  4. Websearch
    • Do you want websearch enabled
  5. Search_context_size
    • Depending on how involved the prompt is, you may want to raise this from medium to high (accepted values are low, medium and high)
  6. Temperature
    • A value of 0 makes output as deterministic as possible: the model keeps choosing the most likely next tokens, so repeated runs stay nearly identical
    • A value of 1 allows far more variation and creativity, including answers that may drift beyond what the source material supports
    • You can set any value in between to fine-tune
  7. Max_output_tokens
    • This value caps output length. The maximum allowed depends on the model; as a rule of thumb, 1 token roughly equals 0.75 English words, so the 2,048 in the template is roughly 1,500 words
  8. Top_p
    • Similar to temperature, this controls how much of the probability distribution the model samples from when picking the next word. Values near 0 restrict it to the most likely words for predictable output; 1 lets it consider the full range for more varied, creative responses
    • You can set any value between 0 and 1 to fine-tune output
  9. Store
    • A true value means you want to have the conversations available in your dashboard
    • A false value means you don’t want to keep the conversations
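
If you want to sanity-check the same payload outside N8N, here’s a minimal sketch using Python’s requests library against the same /v1/responses endpoint. It mirrors the template above, trimmed of the optional text and reasoning blocks, and assumes OPENAI_API_KEY is set in your environment:

import os
import requests

payload = {
    "model": "gpt-4.1",
    "input": [
        {"role": "system", "content": [{"type": "input_text", "text": "System Prompt Text"}]},
        {"role": "user", "content": [{"type": "input_text", "text": "User Prompt Text"}]},
    ],
    "tools": [{
        "type": "web_search_preview",
        "user_location": {"type": "approximate", "country": "US"},
        "search_context_size": "medium",
    }],
    "temperature": 0,
    "max_output_tokens": 2048,
    "top_p": 0,
    "store": True,
}

response = requests.post(
    "https://api.openai.com/v1/responses",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json())
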
Categories: AIO

How Glassdoor and Indeed Reviews Impact AI Brand Perception

I’ve worked with companies that obsessed over their Glassdoor reviews like Daniel Day-Lewis obsesses over every aspect of his character. I’ve also worked at companies that couldn’t care less and saw the platform as a way for disgruntled employees to take parting shots when things didn’t work out.

Whether your brand is on one side of the spectrum or more in the middle, you need to start paying attention to the roles, content and context of your inbound reviews and ratings because as you’ll see in this case study, it can have global implications.

Ranking Global IT Firms

Having come from 2.5 great years at TCS, I decided to test RankBee’s (in development) AI brand intelligence tool for the top global IT firms. Similar to the rating scale in my post on Luxury Brands, I pulled in global brand metrics and then segmented the results for a competitive set of Accenture, TCS, Infosys, Cognizant and Deloitte.

After feeding a myriad of data points into our system, the numbers showed Accenture was the overall category leader and my old firm TCS was 3rd in the group based on scores. Most of the brand factor scores were between 8.5-9.5, but what stuck out was that the lowest score for any of these factor metrics was on delivery, which is the lifeblood of IT client servicing.

You’ll notice the number sits under the heading ‘IT Firm 5’. Selfishly, I want to keep some level of mystery and engagement around which brand it could be, and I also suspect that if I simply named it, some people would read it and say “Makes sense” or “I’ve heard that about them” without diving deeper into why LLMs are generating this perception in the first place.

“Inconsistent Delivery and Slower to Innovate”

I began diving into the reason for the low score and first looked at the sources being pulled in to generate both the positive and negative sides of the score/perception.

On the positive side, external sources like press releases, C level interviews and industry awards for client satisfaction were being pulled in – alongside the client’s own case studies and investor relations pages talking about internal client survey results.

For most brands, this would result in a very positive score of at least 8.5, but unfortunately for this IT firm, there was quite a bit of content that subtracted from the score, bringing it nearly a full point below the next competitor.

Glassdoor and Indeed’s Impact on AI Brand Perception

When looking at what was forming the basis for the low delivery score, there was a multitude of citations from Glassdoor and Indeed, and when you think about it, it makes sense why these sites have the outsized impact they do.

The cited sources were review category pages for roles like delivery manager, project manager, delivery lead, account manager etc., and as I read through page after page of detailed, largely critical employee experiences, it became clear why LLMs would treat these as a rich data source for informing their opinion on delivery.

First you have the context that the material is coming from someone whose role would have been intimately familiar with that particular attribute of the consideration process – in this case project delivery and management.

Second is that reviews, especially on the employment side, tend to highlight specific, contextually relevant items in a long-form fashion that gives LLMs even further context about the company’s ability to deliver on certain functions.

Finally, a single review won’t be enough to establish a trend, but with the number of employees these firms have spread across multiple geographies and the levels of turnover natural to these operations, there’s a considerable volume of data, and with that, potentially enough material for LLMs to start seeing trends.

Could the Score Actually Impact Deal Flow?

IT services at this level are largely sold through relationships: you’ll have 10-15 people on an account constantly meeting and interacting with clients, and those connections, rather than an enterprise buyer going to Google or ChatGPT to find service providers, are largely what drive deal flow.

However, as companies continue to look for ways to further reduce overhead, procurement functions are likely to become more and more AI-driven, and it’s conceivable that in the future, software will automatically determine which approved vendors can and cannot bid for a contract.

In that reality, which I feel is not too far away, companies like this one would see their lower delivery score begin to impact their sales pipeline in meaningful ways and may not even realize why until it’s too late.

So what should brands do?

Brands need to pay closer attention to macro trends that emerge from review sites and give them proper consideration rather than ignoring them or falling into the habit of treating a site like Glassdoor or Yelp as just a place for venting frustration.

As seen in this example, a trend established over time with contextually relevant content from contextually relevant individuals can have an outsized impact on a global 100,000 employee firm.

Now that we can see how this impacts LLMs, brands should incorporate public exit “interview” trends as a core part of their annual internal reviews and use what emerges to inform areas of structural and process improvement, buck negative trends and establish better baselines.

Keep in mind that LLMs won’t laser-focus on a single source to form a perception, and in this case, there is a considerable amount of positive content talking about this company’s delivery strength. However, brands need to do what they can to address sources of doubt and continue to build on sources of strength.

Post originally written by me on LinkedIn April 1, 2025.