OpenAI's Product Discovery Engine Is Live: The Exact Data Fields That Determine Whether Your Products Show Up
By Steve Merrill | April 23, 2026
OpenAI published the details of its product discovery infrastructure this month. Most coverage focused on the consumer-facing features: richer results, conversational browsing, side-by-side comparisons.
I'm more interested in what's underneath. Because the question for every Shopify store owner is the same: am I in there, or am I not?
Here's what the engine actually uses to decide.
How Does OpenAI's Product Discovery Engine Work?
The engine aggregates product data from three main sources: direct page crawls via OAI-SearchBot, merchant product feeds, and third-party data providers. It then runs that data through quality filters before a product is eligible to surface in ChatGPT shopping responses.
The quality filters are not published in full. But the signals are consistent with what structured-data experts have been tracking for months, and with the data fields Google's Shopping Graph uses, which OpenAI has acknowledged as a reference point.
According to OpenAI's April 2026 product news, the engine specifically improved "product data coverage, freshness, and speed." That's code for: incomplete, stale, or slow-to-update data gets filtered out.
What Are the Exact Data Fields That Matter?
These are the fields the engine uses to evaluate and rank a product. If any are missing or inaccurate, your product's eligibility drops.
Required to appear at all:
- Product name (clean, descriptive, not keyword-stuffed)
- Current price with currency code
- In-stock availability status (real-time or near-real-time)
- Product image (high resolution, clean background preferred)
- Product URL that returns a 200 status
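To make the required list above concrete, here's a tiny validator sketch. This is an illustration only: the field names mirror schema.org's Product/Offer vocabulary, and the pass/fail logic is my assumption, not OpenAI's published filter.

```python
# Hypothetical required-field check; field names mirror schema.org Product/Offer.
REQUIRED = ("name", "price", "priceCurrency", "availability", "image", "url")

def missing_required_fields(product: dict) -> list:
    """Return the required fields that are absent or empty in a product record."""
    return [field for field in REQUIRED if not product.get(field)]

sample = {
    "name": "Waxed Canvas Tote",
    "price": "89.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "image": "https://example.com/images/tote.jpg",
    # "url" deliberately missing
}
print(missing_required_fields(sample))  # -> ['url']
```

Run something like this against your exported catalog and you'll know in minutes which products would fail the appearance threshold.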
Required to rank competitively:
- Full product description (answers what, who, and why in plain language)
- Brand name
- Product category (schema taxonomy preferred)
- GTIN, MPN, or SKU
- Shipping and return policy data
- Aggregate review score and review count
Signals that boost placement:
- Merchant verification signals (Organization schema with address and contact)
- Product feed submission (Merchant Center-compatible format)
- Consistent data across feed and page (no price mismatches)
- Product variant data (size, color, material with individual pricing)
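The merchant verification signal in that list comes down to Organization schema on your site. A minimal sketch, with every name, address, and URL a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Goods",
  "url": "https://example.com",
  "email": "support@example.com",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "100 Main St",
    "addressLocality": "Portland",
    "addressRegion": "OR",
    "postalCode": "97201",
    "addressCountry": "US"
  }
}
```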
Why Do Most Shopify Stores Fail the Coverage Filter?
In our audits of Shopify stores, the most common failure points are availability freshness and price consistency. Shopify's default JSON-LD doesn't always update real-time availability signals correctly, and when a product is out of stock on the page but in stock in the schema, the engine flags the data as unreliable.
The second failure: missing brand data. A huge percentage of Shopify stores don't populate the brand field in product schema because Shopify doesn't make it obvious. The engine needs brand to understand category context. Without it, your product floats in ambiguous space.
I ran an audit on over 2,400 Shopify product pages earlier this year. Fewer than 12% had all the required fields populated correctly. That's the pool ChatGPT is drawing from for its recommendations.
How Do You Actually Optimize for This Engine?
Five steps, in order of impact:
Step 1: Audit your Product schema. Every product page needs Product JSON-LD with name, description, offers (price, availability, priceCurrency), brand, image, and sku. Check it with Google's Rich Results Test.
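Here's a minimal sketch of what that Product JSON-LD looks like with all of those fields populated; every value is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Waxed Canvas Tote",
  "description": "A water-resistant everyday tote for commuters who carry a laptop and a lunch.",
  "image": "https://example.com/images/tote.jpg",
  "sku": "TOTE-001",
  "brand": { "@type": "Brand", "name": "Example Goods" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/waxed-canvas-tote",
    "price": "89.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

Note that availability lives inside the offers object. That's the field that has to stay in sync with your actual stock status.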
Step 2: Check your robots.txt. Make sure you're not blocking OAI-SearchBot. If you are, you're invisible to the engine by default.
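If you want to allow the crawler explicitly rather than rely on the absence of a disallow rule, the robots.txt entry is two lines:

```
User-agent: OAI-SearchBot
Allow: /
```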
Step 3: Submit or update a product feed. A Google Merchant Center-compatible feed is the most reliable way to ensure your data is fresh. OpenAI pulls from feed aggregators. If you're not in a feed, your data freshness depends entirely on crawl frequency.
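For reference, a Merchant Center-compatible feed is an RSS 2.0 file using Google's g: namespace. A one-item sketch with placeholder values:

```xml
<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Example Goods</title>
    <link>https://example.com</link>
    <item>
      <g:id>TOTE-001</g:id>
      <g:title>Waxed Canvas Tote</g:title>
      <g:description>A water-resistant everyday tote.</g:description>
      <g:link>https://example.com/products/waxed-canvas-tote</g:link>
      <g:image_link>https://example.com/images/tote.jpg</g:image_link>
      <g:price>89.00 USD</g:price>
      <g:availability>in_stock</g:availability>
      <g:brand>Example Goods</g:brand>
      <g:gtin>00012345678905</g:gtin>
    </item>
  </channel>
</rss>
```

Shopify apps can generate this for you; the point is that every field above maps directly to the required and competitive lists earlier in this article.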
Step 4: Write descriptions that answer the buying question. The first two sentences of every product description should answer: what is this, who is it for, and why should they buy it. That's what the engine extracts as a candidate summary.
Step 5: Add an llms.txt file. This is a newer signal: a plain-text file at your site root that tells AI systems what your catalog contains and how it's organized. The llms.txt specification is worth 20 minutes of your time.
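Per the llms.txt spec, the file is Markdown-formatted: an H1 with your site name, a blockquote summary, then sections of annotated links. A sketch with placeholder names and URLs:

```markdown
# Example Goods

> Direct-to-consumer bags and accessories. Roughly 40 products across totes, backpacks, and small goods.

## Products

- [Totes](https://example.com/collections/totes): waxed canvas totes in three sizes
- [Backpacks](https://example.com/collections/backpacks): commuter and travel backpacks

## Policies

- [Shipping & returns](https://example.com/policies): 30-day returns, free US shipping over $75
```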
None of this is complicated. It's just not done by most stores, because most stores are still optimizing for a search engine their customers are using less and less every quarter.
What Happens If You Don't Do This?
Your products stay invisible in ChatGPT. Not penalized, just absent. And as the share of product discovery that runs through AI shopping interfaces grows, being absent in those results compounds into a meaningful revenue gap.
The stores that fix this now won't be fighting to catch up in two years.
Frequently Asked Questions
What is OpenAI's product discovery engine?
OpenAI's product discovery engine is the infrastructure behind ChatGPT's shopping recommendations. It aggregates product data from merchant feeds, structured data on product pages, and third-party data sources to surface relevant products when users ask ChatGPT shopping questions.
Does a Shopify store automatically appear in ChatGPT product results?
No. ChatGPT surfaces products from stores whose data meets quality thresholds for completeness, freshness, and structured formatting. Stores without proper Product schema, accurate pricing, and crawl accessibility are unlikely to appear.
What product data fields does OpenAI's engine focus on?
Based on the April 2026 announcement, the engine focuses on: product name, description, price and currency, availability status, product images, brand, category, reviews and ratings, and return/shipping policies.
How do I know if ChatGPT is crawling my Shopify store?
Check your server logs for OAI-SearchBot, OpenAI's official crawler. You can also check your robots.txt; the crawler will respect disallow rules. If you're blocking it, your products won't be indexed.
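The log check is a one-line scan for the user-agent string. A minimal sketch with synthetic log lines; in practice you'd point it at your real access log or your platform's log export:

```python
def count_oai_hits(lines):
    """Count requests whose user-agent string mentions OAI-SearchBot."""
    return sum(1 for line in lines if "OAI-SearchBot" in line)

# Synthetic log lines for illustration; in practice, iterate over
# open("/var/log/nginx/access.log") or equivalent.
sample_log = [
    '1.2.3.4 - - "GET /products/tote HTTP/1.1" 200 "-" "OAI-SearchBot/1.0"',
    '5.6.7.8 - - "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_oai_hits(sample_log))  # -> 1
```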
Is a product feed required for ChatGPT product discovery?
Not strictly required, but product feeds significantly improve coverage and data freshness. Stores that rely only on page crawling will have slower updates when prices or availability change. A Merchant Center-compatible feed is the most reliable signal source.

