OpenAI's ChatGPT Memory with Search rewrites web queries from chat history
OpenAI is moving ChatGPT further from the old search-box model.
The new feature, Memory with Search, lets ChatGPT use details saved from past conversations to rewrite web queries before they go out. Ask for “restaurants near me,” and if it knows you’re vegan and in San Francisco, it may search for “best vegan restaurants in San Francisco” instead.
That sounds minor. It’s not. Query rewriting is one of the biggest determinants of search quality, and OpenAI is now feeding persistent user context directly into that layer.
For anyone building AI products, this is a pretty clear signal of where assistant UX is headed. Less prompt-by-prompt interaction, more stateful systems that carry context across tasks.
What changed
The basic behavior is simple:
- ChatGPT stores certain user details as memory, if memory is enabled
- When a prompt would benefit from web search, it can pull in relevant memories
- It rewrites the search query with that context
- It runs a live web search and returns current results
So the personalization happens before the results come back, not just in how the answer is phrased.
That matters because response-level personalization still starts from generic search results. Query-level personalization changes retrieval itself. It changes what the system goes to fetch.
OpenAI says this follows your existing memory settings. If memory is off, this should be off too. Users can also remove saved facts.
That’s the product description. The engineering implications are more interesting.
Why query rewriting matters
Search engines have been rewriting queries for years. Google does it with spelling correction, synonym expansion, geolocation, session history, and intent classification. What’s new here is the source of the context: an LLM’s persistent conversational memory.
That gives ChatGPT a different kind of context than a classic search engine usually has. It may know:
- your diet
- your location
- the stack you use at work
- the project you’re building
- the answer format you prefer
If you ask, “best laptops for travel,” a standard search engine has to infer intent from broad patterns. ChatGPT can fold in “you write CUDA code,” “you prefer Linux,” and “battery life matters more than gaming.”
That pushes it closer to an actual assistant. It also creates new failure modes.
A lot of search quality comes down to under-specification. People ask vague questions because spelling out every constraint is tedious. A memory layer fills in those blanks. Sometimes that helps. Sometimes it inserts assumptions the user didn’t mean for this query.
That tension defines the feature.
A likely architecture
OpenAI hasn’t published the internals, but the rough shape is easy to guess. This looks like a lightweight retrieval pipeline wrapped around web search.
A plausible flow:
- User submits a prompt
- The system decides whether web search is needed
- It retrieves a small set of relevant user memories
- A query-rewrite model combines prompt and memory into a better search string
- A search API runs that string against the live web
- The response model turns the results into an answer
The middle of that could look something like this:
```python
# Sketch of the middle of the pipeline (hypothetical APIs).
prompt = "Find hikes near me"

# Pull a small, relevant slice of the user's stored memories.
memories = memory_store.retrieve(
    user_id=user_id,
    query=prompt,
    top_k=5,
)

# Fold those memories into a sharper search string.
rewritten_query = llm.rewrite_query(
    prompt=prompt,
    memories=memories,
)

# Hit the live web, then synthesize an answer from the results.
results = search_api.search(rewritten_query)
answer = llm.summarize(
    prompt=prompt,
    memories=memories,
    search_results=results,
)
```
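A rewrite call like the `llm.rewrite_query` step above is mostly prompt engineering. A minimal sketch of what that prompt might look like (the template and function name here are hypothetical, not OpenAI's actual prompt):

```python
# Hypothetical rewrite-prompt template; the key constraint is telling the
# model to ignore memories that don't clearly apply to this query.
REWRITE_PROMPT = """Rewrite the user's search query using relevant known facts.
Only use facts that clearly apply to this query. If none apply, return the
query unchanged.

Query: {query}
Known facts: {facts}

Rewritten query:"""

def build_rewrite_prompt(query: str, facts: list[str]) -> str:
    """Assemble the prompt sent to the rewrite model."""
    return REWRITE_PROMPT.format(query=query, facts="; ".join(facts))

print(build_rewrite_prompt(
    "restaurants near me",
    ["is vegan", "lives in San Francisco"],
))
```

The "if none apply, return the query unchanged" instruction is doing real work: it gives the model an explicit escape hatch instead of forcing personalization onto every request.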
The hard part is retrieval discipline. Saving user facts is easy enough; any decent team can dump them into Postgres, Redis, or a vector store. What's hard is pulling in the right memories, and only the right ones, so they improve the query instead of contaminating it.
If the memory system drags in “planning a trip to Tokyo next month” when the user asks for “best running shoes,” query quality falls apart quickly. Persistent context only helps when retrieval is selective.
That’s why a feature like this probably needs more than naive semantic search. You’d expect some mix of recency, relevance scoring, memory type, and maybe a classifier that decides whether personalization should happen at all.
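A minimal sketch of that kind of selective retrieval, combining a relevance score with exponential recency decay and a hard threshold. All names, weights, and thresholds here are illustrative assumptions, not OpenAI's actual scoring:

```python
import math

def score_memory(relevance: float, age_days: float, half_life_days: float = 90.0) -> float:
    """Combine semantic relevance with exponential recency decay."""
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return relevance * decay

def select_memories(candidates: list[dict], min_score: float = 0.4, top_k: int = 3) -> list[dict]:
    """Keep only memories that clear a score threshold, best first."""
    scored = [(score_memory(m["relevance"], m["age_days"]), m) for m in candidates]
    scored = [(s, m) for s, m in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]

# Toy memory store: a stale or weakly relevant fact should not survive the cut.
memories = [
    {"text": "is vegan", "relevance": 0.9, "age_days": 10},
    {"text": "planning a Tokyo trip", "relevance": 0.4, "age_days": 5},
    {"text": "lives in San Francisco", "relevance": 0.8, "age_days": 60},
]
print([m["text"] for m in select_memories(memories)])
```

The threshold is the point: a memory that is merely somewhat relevant ("planning a Tokyo trip" against a running-shoes query) should fall below the cut rather than leak into the rewrite.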
This is a RAG pattern, but a different one
Most RAG discussion is about grounding responses in company docs or proprietary data. This sits nearby, but the retrieval target is different. The system is retrieving user state, not domain knowledge.
That matters.
In a typical enterprise setup, RAG answers “what does the handbook say?” Memory-aware search answers “what do we already know about this user that should shape retrieval?”
Those are different problems, and teams blur them together all the time.
User-memory retrieval comes with its own failure cases:
- stale preferences
- oversharing across contexts
- privacy leakage
- accidental narrowing of results
- overconfident personalization from weak signals
A user may have mentioned being vegetarian once, then forgotten about it. If that keeps steering restaurant, travel, and grocery searches months later, the system starts to feel stubborn. Personalization turns into friction.
“Memory” is a soft product term. In system terms, this is long-lived user profile inference wired into retrieval.
Why developers should care
OpenAI is normalizing a pattern that plenty of product teams have wanted to ship for a while: persistent context that directly changes retrieval.
You can expect versions of this to show up in:
- developer tools that remember your stack and rewrite documentation searches
- internal copilots that know your team, repo, and incident history
- e-commerce search that folds in explicit preferences and behavioral memory
- support bots that tailor search results based on account state and prior tickets
The appeal is obvious. Fewer follow-up prompts. Better first-pass answers. Less repeated context from the user.
But the details will decide whether it works.
Latency gets ugly fast
A memory-aware search stack adds several steps:
- memory retrieval
- query rewrite inference
- search API call
- answer synthesis
Each one adds cost and delay. If you build this yourself, it’s very easy to stack on extra retrieval, reranking, and chain logic until your “smart search” takes six seconds and nobody wants to use it.
The practical version needs tight budgets. Small top_k, cheap rewrite calls, aggressive caching, and clear fallbacks when memory retrieval doesn’t add enough value.
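One way to enforce that budget is a hard timeout on memory retrieval with a generic fallback path. A sketch, assuming a hypothetical `fetch_memories` backend (the sleep is a stand-in for a real store call):

```python
import concurrent.futures
import time

def fetch_memories(user_id: str, query: str) -> list[str]:
    # Stand-in for a real memory-store call (hypothetical backend).
    time.sleep(0.01)
    return ["prefers vegan food"]

def search_with_budget(user_id: str, query: str, budget_s: float = 0.05) -> str:
    """Personalize the query only if memory retrieval fits the time budget."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_memories, user_id, query)
        try:
            memories = future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            memories = []          # budget blown: skip personalization
    if memories:
        return f"{query} ({'; '.join(memories)})"
    return query                   # fast generic path

print(search_with_budget("u1", "restaurants near me"))
```

The design choice worth copying is that the timeout degrades to the generic query instead of failing the request: a slow memory store should cost you personalization, not the answer.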
You need memory governance, not just storage
“Can we save user preferences?” is the easy question. The harder ones show up after launch:
- How does a user inspect memory?
- How do they correct it?
- What expires automatically?
- What data is excluded from query rewriting?
- What gets logged when memory influences search?
In regulated markets, this gets serious quickly. GDPR and CCPA issues don’t disappear because the remembered detail seems harmless. “Prefers vegan food” is less sensitive than medical history, but it’s still personal data.
Encryption and deletion controls are baseline requirements. Auditability matters too. If personalization changes results, users and compliance teams may want to know why.
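A governance-minded memory record carries expiry, a sensitivity flag, and an audit trail from day one, rather than bolting them on later. A rough sketch (field names and defaults are assumptions, not a compliance recipe):

```python
import datetime

def _now() -> datetime.datetime:
    return datetime.datetime.now(datetime.timezone.utc)

class MemoryRecord:
    """User memory with automatic expiry and a log of when it shaped a search."""

    def __init__(self, fact: str, ttl_days: int = 180, sensitive: bool = False):
        self.fact = fact
        self.sensitive = sensitive        # sensitive facts never reach query rewriting
        self.expires = _now() + datetime.timedelta(days=ttl_days)
        self.audit_log: list[tuple[datetime.datetime, str]] = []

    def is_active(self) -> bool:
        return _now() < self.expires

    def usable_for_rewrite(self) -> bool:
        return self.is_active() and not self.sensitive

    def record_use(self, query: str) -> None:
        # Log every query this memory influenced, for user inspection and audits.
        self.audit_log.append((_now(), query))
```

The `sensitive` flag answers "what data is excluded from query rewriting?" at the schema level, and the audit log is what lets you tell a user or a compliance team exactly when a remembered fact changed their results.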
Evaluation is trickier than it looks
It’s easy to track click-through. It’s harder to tell whether memory improves outcomes or just makes answers feel smoother.
Teams should compare:
- raw query vs. rewritten query
- personalized retrieval vs. generic retrieval
- task completion rate
- time to satisfactory answer
- user correction rate
- opt-out rate for memory features
That last one matters a lot. If users keep disabling memory, the feature may be technically sound and still wrong for the product.
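The personalized-vs-generic comparison can start as a plain A/B readout over sessions. A toy sketch of the metric (the session schema is an assumption):

```python
def personalization_lift(sessions: list[dict]) -> float:
    """Task-completion rate with memory-based rewriting minus the rate without.

    Each session is a dict with 'personalized' (bool) and 'completed' (bool).
    """
    def rate(group: list[dict]) -> float:
        return sum(s["completed"] for s in group) / len(group) if group else 0.0

    on = [s for s in sessions if s["personalized"]]
    off = [s for s in sessions if not s["personalized"]]
    return rate(on) - rate(off)
```

A lift near zero (or negative) on a metric like this is the quantitative version of "technically sound and still wrong for the product."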
There’s a real risk of narrowing the web
Personalization has always had a downside: it reduces discovery.
If ChatGPT keeps steering searches through remembered preferences, it can hide useful surprises. A user who once mentioned strict dietary rules may still want broad recommendations for friends. A politically charged memory could tilt queries toward familiar sources. A developer who mostly writes Python may ask a general question and keep getting Python-heavy results anyway.
Classic search engines usually mix personalization with some diversity control for exactly this reason. Full trust in the profile is a bad idea.
LLM systems need the same restraint. Sometimes the best rewrite is no rewrite. Sometimes the right move is a follow-up question. Sometimes the results should stay wider than the user’s history suggests.
That judgment layer is harder than the retrieval layer, and a lot of assistant products still look immature here.
The competitive angle
OpenAI wants ChatGPT to feel cumulative. Google, Anthropic, Microsoft, and others are all pushing toward the same outcome: an assistant that remembers enough to save you effort without becoming creepy or brittle.
OpenAI has one obvious advantage. ChatGPT already has a strong habit loop around ongoing conversation. Users tell it things over time. Feeding that memory into search feels like a natural extension.
The weak point is trust. Search is already sensitive. Add persistent memory and the question changes: what does this system know about me, and when is it using that information?
If OpenAI gets the controls right, this will feel normal fast. If it gets them wrong, people will treat memory the way they used to treat browser cookies: useful on paper, disabled when possible.
What to copy from this design
If you’re building AI search or assistant products, the pattern worth borrowing is straightforward:
- keep memory retrieval narrow
- rewrite queries only when confidence is high
- make memory visible and editable
- measure whether personalization helps or annoys
- keep a fast generic path for requests where memory adds nothing
That last point gets overlooked. Plenty of systems try to personalize every request and end up making the product feel intrusive or slow. Good product judgment means leaving the query alone when context doesn’t help.
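The confidence gate and the fast generic path together fit in a few lines. A sketch, assuming each retrieved memory carries a confidence score (a hypothetical field):

```python
def maybe_rewrite(query: str, memories: list[dict], min_confidence: float = 0.7) -> str:
    """Rewrite only when the strongest memory clearly applies; else leave the query alone."""
    if not memories:
        return query                          # fast generic path
    best = max(memories, key=lambda m: m["confidence"])
    if best["confidence"] < min_confidence:
        return query                          # weak signal: don't personalize
    return f"{query} {best['fact']}"

print(maybe_rewrite("restaurants near me", [{"fact": "vegan", "confidence": 0.9}]))
```

Note that the default outcome is the untouched query; personalization has to earn its way in, which is exactly the judgment layer the paragraph above describes.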
OpenAI’s update matters because it moves personalization into the retrieval path, where it can materially change results. That’s smart engineering. It also raises the stakes. Once an assistant starts editing your query based on what it remembers, memory stops being a convenience feature and becomes part of the search infrastructure.