Multi-location service brands are starting to treat robots.txt like an AI visibility switch. That is too simple.
The better question is not whether to allow or block "AI crawlers." It is which crawler job you are making a decision about. Search retrieval, user-requested page fetches, model training, ad review, and random scraping are different activities. A crawler policy that treats them the same can quietly remove location pages from answer engines, or open content to uses the brand never intended.
For a home services rollup, franchise system, med spa group, or hospitality operator, the practical goal is narrower: keep public location and service pages accessible to legitimate search and answer surfaces, limit training use where the business chooses to, and verify that CDN or bot-management settings are not blocking the pages that need to be found.
Important
Do not copy a generic "block all AI bots" robots.txt template into a multi-location site. It can protect content from some training crawlers, but it can also make real locations harder for AI search systems to retrieve, cite, or summarize.

Start with the crawler's job
OpenAI's crawler docs split access into separate user agents. OAI-SearchBot is for ChatGPT search features. GPTBot is for crawling content that may be used to train foundation models. ChatGPT-User is triggered by certain user actions and is not the control OpenAI tells site owners to use for Search opt-outs.
Anthropic's current crawler page makes a similar separation. It lists ClaudeBot for model development, Claude-SearchBot for search, and Claude-User for retrieving content at a user's direction. Anthropic says the bots honor industry-standard robots.txt directives and describes how site owners can block or slow specific bots.
Perplexity documents PerplexityBot and says it will not index full or partial text content from a site that disallows it in robots.txt, while still noting that it may index limited facts such as the domain, headline, and a brief summary.
Google is different because AI Overviews and AI Mode are Search features. Google says the same SEO fundamentals apply, and that pages must be indexed and eligible to appear in Google Search with a snippet to be eligible as supporting links in AI features. Google also says Googlebot, not Google-Extended, is the control for crawling in Search. Google-Extended is a product token for managing certain Gemini training and grounding uses outside regular Search crawling.
That separation matters for local service businesses. Blocking a training crawler may be a reasonable brand decision. Blocking a search crawler, a user-request fetcher, or Googlebot can reduce visibility in the exact answer surfaces the business wants to influence.
A selective policy is safer than a broad block
Most operators do not need a philosophical crawler policy. They need a practical one that a marketing lead, SEO lead, and web developer can maintain across hundreds of location URLs.
A reasonable starting point is to group crawlers by purpose:
- Allow the search and discovery crawlers behind the surfaces where the business wants to appear, including Googlebot for Google Search AI features and OAI-SearchBot for ChatGPT search when ChatGPT visibility matters
- Decide separately on training crawlers such as GPTBot, ClaudeBot, and Google-Extended based on the brand's content-use policy, legal posture, and appetite for model-training reuse
- Keep user-triggered fetches usable when appropriate because assistants may fetch a page when a user asks about a specific business, booking page, or source
- Block private, duplicate, parameter-heavy, and internal paths such as staging pages, admin routes, search-result pages, cart flows, or campaign parameters that do not help a buyer choose a location
- Verify server behavior beyond robots.txt because CDN rules, WAF challenges, bot toggles, and IP allowlists can override a permissive robots file
This is not a recommendation to allow everything. It is a recommendation to stop treating "AI" as one crawler class.
For many companies that fit the Cheers ideal customer profile, the highest-value public pages are location pages, service pages, reviews or proof pages, comparison pages, and helpful educational articles. Those pages should usually be crawlable if they support local demand. Internal dashboards, lead-routing APIs, private reports, and thin duplicate pages should not be.
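The purpose-based grouping above can be sketched as a small generator that turns one decision per crawler job into a robots.txt file. This is a minimal illustration, not a recommended policy: the purpose labels, the `build_robots` helper, and the private paths are hypothetical, and placing PerplexityBot under "search" and Google-Extended under "training" is a simplifying assumption based on how this article describes them.

```python
# Hypothetical mapping from documented crawler tokens to the job each one does.
# The purpose labels are this sketch's own; check each vendor's docs before use.
CRAWLER_PURPOSE = {
    "Googlebot": "search",
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
    "PerplexityBot": "search",      # assumption: treated as an indexing crawler
    "ChatGPT-User": "user_fetch",
    "Claude-User": "user_fetch",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Google-Extended": "training",  # assumption: covers training and grounding uses
}

# Illustrative private paths; a real site would list its own.
PRIVATE_PATHS = ["/api/", "/admin/", "/private/"]


def build_robots(policy, private_paths=PRIVATE_PATHS, sitemap=None):
    """Render robots.txt text from a {purpose: "allow" | "block"} policy."""
    lines = []
    for token, purpose in CRAWLER_PURPOSE.items():
        lines.append(f"User-agent: {token}")
        if policy[purpose] == "block":
            lines.append("Disallow: /")
        else:
            # A named group does not inherit rules from "*", so the
            # private-path rules are repeated in every allowed group.
            lines.extend(f"Disallow: {path}" for path in private_paths)
            lines.append("Allow: /")
        lines.append("")
    # Fallback group for every crawler not named above.
    lines.append("User-agent: *")
    lines.extend(f"Disallow: {path}" for path in private_paths)
    lines.append("Allow: /")
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    decision = {"search": "allow", "user_fetch": "allow", "training": "block"}
    print(build_robots(decision, sitemap="https://www.example.com/sitemap.xml"))
```

Keeping the decision at the purpose level means a change in the brand's training stance is one edit to the policy, not a hunt through every user-agent group.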
What a local-service robots.txt might express
The exact file depends on the site's legal policy, CMS, CDN, and growth priorities. The point is to make search access, training access, and private-path access explicit.
User-agent: OAI-SearchBot
Disallow: /api/
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://www.example.com/sitemap.xml

That example allows ChatGPT's search crawler on public pages, blocks two training or non-Search-use controls, and keeps the public site crawlable. The private-path rules appear twice on purpose: a crawler that matches a named group follows only that group, so OAI-SearchBot would not inherit the Disallow lines listed under the * group. It is not a universal template. It does not cover Anthropic, Perplexity, Bing, Apple, Common Crawl, or every CDN-level bot rule. It also does not solve content quality, entity clarity, source coverage, or review strength.
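One way to sanity-check a file like this before deploying it is to run it through a robots.txt parser and ask what each crawler token may fetch. The sketch below uses Python's standard `urllib.robotparser`. The embedded file repeats the private-path rules inside the OAI-SearchBot group, because a crawler that matches a named group does not also read the rules under `*`. The sample paths and the `allowed` helper are hypothetical. Python's parser applies the first matching rule in a group rather than the longest match that Google documents, which is why the Disallow lines come before `Allow: /` here.

```python
from urllib.robotparser import RobotFileParser

# robots.txt under test, embedded so the check runs without network access.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /api/
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /private/
Allow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one public location page, one private route.
    for agent in ["OAI-SearchBot", "GPTBot", "Google-Extended", "SomeOtherBot"]:
        for path in ["/locations/austin/", "/api/leads"]:
            print(f"{agent:16} {path:20} allowed={allowed(ROBOTS_TXT, agent, path)}")
```

This only confirms what the file instructs. It says nothing about whether a CDN or WAF rule will let the same crawler through.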
Robots.txt only answers one question: which crawlers are instructed not to fetch which paths. It does not prove that a page deserves to be recommended.
For that broader work, read "How Local Businesses Can Show Up in Google AI Search" and "The Citation Stack for AI Search."
The multi-location failure mode is partial blocking
Large local-service sites rarely fail crawler access in one obvious way. They fail by market, subdomain, path, brand, or acquisition.
A PE-backed HVAC platform may have the corporate domain crawlable while acquired brands sit behind older robots rules. A franchise may allow crawlers on the marketing site while blocking franchisee microsites or appointment subdomains at the CDN. A med spa group may allow Googlebot but challenge every non-Google bot with JavaScript, which can stop answer engines from reading public service pages. A home services brand may accidentally block parameterized URLs that canonical location pages depend on.
Partial blocking is hard to spot because normal browser checks still work. The marketing team can load the page. Google Search Console may look clean for the main domain. The location page may be in the sitemap. But the AI crawler or user agent that matters sees a 403, a bot challenge, a nosnippet rule, or an empty rendered page.
This is why crawler policy belongs in a recurring technical audit, not a one-time robots.txt edit. The audit should cover the apex domain, location directories, legacy acquisition domains, booking subdomains, blog paths, image paths, and proof pages. It should test robots.txt, HTTP status, renderability, canonical tags, snippets, structured data, and CDN bot settings.
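The robots.txt part of that audit can be sketched as a comparison across hosts: parse each host's file, ask the same question for the same crawler tokens, and flag any crawler whose answer changes from one host to the next. The host names, sample files, and helper names below are hypothetical, and the sketch covers only the robots.txt layer. Status codes, rendering, and CDN rules need their own checks.

```python
from urllib.robotparser import RobotFileParser


def access_matrix(robots_by_host, agents, path="/"):
    """Map each host to {agent: may_fetch(path)} using that host's robots.txt text."""
    matrix = {}
    for host, robots_txt in robots_by_host.items():
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        matrix[host] = {agent: parser.can_fetch(agent, path) for agent in agents}
    return matrix


def inconsistent_agents(matrix):
    """Return the agents whose allow/block decision differs between hosts."""
    if not matrix:
        return []
    agents = list(next(iter(matrix.values())))
    return [
        agent
        for agent in agents
        if len({row[agent] for row in matrix.values()}) > 1
    ]


if __name__ == "__main__":
    # Hypothetical hosts: a main site and an acquired brand with an older file.
    robots_by_host = {
        "www.example.com": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n",
        "legacy-brand.example": "User-agent: *\nDisallow: /\n",
    }
    matrix = access_matrix(robots_by_host, ["Googlebot", "OAI-SearchBot", "GPTBot"])
    print(inconsistent_agents(matrix))
```

An agent that shows up in the inconsistent list is not automatically a problem, since a booking subdomain may be blocked on purpose. It is a prompt to confirm the difference was a decision and not an inheritance from an old site.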
If the business needs a repeatable branch-level measurement layer, pair this crawler check with "How to Audit AI Search Visibility Across Locations."
If the same business also has inconsistent third-party listings, the crawler problem becomes an entity problem. "Why AI Treats Your 50 Locations Like 50 Strangers" explains that side of the work.
Do not confuse crawler access with permission to win
Allowing the right crawler is only the entry ticket. It means a search or AI system can try to fetch a page. It does not mean the page is useful, indexed, cited, or selected.
Google says there are no special technical requirements for appearing in AI Overviews or AI Mode beyond eligibility for Search and snippets, and it points site owners back to crawlability, internal links, page experience, textual content, images, structured data that matches visible text, and up-to-date Business Profile information.
For local-service brands, those fundamentals have a location-level interpretation. A crawlable page should identify the local branch, services, service area, contact path, business profile relationship, review proof, and the real-world problem the customer is trying to solve. A generic city page that exists only to target a keyword may be accessible, but still not useful enough to support an AI recommendation.
The mistake is to stop at "we allowed the bot." The better question is whether the page gives the system enough public evidence to say why this location is a credible provider for this job in this market.
Use llms.txt as context, not as the policy layer
An llms.txt file can help explain a site to AI tools that choose to read it. It can point crawlers or agents toward high-value pages, documentation, and business context. It should not replace robots.txt, noindex, authentication, or CDN-level controls.
For local-service brands, llms.txt is useful when it summarizes the site map, service categories, locations, proof pages, and preferred canonical resources. It is not useful when it becomes a dumping ground for every keyword page or a fake permission system.
The control layer is still robots.txt, page-level meta directives, HTTP headers, authentication, and edge rules. The context layer can include llms.txt, structured data, internal links, and clear page copy. Keep those jobs separate.
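Two pieces of the control layer, page-level meta directives and HTTP headers, are easy to miss because they sit inside an otherwise healthy response. A minimal sketch of collecting them with Python's standard `html.parser` follows. The helper names are hypothetical, and the sketch deliberately ignores agent-scoped header values such as `googlebot: noindex` and numeric limits such as `max-snippet`.

```python
from html.parser import HTMLParser


def _split(value):
    """Split a comma-separated directive string into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            self.directives |= _split(attr.get("content", ""))


def page_directives(html, headers):
    """Return robots directives found in the HTML and in any X-Robots-Tag header."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= _split(value)
    return found


def blocks_search_snippets(directives):
    """True if the page opts out of indexing or snippets."""
    return bool(directives & {"noindex", "nosnippet"})
```

A location page that passes the robots.txt check but returns `nosnippet` here would still fall short of the snippet eligibility Google describes for AI features.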
Read "What Is LLMs.txt, and Should Your Business Use One?" if your team is deciding whether to add one.
What to inspect this week
Before changing policy across hundreds of locations, pick ten revenue-critical URLs: three location pages, three service pages, two proof or review pages, one article, and one booking or contact path. Fetch each with Googlebot, OAI-SearchBot, GPTBot, ClaudeBot, Claude-SearchBot, PerplexityBot, and a normal browser user agent. Record whether each request returns a useful page, a robots block, a 403, a challenge, a redirect loop, a noindex or nosnippet directive, or a page with missing main content.
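A first pass at that fetch matrix can be sketched with the standard library. Several things here are assumptions: the user-agent values are bare tokens and not the full strings real crawlers send, the status-to-label mapping is a rough heuristic, and `min_chars` is an arbitrary threshold for "missing main content." Because many bot-management systems verify crawlers by IP range, a spoofed user agent from a laptop may be treated differently than the real crawler, so a clean result here is a hint and not proof. The sketch also does not evaluate robots.txt, which needs a separate check.

```python
import urllib.error
import urllib.request

# Bare tokens used as User-Agent values (assumption: real crawlers send longer strings).
AGENTS = [
    "Googlebot", "OAI-SearchBot", "GPTBot", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Mozilla/5.0",
]


def classify(status, headers, body, min_chars=500):
    """Label one response; `headers` is a dict with lowercase keys."""
    if status == 0:
        return "network_error"
    if status in (401, 403):
        return "forbidden"
    if status in (429, 503):
        return "challenge_or_throttle"
    if 300 <= status < 400:
        return "redirect_problem"
    if status != 200:
        return f"http_{status}"
    tag = headers.get("x-robots-tag", "").lower()
    if "noindex" in tag or "nosnippet" in tag:
        return "restrictive_header"
    if len(body) < min_chars:
        return "thin_or_empty"
    return "ok"


def fetch(url, agent, timeout=10):
    """Return (status, lowercase-keyed headers, body) for one request."""
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = {k.lower(): v for k, v in response.headers.items()}
            return response.status, headers, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Includes 4xx/5xx and redirects urllib gave up following (such as loops).
        return err.code, {}, ""
    except OSError:
        # DNS failures, refused connections, timeouts.
        return 0, {}, ""


def audit(urls, agents=AGENTS, fetcher=fetch):
    """Return (url, agent, label) rows for every URL and agent combination."""
    rows = []
    for url in urls:
        for agent in agents:
            status, headers, body = fetcher(url, agent)
            rows.append((url, agent, classify(status, headers, body)))
    return rows
```

Calling `audit` with the ten chosen URLs gives one row per URL and agent. The rows worth reading first are the ones where the browser user agent gets "ok" and a search crawler token does not.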
Then compare the result to the business decision. If search visibility matters, public location and service pages should be accessible to the search crawlers that feed those surfaces. If the brand does not want training reuse, document that separately. If abusive bots are creating load, handle them with log-based rules instead of blocking every documented AI crawler by default.
The Cheers AI Visibility Grader can show how one business appears in AI search. The crawler audit explains whether the public pages that should support that visibility are even reachable.
Sources
- Google Search Central: AI features and your website. Supports the point that AI Overviews and AI Mode use normal Search eligibility, crawling, snippets, internal links, structured data, and Business Profile freshness
- Google Crawling Infrastructure: Google-Extended. Explains that Google-Extended is a standalone product token, not a separate HTTP user agent
- OpenAI: overview of OpenAI crawlers. Defines OAI-SearchBot, GPTBot, and ChatGPT-User and their different purposes
- Anthropic Help Center: web crawling and blocking Anthropic bots. Documents ClaudeBot, Claude-SearchBot, Claude-User, and Anthropic's robots.txt guidance
- Perplexity Help Center: how Perplexity follows robots.txt. Documents PerplexityBot's stated behavior when a site disallows crawling
- Google Crawling Infrastructure: robots.txt specification. Supports the syntax and limitations of user-agent, allow, disallow, and sitemap rules
- Cloudflare bot solutions docs: managed robots.txt setting. Supports the point that CDN-managed robots settings and content signals can affect crawler instructions
Amadeus Peterson is the CTO & Co-Founder of Cheers, the local search platform for multi-location service businesses.