HTTPS surface reachable (robots ✓, sitemap ✗, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, and head meta tags) are the first things an attacker sees during reconnaissance. Disallow rules that advertise sensitive paths, stale sitemaps, and verbose generator tags leak more than intended (ISO 27001 A.8.9).
robots.txt
present
# Robots.txt for https://www.dailymail.com/ updated 20/04/2026
#
#
# Daily Mail content is made available for your personal and non-commercial
# use subject to our Terms and Conditions of use:
# https://www.dailymail.com/terms
# Use of any device, tool, or process designed to data mine or scrape the content
# using automated means is prohibited without prior written permission from
# DMG MEDIA LIMITED. Prohibited uses include but are not limited to:
# (1) text and data mining activities under Art. 4 of the EU Directive on Copyright in
# the Digital Single Market;
# (2) the development of any software, machine learning, artificial intelligence (AI),
# and/or large language models (LLMs);
# (3) creating or providing archived or cached data sets containing our content to others; and/or
# (4) any commercial purposes.
# For any licensing queries please contact partnerships@dmgmedia.co.uk.
#
#
# All Robots
User-agent: *
#
# Begin Standard Rules
Disallow: /*readcommentshtml*
Disallow: /*login?redirectPath=
Disallow: /*logout?redirectPath=
Disallow: /*nextThread.html
Disallow: /*previousThread.html
Disallow: /*questionId*
Disallow: /*selectedImage*
Disallow: /*threadIndex=*
Disallow: /*topGallery*
Disallow: /ce/item.cms*
Disallow: /guide/*
Disallow: /registration/*
Disallow: /best-buys/clickout/*
Disallow: /best-buys/m/clickout/*
Disallow: /weather.html?latitude=*
Disallow: /weather.html?old=*
Disallow: /home/search.html*
Disallow: /best-buys/mercury/out*
Disallow: /api/infinite-list.html
Disallow: /api/infinite-list.html
Disallow: /mobileapps/*
#
# Disallow Money for Google News
User-agent: Googlebot-News
Disallow: /tmoney/*
#
# Allow Adsense
User-agent: Mediapartners-Google
Disallow:
#
#
User-agent: CrystalSemanticsBot
Disallow: /
#
User-agent: archive-it.org
User-agent: archive.org_bot
User-agent: ArchiveBot
User-agent: Arquivo-web-crawler
User-agent: Authory
User-agent: babel
User-agent: bl.uk_ldfc_bot
User-agent: bl.uk_ldfc_renderbot
User-agent: bne.es_bot
User-agent: bnf.fr_bot
User-agent: ecoresearch
User-agent: ecoresearchCrawler
User-agent: Europarchive.org
User-agent: kb.dk_bot
User-agent: Kulturarw3
User-agent: LivelapBot
User-agent: mediacloud
User-agent: mirrorweb
User-agent: MSIECrawler
User-agent: nationalarchives.gov.uk
User-agent: netarkivet.dk
User-agent: netEstate NE Crawler
User-agent: Nicecrawler
User-agent: nl-israel_iaharvester2026
User-agent: nlnbot
User-agent: nlncrawler
User-agent: PageFreezer
User-agent: SafeDNS
User-agent: SmarshBot
User-agent: special_archiver
User-agent: tkl.iis.u-tokyo.ac.jp
User-agent: TurnitinBot
User-agent: webarchiv.cz
User-agent: WordPress.com mShots
User-agent: XY-Archive-Compliance-Archiver
User-agent: XY-Archive-Compliance-Crawler
Disallow: /
#
User-agent: AhrefsBot
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: Amazonbot
User-agent: amazon-kendra-
User-agent: anthropic-ai
User-agent: Applebot-Extended
User-agent: bedrockbot
User-agent: Bytespider
User-agent: CloudflareBrowserRenderingCrawler
User-agent: CCBot
User-agent: ChatGLM-Spider
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: Cotoyogi
User-agent: DeepSeekBot
User-agent: Diffbot
User-agent: DuckAssistBot
User-agent: EchoboxBot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Gemini-Deep-Research
User-agent: Google-CloudVertexBot
User-agent: Google-Extended
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: Grok
User-agent: iaskspider/2.0
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: ISSCyberRiskCrawler
User-agent: Kangaroo Bot
User-agent: KunatoCrawler
User-agent: LinerBot
User-agent: Meltwater
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: MistralAI-User
User-agent: OAI-Operator
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: PanguBot
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: PetalBot
User-agent: QualifiedBot
User-agent: Sc
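Several records in the capture above list many User-agent lines before a single Disallow rule, so that one rule applies to every agent in the record. The following is a minimal sketch of checking this with Python's standard urllib.robotparser, using a short excerpt of the captured rules. The probe paths, the `ExampleBrowser` agent name, and the helper names `build_parser` and `allowed` are assumptions for illustration. The standard-library parser matches path prefixes only and does not interpret `*` wildcards inside paths, so the excerpt is limited to wildcard-free rules.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the captured robots.txt (wildcard-free rules only).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /api/infinite-list.html
#
User-agent: Mediapartners-Google
Disallow:
#
User-agent: archive.org_bot
User-agent: TurnitinBot
Disallow: /
"""

BASE = "https://www.dailymail.com"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(agent: str, path: str, text: str = ROBOTS_EXCERPT) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    return build_parser(text).can_fetch(agent, BASE + path)


if __name__ == "__main__":
    # Hypothetical probes: one generic agent, one AdSense agent, two archivers.
    for agent, path in [
        ("ExampleBrowser", "/news/index.html"),
        ("ExampleBrowser", "/api/infinite-list.html"),
        ("Mediapartners-Google", "/api/infinite-list.html"),
        ("archive.org_bot", "/news/index.html"),
        ("TurnitinBot", "/news/index.html"),
    ]:
        print(agent, path, allowed(agent, path))
```

Both archiver agents share the single `Disallow: /` rule, while the empty `Disallow:` for Mediapartners-Google permits everything for that agent. Rules containing `*`, such as `/guide/*`, would need a parser that implements wildcard matching.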
head
- title: US Home | Daily Mail Online
- description: not found
social
no OpenGraph or Twitter meta tags found
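
The head and social findings above could be reproduced offline along these lines. This is a sketch using Python's standard html.parser, not the scanner's actual implementation. The names `HeadMetaParser` and `inspect_head` are hypothetical, and the sample HTML is constructed for illustration rather than taken from the real page.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect the first <title>, the meta description, and og:/twitter: tags."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self.social = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title" and self.title is None:
            # Only the first <title> is recorded.
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            # OpenGraph uses property=, Twitter cards usually use name=.
            key = (attr.get("property") or attr.get("name") or "").lower()
            content = attr.get("content")
            if key == "description":
                self.description = content
            elif key.startswith(("og:", "twitter:")):
                self.social[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_head(html: str) -> dict:
    """Return the title, description, and social tags found in `html`."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title else None
    return {
        "title": title,
        "description": parser.description,
        "social": parser.social,
    }


if __name__ == "__main__":
    # Illustrative markup only; it mirrors the reported findings.
    sample = (
        "<html><head><title>US Home | Daily Mail Online</title></head>"
        "<body></body></html>"
    )
    print(inspect_head(sample))
```

An empty `social` dictionary corresponds to the "no OpenGraph or Twitter meta tags found" result, and a `description` of `None` corresponds to the missing description.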