AgenticData
Web Search & Scraper MCP Tool
CLI Tool Name: agentic_web_search
Fetches any publicly accessible URL and returns a complete SEO signal report. Designed for AI agents that need to analyze web pages, audit competitor content, or extract structured metadata — without relying on search APIs or browser automation.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | The full URL to fetch and analyze, including the protocol (https://). |
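As a rough sketch of how an agent might invoke the tool, the snippet below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP clients send. The helper name `build_tool_call` is hypothetical, and accepting `http://` alongside `https://` is an assumption; the tool's actual validation rules are not documented here.

```python
import json
from urllib.parse import urlparse


def build_tool_call(url: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for agentic_web_search.

    Hypothetical helper: only the tool name and the `url` parameter
    come from the documentation above.
    """
    parsed = urlparse(url)
    # The parameter description asks for a full URL including the protocol.
    # Allowing plain http as well as https is an assumption.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a full http(s) URL, got: {url!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "agentic_web_search",
            "arguments": {"url": url},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call("https://example.com/article"), indent=2))
```

Validating the URL on the client side means a missing protocol is caught before any request is sent.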
What it extracts
SEO metadata
Title, meta description, canonical URL, robots directives, word count.
Open Graph tags
og:title, og:description, og:image, og:type — for social sharing previews.
Twitter Card
twitter:card, twitter:site, twitter:creator.
Heading structure
All H1–H6 headings extracted in order — reveals content architecture.
Link breakdown
Internal vs. external link counts, nofollow links.
HTTP status
Response code — detect redirects, 404s, and server errors.
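To make the signal categories above concrete, here is a minimal sketch of how a few of them could be pulled from raw HTML with Python's standard-library `html.parser`. It is not the tool's implementation: the class name `SignalParser` and its attributes are hypothetical, and the internal/external split simply compares hostnames, which is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class SignalParser(HTMLParser):
    """Collects a few SEO signals from an HTML string (illustrative only)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.title = ""
        self.meta = {}        # meta name/property -> content (covers og:* and twitter:*)
        self.canonical = None
        self.headings = []    # (tag, text) pairs in document order
        self.links = {"internal": 0, "external": 0, "nofollow": 0}
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta":
            key = a.get("name") or a.get("property")
            if key and a.get("content") is not None:
                self.meta[key.lower()] = a["content"]
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")
        elif tag == "a" and a.get("href"):
            # Assumption: a link is internal when its host matches the page host.
            host = urlparse(urljoin(self.base_url, a["href"])).netloc
            self.links["internal" if host == self.base_host else "external"] += 1
            if "nofollow" in (a.get("rel") or "").lower().split():
                self.links["nofollow"] += 1

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._capture = None


SAMPLE = """<html><head><title>Example Article Title</title>
<meta name="description" content="A concise description.">
<meta property="og:type" content="article">
<link rel="canonical" href="https://example.com/article">
</head><body><h1>Example Article Title</h1><h2>Introduction</h2>
<a href="/about">About</a>
<a href="https://other.example.org/" rel="nofollow">Other</a>
</body></html>"""

parser = SignalParser("https://example.com/article")
parser.feed(SAMPLE)
print(parser.title, parser.headings, parser.links)
```

Storing Open Graph and Twitter Card values in the same `meta` dictionary keeps the sketch short; a fuller version would likely group them the way the report does.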
Common use cases
→ Audit competitor landing pages for SEO gaps
→ Verify that your own pages have correct meta tags and canonical URLs
→ Extract structured content from web pages for research
→ Monitor pages for changes in title, description, or heading structure
→ Check that Open Graph tags are set correctly before publishing
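Two of these use cases, auditing for gaps and monitoring for changes, can be sketched as small functions over a report dictionary. The field names follow the example output in this document; the function names `find_seo_gaps` and `changed_fields` are hypothetical, and the "exactly one H1" rule is a common convention assumed here rather than something the tool enforces.

```python
def find_seo_gaps(report: dict) -> list[str]:
    """Return human-readable findings for a single report (heuristics are assumptions)."""
    gaps = []
    seo = report.get("seo", {})
    if report.get("status_code") != 200:
        gaps.append(f"unexpected status code: {report.get('status_code')}")
    if not seo.get("title"):
        gaps.append("missing title")
    if not seo.get("meta_description"):
        gaps.append("missing meta description")
    if not seo.get("canonical_url"):
        gaps.append("missing canonical URL")
    for tag in ("og:title", "og:description", "og:image", "og:type"):
        if not report.get("open_graph", {}).get(tag):
            gaps.append(f"missing {tag}")
    h1 = report.get("headings", {}).get("h1", [])
    if len(h1) != 1:
        gaps.append(f"expected exactly one h1, found {len(h1)}")
    return gaps


def changed_fields(old: dict, new: dict) -> dict:
    """Compare two reports for the same URL and return (before, after) pairs."""
    changes = {}
    for key in ("title", "meta_description"):
        before = old.get("seo", {}).get(key)
        after = new.get("seo", {}).get(key)
        if before != after:
            changes[key] = (before, after)
    if old.get("headings") != new.get("headings"):
        changes["headings"] = (old.get("headings"), new.get("headings"))
    return changes


# Hand-written sample reports, not real tool output.
baseline = {
    "url": "https://example.com/article",
    "status_code": 200,
    "seo": {
        "title": "Example Article Title",
        "meta_description": "A concise description of the page content.",
        "canonical_url": "https://example.com/article",
    },
    "open_graph": {"og:title": "Example Article Title", "og:type": "article"},
    "headings": {"h1": ["Example Article Title"]},
}
latest = {**baseline, "seo": {**baseline["seo"], "title": "Renamed Article"}}

print(find_seo_gaps(baseline))
print(changed_fields(baseline, latest))
```

Running the same checks on a schedule and storing the previous report would be one way to turn this into a change monitor.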
Example output
json
{
  "url": "https://example.com/article",
  "status_code": 200,
  "seo": {
    "title": "Example Article Title",
    "meta_description": "A concise description of the page content.",
    "canonical_url": "https://example.com/article",
    "robots": "index, follow",
    "word_count": 1842
  },
  "open_graph": {
    "og:title": "Example Article Title",
    "og:description": "A concise description of the page content.",
    "og:image": "https://example.com/og-image.jpg",
    "og:type": "article"
  },
  "twitter_card": {
    "twitter:card": "summary_large_image",
    "twitter:site": "@example"
  },
  "headings": {
    "h1": ["Example Article Title"],
    "h2": ["Introduction", "Key Concepts", "Conclusion"],
    "h3": ["Subpoint A", "Subpoint B"]
  },
  "links": {
    "internal": 12,
    "external": 5,
    "nofollow": 2
  }
}
This tool makes an outbound HTTP request to the URL you provide. It respects robots.txt directives and uses a neutral User-Agent string.
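For readers who want to predict whether a given URL is likely to be fetchable under robots.txt rules, the following sketch uses Python's standard `urllib.robotparser`. It illustrates the general mechanism only: the user agent string `example-agent` and the robots.txt content are made up, and the tool's actual User-Agent and matching logic are not specified in this document.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt content that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Made-up robots.txt content for illustration.
ROBOTS = """User-agent: *
Disallow: /private/
"""

print(is_allowed(ROBOTS, "example-agent", "https://example.com/article"))
print(is_allowed(ROBOTS, "example-agent", "https://example.com/private/report"))
```

Parsing robots.txt from a string rather than fetching it keeps the example offline; in practice the file would be retrieved from the site's `/robots.txt` path first.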