June 27, 2025

How I Use GPT to Automate Contextual Internal Linking (Without Making It Spammy)

Internal linking—it sounds easy, right? Just drop a link here, add an anchor there, and move on to the next article. Simple enough for a handful of posts. But when you’re managing hundreds or even thousands of articles, that “simple” task quickly turns into an SEO nightmare.

Like most digital marketers, I’ve wrestled with manually inserting internal links more times than I care to admit. You know the drill: find the keywords, find suitable content, add the link. It’s essential for SEO, crucial for user experience, and yet completely unscalable.

But automation isn’t easy either. Most available tools simply shove links into keywords wherever they find them, cluttering the page and harming readability. I needed something smarter—an approach that understands context, respects readability, and scales effortlessly. That’s exactly why I turned to GPT.

In this guide, I’ll share exactly how I built a GPT-powered internal linking pipeline that scales beautifully, maintains editorial quality, and avoids spammy link insertion.

Why GPT Is a Game-Changer for Contextual Internal Linking

You’ve probably experimented with automation scripts or plugins that “auto-insert” internal links. And you probably weren’t thrilled with the results. These solutions tend to blindly match keywords without context, creating a mess of unnatural links.

What if, instead of keyword-stuffing links, you had an AI-powered editor that could naturally and contextually insert links just like a skilled human editor? That’s what GPT offers. It can:

  • Understand semantic context and meaning.
  • Naturally integrate keywords as anchor text.
  • Subtly rewrite sentences to ensure readability.
  • Help prevent over-optimization and maintain genuine user experience.

This isn’t AI hype—it’s a very practical, effective application of generative AI.

My Two-Step GPT Workflow for Automated Internal Linking

Here’s exactly how I built this system, step by step, so you can replicate it for your own blog or clients.

Step 1: Contextual Link Opportunity Mining via WordPress API

The first script crawls existing blog posts via the WordPress REST API, systematically breaking each post into paragraphs and table blocks. It then scans this content against a keyword list to find perfect contextual matches. The key here? Context, not just keywords.
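
For reference, here’s roughly what the two input CSVs need to contain. The column names (donor_url, Keywords, URL) come straight from the code below; the values are just placeholders:

second-iteration.csv (donor posts):

donor_url
https://website.com/blog/sample-donor-post/

pages-to-link.csv (keywords and target pages):

Keywords,URL
keyword research,https://website.com/blog/keyword-research-guide/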

Here’s the fully explained, production-ready code:

"""
================================================================================
🧩 Script: Contextual Link Opportunity Finder for WordPress Blog Content
--------------------------------------------------------------------------------

🎯 PURPOSE:
This Python script is designed to automate the identification of *contextually relevant*
internal linking opportunities within blog content hosted on a WordPress website (via REST API).
It assists in large-scale internal linking by scanning blog posts for paragraphs that mention
specific keywords, and then pairing those paragraphs with relevant target URLs.

🔍 HOW IT WORKS:
1. Loads a list of blog post URLs (`DONOR_FILE`) and a list of keywords with their respective
   target URLs (`KEYWORD_FILE`).
2. Uses the WordPress REST API to fetch each post's HTML content based on its slug.
3. Parses out <p> (paragraph) and <table> blocks using BeautifulSoup.
4. For each keyword, checks whether it's present in any paragraph or in/around a table block.
5. If found, extracts the paragraph as a suitable anchor context for linking.
6. Enforces a max limit of contextual links per post (`MAX_LINKS_PER_POST`) to avoid spammy linking.
7. Results (blog URL, keyword, matched paragraph, target URL) are written to a CSV (`OUTPUT_FILE`).

⚙️ USE CASES:
• SEO teams running interlinking sprints at scale
• Automating content audits for contextual linking gaps
• Feeding this CSV into a generative link rewriter or publishing system

📁 INPUT FILES:
- `second-iteration.csv`: Contains blog post URLs (donors).
- `pages-to-link.csv`: Contains keywords and the target URLs to link to.

📁 OUTPUT FILE:
- `contextual_link_candidates.csv`: Each row suggests a keyword → paragraph match where
   the keyword naturally appears in the post content.

🛡️ SAFEGUARDS:
- Skips paragraphs with fewer than 4 words to reduce noise.
- Avoids creating more than 3 links per post to prevent over-optimization.
- Uses regex word-boundary matching to avoid partial word collisions.
- Includes logic to fallback to nearby paragraphs if keyword is found inside a <table>.

🛠️ TO EXTEND:
- Add GPT scoring for confidence-based ranking.
- Group keyword variants using fuzzy or semantic matching.
- Use append mode and logging for large-scale batch runs.

Author: Daniel Sylvester Antony
Date: 2025
================================================================================
"""

import pandas as pd
import requests
import time
import re
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from base64 import b64encode

# === CONFIG ===
USERNAME = ''
APP_PASSWORD = ''
WP_API_BASE = "https://website.com/blog/wp-json/wp/v2"
auth_str = f"{USERNAME}:{APP_PASSWORD}"
auth_header = {'Authorization': 'Basic ' + b64encode(auth_str.encode()).decode('utf-8')}

DONOR_FILE = 'second-iteration.csv'
KEYWORD_FILE = 'pages-to-link.csv'
OUTPUT_FILE = 'contextual_link_candidates.csv'
MAX_POSTS = 100
MAX_LINKS_PER_POST = 3
DELAY_SECONDS = 1.5

# === HELPERS ===

# Extracts the slug from a blog post URL
def extract_slug(url):
    parsed = urlparse(url)
    return parsed.path.strip('/').split('/')[-1]

# Pulls all paragraph and table blocks from HTML content
def extract_blocks(html):
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for el in soup.find_all(['p', 'table']):
        tag_type = el.name
        content = el.get_text().strip().replace('\n', ' ')
        if content:
            blocks.append((tag_type, content))
    return blocks

# Checks if the exact keyword (word-boundary match) exists in a given text
def keyword_in_text(keyword, text):
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE)

# === MAIN LOGIC ===
def main():
    interlink_df = pd.read_csv(DONOR_FILE)
    pages_df = pd.read_csv(KEYWORD_FILE)

    donor_urls = interlink_df['donor_url'].dropna().unique().tolist()[:MAX_POSTS]
    results = []

    for url in donor_urls:
        slug = extract_slug(url)
        print(f"🔍 Fetching: {slug}")
        try:
            response = requests.get(f"{WP_API_BASE}/posts?slug={slug}", headers=auth_header, timeout=30)
            time.sleep(DELAY_SECONDS)

            if response.status_code == 200 and response.json():
                post = response.json()[0]
                content_html = post['content']['rendered']
                blocks = extract_blocks(content_html)

                seen_pairs = set()
                post_link_count = 0

                for keyword_row in pages_df.itertuples():
                    if post_link_count >= MAX_LINKS_PER_POST:
                        break  # Enforce the per-post cap across all keywords, not just within one
                    keyword = str(keyword_row.Keywords).strip().lower()
                    target_url = keyword_row.URL

                    for idx, (block_type, text) in enumerate(blocks):
                        if len(text.split()) < 4:
                            continue  # Ignore short or empty paragraphs

                        pair_key = (url, keyword)
                        if pair_key in seen_pairs:
                            break  # Avoid repeating the same link suggestion for the same keyword

                        if keyword_in_text(keyword, text):
                            if block_type == 'table':
                                # Check surrounding paragraphs if the keyword appears inside a table
                                nearby_blocks = []
                                if idx > 0 and blocks[idx-1][0] == 'p':
                                    nearby_blocks.append(('p_near_table', blocks[idx-1][1]))
                                if idx < len(blocks)-1 and blocks[idx+1][0] == 'p':
                                    nearby_blocks.append(('p_near_table', blocks[idx+1][1]))

                                for para_type, para_text in nearby_blocks:
                                    results.append({
                                        'blog_url': url,
                                        'blog_title': post['title']['rendered'],
                                        'keyword': keyword,
                                        'link_to': target_url,
                                        'paragraph_type': para_type,
                                        'original_paragraph': para_text
                                    })
                                    seen_pairs.add(pair_key)
                                    post_link_count += 1
                                    break

                            elif block_type == 'p':
                                results.append({
                                    'blog_url': url,
                                    'blog_title': post['title']['rendered'],
                                    'keyword': keyword,
                                    'link_to': target_url,
                                    'paragraph_type': 'body_paragraph',
                                    'original_paragraph': text
                                })
                                seen_pairs.add(pair_key)
                                post_link_count += 1
                                break

                        if post_link_count >= MAX_LINKS_PER_POST:
                            break  # Enforce max links per post
            else:
                print(f"⚠️ Failed to fetch: {slug}")
        except Exception as e:
            print(f"❌ Error fetching {slug}: {e}")

    # Output the results
    df_out = pd.DataFrame(results)
    df_out.to_csv(OUTPUT_FILE, index=False)
    print(f"\n✅ Done. Output saved to {OUTPUT_FILE} with {len(results)} rows.")

if __name__ == "__main__":
    main()

This script gives you a CSV containing ready-to-link contexts that are genuinely relevant.
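
A hypothetical row, just to show the shape of the output (every value here is a placeholder):

blog_url,blog_title,keyword,link_to,paragraph_type,original_paragraph
https://website.com/blog/sample-donor-post/,Sample Donor Post,keyword research,https://website.com/blog/keyword-research-guide/,body_paragraph,"Good keyword research is the foundation of any content plan..."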

Step 2: Natural Link Insertion Using GPT

This is where GPT comes into play. Using the CSV from Step 1, each identified context and keyword is passed to GPT via OpenAI’s API. GPT acts as an editor, rewriting paragraphs slightly to insert links naturally and contextually.

Here’s the full implementation:

import pandas as pd
import time
import logging
import csv
from openai import OpenAI

client = OpenAI(api_key='your-openai-api-key')

INPUT_FILE = 'contextual_link_candidates.csv'
OUTPUT_FILE = 'link_inserted_output.csv'
MODEL = 'gpt-4o-mini'
DELAY = 1.2
RETRIES = 3

logging.basicConfig(filename='link_insertion.log', level=logging.INFO)

def build_prompt(keyword, link_to, paragraph, blog_title):
    return (f"You're an SEO editor. Naturally insert the link [{keyword}]({link_to}) "
            f"into the following paragraph from '{blog_title}' without adding new sentences:\n\n{paragraph}")

df = pd.read_csv(INPUT_FILE)

with open(OUTPUT_FILE, 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=[
        "blog_url", "blog_title", "keyword", "link_to", "original_paragraph", "revised_paragraph"
    ])
    if csvfile.tell() == 0:
        writer.writeheader()

    for row in df.itertuples():
        prompt = build_prompt(row.keyword, row.link_to, row.original_paragraph, row.blog_title)
        success, retries = False, 0
        while not success and retries < RETRIES:
            try:
                response = client.chat.completions.create(
                    model=MODEL,
                    messages=[{"role": "user", "content": prompt}],
                    temperature=0.7, max_tokens=400
                )
                revised_paragraph = response.choices[0].message.content.strip()
                success = True
                writer.writerow({
                    "blog_url": row.blog_url,
                    "blog_title": row.blog_title,
                    "keyword": row.keyword,
                    "link_to": row.link_to,
                    "original_paragraph": row.original_paragraph,
                    "revised_paragraph": revised_paragraph
                })
                logging.info(f"Inserted link for {row.blog_url} using keyword '{row.keyword}'")
            except Exception as e:
                retries += 1
                logging.warning(f"Retry {retries} failed: {e}")
                time.sleep(DELAY * retries)
        if not success:
            logging.error(f"Giving up on {row.blog_url} ('{row.keyword}') after {RETRIES} attempts")
        time.sleep(DELAY)

print("GPT-powered link insertion complete.")

Results and Real-World Impact

This setup completely transformed our internal linking strategy. We can now scale internal linking effortlessly across hundreds of articles, dramatically boosting our SEO performance while maintaining great UX.

Future Enhancements

  • Confidence scoring for GPT insertions
  • Full CLI interface
  • Integrate with CMS workflows for seamless publishing (rough sketch below)
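
On the CMS point, here’s a rough sketch of pushing a revised paragraph back into WordPress, reusing WP_API_BASE and auth_header from Step 1. It assumes you’ve already converted GPT’s markdown link into an <a> tag, and that the original paragraph appears verbatim in the stored content (in practice you’d match on the <p> block’s inner HTML, since Step 1 extracts plain text):

import requests

# Reuses WP_API_BASE and auth_header from the Step 1 script.
def push_revised_paragraph(post_id, original_html, revised_html):
    """Swap one paragraph of a post for its link-inserted version (sketch, not battle-tested)."""
    # context=edit returns the raw, unrendered content for authenticated users
    post = requests.get(f"{WP_API_BASE}/posts/{post_id}",
                        params={'context': 'edit'}, headers=auth_header, timeout=30).json()
    content = post['content']['raw']
    if original_html not in content:
        print(f"⚠️ Paragraph not found verbatim in post {post_id}; skipping.")
        return False
    updated = content.replace(original_html, revised_html, 1)
    resp = requests.post(f"{WP_API_BASE}/posts/{post_id}",
                         headers=auth_header, json={'content': updated}, timeout=30)
    return resp.ok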

Give this a try on your blog or reach out if you need help adapting it. This isn’t just automation—it’s smart SEO scaling, done right.
