Improve your internal links using Python string-matching

Andreas Voniatis

To round out our internal linking odyssey, Andreas Voniatis from Artios explains how Python can do much more for your internal links than the humble spreadsheet you might be used to.

@andreasvoniatis  

Andreas says: “Use Python’s string-matching functions to increase the relevance of your internal links on your website.”

Can you give a brief explanation of the value of using Python for SEO?

“What I love about Python is that it can scale SEO really well. A lot of SEOs will be working in spreadsheets and there are obviously restrictions or limitations in terms of what a spreadsheet can do. They are limited in the scale of the data they can handle, like the number of rows, but also in the complexity of the functions and calculations that they can perform with that data.

For example, if you're optimizing a high-traffic website with tons of pages, like Amazon, then you're going to find scalable SEO analysis in Excel or Google Sheets pretty limiting.

Instead, you can use a Jupyter notebook (the successor to the IPython notebook), which lets you run Python code. If you import string-matching functions, you can take a target keyword and compare it to the title tags of your site pages to try and find the best page to send internal links to.”

Are you using this to determine whether a page or a piece of content is sufficiently optimized or just to find the most appropriate internal page to link to?

“You could also use it for measuring how optimized your content is, which is a different use case for Python. Python has many use cases for scalable and data-driven SEO. In this case, though, we're trying to find content like blog posts where you can place internal links that will help reshape the importance of your target content for Google and other search engines.”

What content elements are you looking for?

“The great thing about doing this is that there are so many different ways to approach it. On a basic level, you could take your target keyword and the title tags of all of your content, and then simply use a string-matching function to calculate the similarity between them. Based on that similarity metric, you could use a quick rule of thumb to say that anything that's 60% or above would be considered suitable pages to place internal links on, for example.
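
As an illustration of the basic approach described above (not code from the interview), here is a minimal sketch that scores title tags against a target keyword and keeps anything at or above the 60% rule of thumb. It assumes the standard library's `difflib.SequenceMatcher` as the string-matching function; the `pages` data and the function names `similarity` and `link_candidates` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_candidates(target_keyword, pages, threshold=0.6):
    """Return (url, title, score) rows whose title scores at or above the threshold."""
    scored = [
        (url, title, similarity(target_keyword, title))
        for url, title in pages.items()
    ]
    matches = [row for row in scored if row[2] >= threshold]
    # Most similar titles first
    return sorted(matches, key=lambda row: row[2], reverse=True)


# Hypothetical crawl export: URL -> title tag
pages = {
    "/blog/running-shoes-guide": "Running Shoes Guide",
    "/blog/trail-running-shoes": "Best Trail Running Shoes",
    "/blog/office-chairs": "How to Choose an Office Chair",
}

results = link_candidates("running shoes", pages)
for url, title, score in results:
    print(f"{score:.2f}  {url}  {title}")
```

Other similarity measures (token-based or edit-distance-based) would rank pages differently, so the 0.6 cut-off is only meaningful relative to the function you choose.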

You could do it at the body content level, but that's a bit more complex because you need to ingest that content into a table (what we call a DataFrame in Python, the equivalent of a spreadsheet) to do that kind of calculation. That’s possible thanks to Python.
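
For body content, character-level matching against a short keyword tends to be less useful, so a word-level measure is one possible swap-in. The sketch below is an assumption rather than the method used in the interview: it computes cosine similarity over word counts, and a plain dictionary (`bodies`, hypothetical) stands in for the DataFrame column that would hold the ingested page text.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical body text per URL
bodies = {
    "/blog/marathon-training": (
        "Choosing running shoes for marathon training: "
        "how running shoes affect pace."
    ),
    "/blog/office-chairs": (
        "An office chair should support your back during long working days."
    ),
}

scores = {url: cosine_similarity("running shoes", body) for url, body in bodies.items()}
for url, score in scores.items():
    print(f"{score:.2f}  {url}")
```

Long pages dilute this score, so in practice it is probably more useful for ranking pages against each other than for applying a fixed cut-off.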

If you don’t know what a good rule of thumb is, you can go even deeper. You can say, ‘I want to model the median’ or ‘I want to model the 95th percentile of what's considered relevant.’ You can determine your rule of thumb on a statistical basis rather than on something that you pulled out of thin air.”
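
One way to read this is to let the distribution of similarity scores set the cut-off. The sketch below assumes you already have a list of scores (the `scores` values are made up) and uses the standard library's `statistics` module; `percentile_threshold` is a hypothetical helper name.

```python
from statistics import median, quantiles


def percentile_threshold(scores, pct=95):
    """Return the pct-th percentile of the scores (needs at least two values)."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile
    cuts = quantiles(scores, n=100, method="inclusive")
    return cuts[pct - 1]


# Hypothetical similarity scores between a target keyword and ten page titles
scores = [0.12, 0.18, 0.22, 0.25, 0.31, 0.35, 0.41, 0.48, 0.66, 0.81]

median_cutoff = median(scores)
p95_cutoff = percentile_threshold(scores, 95)

# Pages at or above the chosen cut-off become internal link candidates
selected = [s for s in scores if s >= p95_cutoff]
print(f"median={median_cutoff:.3f}  p95={p95_cutoff:.4f}  selected={len(selected)}")
```

The median keeps roughly half of the pages, while the 95th percentile keeps only the closest matches, so the choice depends on how many link placements you can realistically make.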

Would you be able to incorporate intent into what you're looking for?

“You absolutely can. If you had the target keyword for each piece of your site content, then you could create a separate column in which you've predetermined whether those two keywords share the same search intent or not.”

What data sources are required for this?

“If you wanted to do this at a basic level, you could rely on crawl data alone. If you want to get search intent involved, then you'll need SERP data so that you can determine the similarity between your target keyword and the focus keyword of the content page whose search intent you're comparing. If you wanted to check whether Google was actually crawling that page, you would use server logs.”
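
The interview does not say how the intent comparison is calculated. One common proxy, assumed here, is to treat two keywords as sharing intent when their top-ranking URLs overlap. The sketch below uses Jaccard overlap on hypothetical SERP data; the `min_overlap` value of 0.3 is an arbitrary illustration, not a recommended setting.

```python
def serp_overlap(urls_a, urls_b) -> float:
    """Jaccard overlap between two lists of ranking URLs (0 = disjoint, 1 = identical)."""
    a, b = set(urls_a), set(urls_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_intent(urls_a, urls_b, min_overlap=0.3) -> bool:
    """Flag two keywords as sharing intent when their SERPs overlap enough."""
    return serp_overlap(urls_a, urls_b) >= min_overlap


# Hypothetical top results per keyword, e.g. from a rank-tracking export
serps = {
    "running shoes": ["a.com/run", "b.com/shoes", "c.com/guide", "d.com/shop"],
    "best running shoes": ["a.com/run", "b.com/shoes", "c.com/guide", "e.com/top10"],
    "running shoe repair": ["f.com/fix", "g.com/glue", "h.com/cobbler", "a.com/run"],
}

target = "running shoes"
# This dictionary plays the role of the separate "same intent" column
intent_column = {
    kw: same_intent(serps[target], urls)
    for kw, urls in serps.items()
    if kw != target
}
print(intent_column)
```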

How do you clean URLs that you wouldn't want to link to?

“That’s a slightly separate issue, but let's get into it. One of the things that I do is model the PageRank, or link equity, of a website using crawl data and external backlink data, so that I get both the internal and the external PageRank. Then, I amalgamate those two data sources to get what I would call the ‘effective PageRank’, which combines both the internal and the external.
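
The interview does not spell out how the internal and external figures are modelled or combined, so the following is only a sketch under stated assumptions: a simplified power-iteration PageRank over the internal link graph, each page's share of referring domains as the external signal, and a weighted average (`external_weight`, hypothetical) to blend the two. All data and function names are made up for illustration.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over an internal link graph given as {page: [linked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outgoing links spread their rank across all pages
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def external_share(referring_domains, pages):
    """Each page's share of all referring domains (0 for pages with none)."""
    total = sum(referring_domains.get(p, 0) for p in pages)
    return {p: (referring_domains.get(p, 0) / total if total else 0.0) for p in pages}


def effective_rank(links, referring_domains, external_weight=0.5):
    """Blend internal and external scores with a weighted average (an assumption)."""
    internal = internal_pagerank(links)
    external = external_share(referring_domains, internal.keys())
    return {
        p: (1 - external_weight) * internal[p] + external_weight * external[p]
        for p in internal
    }


# Hypothetical crawl data (internal links) and backlink data (referring domains)
links = {
    "/": ["/category", "/blog"],
    "/category": ["/product-a", "/product-b", "/"],
    "/blog": ["/product-a", "/"],
    "/product-a": ["/"],
    "/product-b": ["/"],
}
referring_domains = {"/": 40, "/blog": 10, "/product-b": 50}

internal = internal_pagerank(links)
effective = effective_rank(links, referring_domains)
for page, score in sorted(effective.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {page}")
```

In this toy data, `/product-b` gets little internal link equity but attracts half of the referring domains, which is the kind of mismatch a combined score is meant to surface.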

Using that, you can transform or pivot your existing site structure away from the typical catalogue/product group structure (which might make sense from a librarian’s perspective) and move it more towards the type of content structure that the internet is more interested in.”

Should all SEOs be doing this or is it primarily for technical SEOs?

“To me, any SEO should have a holistic view, and all SEOs should understand it. If you call yourself an SEO generalist or an SEO consultant, then you should have a level of competency, if not experience or understanding, in the holistic elements of SEO.

You should be competent in your technical, your content, and your backlinks/off-page SEO. Technical SEOs should know how to do this themselves, but SEO content strategists might not need to.”

How can you use statistical distributions to model relevance and highlight under-served target content?

“If you look at the median number of internal links to a product category on an e-commerce site, for example, it will be very different from the median number of internal links to a product item. I don’t want to create a hard-and-fast rule. I don’t want to say that any pages that have fewer than 10 internal links need more links, or that you should add a certain number of links to those pages. If you use statistical distributions, you're taking a smarter, more tailored approach. You're taking a segmented approach, and you're accounting for the fact that not all content is equal.

You would expect your product categories to have more internal links, so the threshold will be high. Your product items may have fewer internal links, or it might be the other way around. The point is to take a segmented approach. By using distributions, you're moving away from hard-and-fast rules.”
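
A minimal sketch of the segmented approach, assuming each crawled page has already been labelled with a content segment and an internal inlink count. It uses each segment's median as the threshold; the data, field names and the choice of the median (rather than another quantile) are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def underlinked_pages(pages):
    """Return URLs whose inlink count is below their own segment's median, plus the medians."""
    by_segment = defaultdict(list)
    for page in pages:
        by_segment[page["segment"]].append(page["inlinks"])
    # One threshold per segment instead of a single site-wide rule
    thresholds = {seg: median(counts) for seg, counts in by_segment.items()}
    flagged = [p["url"] for p in pages if p["inlinks"] < thresholds[p["segment"]]]
    return flagged, thresholds


# Hypothetical crawl export
pages = [
    {"url": "/c/shoes", "segment": "category", "inlinks": 120},
    {"url": "/c/bags", "segment": "category", "inlinks": 95},
    {"url": "/c/sale", "segment": "category", "inlinks": 30},
    {"url": "/p/trail-shoe", "segment": "product", "inlinks": 12},
    {"url": "/p/road-shoe", "segment": "product", "inlinks": 9},
    {"url": "/p/old-shoe", "segment": "product", "inlinks": 2},
    {"url": "/p/new-shoe", "segment": "product", "inlinks": 15},
]

flagged, thresholds = underlinked_pages(pages)
print(thresholds)
print(flagged)
```

Here a category page with 30 inlinks is flagged while a product page with 12 is not, which a single site-wide rule such as "fewer than 10 links" would miss.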

Is this just for internal links or can this approach be used to determine the optimum landing page for external links as well?

“You can apply it to absolutely everything. That's the whole premise of being data-driven.”

How do you measure the ROI of improved internal linking?

“You would benchmark the ROI beforehand and then it's almost like a split test. You would benchmark what it was before, then you could make the change following the model’s recommendations and see what the ROI is afterwards. However, if you're going to make this change site-wide, then you would want to do a split A/A test because you're comparing the result of the internal linking on the same URL against itself, before and after.

If you wanted to make it truly scientific, then you would conduct a split A/B test. In that case, you would only make that change on a collection of unlinked URLs, measure the revenue before and after, then compare it to the control group.”
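
The A/B comparison described above can be sketched as a simple difference between the relative change in the test group and the relative change in the control group. The revenue figures and function names below are hypothetical, and the sketch deliberately leaves out any statistical significance testing.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change between two periods, e.g. 0.15 for +15%."""
    return (after - before) / before


def ab_uplift(test_before, test_after, control_before, control_after) -> float:
    """Change in the test group minus the change in the control group."""
    return relative_change(test_before, test_after) - relative_change(
        control_before, control_after
    )


# Hypothetical revenue totals for each group of URLs
test_before, test_after = 10_000.0, 11_500.0       # URLs that received new internal links
control_before, control_after = 8_000.0, 8_400.0   # comparable URLs left unchanged

uplift = ab_uplift(test_before, test_after, control_before, control_after)
print(f"Uplift attributable to the change: {uplift:.1%}")
```

Subtracting the control group's change helps separate the effect of the new links from seasonality or other site-wide shifts over the same period.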

Does providing better and more relevant internal links also enhance usability?

“In theory (and, in many cases, in practice), SEO and user experience are aligned. By optimizing your content for the search engines, you should also be optimizing it for the user. If the user knows what they're getting before they click on the link, and the link is more relevant for their needs, then that should improve their experience.”

If an SEO is struggling for time, what should they stop doing right now so they can spend more time doing what you suggest in 2024?

“Stop getting better at Excel and retrain in Python.

Personally, I rarely use Excel. I use Google Sheets but only for putting together nice graphs because the ones produced by Python are a bit too sciencey for a business audience.

A more diplomatic and practical approach would be to say, ‘Limit your use of Excel and retrain in Python’. You’ll start noticing that you can invest ten minutes or one hour working out how to solve a dilemma in Python rather than Excel and, eventually, it will get to the point where you can do so much more in Python that you will drop Excel like a hot potato.

Python is also well future-proofed. That’s not to say there won't be a language in 10, 15, or 20 years that will supersede Python. However, the great thing is that, once you learn a computing language, those skills are transferable to almost any other computing language. I started out using R, which is a statistical computing language. Once I saw that more of the SEO industry was favouring Python, it was really easy for me to switch. A lot of the function names are identical.”

Andreas Voniatis is Founder at Artios, and you can find him over at Artios.io.
