Website link analysis script

[Scripts]
Date of publication: 20.10.2023

1. Internal links

The website link analysis script produces a report of server response codes and identifies broken links that return a 404 error (page not found).

Over time, any site changes: new pages and links are added, while older ones become outdated and lead visitors to non-existent pages. This matters especially when you run not a small business-card site but a site with a large number of pages.

Once all the links have been analyzed, the report shows which pages contain broken links so that you can fix them.

2. External links

All sites are connected to each other by links in one way or another, and links to other sites become outdated over time. The reasons vary: a changed structure, a new site, or the site being shut down altogether. Because the linked material, script, or service lives on a third-party site, it is outside your control and is managed by another owner.

The script also checks the availability of external links leading from your site. The report indicates pages that contain outdated links to resources that no longer exist, so you can find and correct them.

3. Installing the script

In our example, we install the necessary packages on Debian 12: PHP 8.2 and the checker script itself in phar format, which we will run from the Linux console.

apt install php php-xml
apt install wget
wget https://github.com/dantleech/fink/releases/download/0.10.3/fink.phar
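Before running a full crawl, it is worth checking that PHP is installed and that the downloaded phar starts. This is a small sketch; it assumes the phar was downloaded to /root, as in the usage example below, and that it supports the usual --help switch:

php -v
php /root/fink.phar --help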

4. Usage example

Specify your domain instead of domain.tld and, if necessary, adjust the path to the report file.

php /root/fink.phar https://domain.tld -x0 -o /root/report.json

When it finishes, the script generates a report. You can also watch the entire crawl of the links found on your site's pages in real time.

In our example, a site with 5,000 pages was processed in 14 minutes, significantly faster than the available online services.

4.1 Let's analyze the report file

apt install jq
jq 'select(.status==404) | {url, referrer}' /root/report.json

The output shows each broken link and the page on which it was found, for example:

404 - https://domain.tld/some/olddate-page/removed
       (found at https://domain.tld/about/agreement)
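If the report is large, it can help to see how many broken links each page contains. This is a sketch that relies on the same report fields used above (url, status, referrer); the -s option tells jq to read all report lines into a single array:

# count broken links per referring page
jq -s 'map(select(.status==404)) | group_by(.referrer) | map({page: .[0].referrer, broken_links: length})' /root/report.json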

4.1.1 Output everything except successful responses

jq 'select(.status!=200) | {url, referrer}' /root/report.json

In this example, all URLs found are displayed except those that returned code 200.
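For a quick overview of all response codes in the report, you can also build a per-status summary. This is a sketch based on the same report format as above:

# count how many URLs returned each status code
jq -s 'group_by(.status) | map({status: .[0].status, count: length})' /root/report.json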

Depending on how often you add content, regular checking and analysis of links will make your site even more convenient for visitors.
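If you want to automate such checks, one option is a cron entry that reruns the crawl on a schedule. This is only a sketch reusing the command and paths from the example above; adjust the schedule, domain, and paths to your setup and add the line with crontab -e:

# run the link check every Monday at 03:00 and overwrite the report
0 3 * * 1 /usr/bin/php /root/fink.phar https://domain.tld -x0 -o /root/report.json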




