PHP script to check links on a web page
Link Checker is a script that crawls every page of a site and detects broken links and redirects.
Checking for broken links is essential to maintaining a site, for both users and search engines. All broken links must be removed and all redirects updated, whether they are internal or external to the site, both for the image of the site and for a better ranking in search engines.
This program is an alternative to Xenu. It has the advantage of grouping the links by page checked, which makes editing easier, and it is portable.
Compared to the W3C online validator, it has the advantage of not blocking on certain pages.
In addition, this open source program is easy to modify thanks to the Scriptol or PHP source code. It runs with PHP 5 on virtually any operating system and computer.
A version of Link Checker with graphical user interface is also available. It is harder to install but easier to use.
Commands and usage
From the command line, the script can be run with the Scriptol PHP compiler:
solp linche [options] page
or directly by the PHP 5 interpreter:
php linche.php [options] page
The page is a complete URL in the form:
The options are:
-r recursive. Follows internal links. By default, only the given page is scanned.
-s short list. Displays only broken links and links to redirected pages on the site.
-f fast. Speeds up processing with a reduced timeout. A user-defined value may be added after -f.
-v verbose. Displays all links with the HTTP status code found. By default, only links with errors are displayed.
-q quiet. No display.
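To illustrate what following internal links with -r involves, here is a minimal sketch and not the script's actual code. The function names extractLinks and isInternal are hypothetical, and the sketch assumes that PHP's DOM extension is available and that "internal" means "same host as the start page".

```php
<?php
// Hypothetical helper: returns the href values of the <a> tags in an HTML string.
function extractLinks($html)
{
    if (trim($html) === '') {
        return array();
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so parser warnings are collected silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip empty links and anchors that stay on the same page.
        if ($href !== '' && $href[0] !== '#') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical helper: true when $url is on the same host as $siteUrl.
// Assumption: relative links (no host part) count as internal,
// and schemes other than http/https (mailto, javascript...) do not.
function isInternal($url, $siteUrl)
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false;
    }
    if (isset($parts['scheme'])
        && !in_array(strtolower($parts['scheme']), array('http', 'https'))) {
        return false;
    }
    if (!isset($parts['host'])) {
        return true;
    }
    return strcasecmp($parts['host'], parse_url($siteUrl, PHP_URL_HOST)) === 0;
}
```

A recursive crawl would call extractLinks on each downloaded page and queue only the links for which isInternal returns true, while still checking the status of the external ones.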
The program produces a file named links.log that contains all the results. Save it under another name if you want to keep them.
The results provided by the script are not perfect. Sometimes a link is reported "broken" while the page is accessible in the browser. This is caused by a response time that is too long, or by the server itself. In this case, simply ignore the result.
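The effect of a timeout on the result can be sketched as follows. This is an illustration under stated assumptions, not the code used by the script: the function names fetchStatus and parseStatusLine are hypothetical, and the sketch relies on PHP's HTTP stream wrapper, which requires allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: extracts the numeric code from a status line
// such as "HTTP/1.1 404 Not Found". Returns 0 when the line is not recognized.
function parseStatusLine($line)
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $matches)) {
        return (int) $matches[1];
    }
    return 0;
}

// Hypothetical helper: asks the server for the headers of $url and gives up
// after $timeout seconds. Returns the HTTP code, or 0 when no answer arrived.
function fetchStatus($url, $timeout)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'          => 'HEAD',   // headers only, no page body
            'timeout'         => $timeout, // seconds
            'ignore_errors'   => true,     // keep the headers of 4xx/5xx answers
            'follow_location' => 0,        // report redirects instead of following them
        ),
    ));
    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        return 0;
    }
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    if (!isset($meta['wrapper_data'][0])) {
        return 0;
    }
    return parseStatusLine($meta['wrapper_data'][0]);
}
```

With a short timeout, a slow but working server yields 0 here, which a checker would have to report as an error. Calling fetchStatus again with a larger timeout is one way to tell a slow page from a missing one.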
Error messages and actions to take
When the script tries to access a page, the server can return different codes depending on the status of the page. The page may be redirected by the .htaccess file or by a PHP script, or it may be missing.
200 OK. Page found.
301 Permanent redirect. The page is redirected permanently. The link must be changed.
400 Bad request. Syntax of the request not understood by the server.
401 Unauthorized. Access denied.
403 Forbidden. The server refuses access to the script.
404 Page not found. Link broken.
500 Internal server error. Problem on the server.
Note that most of the time, the program displays a message instead of the HTTP code:
OK. Corresponds to 200.
Bad URL. Try again later or remove the link.
Broken. Corresponds to 404, a broken link. Delete the link or look for the new location of the page.
Redirect. Corresponds to 301. Update the link.
Codes other than OK can be ignored as long as the page is accessible in the browser, except for permanent redirects, which must be updated.
For the complete list of codes and their meaning, see the HTTP codes document.
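The correspondence between HTTP codes and displayed messages can be sketched as a small function. The name statusMessage is hypothetical and this is not the script's own code. Two points are assumptions: that "Bad URL" stands for a request that produced no usable answer (represented here by the code 0), and that any other code is shown as a number. Treating 302 as OK follows the behavior described for version 1.4.

```php
<?php
// Hypothetical helper: converts an HTTP status code into the short message
// described in this page.
function statusMessage($code)
{
    switch ((int) $code) {
        case 200:
        case 302: // temporary redirects are treated as OK since version 1.4
            return 'OK';
        case 301:
            return 'Redirect';
        case 404:
            return 'Broken';
        case 0:   // assumption: no usable answer from the server
            return 'Bad URL';
        default:  // assumption: other codes are displayed as numbers
            return (string) (int) $code;
    }
}
```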
With the -s option, only these codes are taken into account:
404 broken link.
301 the page on the site is redirected.
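The selection made by the -s option can be sketched as a filter over the collected results. The function name shortList and the result layout (an array with the keys url, code and internal) are assumptions made for the illustration, not the format used by the script.

```php
<?php
// Hypothetical helper: keeps only the results that the short list displays,
// that is broken links (404) and redirected pages of the site itself (301).
// Each result is assumed to look like:
//   array('url' => string, 'code' => int, 'internal' => bool)
function shortList(array $results)
{
    $kept = array();
    foreach ($results as $result) {
        $broken = ($result['code'] === 404);
        $redirectedOnSite = ($result['code'] === 301 && $result['internal']);
        if ($broken || $redirectedOnSite) {
            $kept[] = $result;
        }
    }
    return $kept;
}
```

In this sketch a 301 on an external site is dropped from the short list, since only redirected pages of the site being checked are reported.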
Version history
- 1.4 February 2012. 302 temporary redirects are no longer displayed, to reduce the number of useless messages, and are treated as a 200 OK code.
- 1.3 April 2011. Added the -f option. Improved the display by adding an exception to catch system messages.
- 1.2 April 2011. Added the -s option.
- 1.0. 2008. First release.
Getting the program and license
This link checker is licensed under the GNU General Public License 2.0. Use it freely. If you distribute the archive, you must retain the copyright notice in both the Scriptol and PHP source code.
Changes and improvements to the code must be made public and supplied as open source code, even if you only use the modified program online.
If PHP is not installed, download version 5 of the interpreter from php.net.
- Download the archive in zip format.
- Unpack it.
- Open a command line window.
- Run the program in the window according to the syntax given above.
By Denis Sureau. GNU GPL 2.0 license.