PHP: How to get the HTTP status code of a Web page?

When a script accesses the pages of a site remotely, how can it tell whether a page exists, whether it is a redirect, or whether a link is broken? This question is essential when writing scripts for the Web.

Here is a function that gets this status code using the PHP function fsockopen, which is available in current versions of PHP. A script that respects the strict calling rules of this function is provided to demonstrate its use.

Using fsockopen

We use the function with two parameters:
- The host name of the site, without a file name, for example: www.scriptol.com.
- A file name, optionally preceded by a directory.

The domain is passed as a parameter when calling the function.

$server = "www.example.com";
$fp=fsockopen($server,80,$errno,$errstr,30);

If no protocol is given, the function uses TCP by default. The protocol can also be specified explicitly: tcp://www.scriptol.com or udp://www.scriptol.com.

The second parameter is the port, usually 80. The next two receive the error number and the error message when a failure occurs. The last one is the maximum time, in seconds, to wait for the connection.
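
As an illustration of these parameters, here is a minimal sketch that checks the result of fsockopen and reports the error number and message on failure. The helper name open_connection and its return shape (a two-element array) are assumptions made for this example, not part of the original script.

```php
<?php
// Hypothetical helper: open a TCP connection and report why it failed, if it did.
// Returns [resource, null] on success or [false, "error text"] on failure.
function open_connection(string $server, int $port = 80, float $timeout = 30.0): array
{
    $errno = 0;
    $errstr = '';
    // @ hides the PHP warning; the failure is reported through the return value.
    $fp = @fsockopen($server, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return [false, "Error $errno: $errstr"];
    }
    return [$fp, null];
}
```

A caller would test the first element before writing to the socket, and close the socket with fclose once the response has been read.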

Once the connection is open, we send the request. In our case we want to access a file, hence the GET command:

$page = "index.php";
$out="GET /$page HTTP/1.1\r\n"; 
$out.="Host: $server\r\n"; 
$out.="Connection: Close\r\n\r\n";
fwrite($fp,$out);

The request specifies the page and the server, and asks the server to close the connection after responding. Note that the response contains the file itself as well as the HTTP header.
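
The request can be wrapped in a small function, sketched below. The name build_get_request and the optional $method parameter are assumptions added for illustration. Passing "HEAD" instead of "GET" asks the server for the header only, which is enough when only the status code is needed.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 request for one page.
function build_get_request(string $server, string $page, string $method = "GET"): string
{
    // ltrim avoids a double slash when the page is given as "/index.php".
    $out  = $method . " /" . ltrim($page, "/") . " HTTP/1.1\r\n";
    $out .= "Host: $server\r\n";
    // Ask the server to close the connection once the response is sent.
    $out .= "Connection: Close\r\n\r\n";
    return $out;
}
```

The returned string is what would be passed to fwrite, for example fwrite($fp, build_get_request("www.example.com", "index.php")).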

To read the first line of the response, the fgets function is used:

$content=fgets($fp);

This produces a line of the form:

HTTP/1.1 200 OK

if the file is found. If the URL is a redirect it will be:

HTTP/1.1 301 Moved Permanently

The code 404, on the other hand, indicates that the page does not exist.
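
The three cases above can be told apart with a simple switch. This is only a sketch: the function name describe_status and the returned labels are invented for the example, and every code not discussed here falls into "other".

```php
<?php
// Hypothetical helper: label the status codes discussed in this article.
function describe_status(int $code): string
{
    switch ($code) {
        case 200:
            return "found";
        case 301:
            return "redirect";
        case 404:
            return "not found";
        default:
            return "other";
    }
}
```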

We can now extract the code with the substr function, as is done in the script given below.
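
A possible way to do the extraction is sketched here. It assumes the status line starts with a version string of the form "HTTP/1.1", which is 8 characters followed by a space, so the code occupies the 3 characters starting at offset 9. The name extract_status_code and the convention of returning 0 for an unexpected line are assumptions of this example.

```php
<?php
// Hypothetical helper: extract the numeric code from a status line
// such as "HTTP/1.1 200 OK". Returns 0 if the line is not a status line.
function extract_status_code(string $statusLine): int
{
    if (strncmp($statusLine, "HTTP/", 5) !== 0) {
        return 0;
    }
    // "HTTP/1.1 " is 9 characters long; the code is the next 3 characters.
    return (int) substr($statusLine, 9, 3);
}
```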

Source code

The source code of the script has been tested with this site. Replace "www.scriptol.fr" with your site's host name (without a trailing / or a file name, but optionally with the tcp:// protocol prefix).
Then replace the page names with those of your own site, with or without a directory.
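
For reference, the steps described in this article can be assembled into one function. What follows is a sketch under stated assumptions, not necessarily the script mentioned above: the name get_http_status, the extra $port and $timeout parameters, and the convention of returning 0 on failure are all choices made for this example. It also assumes $server is a bare host name, since the same value is sent in the Host header.

```php
<?php
// Hypothetical function: return the HTTP status code of a page,
// or 0 if the connection fails or no status line is received.
function get_http_status(string $server, string $page, int $port = 80, float $timeout = 30.0): int
{
    $errno = 0;
    $errstr = '';
    // The timeout is given in seconds.
    $fp = @fsockopen($server, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return 0; // $errno and $errstr describe the failure
    }

    $out  = "GET /$page HTTP/1.1\r\n";
    $out .= "Host: $server\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);

    // Only the first line (the status line) is needed.
    $content = fgets($fp);
    fclose($fp);
    if ($content === false) {
        return 0;
    }

    // "HTTP/1.1 200 OK": the code starts at offset 9 and is 3 characters long.
    return (int) substr($content, 9, 3);
}
```

A call such as get_http_status("www.example.com", "index.php") would then return 200, 301, 404 or another code depending on the server's answer.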