Given a list of URLs, I would like to check that each URL:
The end goal is a system that can flag URLs as potentially broken so that an administrator can review them.
The script will be written in PHP and will most likely run daily via cron.
The script will process approximately 1000 URLs per run.
The question has two parts:
Use the PHP cURL extension. Unlike a plain fopen() call, it can easily make HTTP HEAD requests, which are sufficient to check the availability of a URL and save a lot of bandwidth because you don't have to download the entire body of the page.
As a starting point, you could use a function like this:
    function is_available($url, $timeout = 30) {
        $ch = curl_init(); // get cURL handle

        // set cURL options
        $opts = array(
            CURLOPT_RETURNTRANSFER => true,     // do not output to browser
            CURLOPT_URL            => $url,     // set URL
            CURLOPT_NOBODY         => true,     // do a HEAD request only
            CURLOPT_TIMEOUT        => $timeout, // set timeout in seconds
        );
        curl_setopt_array($ch, $opts);

        curl_exec($ch); // perform the request

        // check if HTTP OK (the code is 0 if no HTTP response was received)
        $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;

        curl_close($ch); // close handle
        return $retval;
    }
However, there are many possible optimizations: you might want to reuse the cURL handle and, if you check more than one URL per host, even reuse the connection.
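One way to reuse the handle could look like the sketch below. The name check_urls() and its return shape (URL mapped to true/false) are assumptions made for illustration, not part of the original answer. The options are set once and only the URL changes per request, which lets libcurl keep connections to the same host open between requests.

```php
<?php
// Hypothetical helper: checks a batch of URLs with a single reused cURL handle.
// Returns an array mapping each URL to true (HTTP 200) or false (anything else).
function check_urls(array $urls, $timeout = 30) {
    $ch = curl_init(); // one handle for the whole batch

    // Options that stay the same for every request
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,     // do not output to browser
        CURLOPT_NOBODY         => true,     // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout, // per-request timeout in seconds
    ));

    $results = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes
        curl_exec($ch);
        // The code is 0 when no HTTP response was received (timeout, bad URL, ...)
        $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
    }

    // The handle is released when $ch goes out of scope here.
    return $results;
}
```

A daily cron script could then pass the whole list in one call and flag every URL whose entry is false for review by the administrator.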
Note that this code checks strictly for HTTP response code 200. It does not follow redirects (such as 301 or 302), but there is a cURL option for that as well (CURLOPT_FOLLOWLOCATION).
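If redirected URLs should count as alive, a variant along these lines might work. The function name is_available_following_redirects() and the limit of 5 redirects are assumptions chosen for illustration. CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS are standard cURL options, and with redirects enabled CURLINFO_HTTP_CODE reports the code of the last response received.

```php
<?php
// Hypothetical variant of is_available() that follows redirects.
// The redirect limit of 5 is an arbitrary illustrative default.
function is_available_following_redirects($url, $timeout = 30, $maxRedirects = 5) {
    $ch = curl_init();

    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,          // do not output to browser
        CURLOPT_URL            => $url,          // set URL
        CURLOPT_NOBODY         => true,          // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout,      // timeout in seconds
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up on redirect loops
    ));

    curl_exec($ch);

    // Code of the last response in the chain; 0 if no response was received.
    // If the redirect limit is exceeded, this is the last redirect code, so the
    // result is false.
    return curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
}
```

The check still requires the final response to be 200, so a URL that redirects to an error page is still reported as unavailable.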