
What a sad day
We will miss you…

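While developing and debugging against the Alipay API, I suddenly noticed that its URLs are very long, far beyond the 255 characters I had in mind as the limit. I quickly searched around to check, and this is my understanding:
The notion that a URL should not exceed 255 bytes does exist; RFC 2616 mentions it: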
The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
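As the note above shows, the 255-byte figure is itself only a compatibility consideration. In practice, the limits in modern browsers and servers are as follows: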
Microsoft Internet Explorer (Browser)
Microsoft states that the maximum length of a URL in Internet Explorer is 2,083 characters, with no more than 2,048 characters in the path portion of the URL. In my tests, attempts to use URLs longer than this produced a clear error message in Internet Explorer.
Firefox (Browser)
After 65,536 characters, the location bar no longer displays the URL in Windows Firefox 1.5.x. However, longer URLs will work. I stopped testing after 100,000 characters.
Safari (Browser)
At least 80,000 characters will work. I stopped testing after 80,000 characters.
Opera (Browser)
At least 190,000 characters will work. I stopped testing after 190,000 characters. Opera 9 for Windows continued to display a fully editable, copyable and pasteable URL in the location bar even at 190,000 characters.
Apache (Server)
My early attempts to measure the maximum URL length in web browsers bumped into a server URL length limit of approximately 4,000 characters, after which Apache produces a “413 Entity Too Large” error. I used the current up to date Apache build found in Red Hat Enterprise Linux 4. The official Apache documentation only mentions an 8,192-byte limit on an individual field in a request.
Microsoft Internet Information Server
The default limit is 16,384 characters (yes, Microsoft’s web server accepts longer URLs than Microsoft’s web browser). This is configurable.
Perl HTTP::Daemon (Server)
Up to 8,000 bytes will work. Those constructing web application servers with Perl’s HTTP::Daemon module will encounter a 16,384 byte limit on the combined size of all HTTP request headers. This does not include POST-method form data, file uploads, etc., but it does include the URL. In practice this resulted in a 413 error when a URL was significantly longer than 8,000 characters. This limitation can be easily removed. Look for all occurrences of 16*1024 in Daemon.pm and replace them with a larger value. Of course, this does increase your exposure to denial of service attacks.
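Also worth noting: some articles claim that a URL used as the href attribute of an <a> element cannot exceed 1,024 bytes. I have not verified this in detail.
In summary, URLs should not be too long. Unless there is no alternative, avoid submitting large numbers of parameters via GET and consider POST instead (its limit is around 2 MB, which presumably depends on the server and its configuration). URLs this long are also unfriendly to visit and to bookmark (some articles mention that certain browsers have trouble bookmarking very long addresses). Of course, my database columns were previously sized on the 255-byte assumption, so they may now need to be widened.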
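The GET-or-POST decision above can be sketched in a few lines. This is only an illustration in Python: the function name `choose_method`, the example endpoint, and the choice of 2,083 as the threshold (the Internet Explorer figure quoted above) are my own assumptions, not part of any payment API.

```python
from urllib.parse import urlencode

# Assumed conservative threshold: the 2,083-character Internet Explorer
# figure quoted above. Adjust it for the clients and servers you target.
MAX_URL_LENGTH = 2083


def choose_method(base_url, params, max_length=MAX_URL_LENGTH):
    """Return ("GET", full_url) if the URL fits, else ("POST", base_url).

    Hypothetical helper: the caller sends `params` in the request body
    when "POST" is returned.
    """
    query = urlencode(params)  # percent-encodes keys and values
    full_url = f"{base_url}?{query}" if query else base_url
    # Measure in bytes, since the limits discussed above are byte limits.
    if len(full_url.encode("utf-8")) <= max_length:
        return "GET", full_url
    return "POST", base_url
```

For example, `choose_method("http://example.com/pay", {"sign": "x" * 3000})` falls back to POST, while a short parameter list stays on GET.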
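Google recently announced on its official blog a new way to help search engines avoid crawling duplicate content. The usual advice until now has been to use 301 redirects to tell search engines which address is the preferred version of a piece of content. But when there are too many ways to reach a page, for example with various parameters appended or without the www prefix, handling them all through .htaccess or other redirect mechanisms is cumbersome, let alone when you know nothing about .htaccess or programming. For those cases, the approach Google is promoting should solve the problem more easily.
Usage is simple: in the page header, that is, between <head> and </head>, add one line: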
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
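The address inside the quotes after href is the preferred version you are designating. An absolute URL is best; if you use a relative URL, it is best to pair it with a <base> element that declares the domain. With this in place, a crawler can tell from the address whether it has already fetched the content. This avoids multiple entries with identical content in search results, which both improves the usefulness of the results and benefits your own page's ranking.
Note, however, that this tag only takes effect when it points to an address on the same domain; redirects across domains still require a 301. There is also a more detailed Q&A in the original post.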
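If pages are generated by a program, the canonical tag can be produced from the requested address. The sketch below is in Python and rests on assumptions of mine: the function name `canonical_link`, the idea of a `keep_params` whitelist, and the `sessionid` parameter in the usage note are all hypothetical and not part of Google's announcement.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link(url, preferred_host, keep_params=()):
    """Build a <link rel="canonical"> tag for `url`.

    Hypothetical helper: forces the preferred host name and keeps only
    the query parameters that actually identify the content.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep_params]
    canonical = urlunsplit(
        (parts.scheme, preferred_host, parts.path, urlencode(kept), "")
    )
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'
```

Calling `canonical_link("http://example.com/product.php?item=swedish-fish&sessionid=123", "www.example.com", keep_params=("item",))` yields the same tag as the example above, with the www host restored and the session parameter dropped.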
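Today I came across an article online about making PHP use the mbstring functions by default, so I took the opportunity to read carefully through the mbstring settings in php.ini. The mbstring functions are used constantly in development involving Chinese and other Asian character sets, so they are worth studying.
First, the parameter that article discusses: mbstring.func_overload. Its appeal is this: once you have already written a large amount of code and then discover that you need to handle a multibyte character set, it is not practical to go through everything and replace each relevant call with its mbstring counterpart. Instead, you can set this parameter so that PHP overloads the corresponding built-in functions with the mbstring versions (for example, the commonly used substr() is automatically replaced by mb_substr()). There are five possible values: 0 (no overloading), 1 (overload mail()), 2 (overload the str*() functions), 4 (overload the ereg*() functions), and 7 (overload all of them).
However, the PHP manual also says: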
It is not recommended to use the function overloading option in the per-directory context, because it’s not confirmed yet to be stable enough in a production environment and may lead to undefined behaviour.
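Ha, use it with caution: a setting this powerful and indiscriminate is best handled carefully, since its impact is so broad. When it is turned on, the following functions are affected: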
mail() -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()
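Why such a swap changes behaviour is easiest to see with a concrete string. The sketch below uses Python rather than PHP, purely as an analogy: the byte-based helpers stand in for strlen()/substr(), the character-based ones for mb_strlen()/mb_substr(), and all four helper names are my own.

```python
text = "中文abc"  # two Chinese characters followed by three ASCII letters


def byte_length(s, encoding="utf-8"):
    """Count bytes, like a byte-oriented strlen()."""
    return len(s.encode(encoding))


def char_length(s):
    """Count characters, like mb_strlen() with the right encoding."""
    return len(s)


def byte_substr(s, start, length, encoding="utf-8"):
    """Cut by bytes; may split a multibyte character in half."""
    raw = s.encode(encoding)[start:start + length]
    # errors="replace" turns a broken sequence into U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")


def char_substr(s, start, length):
    """Cut by characters, like mb_substr()."""
    return s[start:start + length]


print(byte_length(text), char_length(text))  # 9 bytes in UTF-8, 5 characters
print(byte_substr(text, 0, 4))               # second character is damaged
print(char_substr(text, 0, 4))               # 中文ab
```

The byte count also depends on the encoding (the same string is 7 bytes in GBK), which is one reason to pass the encoding explicitly instead of relying on a global default.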
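The next few parameters are described briefly as well. This is my own reading of the comments, so it may contain mistakes:
In most cases the defaults seem to work fine, and it is best not to have your program depend explicitly on ini settings. Explicitly specifying the encoding when calling the mbstring functions is probably the better approach. For Unicode, consider the following settings.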
mbstring.language = neutral
mbstring.internal_encoding = UTF-8
mbstring.encoding_translation = Off
mbstring.http_input = auto
mbstring.http_output = UTF-8
mbstring.detect_order = auto
mbstring.substitute_character = none