Software Updates

jsoup 1.7.1 released, parsing 2.3x faster

Date: 2012-09-24 00:02:53

jsoup 1.7.1 has been released. Download link:

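jsoup is a Java HTML parser that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.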

This release brings a number of performance and stability gains, along with several functional improvements:

Improvements:

- Improved parse time, now 2.3x faster than previous release, with lower memory consumption.
- Reduced memory consumption and garbage collection when selecting elements.
- Removed an unnecessary synchronisation in Tag.valueOf, allowing multi-threaded parsing to run faster.
- Introduced finer granularity of exceptions in Jsoup.connect, including HttpStatusException and UnsupportedMimeTypeException, allowing programmers better control of error cases.
- In Jsoup.clean, allow custom Document.OutputSettings, to control pretty printing, character set, and entity escaping.
- Whitespace normalise document.title() output.
- In Jsoup.connect, fail faster if the return content type is not supported.
- Made entity decoding less greedy, so that non-entities are less likely to be incorrectly treated as entities.
- In Jsoup.connect, enforce a connection disconnect after every connect. This precludes keep-alive connections to the same host, but in practice many implementations will leak connections, particularly on error.
- If a server doesn't specify a content-type header, treat that as OK.
- If a server returns an unsupported character-set header, attempt to decode the content with the default charset (UTF-8), instead of bailing with an unsupported charset exception.
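
The charset fallback described in the last item can be illustrated with the standard library alone. The following is a minimal sketch, not jsoup's actual code: the class `CharsetFallback` and its methods `resolve` and `decode` are hypothetical names introduced here, and the sketch assumes the charset name has already been extracted from the response header.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of jsoup).
class CharsetFallback {

    // Returns the charset named by the server if this JVM supports it,
    // otherwise falls back to UTF-8 instead of throwing.
    static Charset resolve(String headerCharset) {
        if (headerCharset == null || headerCharset.trim().isEmpty()) {
            return StandardCharsets.UTF_8; // no charset given
        }
        String name = headerCharset.trim();
        try {
            return Charset.isSupported(name)
                    ? Charset.forName(name)
                    : StandardCharsets.UTF_8; // legal name, but unknown to this JVM
        } catch (IllegalCharsetNameException e) {
            return StandardCharsets.UTF_8; // syntactically invalid name
        }
    }

    // Decodes a response body using the resolved charset.
    static String decode(byte[] body, String headerCharset) {
        return new String(body, resolve(headerCharset));
    }

    public static void main(String[] args) {
        byte[] body = "<p>h\u00e9llo</p>".getBytes(StandardCharsets.UTF_8);
        // An unsupported charset name no longer aborts decoding.
        System.out.println(decode(body, "x-not-a-real-charset"));
        System.out.println(resolve("ISO-8859-1"));
    }
}
```

Falling back rather than failing means a page with a mislabelled charset may still decode with some wrong characters, but the caller gets a usable document instead of an exception.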

Bug fixes:

- Fixed an issue when determining the Windows-1254 character-set from a meta tag when run in the Turkish locale.
- Fixed whitespace preservation in textarea tags.
- Fixed an issue that prevented frameset documents from being cleaned by the Cleaner.
- Fixed an issue when normalising whitespace for strings containing high-surrogate characters.
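
The release notes do not say what caused the Windows-1254 issue in the Turkish locale. A common source of this class of bug in Java is locale-sensitive case conversion, where an uppercase `I` lowercases to the dotless `ı` (U+0131) under Turkish rules. The sketch below demonstrates that general pitfall under that assumption. It is not jsoup's code, and `TurkishLocaleCase` and `matchesWindows1254` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical demonstration class (not part of jsoup).
class TurkishLocaleCase {

    // Compares a declared charset name against "windows-1254" after
    // lowercasing it with the given locale's case rules.
    static boolean matchesWindows1254(String declared, Locale locale) {
        return declared.toLowerCase(locale).equals("windows-1254");
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish rules "I" becomes dotless "ı", so the match fails.
        System.out.println(matchesWindows1254("WINDOWS-1254", turkish));

        // Locale.ROOT gives locale-independent results, so the match succeeds.
        System.out.println(matchesWindows1254("WINDOWS-1254", Locale.ROOT));
    }
}
```

Passing an explicit locale such as `Locale.ROOT` when normalising protocol-level identifiers keeps the result independent of the machine's default locale.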

 

Source: OSChina (开源中国社区) [http://www.oschina.net]
