This is the first stable release of Clear Read API. Clear Read is a service for extracting article text and metadata from a URL. The API takes any article URL as an input and turns it into a full-text XML feed that can be integrated into a third party application. Along with the article text, Clear Read pulls out metadata such as title, description and link. The API uses RESTful calls and responses are formatted in XML and JSON.
- worded by programmableweb.com
Turns the following URL into a Full-Text XML/JSON Response:
URL: http://blogs.balsamiq.com/product/2012/02/27/uxstackexchange/
Response:
<rss>
<channel>
<status>success</status>
<item>
<title>Balsamiq ❤ UX.StackExchange.com | Mockups Product Blog</title>
<description>Full Article Text (in encoded HTML)</description>
<link>http://blogs.balsamiq.com/product/2012/02/27/uxstackexchange/</link>
</item>
</channel>
</rss>
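A response shaped like the one above could be read with Python's standard XML parser. This is only a minimal sketch: the function name `parse_clear_read_xml` is hypothetical, and it assumes the live response follows the same `rss > channel > item` layout as the sample. Only the `success` status appears here, so the sketch returns the status as-is and does not guess at other values.

```python
import xml.etree.ElementTree as ET


def parse_clear_read_xml(xml_text: str) -> dict:
    """Pull status, title, description and link out of a Clear Read XML response.

    Assumes the rss > channel > item layout shown in the sample response.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("response has no <channel> element")
    item = channel.find("item")
    if item is None:
        raise ValueError("response has no <item> element")
    return {
        "status": channel.findtext("status"),
        "title": item.findtext("title"),
        # The description carries the full article text as encoded HTML.
        "description": item.findtext("description"),
        "link": item.findtext("link"),
    }


SAMPLE = """<rss>
<channel>
<status>success</status>
<item>
<title>Balsamiq ❤ UX.StackExchange.com | Mockups Product Blog</title>
<description>Full Article Text (in encoded HTML)</description>
<link>http://blogs.balsamiq.com/product/2012/02/27/uxstackexchange/</link>
</item>
</channel>
</rss>"""

article = parse_clear_read_xml(SAMPLE)
print(article["status"], article["link"])
```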
Point your App here: http://api.thequeue.org/v1/clear?url=[ArticleURL]&format=[xml/json]
Important: The url query must always come first.
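As a rough illustration of the ordering rule, the request URL can be assembled by hand so that `url` is always the first query parameter. The helper name `build_clear_read_url` is hypothetical, and percent-encoding the article URL is an assumption on my part (standard practice for query values, but not something this page states).

```python
from urllib.parse import quote

# Endpoint as given above.
ENDPOINT = "http://api.thequeue.org/v1/clear"


def build_clear_read_url(article_url: str, fmt: str = "xml") -> str:
    """Return a Clear Read request URL with `url` as the first query parameter."""
    if fmt not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    # Assumption: the article URL is percent-encoded so that its own
    # '?' and '&' characters cannot be mistaken for API parameters.
    encoded = quote(article_url, safe="")
    # The query string is built manually to guarantee `url` comes first.
    return f"{ENDPOINT}?url={encoded}&format={fmt}"


print(build_clear_read_url(
    "http://blogs.balsamiq.com/product/2012/02/27/uxstackexchange/", "json"))
```

Building the string directly, rather than passing an unordered mapping to a URL library, keeps the parameter order explicit.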
No API key is required*. I may add keys in the future. As long as you don't crawl the entire web with this API, feel free to use it.
*Once again, DO NOT crawl the web with this API.
Premium Status: If you enjoy using Clear Read, please consider giving a donation through PayPal or Flattr. Any donations above $10/month will get you Premium Status and I'll do my best to prioritise your support requests.
For support or any issues, please let me know on Twitter (@mmackh) or via email.
10/27/12: Major new feature: now stripping every inline attribute except the essentials (href, alt, title, etc.). Added new extraction patterns. Other minor fixes to the code.
9/28/12: Added a Google Ad to the homepage to pay for hosting. This does not and will not affect the API endpoints.
9/7/12: Various bug fixes: Improved reliability, better and more extraction patterns, can now handle AJAX blogs from Google and more.
7/18/12: Fixed bug that affected the extraction of some URLs.
7/17/12: Introducing the Toolbox. If an extraction failed or you need to clear Clear Read's cache for a particular URL, be sure to visit the Toolbox. Additional fixes: script removal, cleaner HTML output and inline CSS stripping.
5/1/12: Fixed Wikipedia extraction. Clarified usage limits.
5/30/12: Complete under-the-hood rewrite. The Invalid URL error has been removed; the only error you will now get is when your parameters are incorrect. I've also made some caching tweaks and improved extraction reliability and JavaScript removal. The bug related to differences between JSON and XML conversion should now be solved. Let me know if you run into anything unusual.
4/28/12: Over a Million Pages extracted since Clear Read API was launched 2 months ago.