Diffbot: A Tool for Automated Web Data Extraction

Posted By : Harsh Soni | 30-Nov-2018

Diffbot is a tool that converts the unstructured data of the web into structured data.


Data extracted by Diffbot's crawler feeds into a large database called the Diffbot Knowledge Graph (DKG), which comprises roughly a trillion facts and billions of entities. Diffbot is also popular among large companies such as Microsoft, Yandex, DuckDuckGo, and eBay, which use it to improve their search quality.


Diffbot is not only more comprehensive than manually curated databases like Google’s Knowledge Graph but also more accurate: its crawler regularly refreshes the DKG with new information, and its machine learning algorithms are smart enough to pass over sites with a history of producing “logically inconsistent” facts.


Diffbot Custom API


Diffbot extracts web data using its Automated API. But what if you want data from specific page elements? That is where the Diffbot Custom API comes in. Creating a custom API lets you extract almost anything from any website using Diffbot's rendering engine, which is cloud-based and fully executes page-level scripts so that Ajax-delivered elements are captured as well.
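To make the Automated API more concrete, here is a minimal Python sketch of how a request might be built and its result read. It assumes the v3 endpoint layout `https://api.diffbot.com/v3/<api>?token=...&url=...` and a JSON response whose extracted items sit in an `objects` list; the function names, the `article` API name, and the placeholder token are illustrative only, so check them against the documentation for your account.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Diffbot v3 APIs (verify for your account).
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_request_url(api_name: str, token: str, page_url: str) -> str:
    """Build the GET URL for an extraction API such as "article".

    The target page URL is percent-encoded so that its own query
    string does not get mixed up with the API's parameters.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api_name}?{query}"


def extract(api_name: str, token: str, page_url: str, timeout: float = 30.0) -> dict:
    """Call the API and return the decoded JSON payload (needs network access)."""
    with urlopen(build_request_url(api_name, token, page_url), timeout=timeout) as response:
        return json.load(response)


def first_object(payload: dict) -> dict:
    """Return the first extracted object, or an empty dict if there is none.

    Assumption: results are returned under an "objects" key.
    """
    objects = payload.get("objects", [])
    return objects[0] if objects else {}


if __name__ == "__main__":
    # Only prints the URL; call extract(...) with a real token to fetch data.
    print(build_request_url("article", "YOUR_TOKEN", "https://example.com/post?id=1"))
```

Keeping the URL construction separate from the network call makes it easy to check the request before spending any of the trial account's quota.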


Creating a custom API 


First, register on the Diffbot website to create a trial account. A trial account lets you create at most 5 custom rules. Log in with the token provided to you and click the "Custom APIs" navigation tab, as shown. You will see a list of all the custom rules you have created.

[Image: Custom API Toolkit]

Now let's create a custom API. Switch to the "Create a rule" tab and select "Custom API" from the drop-down. A popup will ask for the custom rule name and the URL of the page for which you want to create the rule.

[Image: Diffbot API]
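Once a rule exists, it can be called like any other API. The sketch below models the two values the popup asks for (rule name and page URL) and builds the matching request. It assumes that a custom API is served under its own name at `https://api.diffbot.com/v3/<rule name>`; the `CustomRule` class, its field names, and the `productReviews` rule are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Sequence
from urllib.parse import quote, urlencode

# Limit for trial accounts, as stated in this article.
MAX_TRIAL_RULES = 5


@dataclass(frozen=True)
class CustomRule:
    """The two inputs of the "Create a rule" popup (field names are hypothetical)."""

    name: str      # custom rule (API) name
    page_url: str  # page the rule was created for

    def request_url(self, token: str) -> str:
        # Assumption: the rule name becomes the last path segment of the endpoint.
        path = quote(self.name, safe="")
        query = urlencode({"token": token, "url": self.page_url})
        return f"https://api.diffbot.com/v3/{path}?{query}"


def can_add_rule(existing: Sequence[CustomRule]) -> bool:
    """Return True while the trial account still has room for another rule."""
    return len(existing) < MAX_TRIAL_RULES


if __name__ == "__main__":
    rule = CustomRule(name="productReviews", page_url="https://example.com/item/42")
    print(rule.request_url("YOUR_TOKEN"))
```

Checking the rule count locally is just a convenience; the service itself remains the authority on how many rules an account may hold.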

