Blog

  • Diffbot - A tool for automated web data extraction

    Diffbot is a tool that converts unstructured web data into structured data.

     

    Data extracted by Diffbot's crawler feeds into a large database called the DKG (Diffbot Knowledge Graph), which comprises roughly a trillion facts and billions of entities. Diffbot is also popular among large companies such as Microsoft, Yandex, DuckDuckGo, and eBay, which use it to enhance their search quality.

     

    Not only is Diffbot more comprehensive than manually curated databases like Google’s Knowledge Graph, but it’s more accurate, too — Diffbot’s crawler regularly refreshes the DKG with new information, and its machine learning algorithms are smart enough to pass over sites with histories of producing “logically inconsistent” facts.

     

    Diffbot Custom API

     

    Diffbot extracts web data using its Automatic API. When you need data from specific page elements that the automatic extraction does not cover, you can use the Diffbot Custom API. A custom API lets you extract almost anything from a website using Diffbot's rendering engine, which runs in the cloud and fully executes page-level scripts so that Ajax-delivered elements are captured as well.
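
    To make the automatic extraction concrete, here is a minimal Python sketch of how a request might be put together. It assumes the v3 endpoint layout `https://api.diffbot.com/v3/<api>` with `token` and `url` query parameters; the helper names `build_extract_url` and `extract` are made up for this example, so check the details against the documentation for your account.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of Diffbot's v3 extraction endpoints.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_extract_url(api: str, token: str, page_url: str) -> str:
    """Build the request URL for an extraction endpoint such as "analyze" or "article"."""
    # urlencode percent-escapes the target page URL so it survives as a single parameter.
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


def extract(api: str, token: str, page_url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response (performs a network call)."""
    request_url = build_extract_url(api, token, page_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return json.load(response)
```

    Calling `extract("analyze", "YOUR_TOKEN", "https://example.com/some-page")` would then return the structured data as a dictionary, provided the token is valid and the endpoint matches the assumption above.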

     

    Creating a custom API 

     

    First, register on the Diffbot website to create a trial account. A trial account lets you create at most 5 custom rules. Log in with the token provided to you and click the "Custom APIs" navigation tab, as shown. You will see a list of all the custom rules you have created.

    custom API Toolkit

    Now let's create a custom API. Switch to the "Create a rule" tab and select "Custom API" from the drop-down. A popup will ask for the custom rule name and the URL of the page for which you want to create the rule.

    Diffbot API
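
    Once a rule exists, the rule name and page URL entered in the popup are the two values you need to call it. The sketch below is only an illustration: the `CustomRule` class and `custom_api_url` helper are hypothetical, and the endpoint `https://api.diffbot.com/v3/custom` with a `name` parameter is an assumption about the URL layout, so use the request URL shown in your Custom API toolkit if it differs.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class CustomRule:
    """The two values the "Create a rule" popup asks for."""
    name: str      # custom rule name
    page_url: str  # page the rule was created for


def custom_api_url(
    rule: CustomRule,
    token: str,
    endpoint: str = "https://api.diffbot.com/v3/custom",  # assumed endpoint, may differ
) -> str:
    """Build a request URL that runs the given custom rule against its page."""
    params = {"token": token, "name": rule.name, "url": rule.page_url}
    return f"{endpoint}?{urlencode(params)}"
```

    For example, `custom_api_url(CustomRule("productPrice", "https://example.com/item/1"), "YOUR_TOKEN")` produces a URL you can fetch with any HTTP client. The endpoint is a parameter so it can be swapped without touching the rest of the code.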

Tags: css