How we spent $2500 and got 36 libraries and thousands of new developers

February 6, 2014

We just released Diffbot API clients in 36 different programming languages, ranging from general purpose languages (Ruby/Python/Java), to systems languages (Go/C), to scripting languages (Bash), and even embedded (x86-64 anyone?). View them here: http://github.com/diffbot.

API Hackers

36 new Diffbot experts

Backstory: In a survey in our latest Developer Newsletter, we received feedback from users on the number of bugs in our third-party contributed libraries. We’re fortunate to have an awesome and active developer community that’s contributed many Diffbot API client libraries in their favorite languages. However, some of these libraries had grown stale and not kept up with our latest features, products and page-types. Documentation was scant to none. We decided it was time to clean up these libraries, document, and officially support them. We think every developer should be able to query Diffbot from clean code in their favorite programming language. So we would make it happen.

The problem we were trying to address:

  • A growing number of 3rd party contributed libraries meant our users often encountered buggy, non-maintained code while trying to integrate Diffbot, potentially resulting in a bad experience
  • External maintainers meant we couldn’t control the release of new updates and fixes
  • Some languages had no libraries at all

Numerous but unloved third-party libraries

After quickly identifying a list of the most-used languages, we used the oDesk API to post a public job description for each specific language (like this one) on oDesk. (Nearly everything we do involves some sort of API around here, so when it came to hiring for this, we thought API-first.) oDesk responded in force. We received thousands of applications for our combined 36 job postings from developers all around the world. (And as an unintended side effect, most of these interested developers stopped by our signup page to register for a free trial, exposing thousands of software engineers to our extraction APIs. Many of these developers have since written us to let us know how Diffbot has helped them in unrelated projects.) The hardest part was going through the messages of many qualified developers and choosing the best. Sadly, we don’t have an API for that — yet!

The result

  • 36 client libraries
  • Lines of code: 56,042
  • Total cost: ~$75 / language
  • Diffbot hours spent: 18

CodeFlower is really neat

But now how will we maintain all this code?

36 libraries is a whole lot easier to maintain than 100! And because we commissioned these libraries from a common job spec, they are much more uniform now, and aligned with our own architecture and roadmap. Most of the libraries have a vanilla call() function where “type” is passed in as a parameter and a JSON object is returned. So no library updates should be needed as we roll out new page types (the bulk of our roadmap): simply pass in the new type argument and it should mostly work. The libraries also all now work with our new programmatic Crawlbot and Bulk-submission interfaces for premium users. Having libraries under our own maintenance means we can easily point developers to actual code snippets when they write in to support@diffbot.com, no matter what language they develop in. We’ve already gotten pull requests on some libraries, and we can now approve these more quickly than we could with a repo maintained by a third party.
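To make the "vanilla call()" idea concrete, here is a minimal Python sketch of that pattern. It is not taken from any of the 36 libraries. The names build_url and call, the fetch hook, and the /v2/<type> URL layout are assumptions for illustration, so check the current API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root and URL layout; verify against the current Diffbot docs.
API_ROOT = "http://api.diffbot.com"


def build_url(api_type, token, url, version=2, **params):
    """Build the request URL for any page type ("article", "product", ...)."""
    query = urlencode({"token": token, "url": url, **params})
    return f"{API_ROOT}/v{version}/{api_type}?{query}"


def call(api_type, token, url, version=2, fetch=None, **params):
    """Request one page type and return the decoded JSON object.

    `fetch` is a hypothetical hook (not part of any Diffbot library) that
    takes a URL and returns the response body as text, so the function can
    be exercised offline. By default it performs a real HTTP GET.
    """
    if fetch is None:
        def fetch(request_url):
            with urlopen(request_url, timeout=30) as resp:
                return resp.read().decode("utf-8")
    return json.loads(fetch(build_url(api_type, token, url, version, **params)))
```

Because the page type is only a string that ends up in the URL path, a call such as call("article", token, page_url, fields="title,text") and a call for a page type released later go through exactly the same code, which is why a client written this way should rarely need updating.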

Finally, how to talk to Diffbot in 36 languages

http://github.com/diffbot

ActionScript:

  var diffbot:DiffbotAS3Client = new DiffbotAS3Client("DIFFBOT_TOKEN");
  diffbot.getClassifier("http://www.xconomy.com/san-francisco/2012/07/25/diffbot-is-using-computer-vision-to-reinvent-the-semantic-web/");

Bash:

  REPLY=$( diffbot URL TOKEN API [API_PARAM1 VALUE1 …] )

C:

  struct Diffbot *df = diffbotInit();
  diffbotJasonObj *response = diffbotRequest(df, url, token, API_ANALYZE, 2);

C++:

  Diffbot diffbot("MY_DIFFBOT_TOKEN");
  diffbot.ApiRequest(url);

C#:

  ArticleApi api = new ArticleApi("http://api.diffbot.com", "", "2");
  Article article = await api.GetArticleAsync("", new string[] { "*" }, null);

Common LISP:

  (article-api token "http://diffbot.com/")

Clojure:

  (article token "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/")

Coffeescript:

  client = new Client '<your_key>'
  pageclassifier = client.pageclassifier 'http://someurl.com'

D:

  auto diffbot = new DiffBot(token,url);
  auto response = diffbot.sendRequestToServer();

Dart:

   var request = HttpRequest.getString(url).then(onDataLoaded).catchError(JsonError);

Delphi:

  var
    analyzeBot: IDiffbotAnalyze;
  begin
    analyzeBot:= GetDiffbotAnalyze('...token...');
    response:= analyzeBot.Load('http://www.diffbot.com/our-apis/article', True);
  end

Erlang:

  R = dfbcli:diffbot_analyze(Args#dfbargs{fields = ["meta", "tags"], mode = article}),
  case R of
    {ok, Resp} ->
        io:format("Json object is:~n~p~n", [Resp]);
    {error, Why, Details} ->
        io:format("error: ~p ~p: ~p: ~p~n", [?MODULE, ?LINE, Why, Details])
  end,

Fortran:

  include "modules/fdiffbot.f90"
  program example
        response => diffbot("http://www.google.com", token, api, optargs, version)
  end program example

Go:

  article, err := diffbot.ParseArticle(token, url, nil)

Groovy:

  HashMap result = DiffbotArticle.analyze(token, url, content, [timeout: '5000', fields: 'meta,querystring,images(*)'])

Haskell:

  diffbot token url . setTimeout 15000 $ defFrontPage { frontPageAll = True }

Java:

  DiffbotClient client = new DiffbotClient(testToken);
  BlogPost a= (BlogPost) client.callApi("analyze",BlogPost.class,"http://diffbot.com");

Javascript:

  client.analyze.get({
            url: "http://www.xconomy.com/san-francisco/2012/07/25/diffbot-is-using-computer-vision-to-reinvent-the-semantic-web/"
        }, function onSuccess(response) {
            // output the summary
            document.getElementById("content").textContent = response.summary;
        });

Lua:

  d = diffbot 'DEVELOPER_TOKEN'
  c = d:analyze 'http://diffbot.com/products/'

MATLAB:

  JSON_Return=diffbot(URL,token,API,fields,version); % This return is in JSON format
  MATLAB_RETURN=json.load(JSON_Return);

Objective-C:

  [DiffbotAPIClient apiRequest:DiffbotPageClassifierRequest UrlString:articleURL OptionalArgs:optionalArgs Format:DiffbotAPIFormatJSON withCallback:^(BOOL success, id result) {
        if(success) {
            NSLog(@"Call success: %@", result);
        } else {
            NSLog(@"Error: %@", result);
        }
    }];

OCaml:

  let response = analyze Frontpage
                   ~token:""
                   ~url:"http://huffingtonpost.com" in

Octave:

  [data success message] = diffbot(api_url, "param1", value1, "param2", value2, ...)

Perl:

  my $response = $client->query({
        request_type => 'analyze',
        query_args => {
            url => 'http://diffbot.com',
            timeout => 30000,
            fields => 'title,link,text'
        }
    });

PHP:

  $d = new diffbot("DEVELOPER_TOKEN");
  $c = $d->analyze("http://diffbot.com/products/");

PL/SQL:

  set scan off
  set serveroutput on format wrapped
  declare
  obj json;

  begin
    DIFFBOT_PKG.error_code:=null;
    obj:=DIFFBOT_PKG.diffbot(p_api=>'analyze'
        ,p_url=>'http://www.de'
        ,p_token=>''
        ,p_fields=>'');
    obj.print;
  end;
  /

Powershell:

  Get-DiffBot http://www.xconomy.com/san-francisco/2012/07/25/diffbot-is-using-computer-vision-to-reinvent-the-semantic-web/ article -fields "images,supertags"

Prolog:

  :-use_module(library(diffbot)).
  :-diffbot_defaults([token=..]).

  diffbot('http://www.xconomy.com/san-francisco/2012/07/25/diffbot-is-using-computer-vision-to-reinvent-the-semantic-web/',[api=article, fields='icon,url'],J,v2),show_json(J).

Python:

  diffbot = DiffbotClient()
  response = diffbot.request(url, token, api, version=2)

Ruby:

  client = Diffbot::APIClient.new do |config|
    config.token = ENV["DIFFBOT_TOKEN"]
  end
  article = client.article.query(:fields => [:title, :link, :text], :timeout => 2000)
  response = article.get("http://someurl.com/")

Rust:

  let mut response: TreeMap<~str, Json>
                = diffbot::call(..., "article", ...).unwrap();

Scala:

  val f: Future[JsValue] = Diffbot.call("article", url)

Thanks for reading, and thanks to our 36 new Diffbot experts, as well as the many more who expressed interest in us. Till next time, may your code be concise and your pull requests frequent.