During the summers of my high school years in suburban Georgia, my friend and I would fill the time by walking into local establishments at random and asking for odd jobs. For a student, it was a great way to meet people from all walks of life and learn about different industries. We interviewed to be warehouse […]
Google introduced the term Knowledge Graph (“Things not Strings”) to the general public when it added the information boxes that appear on the right-hand side of many search results. However, the benefits of storing information indexed around the entity and its properties and relationships are well-known to computer scientists and have been one of the […]
Our mission at Diffbot is to build the world’s first comprehensive map of human knowledge, which we call the Diffbot Knowledge Graph. We believe that the only approach that can scale and make use of all of human knowledge is an autonomous system that can read and understand all of the documents on the public […]
We just released Diffbot API clients in 36 different programming languages, ranging from general-purpose languages (Ruby/Python/Java), to systems languages (Go/C), to scripting languages (Bash), and even embedded (x86-64, anyone?). View them here: http://github.com/diffbot.
Previously, I wrote about how Amazon EC2 Spot Instances + Auto Scaling are an ideal combo for machine learning loads. In this post, I’ll provide code snippets needed to set up a workable autoscaling spot-bidding system, and point out the caveats along the way. I’ll show you how to set up an auto-scaling group with […]
Machine Learning Loads are Different than Web Loads

One of the lessons I learned early is that scaling a machine learning system is a different undertaking than scaling a database or optimizing the experiences of concurrent users. Thus most of the scalability advice on the web doesn’t apply. This is because the scarce resources in machine […]
On June 25th, over 80 of Silicon Valley’s top hackers, designers, and students gathered at the Diffbot offices in Palo Alto in the hope of building the next great Web 3.0 app. Over the next 13 hours, participants learned about data and analysis APIs, formed teams, and wrote code. At the end of the night, only […]
In a recent benchmark, Diffbot placed first overall among text extraction APIs on two evaluation sets: an academic one and one sampled from Google News. Tomaz Kovacic, a university student in artificial intelligence, conducted the comprehensive benchmark of text extraction methods as part of his thesis. Included in the study are commercial vendors as well as open-source APIs […]