As a symbol of our organization, Diffy, whose sticker is highly coveted by fans, may someday become as iconic as other company mascots. Although not always physically present for events, Diffy is a brand ambassador who is instantly recognizable (at least with the Diffbot team).
There are only a handful of publicly available knowledge graphs. Among those, only a few provide data with enough breadth to represent the entire internet in some way, and with enough granularity to be useful.
The entrepreneurs at Topic saw many of their customers struggle with creating trustworthy SEO content that ranks high in search engine results.
They realized that while many writers may be experts at crafting a compelling narrative, most are not experts at optimizing content for search. Drawing on their years of SEO expertise, this two-person team came up with an idea that would fill that gap.
They came up with Topic, an app that helps users create better SEO content and drive more organic search traffic. They had a great idea. They had a fitting name. The next step was figuring out the best way to get their product to market.
Hello everyone! William (Hare) here, but you can call me Will. I will be joining the sales team at Diffbot as a Sales Development Representative. I am looking forward to meeting everyone – eventually in person – and welcome any opportunity now for a virtual get-together.
I have always been interested in problem solving and negotiation for mutual benefit, i.e., sales! I was actually hired for my first sales job at 14 years old when I was hanging out at the local lacrosse store and started talking with customers about equipment. Ultimately, I refined my skills to guide people toward the gear and “cosmetics” that would fit them best, and I also learned to love the client-facing aspect of the job.
As I moved through college at Cal Poly, San Luis Obispo, I became very interested in technology. I interned with an inside sales team at Uber on a new on-demand delivery project, and fell in love with the sales development process. After that internship, I went into the wine industry and worked as a tasting room associate at a local winery, learning as much as I could about wine, the industry, and the nuances of pouring. After I graduated, I became the first Sales Development Representative at a small social engagement start-up in Palo Alto, working directly with the CEO on sales development and strategy. Once I saw the breadth of Diffbot and understood its mission, I was immediately intrigued. I now consider myself extremely fortunate to be a part of the team!
On a personal note, I love hiking, snowboarding, and wine. I also coach high school lacrosse (go PALY Vikings). Once we are back in the office, I am looking forward to using the pool table and organizing a tournament.
I’m absolutely thrilled to join the sales team at Diffbot and help prospective clients understand the power and vision of the company!
Hello! My name is Jerome Choo and I lead Growth Marketing at Diffbot. I spent the majority of my childhood in Singapore, eventually moving to Atlanta, GA, for high school and college, where I started a career in biomedical engineering. My research focus, mostly unintended, was on mammal urination. Since then, I’ve found a new passion in leading data-driven growth teams. I’m fascinated with blending objective scientific rigor and problem-first design to create seven-star user experiences. I’m pumped to help Diffbot solve the absolutely essential function of accessible structured data for everyone!
Many cornerstone providers of martech bill themselves as “databases of the web.” In a sense, any marketing analytics or news monitoring platform that can provide data on long tail queries has a solid basis for such a claim. There are countless applications for many of these web databases. But what many new users, or those early in their buying process, aren’t exposed to is the fact that web-wide crawlers can crawl the exact same pages and pull out very different data.
Hello! My name is Andrew Harrold, a new data curator at Diffbot!
I was born and raised in Denver, Colorado, and studied economics at the University of California, Berkeley. During my time at Berkeley, I wrote a senior thesis on the relationship between employee sentiment (scraped employee review data) and stock market returns.
Needless to say, I have a passion for data and love the many creative ways it can be used to derive meaningful insights, specifically within machine learning and web-scraping technology. Diffbot takes on the challenge of extracting data to find meaning out of chaos and can help a wide range of businesses solve problems, which I am all about. I am very excited to join a talented team with such awesome technology!
Hello there! My name is Maosheng Guo and I am from China. I recently started as a Machine Learning Engineer in the research group. It’s my great pleasure to join the Diffbot family!
While pursuing my Ph.D. at the Harbin Institute of Technology, I focused my research on recognizing, extracting, and generating reasoning relations in natural language (i.e., Textual Entailment and Natural Language Inference). During this time, I successfully improved several question-answering and dialogue systems/chatbots using inference techniques. On the one hand, reasoning in natural language is inseparable from the accumulation of knowledge. On the other hand, inference techniques also help the construction of knowledge graphs. When I learned about Diffbot’s mission to build the first comprehensive map of human knowledge, I decided to join without any hesitation.
As for my hobbies, I enjoy traveling and doing outdoor sports in my free time. When I am not outdoors, I also find playing online party games with friends exciting. I am excited to start my new adventure at Diffbot!
Knowledge bases (also known as knowledge graphs or ontologies) are valuable resources for developing intelligence applications, including search, question answering, and recommendation systems. However, high-quality knowledge bases still mostly rely on structured data curated by humans. Such reliance on human curation is a major obstacle to the creation of comprehensive, always-up-to-date knowledge bases such as the Diffbot Knowledge Graph.
The problem of automatically augmenting a knowledge base with facts expressed in natural language is known as Knowledge Base Population (KBP). This problem has been extensively studied in the last couple of decades; however, progress has been slow in part because of the lack of benchmark datasets.
KnowledgeNet is a benchmark dataset for populating Wikidata with facts expressed in natural language on the web. Facts are of the form (subject; property; object), where subject and object are linked to Wikidata. For instance, the dataset contains text expressing the fact (Gennaro Basile; RESIDENCE; Moravia), in the passage:
“Gennaro Basile was an Italian painter, born in Naples but active in the German-speaking countries. He settled at Brunn, in Moravia, and lived about 1756…”
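As a rough illustration of the (subject; property; object) form, the example fact above could be held in a small record like the one below. This is only a sketch: the class name, the field names, and the optional Wikidata identifier fields are assumptions made for illustration, not the dataset's actual schema, and the identifiers are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """A (subject; property; object) fact. Field names are hypothetical."""
    subject: str
    prop: str
    obj: str
    # In the dataset, subject and object are linked to Wikidata.
    # The identifiers are left unset here instead of being invented.
    subject_wikidata_id: Optional[str] = None
    object_wikidata_id: Optional[str] = None


# The passage quoted above, truncated exactly as in the text.
passage = (
    "Gennaro Basile was an Italian painter, born in Naples but active in "
    "the German-speaking countries. He settled at Brunn, in Moravia, and "
    "lived about 1756…"
)

fact = Fact(subject="Gennaro Basile", prop="RESIDENCE", obj="Moravia")

# Both the subject and the object are mentioned in the passage.
print(fact.subject in passage, fact.obj in passage)
```

Making the record frozen keeps facts hashable, so they can be stored in sets and compared directly.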
KBP has been mainly evaluated via annual contests promoted by TAC. TAC evaluations are performed manually and are hard to reproduce for new systems. Unlike TAC, KnowledgeNet employs an automated and reproducible way to evaluate KBP systems at any time, rather than once a year. We hope a faster evaluation cycle will accelerate the rate of improvement for KBP.
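To give a sense of what an automated evaluation can look like, here is a minimal sketch that scores a system's predicted facts against gold facts using precision, recall, and F1. It assumes facts are compared as exact (subject, property, object) tuples; the actual KnowledgeNet evaluation may match facts differently (for example on text spans or linked entities), and the PLACE_OF_BIRTH property name and the predicted facts are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted facts against gold facts by exact tuple match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold facts for the example passage (the second property name is assumed).
gold = {
    ("Gennaro Basile", "RESIDENCE", "Moravia"),
    ("Gennaro Basile", "PLACE_OF_BIRTH", "Naples"),
}

# Hypothetical system output: one correct fact, one wrong property.
predicted = {
    ("Gennaro Basile", "RESIDENCE", "Moravia"),
    ("Gennaro Basile", "RESIDENCE", "Naples"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, f1)  # prints 0.5 0.5 0.5
```

Because the scoring is a pure function of two sets of facts, anyone can rerun it on a new system at any time and get the same number.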
Please refer to our EMNLP 2019 paper for details on KnowledgeNet, but here are some takeaways:
State-of-the-art models (using BERT) are far from achieving human performance (0.504 vs 0.822).
The traditional pipeline approach for this problem is severely limited by error propagation.
KnowledgeNet enables the development of end-to-end systems, which are a promising solution for addressing error propagation.
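Error propagation in a pipeline can be pictured with a back-of-the-envelope calculation. The sketch below assumes a three-stage pipeline whose stage names and 90% per-stage accuracies are purely hypothetical, and it assumes errors at each stage are independent; none of these figures come from the paper.

```python
import math

# Hypothetical stages and accuracies, chosen only for illustration.
stage_accuracy = {
    "entity recognition": 0.9,
    "entity linking": 0.9,
    "relation classification": 0.9,
}


def pipeline_accuracy(accuracies):
    """Chance a fact survives every stage, assuming independent errors."""
    return math.prod(accuracies)


overall = pipeline_accuracy(stage_accuracy.values())
print(f"{overall:.3f}")  # prints 0.729
```

Even with each stage right nine times out of ten, fewer than three in four facts come out fully correct under these assumptions, which is the kind of compounding loss an end-to-end system tries to avoid.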