
Data Science Lifecycle

The Data Science Lifecycle is an iterative approach to managing data science contributions within an organization. Seven steps are commonly cited as part of this cycle: (1) business understanding, (2) data mining, (3) data cleaning, (4) data exploration, (5) feature engineering, (6) predictive modeling, and (7) data visualization.
Step one involves deciding which questions you want your data exploration to inform and defining the problem you hope to solve with data. Steps two and three typically account for most of the time and effort spent within the lifecycle, and are the primary steps that Diffbot products can help to accelerate. Steps four through seven occur once data is in a usable state and may continue as ongoing efforts, in the form of dashboards or reporting, even once the lifecycle has restarted.
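
As a rough illustration of how steps three through six hand data from one to the next, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `raw_rows` records, the `employees` and `revenue` fields, and the helper names `clean`, `explore`, `add_features`, and `fit_line` are hypothetical, and the simple least-squares line merely stands in for whatever predictive model a real project would use.

```python
from statistics import mean

# Hypothetical raw records standing in for the output of step 2 (data mining).
raw_rows = [
    {"employees": "10", "revenue": "1.0"},
    {"employees": "20", "revenue": "2.0"},
    {"employees": "", "revenue": "3.5"},    # missing value
    {"employees": "40", "revenue": "4.0"},
    {"employees": "20", "revenue": "2.0"},  # exact duplicate
]


def clean(rows):
    """Step 3: drop rows with missing fields or exact duplicates, cast to float."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["employees"], row["revenue"])
        if "" in key or key in seen:
            continue
        seen.add(key)
        cleaned.append({"employees": float(key[0]), "revenue": float(key[1])})
    return cleaned


def explore(rows):
    """Step 4: compute simple summary statistics."""
    return {"count": len(rows), "mean_revenue": mean(r["revenue"] for r in rows)}


def add_features(rows):
    """Step 5: derive a new feature, revenue per employee."""
    return [{**r, "revenue_per_employee": r["revenue"] / r["employees"]} for r in rows]


def fit_line(rows):
    """Step 6: least-squares fit of revenue = slope * employees + intercept."""
    xs = [r["employees"] for r in rows]
    ys = [r["revenue"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


rows = add_features(clean(raw_rows))
summary = explore(rows)
slope, intercept = fit_line(rows)

# Step 7 would normally be a chart or dashboard; printing stands in for it here.
print(summary, slope, intercept)
```

Because the lifecycle is iterative, a question raised by the summary or the fitted line would send you back to step one with a sharper problem statement, while the same functions could keep feeding a recurring report.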