Chapter 6. The Data Value Stack
Building agile data products means staging an environment where reproducible insights occur, are reinforced, and are extended up the value stack. It starts simply with displaying records. It ends with driving actions that create value and capture some of it. Along the way is a voyage of discovery.
This voyage has structure. It is called the data value stack.
Climbing the Stack
The data value stack mirrors Maslow's hierarchy of needs. Higher levels, such as predictions, depend on the levels beneath them, so we can't skip steps. If we do, we will lack the structure needed to build features and value later on.
The data value stack begins with the simple display of records, where the focus is on plumbing our data pipeline all the way through to the user's screen. We then move on to charts, where we extract enough structure from our data to display its properties in aggregate. Next comes identifying relationships and exploring data through interactive reports. This enables statistical inference to generate predictions. Finally, we use these predictions to drive user behavior, creating value and capturing some of it.
In the rest of the book we will climb the data value stack together, using data from your own email inbox.
The Data Value Stack
Records - the processing and display of atomic records through our entire stack.
Charts - extracting properties from records in aggregate to produce charts.
Reports - extracting relationships and trends to enable exploration and interactive charts.
Predictions - using structure to make inferences, predictions and recommendations.
Actions - driving user behavior to create value and capture some of it.
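As a rough illustration of how each level builds on the one below it, here is a minimal Python sketch covering the first four levels. The email records, field names, and function names are all hypothetical, invented for this example. The "prediction" is only a naive frequency count, standing in for real statistical inference.

```python
from collections import Counter

# Hypothetical email records; the field names are illustrative assumptions.
emails = [
    {"from": "alice@example.com", "to": "bob@example.com", "hour": 9},
    {"from": "alice@example.com", "to": "bob@example.com", "hour": 10},
    {"from": "carol@example.com", "to": "alice@example.com", "hour": 9},
]

def show_record(record):
    """Records level: render one atomic record for display."""
    return f"{record['from']} -> {record['to']} at {record['hour']:02d}:00"

def emails_per_hour(records):
    """Charts level: aggregate one property across all records."""
    return Counter(r["hour"] for r in records)

def sender_recipient_pairs(records):
    """Reports level: extract relationships between entities."""
    return Counter((r["from"], r["to"]) for r in records)

def likely_recipient(records, sender):
    """Predictions level: a naive guess built on the relationship counts."""
    pairs = sender_recipient_pairs(records)
    candidates = {to: n for (frm, to), n in pairs.items() if frm == sender}
    return max(candidates, key=candidates.get) if candidates else None
```

Note that likely_recipient cannot be written without sender_recipient_pairs, which in turn assumes records clean enough to display. That dependency is the point of the stack.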
As we climb the stack we extract increasing amounts of derived structure from our data to produce increasingly sophisticated features. Light is the best cleaner of data, and data that is not exposed in features seldom cleans itself. Structure and features are byproducts of one another. Therefore we cannot skip steps in the stack. Doing so undermines our ability to proceed further.
Nor can we specify features in later steps before working through those that precede them. Doing so results in lackluster products specified in the blind and uninformed by reality. We must respect that the data has its own opinion.
We'll be using this structure to build our application around your inbox. Let's get started!