A lot has been written lately on the subject of becoming data-driven, but meaningful, concrete content is hard to find. That’s why this subtitle caught my attention: Using Apache Beam to become data-driven, even before you have big data. But perhaps my expectations were wrong in the first place.
It’s an interesting article and well worth reading. A model such as Apache Beam is useful for both small and big data applications, and you can run it on execution frameworks that match your needs, standalone as well as on a cluster. As such, it helps make your applications scale-proof. The standardized model decouples the application logic from the runner (framework), which should also help make it more future-proof.
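That decoupling can be sketched in plain Python. To be clear, this is a toy illustration of the idea, not the actual Beam API: all function names here are made up. The transformation logic is written once, and interchangeable "runners" decide how it executes.

```python
# Illustrative sketch (hypothetical names, not real Apache Beam code):
# the application logic is defined once, and pluggable runners execute it.

def word_lengths(words):
    """Application logic: knows nothing about how or where it runs."""
    return [len(w) for w in words]

def direct_runner(fn, data):
    """Runs the logic in-process, in one go (think: local execution)."""
    return fn(data)

def chunked_runner(fn, data, chunk_size=2):
    """Simulates a cluster runner by splitting the work into chunks."""
    out = []
    for i in range(0, len(data), chunk_size):
        out.extend(fn(data[i:i + chunk_size]))
    return out

# The same logic produces the same result under either runner.
words = ["small", "data", "works", "too"]
assert direct_runner(word_lengths, words) == chunked_runner(word_lengths, words)
```

Swapping runners without touching `word_lengths` is the scale-proofing the article is after: start small and locally, and move to a cluster later without rewriting the application logic.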
It was disappointing, however, that the term data-driven is never mentioned in the article itself. Perhaps somebody just added a buzzword-laden subtitle to attract more attention to the article? Or perhaps it’s simply unclear what this data-driven hype actually means?
AlphaGo is data-driven. It makes decisions based on data, machine learning and deep learning. That’s a simplification, but in general it’s the automated decision making that makes a system data-driven rather than data-informed. Unless organizations are replacing management with some form of AI, and as long as there are humans involved who actually make the decisions, they’re not data-driven; at best, they’re trying to be as data-informed as possible.
Many of the students I work with in my big data classes don’t actually have big data problems; instead, they’re worried about future-proofing their code. These students are hoping that when they do have big data to process and analyze, they’ll be ready.