Hello everyone,
A few weeks ago I had the chance to attend the Data Summit 2024, where speakers from different organizations spoke about a myriad of topics related to data. There were 10 sessions given by CEOs, Data Managers and Data Engineers. Below are the titles of all the sessions I attended. In the coming weeks I will be posting my notes from each of these sessions & these titles will turn into links that you can use to jump between different session notes -
Beyond the Basics: The last 10 things data teams think about
Building a shared foundation of trust through data storytelling
The dynamic duo of data and machine learning engineering in cybersecurity
The rise of the data generalist: Smaller teams, bigger impact
Navigating the maze of specialization in data engineering
Your Company's Success isn't Measured in megabytes, it's measured in impact!
Let’s start with the 5th session I attended-
Beyond the Basics: The last 10 things data teams think about
The host: Veronika Durgin, VP of Data at Saks
About Veronika: she has 20+ years of experience in the data industry & a background in software engineering.
Here goes the list -
The Forgotten Bucket of Work
Veronika begins by explaining the usual 3 buckets of tasks in a company -
Business projects - These projects create new capabilities & products. This work results in a strategic advantage for the company & hence it's aligned with the company goals.
Unplanned work - As the name suggests, this is work such as bug fixes, crashes & escalations. This kind of work is always urgent & unavoidable.
& finally the forgotten bucket -
This is the work dedicated to enhancements, fixing tech debt, revisiting shortcuts that were taken in the past to deliver value fast, integrating new tools & testing new ideas. It is often considered the unglamorous part of the data world and at times people find it difficult to tie it directly to business value.
In reality this is a bucket which should be paid attention to, because if its elements are ignored they will in fact make us move slower in the future.
As examples, imagine a team that doesn’t follow any preset rules for coding, or a team where the work of one Analyst is never tested by anyone.
Hidden cost of Build vs. Buy
The hidden cost of buying is the total cost of ownership. Often in an organization there is a fight between 2 camps when discussing new solutions - whether to buy a prebuilt product or build it in house from scratch. Veronika shares a table of things to consider when weighing the total cost of ownership for the 2 schools of thought -
Definition of Done
The definition of done given by leadingagile.com goes as follows -
When all conditions or acceptance criteria, that a software product must satisfy are met and are ready to be accepted by the user, customer, team or consuming system.
Veronika adds that every task should have success criteria, such as -
Development based on requirements
Validating data
Establishing SLAs (Service Level Agreements)
Defining monitoring & alerting
Handing over to the next team
Data SLA
In today’s world it is usually considered super important to keep the most up-to-date data available for teams, but on this point Veronika suggested it may be better to consider 2 things - Data Freshness & Data Completeness - & take actions based on them.
If certain data does not need to be updated every day for the final user, we should not waste processing power & money updating it when no one needs it.
Rules should be laid down for wrong data values, such as whether the whole update is rejected or some level of incoherence is tolerated.
These SLAs should be well known to the business teams to avoid unnecessary escalations.
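To make the freshness & completeness idea a bit more concrete, here is a minimal Python sketch of how such a check could look. This is my own illustration, not something shown in the session: the names DataSla and check_sla, and the thresholds used in the example, are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataSla:
    """Hypothetical SLA agreed with the business for one dataset."""
    max_age: timedelta        # freshness: how old the latest load may be
    min_completeness: float   # completeness: minimum share of expected rows


def check_sla(sla, last_loaded_at, rows_loaded, rows_expected, now=None):
    """Return a list of SLA breaches; an empty list means the SLA is met."""
    now = now or datetime.now(timezone.utc)
    breaches = []

    # Freshness: compare the age of the latest load with the agreed limit.
    age = now - last_loaded_at
    if age > sla.max_age:
        breaches.append(f"stale: last load was {age} ago, limit is {sla.max_age}")

    # Completeness: compare loaded rows with what the source says to expect.
    completeness = rows_loaded / rows_expected if rows_expected else 0.0
    if completeness < sla.min_completeness:
        breaches.append(
            f"incomplete: {completeness:.1%} of expected rows, "
            f"minimum is {sla.min_completeness:.1%}"
        )
    return breaches


# Example: a weekly report only needs data refreshed once every 7 days.
weekly_sla = DataSla(max_age=timedelta(days=7), min_completeness=0.98)
```

A dataset that only feeds a weekly report can carry a 7 day freshness limit, so nobody pays for a daily refresh, and an alert is raised only when the returned list is not empty.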
Seasonality
Seasonality is present in most businesses, which is why it’s important that data systems have the capability to scale up or down as required.
If teams decide to do a code freeze during the peak time of the year, they still need to talk about whether ongoing developments can be tested on testing environments.
Because of the low tolerance for bugs during peak season, there needs to be a “fast lane” for fixes.
Dates
Everyone knows about different date formats but very few truly understand the impact they can have. Veronika gives a few tips below for working with dates -
Self-Recovering Pipelines
Traditional ETL pipelines were sequential, meaning if one task failed, everything failed!
Nowadays it has become more common to decouple the EL stage from the transform stage. This respects the fact that we can’t just create one big flow that accommodates everything that can go wrong during processing. Instead we need to build resiliency into data pipelines.
For example - The pipeline should only process changes rather than re-running everything all the time.
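Here is a small Python sketch of the "only process changes" idea, using a watermark (the timestamp of the last change already processed). It is my own illustration under assumptions, not code from the session: the function name run_incremental, the updated_at field and the quarantine list are all hypothetical.

```python
from datetime import datetime


def run_incremental(rows, watermark, transform):
    """Transform only rows changed after `watermark`.

    Returns (done, quarantined, new_watermark). A row that fails is set
    aside for a later retry instead of failing the whole run.
    """
    # Only pick up rows that changed since the previous run.
    changed = [r for r in rows if r["updated_at"] > watermark]

    done, quarantined = [], []
    for row in changed:
        try:
            done.append(transform(row))
        except Exception as exc:
            # Isolate the bad row so the rest of the batch still lands.
            quarantined.append({"row": row, "error": str(exc)})

    # Advance the watermark to the newest change seen in this run.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return done, quarantined, new_watermark


# Example input: one row is already processed, one is malformed.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "amount": "10"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "amount": "n/a"},
    {"id": 3, "updated_at": datetime(2024, 1, 3), "amount": "7.5"},
]


def to_amount(row):
    return {"id": row["id"], "amount": float(row["amount"])}


done, quarantined, watermark = run_incremental(rows, datetime(2024, 1, 1), to_amount)
```

In this run only rows 2 and 3 are picked up, row 2 goes to the quarantine list because its amount can't be parsed, and the next run starts from the new watermark. In a real pipeline the watermark and the quarantined rows would need to be stored somewhere durable, which this sketch leaves out.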
Data Testing
Testing for data teams can’t just be limited to test environments. Veronika advocates having a dataset which closely mimics production, as that is the only way we can really validate the accuracy of data.
Environmental Impact
According to the International Energy Agency (IEA), in 2020 data centers produced around 300 million metric tons of greenhouse gases!
The data center industry accounts for 2.5% to 3.7% of global greenhouse gas emissions, MORE THAN BOTH THE AVIATION & GLOBAL SHIPPING INDUSTRIES!!!
A few calls to action suggested were -
Clean up unused & redundant data
Implement retention policies
Optimize code
Walk a mile in their shoes
As data teams we need to put ourselves in the shoes of business users. Veronika gives the following tips -
The End…
Great session by Veronika. These 10 points are overlooked by most teams & keeping them in mind would put your data team in a league of its own. The best part is that the overall business will be positively impacted by these actions in the long term.
You can connect with Veronika on LinkedIn
Share this post with your network if you liked it,
Raghunandan 🎯
& If it’s your first time here, TheWeekendFreelancer currently has 5 ongoing series - Tools 🛠️, Maths 📈, Domain 🌐, Trends 📻 & My Notes 📝. Have fun reading!