Stephan Buys

Question for the data engineering folks. What would you say is the most important task or activity that data engineers perform on a regular basis? Is that also the thing that takes the most time or causes the most resource constraints?

@stephanbuys my app has a glitch, that was a reply meant for you 😅

@mariah it all comes down to people in the end 😃 I see on your profile that you are in the solar industry. Use mqtt/modbus/opcua at all?
In the 15+ years that I've been in what is now called data engineering, the people part of the process has always been pivotal. (Where's the data, who owns it, where should it be, what should it look like, why doesn't it look the same anymore, what answers do you need, etc., etc. It all comes back to people :-) )

@stephanbuys I’d say the most important is finding ways to communicate connections and concepts, but the thing that takes the longest, IMO, is often formatting the data 😖

@Tiamat precisely! Communication is key. As for "formatting the data", it is one of the things we're actively trying to make easier over at hotrod.app.


@stephanbuys For me it's "how do I select x by y in $lang/$datastructure?".

@qrios well said 😂 I've got some hope for things built around the `arrow` ecosystem.
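As an illustration of the "select x by y" question above, here's a minimal sketch in plain Python (no external dependencies): the same filter expressed against a row-oriented and a column-oriented layout. Columnar formats like Arrow's standardize the second shape, which is part of why a common ecosystem around it is appealing. The data and names here are made up for the example.

```python
# Same question, two layouts: "select name by team == 'data'".

# Row-oriented: a list of records.
rows = [
    {"name": "alice", "team": "data"},
    {"name": "bob", "team": "ml"},
    {"name": "carol", "team": "data"},
]
names_by_team = [r["name"] for r in rows if r["team"] == "data"]

# Column-oriented: one array per column, selection via a boolean mask.
cols = {
    "name": ["alice", "bob", "carol"],
    "team": ["data", "ml", "data"],
}
mask = [t == "data" for t in cols["team"]]
names_columnar = [n for n, keep in zip(cols["name"], mask) if keep]

assert names_by_team == names_columnar == ["alice", "carol"]
```

The columnar version is what mask-based engines (Arrow, pandas, Polars) generalize: build a mask once, then apply it to any column.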

@stephanbuys I'm probably not a proper data engineer. But I do build tools to support a data science team. In practice, this means a lot of time spent moving and transforming data at scale, and designing/managing batch job systems.

Rust is really fantastic for a lot of this stuff, and we've open sourced several tools. But at scale, data munging starts to blend back into (distributed) software engineering. Which is a really fun challenge.

@emk I've always appreciated your work around #RustLang and open source!

We had a similar trajectory. We started out building things in Docker and NodeJS, then moved to building Rust tools. Eventually we could see some patterns emerging, and one of the main problems was "how do we manage all of this?". We've built some of our answers into our app, but there are always more challenges. Management of the "data estate" is a huge problem in its own right.

@stephanbuys Yeah, wrangling data at scale is just endlessly challenging, but in an interesting way.

We don't talk enough about the open source #RustLang stuff we've built. But we should!

Our data mover: dbcrossbar.org/
Our itty-bitty Pachyderm replacement: github.com/faradayio/falconeri
Tiny CSV stuff: github.com/faradayio/csv-tools
Geocoding manager: github.com/faradayio/geocode-c
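To make the "data mover" idea concrete: copying tabular data between formats mostly means reading records from one representation and writing them to another. This is not dbcrossbar's implementation (which is Rust and handles databases, schemas, and scale), just a hedged stdlib-Python sketch of the core move, CSV in, JSON Lines out; the function name and sample data are made up.

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Copy tabular data from CSV to JSON Lines (one JSON object per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

example = "name,team\nalice,data\nbob,ml\n"
print(csv_to_jsonl(example))
```

Real movers add the hard parts this sketch skips: type inference, schema mapping, streaming instead of in-memory strings, and per-backend loaders.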

Rust makes this stuff so easy and so utterly reliable. Many thanks to crate authors!

dbcrossbar Guide (www.dbcrossbar.org): dbcrossbar copies tabular data between PostgreSQL, BigQuery, CSV and many other databases and formats.

@stephanbuys
What many spend the most time on:
- putting out fires and being reactive when data pipelines break.

What drives the most value:
- Steering the company to develop data infrastructure, data assets, and a data model that enables 1) a clear snapshot of truth, 2) quicker iterations for product development, and 3) scalability within the next 2 years.
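One common way to spend less time firefighting broken pipelines is to make transient failures self-healing. A minimal sketch, assuming a pipeline step is just a callable that may raise on transient errors (the step, names, and parameters here are hypothetical):

```python
import time

def run_with_retries(step, max_attempts=3, base_delay=1.0):
    """Run a pipeline step, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to alerting
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky step: fails twice, then succeeds.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient upstream error")
    return "rows loaded"

assert run_with_retries(flaky_extract, base_delay=0.0) == "rows loaded"
```

Production schedulers (Airflow, Dagster, etc.) build this in, but the same retry-with-backoff pattern is what turns a 3 a.m. page into a log line.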

I interviewed a leader on this: tinyurl.com/58ajm5cf

@joereis and Matt Housley's book, Fundamentals of Data Engineering, goes into this!

Scaling DataOps, SDO 001 - What is DataOps? - Christopher Bergh, by On the Mark Data