Solving Two Hard Problems in Computer Science Using Pandas

You only wanted to build a simple pipeline to process incoming production data from power plants, and then you suddenly realized that you’re solving the hardest problems in computer science.

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors. – Leon Bambrick

When working with time series of power plant production data, the fundamental step is to define a good data structure. The shape of the dataframes must balance speed against memory usage so that you always work with the right amount of up-to-date data (cache invalidation). The parameters need clear names so that they can be identified quickly and unambiguously (naming things). The time intervals must match the observed physics to prevent mismatches (off-by-1 errors).
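As a rough illustration of the naming and off-by-1 points, here is a minimal pandas sketch. It is not taken from the talk: the plant name, the column name `plant_a_energy_kwh`, and the readings are made up, and it assumes quarter-hourly energy values aggregated to hours. It shows how the interval convention (timestamp marks the start or the end of the measured interval) changes the hourly sums.

```python
import pandas as pd

# Hypothetical quarter-hourly energy readings for one plant (illustrative values).
# The column name carries the plant, the quantity and the unit.
index = pd.date_range("2024-05-23 00:00", periods=8, freq="15min", tz="UTC")
production = pd.DataFrame(
    {"plant_a_energy_kwh": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=index,
)

# Convention 1: each timestamp marks the START of its interval.
# The bin labelled 00:00 covers the readings at 00:00, 00:15, 00:30, 00:45.
hourly_start = production.resample("1h", closed="left", label="left").sum()

# Convention 2: each timestamp marks the END of its interval.
# The bin labelled 01:00 covers the readings at 00:15, 00:30, 00:45, 01:00,
# and the reading stamped 00:00 falls into the previous hour.
hourly_end = production.resample("1h", closed="right", label="right").sum()

print(hourly_start)
print(hourly_end)
```

Both results add up to the same total energy, but the per-hour values differ, so mixing the two conventions between data sources shifts the production by one interval.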

I’ll show you a few Python tricks and share how I solved these problems in my own projects.

Thursday, May 23

12:35 - 13:05

Miroslav Šedivý