Talk

Solving Two Hard Problems in Computer Science Using Pandas

Thursday, May 23

12:35 - 13:05
Room: Tagliatelle
Language: English
Audience level: Intermediate
Elevator pitch

You only wanted to build a simple pipeline to process incoming production data from power plants, and suddenly you realize that you’re solving the hardest problems in computer science.

Abstract

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors. – Leon Bambrick

When working with time series representing the production data of power plants, the fundamental step is to define a good data structure. The shape of the dataframes must balance speed against memory usage, so that you always work with the right amount of up-to-date data (cache invalidation). The parameters must have good names for fast and unambiguous identification (naming things). The time intervals must match the observed physics to prevent mismatches (off-by-1 errors).
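
As a rough illustration of the off-by-1 point, here is a minimal pandas sketch. It is not taken from the talk: the readings, the column name `energy_kwh`, and the interval conventions are all assumptions made up for the example. It shows how the same quarter-hourly values produce different hourly totals depending on whether a timestamp is taken to mark the start or the end of its interval.

```python
import pandas as pd

# Hypothetical quarter-hourly energy readings for one plant (invented values).
# The unit is part of the name, so the column cannot be mistaken for power in kW.
index = pd.date_range("2024-05-23 00:00", periods=8, freq="15min", tz="UTC")
energy_kwh = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    index=index,
    name="energy_kwh",
)

# Assumption A: a timestamp marks the START of its interval,
# so 00:00 covers [00:00, 00:15) and belongs to the hour starting at 00:00.
hourly_start = energy_kwh.resample("60min", closed="left", label="left").sum()

# Assumption B: a timestamp marks the END of its interval,
# so 00:00 covers (23:45, 00:00] and belongs to the hour ending at 00:00.
hourly_end = energy_kwh.resample("60min", closed="right", label="right").sum()

print(hourly_start)
print(hourly_end)
```

Under assumption A the eight readings fall into two hourly bins; under assumption B the first reading slips into the previous hour and three bins appear. Which convention is right depends on how the meter reports its data, so it is worth stating explicitly next to the data rather than leaving it implied.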

I’ll show you a few Python tricks and share how I solved these problems in my projects.

Tags: Pandas, Science, Abstractions, Data Structures
Participant

Miroslav Šedivý

Using Python to make the sun shine and the wind blow. Greedy polyglot, sustainable urbanist, unicode collector, PSF Fellow.