Johannes Weytjens

Blog

A collection of posts about economics, Python and machine learning.


Resample unbalanced (panel) datasets in Pandas fast

By default, pandas treats consecutive observations in a panel dataset as consecutive dates, because methods such as ``.diff()`` and ``.shift()`` operate on row position rather than on the date itself. That assumption fails for unbalanced panel datasets, where units don't need to appear in every period, so ``.diff()`` or ``.shift()`` can silently compare observations that are several periods apart. One solution is to resample the data so that the missing observations become explicit rows. This post provides a fast resampling method that also supports periods that aren't a fixed length of time, such as months.

Aug 6, 2024
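To make the problem concrete, here is a minimal sketch on a toy panel. The column names ``unit``, ``month`` and ``value`` and the helper ``fill_gaps`` are made up for illustration. The helper is a plain per-unit ``reindex`` loop, not the fast method the post describes; it only shows what "resampling the missing observations" means when the period is a month.

```python
import pandas as pd

# Toy unbalanced panel: unit A has no observation for 2024-03.
df = pd.DataFrame({
    "unit": ["A", "A", "A", "B", "B"],
    "month": pd.PeriodIndex(
        ["2024-01", "2024-02", "2024-04", "2024-01", "2024-02"], freq="M"
    ),
    "value": [1.0, 2.0, 4.0, 10.0, 11.0],
})

# Naive diff works on row position: for unit A it subtracts the February
# value from the April value as if they were consecutive months.
naive = df.groupby("unit")["value"].diff()


def fill_gaps(panel: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: insert a NaN row for every missing month per unit."""
    pieces = []
    for unit, group in panel.groupby("unit"):
        # All months between the unit's first and last observation.
        full = pd.period_range(
            group["month"].min(), group["month"].max(), freq="M", name="month"
        )
        filled = (
            group.set_index("month")
            .reindex(full)          # missing months become NaN rows
            .assign(unit=unit)      # restore the unit label on the new rows
            .rename_axis("month")
            .reset_index()
        )
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)


balanced = fill_gaps(df)

# After resampling, a diff across the gap is NaN instead of a misleading number.
fixed = balanced.groupby("unit")["value"].diff()
```

In this sketch ``naive`` reports a change of 2.0 for unit A in April, while ``fixed`` reports a missing value there, because the March observation it would need does not exist. The loop over groups is easy to read but will be slow on large panels, which is the motivation for a faster approach.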