Does PyRasgo handle imbalanced binary classification problems or should I rebalance classes before using it?
In most cases, PyRasgo handles imbalanced binary classification without requiring you to rebalance the classes first. PyRasgo does not do any rebalancing itself. Instead, it builds its models using metrics (AUC, log loss, etc.) that are robust to class imbalance. In my testing of CatBoost with these metrics, models built on imbalanced data with as little as 1% of rows in the positive class matched or outperformed models built on rebalanced data.
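To illustrate why a ranking metric such as AUC holds up under imbalance, here is a minimal sketch in plain Python. The `auc` helper and the toy scores are made up for illustration and are not part of PyRasgo. It only demonstrates the property for AUC, computed as the fraction of positive/negative pairs that are ranked correctly.

```python
from itertools import product

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This is the pairwise definition of ROC AUC.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for a tiny data set.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]

# Roughly balanced: 3 positives vs. 4 negatives.
balanced_auc = auc(pos, neg)

# Heavily imbalanced: the same negatives repeated 100 times,
# so positives are under 1% of all rows.
imbalanced_auc = auc(pos, neg * 100)

print(balanced_auc, imbalanced_auc)
```

Both calls print the same value, because repeating the negatives changes the class ratio but not how often a positive outranks a negative.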
If one of the classes makes up less than 1% of your data, you should try several rebalancing ratios to find the one that gives the best performance. You will need to do this rebalancing outside of PyRasgo, before calling its methods.
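As a rough sketch of what rebalancing outside of PyRasgo could look like, the snippet below downsamples the majority class with pandas to several candidate ratios. The function name `downsample_majority`, the column names, and the candidate fractions are all assumptions for illustration. It also assumes the rare class is labeled 1 and the common class 0. No PyRasgo calls are shown. You would pass each resulting DataFrame to whichever PyRasgo method you are using.

```python
import pandas as pd

def downsample_majority(df, target, positive_fraction, random_state=0):
    """Drop majority rows (target == 0) so positives are about positive_fraction of the result.

    Hypothetical helper: assumes a binary target where 1 is the minority class.
    """
    if not 0 < positive_fraction < 1:
        raise ValueError("positive_fraction must be between 0 and 1")
    pos = df[df[target] == 1]
    neg = df[df[target] == 0]
    # Number of negatives needed to reach the requested positive share.
    n_neg = round(len(pos) * (1 - positive_fraction) / positive_fraction)
    n_neg = min(n_neg, len(neg))  # cannot keep more negatives than exist
    neg_sample = neg.sample(n=n_neg, random_state=random_state)
    # Shuffle so the classes are not grouped together in the output.
    combined = pd.concat([pos, neg_sample])
    return combined.sample(frac=1, random_state=random_state).reset_index(drop=True)

# Toy data: 50 positives out of 10,000 rows (0.5% positive).
df = pd.DataFrame({
    "feature": range(10_000),
    "target": [1] * 50 + [0] * 9_950,
})

# Build one rebalanced copy per candidate ratio, then compare model performance on each.
candidates = {
    frac: downsample_majority(df, "target", frac)
    for frac in (0.01, 0.05, 0.10)
}
for frac, rebalanced in candidates.items():
    print(frac, len(rebalanced), rebalanced["target"].mean())
```

When comparing the candidates, score each model on held-out data that keeps the original class distribution, so the results reflect how the model will behave on real data.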