HydUtils is a Python utility library for handling and validating data, particularly time series and hydrological datasets. It provides functions for validation, filtering, and error metrics, which simplify the analysis of hydrological and weather-related data.
pip install hydutils
The validate_columns_for_nulls function checks for columns that contain null values and raises an error if any are found.
from hydutils.df_helper import validate_columns_for_nulls
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, None], "c": [7, 8, 9]})
# Check every column for null values (column "b" contains one here)
validate_columns_for_nulls(df)
# Specify columns to check
validate_columns_for_nulls(df, columns=["b"])
# Request a column that does not exist in the DataFrame
validate_columns_for_nulls(df, columns=["d"])  # Raises an error because column "d" is missing
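To make the behaviour described above concrete, the same kind of check can be approximated with plain pandas. This is only a sketch, not HydUtils' implementation: the helper name find_null_columns is hypothetical, and raising ValueError for a missing column is an assumption.

```python
import pandas as pd


def find_null_columns(df, columns=None):
    """Return the checked columns that contain at least one null value.

    Hypothetical helper for illustration; not part of HydUtils.
    """
    cols = list(df.columns) if columns is None else list(columns)
    # Assumption: a requested column that does not exist is reported as an error.
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    return [c for c in cols if df[c].isnull().any()]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, None], "c": [7, 8, 9]})
null_columns = find_null_columns(df)
```

A validating function would typically raise when the returned list is not empty.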
The validate_interval function checks that the time intervals between rows in the time series are consistent.
from hydutils.df_helper import validate_interval
import pandas as pd
df = pd.DataFrame({
"time": pd.date_range(start="2023-01-01", periods=5, freq="h")
})
# Check if the time intervals are consistent
validate_interval(df, interval=1)
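A consistency check of this kind can be sketched with pandas by comparing the differences between consecutive timestamps. The helper name has_constant_interval is hypothetical, and treating the interval as a number of hours is an assumption based on the hourly example above.

```python
import pandas as pd


def has_constant_interval(df, interval_hours, time_col="time"):
    """Return True if every consecutive pair of timestamps is interval_hours apart.

    Hypothetical helper for illustration; not part of HydUtils.
    """
    # diff() leaves NaT in the first row, so drop it before comparing.
    deltas = df[time_col].diff().dropna()
    return bool((deltas == pd.Timedelta(hours=interval_hours)).all())


df = pd.DataFrame({"time": pd.date_range(start="2023-01-01", periods=5, freq="h")})
regular = has_constant_interval(df, 1)
# Removing one row creates a two-hour gap, so the check fails.
gappy = has_constant_interval(df.drop(index=2), 1)
```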
The filter_timeseries function filters a time series DataFrame by a start date, an end date, or both.
from hydutils.df_helper import filter_timeseries
import pandas as pd
from datetime import datetime
df = pd.DataFrame({
"time": pd.date_range(start="2023-01-01", periods=5, freq="h")
})
# Filter data between a start and end date
start = datetime(2023, 1, 1, 1)
end = datetime(2023, 1, 1, 3)
filtered_data = filter_timeseries(df, start=start, end=end)
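For comparison, date filtering of this kind can be written in plain pandas with a boolean mask. The helper name filter_by_time is hypothetical, and treating both bounds as inclusive is an assumption; HydUtils may handle the boundaries differently.

```python
from datetime import datetime

import pandas as pd


def filter_by_time(df, start=None, end=None, time_col="time"):
    """Keep rows whose timestamp lies within [start, end].

    Hypothetical helper for illustration; not part of HydUtils.
    """
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= df[time_col] >= start
    if end is not None:
        mask &= df[time_col] <= end
    return df[mask]


df = pd.DataFrame({"time": pd.date_range(start="2023-01-01", periods=5, freq="h")})
subset = filter_by_time(df, start=datetime(2023, 1, 1, 1), end=datetime(2023, 1, 1, 3))
```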
The hydutils.statistical_metrics module provides commonly used metrics for evaluating model performance: MSE, RMSE, NSE, R², PBIAS, and FBIAS.
The mse function calculates the Mean Squared Error between two arrays.
from hydutils.statistical_metrics import mse
import numpy as np
simulated = np.array([3.0, 4.0, 5.0])
observed = np.array([2.9, 4.1, 5.0])
mse_value = mse(simulated, observed)
The rmse function calculates the Root Mean Squared Error.
from hydutils.statistical_metrics import rmse
rmse_value = rmse(simulated, observed)
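As a reference for what these two metrics compute, the standard definitions can be written directly in NumPy. The names mse_sketch and rmse_sketch are hypothetical; this shows the textbook formulas, not necessarily how HydUtils implements them.

```python
import numpy as np


def mse_sketch(simulated, observed):
    """Mean of the squared differences between simulated and observed values."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((simulated - observed) ** 2))


def rmse_sketch(simulated, observed):
    """Square root of the mean squared error, in the units of the data."""
    return float(np.sqrt(mse_sketch(simulated, observed)))


simulated = np.array([3.0, 4.0, 5.0])
observed = np.array([2.9, 4.1, 5.0])
mse_val = mse_sketch(simulated, observed)
rmse_val = rmse_sketch(simulated, observed)
```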
The nse function calculates the Nash-Sutcliffe Efficiency coefficient.
from hydutils.statistical_metrics import nse
nse_value = nse(simulated, observed)
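The Nash-Sutcliffe Efficiency is usually defined as one minus the ratio of the squared model error to the variance of the observations around their mean. A value of 1 indicates a perfect match, and 0 means the model is no better than predicting the observed mean. The function name nse_sketch is hypothetical; the code below follows that standard definition rather than the library's source.

```python
import numpy as np


def nse_sketch(simulated, observed):
    """Nash-Sutcliffe Efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / spread)


simulated = np.array([3.0, 4.0, 5.0])
observed = np.array([2.9, 4.1, 5.0])
nse_val = nse_sketch(simulated, observed)
```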
The r2 function calculates the coefficient of determination, R².
from hydutils.statistical_metrics import r2
r2_value = r2(simulated, observed)
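R² has more than one definition in common use. One that is widespread in hydrology is the squared Pearson correlation between simulated and observed values, sketched below. Whether HydUtils uses this definition is an assumption, and the name r2_sketch is hypothetical.

```python
import numpy as np


def r2_sketch(simulated, observed):
    """Squared Pearson correlation between simulated and observed values."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # corrcoef returns the 2x2 correlation matrix; [0, 1] is the cross term.
    r = np.corrcoef(simulated, observed)[0, 1]
    return float(r ** 2)


simulated = np.array([3.0, 4.0, 5.0])
observed = np.array([2.9, 4.1, 5.0])
r2_val = r2_sketch(simulated, observed)
```

Under this definition R² measures only linear association, so a model with a constant offset or scaling error can still score close to 1.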
The pbias function calculates the Percentage Bias between observed and simulated values.
from hydutils.statistical_metrics import pbias
pbias_value = pbias(observed, simulated)
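Percentage bias expresses the total deviation of the simulation as a percentage of the total observed volume. Sign conventions differ between sources. The sketch below assumes the convention 100 * sum(observed - simulated) / sum(observed), under which a negative value means the model overestimates; HydUtils may use the opposite sign. The name pbias_sketch and the sample values are illustrative only.

```python
import numpy as np


def pbias_sketch(observed, simulated):
    """Percentage bias: 100 * sum(obs - sim) / sum(obs) (assumed sign convention)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(100.0 * np.sum(observed - simulated) / np.sum(observed))


# Illustrative values: the simulation is 0.5 too high at every step.
observed = np.array([2.0, 4.0, 6.0])
simulated = np.array([2.5, 4.5, 6.5])
pbias_val = pbias_sketch(observed, simulated)
```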
The fbias function calculates the Fractional Bias between observed and simulated values.
from hydutils.statistical_metrics import fbias
fbias_value = fbias(observed, simulated)
This library is released under the MIT License.