
kolena.dataset

upload_dataset(name, df, *, id_fields=None)

Create or update a dataset with the contents of the provided DataFrame df.

Updating id_fields

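ID fields are used to associate model results (uploaded via upload_results) with datapoints in this dataset. When updating an existing dataset, change id_fields with caution: because results are matched to datapoints through these fields, changing them can break that association for previously uploaded results.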
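To make the role of ID fields concrete, the sketch below uses plain pandas to show how results can be linked to datapoints through a shared ID column. The column names (image_id, label, prediction) are hypothetical, and this is an illustration of the idea, not Kolena's internal matching logic.

```python
import pandas as pd

# Hypothetical dataset: one row per datapoint, keyed by the ID field "image_id".
df_dataset = pd.DataFrame(
    {"image_id": ["a.jpg", "b.jpg", "c.jpg"], "label": ["cat", "dog", "cat"]}
)

# Hypothetical model results: rows may arrive in any order, but each one
# carries the same ID field so it can be linked back to its datapoint.
df_results = pd.DataFrame(
    {"image_id": ["c.jpg", "a.jpg", "b.jpg"], "prediction": ["cat", "dog", "dog"]}
)

# Join on the ID field. A left merge keeps the dataset's row order, and
# validate="one_to_one" raises if an ID is duplicated on either side.
df_linked = df_dataset.merge(
    df_results, on="image_id", how="left", validate="one_to_one"
)
print(df_linked)
```

If the dataset's ID fields were later changed to different columns, results keyed by the old columns would no longer have a matching key to join on, which illustrates why id_fields should be updated carefully.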

Parameters:

name (str, required): The name of the dataset.

df (Union[DataFrame, Iterator[DataFrame]], required): A DataFrame or an iterator of DataFrames. Provide an iterator to perform a batch upload, for example csv_reader = pd.read_csv("PathToDataset.csv", chunksize=10).

id_fields (Optional[List[str]], default None): Optionally specify a list of ID fields that will be used to link model results with the datapoints in the dataset. When unspecified, a suitable value is inferred from the columns of the provided df. Note that id_fields must be hashable.
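A batch upload takes an iterator of DataFrames, such as the one pd.read_csv returns when chunksize is set. The sketch below reads a small in-memory CSV (standing in for "PathToDataset.csv") in chunks of two rows and checks each chunk's ID values before it would be uploaded. It assumes that "id_fields must be hashable" refers to the values in those columns; check_id_values_hashable and validated_chunks are hypothetical helpers, and the upload_dataset call is commented out because it needs kolena installed and configured.

```python
import io
from typing import Iterator, List

import pandas as pd

# In-memory stand-in for "PathToDataset.csv"; read_csv accepts file-like objects.
csv_text = "image_id,label\na.jpg,cat\nb.jpg,dog\nc.jpg,cat\nd.jpg,bird\ne.jpg,dog\n"


def check_id_values_hashable(df: pd.DataFrame, id_fields: List[str]) -> None:
    """Hypothetical helper: raise TypeError if any ID value cannot be hashed."""
    for field in id_fields:
        for value in df[field]:
            hash(value)  # raises TypeError for unhashable values such as lists


def validated_chunks(
    text: str, id_fields: List[str], chunksize: int
) -> Iterator[pd.DataFrame]:
    # With chunksize set, read_csv returns an iterator of DataFrames
    # (a TextFileReader) instead of a single DataFrame.
    with pd.read_csv(io.StringIO(text), chunksize=chunksize) as reader:
        for chunk in reader:
            check_id_values_hashable(chunk, id_fields)
            yield chunk


chunks = list(validated_chunks(csv_text, ["image_id"], chunksize=2))
print([len(chunk) for chunk in chunks])  # → [2, 2, 1]

# With kolena installed and configured, the iterator could be passed directly:
# upload_dataset("my-dataset", validated_chunks(csv_text, ["image_id"], 2), id_fields=["image_id"])
```

Passing the generator itself, rather than a list, keeps only one chunk in memory at a time, which is the point of a batch upload.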

download_dataset(name, *, commit=None)

Download an entire dataset given its name.

Parameters:

name (str, required): The name of the dataset.

commit (Optional[str], default None): The commit hash of the dataset version to download. When None, the latest commit is used.

Returns:

DataFrame: A DataFrame containing the specified dataset.

EvalConfig = Optional[Dict[str, Any]] (module attribute)

User-defined configuration for evaluating results, for example {"threshold": 7}.

DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]] (module attribute)

A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.
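Because a value of this type may be either a single DataFrame or a stream of them, code that consumes it usually normalizes it first. The sketch below re-declares both aliases locally and uses a hypothetical helper, iter_frames, to treat a single DataFrame as a one-element stream; it is an illustration, not code taken from kolena.

```python
from typing import Any, Dict, Iterator, Optional, Union

import pandas as pd

# Local re-declarations of the two aliases documented in kolena.dataset.
EvalConfig = Optional[Dict[str, Any]]
DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]


def iter_frames(df: DataFrame) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield every pandas DataFrame contained in `df`."""
    # A pandas DataFrame is itself iterable (over its column labels),
    # so it must be detected before falling back to iteration.
    if isinstance(df, pd.DataFrame):
        yield df
    else:
        yield from df


single = pd.DataFrame({"x": [1, 2, 3]})
batches = iter([pd.DataFrame({"x": [1]}), pd.DataFrame({"x": [2, 3]})])

print(sum(len(frame) for frame in iter_frames(single)))   # → 3
print(sum(len(frame) for frame in iter_frames(batches)))  # → 3

# The example configuration given for EvalConfig.
config: EvalConfig = {"threshold": 7}
```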

download_results(dataset, model)

Download the results of a model on a dataset, given the dataset name and the model name.

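To combine the datapoints with each set of results:

import pandas as pd
from kolena.dataset import download_results

df_dp, results = download_results("dataset name", "model name")
for eval_config, df_result in results:
    df_combined = pd.concat([df_dp, df_result], axis=1)  # column-wise, aligned on the row index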

Parameters:

dataset (str, required): The name of the dataset.

model (str, required): The name of the model.

Returns:

Tuple[DataFrame, List[Tuple[EvalConfig, DataFrame]]]: A tuple of (1) the DataFrame of datapoints and (2) a list of tuples, each containing an evaluation configuration and the corresponding DataFrame of results.
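When several evaluation configurations exist, it can help to pick out the results for one of them. The sketch below builds a mock of the returned tuple (the columns and threshold values are made up, and no kolena call is made) and uses a hypothetical helper, results_for, to look up a configuration by equality before combining it with the datapoints.

```python
from typing import Any, Dict, List, Optional, Tuple

import pandas as pd

EvalConfig = Optional[Dict[str, Any]]

# Mock of the (datapoints, results) structure returned by download_results.
df_dp = pd.DataFrame({"image_id": ["a.jpg", "b.jpg"], "label": ["cat", "dog"]})
results: List[Tuple[EvalConfig, pd.DataFrame]] = [
    ({"threshold": 0.3}, pd.DataFrame({"is_correct": [True, True]})),
    ({"threshold": 0.7}, pd.DataFrame({"is_correct": [True, False]})),
]


def results_for(
    all_results: List[Tuple[EvalConfig, pd.DataFrame]], config: EvalConfig
) -> pd.DataFrame:
    """Hypothetical helper: return the results DataFrame stored under `config`."""
    for eval_config, df_result in all_results:
        if eval_config == config:
            return df_result
    raise KeyError(f"no results for eval config {config!r}")


# Column-wise concat assumes datapoints and results share the same row index.
df_combined = pd.concat([df_dp, results_for(results, {"threshold": 0.7})], axis=1)
print(df_combined["is_correct"].mean())  # → 0.5
```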

upload_results(dataset, model, results, thresholded_fields=None)

Upload the results of a specified model on a given dataset.

Parameters:

dataset (str, required): The name of the dataset.

model (str, required): The name of the model.

results (Union[DataFrame, List[Tuple[EvalConfig, DataFrame]]], required): Either a DataFrame or a list of tuples, where each tuple consists of an evaluation configuration and a DataFrame.

thresholded_fields (Optional[List[str]], default None): Columns in the results DataFrame that contain data associated with different thresholds.

Returns:

None.
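The list form of results pairs each EvalConfig with the DataFrame computed under that configuration. The sketch below builds such a list from a hypothetical score column by applying two thresholds. results_by_threshold, the column names, and the threshold values are assumptions for illustration; the sketch keeps the image_id column in each result on the assumption that it is the dataset's ID field, and the upload_results call is commented out because it requires kolena and an API connection.

```python
from typing import Any, Dict, List, Optional, Tuple

import pandas as pd

EvalConfig = Optional[Dict[str, Any]]

# Hypothetical raw model output: one confidence score per datapoint,
# keyed by the ID field used when the dataset was uploaded.
df_scores = pd.DataFrame(
    {"image_id": ["a.jpg", "b.jpg", "c.jpg"], "score": [0.9, 0.4, 0.6]}
)


def results_by_threshold(
    df: pd.DataFrame, thresholds: List[float]
) -> List[Tuple[EvalConfig, pd.DataFrame]]:
    """Hypothetical helper: build one (EvalConfig, DataFrame) pair per threshold."""
    results = []
    for threshold in thresholds:
        # assign returns a new DataFrame, so df itself is left unchanged.
        df_result = df.assign(is_positive=df["score"] >= threshold)
        results.append(({"threshold": threshold}, df_result))
    return results


results = results_by_threshold(df_scores, [0.5, 0.8])
for eval_config, df_result in results:
    print(eval_config, int(df_result["is_positive"].sum()))
# {'threshold': 0.5} 2
# {'threshold': 0.8} 1

# With kolena installed and configured (dataset and model names are placeholders):
# upload_results("my-dataset", "my-model", results)
```

Passing a single DataFrame instead of this list is the simpler choice when there is only one way to evaluate the model.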