kolena.dataset
Examples: kolena/examples/dataset
upload_dataset(name, df, *, id_fields=None)
Create or update a dataset with the contents of the provided DataFrame df.
Updating id_fields
ID fields are used to associate model results (uploaded via upload_results) with datapoints in this dataset. When updating an existing dataset, update id_fields with caution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
df | Union[DataFrame, Iterator[DataFrame]] | A DataFrame or an iterator of DataFrames. Provide an iterator to perform a batch upload. | required |
id_fields | Optional[List[str]] | Optional list of ID fields used to link model results with the datapoints in the dataset. When unspecified, a suitable value is inferred from the columns of the provided df. | None |
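As a rough illustration of the two accepted forms of df, the sketch below builds a small DataFrame and splits it into an iterator of chunks suitable for a batch upload. The column names (image_id, label), the helper names iter_chunks and upload_in_batches, and the chunk size are hypothetical and not part of kolena. The upload_dataset call is wrapped in a function so that it only runs where the kolena package is installed and configured.

```python
from typing import Iterator

import pandas as pd


def iter_chunks(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of df with at most chunk_size rows each."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


# Hypothetical datapoints; "image_id" is assumed to identify each row uniquely.
datapoints = pd.DataFrame(
    {
        "image_id": [1, 2, 3, 4, 5],
        "label": ["cat", "dog", "cat", "bird", "dog"],
    }
)


def upload_in_batches(name: str, df: pd.DataFrame, chunk_size: int = 2) -> None:
    """Upload df as an iterator of chunks, naming the ID field explicitly."""
    # Imported lazily: requires an installed and configured kolena client.
    from kolena.dataset import upload_dataset

    upload_dataset(name, iter_chunks(df, chunk_size), id_fields=["image_id"])
```

Passing id_fields explicitly, rather than relying on inference, keeps the link between datapoints and later results visible at the call site.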
download_dataset(name, *, commit=None)
Download an entire dataset given its name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
commit | Optional[str] | The commit hash of the dataset version to download. When None, the latest commit is downloaded. | None |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the specified dataset. |
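One possible use of the commit parameter is comparing a pinned version of a dataset against the latest one. The sketch below is only an illustration: the helpers new_ids and compare_versions, and the idea of diffing on a single ID column, are assumptions rather than part of kolena, and the commit hash must be supplied by the caller.

```python
import pandas as pd


def new_ids(previous: pd.DataFrame, latest: pd.DataFrame, id_field: str) -> list:
    """Return the sorted ID values present in latest but absent from previous."""
    return sorted(set(latest[id_field]) - set(previous[id_field]))


def compare_versions(name: str, commit: str, id_field: str) -> list:
    """Download a pinned commit and the latest version, then list newly added IDs."""
    # Imported lazily: requires an installed and configured kolena client.
    from kolena.dataset import download_dataset

    previous = download_dataset(name, commit=commit)
    latest = download_dataset(name)  # commit=None fetches the latest commit
    return new_ids(previous, latest, id_field)
```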
EvalConfig = Optional[Dict[str, Any]]
module-attribute
User-defined configuration for evaluating results, for example {"threshold": 7}.
DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]
module-attribute
A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.
download_results(dataset, model)
Download the results for a given dataset name and model name.
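Concatenate the datapoints with the results for each evaluation configuration:

    import pandas as pd
    from kolena.dataset import download_results

    df_dp, results = download_results("dataset name", "model name")
    for eval_config, df_result in results:
        df_combined = pd.concat([df_dp, df_result], axis=1)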
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | The name of the dataset. | required |
model | str | The name of the model. | required |
Returns:
Type | Description |
---|---|
Tuple[DataFrame, List[Tuple[EvalConfig, DataFrame]]] | A tuple of the datapoints DataFrame and a list of tuples, each containing an evaluation configuration and the corresponding results DataFrame. |
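To keep one combined DataFrame per evaluation configuration instead of rebinding a single variable inside a loop, the returned structure can be mapped into a list. The sketch below uses stand-in values shaped like the documented return type; the helper combine_per_config and the column names are hypothetical, and it assumes, as the column-wise concat in the example for this function does, that results rows align with datapoint rows by index.

```python
from typing import Any, Dict, List, Optional, Tuple

import pandas as pd

# Mirrors the EvalConfig alias documented in this module.
EvalConfig = Optional[Dict[str, Any]]


def combine_per_config(
    df_dp: pd.DataFrame,
    results: List[Tuple[EvalConfig, pd.DataFrame]],
) -> List[Tuple[EvalConfig, pd.DataFrame]]:
    """Pair each eval config with datapoints and results joined column-wise."""
    return [
        (eval_config, pd.concat([df_dp, df_result], axis=1))
        for eval_config, df_result in results
    ]


# Stand-in values shaped like the return value of download_results.
df_dp = pd.DataFrame({"image_id": [1, 2]})
results = [
    ({"threshold": 0.5}, pd.DataFrame({"score": [0.9, 0.4]})),
    ({"threshold": 0.7}, pd.DataFrame({"score": [0.8, 0.3]})),
]
combined = combine_per_config(df_dp, results)
```

A list of tuples is used rather than a dict because an EvalConfig is itself a dict (or None) and so cannot serve as a dict key.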
upload_results(dataset, model, results, thresholded_fields=None)
Upload the results of a specified model on a given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | The name of the dataset. | required |
model | str | The name of the model. | required |
results | Union[DataFrame, List[Tuple[EvalConfig, DataFrame]]] | Either a DataFrame or a list of tuples, where each tuple consists of an evaluation configuration and a DataFrame. | required |
thresholded_fields | Optional[List[str]] | Columns in the results DataFrame that contain data associated with different thresholds. | None |
Returns:
Type | Description |
---|---|
None | None |
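The list-of-tuples form of results can be assembled by evaluating the same raw outputs under several configurations. The following is a minimal sketch under stated assumptions: the columns image_id, score, and is_positive, the thresholds, and the helpers results_for_thresholds and upload are all hypothetical, and image_id is assumed to be the dataset's ID field. The upload_results call is wrapped in a function so that it only runs where kolena is installed and configured.

```python
from typing import Any, Dict, List, Tuple

import pandas as pd

# Hypothetical raw model scores keyed by the dataset's assumed ID field.
scores = pd.DataFrame({"image_id": [1, 2, 3], "score": [0.9, 0.55, 0.2]})


def results_for_thresholds(
    df: pd.DataFrame, thresholds: List[float]
) -> List[Tuple[Dict[str, Any], pd.DataFrame]]:
    """Build one (eval_config, DataFrame) tuple per threshold."""
    results = []
    for threshold in thresholds:
        df_result = df.copy()  # leave the raw scores untouched
        df_result["is_positive"] = df_result["score"] >= threshold
        results.append(({"threshold": threshold}, df_result))
    return results


results = results_for_thresholds(scores, [0.5, 0.7])


def upload(dataset: str, model: str) -> None:
    """Upload the prepared results for the given dataset and model names."""
    # Imported lazily: requires an installed and configured kolena client.
    from kolena.dataset import upload_results

    upload_results(dataset, model, results)
```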