`kolena.dataset`

Examples: kolena/examples/dataset

`upload_dataset(name, df, *, id_fields=None)`

Create or update a dataset with the contents of the provided DataFrame `df`.

Updating `id_fields`

ID fields are used to associate model results (uploaded via `upload_results`) with the datapoints in this dataset. Change `id_fields` on an existing dataset with caution: results uploaded under the previous ID fields may no longer be associated with the intended datapoints.
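ID fields behave like a join key between results and datapoints. As an illustration only (this is not Kolena's internal implementation), the sketch below links hypothetical results to hypothetical datapoints with a pandas merge on an assumed ID field named `locator`. `validate="one_to_one"` makes the merge fail if an ID value repeats on either side.

```python
import pandas as pd

# Hypothetical datapoints; assumption: "locator" is the ID field.
datapoints = pd.DataFrame(
    {
        "locator": ["s3://bucket/img0.png", "s3://bucket/img1.png", "s3://bucket/img2.png"],
        "label": ["cat", "dog", "cat"],
    }
)

# Hypothetical model results carrying the same ID field, in a different row order.
results = pd.DataFrame(
    {
        "locator": ["s3://bucket/img2.png", "s3://bucket/img0.png", "s3://bucket/img1.png"],
        "score": [0.91, 0.12, 0.55],
    }
)

ID_FIELDS = ["locator"]

# A left merge keeps the datapoint order; validate raises if an ID is not unique.
linked = datapoints.merge(results, on=ID_FIELDS, how="left", validate="one_to_one")
print(linked)
```

Matching on ID values rather than on row position is what lets results arrive in any order, and it is also why changing the ID fields affects which results line up with which datapoints.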

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the dataset.	required
`df`	`Union[DataFrame, Iterator[DataFrame]]`	A DataFrame or iterator of DataFrames. Provide an iterator to perform batch upload (example: `csv_reader = pd.read_csv("PathToDataset.csv", chunksize=10)`).	required
`id_fields`	`Optional[List[str]]`	A list of ID fields used to link model results with the datapoints in the dataset. When unspecified, a suitable value is inferred from the columns of the provided `df`. The values in the `id_fields` columns must be hashable.	`None`
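Checking the ID fields locally before calling `upload_dataset` can catch problems early. The sketch below uses a hypothetical helper, `check_id_fields`, and hypothetical CSV contents standing in for `PathToDataset.csv`. It reads the data in chunks, as in the batch-upload example, and checks that the ID columns exist and hold hashable values. The uniqueness check is an assumption, not a documented requirement.

```python
import io
from typing import List

import pandas as pd


def check_id_fields(df: pd.DataFrame, id_fields: List[str]) -> None:
    """Hypothetical local check to run before calling upload_dataset."""
    missing = [field for field in id_fields if field not in df.columns]
    if missing:
        raise ValueError(f"missing ID fields: {missing}")
    for field in id_fields:
        for value in df[field]:
            hash(value)  # raises TypeError for unhashable values such as lists or dicts
    # Assumption: each datapoint should be uniquely identified by its ID fields.
    if df.duplicated(subset=id_fields).any():
        raise ValueError(f"duplicate values in ID fields {id_fields}")


# Hypothetical CSV contents standing in for "PathToDataset.csv".
csv_text = "locator,label\nimg0.png,cat\nimg1.png,dog\nimg2.png,cat\n"

# With chunksize, read_csv yields DataFrames (the batch-upload form of `df`).
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=2))

# Check across all chunks, since duplicate IDs can span chunk boundaries.
check_id_fields(pd.concat(chunks, ignore_index=True), ["locator"])
print([len(chunk) for chunk in chunks])  # [2, 1]
```

With `kolena` installed and configured, the checked batches could then be uploaded with `upload_dataset("my-dataset", iter(chunks), id_fields=["locator"])`, where `my-dataset` is a placeholder name.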

`download_dataset(name, *, commit=None)`

Download an entire dataset given its name.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the dataset.	required
`commit`	`Optional[str]`	The commit hash identifying the version of the dataset to download. When `None`, the latest commit is used.	`None`

Returns:

Type	Description
`DataFrame`	A DataFrame containing the specified dataset.
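Pinning `commit` makes a download reproducible. One possible use is comparing a pinned version with the latest one. In this sketch, the two DataFrames are hypothetical stand-ins for `download_dataset("my-dataset", commit=...)` and `download_dataset("my-dataset")`, and `locator` is an assumed ID column.

```python
import pandas as pd

# Hypothetical stand-ins for a pinned download and a latest download.
df_pinned = pd.DataFrame({"locator": ["img0.png", "img1.png"], "label": ["cat", "dog"]})
df_latest = pd.DataFrame({"locator": ["img1.png", "img2.png"], "label": ["dog", "cat"]})

# An outer merge with indicator=True tags each ID as left_only, right_only, or both.
diff = df_pinned[["locator"]].merge(
    df_latest[["locator"]], on="locator", how="outer", indicator=True
)
removed = sorted(diff.loc[diff["_merge"] == "left_only", "locator"])
added = sorted(diff.loc[diff["_merge"] == "right_only", "locator"])
print("removed:", removed)  # removed: ['img0.png']
print("added:", added)  # added: ['img2.png']
```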

`EvalConfig = Optional[Dict[str, Any]]` `module-attribute`

User-defined configuration for evaluating results, for example, `{"threshold": 7}`.
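Because `EvalConfig` is an optional dictionary, a threshold sweep can be expressed as a list of such dictionaries. In this sketch, `describe` is a hypothetical helper, and treating `None` as a default configuration is an assumption.

```python
from typing import Any, Dict, List, Optional

EvalConfig = Optional[Dict[str, Any]]  # mirrors the documented alias

# One configuration per threshold; None is assumed to mean "no configuration".
eval_configs: List[EvalConfig] = [{"threshold": t} for t in (0.3, 0.5, 0.7)] + [None]


def describe(config: EvalConfig) -> str:
    """Hypothetical helper: render an EvalConfig as a short, stable label."""
    if config is None:
        return "default"
    return ", ".join(f"{key}={value}" for key, value in sorted(config.items()))


print([describe(c) for c in eval_configs])
# ['threshold=0.3', 'threshold=0.5', 'threshold=0.7', 'default']
```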

`DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]` `module-attribute`

A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.
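Code that accepts this alias has to handle both forms. A minimal sketch, using hypothetical helpers `iter_frames` and `count_rows`: a single DataFrame is treated as a stream of one batch, so the rest of the code only deals with iterators.

```python
from typing import Iterator, Union

import pandas as pd

DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]  # mirrors the documented alias


def iter_frames(df: DataFrame) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield batches whether given one DataFrame or an iterator."""
    if isinstance(df, pd.DataFrame):
        yield df
    else:
        yield from df


def count_rows(df: DataFrame) -> int:
    """Hypothetical helper: total rows across all batches."""
    return sum(len(batch) for batch in iter_frames(df))


full = pd.DataFrame({"x": range(5)})
batches = iter([full.iloc[:3], full.iloc[3:]])
print(count_rows(full), count_rows(batches))  # 5 5
```

An iterator can only be consumed once, so a function receiving the iterator form should make a single pass over it.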

`download_results(dataset, model)`

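Download the results of a model on a dataset, given the dataset name and the model name.

To concatenate the datapoints with each set of results:

```python
import pandas as pd
from kolena.dataset import download_results

df_dp, results = download_results("dataset name", "model name")
for eval_config, df_result in results:
    df_combined = pd.concat([df_dp, df_result], axis=1)  # datapoint and result columns side by side
```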

Parameters:

Name	Type	Description	Default
`dataset`	`str`	The name of the dataset.	required
`model`	`str`	The name of the model.	required

Returns:

Type	Description
`Tuple[DataFrame, List[Tuple[EvalConfig, DataFrame]]]`	A tuple of the DataFrame of datapoints and a list of tuples, each containing an evaluation configuration and the corresponding DataFrame of results.
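The per-configuration results can also be stacked into one long table for comparison across configurations. In this sketch, `df_dp` and `results` are hypothetical stand-ins for the return value of `download_results`, the `label` and `prediction` columns are assumed, and the rows of each results DataFrame are assumed to line up with the rows of `df_dp`, as in the column-wise concatenation example.

```python
import pandas as pd

# Hypothetical stand-ins for download_results("my-dataset", "my-model").
df_dp = pd.DataFrame({"locator": ["img0.png", "img1.png"], "label": ["cat", "dog"]})
results = [
    ({"threshold": 0.3}, pd.DataFrame({"prediction": ["cat", "cat"]})),
    ({"threshold": 0.7}, pd.DataFrame({"prediction": ["cat", "dog"]})),
]

frames = []
for eval_config, df_result in results:
    # Assumes df_result rows are aligned with df_dp rows.
    combined = pd.concat([df_dp, df_result], axis=1)
    combined["threshold"] = (eval_config or {}).get("threshold")
    frames.append(combined)

long_df = pd.concat(frames, ignore_index=True)

# Fraction of correct predictions for each threshold.
accuracy = (
    long_df.assign(correct=long_df["prediction"] == long_df["label"])
    .groupby("threshold")["correct"]
    .mean()
)
print(accuracy)
```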

`upload_results(dataset, model, results, thresholded_fields=None)`

Upload the results of a specified model on a given dataset.

Parameters:

Name	Type	Description	Default
`dataset`	`str`	The name of the dataset.	required
`model`	`str`	The name of the model.	required
`results`	`Union[DataFrame, List[Tuple[EvalConfig, DataFrame]]]`	Either a DataFrame or a list of tuples, where each tuple consists of an eval configuration and a DataFrame.	required
`thresholded_fields`	`Optional[List[str]]`	Columns in the results DataFrame that contain data associated with different thresholds.	`None`

Returns:

Type	Description
`None`	None
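A sketch of preparing `results` in the list-of-tuples form, with one `EvalConfig` per threshold. The `locator` ID column, the `score` and `is_positive` columns, and the threshold values are hypothetical, and the sketch does not attempt to show the expected layout of `thresholded_fields`.

```python
from typing import Any, Dict, List, Optional, Tuple

import pandas as pd

EvalConfig = Optional[Dict[str, Any]]  # mirrors the documented alias

# Hypothetical raw model output, keyed by an assumed ID field "locator".
raw = pd.DataFrame(
    {"locator": ["img0.png", "img1.png", "img2.png"], "score": [0.2, 0.6, 0.9]}
)

# One (EvalConfig, DataFrame) pair per threshold.
results: List[Tuple[EvalConfig, pd.DataFrame]] = []
for threshold in (0.5, 0.8):
    df_result = raw.assign(is_positive=raw["score"] >= threshold)
    results.append(({"threshold": threshold}, df_result))
```

The list could then be uploaded with `upload_results("my-dataset", "my-model", results)`, where both names are placeholders. Each results DataFrame keeps the ID column so that its rows can be linked back to the dataset's datapoints.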
