None not found and 404 error

I get the 404 error below when running publish.df() with my API key for a new dataset. What causes these errors?

The table doesn’t exist yet, of course, but I don’t recall running into this error with publish.df() before.
I ran it on a simple test dataframe:

dataset_name = 'test_df'
dataset_fqtn = 'RASGO.PUBLIC.TEST_DF_V_1'
parent_datasets = []
rasgo.publish.df(
    df=df1,
    name=dataset_name,
    description=f'Init dataset from df publish: {dataset_name}',
    fqtn=dataset_fqtn,
    parents=parent_datasets,
    attributes={"time_index": date_column},
    verbose=True,
    generate_stats=False
)
rasgo_dataset_from_df(dataframe=df1, dataset_name='test_df', dataset_table_prefix='TEST_DF', rasgo_api_key=api_key)

The error is:

resource_key: None not found.
Init publishing dataset name: test_df resource_key: None fqtn RASGO.PUBLIC.TEST_DF_V_1
Publishing df as Rasgo dataset
Traceback (most recent call last):
  File "~\site-packages\pyrasgo\api\connection.py", line 181, in _raise_internal_api_error_if_any
    response.raise_for_status()
  File "~\site-packages\requests\models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.rasgoml.com/v2/datasets/match/RASGO.PUBLIC.TEST_DF_V_1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "~\site-packages\pyrasgo\api\get.py", line 60, in dataset
    response = self.api._get(f"/datasets/match/{fqtn}", api_version=2).json()
  File "~\site-packages\pyrasgo\api\connection.py", line 82, in _get
    self._raise_internal_api_error_if_any(response)
  File "~\site-packages\pyrasgo\api\connection.py", line 187, in _raise_internal_api_error_if_any
    raise APIError(
pyrasgo.api.error.APIError: Internal API Error when making GET request
Status Code: 404
Internal Error Details: {'message': 'Cannot access the Dataset you are looking for'}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~\rasgo.py", line 177, in rasgo_dataset_from_df
    return rasgo.publish.df(
  File "~\site-packages\pyrasgo\api\publish.py", line 164, in df
    ds = self.get.dataset(fqtn=fqtn)
  File "~\site-packages\pyrasgo\api\get.py", line 83, in dataset
    raise APIError(f"Dataset with fqtn '{fqtn}' does not exist or this API key does not have access.")
pyrasgo.api.error.APIError: Dataset with fqtn 'RASGO.PUBLIC.TEST_DF_V_1' does not exist or this API key does not have access.

I believe this is because publish.df() expects dataset_table_name on the very first publish, and only then. After that, once the table exists, passing fqtn= to overwrite the existing object will work.

So if you publish a dataset for the FIRST time by passing fqtn=, it fails because that table doesn’t exist yet.

We will be changing this in a future pyrasgo release.

For now, use dataset_table_name on the first publish and fqtn on later ones:

import pandas as pd

p_df = pd.read_csv("some_file_path")

# first time
d1 = rasgo.publish.df(
    df=p_df,
    name="My-CSV-Dataset",
    description="I created a Dataset from the CSV some_file_path",
    dataset_table_name="some_file_path_table",
    verbose=True,
    attributes={"LOB": "Finance"}
)

# subsequent publish (fqtn/tablename already exists)
d2 = rasgo.publish.df(
    df=p_df,
    name="My-CSV-Dataset",
    description="I created a Dataset from the CSV some_file_path",
    fqtn="some_file_path_table",
    if_exists='overwrite',
    verbose=True,
    attributes={"LOB": "Finance"}
)
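
If you don't want to track whether a table has been published yet, the two calls above can be folded into one helper. This is only a rough sketch: the helper name publish_df_first_or_overwrite and its table_name argument are made up for illustration, rasgo is assumed to be your already-connected pyrasgo client, and it assumes get.dataset(fqtn=...) raises when the dataset is missing (as the traceback in the question shows). It catches a broad Exception to stay self-contained; you may prefer to catch pyrasgo's APIError instead.

```python
def publish_df_first_or_overwrite(rasgo, df, name, table_name, fqtn, **kwargs):
    """Hypothetical helper: first publish via dataset_table_name, later ones via fqtn.

    `rasgo` is assumed to be a connected pyrasgo client. Extra keyword
    arguments (description, attributes, verbose, ...) are passed through
    to publish.df() unchanged.
    """
    try:
        # Assumption: this raises if the dataset does not exist
        # or the API key cannot access it.
        rasgo.get.dataset(fqtn=fqtn)
    except Exception:
        # First publish: the table does not exist yet, so do not pass fqtn.
        return rasgo.publish.df(
            df=df, name=name, dataset_table_name=table_name, **kwargs
        )
    # Subsequent publish: the table exists, so overwrite it by fqtn.
    return rasgo.publish.df(
        df=df, name=name, fqtn=fqtn, if_exists="overwrite", **kwargs
    )
```

Note that a key without access to an existing table would also land in the first-publish branch, so treat this as a convenience, not a safeguard.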
