Data specialists out there, curious to get your take
The company I work at is building an API extraction script, but rather than extracting to a data lake (which the company doesn't have), they are going to transform the data as part of the API call (Python) before piping it into the data warehouse (AWS Redshift).
This team has been driving everyone crazy asking for every possible use case for the data, including a bunch of internally defined attributes that aren't part of the raw data source, because they have to write the API script to churn out exactly the finished data table.
This seems like a bad idea to me... as soon as someone wants a new column or wants to revise some historical attribute, someone is going to get stuck rewriting what sounds like a messy API integration (the last update I got was that they need to make four API calls just to join a basic table before adding our internal custom data). Is that a fair guess as to why each stage of ETL is usually kept separate?
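For reference, the pattern I would have expected looks roughly like the sketch below: land the raw payload somewhere cheap (S3 here), COPY it into a raw staging table in Redshift, and do the joins and internal attributes later in SQL. Everything in it is made up for illustration (endpoint, bucket, table names, IAM role), and it assumes the API returns a JSON list.

```python
# Rough sketch of "load raw first, transform later" -- all names are hypothetical.
import json
import os

import boto3
import psycopg2
import requests

SOURCE_URL = "https://api.example.com/v1/orders"   # hypothetical endpoint
BUCKET = "my-company-raw-landing"                  # hypothetical S3 bucket


def extract_to_s3(run_date: str) -> str:
    """Pull the raw payload and land it untouched in S3 as JSON lines."""
    resp = requests.get(SOURCE_URL, params={"date": run_date}, timeout=30)
    resp.raise_for_status()
    key = f"orders/raw/{run_date}.jsonl"
    body = "\n".join(json.dumps(rec) for rec in resp.json())
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body.encode())
    return key


def load_raw_to_redshift(key: str) -> None:
    """COPY the raw file into a staging table; transforms happen downstream in SQL."""
    conn = psycopg2.connect(
        host="my-cluster.example.redshift.amazonaws.com",  # hypothetical cluster
        dbname="analytics",
        user="loader",
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            f"""
            COPY raw.orders
            FROM 's3://{BUCKET}/{key}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
            FORMAT AS JSON 'auto';
            """
        )
    conn.close()
    # New columns or revised historical attributes become SQL changes over
    # raw.orders, so nobody has to rewrite the extraction script itself.
```

With something like this, the extraction code only needs to know the source API, and the "every possible use case" questions get answered later in SQL instead of being baked into the load.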