# Connections
During development we may want to use different connections to test our DAGs, and for the same connection we often need different details per environment. For example, we may start development writing to a local file, switch the connection to S3 once that works correctly to finish testing, and finally move to the production S3 bucket, all in a seamless way (a minimal sketch of this idea follows the list below):
- data_lake workflow
    - local -> disk
    - dev -> dev S3 bucket
    - prod -> production S3 bucket
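To see why this switch can be seamless, here is a minimal, hypothetical Python sketch (the class and method names are illustrative assumptions, not Typhoon's actual hook classes): as long as every backend implements the same write interface, the task code never changes, only the connection details do.

```python
from typing import Protocol


class FileSystemLike(Protocol):
    """Illustrative stand-in for a shared hook interface (hypothetical)."""

    def write(self, path: str, data: bytes) -> None: ...


class LocalStorage:
    """Writes under a local base path (illustrative, not a Typhoon hook)."""

    def __init__(self, base_path: str):
        self.base_path = base_path

    def write(self, path: str, data: bytes) -> None:
        print(f"writing {len(data)} bytes to {self.base_path}{path}")


class S3Storage:
    """Writes to an S3 bucket (illustrative stub, no real S3 calls)."""

    def __init__(self, bucket: str):
        self.bucket = bucket

    def write(self, path: str, data: bytes) -> None:
        print(f"writing {len(data)} bytes to s3://{self.bucket}/{path}")


def run_task(storage: FileSystemLike) -> None:
    # The task depends only on the interface, not on the backend.
    storage.write("events/part-0000.json", b'{"ok": true}')


# The environment decides the backend; the task stays the same.
run_task(LocalStorage(base_path="/tmp/data_lake/"))   # local
run_task(S3Storage(bucket="my-typhoon-test-bucket"))  # test / prod
```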
For this specific purpose there is a file called `connections.yml`. This is where we will define all our connection details for our project, across all environments.
## Connections YAML
For example, in this project we will write data using a connection called `data_lake`, which has two connection environments: `local`, which writes to a local file, and `test`, which writes to S3. They can be used interchangeably since they both implement the same interface: `FileSystemHook`.
`connections.yml`:

```yaml
data_lake:
  local:
    conn_type: local_storage
    extra:
      base_path: /tmp/data_lake/
  test:
    conn_type: s3
    extra:
      bucket: my-typhoon-test-bucket
```
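Typhoon reads and resolves this file for you, but if you want to sanity-check it by hand, a quick sketch with PyYAML (assuming the file is in the current directory):

```python
import yaml  # PyYAML

with open("connections.yml") as f:
    connections = yaml.safe_load(f)

# Pick out the details for one connection in one environment.
local_details = connections["data_lake"]["local"]
print(local_details["conn_type"])           # local_storage
print(local_details["extra"]["base_path"])  # /tmp/data_lake/
```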
## connections.yml is not versioned
The `connections.yml` file may contain passwords, so it should never be versioned. That is why it is included in the `.gitignore` file for projects generated with the CLI.
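For reference, the relevant `.gitignore` entry is simply the file name (the generated file may contain more entries):

```
connections.yml
```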
## Adding the connection
Following the advice we got from the `typhoon status` command, we will now add the connection to the metadata database. We add the local `data_lake` connection with the following commands:
```bash
typhoon connection add --conn-id data_lake --conn-env local

# Check that it was added
typhoon connection ls -l

# Add the echo connection too, for completeness
typhoon connection add --conn-id echo --conn-env local
```
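Later, when we are ready to test against S3, the `test` environment we defined above can be registered with the same command pattern:

```bash
typhoon connection add --conn-id data_lake --conn-env test
```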
If we run the status command again, we will see that everything in our project is now OK:
```bash
typhoon status
```
## Connection types (Hooks)
Typhoon hooks serve the same purpose as Airflow hooks, so if you are familiar with those, this is an easy concept. They are the interface to external platforms and databases. You use `connections.yml` to choose which hook to use and to configure it. You can extend Typhoon to connect to new platforms by adding hooks, and it already comes with many popular ones.
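As a rough sketch of what adding a hook involves, here is the hypothetical shape of a custom FTP hook: a class that implements the same interface as the hooks it stands in for (the class name, method names, and context-manager protocol below are assumptions for illustration, not Typhoon's actual API):

```python
import ftplib
import io


class MyFtpHook:
    """Hypothetical custom hook; the shape is illustrative, not Typhoon's API."""

    def __init__(self, conn_params: dict):
        # conn_params would come from the matching entry in connections.yml
        self.host = conn_params["host"]
        self.user = conn_params.get("user", "anonymous")
        self.password = conn_params.get("password", "")
        self.ftp = None

    def __enter__(self):
        # Open the connection when a task starts using the hook.
        self.ftp = ftplib.FTP(self.host, self.user, self.password)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close the connection when the task is done.
        if self.ftp is not None:
            self.ftp.quit()
        return False

    def write_data(self, data: bytes, path: str) -> None:
        # Upload the bytes to the given path on the FTP server.
        self.ftp.storbinary(f"STOR {path}", io.BytesIO(data))
```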
Hooks available:
- File System - local, S3, Google Cloud Storage, FTP
- DB API - most DBs can use this, e.g. MySQL, MSSQL, Postgres, Redshift (see the sketch after this list)
    - Snowflake-specific flavour
- SQLAlchemy - most DBs can use this
- AWS - AWS Session, DynamoDB
- Singer - you can use any Singer tap in your DAG as a task; Singer taps can connect to a wide range of data sources
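On the "most DBs can use this" point: Python's DB API (PEP 249) standardises the connect/cursor/execute pattern, so one hook written against it works with nearly any driver. A minimal sketch using the standard-library sqlite3 module (any PEP 249 driver follows the same shape, though parameter placeholders vary):

```python
import sqlite3  # any PEP 249 driver (psycopg2, pymysql, ...) looks alike

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
cursor.execute("INSERT INTO events VALUES (?, ?)", (1, "hello"))
conn.commit()

cursor.execute("SELECT * FROM events")
print(cursor.fetchall())  # [(1, 'hello')]
conn.close()
```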
And you can add or extend your own, of course.