(Tutorial) Hosting Apache Superset part 3: Embedding with multiple databases


10 Sep 2025 container docker superset

Last month I wrote a guide to setting up and using Apache Superset, plus a follow-up specifically focused on embed usage.

This article follows on from that, and looks at how to use Superset with multiple databases, such that we can specify the DB to use at runtime.

Multiple DBs? 🔗

Multiple DBs are useful if, like me, you have DBs in different geographic regions, and the DB Superset should connect to depends on which DB the requested data lives in.

You can of course set up multiple DB profiles within Superset, but there's no way, when creating an embed token, to tell it which to use.

This is where a DB connection mutator comes in. First, though, let's deal with how we'll specify the DB our embedded Superset should connect to.

Specifying the DB 🔗

If you followed part two of my Superset guide, which looked specifically at setting up Superset for embed use, you'll recall that the process involves generating a guest token, to authenticate the embed.

During that, we need to specify a user. Superset doesn't authenticate this user; it's just used for logging. This means we can misuse it by passing our DB name as the user's username (or indeed any of the three user fields).

fetch(`${domain}/${guestTokenEndpoint}`, {
    ...
    body: JSON.stringify({
        user: {
            username: 'uk', // <-- DB name
            first_name: 'foo',
            last_name: 'foo'
        },
        ...
    })
});

This works because the user is retained in the JWT payload, meaning our DB connection mutator, which we'll write shortly, can access it.
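
If you want to see this for yourself, you can decode the guest token and inspect the claim. A quick sketch using the PyJWT package (guest_token here is the token string returned by the endpoint above; we skip signature verification since we're only peeking):

import jwt  # PyJWT

# decode without verifying the signature; we just want to inspect the payload
payload = jwt.decode(guest_token, options={'verify_signature': False})
print(payload['user']['username'])  # -> 'uk', our DB name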

Creating the connection mutator 🔗

It turns out Superset allows us to write a Python function in (or imported by) our custom config file. This function should then be assigned to a settings variable, DB_CONNECTION_MUTATOR (we'll wire that up after looking at the mutator itself).

Rather than muddy our config file, let's create a new file, db_connection_mutator.py, and put something like the following in it:

# prep
from os import environ
from flask import g

# map of valid DBs
DBS = ['uk', 'aus', 'eu']
DB_MAP = {}
for db in DBS:
    DB_MAP[db] = {
        'host': environ[f'DB_{db.upper()}_HOST'],
        'dbname': environ.get(f'DB_{db.upper()}_DB') or 'postgres',
        'user': environ.get(f'DB_{db.upper()}_USER') or 'superset',
        'password': environ[f'DB_{db.upper()}_PASS']
    }

# redirect to the DB specified in the JWT claim (embed only)
def db_connection_mutator(sqlalchemy_uri, params, username, *args, **kwargs):
    db = getattr(g.user, 'username', None)
    if db in DB_MAP:
        cfg = DB_MAP[db]
        sqlalchemy_uri = f"postgresql://{cfg['user']}:{cfg['password']}@{cfg['host']}/{cfg['dbname']}"
    return sqlalchemy_uri, params

Let's go through what's happening there:

  • First a bit of prep, importing a few bits from Python and the Flask framework that we'll need.
  • Next we create a map (or dictionary, since we're in Python land) of our valid DBs and their connection credentials. We'll feed these in from our soon-to-be-extended PowerShell command which creates the container. The default DB name and username are "postgres" and "superset", respectively.
  • We then define our connection mutator. This is called by Superset any time it's about to query our database, to give us an opportunity to repoint the request to another DB.
  • The mutator checks the value we passed to user.username in the request body, which arrives under g (g being Flask's per-request global context object). If it contains a valid DB, as per our map, we rebuild the SQLAlchemy URI to point at that DB. Otherwise, no repointing happens and Superset uses the default DB. Either way, we return the URI along with the untouched connection params.
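
With the mutator written, the last step is hooking it up. In our config file (superset_config.py in my setup from part one; adjust if yours is named differently), we import the function and assign it to the setting Superset looks for. Note that db_connection_mutator.py needs to live alongside the config file so the import resolves:

# superset_config.py
from db_connection_mutator import db_connection_mutator

DB_CONNECTION_MUTATOR = db_connection_mutator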

Voilà!

Add DB creds to the shell command 🔗

Remember how in part one we built our Superset container via a PowerShell (at least, for me) command that called Azure? In it, we passed a load of settings and credentials that get fed into the container as environment variables.

All we need to do here is extend that command to accommodate the credentials for our new DB profiles. We already extended the command in part two to support embedding. At the end of the command, add a ` line continuation (or \ if you're not in PowerShell) and then add the following:

DB_UK_HOST="{uk-host-here}" `
DB_UK_PASS="{uk-password-here}"

...and continue for each DB in your DBs map. Remember that if you don't specify a DB_*_USER or DB_*_DB variable, our mutator will fall back to "superset" and "postgres" respectively.
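
For example, with the three regions from our DBS list (all values are placeholders; the EU profile also overrides the default user, just to show the optional variables in action):

DB_UK_HOST="{uk-host-here}" `
DB_UK_PASS="{uk-password-here}" `
DB_AUS_HOST="{aus-host-here}" `
DB_AUS_PASS="{aus-password-here}" `
DB_EU_HOST="{eu-host-here}" `
DB_EU_USER="{eu-user-here}" `
DB_EU_PASS="{eu-password-here}"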

Handling caching 🔗

This all works great, but there's one issue: since we're sort of tricking Superset into using multiple DBs (as far as Superset is concerned, there's only one DB, the one it's explicitly configured to use), this can have caching consequences.

In my guide we set up a Redis cache for Superset to use. This might cause problems if we run the same query on multiple DBs, since Superset caches based on the compiled query signature. Its caching mechanism doesn't know that two identical queries actually ran on two different DBs, and so should be cached separately.

To get round this, simply reference the DB as a (pointless and harmless) column in your queries. Helpfully, Superset exposes the Jinja macro current_username(), so we can just do:

SELECT '{{ current_username() }}' AS db, ...

...in our queries.

For this to work, you need to make sure you've enabled the ENABLE_TEMPLATE_PROCESSING feature flag, as we did in part two.
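
As a reminder, feature flags live in the config file. A minimal sketch, assuming (per part two) you already have EMBEDDED_SUPERSET on for embed support:

FEATURE_FLAGS = {
    'EMBEDDED_SUPERSET': True,
    'ENABLE_TEMPLATE_PROCESSING': True
}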

If that macro isn't defined in your version of Superset, you can shim it yourself with a custom Jinja macro. In your config.py, just do:

from flask_login import current_user  # proxy for the current user

JINJA_CONTEXT_ADDONS = {
    'db': lambda: f"'{current_user.username}' AS db"
}

And then in your queries:

SELECT {{ db() }}, ...

In that version, we don't need the quotes or the AS since our macro takes care of all of that. In fact, even if your Superset supports current_username(), you might prefer this approach just to cut down on repetitive code.

And that's it!

Did I help you? Feel free to be amazing and buy me a coffee on Ko-fi!