
Querying data from SQLite with Mayim

(No resolution post this year huh? Maybe I’ll save for lunar new year.)

So I’m dabbling with Sanic as an asynchronous web framework, where I learned about Mayim, a wrapper around SQL connectors that “hydrates” query results into models to be validated. It also helps separate the SQL queries from the code, so calling them is more elegant.

Using Mayim is really simple:

  1. Put your SQL queries inside queries/ folder
  2. Declare models for outputs
  3. Write an executor class with methods named the same as the SQL files

These methods will execute the SQL queries and return the result as the model object.

(See their documentation for more info and code samples)

So far so good. I’m trying this out on a small project as a playground, and I don’t want a full DBMS, so I’m sticking to SQLite. That turns out to cause some (solvable) trouble, because there is stuff that’s specific to each SQL implementation.

Most of its examples include only one table, to keep the queries simple. In an actual app there are usually several tables, with some one-to-many or many-to-many relations. The most complicated query I see in the docs is a one-to-one relation. It was also PostgreSQL, so it isn’t immediately applicable to my experiment. Argh, it also requires pydantic, yet another library, for this to work; I was expecting to keep the dependencies minimal, but okay.

For this experiment, I have these tables with products, categories, and prices with both one-to-many and many-to-many relations. The schema is defined by the queries below:

CREATE TABLE product (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE price (
  id INTEGER PRIMARY KEY,
  product_id INTEGER NOT NULL,
  unit TEXT NOT NULL,
  price INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id)
);

CREATE TABLE category (
  id INTEGER PRIMARY KEY,
  name TEXT UNIQUE NOT NULL
);

CREATE TABLE product_category (
  id INTEGER PRIMARY KEY,
  product_id INTEGER NOT NULL,
  category_id INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id),
  FOREIGN KEY (category_id) REFERENCES category (id)
);
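
Before wiring anything into Mayim, the schema can be sanity-checked with the standard library’s sqlite3 module. The rows inserted here are made-up sample data, just enough to exercise both relations:

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE price (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    unit TEXT NOT NULL,
    price INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (id)
);
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE product_category (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (id),
    FOREIGN KEY (category_id) REFERENCES category (id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows: one product, two bulk prices, one category
conn.execute("INSERT INTO product VALUES (1, 'energy drink')")
conn.executemany("INSERT INTO price VALUES (?, ?, ?, ?)",
                 [(1, 1, 'bottle', 12000), (2, 1, 'packets', 70000)])
conn.execute("INSERT INTO category VALUES (1, 'drink')")
conn.execute("INSERT INTO product_category VALUES (1, 1, 1)")
conn.commit()
```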
The output models, declared with pydantic:

from pydantic import BaseModel


class Price(BaseModel):
    id: int
    unit: str
    price: int


class Product(BaseModel):
    id: int
    name: str
    prices: list[Price]

Let’s try with the one-to-many one first—the product-price relation.

(As for why there are multiple prices for a single product: when one buys in bulk, the price is usually cheaper.)

Normally, one would query it as:

SELECT product.id, product.name, price, unit
FROM product
JOIN price ON product.id = price.product_id
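
Running that flat join over some made-up rows (stdlib sqlite3, schema trimmed to the two tables involved) makes the shape obvious: a product with two prices comes back as two rows, with the product columns duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    unit TEXT NOT NULL,
    price INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (id)
);
INSERT INTO product VALUES (1, 'energy drink');
INSERT INTO price VALUES (1, 1, 'bottle', 12000), (2, 1, 'packets', 70000);
""")
rows = conn.execute("""
    SELECT product.id, product.name, price, unit
    FROM product
    JOIN price ON product.id = price.product_id
""").fetchall()
# One product, but two result rows: one per price
print(rows)
```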

That doesn’t work here: we need the prices as a list of Price objects for each product. From the example linked above, we learn that nested objects require turning to JSON, so let’s look at SQLite’s documentation for JSON functions. json_group_array() seems to do what we need here, and we can use json_object() to construct each JSON object.

SELECT product.id, name, json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product JOIN price
ON product.id = price.product_id
GROUP BY product.id;

This turns out to be simpler than the PostgreSQL example!

However, when I ran that:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
prices
  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/list_type

It turns out the sqlite3 library can only return strings, not parsed JSON. Maybe it should support that? I don’t know. But we can solve it ourselves by extending pydantic’s BaseModel. To save you from inspecting pydantic’s source code: upon instantiation, BaseModel takes **data as initialization parameters, and it has a model_fields attribute storing the type constraints we defined above. It maps each field name to a FieldInfo object, whose annotation attribute is the data type we need.
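
We can see this for ourselves outside pydantic. Fetching the aggregated query with the stdlib sqlite3 module (same made-up rows as before) gives the prices column back as one str per product, which json.loads() happily turns back into a list of dicts:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    unit TEXT NOT NULL,
    price INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'energy drink');
INSERT INTO price VALUES (1, 1, 'bottle', 12000), (2, 1, 'packets', 70000);
""")
row = conn.execute("""
    SELECT product.id, name, json_group_array(
        json_object('id', price.id, 'price', price, 'unit', unit)
    ) AS prices
    FROM product JOIN price ON product.id = price.product_id
    GROUP BY product.id
""").fetchone()
assert isinstance(row[2], str)  # the JSON arrives as a plain string
prices = json.loads(row[2])     # ...but parses back into a list of dicts
```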

Let’s customize that to parse the JSON data:

import json
import types

from pydantic import BaseModel


class SQLiteModel(BaseModel):
    """Custom model to parse JSON from SQLite, which is returned as a string."""

    def __init__(self, /, **data):
        for field, info in self.model_fields.items():
            # To be careful, only try to parse data expected to be list|dict
            if (type(info.annotation) is types.GenericAlias
                    and info.annotation.__origin__ in (list, dict)
                    and type(data.get(field)) is str):
                data[field] = json.loads(data[field])
        super().__init__(**data)

That should work now.

Now, let’s add categories to the products:

class Product(SQLiteModel):
    id: int
    name: str
    prices: list[Price]
    categories: list[str]

I’m skipping a bit here because category has only one data field, though I probably should write full objects as I did for prices.

SELECT product.id, product.name, json_group_array(category.name) as categories,
json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;

Running that would result in rows with duplicated categories like:

Product(id=1, name='energy drink',
        prices=[Price(id=1, unit='bottle', price=12000),
                Price(id=2, unit='packets', price=70000)],
        categories=['drink', 'drink']),

This is because I joined three tables at once, so each category is repeated once for every price of the product. If I had written a full model for Category, it wouldn’t validate at all due to duplicated primary keys. To solve this, simply adding the DISTINCT keyword, as in json_group_array(DISTINCT category.name), works.
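
Verified with the stdlib sqlite3 module and the same made-up rows: the DISTINCT inside json_group_array() is what deduplicates the categories, even though the join still produces one row per price.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price (
    id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL,
    unit TEXT NOT NULL, price INTEGER NOT NULL
);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE product_category (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'energy drink');
INSERT INTO price VALUES (1, 1, 'bottle', 12000), (2, 1, 'packets', 70000);
INSERT INTO category VALUES (1, 'drink');
INSERT INTO product_category VALUES (1, 1, 1);
""")
row = conn.execute("""
    SELECT product.id, product.name,
           json_group_array(DISTINCT category.name) AS categories
    FROM product
    JOIN price ON product.id = price.product_id
    JOIN product_category ON product.id = product_category.product_id
    JOIN category ON category.id = product_category.category_id
    GROUP BY product.id
""").fetchone()
# The category now appears once, not once per price
assert json.loads(row[2]) == ['drink']
```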

So, after this, I find writing bare queries for this small project not too complicated, and the problems I encountered were quite trivial to solve. I expect that to be less true for larger projects, which makes me appreciate how much an ORM simplifies the process. I’ll keep trying this approach some more, though.

P.S. Contact me for the test data, which is public but I don’t want to publish on this blog.



Would you like to discuss this post? Email me!