Release: 0.9.10 legacy version | Release Date: July 22, 2015

SQLAlchemy 0.9 Documentation

Additional Persistence Techniques

Embedding SQL Insert/Update Expressions into a Flush

This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It’s especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:

class SomeClass(object):
    pass
mapper(SomeClass, some_table)

someobject = session.query(SomeClass).get(5)

# set 'value' attribute to a SQL expression adding one
someobject.value = some_table.c.value + 1

# issues "UPDATE some_table SET value=value+1"
session.commit()

This technique works both for INSERT and UPDATE statements. After the flush/commit operation, the value attribute on someobject above is expired, so that when next accessed the newly generated value will be loaded from the database.

Using SQL Expressions with Sessions

SQL expressions and strings can be executed via the Session within its transactional context. This is most easily accomplished using the execute() method, which returns a ResultProxy in the same manner as an Engine or Connection:

Session = sessionmaker(bind=engine)
session = Session()

# execute a string statement
result = session.execute("select * from table where id=:id", {'id':7})

# execute a SQL expression construct
result = session.execute(select([mytable]).where(mytable.c.id==7))

The current Connection held by the Session is accessible using the connection() method:

connection = session.connection()

The examples above deal with a Session that’s bound to a single Engine or Connection. To execute statements using a Session which is bound either to multiple engines, or none at all (i.e. relies upon bound metadata), both execute() and connection() accept a mapper keyword argument, which is passed a mapped class or Mapper instance, which is used to locate the proper context for the desired engine:

Session = sessionmaker()
session = Session()

# need to specify mapper or class when executing
result = session.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass)

result = session.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass)

connection = session.connection(MyMappedClass)

Fetching Server-Generated Defaults

As introduced in the sections Server-invoked DDL-Explicit Default Expressions and Marking Implicitly Generated Values, timestamps, and Triggered Columns, the Core supports the notion of database columns for which the database itself generates a value upon INSERT and in less common cases upon UPDATE statements. The ORM features support for such columns regarding being able to fetch these newly generated values upon flush. This behavior is required in the case of primary key columns that are generated by the server, since the ORM has to know the primary key of an object once it is persisted.

In the vast majority of cases, primary key columns that have their value generated automatically by the database are simple integer columns, which are implemented by the database as either a so-called “autoincrement” column, or from a sequence associated with the column. Every database dialect within SQLAlchemy Core supports a method of retrieving these primary key values which is often native to the Python DBAPI, and in general this process is automatic, with the exception of a database like Oracle that requires us to specify a Sequence explicitly. There is more documentation regarding this at Column.autoincrement.

For server-generating columns that are not primary key columns or that are not simple autoincrementing integer columns, the ORM requires that these columns are marked with an appropriate server_default directive that allows the ORM to retrieve this value. Not all methods are supported on all backends, however, so care must be taken to use the appropriate method. The two questions to be answered are, 1. is this column part of the primary key or not, and 2. does the database support RETURNING or an equivalent, such as “OUTPUT inserted”; these are SQL phrases which return a server-generated value at the same time as the INSERT or UPDATE statement is invoked. Databases that support RETURNING or equivalent include PostgreSQL, Oracle, and SQL Server. Databases that do not include SQLite and MySQL.

Case 1: non primary key, RETURNING or equivalent is supported

In this case, columns should be marked as FetchedValue or with an explicit Column.server_default. The orm.mapper.eager_defaults flag may be used to indicate that these columns should be fetched immediately upon INSERT and sometimes UPDATE:

class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())

    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())

    __mapper_args__ = {"eager_defaults": True}

Above, an INSERT statement that does not specify explicit values for “timestamp” or “special_identifier” from the client side will include the “timestamp” and “special_identifier” columns within the RETURNING clause so they are available immediately. On the PostgreSQL database, an INSERT for the above table will look like:

INSERT INTO my_table DEFAULT VALUES RETURNING my_table.id, my_table.timestamp, my_table.special_identifier

Case 2: non primary key, RETURNING or equivalent is not supported or not needed

This case is the same as case 1 above, except we don’t specify orm.mapper.eager_defaults:

class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())

    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())

After a record with the above mapping is INSERTed, the “timestamp” and “special_identifier” columns will remain empty, and will be fetched via a second SELECT statement when they are first accessed after the flush, e.g. they are marked as “expired”.

If the orm.mapper.eager_defaults is still used, and the backend database does not support RETURNING or an equivalent, the ORM will emit this SELECT statement immediately following the INSERT statement. This is often undesirable as it adds additional SELECT statements to the flush process that may not be needed. Using the above mapping with the orm.mapper.eager_defaults flag set to True against MySQL results in SQL like this upon flush (minus the comment, which is for clarification only):

INSERT INTO my_table () VALUES ()

-- when eager_defaults **is** used, but RETURNING is not supported
SELECT my_table.timestamp AS my_table_timestamp, my_table.special_identifier AS my_table_special_identifier
FROM my_table WHERE my_table.id = %s

Case 3: primary key, RETURNING or equivalent is supported

A primary key column with a server-generated value must be fetched immediately upon INSERT; the ORM can only access rows for which it has a primary key value, so if the primary key is generated by the server, the ORM needs a way for the database to give us that new value immediately upon INSERT.

As mentioned above, for integer “autoincrement” columns as well as PostgreSQL SERIAL, these types are handled automatically by the Core; databases include functions for fetching the “last inserted id” where RETURNING is not supported, and where RETURNING is supported SQLAlchemy will use that.

However, for non-integer values, as well as for integer values that must be explicitly linked to a sequence or other triggered routine, the server default generation must be marked in the table metadata.

For an explicit sequence as we use with Oracle, this just means we are using the Sequence construct:

class MyOracleModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, Sequence("my_sequence"), primary_key=True)
    data = Column(String(50))

The INSERT for a model as above on Oracle looks like:

INSERT INTO my_table (id, data) VALUES (my_sequence.nextval, :data) RETURNING my_table.id INTO :ret_0

Where above, SQLAlchemy renders my_sequence.nextval for the primary key column and also uses RETURNING to get the new value back immediately.

For datatypes that generate values automatically, or columns that are populated by a trigger, we use FetchedValue. Below is a model that uses a SQL Server TIMESTAMP column as the primary key, which generates values automatically:

class MyModel(Base):
    __tablename__ = 'my_table'

    timestamp = Column(TIMESTAMP(), server_default=FetchedValue(), primary_key=True)

An INSERT for the above table on SQL Server looks like:

INSERT INTO my_table OUTPUT inserted.timestamp DEFAULT VALUES

Case 4: primary key, RETURNING or equivalent is not supported

In this area we are generating rows for a database such as SQLite or more typically MySQL where some means of generating a default is occurring on the server, but is outside of the database’s usual autoincrement routine. In this case, we have to make sure SQLAlchemy can “pre-execute” the default, which means it has to be an explicit SQL expression. Again using the example of TIMESTAMP for MySQL, we unfortunately need to use our own explicit default:

class MyModel(Base):
    __tablename__ = 'my_table'

    timestamp = Column(TIMESTAMP(), default=func.cast(func.now(), Binary), primary_key=True)

Where above, we select the “NOW()” function and also cast to binary to be used with MySQL’s TIMESTAMP column, that is in fact a binary datatype. The SQL generated by the above is:

SELECT CAST(now() AS BINARY) AS anon_1
INSERT INTO my_table (timestamp) VALUES (%s)
(b'2018-08-09 13:08:46',)

The Core currently does not support a means of fetching the timestamp value after the fact without using RETURNING, so on MySQL must run a SELECT ahead of time to pre-select the value.

Partitioning Strategies

Simple Vertical Partitioning

Vertical partitioning places different kinds of objects, or different tables, across multiple databases:

engine1 = create_engine('postgresql://db1')
engine2 = create_engine('postgresql://db2')

Session = sessionmaker(twophase=True)

# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})

session = Session()

Above, operations against either class will make usage of the Engine linked to that class. Upon a flush operation, similar rules take place to ensure each class is written to the right database.

The transactions among the multiple databases can optionally be coordinated via two phase commit, if the underlying backend supports it. See Enabling Two-Phase Commit for an example.

Custom Vertical Partitioning

More comprehensive rule-based class-level partitioning can be built by overriding the Session.get_bind() method. Below we illustrate a custom Session which delivers the following rules:

  1. Flush operations are delivered to the engine named master.
  2. Operations on objects that subclass MyOtherClass all occur on the other engine.
  3. Read operations for all other classes occur on a random choice of the slave1 or slave2 database.
engines = {
    'master':create_engine("sqlite:///master.db"),
    'other':create_engine("sqlite:///other.db"),
    'slave1':create_engine("sqlite:///slave1.db"),
    'slave2':create_engine("sqlite:///slave2.db"),
}

from sqlalchemy.orm import Session, sessionmaker
import random

class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None):
        if mapper and issubclass(mapper.class_, MyOtherClass):
            return engines['other']
        elif self._flushing:
            return engines['master']
        else:
            return engines[
                random.choice(['slave1','slave2'])
            ]

The above Session class is plugged in using the class_ argument to sessionmaker:

Session = sessionmaker(class_=RoutingSession)

This approach can be combined with multiple MetaData objects, using an approach such as that of using the declarative __abstract__ keyword, described at __abstract__.

Horizontal Partitioning

Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.

See the “sharding” example: Horizontal Sharding.

Previous: Transactions and Connection Management Next: Contextual/Thread-local Sessions