Release: 1.4.0b1 pre release | Release Date: unreleased

SQLAlchemy 1.4 Documentation

SQLAlchemy 1.4 Documentation

Changes and Migration

Project Versions

What’s New in SQLAlchemy 1.4?

About this Document

This document describes changes between SQLAlchemy version 1.3 and SQLAlchemy version 1.4.

Version 1.4 is taking on a different focus than other SQLAlchemy releases in that it is in many ways attempting to serve as a potential migration point for a more dramatic series of API changes currently planned for release 2.0 of SQLAlchemy. The focus of SQLAlchemy 2.0 is a modernized and slimmed down API that removes lots of usage patterns that have long been discouraged, as well as mainstreams the best ideas in SQLAlchemy as first class API features, with the goal being that there is much less ambiguity in how the API is to be used, as well as that a series of implicit behaviors and rarely-used API flags that complicate the internals and hinder performance will be removed.

For the current status of SQLAlchemy 2.0, see SQLAlchemy 2.0 Transition.

Behavioral Changes - General

Transparent SQL Compilation Caching added to All DQL, DML Statements in Core, ORM

One of the most broadly encompassing changes to ever land in a single SQLAlchemy version, a many-month reorganization and refactoring of all querying systems from the base of Core all the way through ORM now allows the majority of Python computation involved producing SQL strings and related statement metadata from a user-constructed statement to be cached in memory, such that subsequent invocations of an identical statement construct will use 35-60% fewer resources.

This caching goes beyond the construction of the SQL string to also include the construction of result fetching structures that link the SQL construct to the result set, and in the ORM it includes the accommodation of ORM-enabled attribute loaders, relationship eager loaders and other options, and object construction routines that must be built up each time an ORM query seeks to run and construct ORM objects from result sets.

To introduce the general idea of the feature, given code from the Performance suite as follows, which will invoke a very simple query “n” times, for a default value of n=10000. The query returns only a single row, as the overhead we are looking to decrease is that of many small queries. The optimization is not as significant for queries that return many rows:

session = Session(bind=engine)
for id_ in random.sample(ids, n):
    result = session.query(Customer).filter( == id_).one()

This example in the 1.3 release of SQLAlchemy on a Dell XPS13 running Linux completes as follows:

test_orm_query : (10000 iterations); total time 3.440652 sec

In 1.4, the code above without modification completes:

test_orm_query : (10000 iterations); total time 2.367934 sec

This first test indicates that regular ORM queries when using caching can run over many iterations in the range of 30% faster.

“Baked Query” style construction now available for all Core and ORM Queries

The “Baked Query” extension has been in SQLAlchemy for several years and provides a caching system that is based on defining segments of SQL statements within Python functions, so that the functions both serve as cache keys (since they uniquely and persistently identify a specific line in the source code) as well as that they allow the construction of a statement to be deferred so that it only need to be invoked once, rather than every time the query is rendered. The functionality of “Baked Query” is now a native part of the new caching system, which is available by simply using Python functions, typically lambda expressions, either inside of a statement, or on the outside using the lambda_stmt() function that works just like a Baked Query.

Making use of the newer 2.0 style of using select() and adding the use of optional lambdas to defer the computation:

session = Session(bind=engine)
for id_ in random.sample(ids, n):
    stmt = lambda_stmt(lambda: future_select(Customer))
    stmt += lambda s: s.where( == id_)

The code above completes:

test_orm_query_newstyle_w_lambdas : (10000 iterations); total time 1.247092 sec

This test indicates that using the newer “select()” style of ORM querying, in conjunction with a full “baked” style invocation that caches the entire construction, can run over many iterations in the range of 60% faster. This performance is roughly the same as what the Baked Query extension provides as well. The new approach effectively supersedes the Baked Query extension.

For comparison, a Baked Query looks like the following:

bakery = baked.bakery()
s = Session(bind=engine)
for id_ in random.sample(ids, n):
    q = bakery(lambda s: s.query(Customer))
    q += lambda q: q.filter( == bindparam("id"))

The new API allows the same very fast “baked query” approach of building up a statement with lambdas, but does not require any other syntactical changes from regular statements. It also no longer requires that “bindparam()” is used for literal values that may change; the “closure” of the Python function is scanned on every call to extract Python literal values that should be turned into parameters.

Methodology Overview

SQLAlchemy has also for many years included a “compiled_cache” option that is used internally by the ORM flush process as well as the Baked Query extension, which caches a SQL expression object based on the identity of the object itself. That is, if you create a particular select() object and make use of the compiled cache feature, if you pass the same select() object each time, the SQL compilation would be cached. This feature was of limited use since SQLAlchemy’s programming paradigm is based on the continuous construction of new SQL expression objects each time one is required.

The new caching feature uses the same “compiled_cache”, however instead of using the statement object itself as the cache key, a separate tuple-oriented cache key is generated which represents the complete structure of the statement. Two SQL constructs that are composed in exactly the same way will produce the same cache key, independent of the bound parameter values that are bundled with the statement; these are collected separately from each statement and are used when the cached SQL is executed. The ORM Query integrates by producing a select() object from itself that is interpreted as an ORM-enabled SELECT within the SQL compilation process that occurs beyond the cache boundary.

A general listing of architectural changes needed to support this feature:

  • The system by which arguments passed to SQL constructs are type-checked and coerced into their desired form was rewritten from an ad-hoc and disorganized system into the sqlalchemy.sql.roles and sqlalchemy.sql.coercions modules which provide a type-based approach to the task of composing SQL expression objects, error handling, coercion of objects such as turning SELECT statements into subqueries, as well as integrating with a new “plugin” system that allows SQL constructs to include ORM functionality.

  • The system by which clause expressions constructs are iterated and compared from an object structure point of view was also rewritten from one which was ad-hoc and inconsistent into a complete system within the new sqlalchemy.sql.traversals module. A test suite was added which ensures that all SQL construction objects include fully consistent comparison and iteration behavior. This work began with #4336.

  • The new iteration system naturally gave rise to the cache-key creation system, which also uses a performance-optimized version of the sqlalchemy.sql.traversals module to generate a deterministic cache key for any SQL expression based on its structure. Two instances of a SQL expression that represent the same SQL structure, such as select(table('x', column('q'))).where(column('z') > 5), are guaranteed to produce the same cache key, independent of the bound parameters which for this statement would be one parameter with the value “5”. Two instances of a SQL expression where any elements are different will produce different cache keys. When the cache key is generated, the parameters are also collected which will be used to formulate the final parameter list. This work was completed over many merges and was overall related to #4639.

  • The mechanism by which statements such as select() generate expensive collections and datamembers that are only used for SQL compilation, such as the list of columns and their labels, were organized into a new decoupled system called CompileState.

  • All elements of queries that needed to be made compatible with the concept of deterministic SQL compilation were updated, including an expansion of the “postcompile” concept used to render individual parameters inside of “IN” expressions first included in 1.3 as well as alterations to how dialects like the SQL Server dialect render LIMIT / OFFSET expressions that are not compatible with bound parameters.

  • The ORM Query object was fully refactored such that all of the intense computation which would previously occur whenever methods of Query were called, such as the construction of the Query itself, when methods filter() or join() would be called, etc., was completely reorganized to take place within the CompileState architecture, meaning the ORM process that generates a Core select() to render now takes place within the SQL compilation process, beyond the caching boundary. More detail on this change is at Many Core and ORM statement objects now perform much of their validation in the compile phase.

  • The Query object was unified with the select() object, such that these two objects now have cross-compatible internal state. The Query can turn itself into a select() that generates ORM queries by copying its __dict__ into a new Select object.

  • The 2.0-style Result object as well as the “future” version of Engine were developed and integrated into Core and later the ORM also integrated on top of Result.

  • The Core and ORM execution models were completely reworked to integrate the new cache key system, and in particular the ORM Query was reworked such that its execution model now produces a Select which is passed to Session.execute(), which then invokes the 2.0-style execution model that allows the Select to be processed as an ORM query beyond the caching boundary.

  • Other systems such as Query bulk updates and deletes, the horizontal sharding extension, the Baked Query extension, and the dogpile caching example were updated to integrate with the new execution model and a new event hook SessionEvents.do_orm_execute() has been added.

  • The caching has been enabled via the create_engine.query_cache_size parameter, new logging features were added, and the “lambda” argument construction module was added.

#4639 #5380 #4645 #4808 #5004

Many Core and ORM statement objects now perform much of their validation in the compile phase

A major initiative in the 1.4 series is to approach the model of both Core SQL statements as well as the ORM Query to allow for an efficient, cacheable model of statement creation and compilation, where the compilation step would be cached, based on a cache key generated by the created statement object, which itself is newly created for each use. Towards this goal, much of the Python computation which occurs within the construction of statements, particularly the ORM Query, is being moved to occur later, when the statement is actually compiled, and additionally that it will only occur if the compiled form of the statement is not already cached. This means that some of the error messages which can arise based on arguments passed to the object will no longer be raised immediately, and instead will occur only when the statement is invoked and its compiled form is not yet cached.

Error conditions which fall under this category include:

  • when a _selectable.CompoundSelect is constructed (e.g. a UNION, EXCEPT, etc.) and the SELECT statements passed do not have the same number of columns, a CompileError is now raised to this effect; previously, an ArgumentError would be raised immediately upon statement construction.

  • Various error conditions which may arise when calling upon Query.join() will be evaluated at statement compilation time rather than when the method is first called.

Repaired internal importing conventions such that code linters may work correctly

SQLAlchemy has for a long time used a parameter-injecting decorator to help resolve mutually-dependent module imports, like this:

def insert(self, dml, *args, **kw):

Where the above function would be rewritten to no longer have the dml parameter on the outside. This would confuse code-linting tools into seeing a missing parameter to functions. A new approach has been implemented internally such that the function’s signature is no longer modified and the module object is procured inside the function instead.



API Changes - Core

A SELECT statement is no longer implicitly considered to be a FROM clause

This change is one of the larger conceptual changes in SQLAlchemy in many years, however it is hoped that the end user impact is relatively small, as the change more closely matches what databases like MySQL and PostgreSQL require in any case.

The most immediate noticeable impact is that a select() can no longer be embedded inside of another select() directly, without explicitly turning the inner select() into a subquery first. This is historically performed by using the SelectBase.alias() method, which remains, however is more explicitly suited by using a new method SelectBase.subquery(); both methods do the same thing. The object returned is now Subquery, which is very similar to the Alias object and shares a common base AliasedReturnsRows.

That is, this will now raise:

stmt1 = select([,])
stmt2 = select([addresses, stmt1]).select_from(addresses.join(stmt1))


sqlalchemy.exc.ArgumentError: Column expression or FROM clause expected,
got <...Select object ...>. To create a FROM clause from a <class
'sqlalchemy.sql.selectable.Select'> object, use the .subquery() method.

The correct calling form is instead:

sq1 = select([,]).subquery()
stmt2 = select([addresses, sq1]).select_from(addresses.join(sq1))

Noting above that the SelectBase.subquery() method is essentially equivalent to using the SelectBase.alias() method.

The above calling form is typically required in any case as the call to SelectBase.subquery() or SelectBase.alias() is needed to ensure the subquery has a name. The MySQL and PostgreSQL databases do not accept unnamed subqueries in the FROM clause and they are of limited use on other platforms; this is described further below.

Along with the above change, the general capability of select() and related constructs to create unnamed subqueries, which means a FROM subquery that renders without any name i.e. “AS somename”, has been removed, and the ability of the select() construct to implicitly create subqueries without explicit calling code to do so is mostly deprecated. In the above example, as has always been the case, using the SelectBase.alias() method as well as the new SelectBase.subquery() method without passing a name will generate a so-called “anonymous” name, which is the familiar anon_1 name we see in SQLAlchemy queries:

SELECT,, addresses.user_id,,
addresses JOIN
(SELECT AS id, AS name FROM users) AS anon_1
ON addresses.user_id =

Unnamed subqueries in the FROM clause (which note are different from so-called “scalar subqueries” which take the place of a column expression in the columns clause or WHERE clause) are of extremely limited use in SQL, and their production in SQLAlchemy has mostly presented itself as an undesirable behavior that needs to be worked around. For example, both the MySQL and PostgreSQL outright reject the usage of unnamed subqueries:

# MySQL / MariaDB:

MariaDB [(none)]> select * from (select 1);
ERROR 1248 (42000): Every derived table must have its own alias

# PostgreSQL:

test=> select * from (select 1);
ERROR:  subquery in FROM must have an alias
LINE 1: select * from (select 1);
HINT:  For example, FROM (SELECT ...) [AS] foo.

A database like SQLite accepts them, however it is still often the case that the names produced from such a subquery are too ambiguous to be useful:

sqlite> CREATE TABLE a(id integer);
sqlite> CREATE TABLE b(id integer);
Error: ambiguous column name: id
Error: no such column:

# use a name
sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON;

Due to the above limitations, there are very few places in SQLAlchemy where such a query form was valid; the one exception was within the Oracle dialect where they were used to create OFFSET / LIMIT subqueries as Oracle does not support these keywords directly; this implementation has been replaced by one which uses anonymous subqueries. Throughout the ORM, exception cases that detect where a SELECT statement would be SELECTed from either encourage the user to, or implicitly create, an anonymously named subquery; it is hoped by moving to an all-explicit subquery much of the complexity incurred by these areas can be removed.

As SelectBase objects are no longer FromClause objects, attributes like the .c attribute as well as methods like .select(), .join(), and .outerjoin() upon SelectBase are now deprecated, as these methods all imply implicit production of a subquery. Instead, as is already what the vast majority of applications have to do in any case, invoking SelectBase.alias() or SelectBase.subquery() will provide for a Subquery object that provides all these attributes, as it is part of the FromClause hierarchy. In the interim, these methods are still available, however they now produce an anonymously named subquery rather than an unnamed one, and this subquery is distinct from the SelectBase construct itself.

In place of the .c attribute, a new attribute SelectBase.selected_columns is added. This attribute resolves to a column collection that is what most people hope that .c does (but does not), which is to reference the columns that are in the columns clause of the SELECT statement. A common beginner mistake is code such as the following:

stmt = select([users])
stmt = stmt.where( == 'foo')

The above code appears intuitive and that it would generate “SELECT * FROM users WHERE name=’foo’”, however veteran SQLAlchemy users will recognize that it in fact generates a useless subquery resembling “SELECT * FROM (SELECT * FROM users) WHERE name=’foo’”.

The new SelectBase.selected_columns attribute however does suit the use case above, as in a case like the above it links directly to the columns present in the users.c collection:

stmt = select([users])
stmt = stmt.where( == 'foo')

There is of course the notion that perhaps .c on SelectBase could simply act the way SelectBase.selected_columns does above, however in light of the fact that .c is strongly associated with the FromClause hierarchy, meaning that it is a set of columns that can be directly in the FROM clause of another SELECT, it’s better that a column collection that serves an entirely different purpose have a new name.

In the bigger picture, the reason this change is being made now is towards the goal of unifying the ORM Query object into the SelectBase hierarchy in SQLAlchemy 2.0, so that the ORM will have a “select()” construct that extends directly from the existing select() object, having the same methods and behaviors except that it will have additional ORM functionality. All statement objects in Core will also be fully cacheable using a new system that resembles “baked queries” except that it will work transparently for all statements across Core and ORM. In order to achieve this, the Core class hierarchy needs to be refined to behave in such a way that is more easily compatible with the ORM, and the ORM class hierarchy needs to be refined so that it is more compatible with Core.


select() now accepts positional expressions

The select() construct will now accept “columns clause” arguments positionally:

# new way, supports 2.0
stmt = select(table.c.col1, table.c.col2, ...)

When sending the arguments positionally, no other keyword arguments are permitted. In SQLAlchemy 2.0, the above calling style will be the only calling style supported.

For the duration of 1.4, the previous calling style will still continue to function, which passes the list of columns or other expressions as a list:

# old way, still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...])

The above legacy calling style also accepts the old keyword arguments that have since been removed from most narrative documentation:

# very much the old way, but still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...], whereclause=table.c.col1 == 5)

The detection between the two styles is based on whether or not the first positional argument is a list. There are unfortunately still likely some usages that look like the following, where the keyword for the “whereclause” is omitted:

# very much the old way, but still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...], table.c.col1 == 5)

As part of this change, the Select construct also gains the 2.0-style “future” API which includes an updated Select.join() method as well as methods like Select.filter_by() and Select.join_from().


All IN expressions render parameters for each value in the list on the fly (e.g. expanding parameters)

The “expanding IN” feature, first introduced in Late-expanded IN parameter sets allow IN expressions with cached statements, has matured enough such that it is clearly superior to the previous method of rendering IN expressions. As the approach was improved to handle empty lists of values, it is now the only means that Core / ORM will use to render lists of IN parameters.

The previous approach which has been present in SQLAlchemy since its first release was that when a list of values were passed to the ColumnOperators.in_() method, the list would be expanded into a series of individual BindParameter objects at statement construction time. This suffered from the limitation that it was not possible to vary the parameter list at statement execution time based on the parameter dictionary, which meant that string SQL statements could not be cached independently of their parameters, nor could the parameter dictionary be fully used for statements that included IN expressions generally.

In order to service the “baked query” feature described at Baked Queries, a cacheable version of IN was needed, which is what brought about the “expanding IN” feature. In contrast to the existing behavior whereby the parameter list is expanded at statement construction time into individual BindParameter objects, the feature instead uses a single BindParameter that stores the list of values at once; when the statement is executed by the Engine, it is “expanded” on the fly into individual bound parameter positions based on the parameters passed to the call to Connection.execute(), and the existing SQL string which may have been retrieved from a previous execution is modified using a regular expression to suit the current parameter set. This allows for the same Compiled object, which stores the rendered string statement, to be invoked multiple times against different parameter sets that modify the list contents passed to IN expressions, while still maintaining the behavior of individual scalar parameters being passed to the DBAPI. While some DBAPIs do support this functionality directly, it is not generally available; the “expanding IN” feature now supports the behavior consistently for all backends.

As a major focus of 1.4 is to allow for true statement caching in Core and ORM without the awkwardness of the “baked” system, and since the “expanding IN” feature represents a simpler approach to building expressions in any case, it’s now invoked automatically whenever a list of values is passed to an IN expression:

stmt = select([,]).where([1, 2, 3]))

The pre-execution string representation is:

>>> print(stmt)

To render the values directly, use literal_binds as was the case previously:

>>> print(stmt.compile(compile_kwargs={"literal_binds": True}))
WHERE IN (1, 2, 3)

A new flag, “render_postcompile”, is added as a helper to allow the current bound value to be rendered as it would be passed to the database:

>>> print(stmt.compile(compile_kwargs={"render_postcompile": True}))
WHERE IN (:id_1_1, :id_1_2, :id_1_3)

Engine logging output shows the ultimate rendered statement as well:

INFO sqlalchemy.engine.base.Engine SELECT,
WHERE IN (?, ?, ?)
INFO sqlalchemy.engine.base.Engine (1, 2, 3)

As part of this change, the behavior of “empty IN” expressions, where the list parameter is empty, is now standardized on use of the IN operator against a so-called “empty set”. As there is no standard SQL syntax for empty sets, a SELECT that returns no rows is used, tailored in specific ways for each backend so that the database treats it as an empty set; this feature was first introduced in version 1.3 and is described at Expanding IN feature now supports empty lists. The create_engine.empty_in_strategy parameter, introduced in version 1.2 as a means for migrating for how this case was treated for the previous IN system, is now deprecated and this flag no longer has an effect; as described in The IN / NOT IN operator’s empty collection behavior is now configurable; default expression simplified, this flag allowed a dialect to switch between the original system of comparing a column against itself, which turned out to be a huge performance issue, and a newer system of comparing “1 != 1” in order to produce a “false” expression. The 1.3 introduced behavior which now takes place in all cases is more correct than both approaches as the IN operator is still used, and does not have the performance issue of the original system.

In addition, the “expanding” parameter system has been generalized so that it also services other dialect-specific use cases where a parameter cannot be accommodated by the DBAPI or backing database; see New “post compile” bound parameters used for LIMIT/OFFSET in Oracle, SQL Server for details.


New Result object

The ResultProxy object has been replaced with the 2.0 -style Result object discussed at ResultProxy replaced with Result which has more refined methods and behaviors. This result object is fully compatible with ResultProxy and includes many new features, that are now applied to both Core and ORM results equally, including methods such as:

When using Core, the object returned is an instance of CursorResult, which continues to feature the same API features as ResultProxy regarding inserted primary keys, defaults, rowcounts, etc. For ORM, a Result subclass will be returned that performs translation of Core rows into ORM rows, and then allows all the same operations to take place.




RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple

The RowProxy class, which represents individual database result rows in a Core result set, is now called Row and is no longer a “proxy” object; what this means is that when the Row object is returned, the row is a simple tuple that contains the data in its final form, already having been processed by result-row handling functions associated with datatypes (examples include turning a date string from the database into a datetime object, a JSON string into a Python json.loads() result, etc.).

The immediate rationale for this is so that the row can act more like a Python named tuple, rather than a mapping, where the values in the tuple are the subject of the __contains__ operator on the tuple, rather than the keys. With Row acting like a named tuple, it is then suitable for use as as replacement for the ORM’s KeyedTuple object, leading to an eventual API where both the ORM and Core deliver result sets that behave identically. Unification of major patterns within ORM and Core is a major goal of SQLAlchemy 2.0, and release 1.4 aims to have most or all of the underlying architectural patterns in place in order to support this process. The note in The “KeyedTuple” object returned by Query is replaced by Row describes the ORM’s use of the Row class.

For release 1.4, the Row class provides an additional subclass LegacyRow, which is used by Core and provides a backwards-compatible version of RowProxy while emitting deprecation warnings for those API features and behaviors that will be moved. ORM Query now makes use of Row directly as a replacement for KeyedTuple.

The LegacyRow class is a transitional class where the __contains__ method is still testing against the keys, not the values, while emitting a deprecation warning when the operation succeeds. Additionally, all the other mapping-like methods on the previous RowProxy are deprecated, including LegacyRow.keys(), LegacyRow.items(), etc. For mapping-like behaviors from a Row object, including support for these methods as well as a key-oriented __contains__ operator, the API going forward will be to first access a special attribute Row._mapping, which will then provide a complete mapping interface to the row, rather than a tuple interface.

Rationale: To behave more like a named tuple rather than a mapping

The difference between a named tuple and a mapping as far as boolean operators can be summarized. Given a “named tuple” in pseudocode as:

row = (id: 5,  name: 'some name')

The biggest cross-incompatible difference is the behavior of __contains__:

"id" in row          # True for a mapping, False for a named tuple
"some name" in row   # False for a mapping, True for a named tuple

In 1.4, when a LegacyRow is returned by a Core result set, the above "id" in row comparison will continue to succeed, however a deprecation warning will be emitted. To use the “in” operator as a mapping, use the Row._mapping attribute:

"id" in row._mapping

SQLAlchemy 2.0’s result object will feature a .mappings() modifier so that these mappings can be received directly:

# using sqlalchemy.future package
for row in result.mappings():

Proxying behavior goes away, was also unnecessary in modern usage

The refactor of Row to behave like a tuple requires that all data values be fully available up front. This is an internal behavior change from that of RowProxy, where result-row processing functions would be invoked at the point of accessing an element of the row, instead of when the row was first fetched. This means for example when retrieving a datetime value from SQLite, the data for the row as present in the RowProxy object would previously have looked like:

row_proxy = (1, '2019-12-31 19:56:58.272106')

and then upon access via __getitem__, the datetime.strptime() function would be used on the fly to convert the above string date into a datetime object. With the new architecture, the datetime() object is present in the tuple when it is returned, the datetime.strptime() function having been called just once up front:

row = (1, datetime.datetime(2019, 12, 31, 19, 56, 58, 272106))

The RowProxy and Row objects in SQLAlchemy are where the majority of SQLAlchemy’s C extension code takes place. This code has been highly refactored to provide the new behavior in an efficient manner, and overall performance has been improved as the design of Row is now considerably simpler.

The rationale behind the previous behavior assumed a usage model where a result row might have dozens or hundreds of columns present, where most of those columns would not be accessed, and for which a majority of those columns would require some result-value processing function. By invoking the processing function only when needed, the goal was that lots of result processing functions would not be necessary, thus increasing performance.

There are many reasons why the above assumptions do not hold:

  1. the vast majority of row-processing functions called were to unicode decode a bytestring into a Python unicode string under Python 2. This was right as Python Unicode was beginning to see use and before Python 3 existed. Once Python 3 was introduced, within a few years, all Python DBAPIs took on the proper role of supporting the delivering of Python Unicode objects directly, under both Python 2 and Python 3, as an option in the former case and as the only way forward in the latter case. Eventually, in most cases it became the default for Python 2 as well. SQLAlchemy’s Python 2 support still enables explicit string-to-unicode conversion for some DBAPIs such as cx_Oracle, however it is now performed at the DBAPI level rather than as a standard SQLAlchemy result row processing function.

  2. The above string conversion, when it is used, was made to be extremely performant via the C extensions, so much so that even in 1.4, SQLAlchemy’s byte-to-unicode codec hook is plugged into cx_Oracle where it has been observed to be more performant than cx_Oracle’s own hook; this meant that the overhead for converting all strings in a row was not as significant as it originally was in any case.

  3. Row processing functions are not used in most other cases; the exceptions are SQLite’s datetime support, JSON support for some backends, some numeric handlers such as string to Decimal. In the case of Decimal, Python 3 also standardized on the highly performant cdecimal implementation, which is not the case in Python 2 which continues to use the much less performant pure Python version.

  4. Fetching full rows where only a few columns are needed is not common within real-world use cases In the early days of SQLAlchemy, database code from other languages of the form “row = fetch(‘SELECT * FROM table’)” was common; using SQLAlchemy’s expression language however, code observed in the wild typically makes use of the specific columns needed.


New Features - ORM

Raiseload for Columns

The “raiseload” feature, which raises InvalidRequestError when an unloaded attribute is accessed, is now available for column-oriented attributes using the defer.raiseload parameter of defer(). This works in the same manner as that of the raiseload() option used by relationship loading:

book = session.query(Book).options(defer(Book.summary, raiseload=True)).first()

# would raise an exception

To configure column-level raiseload on a mapping, the deferred.raiseload parameter of deferred() may be used. The undefer() option may then be used at query time to eagerly load the attribute:

class Book(Base):
    __tablename__ = 'book'

    book_id = Column(Integer, primary_key=True)
    title = Column(String(200), nullable=False)
    summary = deferred(Column(String(2000)), raiseload=True)
    excerpt = deferred(Column(Text), raiseload=True)

book_w_excerpt = session.query(Book).options(undefer(Book.excerpt)).first()

It was originally considered that the existing raiseload() option that works for relationship() attributes be expanded to also support column-oriented attributes. However, this would break the “wildcard” behavior of raiseload(), which is documented as allowing one to prevent all relationships from loading:

    joinedload(Order.items), raiseload('*'))

Above, if we had expanded raiseload() to accommodate for columns as well, the wildcard would also prevent columns from loading and thus be a backwards incompatible change; additionally, it’s not clear if raiseload() covered both column expressions and relationships, how one would achieve the effect above of only blocking relationship loads, without new API being added. So to keep things simple, the option for columns remains on defer():

raiseload() - query option to raise for relationship loads

defer.raiseload - query option to raise for column expression loads

As part of this change, the behavior of “deferred” in conjunction with attribute expiration has changed. Previously, when an object would be marked as expired, and then unexpired via the access of one of the expired attributes, attributes which were mapped as “deferred” at the mapper level would also load. This has been changed such that an attribute that is deferred in the mapping will never “unexpire”, it only loads when accessed as part of the deferral loader.

An attribute that is not mapped as “deferred”, however was deferred at query time via the defer() option, will be reset when the object or attribute is expired; that is, the deferred option is removed. This is the same behavior as was present previously.


ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases

The change in psycopg2 dialect features “execute_values” with RETURNING for INSERT statements by default adds support for “executemany” + “RETURNING” at the same time in Core, which is now enabled for the psycopg2 dialect by default using the psycopg2 execute_values() extension. The ORM flush process now makes use of this feature such that the retrieval of newly generated primary key values and server defaults can be achieved while not losing the performance benefits of being able to batch INSERT statements together. Additionally, psycopg2’s execute_values() extension itself provides a five-fold performance improvement over psycopg2’s default “executemany” implementation, by rewriting an INSERT statement to include many “VALUES” expressions all in one statement rather than invoking the same statement repeatedly, as psycopg2 lacks the ability to PREPARE the statement ahead of time as would normally be expected for this approach to be performant.

SQLAlchemy includes a performance suite within its examples, where we can compare the times generated for the “batch_inserts” runner against 1.3 and 1.4, revealing a 3x-5x speedup for most flavors of batch insert:

# 1.3
$ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
test_flush_no_pk : (100000 iterations); total time 14.051527 sec
test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
test_flush_pk_given : (100000 iterations); total time 7.863680 sec
test_bulk_save : (100000 iterations); total time 6.780378 sec
test_bulk_insert_mappings :  (100000 iterations); total time 5.363070 sec
test_core_insert : (100000 iterations); total time 5.362647 sec

# 1.4 with enhancement
$ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
test_flush_no_pk : (100000 iterations); total time 3.820807 sec
test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
test_flush_pk_given : (100000 iterations); total time 4.037789 sec
test_bulk_save : (100000 iterations); total time 2.604446 sec
test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
test_core_insert : (100000 iterations); total time 0.958976 sec

Note that the execute_values() extension modifies the INSERT statement in the psycopg2 layer, after it’s been logged by SQLAlchemy. So with SQL logging, one will see the parameter sets batched together, but the joining of multiple “values” will not be visible on the application side:

2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING
2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'}  ... displaying 10 of 4999 total bound parameter sets ...  {'data': 'data 4998'}, {'data': 'data 4999'})
2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT

The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side:

2020-06-27 19:08:18.169 EDT [26960] LOG:  statement: INSERT INTO a (data)
VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data
7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'),
... ('data 999'),('data 1000') RETURNING

2020-06-27 19:08:18.175 EDT
[26960] LOG:  statement: INSERT INTO a (data) VALUES ('data 1001'),('data
1002'),('data 1003'),('data 1004'),('data 1005 '),('data 1006'),('data
1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...

The feature batches rows into groups of 1000 by default which can be affected using the executemany_values_page_size argument documented at Psycopg2 Fast Execution Helpers.


ORM Bulk Update and Delete use RETURNING for “fetch” strategy when available

An ORM bulk update or delete that uses the “fetch” strategy:

sess.query(User).filter(User.age > 29).update(
    {"age": User.age - 10}, synchronize_session="fetch"

Will now use RETURNING if the backend database supports it; this currently includes PostgreSQL and SQL Server (the Oracle dialect does not support RETURNING of multiple rows):

UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s RETURNING
[generated in 0.00060s] {'age_int_1': 10, 'age_int_2': 29}
Col ('id',)
Row (2,)
Row (4,)

For backends that do not support RETURNING of multiple rows, the previous approach of emitting SELECT for the primary keys beforehand is still used:

SELECT FROM users WHERE users.age_int > %(age_int_1)s
[generated in 0.00043s] {'age_int_1': 29}
Col ('id',)
Row (2,)
Row (4,)
UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s
[generated in 0.00102s] {'age_int_1': 10, 'age_int_2': 29}

One of the intricate challenges of this change is to support cases such as the horizontal sharding extension, where a single bulk update or delete may be multiplexed among backends some of which support RETURNING and some don’t. The new 1.4 execution archiecture supports this case so that the “fetch” strategy can be left intact with a graceful degrade to using a SELECT, rather than having to add a new “returning” strategy that would not be backend-agnostic.

As part of this change, the “fetch” strategy is also made much more efficient in that it will no longer expire the objects located which match the rows, for Python expressions used in the SET clause which can be evaluated in Python; these are instead assigned directly onto the object in the same way as the “evaluate” strategy. Only for SQL expressions that can’t be evaluated does it fall back to expiring the attributes. The “evaluate” strategy has also been enhanced to fall back to “expire” for a value that cannot be evaluated.

Behavioral Changes - ORM

The “KeyedTuple” object returned by Query is replaced by Row

As discussed at RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple, the Core RowProxy object is now replaced by a class called Row. The base Row object now behaves more fully like a named tuple, and as such it is now used as the basis for tuple-like results returned by the Query object, rather than the previous “KeyedTuple” class.

The rationale is so that by SQLAlchemy 2.0, both Core and ORM SELECT statements will return result rows using the same Row object which behaves like a named tuple. Dictionary-like functionality is available from Row via the Row._mapping attribute. In the interim, Core result sets will make use of a Row subclass LegacyRow which maintains the previous dict/tuple hybrid behavior for backwards compatibility while the Row class will be used directly for ORM tuple results returned by the Query object.

Effort has been made to get most of the featureset of Row to be available within the ORM, meaning that access by string name as well as entity / column should work:

row = s.query(User, Address).join(User.addresses).first()

row._mapping[User]  # same as row[0]
row._mapping[Address]  # same as row[1]
row._mapping["User"]  # same as row[0]
row._mapping["Address"]  # same as row[1]

u1 = aliased(User)
row = s.query(u1).only_return_tuples(True).first()
row._mapping[u1]  # same as row[0]

row = (
    s.query(, Address.email_address)

row._mapping[]  # same as row[0]
row._mapping["id"]  # same as row[0]
row._mapping[]  # same as row[0]


Session does not immediately create a new SessionTransaction object

The Session object’s default behavior of autocommit=False historically has meant that there is always a SessionTransaction object in play, associated with the Session via the Session.transaction attribute. When the given SessionTransaction was complete, due to a commit, rollback, or close, it was immediately replaced with a new one. The SessionTransaction by itself does not imply the usage of any connection-oriented resources, so this long-standing behavior has a particular elegance to it in that the state of Session.transaction is always predictable as non-None.

However, as part of the initiative in #5056 to greatly reduce reference cycles, this assumption means that calling upon Session.close() results in a Session object that still has reference cycles and is more expensive to clean up, not to mention that there is a small overhead in constructing the SessionTransaction object, which meant that there would be unnecessary overhead created for a Session that for example invoked Session.commit() and then Session.close().

As such, it was decided that Session.close() should leave the internal state of self.transaction, now referred to internally as self._transaction, as None, and that a new SessionTransaction should only be created when needed. For consistency and code coverage, this behavior was also expanded to include all the points at which “autobegin” is expected, not just when Session.close() were called.

In particular, this causes a behavioral change for applications which subscribe to the SessionEvents.after_transaction_create() event hook; previously, this event would be emitted when the Session were first constructed, as well as for most actions that closed the previous transaction and would emit SessionEvents.after_transaction_end(). The new behavior is that SessionEvents.after_transaction_create() is emitted on demand, when the Session has not yet created a new SessionTransaction object and mapped objects are associated with the Session through methods like Session.add() and Session.delete(), when the Session.transaction attribute is called upon, when the Session.flush() method has tasks to complete, etc.

Besides the change in when the SessionEvents.after_transaction_create() event is emitted, the change should have no other user-visible impact on the Session object’s behavior; the Session will continue to have the behavior that it remains usable for new operations after Session.close() is called, and the sequencing of how the Session interacts with the Engine and the database itself should also remain unaffected, since these operations were already operating in an on-demand fashion.


Eager loaders emit during unexpire operations

A long sought behavior was that when an expired object is accessed, configured eager loaders will run in order to eagerly load relationships on the expired object when the object is refreshed or otherwise unexpired. This behavior has now been added, so that joinedloaders will add inline JOINs as usual, and selectin/subquery loaders will run an “immediateload” operation for a given relationship, when an expired object is unexpired or an object is refreshed:

>>> a1 = session.query(A).options(joinedload(
>>> = 'new data'
>>> session.commit()

Above, the A object was loaded with a joinedload() option associated with it in order to eagerly load the bs collection. After the session.commit(), the state of the object is expired. Upon accessing the .data column attribute, the object is refreshed and this will now include the joinedload operation as well:

SELECT AS a_id, AS a_data, AS b_1_id, b_1.a_id AS b_1_a_id
FROM a LEFT OUTER JOIN b AS b_1 ON = b_1.a_id

The behavior applies both to loader strategies applied to the relationship() directly, as well as with options used with Query.options(), provided that the object was originally loaded by that query.

For the “secondary” eager loaders “selectinload” and “subqueryload”, the SQL strategy for these loaders is not necessary in order to eagerly load attributes on a single object; so they will instead invoke the “immediateload” strategy in a refresh scenario, which resembles the query emitted by “lazyload”, emitted as an additional query:

>>> a1 = session.query(A).options(selectinload(
>>> = 'new data'
>>> session.commit()
SELECT AS a_id, AS a_data
SELECT AS b_id, b.a_id AS b_a_id
WHERE ? = b.a_id

Note that a loader option does not apply to an object that was introduced into the Session in a different way. That is, if the a1 object were just persisted in this Session, or was loaded with a different query before the eager option had been applied, then the object doesn’t have an eager load option associated with it. This is not a new concept, however users who are looking for the eagerload on refresh behavior may find this to be more noticeable.


Accessing an uninitialized collection attribute on a transient object no longer mutates __dict__

It has always been SQLAlchemy’s behavior that accessing mapped attributes on a newly created object returns an implicitly generated value, rather than raising AttributeError, such as None for scalar attributes or [] for a list-holding relationship:

>>> u1 = User()
>>> u1.addresses

The rationale for the above behavior was originally to make ORM objects easier to work with. Since an ORM object represents an empty row when first created without any state, it is intuitive that its un-accessed attributes would resolve to None (or SQL NULL) for scalars and to empty collections for relationships. In particular, it makes possible an extremely common pattern of being able to mutate the new collection without manually creating and assigning an empty collection first:

>>> u1 = User()
>>> u1.addresses.append(Address())  # no need to assign u1.addresses = []

Up until version 1.0 of SQLAlchemy, the behavior of this initialization system for both scalar attributes as well as collections would be that the None or empty collection would be populated into the object’s state, e.g. __dict__. This meant that the following two operations were equivalent:

>>> u1 = User()
>>> = None  # explicit assignment

>>> u2 = User()
>>>  # implicit assignment just by accessing it

Where above, both u1 and u2 would have the value None populated in the value of the name attribute. Since this is a SQL NULL, the ORM would skip including these values within an INSERT so that SQL-level defaults take place, if any, else the value defaults to NULL on the database side.

In version 1.0 as part of Changes to attribute events and other operations regarding attributes that have no pre-existing value, this behavior was refined so that the None value was no longer populated into __dict__, only returned. Besides removing the mutating side effect of a getter operation, this change also made it possible to set columns that did have server defaults to the value NULL by actually assigning None, which was now distinguished from just reading it.

The change however did not accommodate for collections, where returning an empty collection that is not assigned meant that this mutable collection would be different each time and also would not be able to correctly accommodate for mutating operations (e.g. append, add, etc.) called upon it. While the behavior continued to generally not get in anyone’s way, an edge case was eventually identified in #4519 where this empty collection could be harmful, which is when the object is merged into a session:

>>> u1 = User(id=1)  # create an empty User to merge with id=1 in the database
>>> merged1 = session.merge(u1)  # value of merged1.addresses is unchanged from that of the DB

>>> u2 = User(id=2) # create an empty User to merge with id=2 in the database
>>> u2.addresses
>>> merged2 = session.merge(u2)  # value of merged2.addresses has been emptied in the DB

Above, the .addresses collection on merged1 will contain all the Address() objects that were already in the database. merged2 will not; because it has an empty list implicitly assigned, the .addresses collection will be erased. This is an example of where this mutating side effect can actually mutate the database itself.

While it was considered that perhaps the attribute system should begin using strict “plain Python” behavior, raising AttributeError in all cases for non-existent attributes on non-persistent objects and requiring that all collections be explicitly assigned, such a change would likely be too extreme for the vast number of applications that have relied upon this behavior for many years, leading to a complex rollout / backwards compatibility problem as well as the likelihood that workarounds to restore the old behavior would become prevalent, thus rendering the whole change ineffective in any case.

The change then is to keep the default producing behavior, but to finally make the non-mutating behavior of scalars a reality for collections as well, via the addition of additional mechanics in the collection system. When accessing the empty attribute, the new collection is created and associated with the state, however is not added to __dict__ until it is actually mutated:

>>> u1 = User()
>>> l1 = u1.addresses  # new list is created, associated with the state
>>> assert u1.addresses is l1  # you get the same list each time you access it
>>> assert "addresses" not in u1.__dict__  # but it won't go into __dict__ until it's mutated
>>> from sqlalchemy import inspect
>>> inspect(u1).attrs.addresses.history
History(added=None, unchanged=None, deleted=None)

When the list is changed, then it becomes part of the tracked changes to be persisted to the database:

>>> l1.append(Address())
>>> assert "addresses" in u1.__dict__
>>> inspect(u1).attrs.addresses.history
History(added=[<__main__.Address object at 0x7f49b725eda0>], unchanged=[], deleted=[])

This change is expected to have nearly no impact on existing applications in any way, except that it has been observed that some applications may be relying upon the implicit assignment of this collection, such as to assert that the object contains certain values based on its __dict__:

>>> u1 = User()
>>> u1.addresses
# this will now fail, would pass before
>>> assert {k: v for k, v in u1.__dict__.items() if not k.startswith("_")} == {"addresses": []}

or to ensure that the collection won’t require a lazy load to proceed, the (admittedly awkward) code below will now also fail:

>>> u1 = User()
>>> u1.addresses
>>> s.add(u1)
>>> s.flush()
>>> s.close()
>>> u1.addresses  # <-- will fail, .addresses is not loaded and object is detached

Applications that rely upon the implicit mutating behavior of collections will need to be changed so that they assign the desired collection explicitly:

>>> u1.addresses = []


The “New instance conflicts with existing identity” error is now a warning

SQLAlchemy has always had logic to detect when an object in the Session to be inserted has the same primary key as an object that is already present:

class Product(Base):
    __tablename__ = 'product'

    id = Column(Integer, primary_key=True)

session = Session(engine)

# add Product with primary key 1

# add another Product with same primary key
s.commit()  # <-- will raise FlushError

The change is that the FlushError is altered to be only a warning:

sqlalchemy/orm/ SAWarning: New instance <Product at 0x7f1ff65e0ba8> with identity key (<class '__main__.Product'>, (1,), None) conflicts with persistent instance <Product at 0x7f1ff60a4550>

Subsequent to that, the condition will attempt to insert the row into the database which will emit IntegrityError, which is the same error that would be raised if the primary key identity was not already present in the Session:

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed:

The rationale is to allow code that is using IntegrityError to catch duplicates to function regardless of the existing state of the Session, as is often done using savepoints:

# add another Product with same primary key
    with session.begin_nested():
except exc.IntegrityError:
    print("row already exists")

The above logic was not fully feasible earlier, as in the case that the Product object with the existing identity were already in the Session, the code would also have to catch FlushError, which additionally is not filtered for the specific condition of integrity issues. With the change, the above block behaves consistently with the exception of the warning also being emitted.

Since the logic in question deals with the primary key, all databases emit an integrity error in the case of primary key conflicts on INSERT. The case where an error would not be raised, that would have earlier, is the extremely unusual scenario of a mapping that defines a primary key on the mapped selectable that is more restrictive than what is actually configured in the database schema, such as when mapping to joins of tables or when defining additional columns as part of a composite primary key that is not actually constrained in the database schema. However, these situations also work more consistently in that the INSERT would theoretically proceed whether or not the existing identity were still in the database. The warning can also be configured to raise an exception using the Python warnings filter.


Stricter behavior when querying inheritance mappings using custom queries

This change applies to the scenario where a joined- or single- table inheritance subclass entity is being queried, given a completed SELECT subquery to select from. If the given subquery returns rows that do not correspond to the requested polymorphic identity or identities, an error is raised. Previously, this condition would pass silently under joined table inheritance, returning an invalid subclass, and under single table inheritance, the Query would be adding additional criteria against the subquery to limit the results which could inappropriately interfere with the intent of the query.

Given the example mapping of Employee, Engineer(Employee), Manager(Employee), in the 1.3 series if we were to emit the following query against a joined inheritance mapping:

s = Session(e)

s.add_all([Engineer(), Manager()])



The subquery selects both the Engineer and the Manager rows, and even though the outer query is against Manager, we get a non Manager object back:

SELECT anon_1.type AS anon_1_type, AS anon_1_id
FROM (SELECT employee.type AS type, AS id
FROM employee) AS anon_1
2020-01-29 18:04:13,524 INFO sqlalchemy.engine.base.Engine ()
[<__main__.Engineer object at 0x7f7f5b9a9810>, <__main__.Manager object at 0x7f7f5b9a9750>]

The new behavior is that this condition raises an error:

sqlalchemy.exc.InvalidRequestError: Row with identity key
(<class '__main__.Employee'>, (1,), None) can't be loaded into an object;
the polymorphic discriminator column '%(140205120401296 anon)s.type'
refers to mapped class Engineer->engineer, which is not a sub-mapper of
the requested mapped class Manager->manager

The above error only raises if the primary key columns of that entity are non-NULL. If there’s no primary key for a given entity in a row, no attempt to construct an entity is made.

In the case of single inheritance mapping, the change in behavior is slightly more involved; if Engineer and Manager above are mapped with single table inheritance, in 1.3 the following query would be emitted and only a Manager object is returned:

SELECT anon_1.type AS anon_1_type, AS anon_1_id
FROM (SELECT employee.type AS type, AS id
FROM employee) AS anon_1
WHERE anon_1.type IN (?)
2020-01-29 18:08:32,975 INFO sqlalchemy.engine.base.Engine ('manager',)
[<__main__.Manager object at 0x7ff1b0200d50>]

The Query added the “single table inheritance” criteria to the subquery, editorializing on the intent that was originally set up by it. This behavior was added in version 1.0 in #3891, and creates a behavioral inconsistency between “joined” and “single” table inheritance, and additionally modifies the intent of the given query, which may intend to return additional rows where the columns that correspond to the inheriting entity are NULL, which is a valid use case. The behavior is now equivalent to that of joined table inheritance, where it is assumed that the subquery returns the correct rows and an error is raised if an unexpected polymorphic identity is encountered:

SELECT anon_1.type AS anon_1_type, AS anon_1_id
FROM (SELECT employee.type AS type, AS id
FROM employee) AS anon_1
2020-01-29 18:13:10,554 INFO sqlalchemy.engine.base.Engine ()
Traceback (most recent call last):
# ...
sqlalchemy.exc.InvalidRequestError: Row with identity key
(<class '__main__.Employee'>, (1,), None) can't be loaded into an object;
the polymorphic discriminator column '%(140700085268432 anon)s.type'
refers to mapped class Engineer->employee, which is not a sub-mapper of
the requested mapped class Manager->employee

The correct adjustment to the situation as presented above which worked on 1.3 is to adjust the given subquery to correctly filter the rows based on the discriminator column:

        s.query(Employee).filter(Employee.discriminator == 'manager').

SELECT anon_1.type AS anon_1_type, AS anon_1_id
FROM (SELECT employee.type AS type, AS id
FROM employee
WHERE employee.type = ?) AS anon_1
2020-01-29 18:14:49,770 INFO sqlalchemy.engine.base.Engine ('manager',)
[<__main__.Manager object at 0x7f70e13fca90>]


New Features - Core

Built-in FROM linting will warn for any potential cartesian products in a SELECT statement

As the Core expression language as well as the ORM are built on an “implicit FROMs” model where a particular FROM clause is automatically added if any part of the query refers to it, a common issue is the case where a SELECT statement, either a top level statement or an embedded subquery, contains FROM elements that are not joined to the rest of the FROM elements in the query, causing what’s referred to as a “cartesian product” in the result set, i.e. every possible combination of rows from each FROM element not otherwise joined. In relational databases, this is nearly always an undesirable outcome as it produces an enormous result set full of duplicated, uncorrelated data.

SQLAlchemy, for all of its great features, is particularly prone to this sort of issue happening as a SELECT statement will have elements added to its FROM clause automatically from any table seen in the other clauses. A typical scenario looks like the following, where two tables are JOINed together, however an additional entry in the WHERE clause that perhaps inadvertently does not line up with these two tables will create an additional FROM entry:

address_alias = aliased(Address)

q = session.query(User).\
    join(address_alias, User.addresses).\
    filter(Address.email_address == 'foo')

The above query selects from a JOIN of User and address_alias, the latter of which is an alias of the Address entity. However, the Address entity is used within the WHERE clause directly, so the above would result in the SQL:

SELECT AS users_id, AS users_name,
    users.fullname AS users_fullname,
    users.nickname AS users_nickname
FROM addresses, users JOIN addresses AS addresses_1 ON = addresses_1.user_id
WHERE addresses.email_address = :email_address_1

In the above SQL, we can see what SQLAlchemy developers term “the dreaded comma”, as we see “FROM addresses, users JOIN addresses” in the FROM clause which is the classic sign of a cartesian product; where a query is making use of JOIN in order to join FROM clauses together, however because one of them is not joined, it uses a comma. The above query will return a full set of rows that join the “user” and “addresses” table together on the “id / user_id” column, and will then apply all those rows into a cartesian product against every row in the “addresses” table directly. That is, if there are ten user rows and 100 rows in addresses, the above query will return its expected result rows, likely to be 100 as all address rows would be selected, multiplied by 100 again, so that the total result size would be 10000 rows.

The “table1, table2 JOIN table3” pattern is one that also occurs quite frequently within the SQLAlchemy ORM due to either subtle mis-application of ORM features particularly those related to joined eager loading or joined table inheritance, as well as a result of SQLAlchemy ORM bugs within those same systems. Similar issues apply to SELECT statements that use “implicit joins”, where the JOIN keyword is not used and instead each FROM element is linked with another one via the WHERE clause.

For some years there has been a recipe on the Wiki that applies a graph algorithm to a select() construct at query execution time and inspects the structure of the query for these un-linked FROM clauses, parsing through the WHERE clause and all JOIN clauses to determine how FROM elements are linked together and ensuring that all the FROM elements are connected in a single graph. This recipe has now been adapted to be part of the SQLCompiler itself where it now optionally emits a warning for a statement if this condition is detected. The warning is enabled using the create_engine.enable_from_linting flag and is enabled by default. The computational overhead of the linter is very low, and additionally it only occurs during statement compilation which means for a cached SQL statement it only occurs once.

Using this feature, our ORM query above will emit a warning:

>>> q.all()
SAWarning: SELECT statement has a cartesian product between FROM
element(s) "addresses_1", "users" and FROM element "addresses".
Apply join condition(s) between each element to resolve.

The linter feature accommodates not just for tables linked together through the JOIN clauses but also through the WHERE clause Above, we can add a WHERE clause to link the new Address entity with the previous address_alias entity and that will remove the warning:

q = session.query(User).\
    join(address_alias, User.addresses).\
    filter(Address.email_address == 'foo').\
    filter( ==  # resolve cartesian products,
                                            # will no longer warn

The cartesian product warning considers any kind of link between two FROM clauses to be a resolution, even if the end result set is still wasteful, as the linter is intended only to detect the common case of a FROM clause that is completely unexpected. If the FROM clause is referred to explicitly elsewhere and linked to the other FROMs, no warning is emitted:

q = session.query(User).\
    join(address_alias, User.addresses).\
    filter(Address.email_address == 'foo').\
    filter( >  # will generate a lot of rows,
                                           # but no warning

Full cartesian products are also allowed if they are explicitly stated; if we wanted for example the cartesian product of User and Address, we can JOIN on true() so that every row will match with every other; the following query will return all rows and produce no warnings:

from sqlalchemy import true

# intentional cartesian product
q = session.query(User).join(Address, true())  # intentional cartesian product

The warning is only generated by default when the statement is compiled by the Connection for execution; calling the ClauseElement.compile() method will not emit a warning unless the linting flag is supplied:

>>> from sqlalchemy.sql import FROM_LINTING
>>> print(q.statement.compile(linting=FROM_LINTING))
SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users".  Apply join condition(s) between each element to resolve.
SELECT,, users.fullname, users.nickname
FROM addresses, users JOIN addresses AS addresses_1 ON = addresses_1.user_id
WHERE addresses.email_address = :email_address_1


Behavior Changes - Core

SELECT objects and derived FROM clauses allow for duplicate columns and column labels

This change allows that the select() construct now allows for duplicate column labels as well as duplicate column objects themselves, so that result tuples are organized and ordered in the identical way in that the columns were selected. The ORM Query already works this way, so this change allows for greater cross-compatibility between the two, which is a key goal of the 2.0 transition:

>>> from sqlalchemy import column, select
>>> c1, c2, c3, c4 = column('c1'), column('c2'), column('c3'), column('c4')
>>> stmt = select([c1, c2, c3.label('c2'), c2, c4])
>>> print(stmt)
SELECT c1, c2, c3 AS c2, c2, c4

To support this change, the ColumnCollection used by SelectBase as well as for derived FROM clauses such as subqueries also support duplicate columns; this includes the new SelectBase.selected_columns attribute, the deprecated SelectBase.c attribute, as well as the FromClause.c attribute seen on constructs such as Subquery and Alias:

>>> list(stmt.selected_columns)
    <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>,
    <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
    <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>,
    <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
    <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4>

>>> print(stmt.subquery().select())
SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4
FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1

ColumnCollection also allows access by integer index to support when the string “key” is ambiguous:

>>> stmt.selected_columns[2]
<sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>

To suit the use of ColumnCollection in objects such as Table and PrimaryKeyConstraint, the old “deduplicating” behavior which is more critical for these objects is preserved in a new class DedupeColumnCollection.

The change includes that the familiar warning "Column %r on table %r being replaced by %r, which has the same key.  Consider use_labels for select() statements." is removed; the Select.apply_labels() is still available and is still used by the ORM for all SELECT operations, however it does not imply deduplication of column objects, although it does imply deduplication of implicitly generated labels:

>>> from sqlalchemy import table
>>> user = table('user', column('id'), column('name'))
>>> stmt = select([,,]).apply_labels()
>>> print(stmt)
SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1
FROM "user"

Finally, the change makes it easier to create UNION and other _selectable.CompoundSelect objects, by ensuring that the number and position of columns in a SELECT statement mirrors what was given, in a use case such as:

>>> s1 = select([user,])
>>> s2 = select([c1, c2, c3])
>>> from sqlalchemy import union
>>> u = union(s1, s2)
>>> print(u)
SELECT "user".id, "user".name, "user".id
FROM "user" UNION SELECT c1, c2, c3


Improved column labeling for simple column expressions using CAST or similar

A user pointed out that the PostgreSQL database has a convenient behavior when using functions like CAST against a named column, in that the result column name is named the same as the inner expression:


(1 row)

This allows one to apply CAST to table columns while not losing the column name (above using the name "data") in the result row. Compare to databases such as MySQL/MariaDB, as well as most others, where the column name is taken from the full SQL expression and is not very portable:

MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo;
| CAST(data AS CHAR) |
| 5                  |
1 row in set (0.003 sec)

In SQLAlchemy Core expressions, we never deal with a raw generated name like the above, as SQLAlchemy applies auto-labeling to expressions like these, which are up until now always a so-called “anonymous” expression:

>>> print(select([cast(, String)]))
SELECT CAST( AS VARCHAR) AS anon_1     # old behavior
FROM foo

These anonymous expressions were necessary as SQLAlchemy’s ResultProxy made heavy use of result column names in order to match up datatypes, such as the String datatype which used to have result-row-processing behavior, to the correct column, so most importantly the names had to be both easy to determine in a database-agnostic manner as well as unique in all cases. In SQLAlchemy 1.0 as part of #918, this reliance on named columns in result rows (specifically the cursor.description element of the PEP-249 cursor) was scaled back to not be necessary for most Core SELECT constructs; in release 1.4, the system overall is becoming more comfortable with SELECT statements that have duplicate column or label names such as in SELECT objects and derived FROM clauses allow for duplicate columns and column labels. So we now emulate PostgreSQL’s reasonable behavior for simple modifications to a single column, most prominently with CAST:

>>> print(select([cast(, String)]))
FROM foo

For CAST against expressions that don’t have a name, the previous logic is used to generate the usual “anonymous” labels:

>>> print(select([cast('hi there,' +, String)]))
SELECT CAST(:data_1 + AS VARCHAR) AS anon_1
FROM foo

A cast() against a Label, despite having to omit the label expression as these don’t render inside of a CAST, will nonetheless make use of the given name:

>>> print(select([cast(('hi there,' +'hello_data'), String)]))
SELECT CAST(:data_1 + AS VARCHAR) AS hello_data
FROM foo

And of course as was always the case, Label can be applied to the expression on the outside to apply an “AS <name>” label directly:

>>> print(select([cast(('hi there,' +, String).label('hello_data')]))
SELECT CAST(:data_1 + AS VARCHAR) AS hello_data
FROM foo


New “post compile” bound parameters used for LIMIT/OFFSET in Oracle, SQL Server

A major goal of the 1.4 series is to establish that all Core SQL constructs are completely cacheable, meaning that a particular Compiled structure will produce an identical SQL string regardless of any SQL parameters used with it, which notably includes those used to specify the LIMIT and OFFSET values, typically used for pagination and “top N” style results.

While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many years, a few outliers remained where such parameters were not allowed, including a SQL Server “TOP N” statement, such as:

SELECT TOP 5, FROM mytable

as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will use if the optimize_limits=True parameter is passed to create_engine() with an Oracle URL) does not allow them, but also that using bound parameters with ROWNUM comparisons has been reported as producing slower query plans:

    SELECT /*+ FIRST_ROWS(5) */ AS id, AS data,
    ROWNUM AS ora_rn FROM (
        SELECT, FROM mytable
    ) anon_2
    WHERE ROWNUM <= :param_1
) anon_1 WHERE ora_rn > :param_2

In order to allow for all statements to be unconditionally cacheable at the compilation level, a new form of bound parameter called a “post compile” parameter has been added, which makes use of the same mechanism as that of “expanding IN parameters”. This is a bindparam() that behaves identically to any other bound parameter except that parameter value will be rendered literally into the SQL string before sending it to the DBAPI cursor.execute() method. The new parameter is used internally by the SQL Server and Oracle dialects, so that the drivers receive the literal rendered value but the rest of SQLAlchemy can still consider this as a bound parameter. The above two statements when stringified using str(statement.compile(dialect=<dialect>)) now look like:



    SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */ AS id, AS data,
    ROWNUM AS ora_rn FROM (
        SELECT, FROM mytable
    ) anon_2
) anon_1 WHERE ora_rn > [POSTCOMPILE_param_2]

The [POSTCOMPILE_<param>] format is also what is seen when an “expanding IN” is used.

When viewing the SQL logging output, the final form of the statement will be seen:

    SELECT /*+ FIRST_ROWS(5) */ AS id, AS data,
    ROWNUM AS ora_rn FROM (
        SELECT AS id, AS data FROM mytable
    ) anon_2
) anon_1 WHERE ora_rn > 3

The “post compile parameter” feature is exposed as public API through the bindparam.literal_execute parameter, however is currently not intended for general use. The literal values are rendered using the TypeEngine.literal_processor() of the underlying datatype, which in SQLAlchemy has extremely limited scope, supporting only integers and simple string values.


Connection-level transactions can now be inactive based on subtransaction

A Connection now includes the behavior where a Transaction can be made inactive due to a rollback on an inner transaction, however the Transaction will not clear until it is itself rolled back.

This is essentially a new error condition which will disallow statement executions to proceed on a Connection if an inner “sub” transaction has been rolled back. The behavior works very similarly to that of the ORM Session, where if an outer transaction has been begun, it needs to be rolled back to clear the invalid transaction; this behavior is described in “This Session’s transaction has been rolled back due to a previous exception during flush.” (or similar)

While the Connection has had a less strict behavioral pattern than the Session, this change was made as it helps to identify when a subtransaction has rolled back the DBAPI transaction, however the external code isn’t aware of this and attempts to continue proceeding, which in fact runs operations on a new transaction. The “test harness” pattern described at Joining a Session into an External Transaction (such as for test suites) is the common place for this to occur.

The new behavior is described in the errors page at This connection is on an inactive transaction. Please rollback() fully before proceeding.

Dialect Changes

psycopg2 version 2.7 or higher is required for the PostgreSQL psycopg2 dialect

The psycopg2 dialect relies upon many features of psycopg2 released in the past few years. To simplify the dialect, version 2.7, released in March, 2017 is now the minimum version required.

psycopg2 dialect features “execute_values” with RETURNING for INSERT statements by default

The first half of a significant performance enhancement for PostgreSQL when using both Core and ORM, the psycopg2 dialect now uses psycopg2.extras.execute_values() by default for compiled INSERT statements and also implements RETURNING support in this mode. The other half of this change is ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases which allows the ORM to take advantage of RETURNING with executemany (i.e. batching of INSERT statements) so that ORM bulk inserts with psycopg2 are up to 400% faster depending on specifics.

This extension method allows many rows to be INSERTed within a single statement, using an extended VALUES clause for the statement. While SQLAlchemy’s insert() construct already supports this syntax via the Insert.values() method, the extension method allows the construction of the VALUES clause to occur dynamically when the statement is executed as an “executemany” execution, which is what occurs when one passes a list of parameter dictionaries to Connection.execute(). It also occurs beyond the cache boundary so that the INSERT statement may be cached before the VALUES are rendered.

A quick test of the execute_values() approach using the script in the Performance example suite reveals an approximate fivefold performance increase:

$ python -m examples.performance bulk_inserts --test test_core_insert --num 100000 --dburl postgresql://scott:tiger@localhost/test

# 1.3
test_core_insert : A single Core INSERT construct inserting mappings in bulk. (100000 iterations); total time 5.229326 sec

# 1.4
test_core_insert : A single Core INSERT construct inserting mappings in bulk. (100000 iterations); total time 0.944007 sec

Support for the “batch” extension was added in version 1.2 in Support for Batch Mode / Fast Execution Helpers, and enhanced to include support for the execute_values extension in 1.3 in #4623. In 1.4 the execute_values extension is now being turned on by default for INSERT statements; the “batch” extension for UPDATE and DELETE remains off by default.

In addition, the execute_values extension function supports returning the rows that are generated by RETURNING as an aggregated list. The psycopg2 dialect will now retrieve this list if the given insert() construct requests returning via the Insert.returning() method or similar methods intended to return generated defaults; the rows are then installed in the result so that they are retreieved as though they came from the cursor directly. This allows tools like the ORM to use batched inserts in all cases, which is expected to provide a dramatic performance improvement.

The executemany_mode feature of the psycopg2 dialect has been revised with the following changes:

  • A new mode "values_only" is added. This mode uses the very performant psycopg2.extras.execute_values() extension method for compiled INSERT statements run with executemany(), but does not use execute_batch() for UPDATE and DELETE statements. This new mode is now the default setting for the psycopg2 dialect.

  • The existing "values" mode is now named "values_plus_batch". This mode will use execute_values for INSERT statements and execute_batch for UPDATE and DELETE statements. The mode is not enabled by default because it disables the proper functioning of cursor.rowcount with UPDATE and DELETE statements executed with executemany().

  • RETURNING support is enabled for "values_only" and "values" for INSERT statements. The psycopg2 dialect will receive the rows back from psycopg2 using the fetch=True flag and install them into the result set as though they came directly from the cursor (which they ulimately did, however psycopg2’s extension function has aggregated multiple batches into one list).

  • The default “page_size” setting for execute_values has been increased from 100 to 1000. The default remains at 100 for the execute_batch function. These parameters may both be modified as was the case before.

  • The use_batch_mode flag that was part of the 1.2 version of the feature is removed; the behavior remains controllable via the executemany_mode flag added in 1.3.

  • The Core engine and dialect has been enhanced to support executemany plus returning mode, currently only available with psycopg2, by providing new CursorResult.inserted_primary_key_rows and CursorResult.returned_default_rows accessors.


Removed “join rewriting” logic from SQLite dialect; updated imports

Dropped support for right-nested join rewriting to support old SQLite versions prior to 3.7.16, released in 2013. It is not expected that any modern Python versions rely upon this limitation.

The behavior was first introduced in 0.9 and was part of the larger change of allowing for right nested joins as described at Many JOIN and LEFT OUTER JOIN expressions will no longer be wrapped in (SELECT * FROM ..) AS ANON_1. However the SQLite workaround produced many regressions in the 2013-2014 period due to its complexity. In 2016, the dialect was modified so that the join rewriting logic would only occur for SQLite versions prior to 3.7.16 after bisection was used to identify where SQLite fixed its support for this construct, and no further issues were reported against the behavior (even though some bugs were found internally). It is now anticipated that there are little to no Python builds for Python 2.7 or 3.5 and above (the supported Python versions) which would include a SQLite version prior to 3.7.17, and the behavior is only necessary only in more complex ORM joining scenarios. A warning is now emitted if the installed SQLite version is older than 3.7.16.

In related changes, the module imports for SQLite no longer attempt to import the “pysqlite2” driver on Python 3 as this driver does not exist on Python 3; a very old warning for old pysqlite2 versions is also dropped.


Added Sequence support for MariaDB 10.3

The MariaDB database as of 10.3 supports sequences. SQLAlchemy’s MySQL dialect now implements support for the Sequence object against this database, meaning “CREATE SEQUENCE” DDL will be emitted for a Sequence that is present in a Table or MetaData collection in the same way as it works for backends such as PostgreSQL, Oracle, when the dialect’s server version check has confirmed the database is MariaDB 10.3 or greater. Additionally, the Sequence will act as a column default and primary key generation object when used in these ways.

Since this change will impact the assumptions both for DDL as well as the behavior of INSERT statements for an application that is currently deployed against MariaDB 10.3 which also happens to make explicit use the Sequence construct within its table definitions, it is important to note that Sequence supports a flag Sequence.optional which is used to limit the scenarios in which the Sequence to take effect. When “optional” is used on a Sequence that is present in the integer primary key column of a table:

    "some_table", metadata,
    Column("id", Integer, Sequence("some_seq", optional=True), primary_key=True)

The above Sequence is only used for DDL and INSERT statements if the target database does not support any other means of generating integer primary key values for the column. That is, the Oracle database above would use the sequence, however the PostgreSQL and MariaDB 10.3 databases would not. This may be important for an existing application that is upgrading to SQLAlchemy 1.4 which may not have emitted DDL for this Sequence against its backing database, as an INSERT statement will fail if it seeks to use a sequence that was not created.


Added Sequence support distinct from IDENTITY to SQL Server

The Sequence construct is now fully functional with Microsoft SQL Server. When applied to a Column, the DDL for the table will no longer include IDENTITY keywords and instead will rely upon “CREATE SEQUENCE” to ensure a sequence is present which will then be used for INSERT statements on the table.

The Sequence prior to version 1.3 was used to control parameters for the IDENTITY column in SQL Server; this usage emitted deprecation warnings throughout 1.3 and is now removed in 1.4. For control of parameters for an IDENTITY column, the mssql_identity_start and mssql_identity_increment parameters should be used; see the MSSQL dialect documentation linked below.



Previous: Changes and Migration Next: SQLAlchemy 2.0 Transition