SQLAlchemy 2.1 Documentation
Changes and Migration
- What’s New in SQLAlchemy 2.1?¶
- Introduction
- General
- New Features and Improvements - ORM
- New Features and Improvements - Core
- 2.1 Changelog
- 2.0 Changelog
- 1.4 Changelog
- 1.3 Changelog
- 1.2 Changelog
- 1.1 Changelog
- 1.0 Changelog
- 0.9 Changelog
- 0.8 Changelog
- 0.7 Changelog
- 0.6 Changelog
- 0.5 Changelog
- 0.4 Changelog
- 0.3 Changelog
- 0.2 Changelog
- 0.1 Changelog
- SQLAlchemy 2.0 - Major Migration Guide
- What’s New in SQLAlchemy 2.0?
- What’s New in SQLAlchemy 1.4?
- What’s New in SQLAlchemy 1.3?
- What’s New in SQLAlchemy 1.2?
- What’s New in SQLAlchemy 1.1?
- What’s New in SQLAlchemy 1.0?
- What’s New in SQLAlchemy 0.9?
- What’s New in SQLAlchemy 0.8?
- What’s New in SQLAlchemy 0.7?
- What’s New in SQLAlchemy 0.6?
- What’s new in SQLAlchemy 0.5?
- What’s new in SQLAlchemy 0.4?
Project Versions
- Previous: Changes and Migration
- Next: 2.1 Changelog
- Up: Home
- On this page:
- What’s New in SQLAlchemy 2.1?
- Introduction
- General
- New Features and Improvements - ORM
- New Features and Improvements - Core
What’s New in SQLAlchemy 2.1?¶
About this Document
This document describes changes between SQLAlchemy version 2.0 and version 2.1.
Introduction¶
This guide introduces what’s new in SQLAlchemy version 2.1 and also documents changes which affect users migrating their applications from the 2.0 series of SQLAlchemy to 2.1.
Please carefully review the sections on behavioral changes for potentially backwards-incompatible changes in behavior.
General¶
Asyncio “greenlet” dependency no longer installs by default¶
SQLAlchemy 1.4 and 2.0 used a complex expression to determine if the
greenlet
dependency, needed by the asyncio
extension, could be installed from pypi using a pre-built wheel instead
of having to build from source. This because the source build of greenlet
is not always trivial on some platforms.
Disadvantages to this approach included that SQLAlchemy needed to track
exactly which versions of greenlet
were published as wheels on pypi;
the setup expression led to problems with some package management tools
such as poetry
; it was not possible to install SQLAlchemy without
greenlet
being installed, even though this is completely feasible
if the asyncio extension is not used.
These problems are all solved by keeping greenlet
entirely within the
[asyncio]
target. The only downside is that users of the asyncio extension
need to be aware of this extra installation dependency.
New Features and Improvements - ORM¶
Session autoflush behavior simplified to be unconditional¶
Session autoflush behavior has been simplified to unconditionally flush the session each time an execution takes place, regardless of whether an ORM statement or Core statement is being executed. This change eliminates the previous conditional logic that only flushed when ORM-related statements were detected.
Previously, the session would only autoflush when executing ORM queries:
# 2.0 behavior - autoflush only occurred for ORM statements
session.add(User(name="new user"))
# This would trigger autoflush
users = session.execute(select(User)).scalars().all()
# This would NOT trigger autoflush
result = session.execute(text("SELECT * FROM users"))
In 2.1, autoflush occurs for all statement executions:
# 2.1 behavior - autoflush occurs for all executions
session.add(User(name="new user"))
# Both of these now trigger autoflush
users = session.execute(select(User)).scalars().all()
result = session.execute(text("SELECT * FROM users"))
This change provides more consistent and predictable session behavior across all types of SQL execution.
ORM Relationship allows callable for back_populates¶
To help produce code that is more amenable to IDE-level linting and type
checking, the relationship.back_populates
parameter now
accepts both direct references to a class-bound attribute as well as
lambdas which do the same:
class A(Base):
__tablename__ = "a"
id: Mapped[int] = mapped_column(primary_key=True)
# use a lambda: to link to B.a directly when it exists
bs: Mapped[list[B]] = relationship(back_populates=lambda: B.a)
class B(Base):
__tablename__ = "b"
id: Mapped[int] = mapped_column(primary_key=True)
a_id: Mapped[int] = mapped_column(ForeignKey("a.id"))
# A.bs already exists, so can link directly
a: Mapped[A] = relationship(back_populates=A.bs)
ORM Mapped Dataclasses no longer populate implicit default
, collection-based default_factory
in __dict__
¶
This behavioral change addresses a widely reported issue with SQLAlchemy’s
Declarative Dataclass Mapping feature that was introduced in 2.0.
SQLAlchemy ORM has always featured a behavior where a particular attribute on
an ORM mapped class will have different behaviors depending on if it has an
actively set value, including if that value is None
, versus if the
attribute is not set at all. When Declarative Dataclass Mapping was introduced, the
mapped_column.default
parameter introduced a new capability
which is to set up a dataclass-level default to be present in the generated
__init__
method. This had the unfortunate side effect of breaking various
popular workflows, the most prominent of which is creating an ORM object with
the foreign key value in lieu of a many-to-one reference:
class Base(MappedAsDataclass, DeclarativeBase):
pass
class Parent(Base):
__tablename__ = "parent"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
related_id: Mapped[int | None] = mapped_column(ForeignKey("child.id"), default=None)
related: Mapped[Child | None] = relationship(default=None)
class Child(Base):
__tablename__ = "child"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
In the above mapping, the __init__
method generated for Parent
would in Python code look like this:
def __init__(self, related_id=None, related=None): ...
This means that creating a new Parent
with related_id
only would populate
both related_id
and related
in __dict__
:
# 2.0 behavior; will INSERT NULL for related_id due to the presence
# of related=None
>>> p1 = Parent(related_id=5)
>>> p1.__dict__
{'related_id': 5, 'related': None, '_sa_instance_state': ...}
The None
value for 'related'
means that SQLAlchemy favors the non-present
related Child
over the present value for 'related_id'
, which would be
discarded, and NULL
would be inserted for 'related_id'
instead.
In the new behavior, the __init__
method instead looks like the example below,
using a special constant DONT_SET
indicating a non-present value for 'related'
should be ignored. This allows the class to behave more closely to how
SQLAlchemy ORM mapped classes traditionally operate:
def __init__(self, related_id=DONT_SET, related=DONT_SET): ...
We then get a __dict__
setup that will follow the expected behavior of
omitting related
from __dict__
and later running an INSERT with
related_id=5
:
# 2.1 behavior; will INSERT 5 for related_id
>>> p1 = Parent(related_id=5)
>>> p1.__dict__
{'related_id': 5, '_sa_instance_state': ...}
Dataclass defaults are delivered via descriptor instead of __dict__¶
The above behavior goes a step further, which is that in order to
honor default values that are something other than None
, the value of the
dataclass-level default (i.e. set using any of the
mapped_column.default
,
column_property.default
, or deferred.default
parameters) is directed to be delivered at the
Python descriptor level using mechanisms in SQLAlchemy’s attribute
system that normally return None
for un-popualted columns, so that even though the default is not
populated into __dict__
, it’s still delivered when the attribute is
accessed. This behavior is based on what Python dataclasses itself does
when a default is indicated for a field that also includes init=False
.
In the example below, an immutable default "default_status"
is applied to a column called status
:
class Base(MappedAsDataclass, DeclarativeBase):
pass
class SomeObject(Base):
__tablename__ = "parent"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
status: Mapped[str] = mapped_column(default="default_status")
In the above mapping, constructing SomeObject
with no parameters will
deliver no values inside of __dict__
, but will deliver the default
value via descriptor:
# object is constructed with no value for ``status``
>>> s1 = SomeObject()
# the default value is not placed in ``__dict__``
>>> s1.__dict__
{'_sa_instance_state': ...}
# but the default value is delivered at the object level via descriptor
>>> s1.status
'default_status'
# the value still remains unpopulated in ``__dict__``
>>> s1.__dict__
{'_sa_instance_state': ...}
The value passed
as mapped_column.default
is also assigned as was the
case before to the Column.default
parameter of the
underlying Column
, where it takes
place as a Python-level default for INSERT statements. So while __dict__
is never populated with the default value on the object, the INSERT
still includes the value in the parameter set. This essentially modifies
the Declarative Dataclass Mapping system to work more like traditional
ORM mapped classes, where a “default” means just that, a column level
default.
Dataclass defaults are accessible on objects even without init¶
As the new behavior makes use of descriptors in a similar way as Python
dataclasses do themselves when init=False
, the new feature implements
this behavior as well. This is an all new behavior where an ORM mapped
class can deliver a default value for fields even if they are not part of
the __init__()
method at all. In the mapping below, the status
field is configured with init=False
, meaning it’s not part of the
constructor at all:
class Base(MappedAsDataclass, DeclarativeBase):
pass
class SomeObject(Base):
__tablename__ = "parent"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
status: Mapped[str] = mapped_column(default="default_status", init=False)
When we construct SomeObject()
with no arguments, the default is accessible
on the instance, delivered via descriptor:
>>> so = SomeObject()
>>> so.status
default_status
default_factory for collection-based relationships internally uses DONT_SET¶
A late add to the behavioral change brings equivalent behavior to the
use of the relationship.default_factory
parameter with
collection-based relationships. This attribute is documented <orm_declarative_dc_relationships>
as being limited to exactly the collection class that’s stated on the left side
of the annotation, which is now enforced at mapper configuration time:
class Parent(Base):
__tablename__ = "parents"
id: Mapped[int] = mapped_column(primary_key=True, init=False)
name: Mapped[str]
children: Mapped[list["Child"]] = relationship(default_factory=list)
With the above mapping, the actual
relationship.default_factory
parameter is replaced internally
to instead use the same DONT_SET
constant that’s applied to
relationship.default
for many-to-one relationships.
SQLAlchemy’s existing collection-on-attribute access behavior occurs as always
on access:
>>> p1 = Parent(name="p1")
>>> p1.children
[]
This change to relationship.default_factory
accommodates a
similar merge-based condition where an empty collection would be forced into
a new object that in fact wants a merged collection to arrive.
Related Changes¶
This change includes the following API changes:
The
relationship.default
parameter, when present, only accepts a value ofNone
, and is only accepted when the relationship is ultimately a many-to-one relationship or one that establishesrelationship.uselist
asFalse
.The
mapped_column.default
andmapped_column.insert_default
parameters are mutually exclusive, and only one may be passed at a time. The behavior of the two parameters is equivalent at theColumn
level, however at the Declarative Dataclass Mapping level, onlymapped_column.default
actually sets the dataclass-level default with descriptor access; usingmapped_column.insert_default
will have the effect of the object attribute defaulting toNone
on the instance until the INSERT takes place, in the same way it works on traditional ORM mapped classes.
New Features and Improvements - Core¶
Row
now represents individual column types directly without Tuple
¶
SQLAlchemy 2.0 implemented a broad array of PEP 484 typing throughout
all components, including a new ability for row-returning statements such
as select()
to maintain track of individual column types, which
were then passed through the execution phase onto the Result
object and then to the individual Row
objects. Described
at SQL Expression / Statement / Result Set Typing, this approach solved several issues
with statement / row typing, but some remained unsolvable. In 2.1, one
of those issues, that the individual column types needed to be packaged
into a typing.Tuple
, is now resolved using new PEP 646 integration,
which allows for tuple-like types that are not actually typed as Tuple
.
In SQLAlchemy 2.0, a statement such as:
stmt = select(column("x", Integer), column("y", String))
Would be typed as:
Select[Tuple[int, str]]
In 2.1, it’s now typed as:
Select[int, str]
When executing stmt
, the Result
and Row
objects will be typed as Result[int, str]
and Row[int, str]
, respectively.
The prior workaround using Row._t
to type as a real Tuple
is no longer needed and projects can migrate off this pattern.
Mypy users will need to make use of Mypy 1.7 or greater for pep-646 integration to be available.
Limitations¶
Not yet solved by pep-646 or any other pep is the ability for an arbitrary
number of expressions within Select
and others to be mapped to
row objects, without stating each argument position explicitly within typing
annotations. To work around this issue, SQLAlchemy makes use of automated
“stub generation” tools to generate hardcoded mappings of different numbers of
positional arguments to constructs like select()
to resolve to
individual Unpack[]
expressions (in SQLAlchemy 2.0, this generation
produced Tuple[]
annotations instead). This means that there are arbitrary
limits on how many specific column expressions will be typed within the
Row
object, without restoring to Any
for remaining
expressions; for select()
, it’s currently ten expressions, and
for DML expressions like insert()
that use Insert.returning()
,
it’s eight. If and when a new pep that provides a Map
operator
to pep-646 is proposed, this limitation can be lifted. [1] Originally, it was
mistakenly assumed that this limitation prevented pep-646 from being usable at all,
however, the Unpack
construct does in fact replace everything that
was done using Tuple
in 2.0.
An additional limitation for which there is no proposed solution is that
there’s no way for the name-based attributes on Row
to be
automatically typed, so these continue to be typed as Any
(e.g. row.x
and row.y
for the above example). With current language features,
this could only be fixed by having an explicit class-based construct that
allows one to compose an explicit Row
with explicit fields
up front, which would be verbose and not automatic.
URL stringify and parse now supports URL escaping for the “database” portion¶
A URL that includes URL-escaped characters in the database portion will now parse with conversion of those escaped characters:
>>> from sqlalchemy import make_url
>>> u = make_url("driver://user:pass@host/database%3Fname")
>>> u.database
'database?name'
Previously, such characters would not be unescaped:
>>> # pre-2.1 behavior
>>> from sqlalchemy import make_url
>>> u = make_url("driver://user:pass@host/database%3Fname")
>>> u.database
'database%3Fname'
This change also applies to the stringify side; most special characters in the database name will be URL escaped, omitting a few such as plus signs and slashes:
>>> from sqlalchemy import URL
>>> u = URL.create("driver", database="a?b=c")
>>> str(u)
'driver:///a%3Fb%3Dc'
Where the above URL correctly round-trips to itself:
>>> make_url(str(u))
driver:///a%3Fb%3Dc
>>> make_url(str(u)).database == u.database
True
Whereas previously, special characters applied programmatically would not be escaped in the result, leading to a URL that does not represent the original database portion. Below, b=c is part of the query string and not the database portion:
>>> from sqlalchemy import URL
>>> u = URL.create("driver", database="a?b=c")
>>> str(u)
'driver:///a?b=c'
Potential breaking change to odbc_connect= handling for mssql+pyodbc¶
Fixed a mssql+pyodbc issue where valid plus signs in an already-unquoted
odbc_connect=
(raw DBAPI) connection string were replaced with spaces.
Previously, the pyodbc connector would always pass the odbc_connect value
to unquote_plus(), even if it was not required. So, if the (unquoted)
odbc_connect value contained PWD=pass+word
that would get changed to
PWD=pass word
, and the login would fail. One workaround was to quote
just the plus sign — PWD=pass%2Bword
— which would then get unquoted
to PWD=pass+word
.
Implementations using the above workaround with URL.create()
to specify a plus sign in the PWD=
argument of an odbc_connect string
will have to remove the workaround and just pass the PWD=
value as it
would appear in a valid ODBC connection string (i.e., the same as would be
required if using the connection string directly with pyodbc.connect()
).
Addition of BitString
subclass for handling postgresql BIT
columns¶
Values of BIT
columns in the PostgreSQL dialect are
returned as instances of a new str
subclass,
BitString
. Previously, the value of BIT
columns was driver dependent, with most drivers returning str
instances
except asyncpg
, which used asyncpg.BitString
.
With this change, for the psycopg
, psycopg2
, and pg8000
drivers,
the new BitString
type is mostly compatible with str
, but
adds methods for bit manipulation and supports bitwise operators.
As BitString
is a string subclass, hashability as well
as equality tests continue to work against plain strings. This also leaves
ordering operators intact.
For implementations using the asyncpg
driver, the new type is incompatible with
the existing asyncpg.BitString
type.
flambé! the dragon and The Alchemist image designs created and generously donated by Rotem Yaari.
Created using Sphinx 8.2.3. Documentation last generated: Sat 19 Jul 2025 07:31:23 AM EDT